Replacing the Civ XML parser

AIAndy

Deity
Joined
Jun 8, 2011
Messages
3,428
The XML parser used by Civ4 is really annoying. It has a lot of quirks that are showing up again now that I am working on reading in the texts as UTF-8 (while still trying to keep compatibility with text XML that uses Latin-1).

So I would suggest replacing the XML parser. To avoid having to rewrite all the current reading code, the idea is to imitate the abstract reading interface that is currently provided by CvDLLXmlIFaceBase by implementing it in a new singleton class inheriting from it and then replacing all the references to gDLL->getXMLIFace() with references to the singleton. That can be done with an automatic replace in a text editor.
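
A minimal sketch of that idea, under loud assumptions: `XmlIFaceSketch` is a made-up, heavily simplified stand-in for CvDLLXmlIFaceBase (the real interface has many more methods and different signatures), and `NewXmlReader`, `getChildValue`, `setChildValue` and `readCost` are hypothetical names used only for illustration. The point is just that loading code written against the abstract interface does not care which parser sits behind it.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for CvDLLXmlIFaceBase.
class XmlIFaceSketch {
public:
    virtual ~XmlIFaceSketch() {}
    // Returns true and fills 'out' if 'tag' exists on the current element.
    virtual bool getChildValue(const std::string& tag, std::string& out) const = 0;
};

// Replacement reader exposing the same abstract interface as a singleton.
class NewXmlReader : public XmlIFaceSketch {
public:
    static NewXmlReader& instance() {
        static NewXmlReader reader;  // created on first use
        return reader;
    }

    // Stand-in for parsing: a real version would wrap the new parser here.
    void setChildValue(const std::string& tag, const std::string& value) {
        assert(!tag.empty());
        m_values[tag] = value;
    }

    virtual bool getChildValue(const std::string& tag, std::string& out) const {
        std::map<std::string, std::string>::const_iterator it = m_values.find(tag);
        if (it == m_values.end()) return false;
        out = it->second;
        return true;
    }

private:
    NewXmlReader() {}
    NewXmlReader(const NewXmlReader&);             // not copyable
    NewXmlReader& operator=(const NewXmlReader&);  // not assignable
    std::map<std::string, std::string> m_values;
};

// Existing-style loading code that only knows the abstract interface.
inline int readCost(const XmlIFaceSketch& xml) {
    std::string text;
    return xml.getChildValue("iCost", text) ? std::atoi(text.c_str()) : 0;
}
```

With something like this in place, a call site would change from `readCost(*gDLL->getXMLIFace())` to `readCost(NewXmlReader::instance())`, which is the kind of mechanical search-and-replace described above.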

One result of that change would be that the schemas would be ignored (for good or bad) since those schemas are specific to older Microsoft parsers and any replacement parser would likely not support them.
On the other hand XML reading would likely be a lot faster.

What do you think?
 
Sounds reasonable to me. The need to sync schemas by name is baroque and unfortunate, so doing away with schema checking in the live game is a good idea.

We could (and should) have a SEPARATE schema checker that modders can run as a tool - no need to keep checking it at runtime.
 
Which one would you suggest using?

Also, while we are at it would it be reasonable to introduce a regex engine for the XML too (to replace the Expression system) to increase flexibility?
 
Which one would you suggest using?
I guess a simple fast one like RapidXML should be enough.

Also, while we are at it would it be reasonable to introduce a regex engine for the XML too (to replace the Expression system) to increase flexibility?
I don't quite get what you mean here. Please elaborate.
 
I suppose we'd still want to include at least a modder's doc to list the tags that a class has access to. But otherwise, doing away with the schemas would be helpful for design efficiency as we wouldn't have to worry about the order of tags as used and that would be nice.

And of course it would take some learning to figure out how to use the new system but I'm not too intimidated by that given that we'd have a lot of existing examples anyhow.

Faster is better, more functionality is better and easier editing and adding content to the XML is better.

Would this greatly complicate the new tag design process? Sounds like it wouldn't change things much but I'm not fully understanding everything about the XML load process at the levels you're looking to replace things as it is so I have to ask.
 
No change to the way you implement tags (except that you don't need to update the schemas) as the interface will be the same to keep backwards compatibility.
 
Should still keep a (single) master schema and keep it up to date, so the modders can run a development-time check tool.
 
Ok, so I see no downside at all then. The whole picture looks good to me! :D

Should still keep a (single) master schema and keep it up to date, so the modders can run a development-time check tool.
Would that development-time check tool require that the tags be kept 'in order' as well though? It'd be nice to be able to reorganize my templates to an order that makes a bit more sense to me for simplicity, but I fear doing so would throw the existing XML out of whack! So if it's possible to make it not worry about the 'order' of tags, just syntax, that would be very appreciated. And yes, it still helps to have something there to explain the proper syntax of tags to the XML modder.
 
I don't quite get what you mean here. Please elaborate.

Something like this

Code:
<traincondition SomeGameObject="something" condition="and" anothergameobject="somethingelse" on="city"/>

Sort of how the Expression system works except built into the load stack (so one wouldn't need to manually code expression support for everything).
 
Take a moment and think about when an evaluation can occur and what that means about what you have to store.
 
Ok, so I see no downside at all then. The whole picture looks good to me! :D


Would that development-time check tool require that the tags be kept 'in order' as well though? It'd be nice to be able to reorganize my templates to an order that makes a bit more sense to me for simplicity, but I fear doing so would throw the existing XML out of whack! So if it's possible to make it not worry about the 'order' of tags, just syntax, that would be very appreciated. And yes, it still helps to have something there to explain the proper syntax of tags to the XML modder.

There should be no order dependencies.
 
Yeah, that would be a lot of memory used I suppose.
Not a matter of memory. You simply can't evaluate the result at load time so what you end up with is having to store the expression, which is exactly what the expression system does. And to store the expression you have to change the variable that usually stores int or bool to store IntExpr or BoolExpr instead which needs to be supported in the part that uses the value.
There is no magic spell that makes that kind of significant access change for you in a generic way in C++, which means you specifically have to change the code at each point where you actually want it.

But that is not all. A normal int or bool tag is a kind of constant. It does not matter when you evaluate it but for an expression that is not the case. And when the value changes then all values that depend on it also have to change. That means you can't just keep an accumulated value on the city for instance unless you know when exactly the expression could change and then reevaluate the accumulated value accordingly.

So all in all, no, without fundamental, huge changes to how Civ4 deals with data you can't automatically support expressions everywhere. You have to do it on a case by case basis.
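
To make that concrete, here is a rough sketch, not the actual Expression system: `CityContext`, `IntExpr`, `IntConstant`, `IntPerPopulation` and `CityValueCache` are all hypothetical names invented for this example. It shows the two costs described above: the stored thing becomes an expression object instead of an int, and any accumulated value has to be invalidated by hand whenever an input of the expression changes.

```cpp
#include <cassert>

// Hypothetical evaluation context; the real game state is far larger.
struct CityContext {
    int population;
    bool hasBuilding;
};

// What has to be stored once a tag is no longer a plain constant.
class IntExpr {
public:
    virtual ~IntExpr() {}
    virtual int evaluate(const CityContext& city) const = 0;
};

// A normal int tag: the result never depends on when it is evaluated.
class IntConstant : public IntExpr {
public:
    explicit IntConstant(int value) : m_value(value) {}
    virtual int evaluate(const CityContext&) const { return m_value; }
private:
    int m_value;
};

// An expression tag: the result depends on game state at evaluation time.
class IntPerPopulation : public IntExpr {
public:
    explicit IntPerPopulation(int perPop) : m_perPop(perPop) {}
    virtual int evaluate(const CityContext& city) const {
        assert(city.population >= 0);
        return m_perPop * city.population;
    }
private:
    int m_perPop;
};

// An accumulated value on the city is only valid until an input changes,
// so every code path that changes an input must call markDirty().
class CityValueCache {
public:
    explicit CityValueCache(const IntExpr& expr)
        : m_expr(expr), m_dirty(true), m_cached(0) {}
    void markDirty() { m_dirty = true; }
    int get(const CityContext& city) {
        if (m_dirty) {
            m_cached = m_expr.evaluate(city);
            m_dirty = false;
        }
        return m_cached;
    }
private:
    const IntExpr& m_expr;
    bool m_dirty;
    int m_cached;
};
```

If the population changes and nobody calls `markDirty()`, `get()` keeps returning the old value. Finding every place where that call is needed is the case-by-case work.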
 
Having some of the Civ IV XML not checked against the schema already causes problems. Currently we can guess where the problems are occurring, but if none of the XML is checked against the schema then we lose the whole ability to mod in XML successfully.
 
Koshling is suggesting we do check it against a schema - just not at load time - at least not the normal sort of load process anyhow.
 
Note: I am not against changing the XML parser. I am just worried that if the XML is not checked every time, we will lose a debugging tool.

There have also been times where I have tested new schema files, mostly when tags are available and supported by the DLL but are not in the normal or C2C schema definitions, e.g. the bGraphicalOnly tag was standard BtS but was not in any of the schema files. It is used in the pedia only.

Koshling is suggesting we do check it against a schema - just not at load time - at least not the normal sort of load process anyhow.

Currently, I can make many changes, get many XML errors but still test the results of the bits that did not get errors in game. This mostly happens when I am adding new terrain, terrain features, resources, animals and so on because I do them each in a separate module while doing my initial tests. That way I don't waste the 2-3 minute load time.
 
Right, but with a separate tool the XML to schema check would be way faster than the startup time of the game, so running the tool would waste LESS time I think.

Of course another approach would be to put the schema check into the live game load, enabled by a global define, which we'd turn off for release but leave on as modders.
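
A rough sketch of what such a switchable check could look like, with everything here being an assumption: `g_validateXmlOnLoad` stands in for the global define, and `findUnknownTags` is a made-up helper that only checks tag names against a master list and deliberately ignores tag order. It is not a real schema validator.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical switch standing in for the global define:
// on for modders, off for release builds.
static bool g_validateXmlOnLoad = true;

// Returns every tag in 'found' that is not in 'allowed'.
// The order of the tags in 'found' is ignored on purpose.
std::vector<std::string> findUnknownTags(const std::set<std::string>& allowed,
                                         const std::vector<std::string>& found) {
    std::vector<std::string> unknown;
    if (!g_validateXmlOnLoad) return unknown;  // check disabled: report nothing

    // An empty master list most likely means it was never loaded.
    assert(!allowed.empty());

    for (std::size_t i = 0; i < found.size(); ++i) {
        if (allowed.count(found[i]) == 0) unknown.push_back(found[i]);
    }
    return unknown;
}
```

The same function could be called from a standalone tool or from the game's load path, so both would report the same problems.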
 
Currently, I can make many changes, get many XML errors but still test the results of the bits that did not get errors in game. This mostly happens when I am adding new terrain, terrain features, resources, animals and so on because I do them each in a separate module while doing my initial tests. That way I don't waste the 2-3 minute load time.

We could easily adapt the RapidXML library to perform the check independently of the load process (make a separate program to load the library and do the check) and do that with all of the modern optimizations, meaning that the check would be far faster than the load.
 
Right, but with a separate tool the XML to schema check would be way faster than the startup time of the game, so running the tool would waste LESS time I think.

Of course another approach would be to put the schema check into the live game load, enabled by a global define, which we'd turn off for release but leave on as modders.

We could easily adapt the RapidXML library to perform the check independently of the load process (make a separate program to load the library and do the check) and do that with all of the modern optimizations, meaning that the check would be far faster than the load.

Having it as a global define would allow me to do two or more things at once, so that would be my preference. It would also give me confidence that the two were doing the same thing in comparable ways. ;)
 