Replacing the Civ XML parser

AIAndy · May 12, 2013

The XML parser used by Civ4 is really annoying. It has a lot of quirks that are showing up again now that I am working on reading in the texts as UTF-8 (while still trying to keep compatibility with text XML that uses Latin-1).
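For readers unfamiliar with the encoding issue, here is a minimal sketch of one way Latin-1 text could be brought into UTF-8 so that both kinds of text XML end up in the same form. The function name latin1ToUtf8 is hypothetical and is not taken from the Civ4 or C2C code; it only illustrates the byte-level conversion.

```cpp
#include <string>

// Hypothetical helper (not part of the Civ4 DLL): converts a Latin-1
// (ISO-8859-1) byte string to UTF-8. Every Latin-1 byte maps directly to the
// Unicode code point with the same value, so bytes below 0x80 are copied
// unchanged and bytes from 0x80 to 0xFF become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80)
        {
            out.push_back(static_cast<char>(c));
        }
        else
        {
            // Lead byte carries the top two bits, trail byte the low six bits.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Pure ASCII text passes through unchanged, which is why a file that only uses ASCII reads the same under either encoding.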

So I would suggest replacing the XML parser. To avoid having to rewrite all the current reading code, the idea is to imitate the abstract reading interface that is currently provided by CvDLLXmlIFaceBase by implementing it in a new singleton class inheriting from it and then replacing all the references to gDLL->getXMLIFace() with references to the singleton. That can be done with an automatic replace in a text editor.
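A rough sketch of that idea, under loud assumptions: XmlReaderIFace below is a tiny hypothetical stand-in for CvDLLXmlIFaceBase (the real interface has different and far more methods), and the replacement class keeps its data in a flat map instead of wrapping an actual parser. It only shows the shape of "singleton that implements the same abstract interface".

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the abstract reading interface. The real
// CvDLLXmlIFaceBase has a different and much larger set of methods.
class XmlReaderIFace
{
public:
    virtual ~XmlReaderIFace() {}
    virtual bool locateNode(const std::string& path) = 0;
    virtual bool getValue(std::string& out) const = 0;
};

// Replacement reader exposed as a singleton. A real version would wrap a
// new parser; this sketch stores node text in a map keyed by path.
class ReplacementXmlReader : public XmlReaderIFace
{
public:
    static ReplacementXmlReader& getInstance()
    {
        static ReplacementXmlReader instance;
        return instance;
    }

    // Illustration-only way to feed data into the sketch.
    void setNode(const std::string& path, const std::string& text)
    {
        m_nodes[path] = text;
    }

    virtual bool locateNode(const std::string& path)
    {
        std::map<std::string, std::string>::iterator it = m_nodes.find(path);
        m_current = (it != m_nodes.end()) ? &it->second : 0;
        return m_current != 0;
    }

    virtual bool getValue(std::string& out) const
    {
        if (m_current == 0)
            return false;
        out = *m_current;
        return true;
    }

private:
    ReplacementXmlReader() : m_current(0) {}
    ReplacementXmlReader(const ReplacementXmlReader&);            // not copyable
    ReplacementXmlReader& operator=(const ReplacementXmlReader&); // not assignable

    std::map<std::string, std::string> m_nodes;
    const std::string* m_current; // text of the node located last, or 0
};
```

Existing reading code that only talks to the abstract interface would then call ReplacementXmlReader::getInstance() where it currently calls gDLL->getXMLIFace(), which is what makes the search-and-replace approach plausible.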

One result of that change would be that the schemas would be ignored (for good or bad) since those schemas are specific to older Microsoft parsers and any replacement parser would likely not support them.
On the other hand XML reading would likely be a lot faster.

What do you think?

Koshling · May 12, 2013

AIAndy said:
The XML parser used by Civ4 is really annoying. It has a lot of quirks that are showing up again now that I am working on reading in the texts as UTF-8 (while still trying to keep compatibility with text XML that uses Latin-1).

So I would suggest replacing the XML parser. To avoid having to rewrite all the current reading code, the idea is to imitate the abstract reading interface that is currently provided by CvDLLXmlIFaceBase by implementing it in a new singleton class inheriting from it and then replacing all the references to gDLL->getXMLIFace() with references to the singleton. That can be done with an automatic replace in a text editor.

One result of that change would be that the schemas would be ignored (for good or bad) since those schemas are specific to older Microsoft parsers and any replacement parser would likely not support them.
On the other hand XML reading would likely be a lot faster.

What do you think?

Sounds reasonable to me. The need to sync schemas by name is baroque and unfortunate, so doing away with schema checking in the live game is a good idea.

We could (and should) have a SEPARATE schema checker that modders can run as a tool - no need to keep checking it at runtime.

strategyonly · May 12, 2013

Koshling said:
We could (and should) have a SEPARATE schema checker that modders can run as a tool - no need to keep checking it at runtime.

Hear, hear, I agree . . . SO

ls612 · May 12, 2013

Which one would you suggest using?

Also, while we are at it would it be reasonable to introduce a regex engine for the XML too (to replace the Expression system) to increase flexibility?

AIAndy · May 12, 2013

ls612 said:
Which one would you suggest using?

I guess a simple fast one like RapidXML should be enough.

Also, while we are at it would it be reasonable to introduce a regex engine for the XML too (to replace the Expression system) to increase flexibility?

I don't quite get what you mean here. Please elaborate.

Thunderbrd · May 12, 2013

I suppose we'd still want to include at least a modder's doc to list the tags that a class has access to. But otherwise, doing away with the schemas would be helpful for design efficiency as we wouldn't have to worry about the order of tags as used and that would be nice.

And of course it would take some learning to figure out how to use the new system but I'm not too intimidated by that given that we'd have a lot of existing examples anyhow.

Faster is better, more functionality is better and easier editing and adding content to the XML is better.

Would this greatly complicate the new tag design process? Sounds like it wouldn't change things much but I'm not fully understanding everything about the XML load process at the levels you're looking to replace things as it is so I have to ask.

AIAndy · May 12, 2013

Thunderbrd said:
I suppose we'd still want to include at least a modder's doc to list the tags that a class has access to. But otherwise, doing away with the schemas would be helpful for design efficiency as we wouldn't have to worry about the order of tags as used and that would be nice.

And of course it would take some learning to figure out how to use the new system but I'm not too intimidated by that given that we'd have a lot of existing examples anyhow.

Faster is better, more functionality is better and easier editing and adding content to the XML is better.

Would this greatly complicate the new tag design process? Sounds like it wouldn't change things much but I'm not fully understanding everything about the XML load process at the levels you're looking to replace things as it is so I have to ask.

No change to the way you implement tags (except that you don't need to update the schemas) as the interface will be the same to keep backwards compatibility.

Koshling · May 12, 2013

AIAndy said:
No change to the way you implement tags (except that you don't need to update the schemas) as the interface will be the same to keep backwards compatibility.

Should still keep a (single) master schema and keep it up to date, so the modders can run a development-time check tool.

Thunderbrd · May 12, 2013

Ok, so I see no downside at all then. The whole picture looks good to me!

Koshling said:
Should still keep a (single) master schema and keep it up to date, so the modders can run a development-time check tool.

Would that development-time check tool require that the tags be kept 'in order' as well though? It'd be nice to be able to reorganize my templates to an order that makes a bit more sense to me for simplicity but I fear doing so would throw the existing XML out of whack! So if it's possible that we can make it so it doesn't worry about 'order' of tags, just syntax, that would be very appreciated. And yes, it still helps to have something there to explain the proper syntax of tags to the XML modder.

ls612 · May 12, 2013

AIAndy said:
I don't quite get what you mean here. Please elaborate.

Something like this

Code:

<traincondition SomeGameObject="something" condition="and" anothergameobject="somethingelse" on="city"/>

Sort of how the Expression system works except built into the load stack (so one wouldn't need to manually code expression support for everything).

AIAndy · May 12, 2013

ls612 said:
Something like this

Code:

<traincondition SomeGameObject="something" condition="and" anothergameobject="somethingelse" on="city"/>

Sort of how the Expression system works except built into the load stack (so one wouldn't need to manually code expression support for everything).

Take a moment and think about when an evaluation can occur and what that means about what you have to store.

ls612 · May 12, 2013

AIAndy said:
Take a moment and think about when an evaluation can occur and what that means about what you have to store.

Yeah, that would be a lot of memory used I suppose.

Koshling · May 12, 2013

Thunderbrd said:
Ok, so I see no downside at all then. The whole picture looks good to me!

Would that development-time check tool require that the tags be kept 'in order' as well though? It'd be nice to be able to reorganize my templates to an order that makes a bit more sense to me for simplicity but I fear doing so would throw the existing XML out of whack! So if it's possible that we can make it so it doesn't worry about 'order' of tags, just syntax, that would be very appreciated. And yes, it still helps to have something there to explain the proper syntax of tags to the XML modder.

There should be no order dependencies

AIAndy · May 12, 2013

ls612 said:
Yeah, that would be a lot of memory used I suppose.

Not a matter of memory. You simply can't evaluate the result at load time, so what you end up with is having to store the expression, which is exactly what the expression system does. And to store the expression you have to change the variable that usually stores int or bool to store IntExpr or BoolExpr instead, which needs to be supported in the part that uses the value.
There is no magic spell that makes that kind of significant access change for you in a generic way in C++, which means you specifically have to change the code at each point where you actually want it.

But that is not all. A normal int or bool tag is a kind of constant. It does not matter when you evaluate it but for an expression that is not the case. And when the value changes then all values that depend on it also have to change. That means you can't just keep an accumulated value on the city for instance unless you know when exactly the expression could change and then reevaluate the accumulated value accordingly.

So all in all, no, without fundamental, huge changes to how Civ4 deals with data you can't automatically support expressions everywhere. You have to do it on a case by case basis.
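To make the constant-versus-expression difference concrete, here is a small sketch. The name IntExpr is borrowed from the post, but everything else (GameState, IntExprConstant, IntExprProperty and their members) is invented for illustration and is not the actual C2C expression code.

```cpp
#include <map>
#include <string>

// Hypothetical game state that an expression is evaluated against.
struct GameState
{
    std::map<std::string, int> properties;
};

// What has to be stored instead of a plain int: something that can be
// evaluated later, against whatever the state is at that moment.
class IntExpr
{
public:
    virtual ~IntExpr() {}
    virtual int evaluate(const GameState& state) const = 0;
};

// A normal int tag behaves like this: the result never depends on the state,
// so it does not matter when it is evaluated.
class IntExprConstant : public IntExpr
{
public:
    explicit IntExprConstant(int value) : m_value(value) {}
    virtual int evaluate(const GameState&) const { return m_value; }
private:
    int m_value;
};

// A real expression depends on the state, so a result computed once (for
// example at load time, or cached on a city) can become stale.
class IntExprProperty : public IntExpr
{
public:
    explicit IntExprProperty(const std::string& name) : m_name(name) {}
    virtual int evaluate(const GameState& state) const
    {
        std::map<std::string, int>::const_iterator it = state.properties.find(m_name);
        return it != state.properties.end() ? it->second : 0;
    }
private:
    std::string m_name;
};
```

Every piece of code that used to read the int directly now has to call evaluate with a state, and anything that accumulated the old value has to know when to recompute it. That is the per-use-site work described above.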

Dancing Hoskuld · May 13, 2013

Having some of the Civ IV XML not checked against the schema already causes problems. Currently we can guess where the problems are occurring, but if none of the XML is checked against the schema then we lose the whole ability to mod in XML successfully.

Thunderbrd · May 13, 2013

Dancing Hoskuld said:
Having some of the Civ IV XML not checked against the schema already causes problems. Currently we can guess where the problems are occurring, but if none of the XML is checked against the schema then we lose the whole ability to mod in XML successfully.

Koshling is suggesting we do check it against a schema - just not at load time - at least not the normal sort of load process anyhow.

Dancing Hoskuld · May 13, 2013

Note: I am not against changing the XML parser. I am just worried that if the XML is not checked every time, we will lose a debugging tool.

There have also been times when I have tested new schema files, mostly when tags are available and supported by the DLL but are not in the normal or C2C schema definitions, e.g. the bGraphicalOnly tag was standard BtS but was not in any of the schema files. It is used in the pedia only.

Thunderbrd said:
Koshling is suggesting we do check it against a schema - just not at load time - at least not the normal sort of load process anyhow.

Currently, I can make many changes, get many XML errors but still test the results of the bits that did not get errors in game. This mostly happens when I am adding new terrain, terrain features, resources, animals and so on because I do them each in a separate module while doing my initial tests. That way I don't waste the 2-3 minute load time.

Koshling · May 13, 2013

Dancing Hoskuld said:
Currently, I can make many changes, get many XML errors but still test the results of the bits that did not get errors in game. This mostly happens when I am adding new terrain, terrain features, resources, animals and so on because I do them each in a separate module while doing my initial tests. That way I don't waste the 2-3 minute load time.

Right, but with a separate tool the XML to schema check would be way faster than the startup time of the game, so running the tool would waste LESS time I think.

Of course another approach would be to put the schema check into the live game load, enabled by a global define, which we'd turn off for release, but on as modders.
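A minimal sketch of what such a switch could look like. The function checkTagsOnLoad and its parameters are hypothetical, the flag is passed in directly rather than read from a real global define, and the "schema" is reduced to a set of known tag names, so this is far simpler than real schema validation.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical load-time check: returns the tags that are not in the set of
// known tags. Tag order is never looked at. When validationEnabled is false
// (as it would be for a release build) the check is skipped entirely.
std::vector<std::string> checkTagsOnLoad(const std::vector<std::string>& tags,
                                         const std::set<std::string>& knownTags,
                                         bool validationEnabled)
{
    std::vector<std::string> unknown;
    if (!validationEnabled)
        return unknown;

    for (std::vector<std::string>::size_type i = 0; i < tags.size(); ++i)
    {
        if (knownTags.count(tags[i]) == 0)
            unknown.push_back(tags[i]);
    }
    return unknown;
}
```

Because the same function could be called from a standalone tool and from the game load, both paths would report the same problems.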

ls612 · May 13, 2013

Dancing Hoskuld said:
Currently, I can make many changes, get many XML errors but still test the results of the bits that did not get errors in game. This mostly happens when I am adding new terrain, terrain features, resources, animals and so on because I do them each in a separate module while doing my initial tests. That way I don't waste the 2-3 minute load time.

We could easily adapt the RapidXML library to perform the check independently of the load process (make a separate program to load the library and do the check) and do that with all of the modern optimizations, meaning that the check would be far faster than the load.

Dancing Hoskuld · May 13, 2013

Koshling said:
Right, but with a separate tool the XML to schema check would be way faster than the startup time of the game, so running the tool would waste LESS time I think.

Of course another approach would be to put the schema check into the live game load, enabled by a global define, which we'd turn off for release, but on as modders.

ls612 said:
We could easily adapt the RapidXML library to perform the check independently of the load process (make a separate program to load the library and do the check) and do that with all of the modern optimizations, meaning that the check would be far faster than the load.

Having it as a global define would allow me to do two or more things at once, so that would be my preference. It would also give me confidence that the two things were doing the same thing in comparable ways.
