Civ4 - Savegame Library and Editor (Civ4SL)

hankinsohl

Hello everyone.

Brief Project Overview
This thread is for a Civ4 savegame library and accompanying editor which I've been working on for the past several months. I'll use the acronym Civ4SL from now on to refer to the library.

The library will provide a C-style programming API which will enable the programmatic editing of savegame files, including the zlib-compressed data. The library implements the Civ4 checksum function. A Boost Property Tree-style interface is planned which will facilitate use of the API.

Included in the project is an editor, Civ4Edit.exe. The editor will be able to translate a binary savegame into ASCII text format. The translated text can then be edited using any text editor; the editor will be able to load translated text and save it in binary savegame format.

Pre-alpha versions of the editor were developed as a proof of concept. The last version was able to fully translate a savegame into text format and was able to update checksums for modified savegames.


Savegame Schema
Civ4SL will parse savegame files using a text-based schema. A schema will be developed for unmodded Beyond the Sword savegames; however, by editing the schema file, it will be possible to support modded games.


Purpose of this Thread
The purpose of this thread is to solicit feedback and interaction from the Civ4 community, especially modders. If there are features you want Civ4SL to include, please reply below.

A project of this scope is complicated and there's a good chance that I'll make a few implementation errors along the way. I will also have questions about Civ4 from time to time. This sort of community interaction is also appropriate in this thread.


Civ4SL Availability and Project Timeline
Development of Civ4SL started today, 9/24/2024. I would guess that project completion is probably 3 to 6 months away, depending on the time I'm able/willing to commit to development. Once Civ4SL is complete I'll share the work on GitHub.
 
Attached is a very preliminary draft of the Civ4SL schema file for Beyond the Sword. Civ4SL will use this schema to direct the parsing of savegame files.

The schema language implements an intentionally small, limited set of programming language features. It will support if-then-else constructs, for-loops and some predefined functions.

I haven't yet written a grammar for the schema language but once that's in place I'll post it. Meanwhile, I think that the schema is self-explanatory as-is and as such I'd like to get some feedback.
 

Attachments

  • Bts.Civ4Schema.txt
    43.6 KB · Views: 12
A probably far reaching question: Will it be possible to convert save files between BtS/mods?
Yes. One option is to manually edit the translated ASCII text that Civ4Edit.exe generates for the mod1 savegame so that it matches the format for mod2. Of course, schemas would be needed for both mods. The library will come with a separate configuration file. Within the file you'll be able to define CIV4_PATH and MOD_NAME to ensure that the correct XML files are used when processing each savegame.

Another possibility for conversion is to use the library itself. In other words, write a separate utility that links against the library and performs the conversions. This approach has the advantage that, once written, the separate utility can be used over and over, which eliminates the need to manually edit translated ASCII text.
 
Some more details about the project.

Attached is an example ASCII text translation of a savegame generated by the proof-of-concept version of Civ4Edit.exe. The example doesn't include the uncompressed data following the zlib data, nor does it include the checksum at the end because I stopped work on the proof-of-concept once it was clear that support for the entire savegame could be provided (I wrote a separate utility that updated binary savegame checksums to reflect edits made using a hex editor).

Note that the text-based translations tend to get quite large. You'll want a good text editor such as Notepad++ when working with them.

The attached example provides a "flavor" of what to expect once the actual Civ4Edit.exe is written. Offsets are included in the text-based savegame making it easy to align the text-based output with the binary savegame. When editing the text-based savegame, the offsets need not be updated. When reading text-based savegame files the offsets are ignored.
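To illustrate how a reader of the text format could ignore the offsets, here is a minimal sketch. It assumes each line may start with a prefix of the form `0x<hex digits>:` followed by indentation, as in the editor output quoted in this thread. The helper name `strip_offset` is hypothetical and not part of Civ4SL.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of Civ4SL): drops a leading "0x<hex digits>:"
// offset and the indentation after it. Lines without such a prefix are only
// stripped of their leading spaces and tabs.
std::string strip_offset(const std::string& line) {
    std::size_t pos = 0;
    if (line.size() > 2 && line[0] == '0' && line[1] == 'x') {
        std::size_t i = 2;
        while (i < line.size() &&
               std::isxdigit(static_cast<unsigned char>(line[i]))) {
            ++i;
        }
        // Treat it as an offset only if hex digits are followed by ':'.
        if (i > 2 && i < line.size() && line[i] == ':') {
            pos = i + 1;
        }
    }
    // Skip indentation before the payload.
    while (pos < line.size() && (line[pos] == ' ' || line[pos] == '\t')) {
        ++pos;
    }
    return line.substr(pos);
}
```

With this approach an edited line keeps working even if its offset is stale or missing, because only the text after the prefix is interpreted.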
====
As mentioned above, a Boost property tree-style interface is planned for the library. Information about Boost property tree can be found here.

-----
Change log:
9/29/2024 - Fixed CvPlot YieldTypes field to reflect correct type (16-bit int instead of 16-bit bool).
 

Attachments

  • Example.Civ4Edit.txt
    10 MB · Views: 7
I'm fond of your precise description of the project; a practised technical writer no doubt.

The schema looks more modder-friendly than I had anticipated. I had somehow assumed that the modder would have to count by hand how many elements they've added to each XML file and work out how that affects the array lengths in the savegame layout. But it seems, on the contrary, that added XML elements are taken care of entirely by the XML import, so e.g. BAT savegames should be parsed correctly without any changes to the schema file.

There's even support for custom container templates. (Though it's not going to handle the complicated templates in my mod.)

Maybe the schema should allow for some range checks to be specified so that the parser will stop once binary data is obviously being misinterpreted; e.g. uint32 Flag < 100. Not sure if that would significantly improve the workflow of modders adapting the schema file; may well not be worth the effort. Or automatic range checks for all enum32 data? Perhaps too likely that some few enum data members deliberately get used outside of their intended range, especially in mods ... Kind of crazy that Firaxis didn't at least sprinkle a few assertions into their read functions for the sake of debugging. Sanity checks for array sizes would've seemed especially prudent – so that the DLL doesn't end up allocating crazy amounts of memory.

In the sample editor output, this looks like a small mistake:
Code:
BeginArrayYieldTypes[3]
	[0:YIELD_FOOD]=true
	[1:YIELD_PRODUCTION]=true
	[2:YIELD_COMMERCE]=true
EndArrayYieldTypes
The corresponding CvPlot member should be short* m_aiYield; – not a boolean array.

Regarding savegame conversion, I think there'll usually be a number of compatibility issues requiring special attention. Generally, just setting data to 0 that doesn't exist in the target format should work, but even when just converting BAT saves to BtS, I found that BAT had moved the Missionary units into a separate module, which causes them to be placed at the end of the UnitTypes enum. This reordering had to be carefully sorted out. And, to be thorough, any of the female Missionaries and Great People added by BAT had to be converted to their male counterparts in BtS (i.e. the data about such units if any existed on the map).
 
@f1rpo
Thanks for the comments.

RE Range Checks
The parser does check enumerators when parsing enums to ensure that they're in range. There are also a few checks such as maximum array membership size just in case something like -1 aligns with an array size. That said, the proof of concept parser would tend to emit a bunch of meaningless data if it misaligned with savegame binary. The prototype editor would eventually hit one of the range checks, throw an exception and then generate a crash dump (incomplete/misaligned text generated prior to the exception) which could be used to diagnose what went wrong.
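As a rough illustration of the kind of checks described above, here is a small sketch. The limit `kMaxArrayLength`, the function names, and the choice to accept -1 as a "none" enumerator are all assumptions made for the example; the real parser may use different bounds and rules.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed upper bound, for illustration only.
constexpr std::int32_t kMaxArrayLength = 1 << 20;

// Throws if a parsed array length cannot be valid, e.g. -1 read from
// misaligned data. Returns the length unchanged otherwise.
std::int32_t check_array_length(std::int32_t length, const std::string& field) {
    if (length < 0 || length > kMaxArrayLength) {
        throw std::out_of_range("implausible array length for " + field +
                                ": " + std::to_string(length));
    }
    return length;
}

// Throws if an enum value lies outside [-1, count). Accepting -1 is an
// assumption here: it stands in for a "none" enumerator.
std::int32_t check_enum(std::int32_t value, std::int32_t count,
                        const std::string& field) {
    if (value < -1 || value >= count) {
        throw std::out_of_range("enum value out of range for " + field +
                                ": " + std::to_string(value));
    }
    return value;
}
```

A caller would catch the exception at the top level and write out whatever text had been produced so far, which matches the crash-dump behaviour of the prototype described above.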

I hadn't planned on including an assert feature but perhaps this is a good idea.

RE m_aiYield
The size of the value parsed is 16 bits as can be seen from the offsets:
Code:
0x00002307:             BeginArrayYieldTypes[3]
0x00002307:                 [0:YIELD_FOOD]=true
0x00002309:                 [1:YIELD_PRODUCTION]=true
0x0000230b:                 [2:YIELD_COMMERCE]=false
0x0000230d:             EndArrayYieldTypes

The interpretation of the values seems to be true/false, though, unless I'm misreading how the array members are used. The schema accommodates such cases: bool8 for the standard 8-bit boolean typically used, but bool16 and bool32 in case shorts or ints are treated as boolean values. The same thing happens quite often for enums. The C++ enum is int and most enums in the savegame are int; but some are byte and some are short, hence enum8, enum16 and enum32.

Looking at the draft schema it's described as:
int16[NUM_YIELD_TYPES:YieldTypes] Yield

:)

Probably should be bool16 instead of int16, LOL.


RE Savegame conversion
Agreed. Conversion might be either simple or nearly impossible, depending on the extent of the changes from one mod to another. Also, any required additions/deletions would have to make sense in the context of the target mod.
 
Attached is an extremely preliminary, first draft of an extended BNF grammar for the schema. I haven't checked the grammar and at present the lexical analyzer for it hasn't been written.

I wanted to post the grammar in case it helps clarify the draft schema.

The development plan going forward is:
1) Write a tokenizer
2) Write a simple syntax-checking lexer
3) Write the first-pass lexer (this will check syntax, import enums and constants, and create lookup tables for various objects used by the parser)
4) Write the second-pass lexer (this will generate a Boost-style tree for the savegame). As part of this work, the property-tree interface will be designed/coded.
5) Write code to parse the translated text-based output and generate the corresponding property tree
6) Finalize the library API
7) Write the Civ4Edit.exe utility. Civ4Edit.exe will use the API and will be able to import/export savegames and translated text for savegames.
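
To give an idea of what step 1 involves, here is a minimal tokenizer sketch. It is not the Civ4SL tokenizer: the three token kinds and the rule that any other non-space character is a one-character symbol are assumptions chosen to keep the example short. It is exercised on the schema line quoted earlier in the thread.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Assumed token classes; the real schema grammar may define others.
enum class TokenKind { Identifier, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits one line into identifiers, unsigned integers and single-character
// symbols. Whitespace separates tokens and is otherwise discarded.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
            continue;
        }
        const std::size_t start = i;
        if (std::isalpha(c) || c == '_') {
            while (i < input.size() &&
                   (std::isalnum(static_cast<unsigned char>(input[i])) ||
                    input[i] == '_')) {
                ++i;
            }
            tokens.push_back({TokenKind::Identifier, input.substr(start, i - start)});
        } else if (std::isdigit(c)) {
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) {
                ++i;
            }
            tokens.push_back({TokenKind::Number, input.substr(start, i - start)});
        } else {
            ++i;
            tokens.push_back({TokenKind::Symbol, std::string(1, input[start])});
        }
    }
    return tokens;
}
```

For `int16[NUM_YIELD_TYPES:YieldTypes] Yield` this yields seven tokens: four identifiers and the symbols `[`, `:` and `]`.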
 

Attachments

  • Civ4 Schema Grammar.txt
    6 KB · Views: 3
Enum range checks – great, you have it covered. After having worked out the BtS layout, you ought to be the person to know best how to ease the process.
RE m_aiYield [...] The interpretation of the values seems to be true/false though unless I'm misreading how the array members are used.
If you're saying that CvPlot stores booleans in m_aiYield, then you might misread the code in this case. I believe it's a cache for
int calculateYield(YieldTypes eIndex, bool bDisplay = false) const; (cf. CvPlot::updateYield)
So a short array in a world of ints, but should not mingle with bools.

I would've hoped that this structured approach via a formal grammar would allow much of the parsing to come off the shelf/ a library. Well, maybe that's the case, but you'll still need to customize things at every turn?
 
Interesting...

I checked one of the saves in a hex editor and sure enough yields are not always 0 or 1 (I thought that they were for some reason).

At any rate, the schema has this field as type int16 so it'll parse correctly in the library. I should probably update the prototype editor to fix this since I'll likely use the prototype output as a check on the library.
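
For anyone following along, reading the field as int16 could look roughly like the sketch below. It assumes the values are stored little-endian (least-significant byte first), which is an assumption of the example, and the function names are hypothetical rather than library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reads one signed 16-bit value, assuming little-endian byte order.
std::int16_t read_int16_le(const std::vector<std::uint8_t>& data,
                           std::size_t offset) {
    if (data.size() < 2 || offset > data.size() - 2) {
        throw std::out_of_range("read past end of buffer");
    }
    const std::uint16_t raw = static_cast<std::uint16_t>(
        data[offset] | (data[offset + 1] << 8));
    // Reinterpret the 16 bits as a two's-complement signed value.
    return static_cast<std::int16_t>(raw);
}

// Reads `count` consecutive int16 values; each element advances the offset
// by two bytes, matching the 2-byte steps seen in the editor output.
std::vector<std::int16_t> read_int16_array(const std::vector<std::uint8_t>& data,
                                           std::size_t offset,
                                           std::size_t count) {
    std::vector<std::int16_t> values;
    values.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        values.push_back(read_int16_le(data, offset + 2 * i));
    }
    return values;
}
```

Read this way, a yield of 2 or 3 comes through as the number itself instead of being collapsed to true.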

>> I would've hoped that this structured approach via a formal grammar would allow much of the parsing to come off the shelf/ a library.
Almost certainly there are tools for this but I haven't bothered tracking them down. I'm writing a Pratt parser too... which allows the grammar to be looser and, to my taste, more readable. Most parsing tools would probably require much tighter grammars, which tend to increase the number of production rules and make it challenging to get the grammar correct.
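
In case the term is unfamiliar, here is a minimal sketch of the Pratt technique (precedence driven by binding powers) on plain integer arithmetic. It is only a stand-in: the operators, binding powers and class name are chosen for illustration and say nothing about the actual schema grammar.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Illustrative Pratt-style evaluator for + - * /, parentheses and unary minus.
class PrattEvaluator {
public:
    explicit PrattEvaluator(std::string text) : text_(std::move(text)) {}

    long evaluate() {
        const long value = parse_expression(0);
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    // Binding power of an infix operator; 0 means "not an infix operator".
    static int binding_power(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*': case '/': return 20;
            default: return 0;
        }
    }

    void skip_spaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_]))) ++pos_;
    }

    // Prefix position: a number, a parenthesised expression or unary minus.
    long parse_prefix() {
        skip_spaces();
        if (pos_ >= text_.size()) throw std::runtime_error("unexpected end");
        const char c = text_[pos_];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            long value = 0;
            while (pos_ < text_.size() &&
                   std::isdigit(static_cast<unsigned char>(text_[pos_]))) {
                value = value * 10 + (text_[pos_] - '0');
                ++pos_;
            }
            return value;
        }
        if (c == '(') {
            ++pos_;
            const long value = parse_expression(0);
            skip_spaces();
            if (pos_ >= text_.size() || text_[pos_] != ')')
                throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        if (c == '-') {
            ++pos_;
            return -parse_expression(30);  // binds tighter than any infix operator
        }
        throw std::runtime_error("unexpected character");
    }

    // Consumes infix operators whose binding power exceeds min_power.
    long parse_expression(int min_power) {
        long left = parse_prefix();
        for (;;) {
            skip_spaces();
            if (pos_ >= text_.size()) break;
            const char op = text_[pos_];
            const int power = binding_power(op);
            if (power == 0 || power <= min_power) break;
            ++pos_;
            const long right = parse_expression(power);  // gives left associativity
            switch (op) {
                case '+': left += right; break;
                case '-': left -= right; break;
                case '*': left *= right; break;
                case '/':
                    if (right == 0) throw std::runtime_error("division by zero");
                    left /= right;
                    break;
            }
        }
        return left;
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

The appeal is that precedence lives in one small table (`binding_power`) instead of being spread over a chain of production rules, so adding an operator means adding a table entry and a case.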
 
The purpose of this thread is to solicit feedback and interaction from the Civ4 community, especially modders. If there are features you want Civ4SL to include, please reply below.
To me, the most important thing is the conversion of the savegame to human-readable (which you have done) and the corresponding conversion back. I'm a little confused about what of the above is and isn't part of that, but it sounds to me as if you are working on tools to check the syntax of the edited savegame. That would be nice to have - but to me, it pales into insignificance next to being able to edit them _at all_.
 
Hi damerell:
Thanks for the reply.

The ability to convert between binary savegame and human-readable text format will be part of the editor utility.

The stuff about schemas is to support mods that make changes to the format of savegames. The editor (and other tools developed using the library) needs to understand binary format changes, and the way this is accomplished is by reading a schema which describes the binary layout.

And I'm making all of this available in a static library too. That way, various tools other than the editor can be developed. For example, a tool could be developed which automatically sets barbarian starting technologies. This might be of interest to those creating the Noble Club games. Currently Noble Club games are in world builder format and when you first start a game you'll need to manually set the barbarian starting techs.

Another use case for a tool which uses the static library would be to remove Lock Modified Assets (LMA) from a savegame. This might be useful if you have a savegame you'd like to play but are having trouble doing so because of LMA.

Other tools are possible too - anything that needs to manipulate a binary savegame file would use the library to do so.
 
The ability to convert between binary savegame and human-readable text format will be part of the editor utility.
Mmm. I think all I'm really saying is (for me) that would be so enormously useful I would be personally grateful if it emerged first. (I appreciate ofc you may have other priorities!)
 