I know this thread is a month old, but I feel like the answers fail to really explain what OOS issues are all about or what to do about them. As somebody who started out fixing OOS issues even before Civ4 was released, I feel like I can contribute.
A game (any network game) desyncs if one of the following happens:
1. Something that happens on one computer is not transmitted to the other computers.
2. Something that is calculated in sync doesn't produce the same result on all computers.
Case 1 is usually clicking buttons or pressing keys to do something. Say a random event asks you to buy a unit for X gold. If you say yes, the other computers need to know that you lost gold and that a new unit exists.
Case 2 is more evil. It can be a result of case 1 happening first, meaning the computers don't get the same input data for their calculations. It could also be something like using the system random instead of the game random. It can even be that the C++ code runs into undefined behavior, in which case the same code is not guaranteed to behave the same on all computers. Luckily all players use x86 and the same compiler, meaning this is not a likely scenario for us, but it can be a really annoying problem for cross-platform games if they aren't using nice code.
It doesn't matter if you mod the DLL or Python. Both can create and avoid desyncs. Both can send info across the network and both can fail to do so when they should have. The idea is that, say for a city, you call sendDoTask() on the one computer that has something to send, and it will call CvCity::doTask() with the same arguments on all computers. There are similar functions for players, units etc. If you want to go really advanced, you can even define your own network functions, but most modders will have their needs covered by the vanilla functions (though they might be modded to have more enum values).
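To make the "send it to everybody" idea concrete, here is a minimal toy sketch. It is not the Civ4 SDK: PeerState, ToyTask, applyTask(), broadcastTask() and buyUnitLocallyOnly() are all hypothetical names made up for illustration, and the "network" is just a loop over a vector. It only shows why the same call with the same arguments has to run on every computer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model, NOT the Civ4 SDK: each "computer" has its own copy of the state.
struct PeerState {
    int gold = 100;
    int units = 0;
};

enum ToyTask { TASK_BUY_UNIT };  // hypothetical task id

// Runs on every peer with identical arguments (the role CvCity::doTask() plays in the post).
void applyTask(PeerState& state, ToyTask task, int cost) {
    assert(cost >= 0);
    if (task == TASK_BUY_UNIT && state.gold >= cost) {
        state.gold -= cost;
        state.units += 1;
    }
}

// Stand-in for the network send: delivers the same task to every peer, sender included.
void broadcastTask(std::vector<PeerState>& peers, ToyTask task, int cost) {
    for (PeerState& peer : peers) {
        applyTask(peer, task, cost);
    }
}

// The bug: only the local copy changes and nobody else hears about it.
void buyUnitLocallyOnly(PeerState& local, int cost) {
    applyTask(local, TASK_BUY_UNIT, cost);
}
```

With two peers, broadcastTask(peers, TASK_BUY_UNIT, 30) leaves both at 70 gold and 1 unit. Calling buyUnitLocallyOnly(peers[0], 30) afterwards leaves the first at 40 gold and the second at 70, which is exactly the first kind of desync.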
My first suspicion would be aimed at the combat code itself. Check if some other random function is called instead of the game random. The game random would look like this:
GC.getGameINLINE().getSorenRandNum(10, "some string for the log entry");
This provides a random number from 0 to 9. Python has the very same function as part of CyGame and can be used with the same arguments.
random() or getASyncRand() will give a completely random number, while getSorenRandNum() gives a predictable random number. While this may sound like an oxymoron, it's actually a real scientific challenge to generate numbers that feel completely random but can still be predicted. The game does this, and it ensures that the same random number will appear on all computers. Do note that calling it twice will not give the same number twice, and as such sync will break if it is called on just one computer.
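Here is a minimal sketch of what "predictable random" means. SyncedRand is a hypothetical toy class, and the constants are the well-known Numerical Recipes LCG constants, not necessarily what the game uses internally. The point is only that the same seed and the same number of calls give the same numbers everywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy generator, NOT the game's implementation.
// A linear congruential generator: the next state depends only on the current state.
class SyncedRand {
public:
    explicit SyncedRand(std::uint32_t seed) : m_state(seed) {}

    // Returns a value in [0, bound) and advances the internal state by one step.
    int next(int bound) {
        assert(bound > 0);
        m_state = m_state * 1664525u + 1013904223u;
        return static_cast<int>((m_state >> 16) % static_cast<std::uint32_t>(bound));
    }

    // Exposed so two peers can compare where they are in the sequence.
    std::uint32_t state() const { return m_state; }

private:
    std::uint32_t m_state;
};
```

Two SyncedRand objects created with the same seed return the same values from next(10) as long as both are called the same number of times. One extra call on just one of them leaves the two states different, and every number after that can differ too.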
The game calculates some sort of checksum based on a lot of stuff in memory. It transmits just the checksum and goes into OOS mode if two computers figure out they have different numbers. The checksum is a one-way calculation, meaning it can't be used to tell what went wrong, only that something is different.
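A minimal sketch of the checksum idea, assuming a made-up ToyGameState and using FNV-1a as the hash. I don't know which fields or which algorithm the game really uses, so treat all names here as hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical toy state, NOT the real game data.
struct ToyGameState {
    int turn;
    int gold;
    std::vector<int> unitHealth;
};

// FNV-1a step, feeding one 32-bit value into the hash a byte at a time.
std::uint32_t mixValue(std::uint32_t hash, std::uint32_t value) {
    for (int shift = 0; shift < 32; shift += 8) {
        hash ^= (value >> shift) & 0xFFu;
        hash *= 16777619u;  // FNV prime
    }
    return hash;
}

// Boils the whole state down to one number; the state can't be recovered from it.
std::uint32_t calcChecksum(const ToyGameState& state) {
    std::uint32_t hash = 2166136261u;  // FNV offset basis
    hash = mixValue(hash, static_cast<std::uint32_t>(state.turn));
    hash = mixValue(hash, static_cast<std::uint32_t>(state.gold));
    for (int health : state.unitHealth) {
        hash = mixValue(hash, static_cast<std::uint32_t>(health));
    }
    return hash;
}

// Only the two numbers are compared, so this says "different", never "where".
bool isOutOfSync(std::uint32_t localChecksum, std::uint32_t remoteChecksum) {
    return localChecksum != remoteChecksum;
}
```

If one unit has 80 health on one computer and 81 on another, the two checksums come out different, but nothing in the numbers points at that unit.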
I have an idea for helping to find the cause. When in OOS mode, one player sends a message that makes all computers save at the same time. Comparing the savegames can then tell what went wrong. Sadly the savegames aren't really human readable (reading them is possible, but a really massive task!). Instead, the game should print a text-based savegame, listing a bunch of variables for plot 0, plot 1, unit 0 and so on. I might code this in some distant future, but so far the size of the project has given it too low a priority to even start.
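As a rough sketch of what such a text-based dump could look like (nothing like this exists in the game as far as this post is concerned; ToyPlot, ToyUnit, ToyGame, dumpState() and firstDifferentLine() are hypothetical names and the fields are invented):

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy data, NOT the real game classes.
struct ToyPlot { int owner; int terrain; };
struct ToyUnit { int x; int y; int damage; };
struct ToyGame {
    std::vector<ToyPlot> plots;
    std::vector<ToyUnit> units;
};

// Writes one line per object in a fixed order, so two dumps can be compared line by line.
std::string dumpState(const ToyGame& game) {
    std::ostringstream out;
    for (std::size_t i = 0; i < game.plots.size(); ++i) {
        out << "plot " << i << " owner=" << game.plots[i].owner
            << " terrain=" << game.plots[i].terrain << '\n';
    }
    for (std::size_t i = 0; i < game.units.size(); ++i) {
        out << "unit " << i << " x=" << game.units[i].x << " y=" << game.units[i].y
            << " damage=" << game.units[i].damage << '\n';
    }
    return out.str();
}

// Returns the zero-based index of the first differing line, or -1 if the dumps match.
int firstDifferentLine(const std::string& a, const std::string& b) {
    std::istringstream sa(a), sb(b);
    std::string la, lb;
    int line = 0;
    while (true) {
        bool hasA = static_cast<bool>(std::getline(sa, la));
        bool hasB = static_cast<bool>(std::getline(sb, lb));
        if (!hasA && !hasB) return -1;
        if (hasA != hasB || la != lb) return line;
        ++line;
    }
}
```

Each computer would write its own dump, and then any diff tool (or something like firstDifferentLine()) points straight at the first object that differs, which the checksum alone can't do.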