OOS Discussion, Tracking and Fixing

I tried to get a OOSLog for Infernals but it didn't generate one. I have logging turned on in BUG. Not sure why the Kuriotates save generates a log and my infernal save doesn't. Here is a save file to quickly generate oos.

Interestingly I also got an oos error when Calabim took over an Infernal city, but I wasn't able to reproduce this when I tried a second time.

The infernal capture errors is easy to reproduce though. After you select "I'll manage the city" (don't raze it or inspect) then it creates the oos right away.
 

Attachments

  • inf-oos.CivBeyondSwordSave
    338.4 KB · Views: 249
For the Kuriotates oos here are the differences:

Player 0 (host) log:

Yields:
-------
Player 1 Food Total Yield: 15
Player 1 Production Total Yield: 2
Player 1 Commerce Total Yield: 18


Commerce:
---------
Player 1 Gold Total Commerce: 0
Player 1 Research Total Commerce: 18
Player 1 Culture Total Commerce: 2
Player 1 Espionage Total Commerce: 0

Player 1 (Kuriotate player) log:

Yields:
-------
Player 1 Food Total Yield: 14
Player 1 Production Total Yield: 2
Player 1 Commerce Total Yield: 20


Commerce:
---------
Player 1 Gold Total Commerce: 0
Player 1 Research Total Commerce: 20
Player 1 Culture Total Commerce: 2
Player 1 Espionage Total Commerce: 0


So as you can see Food Total Yield and Research Total Commerce do not match.
 

Attachments

  • Fr8train - Player 1 - OOSLog - Turn 42.txt
    332.6 KB · Views: 127
Thanks for the reports! Good to hear that the fix for the OOS logging is working.

I'll poke around a bit when I get a chance, but I'm going to leave the bulk of the OOS fixing to Terkhen as he's set up to test multiplayer and I am not.
 
Triggering a Golden Age using three GP (commander, sage, engineer) caused an OOS in our latest game. Unfortunately I didn't nab a save but will see if it is reproducible in our next one.
 
I played four quick ExtraModMod test games with my brother with the Kuriotates present on them and the new OOS logging code in its place. The Kuriotates casted Legends on turn 1, in case that it helps on triggering the issue faster. In all of these games the Kuriotates had a single city and no settlements at all. With the first game we tried to reconnect and continue it to get more logs from the same game, but on the second desync it did not generate an OOS log.

We created a second test game to see if logs were generated this time, but sadly it did not generate a log after the first OOS either.

For the third and fourth ones, we closed the game before starting new games, and we managed to get more logs. I'm attaching each pair of logs to this post.

The last time I managed to get Kuriotate OOS logs I got a single pair, and it was similar to Fr8Train's log. The logs I got this time are completely different, though. In the first pair, it seems that in one of the games the Sheaim decided to train a Settler while in the other they trained a Warrior. In the second and third games, the "Next Soren Rand Value" is different in each log. I'm not familiar with the different RNGs used by the game so I don't know what could have caused this but it seems to me that the presence of the Kuriotates somehow triggers a calculation or a random number generation that is not done in the other game.

I'm without a clear lead right now so I'm posting this in case someone knows where should I look. I'll try to get more logs and look around in the DLL code for places in which the calculations may be different for different city areas later.
 

Attachments

  • OOSLogs.rar
    111.3 KB · Views: 251
I dont really know where to start tracking down the issue either. I thought I had bookmarked a post from Sephi giving some advice on tracking down these issues but I cant seem to find it. I did manage to dig up a couple of useful posts though.

(from AIAndy) So how to debug it when you get an OOS.
The OOS log shows a good part of the game state including the current RNG so you know what is actually different between the computers. If the RNG is the same, then you have to search for code that might touch the stuff that is different in the log. But keep in mind that any desync kind of snowballs. If an async code adds a unit, then that unit will be considered by the AI so it might make different decisions.
The worst kind of snowballing is when the synced RNG is used by async code as any random number after this one will be different now. So a common effect will be that different animals will spawn (that uses quite a lot of random numbers) and the like even when that is not the source of the problem.
Luckily for that there is the random log. You don't need to log the seed of the RNG on every time slice when you instead log all synced random numbers that are generated. That is what the random log is for: Logging every use of the synced RNG with a specific note/phrase (like "animal spawn") so you know which part of the code generated a random number here.
The first difference between the random logs is what counts (because after that all will be different). Usually you will see a random number generated only on one computer and then you can search the code for the phrase and check if it was used in async context like by using the active player in synced code or some Python code that uses active player for synced stuff or usage of an uninitialized variable or ...
Unfortunately it can be quite hard at times to find the actual issue (on the other hand sometimes it is easy).

And there's also this which I have linked in the first post of this thread. Looks pretty informative.

http://forums.civfanatics.com/showpost.php?p=4609832&postcount=6
 
Also, did anything of note happen right before the games went OOS?

In the save I provided the OOS didn't begin until third ring was workable. Otherwise they were quite random. To me it seems safe to assume the error lies in the bsprawling code in the DLL, as it has been infamously OOS prone..

Actually, I might try creating a game where bsprawling is set to 0 for the kurios and seeing if the OOS continues. That should help narrow things.
 
I did manage to dig up a couple of useful posts though.

Thank you! I'm going to read them and try again with more information. I think that I will also review what information is available to python for each one of the instances logged in OOSLogger and try to expand them. For example, I noticed that city plot radius is not being logged.

EDIT: I also think I found the reason of the game being able to create only a single OOS log unless it is restarted. I'll report more about this once I have tested it.

Also, did anything of note happen right before the games went OOS?

I was always the Kuriotate player and I did not notice anything special happening. I never built any settlements or any other cities, my city had three rings from the start (as I casted Legends immediately) and I have had OOS with just 1 population in the city.

In the save I provided the OOS didn't begin until third ring was workable. Otherwise they were quite random. To me it seems safe to assume the error lies in the bsprawling code in the DLL, as it has been infamously OOS prone..

Actually, I might try creating a game where bsprawling is set to 0 for the kurios and seeing if the OOS continues. That should help narrow things.

The only references I was able to find in both DLL and python to bSprawling, isSprawling, getMaxCities and plotRadius are on either city founding code (which does not seem to cause OOS), settlement promotion code (which was not used on my games) or AI code (which I already discarded on my test games by having player controlled Kuriotates). Of course, I could be overlooking something so this test would be welcome :)

EDIT: In our last test game we had a city with "City of a Thousand Slums" for 50 or 60 turns, with no desyncs.
 
Hi Guys,

I did some mock-games to test kurio OOS. Here are my findings:

Game 1: Quick speed, Duel, Me and another human only. Didn't change bsprawling to confirm this setup would produce an OOS. It did on turn ~90.

Game 2: Same settings but with bsprawling set to 0. Also installed new OOS log files. OOS on turn 102. This confirms bsprawling isn't the issue. It went OOS only a few turns after Orthrus attacked me, so I wondered..

Game 3: Same settings again but with barbs, lairs and orthrus all turned off. No OOS by turn 200. We called it at this point.

Game 4: Same settings again with no barbs, but 1 AI player (ljos). OOS on turn 76. Large ljos army approaching my borders.

Not sure what to read into this. Possible that desynch is occuring due to AI warplan? Barbs would also cause desynch in this case due to also using a warplan like any other AI. Why would this happen though? I know very little about AI coding --- does it even call the random seed? Is there some special kurio-only rules in there, maybe about city targetting for the AI? Anyway it's late so that's all we have in us for now. Will try to run another game or two tomorrow/after christmas using identical settings to confirm if this is real or a fluke.

OOS logs attached, but they don't show much interesting as far as I could tell.

P.S.

Merry Christmas!
 
I did some mock-games to test kurio OOS. Here are my findings:

They look similar the logs I obtained... Random changes that do not seem related to the presence of the Kurios in another civilizations. For any future cases, could you please set the SynchLog variable in CivilizationIV.ini to 1? It generates the generation of an additional log file, MPLog, which includes additional information and should be sent along with OOSLog. In a few of my recent testing games for the Kuriotate OOS the MPLog files were identical except for ASYNC random values, though :(

Also, with regard to Game 2, bear in mind that there are also OOS issues when neither the Kuriotates nor the Infernals are present in the game. They are very rare, but I have seen them happening.

Tholal: In the link you posted I saw how to run multiple instances of the game on a single computer. Knowing that this is possible, I have modified the mechanism by which the MPLog file is generated to force it to have different names depending on the player active in the game. This means that it is now possible to debug OOS issues and obtain logs for it with a single computer.

The process goes as follow. To be able to launch multiple instances of the mod, you need to add "multiple" to the parameters of Civ4BeyondSword when launching it. For example, I'm using:

Code:
"C:\Program Files (x86)\2K Games\Firaxis Games\Sid Meier's Civilization 4 Complete\Beyond the Sword\Civ4BeyondSword.exe" multiple mod=\FFH2Dev

To generate synchronization logs with different names, you need the modified CvRandom file attached to this post. You will also need to connect with different player names to the multiplayer game, since the file will be called SynchLog - [USERNAME].txt.

This approach also allows to do something that is quite helpful in finding OOS errors; having both synchronization logs open at the same time with an application such as WinMerge. You can press F5 every few turns on WinMerge to check if something is already wrong, before the game catches the desynch and throws an OOS error. The only problem I have noticed with this approach is that sometimes you will get errors when creating autosaves (I wouldn't trust those too much when using this method).

I have also created a testing guide which gives instructions on how to enable logs and crash dump generation and how to provide useful information along with bug reports, intended for users. The tutorial speaks about ExtraModMod specific stuff such as beta builds and links to my tracker, but it should be trivial to adapt it to More Naval AI if you want to use it. You can check it here: https://bitbucket.org/Terkhen/extramodmod/wiki/Testing (feedback and suggestions to improve it are welcome). This guide is already updated for my SynchLog change.

I'm attaching three additional sets of logs for the Kuriotate issue, this time with synchronization logs too. The synchronization logs show no differences besides ASYNC random values, and therefore do not give us any new information. In one of the games I forgot to add AIs and I got no OOS errors at all (that game was played until turn 80 or so). I wonder how the presence of a human Kuriotate player can affect AIs that have not even met that player, though...

EDIT: I have made similar modifications to how the BBAI logs are generated. They now include the player name in the text file's name (to allow to differenciate them more easily when getting logs from multiple players and to allow to gather them when using multiple instances on the same computer). The log file no longer includes the timestamp, as that made WinMerge comparison between logs of different players very complicated. In case you are interested, I'm attaching the modified file to this post.
 

Attachments

  • CvRandom.rar
    856 bytes · Views: 150
  • KuriosOOS.rar
    790.5 KB · Views: 227
  • BetterBTSAI.rar
    384 bytes · Views: 220
I would also like to help with the testing for this. If needed, me and Tasunke can test out various scenarios.
 
Good news! I finally found the cause of the random Kuriotate OOS :)

After playing a few test games and getting logs for them, I noticed a common trend. In all cases, the cuase of the OOS was a slight difference in points for an AI, an AI deciding to research a different technology, build a different unit and so on. I got a very clear case in which in one of the computers the AI chose to research Education while in the other it chose Fishing. This made me think that, somehow, the presence of the Kuriotates as human player was forcing the AI to take different decisions. I dcided to set the level of BBAI logging to a very high value and to repeat the test.

In the attached BBAI logs (created with More Naval AI 2.52), "Terkhen" is playing as the Kuriotates, while "Dupe" is leading the Elohim. Although the logs are from just the first turn If you compare both logs with a tool like WinMerge, the cause of the OOS will be evident. When selecting which technology to research, if the active player is a Kuriotate, the AI will values education a lot more because somehow it seems to think that it will be able to build Enclaves. In the non-Kuriotate log, the AI gives a smaller value to Education because it thinks that it will let it build towns.

From what I get in the logs we know that the error is in the AI_techValue method, in the point where the value of an improvement is calculated (search for "IMPROVEMENT - " in CvPlayerAI.cpp). I don't know enough of the AI code to dabble with it without fear of causing other issues, but I hope that this information will be enough to fix the problem. I did not find an specific issue in the bug tracker of More Naval AI so I'm posting this report only here.

These improved logs were obtained by using a patch that includes many modifications to how logs are generated. As I mentioned in my earlier post, it makes possible debugging OOS issues in a single computer. Once that I clean it up and test all aspects of it I will upload it to the tracker for inclusion.

Since now I can debug OOS issues in a single computer quickly, it would be highly appreciated if someone could provide me with a multiplayer savegame in which the Infernals are about to conquer a city, and another one in which one of the player is about to be able to create a Puppet State. I could eventually create these savegames by myself, but pressing enter on two different civilization games at the same time for hours ends up being quite tedious. Thanks!
 

Attachments

  • Kuriotate BBAILog.rar
    11.3 KB · Views: 160
OK.... looking at the logs I see that the non-Kuriotate AI civs are overvaluing Education when the player on that computer is playing as the Kuriotates. Strange. I should be able to find what's causing that issue. I must have really mucked something up when messing with AI_techValue(). Thanks for the report!


Edit: Quick look at the code tracks the error to finalImprovementUpgrade() of CvGameCoreUtils. It's checking for getActivePlayer() which I thought would call the current player, not the human player. Need to figure out how to call the current player instead. Anyone have any ideas?
 
Yep. Looks like I added this code in Revision 1047. Try commenting that section out, recompiling and playing the game again to see if that fixes things.

In the meantime, I'll keep looking for a better way to keep the AI from valuing Enclaves when it cant build them.

Edit: Think that I'll have to pass a Player type to finalImprovementUpgrade(). We'll see if I can't make that work without too much hassle.
 
Edit: Quick look at the code tracks the error to finalImprovementUpgrade() of CvGameCoreUtils. It's checking for getActivePlayer() which I thought would call the current player, not the human player. Need to figure out how to call the current player instead. Anyone have any ideas?

Edit: Think that I'll have to pass a Player type to finalImprovementUpgrade(). We'll see if I can't make that work without too much hassle.

Oh, I see. I was not sure about AI_techValue being the only cause of the error, but if we can pinpoint the OOS to an utility method which is called in a lot of places (many of them in other parts of the AI code), I think that it's almost certain that finalImprovementUpgrade is the only cause of the random Kuriotate OOS error.

IMO the change made in r1047 to that method is correct except for the getActivePlayer bit. Since as I mentioned the method is called in many places, IMO forcing it to receive a PlayerTypes parameter is the best solution. Since most of the calls except one are being made in the AI code (which should just use the ID of the AI player for which the calculations are being made) it does not seem like a very intrusive change.

Yep. Looks like I added this code in Revision 1047. Try commenting that section out, recompiling and playing the game again to see if that fixes things.

I reverted r1047 and played until turn 61 in the first game and turn 93 in the second one. There were no desyncs and no differences at all between either the BBAI logs or the MPLog logs. Since without reverting r1047 the differences appear on turn 1, I think it is safe to assume that it is the place in which the OOS happens. According to the log all AIs seem to be rushing Education because they think that they will be able to build Enclaves, which I guess it is something to be expected after reverting r1047.

EDIT: The improved logging support patch has been posted at the MNAI tracker.
 
OK. Think I've fixed it in Revision 1637

Thank you for the quick fix! I can confirm that r1637 both solves the OOS issue and makes AIs make their calculations in MP games correctly (Enclaves for Kuriotate AIs, Towns for the rest, regardless of who is the human player).

I checked other calls to finalImprovementUpgrade and found two at AI_getImprovementValue in CvCityAI. At first I thought it should also use the PlayerTypes parameter, but then I noticed that it is not being used anywhere. The other calls are just Super Forts stuff.
 
Top Bottom