Multiplayer: Hacking A Fix Around OOS Errors

Hmm, why give a choice? Why wouldn't a player pick to force it?

That's something I discussed with Afforess before. Some OOS errors happen because an event that should (or shouldn't) occur plays out differently in each player's game. For example, a city flip caused by a revolution always goes wrong in the game state of the player who should get the city, but goes right in the game of the other player, who has nothing to do with the flip. Always forcing a resync to the same player's game state may lead to these undesired situations (which, when loading the game, would normally be solved by choosing the player with the right game state to host).
 
OK, yesterday I started a test MP game with RyoHazuki and we had a terrible experience with OOS. We have logs if Afforess wants to have a look at them, but my main question is: how does the new resync code work? I mean, what happens when the game goes OOS? Should it resync by itself, or what? We also tried hot-joining the game, but it didn't work every time. And a couple of times the game was in sync, then OOS for a few seconds, then in sync again, and so on, all without moving any unit or doing anything (well, I was opening the chat window to chat with RyoHazuki).
 
No, it's not complete. I still need to add a UI dialog so that users can approve a resync before it occurs (currently it occurs automatically).

Also, there is a rare CTD (crash to desktop) when resyncing new units that I need to fix.
 
Each time we get an OOS, RyoHazuki's game crashes to desktop (I'm hosting the game). Here are a couple of minidumps if you want to check. I've tried debugging them but I wasn't able to.

https://www.dropbox.com/s/xglt23a3ddt60t4/MiniDump_834.dmp?dl=0

https://www.dropbox.com/s/7dyw1pf1rh8dj80/MiniDump_835.dmp?dl=0

Edit: all in all, it's not so terrible right now; we're at the start of the Renaissance.
 
We stopped that game some days ago when we reached the late industrial/early modern era, because by then it was going OOS every turn.

We've started another MP game, and right now we're in the Renaissance. So far the game is going pretty well: 8 OOS in maybe 270 turns, with only 2 OOS in the first 200 turns or so (blitz speed, for testing purposes). Some of those OOS could probably have been avoided, since they were caused by wrong settings on my or RyoHazuki's PC. This time we're playing with Revolution ON and we had no OOS caused by revolutions; I think that part of the code is pretty stable. We still get some OOS from diplomacy (something must be wrong, I suppose in Advanced Diplomacy, but I still need to check) and we've had some OOS caused by combat. A few turns ago we disabled some BUG options (Defender Withdrawal, Dynamic XP and Battlefield Promotions) and we've had no OOS since. I suspect one or more of those options is a source of OOS; if the game goes on without OOS, we'll try re-enabling them one at a time.
 
Just an update: we're around turn 370, in the industrial era, and we've had about a dozen OOS. There have been plenty of wars and revolutions, and at the moment we're at war with each other. So far so good; we'll see how it goes once we reach the modern era.

On a side note, we've discovered that we can't play a 3-player game. Maybe this is related to the (still incomplete) resync code. When the third player joins, he gets a message saying he is connecting to player2 instead of player1, who hosts the game, even though he's connecting to player1's correct IP address. We've also tried connecting player3 to player2's IP, but it doesn't work (he gets an "invalid identification" message). I started an MP game with 3 players months ago and it worked, so it's probably related to something we've done recently. Sadly, there's nothing I can do about it; we'll have to wait until Afforess has some free time to look into this problem.

Just to make sure: has anyone been able to play an MP game with 3 (or more) players recently? Thank you.
 
Did you make any progress on this?

On my checklist of major things needed to make this mod perfect, there is still:
- Multiplayer resync code from Afforess (n°1 :))
- Asian languages support
- xUPT code improved
- A good main menu animation
 
No, we have to find some time to test it with Afforess, I suppose, but lately we've had other priorities, although we have discussed the subject.

Also, talking about improvements, I'm still trying to improve the AI's use of nukes and doing some balancing work to cover as many options as possible.
Another minor task is fixing Totestra maps, where resources are not spawned on mountains.
 
I hit a major issue where resyncing units caused a crash in the EXE, related to the graphic entities attached to the units. I could exclude units from the resync, but that means units would still cause OOS.

So right now I have been trying to think of better ways to find the causes of OOS and fix those, since fixing them would remove the need for resyncing in the first place. The primary problem is that by the time 2 players get an OOS, tens of thousands of lines of code have already been executed since the cause of the problem, and tracking down exactly where the issue occurred is tricky.

I have been considering writing some sort of code to create a dump of the game state at regular intervals (a few thousand times a turn...) and using that to tell where the game states of two players diverge. The problem is that this would require me to add a lot of code to dump the state at critical sections, and it would seriously hurt turn times in MP, at least while the code is active. So I am still in the theorizing & planning stage with this idea.
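
A minimal sketch of the checkpoint idea, written in Python rather than the DLL's C++ to keep it short. Each player records a short hash of the game state at named checkpoints, and the two traces are compared afterwards to find the first checkpoint where they differ. The `SyncTracer` class, the checkpoint labels and the dict-based state are all hypothetical; none of this is an existing API in the mod.

```python
import hashlib
import json


class SyncTracer:
    """Records a short hash of the game state at named checkpoints."""

    def __init__(self, enabled=True):
        self.enabled = enabled  # tracing is costly, so allow switching it off
        self.trace = []         # list of (label, digest) tuples

    def checkpoint(self, label, state):
        if not self.enabled:
            return
        # sort_keys makes the serialization independent of dict insertion order
        blob = json.dumps(state, sort_keys=True).encode("utf-8")
        self.trace.append((label, hashlib.sha1(blob).hexdigest()[:12]))


def first_divergence(trace_a, trace_b):
    """Return the index of the first checkpoint that differs, or None."""
    for i, (a, b) in enumerate(zip(trace_a, trace_b)):
        if a != b:
            return i
    if len(trace_a) != len(trace_b):
        return min(len(trace_a), len(trace_b))
    return None


# Hypothetical example: both players agree until a combat result differs.
host, client = SyncTracer(), SyncTracer()
for tracer, damage in ((host, 30), (client, 31)):
    state = {"turn": 12, "units": {"warrior_1": {"hp": 100}}}
    tracer.checkpoint("start_of_turn", state)
    state["units"]["warrior_1"]["hp"] -= damage
    tracer.checkpoint("after_combat", state)

idx = first_divergence(host.trace, client.trace)
print(idx, host.trace[idx][0])
```

Storing only a digest per checkpoint keeps the trace small. The expensive part is still serializing the state, which is why the sketch has an `enabled` switch.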
 
Well, anyway, turn times in MP shouldn't be too much of an issue :) I also wonder what's preventing any other player from joining the game after the first two players are connected.
 
I wish it was simpler to resolve that issue. Maybe in the far future, Firaxis will release the Civ 4 source code :x Good luck anyway.
 
No, that will never happen. Civilization 4 uses the Gamebryo engine, which is proprietary, so they cannot release the code for it. The EXE uses the Gamebryo engine.

The whole reason for the separate DLL / EXE is to get past licensing issues and let modders have some access without breaking the Gamebryo licensing agreements. Firaxis was generous to give us any source code at all; most games just store all the code in the EXE and never release anything.
 
The steps that worked for me - for repeatable OOS errors at least:

1) Use the MP logs and random results reports and compare them to get a place to start looking.
2) Scatter reports to a new log file among the suspicious functions, and among the functions related to anything taking place between the last time the game state was in sync and the point at which it begins to fall out of sync. This will help narrow down which function is causing trouble.
3) Keep narrowing down by writing more detailed game state reports to the new log at various intervals within the identified function, until you find the exact line of code causing trouble. It should be easy enough to diagnose from there.
4) Once you're there it should be fairly easy to see what goes wrong and resolve it.
5) Once resolved and proven to be resolved, go back and remove all the old report coding to clean up the code.

This process requires quite a few run-throughs to keep triggering the issue and then reprogramming to narrow in, but it has proven to work. I've not found a reliable way to resolve OOS errors reported by others without a save that reproduces the problem reliably each time.
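
As a rough illustration of step 1, here is a minimal Python sketch that finds the first line where two players' logs differ and shows the last lines they still had in common. The log lines are made up for the example and the real log format will differ; `first_mismatch` is a hypothetical helper, not part of any existing tool.

```python
from itertools import zip_longest


def first_mismatch(lines_a, lines_b, context=2):
    """Find the first line where two logs differ.

    Returns (line_number, shared_context, line_a, line_b) with a 1-based
    line number, or None if the logs are identical.
    """
    for i, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        if a != b:
            return i + 1, lines_a[max(0, i - context):i], a, b
    return None


# Made-up log lines; the real log format will differ.
host_log = [
    "turn 41 rand 1000 -> 412 (combat)",
    "turn 41 rand 100 -> 7 (withdrawal)",
    "turn 41 rand 1000 -> 88 (combat)",
]
client_log = [
    "turn 41 rand 1000 -> 412 (combat)",
    "turn 41 rand 100 -> 7 (withdrawal)",
    "turn 41 rand 50 -> 13 (promotion)",
    "turn 41 rand 1000 -> 88 (combat)",
]

result = first_mismatch(host_log, client_log)
if result is not None:
    line_no, before, host_line, client_line = result
    print(f"logs diverge at line {line_no}")
    for line in before:
        print("  same:  ", line)
    print("  host:  ", host_line)
    print("  client:", client_line)
```

With real files, the two lists could be read with `open(path).read().splitlines()`. The last shared lines are the "last time the game state was in sync" from step 2, so they mark where to start adding reports.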
 
I would like to play this mod with a friend in multiplayer, so I'm eagerly awaiting this fix :)
 
Many people are, and I'm one of them. Unfortunately, it's tricky and it takes time. Hopefully Afforess will be able to solve the problem sooner or later.
 
Hey Afforess, I had an idea about a way to seek OOS errors (maybe a bit naive). This is just a draft:
We could try to find a way to create Pitboss games and log into the same game. Then make both the game AND Pitboss generate logs and put the game on autoplay. That would generate OOS automatically. Even so, I don't know if there is a way to work out what is out of sync from the displayed number.
 
The hard part about fixing OOS errors is not finding OOS errors. It's easy to create OOS errors: just play regular MP. The hard part is that by the time an OOS shows up in the game, hundreds, if not thousands, of executions have gone by and you don't know exactly where in the code the OOS occurred. What is needed is OOS tracing code throughout the codebase to effectively dump the state of the entire game (i.e., a save) thousands of times a turn, and to use that to reduce the search area.

When the game goes out of sync, you are finding out FAR after the fact. By the time you see the OOS error on the screen, the cause happened so long ago that it might as well be a different century as far as the game is concerned. So what is needed is earlier OOS detection.
 
Thank you for the answer. I'm not skilled enough to dive into this question.
 
Preamble: I am assuming you understand basic philosophy/logic and know what Determinism is. If not, see: http://en.wikipedia.org/wiki/Determinism

It's a really tricky problem, I agree. Basically the problem at a fundamental level is trying to ensure the game runs in a perfectly deterministic fashion. If it is deterministic, and you know the initial state, then you should never get an OOS.

Obviously, since we have OOS issues, it's not perfectly deterministic.

One random thought I had was wondering if we can somehow get the game to run in a headless mode, so we could script gameplay. Suppose we could have a singleplayer game start, save the start, run for a turn, save, and exit. If we then load the singleplayer start save and run the turn again, we should get two turn 1 saves that are bit-for-bit identical. If we don't, then there was a non-deterministic event.

The tricky part here is that as far as I know, you can't run Civ4 headlessly. There is a pitboss mode which is kinda close, but as far as I know, it still needs a client to connect to it to run. I wonder if we could modify pitboss to ai-autoplay with no clients.
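
The "run the same turn twice and compare the saves" check can be sketched in Python as follows. This is only a toy: `run_turn` is a hypothetical stand-in for loading the start save and playing one turn, and `LeakyRng` imitates state that is not reset between runs, such as a stray global. It says nothing about what actually causes OOS in the mod.

```python
import hashlib
import json
import random


def run_turn(state, rng):
    """Hypothetical stand-in for 'load the start save and play one turn'."""
    new_state = dict(state)
    new_state["turn"] = state["turn"] + 1
    new_state["gold"] = state["gold"] + rng.randint(1, 10)
    return new_state


def save_digest(state):
    """Hash of the serialized state, standing in for comparing save files."""
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def is_deterministic(start_state, seed, make_rng):
    """Play the same turn twice from the same start and compare the results."""
    first = save_digest(run_turn(start_state, make_rng(seed)))
    second = save_digest(run_turn(start_state, make_rng(seed)))
    return first == second


class LeakyRng:
    """Ignores the seed and keeps state between runs, like a stray global."""
    calls = 0

    def randint(self, low, high):
        LeakyRng.calls += 1
        return low + LeakyRng.calls % (high - low + 1)


start = {"turn": 0, "gold": 50}

# Seeded generator: both runs draw the same numbers, so the saves match.
print(is_deterministic(start, 1234, lambda seed: random.Random(seed)))  # True

# Leaky generator: the second run sees leftover state, so the saves differ.
print(is_deterministic(start, 1234, lambda seed: LeakyRng()))  # False
```

A failed check only says that something non-deterministic happened during the turn. Finding where still needs finer-grained comparison inside the turn.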
 