Afforess
The White Wizard
I know one of the biggest issues with multiplayer in RAND is the inevitable OOS (out-of-sync) error that crops up. Restarting and resuming is a huge PITA.
What is an OOS?
Civilization 4 uses an interesting and rather rare multiplayer mechanism. To save bandwidth, every machine runs Civilization 4 independently. They all start from the same starting point, and because the AI and RNG are deterministic, they stay in "sync". They can fall out of sync in one of two major ways:
1.) Failure to transmit player actions. When a human player performs an action, that action is broadcast to all other machines. If a modder adds a new action that needs to be transmitted but isn't, an OOS will occur after a player uses that feature. This is usually easy to detect, because choosing some setting in a menu leads to an OOS, and you can figure out what you just did to cause it.
2.) Non-deterministic code. This is a bug in the codebase that breaks the deterministic design of Civilization. Normally, if you start from the same starting point and RNG seed, and play the game the exact same way, you will get the exact same result, every time, without fail. Non-deterministic code breaks this in some way and alters the outcome. It is tricky to track down without dumping the entire state of every game and comparing them. That is what the OOS Logger tries to do, but it can't always catch it.
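To make the two failure modes concrete, here is a toy sketch in Python. It is not Civ4 code: the `Sim` class, the `local_quirk` field (a stand-in for anything machine-specific, such as a memory address or uninitialized value) and the checksum are all made up for illustration. It only shows how identical seeds keep machines in sync, and how either cause above makes the checksums diverge.

```python
import hashlib
import random


class Sim:
    """Toy stand-in for one player's machine; not the Civ4 engine."""

    def __init__(self, seed, local_quirk):
        self.rng = random.Random(seed)   # same seed on every machine
        self.local_quirk = local_quirk   # hypothetical machine-specific value
        self.gold = 0

    def apply_action(self, amount):
        # A player action that was broadcast to this machine.
        self.gold += amount

    def end_turn(self, buggy=False):
        roll = self.rng.randint(1, 6)    # deterministic: same roll everywhere
        if buggy:
            # Simulated non-deterministic bug: the result depends on
            # machine-local state instead of only on synced state.
            roll += self.local_quirk
        self.gold += roll

    def checksum(self):
        return hashlib.sha256(str(self.gold).encode()).hexdigest()


def make_pair():
    # Two machines starting from the same point and RNG seed.
    return Sim(42, local_quirk=1), Sim(42, local_quirk=2)


def in_sync(a, b):
    return a.checksum() == b.checksum()


# Healthy game: every action reaches both machines.
host, client = make_pair()
for machine in (host, client):
    machine.apply_action(10)
    machine.end_turn()
healthy = in_sync(host, client)

# Cause 1: an action is applied on the host but never transmitted.
host, client = make_pair()
host.apply_action(5)
missed_action = in_sync(host, client)

# Cause 2: both machines run the same buggy, non-deterministic code.
host, client = make_pair()
host.end_turn(buggy=True)
client.end_turn(buggy=True)
nondeterministic = in_sync(host, client)

print(healthy, missed_action, nondeterministic)
```

Comparing a checksum of the state each turn tells you that the machines diverged, but not where, which is why the second cause is so much harder to hunt down than the first.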
Why does saving and rejoining fix OOS (if temporarily)?
You are essentially agreeing to use a common state, which papers over whatever the non-deterministic code changed, at least until the same bug fires again.
What can be done to fix MP?
Obviously fixing the non-deterministic code would be ideal. The problem is that even base BTS has a few rare OOS bugs. Perfect determinism is hard. Even Firaxis couldn't do it.
That isn't to say we shouldn't try; we should! But we shouldn't expect to succeed, just improve.
So if success is impossible, then what?
Embrace failure.
Saving and rejoining fixes OOS, right? Why not automate that? We can simply have the host (player 1) send their state to all players any time an OOS is encountered. Internet speeds have vastly improved since the dial-up days, and 3-5MB of data is no problem. It might cost a second or two of lag when an OOS is encountered, but other than that, it should make MP playable again.
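As a rough sanity check on the "second or two" figure, here is a back-of-the-envelope calculation. The 20 Mbit/s host upload speed and the client counts are assumptions picked for illustration, and the sketch ignores compression and protocol overhead and assumes the host uploads one full copy per client over a shared uplink.

```python
def resync_seconds(state_mb, upload_mbit_per_s, clients):
    """Seconds for the host to upload one full state copy to every client."""
    total_megabits = state_mb * 8 * clients  # 1 byte = 8 bits
    return total_megabits / upload_mbit_per_s


# 5 MB state, assumed 20 Mbit/s host upload
one_client = resync_seconds(5, 20, clients=1)
three_clients = resync_seconds(5, 20, clients=3)
print(one_client, three_clients)
```

Under these assumptions the lag is driven by the host's upload speed and grows with the number of players, so compressing the state before sending would likely be worth it in larger games.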
Basically, we would serialize the whole game state the same way saves already do, but instead of "saving" it to a file, pipe it over the network and have the other players deserialize it and load it over the top of their own state. Then the OOS should disappear (in theory).
I'd like comments and thoughts from some of the C2C developers, since I know Koshling wrote the new save format and is most familiar with the serialization aspects of the game. But this seems doable in theory.
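Here is a minimal sketch of that idea in Python, just to show the shape of it. Nothing here is the actual Civ4 or C2C save code: the `Game` class, its fields and the JSON format are invented for the example, and the "network" is simply a bytes object handed from host to client. The one point worth noticing is that the RNG state has to travel with the rest of the state, otherwise the machines drift apart again on the next random roll.

```python
import hashlib
import json
import random


class Game:
    """Toy game state; a stand-in for whatever the real save format holds."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.turn = 0
        self.gold = {"p1": 0, "p2": 0}

    def end_turn(self):
        self.turn += 1
        for player in self.gold:
            self.gold[player] += self.rng.randint(1, 6)

    def serialize(self):
        # Same data a save would contain, including the RNG state.
        state = {"turn": self.turn, "gold": self.gold,
                 "rng": self.rng.getstate()}
        return json.dumps(state, sort_keys=True).encode("utf-8")

    def load(self, payload):
        # Overwrite the local state with the received one.
        state = json.loads(payload.decode("utf-8"))
        self.turn = state["turn"]
        self.gold = state["gold"]
        version, internal, gauss_next = state["rng"]
        # JSON turns tuples into lists; setstate needs the tuple back.
        self.rng.setstate((version, tuple(internal), gauss_next))

    def checksum(self):
        return hashlib.sha256(self.serialize()).hexdigest()


host, client = Game(7), Game(7)
host.end_turn()
client.end_turn()

client.rng.random()  # simulate a desync: one extra RNG draw on the client
out_of_sync = host.checksum() != client.checksum()

payload = host.serialize()  # stand-in for sending the state over the network
client.load(payload)
resynced = host.checksum() == client.checksum()

host.end_turn()
client.end_turn()
still_synced = host.checksum() == client.checksum()

print(out_of_sync, resynced, still_synced)
```

In a real implementation the hard parts would be everything this sketch skips: pausing the simulation on all machines while the transfer happens, and making sure the load path resets every piece of cached state the same way loading a save from disk does.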