Multiplayer: Hacking A Fix Around OOS Errors

Doesn't that need to be persisted though (across the save and reload)?

The last used ID (m_iCurrentID) is already persisted. With the loading line I removed, iLastId = -1, so when the array goes to allocate it, it will have to do a full scan to find the next open spot. Maybe you were just trying to improve performance?

That's my understanding of it. I admit, I am not a professional C++ programmer...mostly Java/Ruby these days.
 
2.) Reading in the m_sorenRand object in CvGame blows up. The seed right before (above) it reads in fine, but for some reason the m_sorenRand seed blows up. Skipping it causes the save stream to desync.
In what way does it blow up ?

You could also try out the other two FDataStreamBase backends to files to see if there is something problematic about the backend (though I guess not if only such specific things do not work).
 
In what way does it blow up ?

3361 CvSavedTaggedFormatWrapper:

Code:
FAssert(m_idDictionary.size() > (unsigned int)m_iNextElementNameId);

Somehow m_iNextElementNameId is negative, usually in the -13000 range.
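
For anyone following along, here is a minimal sketch of why a negative id trips that assert. idPassesCheck is a made-up helper that only mirrors the comparison in the FAssert; it is not code from the DLL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the comparison inside the FAssert above.
// Returns true when the id would pass the "size > (unsigned int)id" check.
bool idPassesCheck(std::size_t dictionarySize, int nextElementNameId)
{
    // Casting a negative int to unsigned int wraps around, so -13000
    // turns into a value just under 2^32 (with a 32-bit unsigned int).
    return dictionarySize > static_cast<unsigned int>(nextElementNameId);
}
```

With a dictionary of 100 entries, idPassesCheck(100, 5) holds, while idPassesCheck(100, -13000) does not, because the cast turns the negative id into a number far larger than any realistic dictionary size.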

You could also try out the other two FDataStreamBase backends to files to see if there is something problematic about the backend (though I guess not if only such specific things do not work).


I did look at them as a reference for my implementation, in fact. I tested individual values (strings, bytes, etc) and mine seems to be fine, as far as data integrity goes.
 
As this is pretty much the only place an unsigned long is used, I would recommend looking at the Read(unsigned long* l) of your backend implementation (the corresponding Write one is not actually used, it is written from a struct in the tagged save format). If that reads a bad number of characters from the stream then the shifted reading pattern could get wrong values like large negative numbers.
 
The Read code seems correct to me:

Code:
void FDataStreamBuffer::Read(long* l)
{
        *l = (long)m_pByteBuffer->getLong();
}
 
void FDataStreamBuffer::Read(unsigned long* l)
{
        *l = (unsigned long)m_pByteBuffer->getLong();
}
 
void FDataStreamBuffer::Read(int count, long values[])
{
        for (int i = 0; i < count; i++)
        {
                values[i] = (long)m_pByteBuffer->getLong();
        }
}
 
void FDataStreamBuffer::Read(int count, unsigned long values[])
{
        for (int i = 0; i < count; i++)
        {
                values[i] = (unsigned long)m_pByteBuffer->getLong();
        }
}

My plan tonight is to comment out all of the Write/Read code except the two CvRandom objects and to see if they work in isolation.
 
No Joy. I tried reading/writing with just the 2 CvRandom objects and nothing else in the stream. I still have the same assert and same failure, and the stream desyncs. I posted my CvGame with the 2 CvRandom objects isolated for reference if someone wants to verify/reproduce.

CvGame: https://dl.dropboxusercontent.com/u/49805/CvGame.cpp

I am now going to switch to plan B here and write a separate read/write just for the OOS data. I don't need all the extra handling that the save format has to keep saves in sync, since I can make the assumption all MP players are on the same revision.
 
Brief update. I wrote a resync function which saves the binary state of just the CvGame object in a byte buffer and verified it could be saved and reloaded mid-game without issues. Now I need to expand this proof of concept to CvPlot (tiles), CvPlayer, CvTeam, CvUnit and CvMap. Not sure if anything else is worth resyncing.

Once the resync code is written, I'm not done. I need to break the data into chunks, send it to other players in pieces, then re-assemble. I'm confident this will work, just a matter of time investment.
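
A rough sketch of the chunk and re-assemble step, just to show the idea. Chunk, splitIntoChunks and reassemble are names invented for this example, and the actual network send is left out entirely; this is not the code in the SVN repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical chunk header: position in the sequence plus the total count.
struct Chunk
{
    std::size_t index;
    std::size_t total;
    Bytes payload;
};

// Split a serialized game state into pieces of at most chunkSize bytes.
std::vector<Chunk> splitIntoChunks(const Bytes& data, std::size_t chunkSize)
{
    std::vector<Chunk> chunks;
    if (chunkSize == 0) return chunks;
    const std::size_t total = (data.size() + chunkSize - 1) / chunkSize;
    for (std::size_t i = 0; i < total; ++i)
    {
        const std::size_t begin = i * chunkSize;
        const std::size_t end = std::min(begin + chunkSize, data.size());
        Chunk c;
        c.index = i;
        c.total = total;
        c.payload.assign(data.begin() + begin, data.begin() + end);
        chunks.push_back(c);
    }
    return chunks;
}

// Rebuild the original bytes. Chunks may arrive in any order.
// Returns false if a chunk is missing, duplicated or inconsistent.
bool reassemble(std::vector<Chunk> chunks, Bytes& out)
{
    out.clear();
    std::sort(chunks.begin(), chunks.end(),
              [](const Chunk& a, const Chunk& b) { return a.index < b.index; });
    for (std::size_t i = 0; i < chunks.size(); ++i)
    {
        if (chunks[i].index != i || chunks[i].total != chunks.size()) return false;
        out.insert(out.end(), chunks[i].payload.begin(), chunks[i].payload.end());
    }
    return true;
}
```

Carrying the index and total in every chunk lets the receiver detect a missing piece before it tries to load a half-complete game state.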

Code so far is in the SVN repo.
 
I still think it is most likely that the sequence of writing a struct containing an unsigned long as an array of bytes, reading that as a long and then casting it to unsigned long is somewhere binary incompatible in the backend.
 
That's curious, because I tested that exact behavior with individual unsigned longs, and I also do the exact same process in CvGame::resync. Both work fine. Anyway, there should be no problem with storing an unsigned long (32 bits) in a uint_64 container, aside from the extra wasted bits.
 
The tagged save game format does not use the reading and writing functions in a symmetric fashion.
For writing it puts the data in a struct together with other data, that it then writes as an array of bytes. On the other hand for reading it uses separate read operations for the contents of the struct in a sequence.
That means if the backend alters the single elements in some way like aligning them in a slightly larger container, it won't work.
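
To illustrate the asymmetry, here is a minimal sketch. Record, writeRecord and readRecord are made up for this example and are not the real tagged save format or FDataStreamBase code; the 8-byte read stands in for a backend that stores a long in a larger container.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record: a 32-bit seed followed by a 32-bit id.
struct Record
{
    std::uint32_t seed;
    std::int32_t  id;
};
static_assert(sizeof(Record) == 8, "sketch assumes no padding");

// Write side: the whole struct goes out as one array of bytes.
// Four 0xFF filler bytes stand in for whatever follows it in the stream.
std::vector<unsigned char> writeRecord(const Record& r)
{
    std::vector<unsigned char> out(sizeof(Record) + 4, 0xFF);
    std::memcpy(out.data(), &r, sizeof(Record));
    return out;
}

// Read side: one read per field. If the backend consumes 8 bytes for
// the seed, every later field is read from the wrong offset.
Record readRecord(const std::vector<unsigned char>& in, bool seedReadAsEightBytes)
{
    Record r = Record();
    std::size_t pos = 0;
    if (seedReadAsEightBytes)
    {
        std::uint64_t wide = 0;
        std::memcpy(&wide, in.data() + pos, 8);
        r.seed = static_cast<std::uint32_t>(wide);
        pos += 8;
    }
    else
    {
        std::memcpy(&r.seed, in.data() + pos, 4);
        pos += 4;
    }
    std::memcpy(&r.id, in.data() + pos, 4);
    return r;
}
```

Writing {42, 7} and reading it back with 4-byte fields returns the same values. Reading the seed as 8 bytes swallows the id along with it, so the id is then taken from the filler bytes and comes out as -1, which is the kind of shifted read that could produce a negative id.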
 
The tagged save game format does not use the reading and writing functions in a symmetric fashion.

Ahhh.....

If I had known that from day 1, I could have saved myself a lot of time. I'd have just written code to save to a symmetric buffer (which I am writing now) from the start and we might already be done....

Anyway, this just validates my plan B I guess. I am writing the game state in a symmetric fashion.
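
By "symmetric" I mean something along these lines. This is only a sketch, not the code in the repo: SymmetricBuffer and its method names are invented here, and it assumes all machines share the same byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical buffer where every write has a read of the same width,
// so reading back in write order can never drift out of step.
class SymmetricBuffer
{
public:
    SymmetricBuffer() : m_readPos(0) {}

    void writeU32(std::uint32_t v) { append(&v, sizeof(v)); }
    void writeI32(std::int32_t v)  { append(&v, sizeof(v)); }

    std::uint32_t readU32() { std::uint32_t v = 0; extract(&v, sizeof(v)); return v; }
    std::int32_t  readI32() { std::int32_t v = 0;  extract(&v, sizeof(v)); return v; }

    // True once every written byte has been read back.
    bool fullyConsumed() const { return m_readPos == m_bytes.size(); }

private:
    void append(const void* src, std::size_t n)
    {
        const unsigned char* p = static_cast<const unsigned char*>(src);
        m_bytes.insert(m_bytes.end(), p, p + n);
    }

    void extract(void* dst, std::size_t n)
    {
        assert(m_readPos + n <= m_bytes.size());
        std::memcpy(dst, &m_bytes[m_readPos], n);
        m_readPos += n;
    }

    std::vector<unsigned char> m_bytes;
    std::size_t m_readPos;
};
```

Because both directions go through the same fixed-width helpers, a 32-bit value always occupies exactly 4 bytes on the way in and on the way out.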
 
I'm a bit lost here (not surprisingly, as my knowledge of this is at beginner level), but what exactly are you trying to fix, Afforess, that led you to what AIAndy said there? I don't need many details, I just want to know what is causing the specific OOS you're trying to solve.




About Barbarian Civs, I don't know if this will help much, but when I was still playing LoR we had that feature there and it rarely gave an OOS (it did sometimes, but not much).

What frequently caused OOSs was the Range Bombard feature (there it was only available from Artillery onwards), although it was not exactly frequent (it happened almost every time we got a pretty big stack of artillery and used Range Bombard several times in a row), and now the most troublesome one: AIs' cities or entire civs asking to become a part of your empire.

The curious thing about that last feature was that the pop-up appeared for both players, even though it was supposed to target only one, and normally when it was accepted the city flipped on the other player's game (to the right player at least), while on the triggering player's game nothing happened.

As many features are common between the mods (but LoR's are probably outdated), maybe this can help your efforts.

I'm really happy about your effort to solve the OOSs, and I'm eager to help with playtests and ideas soon enough. :goodjob:
 
Spirictum - the goal of this endeavor is not to fix any particular OOS (although individual fixes are good and have been made). The goal here is much more ambitious. I aim to fix ALL OOS problems at once. This would be possible by resyncing the game state.

Remember an OOS literally means the game state for the two or more players has become out of sync. That means an action on one machine didn't happen on another.

My plan is to fix the entire system. Instead of having an OOS be a problem that requires a restart, just have code send the entire game state to the other players, and the OOS is gone. This will end 99.99% of OOS issues and make MP stable again.

What I need to do to make this work is write code to turn the game state (your tiles, cities, units, etc.) into a stream of bytes that can be sent across the network. Originally I thought I could just re-use the existing save/load code to do this, since saving the game is nearly the same process: saving transforms the game state into bytes on your filesystem. I encountered bugs trying to reuse the save code that I could not explain, until AIAndy pointed out that the save code fundamentally cannot do the task I wanted. So I have to write it myself. It's tedious, but I will likely be done by the weekend.

The end result should make RAND the most stable MP mod in existence, possibly even more stable than the base BTS, which has a few rare OOS issues. I realize this sounds like over-optimism, but the radical nature of this solution cannot be overstated.

I should point out that my solution is not revolutionary or even uncommon. Nearly all MP games use a central server/client model where the server sends the game state to clients. Civ is rather unique in that it does NOT do this. My solution will effectively make the host player (or the Pitboss server) the central server, which issues updates to all players as needed.

I will make some caveats. There is likely to be brief lag any time an OOS is auto-corrected by the game, depending on map size and network speed. LAN will probably be instant. Cable barely noticeable.
 
It seems a great achievement for the mod! Well done!

But how exactly do you plan to choose which game state is the right one?

If you always choose the same player, then problems like the one with cities asking to be a part of your empire won't be solved (the result will be whatever was decided on the selected player's game state, even if that is the opposite of what should have happened). Likewise, if something nice happened to the other player that wasn't registered on the "game state" player (a random event that awarded a Great Person, or a big amount of gold), then the other player will receive a game state in which this benefit never happened. Once I had an OOS that surely had to do with the fact that in my game 2 nations were at war while in my friend's game they weren't, and passing the turn triggered the OOS. Imagine that on the game state player's machine a war doesn't happen while on the other player's it does, so he spends the rest of the turn spending gold to hurry several troops. Then, when the turn is passed, he loads the other player's game state, there isn't any war, and he has spent all those resources instead of using them as he wanted. It will be frustrating for the non-game-state player, to say the least.


At least with the OOS being solved in real time for LAN players, we may track down those problems easily (because with this I'm sure I'll convince my civ friend to start playing again), and then it'll be easier to solve them.
 
It seems a great achievement for the mod! Well done!

But how exactly do you plan to choose which game state is the right one?

On pitboss, this is easy - the pitboss server has the right gamestate.


If you always choose the same player, then problems like the one with cities asking to be a part of your empire won't be solved (the result will be whatever was decided on the selected player's game state, even if that is the opposite of what should have happened). Likewise, if something nice happened to the other player that wasn't registered on the "game state" player (a random event that awarded a Great Person, or a big amount of gold), then the other player will receive a game state in which this benefit never happened. Once I had an OOS that surely had to do with the fact that in my game 2 nations were at war while in my friend's game they weren't, and passing the turn triggered the OOS. Imagine that on the game state player's machine a war doesn't happen while on the other player's it does, so he spends the rest of the turn spending gold to hurry several troops. Then, when the turn is passed, he loads the other player's game state, there isn't any war, and he has spent all those resources instead of using them as he wanted. It will be frustrating for the non-game-state player, to say the least.


At least with the OOS being solved in real time for LAN players, we may track down those problems easily (because with this I'm sure I'll convince my civ friend to start playing again), and then it'll be easier to solve them.

Yes, this is a problem. Right now for non-pitboss I am going to assume the host human player (human player with the lowest id) is the correct game state.

I agree the scenario you point out is a problem - however it will become a reproducible problem and easier to fix.

In my testing of MP and OOS issues, so far, your scenario has never been the case.
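
The "human player with the lowest id" rule could be sketched roughly like this. PlayerInfo and pickAuthoritativePlayer are placeholder names for this example, not the actual CvGame or CvPlayer API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, stripped-down view of a player slot.
struct PlayerInfo
{
    int  id;
    bool human;
};

// Returns the id of the human player with the lowest id,
// or -1 if there is no human player at all.
int pickAuthoritativePlayer(const std::vector<PlayerInfo>& players)
{
    int best = -1;
    for (std::size_t i = 0; i < players.size(); ++i)
    {
        if (players[i].human && (best == -1 || players[i].id < best))
        {
            best = players[i].id;
        }
    }
    return best;
}
```

Every machine can evaluate this rule locally and reach the same answer, as long as they agree on which slots are human.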
 
And I hope in RAND it doesn't happen as it used to do in LoR.

BTW, what about letting the players choose whose state is the one to be sent to the others when such situations happen? Would it be difficult or time-consuming? It would also help us track down the OOS problems, and it would already cause less frustration in MP if something that benefited the non-host doesn't just vanish after an OOS adjustment.
 
I could definitely create a popup that let players vote on who to use as the base state when an OOS occurs.

I'll look at doing that after I prove the fix itself works.
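
For what it's worth, counting such a vote could be as simple as the sketch below. tallyVotes is a made-up helper, and breaking ties in favour of the lowest player id is only an assumption, chosen to match the default host rule.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Each entry in votes is the id of the player whose game state the voter
// wants to use. Returns the id with the most votes; ties go to the lowest
// id, and an empty vote falls back to fallbackId.
int tallyVotes(const std::vector<int>& votes, int fallbackId)
{
    std::map<int, int> counts;
    for (std::size_t i = 0; i < votes.size(); ++i)
    {
        ++counts[votes[i]];
    }

    int winner = fallbackId;
    int best = 0;
    // std::map iterates in ascending id order, so a strict ">" keeps
    // the lowest id when two candidates have the same count.
    for (std::map<int, int>::const_iterator it = counts.begin(); it != counts.end(); ++it)
    {
        if (it->second > best)
        {
            best = it->second;
            winner = it->first;
        }
    }
    return winner;
}
```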
 