Multiplayer Woes

A friend and I are getting tired of the OOS (out-of-sync) phenomenon and decided to help you guys get rid of it by sending in some OOS and random-number logs.

Most of these OOS errors appear when one of the players manually moves hunters around. The hunter part might just be a coincidence, since they are far more active in the early game than other units.

An OOS often appears if I use "go to" twice, where the second order is given while the unit is still carrying out the first "go to" (quick move is turned on).

We don't know which logs are useful. Tell me if anything is missing, which logs are useless to you, and whether you need the saved games from both players or not at all.
We play with Revolution on, no random events, and start as minor.

Hope this helps.
View attachment 312762

View attachment 312763
Thank you, that is well sorted with all the information that I need.
Most of the OoS situations are also very pure with only the position of a single unit desynced.
 
3390 T511 - Player Eric manually ordered hunter to move after already attacking a unit. Unit was auto-moving to tile and discovered animal and OOS

3340 T516 - Eric's unit named John Churchill (hunter) ordered to move, OOS. DEFINITE bug in the upgrades and how the movement costs are calculated. One machine reports that moves cost more than the other does; upgrades are not understood by both computers. One machine reports the ability to move 2 or 3 squares while the other reports 1 or 2. Repeatable bug, if you're interested.
Thank you, that will help a lot to narrow down the source of the problem (which probably causes quite some of the reported issues).
Are the promotions reported by the two computers the same?
 
Unknown. We will be playing this to the end of the game, so neither of us is looking at the map in WorldBuilder or retiring or anything of that nature. I have not seen many of his men, and vice versa.

Here are the OOS logs, we have now begun to restart the game completely every OOS (pain in the ass) and have narrowed down the OOS further I feel

1650 T685 - Unknown OOS, Movement bug W/ Jav suspected

1520 T698 - Another Jav causing movement OOS

1100 T740 - Movement bug W/ Churchill on accident (sucks only being allowed to move 1 square for fear of OOS)

The promotion in question is any promotion that allows double movement over select terrain, and possibly any of the +1's (but alas our game is still young, so we don't know). Furthermore, I think there could be an issue with the new terrain that was added; it could be compounding the bug and keeping the promotions from working correctly.

And now, for your logs! (This posts, and lasts)

OOS_Report.7z

Thank you AIAndy
 
Most of your OOS errors (and Toffer's) come from the same source, as you correctly state.
I checked the movement cost code. There was a bug in there, but one that does not take effect until the Map Making tech.
Then I checked the code that assigns promotions, but that looks fine, as all of it should be called on all computers.

So I suspect the issue is either in the path finding or the MOVE_TO mission handling.
 
I don't really understand how the MOVE_TO handling could have an issue unless it was modified specifically by this mod. The error only occurs when somebody gets a 1/2 cost movement upgrade on their unit. Even then it's not a very common occurrence (well, now that we have units with thousands of XP, it can happen 10 times a turn).

Both machines that have the units calculate correctly that the unit can move X spaces (at least the popup telling you how many turns the move will take is accurate). The issue is that the unit can still move with its 0.5 movement left, and oftentimes will, especially if manually controlled. For all I know it could be a rounding issue of some sort, so doubling moves and costs could be an effective solution (if that is the case).

I wish I could be more helpful. Maybe one day that's a possibility, but for now it's these amateurish reports.
 
Many of the callbacks used in path generation ARE changed in this mod, for performance reasons (path generation is much more efficient in C2C as a result). However, these are deterministic changes (no randomness involved), so they should result in the same path calculations on both machines.

Can you expand on the statement 'The issue is the unit can still move with his .5 movement, and oftentimes will'? Basically the way the pathing works is to enable the next tile's worth of movement if you have ANY movement points left over, subtract what it costs to do that, and repeat, which is why a 1-movement unit can move onto a hill, for example. As such I'm unsure what you mean by the quoted statement, because 0.5 points left should always allow the next tile's worth of movement to take place...
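
To make the rule described above concrete, here is a minimal sketch. It is not the DLL's code: `tilesReachable`, `kMoveDenominator` and the cost values are hypothetical names and numbers chosen for illustration, with movement stored as integer sub-points so that "half a point" needs no fractions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scale: one full movement point in integer sub-points.
// The value 60 is an assumption made for this sketch.
const int kMoveDenominator = 60;

// Counts how many tiles of a planned path a unit can enter this turn.
// A tile may be entered whenever ANY movement is left; its cost is then
// subtracted, and the remainder is clamped at zero.
int tilesReachable(int movesLeft, const std::vector<int>& tileCosts)
{
    assert(movesLeft >= 0);
    int entered = 0;
    for (int cost : tileCosts)
    {
        if (movesLeft <= 0)
        {
            break; // nothing left, so the unit stops here
        }
        movesLeft = (cost >= movesLeft) ? 0 : movesLeft - cost;
        ++entered;
    }
    return entered;
}
```

Under these assumptions a unit with one movement point can still enter a tile that costs two, and a unit with half a point left can always enter one more tile, whatever that tile costs.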

One theory we currently have, if this is only occurring with human units, is that there is a caching issue. There are known problems with path caching in the way the game engine's path generator is used (the generator is outside of the DLL, so we cannot change its structure, only influence it by changing the code in the routines it uses as callbacks, which are in the DLL). Specifically, code is known to invoke the path generator, telling it that it's OK to use cached calculations from previous path generation, in circumstances when it is not (typically when the previous cached values refer to a different stack). The AI cases of this are all fixed in C2C, but the fix doesn't apply to human player stack path generation. I'm going to push a small change to SVN to explicitly turn off cache usage for the human player, and we can see if that helps. I'll let you know when that is done (should be later today).

Edit - this change is now pushed to SVN. Please let me know if it makes any difference (though frankly I think it's a fairly long shot)
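
As a rough illustration of the idea only (this is not the change that was pushed to SVN), the decision "may the path generator reuse its cached calculations?" could be sketched like this. `PathRequest`, `PathCacheGuard` and `mayReuseCache` are hypothetical names invented for the sketch; the real engine interface is not shown in this thread.

```cpp
#include <cassert>

// Hypothetical description of one path generation request.
struct PathRequest
{
    int stackId; // which stack (selection group) the path is for
    bool human;  // true if the stack belongs to a human player
};

// Hypothetical guard deciding whether cached path data may be reused.
class PathCacheGuard
{
public:
    PathCacheGuard() : m_lastStackId(-1) {}

    // Reuse is allowed only when the previous request was for the same
    // stack AND the request is not for a human player.
    bool mayReuseCache(const PathRequest& request)
    {
        assert(request.stackId >= 0);
        const bool sameStack = (request.stackId == m_lastStackId);
        m_lastStackId = request.stackId;
        return sameStack && !request.human;
    }

private:
    int m_lastStackId; // stack the cached values currently refer to
};
```

The design choice here is to trade some path generation speed on human-controlled stacks for the guarantee that a human's order is always computed from scratch.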
 
I think I found the problem.
A caching issue, but one in the caching of movement costs.

Here is the code in the movementCost function:
Code:
	// Process-wide cache of movement costs, keyed by a combined characteristics hash
	static std::map<int,int>* resultHashMap = NULL;

	if ( resultHashMap == NULL )
	{
		resultHashMap = new std::map<int,int>();
	}
	// Key combines the hashes of the source plot, this plot and the unit
	int iResultKeyHash = pFromPlot->getMovementCharacteristicsHash() ^ (getMovementCharacteristicsHash() << 1) ^ pUnit->getMovementCharacteristicsHash();

	std::map<int,int>::const_iterator match = resultHashMap->find(iResultKeyHash);
	if ( match != resultHashMap->end() )
	{
		// Cache hit: every check in the else branch below is skipped
		iResult = match->second;
	}
	else
	{
		FAssertMsg(getTerrainType() != NO_TERRAIN, "TerrainType is not assigned a valid value");

		if (pUnit->flatMovementCost() || (pUnit->getDomainType() == DOMAIN_AIR))
		{
			iResult = GC.getMOVE_DENOMINATOR();
		}
		else if (pUnit->isHuman() && !isRevealed(pUnit->getTeam(), false))
		{
			// Unrevealed plot for a human player: assume the maximum cost
			iResult = pUnit->maxMoves();
		}
		else if (!pFromPlot->isValidDomainForLocation(*pUnit))
		{
			iResult = pUnit->maxMoves();
		}
		// ... rest of the else branch (general cost calculation and the cache insert) not shown
The caching itself is correctly only done later at the end of the else part of this if, but a cache hit bypasses this part:
		else if (pUnit->isHuman() && !isRevealed(pUnit->getTeam(), false))
		{
			iResult = pUnit->maxMoves();
		}

And since local code does call the pathfinder, and therefore this movementCost function, the cache state is not synced. So if there is a cache hit for one player but not for the other, the path generated for one player will take the unrevealed plot information into account while the other player's will not, and that will likely cause an OOS.
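
The effect can be reproduced in miniature. The sketch below is a simplified stand-in, not the mod's code: `Plot`, `Unit` and `movementCostBuggy` are hypothetical, the hash key is reduced to two fields, and each "machine" is modelled by its own cache passed in as a parameter.

```cpp
#include <cassert>
#include <map>

// Hypothetical, heavily simplified stand-ins for the game objects.
struct Plot { int terrainHash; int terrainCost; bool revealed; };
struct Unit { int unitHash; int maxMoves; bool human; };

// Same shape as the excerpt above: the cache is consulted FIRST, so the
// "unrevealed plot" rule is skipped whenever the key is already cached.
int movementCostBuggy(std::map<int, int>& cache, const Unit& unit, const Plot& plot)
{
    assert(unit.maxMoves > 0);
    const int key = plot.terrainHash ^ (unit.unitHash << 1);
    std::map<int, int>::const_iterator match = cache.find(key);
    if (match != cache.end())
    {
        return match->second; // reveal status is never looked at
    }
    if (unit.human && !plot.revealed)
    {
        return unit.maxMoves; // unknown terrain: worst case, not cached
    }
    cache[key] = plot.terrainCost; // only the general result is cached
    return plot.terrainCost;
}
```

If one machine has already cached the cost for a terrain and unit combination (for example from a path preview) and the other has not, the two calls for the same unrevealed plot return different costs, which is the kind of divergence that ends in an OOS.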
 
Nice catch, you all! With this progress, I really hope to get some multiplayer games going from now on :) Maybe we could even have a weekly game or so with the members of this forum?
 
To clarify the point I think you are making:

  • On the originating machine the path is generated twice. Once while the UI is displaying the potential path, before the user releases the mouse, and then again when the user dispatches the order and the mission MOVE_TO is actually pushed.
  • On the other machine the path generation only occurs once (when the mission is pushed)
  • If nothing else in the game state changes between the first and second instances of the path being generated on the first machine, then the cached results from the first generation are correct and will produce the same result as if the cache were not used on the second run. It will also match the calculated-for-the-first-time values on the target machine, and all will be well.
  • However, if the player generates a path via the UI but does NOT execute it, values will have been cached by a UI path generation that was never actually executed on the first machine. If the user then chooses to do something else first (e.g. move another unit to reveal more tiles), the cached values may no longer be correct. Thus when the order is finally executed, the previously cached (and now incorrect) values will be used on the originating machine, potentially causing it to generate a 'cheat' path. The target machine, however, will calculate from scratch, get the correct values and generate a 'legitimate' path. On rare occasions these will differ and an OOS will result.
I think this is a general case, and not just about THAT cache (e.g. stacking limits can change the legality of moves). It does not occur in the AI (which is where the caching was all developed and tested, in single player), because the AI always handles one stack completely before moving on to another.

I suggest we need three fixes here:
  1. The change I already made (which will prevent analogous issues with other caches, I think)
  2. Change the movementCost() to not cache for human players (as a safety measure, though I think probably the next point obviates the actual need)
  3. Invalidate the movement cost cached value for a plot when a plot's reveal status is changed

I'll deal with it.
 
It does not actually cache the value for the plot, but for the combination of movement characteristics represented by the Zobrist hash. Reveal status is not included there and probably should not be anyway. It is easier to just move the first two ifs before the cache check and keep the cache as it is.

EDIT:
'On rare occasions these will differ and an OOS will result'
Worse, it is not a rare case. When you are looking where to move to you usually check out lots of different possibilities without actually giving the order. So you have filled the cache for the movement characteristic combination of your hunter with lots of different terrain/feature types. Considerably more than the other player (whose hunter probably has one or two movement promotions different). Now you give an order to move to unrevealed terrain which has terrain/feature combinations that are already in your cache on the way. This means the cache hit bypasses the unrevealed check and you get the 'cheat' path (should actually be quite easy to check in a single player game).
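
The reordering suggested in this post (the first two checks moved ahead of the cache lookup) might look roughly like the sketch below. It is an illustration under simplifying assumptions, not the committed fix: `Plot`, `Unit`, `movementCostReordered` and `kMoveDenominator` are hypothetical, and the hash key is reduced to two fields.

```cpp
#include <cassert>
#include <map>

// Hypothetical scale for one full movement point (assumed value).
const int kMoveDenominator = 60;

// Hypothetical, heavily simplified stand-ins for the game objects.
struct Plot { int terrainHash; int terrainCost; bool revealed; };
struct Unit { int unitHash; int maxMoves; bool human; bool flatCost; };

// Checks that depend on more than the hashed characteristics run BEFORE
// the cache lookup, so a warm cache can no longer bypass them.
int movementCostReordered(std::map<int, int>& cache, const Unit& unit, const Plot& plot)
{
    assert(unit.maxMoves > 0);
    if (unit.flatCost)
    {
        return kMoveDenominator;
    }
    if (unit.human && !plot.revealed)
    {
        return unit.maxMoves; // unknown terrain: worst case, never cached
    }

    const int key = plot.terrainHash ^ (unit.unitHash << 1);
    std::map<int, int>::const_iterator match = cache.find(key);
    if (match != cache.end())
    {
        return match->second;
    }
    cache[key] = plot.terrainCost;
    return plot.terrainCost;
}
```

With this ordering a machine whose cache was warmed by path previews and a machine with an empty cache return the same cost for an unrevealed plot, and the cache itself stays keyed on movement characteristics only.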
 
@DocCox: I'm interested to know if you've got any update on this issue, how is it working out with the fixes the extraordinary Koshling and AIAndy have come up with?

Cheers
 
Great work figuring out the reason for this particularly problematic OOS this fast.
If the fix works, I would really appreciate an update patch ASAP for us non-SVN users.
 
The large number of changes since the release does not allow for easy patching. I highly recommend switching to the SVN version (which also makes it very easy to keep up to date).
 
I have not tried the fix at this time, but as soon as I do I will attempt to create an OOS and let you guys know if it works. Hopefully in the next couple of days I will get some play time

Not too interested in playing the SVN version. Some of the proposed long-term changes are not particularly interesting to me, and with the bugs that come with SVN releases and frequent changes... balanced gameplay could be extremely difficult. We will give it a chance when we figure out how to do SVN releases... but I can't promise we will stay up to date.
 
Tried the fix and it works, kudos.
We started using SVN and only had 1 OOS after playing 5 hours :goodjob:.

The OOS report was too large to attach, so I published it on MediaFire:

http://www.mediafire.com/?ajit4ctb4b8n4qw

This one happened on the first end turn of the save, when a British stone spearman attacked my Roman javeliner and won. The OOS could also be connected to the British great commander who fled the scene of the action.
 
You can just get the latest SVN and not update it further. It is recommended if you want to enjoy multiplayer. Revision 1668 seems stable and is almost free of OOS, at least in early games.
 