1. We have added a Gift Upgrades feature that allows you to gift an account upgrade to another member, just in time for the holiday season. You can see the gift option when going to the Account Upgrades screen, or on any user profile screen.
    Dismiss Notice

OOS problems and how to fix them

Discussion in 'Bugs and Crashes' started by 45°38'N-13°47'E, Jan 20, 2014.

  1. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,852
    Location:
    Just wonder...
    Ok, here we go with another OOS; this time it's combat-related. I haven't imported TB Combat Mod in AND, so we can be sure that there's something causing OOS errors in combat situation and it's outside of Combat Mod (of course there could be OOS caused by Combat Mod, but even if all of them can be solved there are still some of them in the previous code).

    Code:
    63365	230	Global	Combat	1000	915
    against

    Code:
    63365	230	Global	Withdrawal	100	91
    OOS Logs show

    Code:
    Player 0, Unit ID: 147467, Heavy Swordsman 1 (Rostov)
    X: 74, Y: 40
    [COLOR="Red"]Damage: 32[/COLOR]
    Experience: 0
    Level: 1
    Promotions:
    [COLOR="Red"]City Raider I[/COLOR]
    against

    Code:
    Player 0, Unit ID: 147467, Heavy Swordsman 1 (Rostov)
    X: 74, Y: 40
    [COLOR="Red"]Damage: 30[/COLOR]
    Experience: 0
    Level: 1
    Promotions:
    
    this leads me to

    void CvUnit::resolveCombat(CvUnit* ...

    and particularly to a part of the code which is similar in AND and C2C; this is AND version

    Code:
    		if (GC.getGameINLINE().getSorenRandNum(GC.getCOMBAT_DIE_SIDES(), "Combat") < iDefenderOdds)
    /************************************************************************************************/
    /* BETTER_BTS_AI_MOD                       END                                                  */
    /************************************************************************************************/
    
    		{
    			if (getCombatFirstStrikes() == 0)
    			{
    				if (getDamage() + iAttackerDamage >= maxHitPoints() && GC.getGameINLINE().getSorenRandNum(100, "Withdrawal") < withdrawalProbability())
    				{
    TB Combat MOD changed this part of the code but the common part which could cause troubles I suppose might be

    Code:
    GC.getGameINLINE().getSorenRandNum(GC.getCOMBAT_DIE_SIDES()
    or
    Code:
    iDefenderOdds
    I haven't had a chance to get further, but I'm still looking into it.
     
  2. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    Before I'd switched some things around I had some OOS errors that were very similar and my changes didn't stop them so I think we're still looking at the same basic underlying problem somewhere. And its very interesting that for you it's happening without the unit recalculating itself every round. Very interesting indeed... That suggests the problem is NOT where I had most suspected it might be found. (Still is baffling though.)
     
  3. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,852
    Location:
    Just wonder...
    Or the problem could be there AND somewhere else. ;)
     
  4. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    The problem is not up there at the combat roll but instead somewhere before the withdrawal roll. One computer must have gotten to a withdrawal roll on the previous combat round while the other computer did not so instead it got the combat roll of the next combat round next.
    So check the ifs that lead to a withdrawal roll and check what might have been different.
     
  5. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    It acts identical and I was having the problem in the same exact (withdrawal) spot so I'm pretty sure it's the same issue. The cure might differ a bit at this point but it must be the same problem as it has all the same signature.

    The if on a withdrawal roll is so simple its sick. Basically it's just checking to see if the unit can withdraw and if the amount of damage the unit is receiving this round would kill it. At which point it makes the check. This suggests that somehow the computers are calculating the damage differently which makes no sense at all since the rest of combat goes just fine.

    I wonder though... is it defensive withdrawal, standard withdrawal or combat limit withdrawal that is discovering the problem? I'd suspect it should be defensive withdrawal - there MAY be some more complexity in whether that would be checked or not that MAY be calling for which player is being evaluated in a manner that could cause an async. There's a check there to see if the unit would or wouldn't be threatened on any space it COULD withdraw into. Based on programming speed differences, a unit could've moved in to threaten that space that hadn't yet done so on the other computer perhaps. However... when I've WITNESSED this OOS in game I've not seen this kind of thing being able to explain it as it usually seems to be the units have no other units in sight. So it could just be an inappropriate player check somewhere in the defender withdrawal code...


    That indeed would be something that AND would share still with C2C.
     
  6. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    I am not sure in how this is in the AND codebase but it seems like the odds simulation hit point calculation in C2C is mixed with the hit points for the actual combat rounds.
    This is an unfamiliar part of the code for me so I am not sure but is it guaranteed that predicted hit points calculations is always reset to -1 after odds calculations are over?
     
  7. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    Predictions on hit point calculations would take place each round and would be based on the current running odds for that round so if I understand your question correctly, the predicted hit points are calculated anew at the beginning of each round and are not stored beyond the battle itself (I don't think...) Primarily predictions are a matter of calculations for odds display and should not be extraordinarily relevant during combat itself.

    If you're talking about the prediction that calls the withdrawal check to take place, it's more of a result rather than a prediction. The withdrawal check takes place once the unit is otherwise doomed to die based on current combat results for that round. If the unit successfully withdraws, the battle is called to cease before the damage is actually assigned to the unit from the hit.

    For this, at the beginning of each combat, variables such as these are calculated and then recalculated at the beginning of each subsequent round. This, thus takes place well before the 'hit check' is made and the determined damage - also calculated as part of that recalculation process - is calculated into the current hp of the struck unit to determine if a withdrawal check is necessary.

    For AND, the difference would be that the recalculation is not taking place and the original odds and stats of the unit remains constant throughout the rounds of battle regardless of how many times a unit has been hit and how much damage it has taken in previous rounds. Additionally, AND would not have the early withdrawal mechanism evaluating the HP threshold at which a unit will begin attempting withdrawal each round it would be again struck once it's taken damage past that point. It also would not be factoring in the opponent's pursuit - and for C2C, the opponent's CCs are evaluated into withdrawal and pursuit values as well as it is possible for units to have modifiers vs particular types of opponents. That would also not be present in the AND calculations.

    To make a long story short, since their code is getting the same OOS error (and ours happens with relative infrequency just as theirs does) just before a withdrawal check, I would theorize that the recalculation, early withdrawal and pursuit values are NOT part of the problem. I have to admit to some surprise at this considering I was suspicious of my added complexities albeit having no idea how they could've presented an additional OOS situation.

    However, since its something we share as a deviation from vanilla bts, this does cast some suspicion on the previous combat modification made during Afforess's term with AND which was the adaptation from python to dll of the defender withdrawal code. I can think of a few ways that MIGHT be causing trouble and I recall finding that there is some non-functional coding wrapped into that structure - obviously some attempts were made for certain dynamics that were abandoned but were never cleaned up. I made some attempts to adjust those but I didn't have any greater success than the original. That 'defunct code' doesn't seem to be doing anything however so it's not greatly suspect and since I've changed it from the original I found it again seems to be eliminated as an OOS suspect.

    I just have to wonder if it has something to do with that plot evaluation for whether the unit has a safe plot to escape to or not. If it doesn't, despite what would be a potential success otherwise, the unit is barred from defensive withdrawal. We may have to run that check at the beginning of combat and save the result as a boolean rather than running that evaluation at the point the defensive withdrawal is about to take place - where gamestates may differ.
     
  8. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    Game states must not differ during combat at all, neither at the start nor at the end as that is synced code.

    But you are misunderstanding what I meant with the prediction stuff.

    This is the currHitPoints function that is used in quite a lot of combat calculations:
    Code:
    int CvUnit::currHitPoints()	const
    {
    	return (AI_getPredictedHitPoints() == -1 ? maxHitPoints() - getDamage() : AI_getPredictedHitPoints());
    }
    As you can see, predicted hit points override the real hit points unless they are set to -1. And it is very hard to see from the code if this is always properly cleared up as in set back to -1 after usage.
    It might be useful to put some temporary check into the start of combat resolution to make sure this is always the case.

    The most likely cause for this OOS is another case of bad caching though. maxCombatStr caches its results for the current turn for each plot and unit but I don't think the values it depends on are constant for an entire turn (especially such things like surround and destroy definitely are not constant). And it is called from async places so the results can desync.
    It should be checked if caching like that is valid in the first place and does not screw some things like surround and destroy. And in any case there should be no caching in the async case. Since the bAsync is missing from that currently it will need to be added and set properly for all cases (which might require further bAsync to be added to calling functions).

    Btw, TB, please don't call expensive functions like maxCombatStr twice in a row with the same parameters (see CvUnit::currCombatStr).
     
  9. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    1) I've been puzzling over the AI_getPredictedHitPoints reference there - if I'm not mistaken its something koshling added and I only loosely follow its purpose. As for resetting it - I'll see if I can't further understand what I should be resetting it to and where. Obviously, this is something that should be redetermined prior to the start of all given combat rounds as it should be able to change as it goes. Then again, if you're saying its only the first round that it may not be properly setup then yeah, I can see an issue with that.

    But there's some reasons I suspect that may not be the problem. If it was, I'd think the OOS situation would be happening with nearly every combat wouldn't it? Could it be that after the first round, if the unit has survived the round, it resets safely but at the beginning of combat it doesn't so ONLY the first round combats are relying on a mis-presumption of the current HP of the unit?

    2) the caching on maxCombatStr is, I believe, another Koshling addition which I glossed over somewhat. Since it really should recalculate on a round to round basis it follows the same thinking as the previous thought.

    3) Looking where you directed me... erm... yeah. Good point. That maxCombatStr really should be defined before used in the equation so it doesn't have to be RUN twice, huh? Makes sense.
     
  10. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    The purpose of AI_getPredictedHitPoints is to overwrite the current HP of a unit with a fictional value for "simulated" battles as in odds calculation for the mouseover info and for the AI to determine what it should do. This should in no way be used by combatResolve and must be at -1 when you do the real combat calculations.

    Yes, if that is the case it has a serious effect on battles so if this is the cause (which I don't really think any more but might still be worth testing for in an assert) then it happens very rarely and definitely not by default.

    It might be somewhat expensive if you remove the caching altogether but you should probably move stuff out of there that does change during a turn and then apply bAsync as I outlined in the last post.
     
  11. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,852
    Location:
    Just wonder...
    A new and different OOS now; let's keep hunting them (I'm still checking the withdrawal code, I have some experiments I can make and see what happens).

    Code:
    Player 1, Unit ID: 155685, Caravel
    [COLOR="Red"]X: 2, Y: 10[/COLOR]
    Damage: 0
    Experience: 2
    Level: 2
    Promotions:
    Navigation I
    against

    Code:
    Player 1, Unit ID: 155685, Caravel
    [COLOR="Red"]X: 0, Y: 12[/COLOR]
    Damage: 0
    Experience: 2
    Level: 2
    Promotions:
    Navigation I
    It happened 4 different times, 4 turns in a row; each time quitting the game and restarting it for both players. Random logs are identical, this is the only difference shown in the logs beside a couple of unit names. It happened 3 times with my opponents caravel and another time with one of my caravels (on the other side of the world). No idea what could be causing it and I have no clue beside the fact that could be something in the pathing system. It happened both when the path was drawn to make the caravel move for 2 turns and when the caravel was being moved tile by tile (in fact once it went OOS before caravel finished its movement; the other player already had finished his turn and clicked EoT).
     
  12. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    I'm not sure if there would be some unintended consequences for setting AI_getPredictedHitPoints artificially this way but it occurs to me that it would be very easy to have combat itself check a similar near-clone of the current HP function that doesn't include this call at all. I'll take a deeper look at that AI code - I'd figured it was for AI determinations of odds and whether they'd want to attack or not so yeah, 'simulated battles' but it is and was the 'setting' of the value there that I wasn't sure if it would be safely, or easily, done.

    A quick check into the code reminds me of why I thought it might be dangerous to set this to -1 at the beginning of combat:
    Code:
    
    void CvUnitAI::AI_setPredictedHitPoints(int iPredictedHitPoints)
    {
    	m_iPredictedHitPoints	= iPredictedHitPoints;
    	if ( iPredictedHitPoints == -1 )
    	{
    		//	This is a reset
    		m_bHasAttacked			= false;
    	}
    }
    It's the 'm_bHasAttacked = false' part that I find disturbing. This suggests to me that it was intended only to be set to -1 if the unit did not attack after evaluation or something to that extent. So if we're going to set it to -1 at the very beginning of combat which does look to be wise, shouldn't we build a bypass to that 'if(iPredictedHitPoints == -1)' statement entirely? The unit has indeed attacked if this is being set to -1 at the beginning of a battle right? So it couldn't be right to be setting m_bHasAttacked to false when it should certainly be true.

    Looking at the checks taking place during battle such as
    Code:
    cdAttackerDetails.iCurrHitPoints = currHitPoints();
    which as I understand it is to update the running in-combat values of the hit points after damage is suffered, and the fact that such a check doesn't seem to take place PRE battle, then you may be right and it's only really causing an issue in the first round - this could well throw off the odds while not causing OOS errors unless the battle tends to end in the first round which would only be possible in the normal game if the unit was severely injured to start with. Thus, this limited infrequent event could well explain why the OOS is infrequent as well. It could also be corrupting combat results some.


    Makes sense.


    I'm not really suggesting to remove caching altogether but as you recall when I last attempted to use a bAsync I was about 3 miles off base from how to properly set that up and utilize it so I'm still shaky on how exactly to define WHEN to call true and when to call false for that variable as I'm not sure how to ask the game - "hey are we in an async gamestate or not?" I realize it has to do with identifying WHEN to throw true through the function call but that function is called for a lot of different reasons at different junctures and I'm not sure I'd be able to successfully identify which ones are being called async and which ones aren't.

    That said, a caching that isn't setup to be conditional like this does raise an eyebrow of suspicion. But the issue we had on the other spot still applies here too. Since this function is evaluated in combat EVERY round and most battles do this without problem, what kind of odd situation would be causing it to run from an async call during the middle of a battle, which as you stated, already is assumable to be entirely synchronized?

    From what I get of this, it means that either way we have to figure out HOW that is happening in the first place, how any portion of combat is suddenly taking place async.


    I seem to recall we had this issue with pathing and it had something to do with the units trying to path themselves through enemy territory and would be cleared up if in the same round you could get the unit back to the spot where it caused the OOS in the first place. Koshling had fixed THAT one though so I wonder if this fix was something that's been properly updated from C2C to AND dll codes. I don't THINK we're still getting this one but I could be wrong and it's a new situation.
     
  13. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,852
    Location:
    Just wonder...
    l think that pathing system has been imported completely but I'll check again. But now that you mention it both my and my opponent's caravel were near other civs borders, so I'll check that one also.
     
  14. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    m_bHasAttacked is only used by the AI and in the same context of "simulated battles". In this case for stack battle evaluation when you need to store temporarily if a unit in a stack has already attacked in the simulation.
    So it should be at false at that point anyway.

    It might well have been async from the start of the battle but you don't notice it until it causes a withdrawal on one computer but not the other (because the damage applied was different).
    And in general you would only get this result if in the time between the first odds check from one player (by holding right click on the target) and the actual initiation of the battle some of the stuff that is calculated in there changes.
    For instance, Surround and Destroy is active, the player checks the battle odds with one unit, then decides that he needs more surrounding and moves another unit close. Now he starts the battle and on his computer the false cached result from the odds check is used while on the other computer the additional unit surrounding has an effect. So more damage on one computer causing the unit to withdraw a combat turn earlier.
     
  15. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    hmm... there's a lot more randoms taking place in combat - each round new hit randoms etc... This theory would suggest that those aren't problematic until the last one (due to differing calculated damage). The more I think of it, the more it makes sense. And why SOME rare battles are going out of synch despite having nothing to do with withdrawal - if an extra turn had to run on one system and not the other... yep that'd throw off the random sequence.

    I'll take a deeper look into how to setup the bAsync check on there today by looking at other examples. Taking a stab at it doesn't mean I'm terribly confident I'll be able to do it successfully so I'll want you to audit that work thereafter. Would you be cool with looking it over afterwards?
     
  16. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,852
    Location:
    Just wonder...
    Ok, more news on those OOS from my current game.
    I've had a couple of OOS while in battle and they definetely happened when a unit I was attacking decided to retreat (it was a barbarian unit both times). On the Caravel OOS: I've been able to clear an OOS a couple of times by turning back my ship one tile on its path before the OOS error. Of course I can't always do that because the ship could have no more movements (and I wasn't near anyone's borders). BUT ther's somethimg else I think I've noticed in those cases with the Caravel (but I still need some more testing): I guess it went OOS when I was seeing with the Caravel some barbarian units on the ground. More will follow on this, I hope.

    Edit: no,it seems that in other turns I can pass near barbarian units and in and out of borders and the OOS will and will not happen randomly.
     
  17. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    Sure.
     
  18. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    Ok, so under 'int CvUnit::maxCombatStr(const CvPlot* pPlot, const CvUnit* pAttacker, CombatDetails* pCombatDetails, bool bSurroundedModifier) const'

    I've placed this to initiate the process:

    Code:
    	bool bAsync = (GET_PLAYER(getOwner()).isHuman() || GET_PLAYER(pAttacker->getOwner()).isHuman());
    Then below, I added ' || bAsync ' to the following:
    Code:
    	else if ( bSurroundedModifier )
    	{
    		PROFILE("maxCombatStr.Cachable");
    
    		if ( CombatStrCacheInitializedTurn != GC.getGameINLINE().getGameTurn() || bAsync)
    		{
    			FlushCombatStrCache(NULL);
    		}
    
    		int	iBestLRU = MAX_INT;
    
    		for(iI = 0; iI < COMBATSTR_CACHE_SIZE; iI++)
    		{
    			CombatStrCacheEntry* pEntry = &CombatStrCache[iI];
    
    			if ( pEntry->iLRUIndex == 0 )
    			{
    				pCacheEntry = pEntry;
    				break;
    			}
    			else if ( pEntry->pPlot == pPlot && pEntry->pAttackedPlot == pAttackedPlot && pEntry->pAttacker == pOriginalAttacker && pEntry->pForUnit == this )
    			{
    				//OutputDebugString("maxCombatStr.CachHit\n");
    				PROFILE("maxCombatStr.CachHit");
    				pEntry->iLRUIndex = iNextCombatCacheLRU++;
    				return pEntry->iResult;
    			}
    			else if ( pEntry->iLRUIndex < iBestLRU )
    			{
    				iBestLRU = pEntry->iLRUIndex;
    				pCacheEntry = pEntry;
    			}
    		}
    
    		//char buffer[300];
    
    		//sprintf(buffer,"maxCombatStr cache miss for unit %d, attacker %d @(%d,%d)\n", getID(), pOriginalAttacker == NULL ? -1 : pOriginalAttacker->getID(), pPlot == NULL ? -1 : pPlot->getX_INLINE(), pPlot == NULL ? -1 : pPlot->getY_INLINE());
    		//OutputDebugString(buffer);
    	}
    IF I'm reading the code in this section and later in the function correctly then these 2 simple adjustments ought to do the trick... would you agree?

    Note: the || bAsync addition takes place in this line:
    if ( CombatStrCacheInitializedTurn != GC.getGameINLINE().getGameTurn() || bAsync)
     
  19. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    While that is not the same kind of bAsync as in the other functions, it might be enough to do it like that.
    As you did it you just ban any cache usage if human units are involved. So I guess call it bInvolvesHuman or something to avoid confusion.
     
  20. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,114
    Gender:
    Male
    Location:
    Las Vegas
    I searched for a bit through other regions of code to determine how bAsync usages there were being set and it seemed like the only way I could determine it was ever set to anything other than default false was if the human player was involved. Perhaps some discourse on how this would 'usually' be setup could be helpful.

    Still... I can change bAsync terminology to make it clear.
     

Share This Page