1. We have added a Gift Upgrades feature that allows you to gift an account upgrade to another member, just in time for the holiday season. You can see the gift option when going to the Account Upgrades screen, or on any user profile screen.
    Dismiss Notice

OOS problems and how to fix them

Discussion in 'Bugs and Crashes' started by 45°38'N-13°47'E, Jan 20, 2014.

  1. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,834
    Location:
    Just wonder...
    Ok, I'm trying to analyze the code to hunt that "AI Diplo Declare War Trade" OOS; I suppose that this part of the code is the one where AI is trading other AI to make war (am I correct?); anyway here's the first part of the code:

    Code:
    									if (!(GET_PLAYER((PlayerTypes)iI).isHuman()) || !(abContacted[GET_PLAYER((PlayerTypes)iI).getTeam()]))
    									{
    										PROFILE("CvPlayerAI::AI_doDiplo.DeclareWar");
    
    										int iDeclareWarTradeRand = GC.getLeaderHeadInfo(getPersonalityType()).getDeclareWarTradeRand();
    										int iMinAtWarCounter = MAX_INT;
    										for (iJ = 0; iJ < MAX_CIV_TEAMS; iJ++)
    										{
    											if (GET_TEAM((TeamTypes)iJ).isAlive())
    											{
    												if (atWar(((TeamTypes)iJ), getTeam()))
    												{
    													int iAtWarCounter = GET_TEAM(getTeam()).AI_getAtWarCounter((TeamTypes)iJ);
    													if (GET_TEAM(getTeam()).AI_getWarPlan((TeamTypes)iJ) == WARPLAN_DOGPILE)
    													{
    														iAtWarCounter *= 2;
    														iAtWarCounter += 5;
    													}
    													iMinAtWarCounter = std::min(iAtWarCounter, iMinAtWarCounter);
    												}
    											}
    										}
    										
    										if (iMinAtWarCounter < 10)
    										{
    											iDeclareWarTradeRand *= iMinAtWarCounter;
    											iDeclareWarTradeRand /= 10;
    											iDeclareWarTradeRand ++;
    										}
    										
    										if (iMinAtWarCounter < 4)
    										{
    											iDeclareWarTradeRand /= 4;
    											iDeclareWarTradeRand ++;
    										}
    
    										if (GC.getGameINLINE().getSorenRandNum(iDeclareWarTradeRand, "AI Diplo Declare War Trade") == 0)
    										{
    So, going backwards from the bottom line: a random number is generated based on iDeclareWarTradeRand; this is taken directly from xml values in int iDeclareWarTradeRand = GC.getLeaderHeadInfo(getPersonalityType()).getDeclareWarTradeRand(); (most leaders have a value of 40, at least in AND). This iDeclareWarTradeRand is modified differently if iMinAtWarCounter has different values.

    In turns, iMinAtWarCounter is calculated as above (or here below)

    Code:
    										int iMinAtWarCounter = MAX_INT;
    										for (iJ = 0; iJ < MAX_CIV_TEAMS; iJ++)
    										{
    											if (GET_TEAM((TeamTypes)iJ).isAlive())
    											{
    												if (atWar(((TeamTypes)iJ), getTeam()))
    												{
    													int iAtWarCounter = GET_TEAM(getTeam()).AI_getAtWarCounter((TeamTypes)iJ);
    													if (GET_TEAM(getTeam()).AI_getWarPlan((TeamTypes)iJ) == WARPLAN_DOGPILE)
    													{
    														iAtWarCounter *= 2;
    														iAtWarCounter += 5;
    													}
    													iMinAtWarCounter = std::min(iAtWarCounter, iMinAtWarCounter);
    It shouldn't be that hard until now... either is iMinAtWarCounter or iAtWarCounter. Since iMinAtWarCounter = MAX_INT, the only variable is iAtWarCounter which depends on AI_getAtWarCounter and on AI_getWarPlan

    Which leads me to

    Code:
    int CvTeamAI::AI_getAtWarCounter(TeamTypes eIndex) const
    {
    	FAssertMsg(eIndex >= 0, "eIndex is expected to be non-negative (invalid Index)");
    	FAssertMsg(eIndex < MAX_TEAMS, "eIndex is expected to be within maximum bounds (invalid Index)");
    	return m_aiAtWarCounter[eIndex];
    }
    and

    Code:
    WarPlanTypes CvTeamAI::AI_getWarPlan(TeamTypes eIndex) const
    {
    	FAssert(eIndex >= 0);
    	FAssert(eIndex < MAX_TEAMS);
    	FAssert(eIndex != getID() || m_aeWarPlan[eIndex] == NO_WARPLAN);
    	return m_aeWarPlan[eIndex];
    }
    I'm not sure if the clues I'm following and my line of reasoning make sense until now and I'm not even sure how to go on from here. But while looking AI_getWarPlan entries in CvTeam.cpp, I've found the following function

    Code:
    int CvTeamAI::AI_endWarVal(TeamTypes eTeam) const
    {
    	PROFILE_FUNC();
    
    	int iValue;
    
    	FAssertMsg(eTeam != getID(), "shouldn't call this function on ourselves");
    	FAssertMsg(isAtWar(eTeam), "Current AI Team instance is expected to be at war with eTeam");
    
    [COLOR="Red"]	//	Just evaluate once per turn and cache for the rest of that turn
    	if ( m_endWarValueCache[eTeam] != -1 )
    	{
    		return m_endWarValueCache[eTeam];
    	}[/COLOR]
    
    	iValue = 100;
    
    	iValue += (getNumCities() * 3);
    	iValue += (GET_TEAM(eTeam).getNumCities() * 3);
    
    	iValue += getTotalPopulation();
    	iValue += GET_TEAM(eTeam).getTotalPopulation();
    
    	iValue += (GET_TEAM(eTeam).AI_getWarSuccess(getID()) * 20);
    
    	int iOurPower = std::max(1, getPower(true));
    	int iTheirPower = std::max(1, GET_TEAM(eTeam).getDefensivePower());
    
    	iValue *= iTheirPower + 10;
    	iValue /= std::max(1, iOurPower + iTheirPower + 10);
    	
    	WarPlanTypes eWarPlan = AI_getWarPlan(eTeam);
    
    	// if we are not human, do we want to continue war for strategic reasons?
    	// only check if our power is at least 120% of theirs
    	if (!isHuman() && iOurPower > ((120 * iTheirPower) / 100))
    	{
    		bool bDagger = false;
    		
    		bool bAnyFinancialTrouble = false;
    		for (int iI = 0; iI < MAX_PLAYERS; iI++)
    		{
    			if (GET_PLAYER((PlayerTypes)iI).isAlive())
    			{
    				if (GET_PLAYER((PlayerTypes)iI).getTeam() == getID())
    				{
    					if (GET_PLAYER((PlayerTypes)iI).AI_isDoStrategy(AI_STRATEGY_DAGGER))
    					{
    						bDagger = true;
    					}
    					
    					if (GET_PLAYER((PlayerTypes)iI).AI_isFinancialTrouble())
    					{
    						bAnyFinancialTrouble = true;
    					}
    				}
    			}
    		}
    		
    		// if dagger, value peace at 90% * power ratio
    		if (bDagger)
    		{
    			iValue *= 9 * iTheirPower;
    			iValue /= 10 * iOurPower;
    		}
    		
    	    // for now, we will always do the land mass check for domination
    		// if we have more than half the land, then value peace at 90% * land ratio 
    		int iLandRatio = ((getTotalLand(true) * 100) / std::max(1, GET_TEAM(eTeam).getTotalLand(true)));
    	    if (iLandRatio > 120)
    	    {
    			iValue *= 9 * 100;
    			iValue /= 10 * iLandRatio;
    	    }
    
    		// if in financial trouble, warmongers will continue the fight to make more money
    		if (bAnyFinancialTrouble)
    		{
    			switch (eWarPlan)
    			{
    				case WARPLAN_TOTAL:
    					// if we total warmonger, value peace at 70% * power ratio factor
    					if (bDagger || AI_maxWarRand() < 100)
    					{
    						iValue *= 7 * (5 * iTheirPower);
    						iValue /= 10 * (iOurPower + (4 * iTheirPower));
    					}
    					break;
    
    				case WARPLAN_LIMITED:
    					// if we limited warmonger, value peace at 70% * power ratio factor
    					if (AI_limitedWarRand() < 100)
    					{
    						iValue *= 7 * (5 * iTheirPower);
    						iValue /= 10 * (iOurPower + (4 * iTheirPower));
    					}
    					break;
    
    				case WARPLAN_DOGPILE:
    					// if we dogpile warmonger, value peace at 70% * power ratio factor
    					if (AI_dogpileWarRand() < 100)
    					{
    						iValue *= 7 * (5 * iTheirPower);
    						iValue /= 10 * (iOurPower + (4 * iTheirPower));
    					}
    					break;
    
    			}
    		}
    	}
    
    /************************************************************************************************/
    /* BETTER_BTS_AI_MOD                      05/19/10                                jdog5000      */
    /*                                                                                              */
    /* War strategy AI, Victory Strategy AI                                                         */
    /************************************************************************************************/	
    	if( AI_isAnyMemberDoVictoryStrategy(AI_VICTORY_CULTURE4) )
    	{
    		iValue *= 4;
    	}
    	else if( AI_isAnyMemberDoVictoryStrategy(AI_VICTORY_CULTURE3) || AI_isAnyMemberDoVictoryStrategy(AI_VICTORY_SPACE4) )
    	{
    		iValue *= 2;
    	}
    
    
    	if ((!(isHuman()) && (eWarPlan == WARPLAN_TOTAL)) ||
    		  (!(GET_TEAM(eTeam).isHuman()) && (GET_TEAM(eTeam).AI_getWarPlan(getID()) == WARPLAN_TOTAL)))
    	{
    		iValue *= 2;
    	}
    	else if ((!(isHuman()) && (eWarPlan == WARPLAN_DOGPILE) && (GET_TEAM(eTeam).getAtWarCount(true) > 1)) ||
    		       (!(GET_TEAM(eTeam).isHuman()) && (GET_TEAM(eTeam).AI_getWarPlan(getID()) == WARPLAN_DOGPILE) && (getAtWarCount(true) > 1)))
    	{
    		iValue *= 3;
    		iValue /= 2;
    	}
    
    	// Do we have a big stack en route?
    	int iOurAttackers = 0;
    	for( int iPlayer = 0; iPlayer < MAX_CIV_PLAYERS; iPlayer++ )
    	{
    		if( GET_PLAYER((PlayerTypes)iPlayer).getTeam() == getID() )
    		{
    			iOurAttackers += GET_PLAYER((PlayerTypes)iPlayer).AI_enemyTargetMissions(eTeam);
    		}
    	}
    	int iTheirAttackers = 0;
    	CvArea* pLoopArea = NULL;
    	int iLoop;
    	for(pLoopArea = GC.getMapINLINE().firstArea(&iLoop); pLoopArea != NULL; pLoopArea = GC.getMapINLINE().nextArea(&iLoop))
    	{
    		iTheirAttackers += countEnemyDangerByArea(pLoopArea, eTeam);
    	}
    
    	int iAttackerRatio = (100 * iOurAttackers) / std::max(1 + GC.getGameINLINE().getCurrentEra(), iTheirAttackers);
    		
    	if( GC.getGameINLINE().isOption(GAMEOPTION_AGGRESSIVE_AI) )
    	{
    		iValue *= 150;
    		iValue /= range(iAttackerRatio, 150, 900);
    	}
    	else
    	{
    		iValue *= 200;
    		iValue /= range(iAttackerRatio, 200, 600);
    	}
    /************************************************************************************************/
    /* BETTER_BTS_AI_MOD                       END                                                  */
    /************************************************************************************************/
    
    	iValue -= (iValue % GC.getDefineINT("DIPLOMACY_VALUE_REMAINDER"));
    
    [COLOR="Red"]	if (isHuman())
    	{
    		iValue = std::max(iValue, GC.getDefineINT("DIPLOMACY_VALUE_REMAINDER"));
    	}
    
    	m_endWarValueCache[eTeam] = iValue;[/COLOR]
    
    	return iValue;
    }
    There's a cached value which leads me to pay more attention to this function because I know caching is dangerous. It's true that at the end of the code there's a isHuman check for that cached value, if I'm not mistaken, but I'm not sure the OOS we've got involved human players. By the way, until now I've encountered OOS regarding humans only but can an OOS involve AI players only? Or should somehow a human player always be involved? Anyway, is that caching safe? Or should I place there a check like isNetworkMultiplayer (or something like that)?
    Sorry for the long post, I'm writing down everything for reference and because it also helps me think. ;) If I can't come up with anything from that part of the code, I'll go up looking for other parts of AI_doDiplo, I suppose.
     
  2. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    Don't OOS errors require human interaction being different? Thus on one computer something processes differently than on the other? I suppose it's an oversimplification...

    I had an interesting idea to help us track these down. Random generated numbers do tend to be a good indicator that an oos took place just before those show up as different when comparing the random logs. So what we could do to help us track them down is insert a generally meaningless random number generation of like a 100 roll and by simply labeling it it would be tracked by the random log. Insert a 1st check then a 2nd check just before and after any suspicious bits of code that enable us to prove to ourselves where the problem lies. Then remove them once the problem is repaired. We can put them all over the place wherever there is suspicion, first with wide focus to diagnose the correct problematic function, then move them in towards the problem once more closely identified to narrow in on the actual bad spot.

    As a procedural tool to identify the actual point in the code that's causing us trouble this could be an extremely useful method where we have TONS of suspect code.
     
  3. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,834
    Location:
    Just wonder...
    Well, I've seen OOS even when letting 2 pc fighting against each other with autoplay, that's why I was asking.

    I think you're idea is pretty good but I'm not sure how to implement it. Can we make an example so I'm sure I understand?
    Like for example, if I think the problem is here:

    Code:
    		int iMinAtWarCounter = MAX_INT;
    										for (iJ = 0; iJ < MAX_CIV_TEAMS; iJ++)
    										{
    											if (GET_TEAM((TeamTypes)iJ).isAlive())
    											{
    												if (atWar(((TeamTypes)iJ), getTeam()))
    												{
    													int iAtWarCounter = GET_TEAM(getTeam()).AI_getAtWarCounter((TeamTypes)iJ);
    													if (GET_TEAM(getTeam()).AI_getWarPlan((TeamTypes)iJ) == WARPLAN_DOGPILE)
    													{
    														iAtWarCounter *= 2;
    														iAtWarCounter += 5;
    													}
    													iMinAtWarCounter = std::min(iAtWarCounter, iMinAtWarCounter);
    Should I place a random generator somewhere between those lines? Something like

    GC.getGameINLINE().getSorenRandNum(100, "OOS TESTING"); would that be enough? It should appear in the random log, correct?

    Edit: so the code should look like

    Code:
         int iDeclareWarTradeRand = GC.getLeaderHeadInfo(getPersonalityType()).getDeclareWarTradeRand();
    		int iMinAtWarCounter = MAX_INT;
    		for (iJ = 0; iJ < MAX_CIV_TEAMS; iJ++)
    		{
    			if (GET_TEAM((TeamTypes)iJ).isAlive())
    			{
    				if (atWar(((TeamTypes)iJ), getTeam()))
    				{
    				  int iAtWarCounter = GET_TEAM(getTeam()).AI_getAtWarCounter((TeamTypes)iJ);
    			          [COLOR="Red"]GC.getGameINLINE().getSorenRandNum(100, "OOS TESTING");[/COLOR]
    				  if (GET_TEAM(getTeam()).AI_getWarPlan((TeamTypes)iJ) == WARPLAN_DOGPILE)
    				  {
    				  	  iAtWarCounter *= 2;
    					  iAtWarCounter += 5;
    				  }
    				  iMinAtWarCounter = std::min(iAtWarCounter, iMinAtWarCounter);
    				}
    			}
    		}
    and if AI_getAtWarCounter is causing troubles, OOS TESTING should appear in the random logs with different values. Have I understood your idea correctly?
     
  4. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    Exactly. To pinpoint where you suspect you can do exactly that. But lets say its even more vague where the problem may lie and say you suspect its 'somewhere in here' like a large section of code, but not sure where.

    You can use the random logs like brackets in this case and then squeeze the bracketing down little by little until you find the source. At first you could lay out the random checks like a motion detector grid, segmenting off different sections - such as if you have a lot of functions you're trying to find the problem among, name the function in the random name.

    The problem is if the code isn't all that repeatable it then takes a lot of testing to go with it.


    I updated my wife and I's dll to the boat movement OOS fix and we didn't seem to have any issues with that - we'd been getting it fairly commonly before but then again she did delete HER ships which appeared to cause them more than mine so we'll see if it's really gone or not as the game progresses. We did have one OOS I'm going to need to look into the logs on though.
     
  5. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    That isHuman check does not change if caching is used or not.
    Depending on what causes the OOS it can occur without any human interaction. But the most common causes of OOS have bad async code from UI interaction which means the human player is involved.
    It might be unsafe. That depends on if that information is used somewhere in the UI and therefore called from async context. For safety you might want to deactivate that cache for network multiplayer. If the OOS still occurs, you can reactivate it (the price of deactivating it is just a bit of performance).

    Be careful. If that piece of code is run from async context, this will cause an OOS. You might be better off making a separate log with that information and compare it similar to the random log.
    On the other hand it will clearly show you that something is called from async context.
    And for pieces of code that already have random calls, it should be a fine way to find the problem. Best put it in #ifdef though so you can easily remove it later.
     
  6. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    What can cause one system to process a bit of code differently than the other without there being an element of individual interaction that differs on one system than the other? Can you give an example of how processing might take place in such a case?


    In testing use of the method last night I began to wonder about this.

    That is a mysterious magic indeed. Where would I begin learning how to generate a log?

    Right, so still quite usefull... I was theorizing to myself here that if you find these CAUSE an OOS, then procedurally the appropriate reaction may be to setup a bAsync bool in the parameters then determine (with the same method) which function call (or calls) is/are coming in from an async context and have them call it with bAsync true. Then isolate the random sensors (OOS feelers) out if bAsync and look for any other randoms inside the function to conditionally isolate out with bAsync and call with the async random instead at that point.

    If a bAsync bool is already established then I'll need to keep in mind to setup the OOS feelers just as I mentioned but with an understanding that the bAsync bool is already in place here. They would then be capable of determining if further calls to the function that SHOULD be considered bAsync but are not being declared as such when called are taking place. If so, then use feelers around each call to the function to determine which function is causing the issue.

    Of course, you're right that taking great care with the whole process not to misread results is going to be critical.

    That may not be a bad idea at all. I'm working with a separate code source for my multiplayer file and was thinking I'd just extrude fixes over to the main once found but a global variable to be used as the #define for the #ifdef calls would be most helpful as I should then not be finding it necessary to remove all the OOS feelers later.

    Just to make sure on something though, if you have a #ifdef call and the macro is not defined then it defaults ALWAYS to 0 (false) right?
     
  7. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    I've been working on OOS errors all day... they've put a bee in my bonnet here.

    @45°38'N-13°47'E: Apparently the OOS from boat movements wasn't fixed by the adjustment you made there. However, I THINK I found the problem.

    This line:
    if (!pSelectionGroup->AI_isControlled() || pSelectionGroup->getHeadOwner() == GC.getGameINLINE().getActivePlayer())

    in CvGameCoreUtils in int pathDestValid(int iToX, int iToY, const void* pointer, FAStar* finder)

    I think was causing trouble with its getHeadOwner == get Active Player bit. Replacing it with:

    if (!pSelectionGroup->AI_isControlled() || GET_PLAYER(pSelectionGroup->getHeadOwner()).isHuman())

    seems to have cleared some things up but I can't be 100% sure yet.


    @AIAndy:
    One thing I'm noticing in the random logs is that the Outcome Roll seems to be frequently coming up out of order (and not in a safe way). I've been trying to track down how it does this but I wondered if you could make something of it or had an idea as to where to look into the problem. I get on the left:
    Code:
    27055	497	Global	AI Best Unit	50	25
    27056	497	Global	Outcome roll	132	82
    27057	497	Global	Animal Attack	100	77
    27058	497	Global	Animal Attack	100	52
    27059	497	Global	Animal Attack	100	18
    27060	497	Global	Animal Attack	100	8
    27061	497	Global	Animal Attack	100	61
    27062	497	Global	Animal Attack	100	78
    27063	497	Global	Animal Attack	100	87
    27064	497	Global	Animal Attack	100	64
    27065	497	Global	Animal Attack	100	78
    27066	497	Global	Animal Attack	100	58
    And on the Right:
    Code:
    27055	497	Global	AI Best Unit	50	25
    27056	497	Global	AI Best Unit	50	25
    27057	497	Global	AI Best Unit	50	43
    27058	497	Global	AI Best Unit	50	33
    27059	497	Global	AI Best Unit	50	45
    27060	497	Global	AI Best Unit	50	3
    27061	497	Global	AI Best Unit	50	17
    27062	497	Global	AI Best Unit	50	26
    27063	497	Global	AI Best Unit	50	2
    27064	497	Global	Outcome roll	132	19
    27065	497	Global	AI Unit Birthmark	10000	5968
    27066	497	Global	Animal Attack	100	58
    At that last animal attack it went back together for a little bit but those in between the first and last of both segments there were getting entirely different results. I'm thinking that even if it has something to do with multi-threading, it isn't safely doing so. And sadly, multi-threading is something I'd be really lost looking into problems among. We even have our multi-core city usage set to 1 core apiece so from what I understand our dlls shouldn't be multi-threading at all right?

    Anyhow... what do you make of this?

    Also... is there any danger in setting a function parameter of bAsync to true in any cases where the function may NOT be async?
     
  8. 45°38'N-13°47'E

    45°38'N-13°47'E Chieftain

    Joined:
    Jun 7, 2008
    Messages:
    5,834
    Location:
    Just wonder...
    TB, I haven't had other movement OOS after the previous change but maybe we're too early in the game, although we've just started Renaissance. I'll let you know and if the game goes OOS, I'll try applying your fix. Thank you.
     
  9. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    For instance using an uninitialized variable can cause an OOS.

    The easy way is to use gDLL->logMsg

    Yes, #ifdef stands for "if defined".
    So #ifdef X is equivalent to #if defined X

    Those pieces of log don't seem to fit to each other. Are you sure they are from the same place of execution? Mind: The line numbers don't count.

    Yes, because using the async RNG in sync context will cause an OOS.
     
  10. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    Interesting... I'll keep an eye out for that possibility now too.


    Is this so that we can keep the randoms from causing an OOS?


    Ok thanks for confirming.


    I'm uploading the logs for you to look at. From what I can tell the line numbers - or rather the sequence in which the randoms were derived - do count tremendously as if they fall out of sequence at all they'll be coming up with different results for those checks. Otherwise, how would I know the difference in how to answer your question about what the same place of execution would be?


    Damn... can't win for losing here huh? There's no way to simply say in code - if running async - so when a function call sometimes comes in async and other times does not there's no easy way to determine this is always or never or sometimes true. How VERY frustrating! Particularly when it comes to python calls into a function - some may be async others may not so what, we have to go into the python itself to define the call as async or not? Sheesh this nightmare of complexity grows worse by the day. Isn't there SOME way we can generate an async detection mechanism that doesn't cause an OOS that we can use to define whether a call should be made indicating an async context without having to delve into the headache inducing logic puzzle of how the game goes async at a particular point? Or make a random that's safe for both? Like one computer rolls the random then just reports it over to the other in a message and tells the other to pick up from there?

    My forehead is getting bloodied pretty good by this brick wall I keep smashing it into.
     
  11. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    A log entry that is different won't cause an OOS but you can compare it to see what was different on both computers.

    Rely on a comparison with TortoiseMerge or a similar program.

    That would be deadly for performance and you would not gain anything.
    Unfortunately only the entry points mark async and synced code and that information is not always stored (the bAsync bool that is passed through function calls is the exception).
     
  12. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    Hmm... since it won't travel in sequence with the other randoms it might be tricky to read and glean info from but I may be able to make something like that work. I suppose using your method very liberally would give me a good roadmap without so many false results I was getting from the fact that the OOS errors would be caused BY the randoms I was generating.

    Still... wondering exactly how this would work...

    Looking into other uses in the code of gDLL->logMsg I find it only really comes up in the gametext manager such as:
    Code:
    			szDebugBuffer.Format("Religion %s, TGA index %i.", pInfo.getType(), pInfo.getTGAIndex());
    				gDLL->logMsg("CvGameTextMgr_buildCityBillboardString.log", szDebugBuffer.c_str());
    So the way this would work in this example would be that it would add a message line to a log file (will it auto generate the log file if it names one that hasn't been declared or created anywhere else?) named CvGameTextMgr_buildCityBillboardString.log, right? Where does it assume to place the log in the files? Is it where you'd expect it to end up?

    Then... we put something like (declaring szDebugBuffer first):
    szDebugBuffer.Format("Code Document, Function Name, ID#, (then use the random generator here) %i);
    gDLL->logMsg("OOS Diagnostics.log", szDebugBuffer.c_str());

    ?

    When we use the random generator inside this formatting line would we not cause an OOS just the same?

    EDIT: Further investigation found different methods than szDebugBuffer.Format to set this up and if I'm not mistaken, that really only works inside the context of the gametextmanager for some reason - I presume some preformatting takes place there to set that method up that I've not been able to identify. However, this was interesting:
    Code:
    				sprintf(szOut, "Line\tTurn\tStream\tReason\tMax\tResult\n");
    				gDLL->logMsg(szFile,szOut, false, false);
    
    Does sprintf work from anywhere in the solution? There are some other methods as well to build the strings and I'm a little concerned I don't know enough about string generation mechanisms to do this correctly.
    That's what I'm doing so I'm wondering what would be the difference of approach (I use BeyondCompare.) The line numbers show up side by side and show the differences in random result values and what was called differently and such... I'm not eyeballing the line numbers to tell the difference... that'd be madness! :)

    EDIT: I realized just now that you might be trying to say - take a look at the actual results as they may be coming up in the line number sequence incorrectly but may still have the same result on the same check. Sadly, that isn't the case with these logs, no. The results on those outcome checks differ even though you CAN find where it took place later on one machine then it did on the next.


    By entry points you mean the points where the game does actually go async and back to sync right?

    Again... it's very interesting that in the dll code there are no places where bAsync is ever defined as true. Not that I have found anyhow. So it's like we've generated a method for resolving a known problem but have never done anything to indicate 'here it is'.
     
  13. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    You can do something as easy as
    gDLL->logMsg("MyPrecious.log", "TestLog at Mount Doom", false, false);
    or generate any string you desire for the second parameter. If you don't know how to build strings, read up on it. This only needs to be some unique string so you know where it is in the code and then maybe the value of a variable or two that you want to make sure are the same on both computers.

    The random log pieces you posted don't fit to each other at all. It is likely from some part where it is already out of sync or it might be a bad merge.



    The entry points are where the program flow enters the DLL from the exe. The major synced entry points are CvGame::update and the messages. Async entry points are for instance the widget handling stuff.
     
  14. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    And for the value of the variable that differs... we can use a normal random call right? If that's going to take us out of sync then obviously I'm missing the point and it won't work. But if that's how we can get 'differing values' and it doesn't cause an OOS to do it then we've got a good method here. If it does cause an OOS to generate a random for that then how else would we do this reliably? It'd be some interesting guesswork to set these up with local variables in there that could differ ... they wouldn't always do so right?


    I had uploaded the dang log reports... I'll do so again for this post. Please take a look at them and tell me what you think. I'm seeing a lot of this sort of thing. You could be right that we're already out of sync by then and I'm just not inferring from the log correctly somehow.




    Ah... so all those functions that have a bAsync established for them... are those able to take a meaningful definition for bAsync because the exe is referring to them and at times it's the exe itself calling to this function stating that bAsync is true?

    If I'm running the game in single player mode and analyzing it with VS attached to the process, would I be able to somehow get the game to stop directly on a call to the bAsync variable whenever the variable is set to True so I can see the call stack from there and get an idea of what happened leading up to this point? Or will bAsync ever be set to true when running in single player mode thus muting the purpose of this question in the first place?

    Is there a way to know that a particular function is being used as an entry point by the exe? (I think I know the answer to this question but I'm intrigued to see how the 'official' answer goes on this.)


    BTW: THANK YOU for spending this time trying to get the information in your head over to mine. I know this is never easy for you to work with me and I think sometimes I must have you feeling like I'm a hopelessly lost cause but your explanations do help a lot even if I do get shaky understanding from them at times. They usually do eventually make sense.
     

    Attached Files:

  15. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    Random call? Why would you? You can just display the variable content without anything like a random call.
    The point here is to find out when the synced state starts to differ and find that out as early as possible and best also how it differs.

    The OOS starts at line 27033 and it is quite weird. Despite both computers doing AI Best Unit rolls with 50 as max, they get different results.

    I don't think the exe calls any functions passing a bAsync value (although it might at one point). But the entry point functions will pass true or false for bAsync on some calls or other functions that are only called from synced entry point functions might.

    Sure you can. There is no difference between single player and multi player there.

    Only functions that are exported from the DLL may be entry points, but not all of them are used from the exe. You can also check what functions the exe imports with a tool like dumpbin.
    In addition to that, the exe can call into Python which in turn calls into the DLL. So any functions exposed to Python can also be entry points.
     
  16. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    Ah... ok now that I've got a confirmation that you're not saying to use the random calls to detect async environments in this way then I'm beginning to follow what you're getting at. It will take more time and thought to set these up but shouldn't be tooooo bad. And probably will indeed show a lot of useful information that would really enable fixes to be much easier. Ok... looking forward to setting up and trying this method then. Thanks for the advice and working me through it thus far.


    Is this what we can expect to see if the async random is used when the gamestate is in sync? About that too... if the answer here is no then what WOULD I be looking for to know that this is what has caused an OOS?


    On this subject it might be easier for me to look at the original Vanilla code and see if bAsync was ever used as a variable there. I'm still mystified by how these bAsync references NEVER have anywhere in the dll where bAsync would ever be set to true (except for a line of code I put in a while back that you said wouldn't be the correct way to determine that but also wasn't apparently anything necessary to reverse either.) So when/how would it ever be set to true? If the exe will never set it to true and it's never referenced in any of the python exposures and the dll never sets it to true then how is it working for us at all?

    Ok, so maybe some of them do send a result of true through the bAsync parameter when the function is called but it seems like every function I find all references for shows no way this would take place. So if this were to happen if the function was called from the exe then it would make more sense to me but did we set up the bAsync parameter there or was that something pre-existing our efforts?


    Cool... that's good to know. Now I just gotta go figure out how to get a stop point to only stop when the variable shows up as a specified result.


    So... functions that are exported from the DLL are the ones that are declared with DllExport right? dumpbin? Never heard of it - is it useful for more than this sort of thing? I may have to check that out.

    Wow... ok. I was suspecting some python calls could be introducing async states into the dll processing. How do you best find the calls from python into dll functions? I presume grepwin would be helpful here?
     
  17. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    No, it is quite a weird one. Something is wrong with the seeds there because using the RNG X times is supposed to give you the same sequence. Most likely this is a different random stream but then why is it named Globals instead of after a city.

    Have you even followed any of them backwards until the root?

    There are conditional breakpoints.

    It is a tool to display information about Windows executables (including DLLs).

    Any that are exposed to Python from the DLL can be called from there. That are a LOT.
     
  18. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    At first upon reading this I started thinking maybe there's a config ini problem here that differs on both machines but then I figure that would probably make a problem emerge much faster. Very curious problem indeed. Won't rule out that it's not something I incidentally caused somewhere though.


    I believe I have and have found a few that don't ever find a way to have derived a true for bAsync but I'll look again. My ability to confuse myself here is epic.

    To give myself more room to do so I'm going to double post here. In the next post I'm going to walk you through my analysis on a problem I've found and it relates to the above error perhaps. I'm sure you'll see something I don't.

    If that's the correct term for it that's at least helpful - I can probably find some help file on that somewhere.


    Again... interesting.


    Well... I meant if you have identified one python exposed function, how would you best scour the python files for the call to that function? I'm sure there's probably going to be found quite a few but what I'm asking for is an equivalent that will search through the python much the same way as you can 'Find All References' in VS through the dll side of the project. Is there a method or tool or mechanism I may not be aware of that would help in this effort? Do I need to consider it to be called slightly differently than the name given in the dll if I just use wingrep to search through all the py files?
     
  19. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    25,880
    Gender:
    Male
    Location:
    Las Vegas
    Ok, so I evaluated a log where according to all I understand about the random logs, the OOS took place here:
    iValue *= (getCitySorenRandNum(50, "AI Best Unit") + 100);

    inside the function:
    UnitTypes CvCityAI::AI_bestUnitAI(UnitAITypes eUnitAI, int& iBestValue, bool bAsync, bool bNoRand, CvUnitSelectionCriteria* criteria)

    I'm thinking I'm on to something here considering that this line has already been identified as a potential problem since apparently this function can be called in an async gamestate otherwise we wouldn't have gone to the trouble of the rest of what it's encompassed by:
    Code:
    								if ( !bNoRand )
    								{
    									if (bAsync)
    									{
    										iValue *= (GC.getASyncRand().get(50, "AI Best Unit ASYNC") + 100);
    										iValue /= 100;
    									}
    									else
    									{
    										iValue *= (getCitySorenRandNum(50, "AI Best Unit") + 100);
    										iValue /= 100;
    									}
    								}
    Cool.

    Ok, so, nowhere above this point in the function is bAsync ever defined to be true and it's coming in from the function parameters. A Find All References search on this function shows that it's exposed to python but the bAsync parameter is not. Ok... possible issue there. It also shows that ALL but one dll call to this function are passing false for the bAsync parameter.

    So I start with an assumption (or a prayer rather) that it's not a problem coming in from an undeclared async gamestate from a python use of this function. To figure out which call to this function shows me that there was an async gamestate upon making this call I used the random check method where you explained to me that if a random is called during an async state it'd certainly go OOS there and would let you thus know, when looking at the random logs, that an async gamestate had taken place at that location. I put lines like these:
    Code:
    //#ifdef OOS_DEBUG
    //				int	iDiag = GC.getGameINLINE().getSorenRandNum(100, "OOS DIAG CvCityAI 3 Async?");
    //#endif
    before each of the calls made to this function inside the dll.

    THIS one was interesting because it DID perform exactly as I hoped and was the one to trigger that there was an async state here at this location, which was just before the line:
    eUnit = AI_bestUnitAI(((UnitAITypes)iI), iUnitValue, bAsync, bNoRand, criteria);

    located in:
    UnitTypes CvCityAI::AI_bestUnit(int& iBestUnitValue, int iNumSelectableTypes, UnitAITypes* pSelectableTypes, bool bAsync, UnitAITypes* peBestUnitAI, bool bNoRand, bool bNoWeighting, CvUnitSelectionCriteria* criteria)

    huh... wouldn't you figure it was the ONE and only reference to CvCityAI::AI_bestUnitAI that passed anything but a 'false' for bAsync! Ok, I'm feeling 'on to something' here. So I search upwards from this call to AI_bestUnitAI through AI_bestUnit (the naming sucks btw - really doesn't help to make the function purpose very clear) and I find nowhere where bAsync has been set to anything but what it came into this function as (this function as shown does include a bAsync parameter itself.)

    Soooo... now I search all references to THIS function, AI_bestUnit.

    Again I find Python reporting soooo... ugh. Another potential entry point of mystery that ALSO is not given the bAsync parameter to consider there either.

    However there is only one other call to this function in the dll so that's a bit simpler. Aaaand that's right here in this line:
    eBestUnit = AI_bestUnit(iDummyValue, -1, NULL, false, &eUnitAI, false, false, criteria);

    Which is located in the function:
    bool CvCityAI::AI_chooseUnitImmediate(const char* reason, UnitAITypes eUnitAI, CvUnitSelectionCriteria* criteria)

    As you can see... the dll has solidly considered the original position I identified as NEVER being able to set bAsync to true throughout this entire processing path and Python was NEVER setup to pass true and NONE of these functions are declared with DllExport so from what you were saying earlier, there's no way the exe is coming into any of this with a declaration that bAsync would be true.

    YET we have an async state taking place at the original error and one step up. So what I DID before you told me this could also be a problem, was to change the line in the last function:
    eBestUnit = AI_bestUnit(iDummyValue, -1, NULL, false, &eUnitAI, false, false, criteria);
    to pass true for bAsync rather than false.

    The result was the next OOS and random logs - those were the logs I shared with you. Now... that either means that my fix worked and things are always async at CvCityAI::AI_chooseUnitImmediate and thus I correctly repaired the problem and we're seeing a new one, or we're seeing a new one that has nothing to do with this one and it triggered before this one came up again (the distance we made random checks without error in the random log I posted here compared to the one that started this search suggest otherwise as the one that started this search hadn't made it far into the random log before triggering the async state - oh and I should mention that I commented out the 'feeler check' after this attempted fix), or the problem you are struggling to explain about those posted OOS logs stems from this 'fix' somehow.

    I'm sure I've gone about this in a vastly different way than you might expect me to and I'm hoping you might be able to see something in all of this that I don't. I'm also praying there's evidence in all of this to suggest that some python somewhere ISN'T the problem!
     
  20. AIAndy

    AIAndy Chieftain

    Joined:
    Jun 8, 2011
    Messages:
    3,403
    It is a weird OOS as in not something that can be caused by the traditional sync problems like different config inis or a false bAsync.

    Wingrep is fine for that.

    Seriously, random changes will only make things worse. At least think a bit about what the function you changed that in is. As the AI in AI_chooseUnitImmediate tells you, this is an AI functions and the AI states are synced. Some AI functions are also used from the UI but not this one which is why it has no bAsync itself.

    If you want an example of an async call to AI_bestUnit check CvDLLButtonPopup::launchProductionPopup which uses it for the recommendations what unit to build.
     

Share This Page