Can't find cause of the crash

Maniac (Apolyton Sage; joined Nov 27, 2004; 5,588 messages; Gent, Belgium)
Been a while since I posted a thread here... It's also been a long time since I've had a bug in Planetfall, but alas... good things don't last forever. :(

I'm getting a crash after clicking End Turn, and I can't find its cause. Perhaps someone with more C++ experience can see the light.
The situation:

1) When I compile an assert DLL, the crash doesn't happen anymore. :confused:

2) When I remove random cities or civilizations, the crash doesn't happen anymore. I can't find a particular city or faction causing the problem, though. If I remove factions A, B & C, the crash stops. If I remove factions D, E & F, and leave A/B/C intact, the crash doesn't happen either. There's no pattern I can discover.

3) Then I started removing things I had added to the DLL in the latest patch, and found this to be the culprit:

Code:
int CvPlayerAI::AI_civicValue(CivicTypes eCivic)
{
...
	for (iI = 0; iI < NUM_YIELD_TYPES; iI++)
	{
		iTempValue = 0;
...
		for (iJ = 0; iJ < GC.getNumFeatureInfos(); iJ++)
		{
			iTempValue += (AI_averageYieldMultiplier((YieldTypes)iI) * (kCivic.getFeatureYieldChanges(iJ, iI) * (countMinCityPopOrFeatures((FeatureTypes)iJ)))) / 100;
		}

In the latest patch I had removed '+ getNumCities() * 2' from this piece of code. It originally looked like this:

Code:
iTempValue += (AI_averageYieldMultiplier((YieldTypes)iI) * (kCivic.getFeatureYieldChanges(iJ, iI) * (countMinCityPopOrFeatures((FeatureTypes)iJ) + getNumCities() * 2))) / 100;

With that line restored, there is no crash.

Just like removing random cities, changing that line of code doesn't reveal the cause of the crash, though. It might be something seemingly unrelated, so I continued experimenting.

4) Perhaps the problem was with the countMinCityPopOrFeatures function. That's something I created myself, and I rewrote it in the latest patch too. If I change (countMinCityPopOrFeatures((FeatureTypes)iJ) + getNumCities() * 2) to (getNumCities() * 2), the crash goes away too. Given point 5), I don't think this function is the culprit, though. For the record, here are the new and old versions of this function in spoilers:

Spoiler :
New:
Code:
int CvPlayer::countMinCityPopOrFeatures(FeatureTypes eFeature) const
{
	PROFILE_FUNC();

	CvCity* pLoopCity;
	CvPlot* pLoopPlot;
	int iFeatureCount;
	int iUsefulFeatureCount = 0;
	int iLoop;
	int iI;
	int iPlotsNeedingImprovement = 0;

	for (pLoopCity = firstCity(&iLoop); pLoopCity != NULL; pLoopCity = nextCity(&iLoop))
	{
		iFeatureCount = 0;
		iPlotsNeedingImprovement = pLoopCity->goodHealth() - pLoopCity->badHealth(/*bNoAngry*/ false, 0);
		for (iI = 0; iI < NUM_CITY_PLOTS; iI++)
		{
			if (iI != CITY_HOME_PLOT)
			{
				pLoopPlot = plotCity(pLoopCity->getX_INLINE(), pLoopCity->getY_INLINE(), iI);

				if ((pLoopPlot != NULL) && (pLoopPlot->getWorkingCity() == pLoopCity))
				{
					if (pLoopPlot->isBeingWorked())
					{
						if (pLoopPlot->getFeatureType() == NO_FEATURE && pLoopPlot->getImprovementType() == NO_IMPROVEMENT)
						{
							iPlotsNeedingImprovement++;
						}
					}
				}
			}
		}
		for (iI = 0; iI < NUM_CITY_PLOTS; iI++)
		{
			if (iI != CITY_HOME_PLOT)
			{
				pLoopPlot = plotCity(pLoopCity->getX_INLINE(), pLoopCity->getY_INLINE(), iI);

				if ((pLoopPlot != NULL) && (pLoopPlot->getWorkingCity() == pLoopCity))
				{
					if (pLoopPlot->getFeatureType() == eFeature)
					{
						if (pLoopPlot->isBeingWorked())
						{
							iUsefulFeatureCount++;
						}
						else if (pLoopPlot->getBonusType(getTeam()) != NO_BONUS)
						{
							iUsefulFeatureCount += 2;
						}
						else if (iFeatureCount < iPlotsNeedingImprovement)
						{
							iFeatureCount++;
						}
					}
				}
			}
		}
		iUsefulFeatureCount += iFeatureCount;
	}

	return iUsefulFeatureCount;
}
Old:
Code:
int CvPlayer::countMinCityPopOrFeatures(FeatureTypes eFeature) const
{
	PROFILE_FUNC();

	CvCity* pLoopCity;
	CvPlot* pLoopPlot;
	int iCount;
	int iCityCount;
	int iLoop;
	int iI;

	iCount = 0;

	for (pLoopCity = firstCity(&iLoop); pLoopCity != NULL; pLoopCity = nextCity(&iLoop))
	{
		iCityCount = 0;
		for (iI = 0; iI < NUM_CITY_PLOTS; iI++)
		{
			if (iI != CITY_HOME_PLOT)
			{
				pLoopPlot = plotCity(pLoopCity->getX_INLINE(), pLoopCity->getY_INLINE(), iI);

				if ((pLoopPlot != NULL) && (pLoopPlot->getWorkingCity() == pLoopCity))
				{
					if (pLoopPlot->getFeatureType() == eFeature)
					{
						if (pLoopPlot->isBeingWorked())
						{
							iCount++;
							iCityCount++;
						}
						else if (pLoopPlot->getBonusType(getTeam()) != NO_BONUS)
						{
							iCount += 2;
						}
						else if (iCityCount <= pLoopCity->getPopulation())
						{
							iCount++;
							iCityCount++;
						}
					}
				}
			}
		}
	}

	return iCount;
}

5) My next step was to remove the feature loop (that is: for (iJ = 0; iJ < GC.getNumFeatureInfos(); iJ++), etcetera) altogether. Then the crash DOES still occur.

I can't see any pattern:

Remove the feature loop (neither countMinCityPopOrFeatures nor getNumCities): crash
Only keep countMinCityPopOrFeatures: crash
Only keep getNumCities: no crash
Keep both countMinCityPopOrFeatures and getNumCities: no crash

I mean, WTF!?!

6) I figured perhaps the problem was not caused by those specific values, but by some AI switching civics. Messing around with the value of a civic might trigger or prevent an AI decision to switch civics. So I changed the code to make an AI's current civic ten times as attractive. It made no difference: the crash still occurs.

7) Sometimes a crash requires a certain random number to be rolled, and making any change (like removing a city) changes the random number rolled, and thus prevents the crash. However, there is no random element involved in the AI deciding which civic to pick, so I don't really see how the civic value calculations can influence random crashes.
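One way the civic calculation could still matter, even though it never rolls dice itself: if a different valuation makes any AI choose differently, later code paths change, and with them the number of rolls drawn from the shared generator. Every roll after that point then shifts. A minimal sketch, assuming a hypothetical LCG-based SyncRand class (not Civ4's actual generator) and a hypothetical simulateTurn helper:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical linear congruential generator standing in for the game's
// synchronized random number generator (NOT the actual Civ4 implementation).
class SyncRand {
public:
    explicit SyncRand(std::uint32_t seed) : state_(seed) {}

    // Returns a value in [0, range); every call advances the shared state.
    int get(int range) {
        state_ = state_ * 1664525u + 1013904223u;  // classic LCG constants
        return static_cast<int>((state_ >> 16) % static_cast<std::uint32_t>(range));
    }

private:
    std::uint32_t state_;
};

// Simulates `steps` game events that each roll once. If extraRollAt >= 0, one
// additional roll is consumed just before that event, e.g. a code path that
// only runs because some AI chose a different civic.
std::vector<int> simulateTurn(std::uint32_t seed, int steps, int extraRollAt) {
    SyncRand rng(seed);
    std::vector<int> rolls;
    for (int i = 0; i < steps; ++i) {
        if (i == extraRollAt) {
            rng.get(100);  // the one extra roll
        }
        rolls.push_back(rng.get(100));
    }
    return rolls;
}
```

With the same seed, the results are identical run after run, but a single extra roll at event 3 makes every later event see the value its successor would have seen. A crash that depends on one particular roll can therefore appear or vanish after any change that alters how many rolls are made, without the changed code using randomness at all.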

I guess few will have read this far. :( Does anyone see some pattern in this mess?
 
When it seems that there's no connection, maybe there really is no connection.
What I want to say is:
Maybe it's not your code.
Last week I worked on a software project and found that my code caused crashes at a point with no apparent connection to it. It produced a memory allocation failure for no visible reason. After three days of searching, I concluded that the project had reached a memory limit at that point, maybe because of too many internal variables or an OS limitation, and that I had to put my code somewhere else.

But this is just guessing: my C++ is close to zero, and I haven't really worked with the Civ DLL yet. Still, it could be the problem.
 
All your iterations of removing code make me suspect that some values of iTempValue cause the code that follows to crash. It's really unfortunate that the debug DLL removes the crash, otherwise you could step through the code with the debugger to find exactly where the crash is. :(

Can you post the code that uses iTempValue? Maybe we can spot something there.

Given that adding a positive value averts the crash, I was thinking that perhaps the value is sometimes negative, but both your old and new versions of countMinCityPopOrFeatures() can return only non-negative values.

When you say you compile the DLL with asserts turned on, is this a debug version with other settings changed, or is the only difference that assertions are active? Assertions by themselves shouldn't cause program changes (unless you ignore an assertion failure). I'm wondering if other things are changed, like memory allocation, checks for division by zero, etc.
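One common exception to "assertions don't change the program" is an asserted expression with a side effect: it only runs in the assert build. Enabling asserts also changes code size and memory layout, which on its own can move or hide a memory overwrite. Whether any FAssert in Planetfall has side effects is unknown here; the sketch below uses two hypothetical macros, ASSERT_ON (assert build) and ASSERT_OFF (release build), to show the pattern:

```cpp
#include <cassert>

// Counts assertion failures instead of showing a dialog (illustration only).
static int g_assertFailures = 0;
static void reportAssertFailure(const char* /*expr*/) { ++g_assertFailures; }

// Hypothetical FAssert-style macros: ASSERT_ON evaluates the condition,
// ASSERT_OFF compiles it out entirely, as a release build would.
#define ASSERT_ON(expr)  ((expr) ? (void)0 : reportAssertFailure(#expr))
#define ASSERT_OFF(expr) ((void)0)

// BUG PATTERN: the increment lives inside the assertion.
int nextIdAssertBuild(int& counter) {
    int id = counter;
    ASSERT_ON(++counter > 0);   // counter IS incremented
    return id;
}

int nextIdReleaseBuild(int& counter) {
    int id = counter;
    ASSERT_OFF(++counter > 0);  // compiled out: counter is NOT incremented
    return id;
}

// SAFE PATTERN: do the work unconditionally, assert only on the result.
int nextIdSafe(int& counter) {
    int id = counter;
    ++counter;
    ASSERT_ON(counter > 0);
    return id;
}
```

The two "builds" of the same function leave counter at different values, which is exactly the kind of divergence that can make a crash disappear in one build only.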
 
memory allocation failure

I don't know much about memory allocation failure. Aren't those usually caused by huge maps, large numbers of players, huge mod sizes, etc? Neither is the case with the crashing save I have been given. While the game was 700 turns in progress (Marathon), the map was of Standard size. Save file size is 377kb. The memory used by Civ4 while having the game open is under 500.000k. And Planetfall, while having lots of changes to the DLL, still has far fewer than many other projects out there, such as FfH.

On the other hand, I have been given another crashing save today, and I can't even open that one! It gives me a runtime error. Can't remember ever having that before with Planetfall.

What does a runtime error mean? Is that an indication of MAF?

The changes in the latest patch weren't all that big, mostly some lines changed here and there. Here are the somewhat bigger changes, or those that required a header file change:

Spoiler :
I added a new tag for techs, bPholusMutagen. For that, I made changes to CvInfos, CvTeam, CvGameTextMgr, CvDLLWidgetData and CvEnums.

In CvEnums.h I added WIDGET_HELP_PHOLUS_MUTAGEN & UNITAI_IMMOBILE

I added a new unitai type for immobile units. I've already added a new unitai without problems in the past though.

I changed score and victory calculations so that water plots count too. Also, domination victory now requires a percentage of all owned plots instead of land plots.

I changed a unit attribute I had added before from a boolean to an integer: m_bHasPillaged -> m_iHasPillagedCount, and of course reworked the related code. This is the only change in the patch to what information is put in the save file.


It's really unfortunate that the debug DLL removes the crash, otherwise you could step through the code with the debugger to find exactly where the crash is. :(

Stepping through the code? :confused: That's beyond my ability I'm afraid. :( To answer the question later in your post, my assert DLL is exactly the same as a normal DLL, except for, what is it, three lines of code in FAssert.h to enable the asserts.

Assertions by themselves shouldn't cause program changes (unless you ignore an assertion failure).

What else can you do besides ignoring them when you get an assert? I just tried. 'Abort' shuts down the program. 'Debug' crashes it.

Can you post the code that uses iTempValue? Maybe we can spot something there.

It's just Firaxis code for determining the value of a civic. The iTempValue is just added to the general iValue at the end. My own changes are that BadPlot line, the feature loop, and, in the latest patch, removing getNumCities() * 2 from both the feature and improvement loops.

Spoiler :
Code:
	for (iI = 0; iI < NUM_YIELD_TYPES; iI++)
	{
		iTempValue = 0;

		iTempValue += ((kCivic.getYieldModifier(iI) * getNumCities()) / 2);
		iTempValue += ((kCivic.getCapitalYieldModifier(iI) * 3) / 4);
		iTempValue += (kCivic.getBadPlotYieldChange(iI) * countMinCityPopOrBadPlots());
		CvCity* pCapital = getCapitalCity();
		if (pCapital)
		{
			iTempValue += ((kCivic.getCapitalYieldModifier(iI) * pCapital->getBaseYieldRate((YieldTypes)iI)) / 80);
		}
		iTempValue += ((kCivic.getTradeYieldModifier(iI) * getNumCities()) / 11);

		for (iJ = 0; iJ < GC.getNumImprovementInfos(); iJ++)
		{
			iTempValue += (AI_averageYieldMultiplier((YieldTypes)iI) * (kCivic.getImprovementYieldChanges(iJ, iI) * (getImprovementCount((ImprovementTypes)iJ)))) / 100;
		}

		for (iJ = 0; iJ < GC.getNumFeatureInfos(); iJ++)
		{
			iTempValue += (AI_averageYieldMultiplier((YieldTypes)iI) * (kCivic.getFeatureYieldChanges(iJ, iI) * (countMinCityPopOrFeatures((FeatureTypes)iJ)))) / 100;
		}

		if (iI == YIELD_FOOD)
		{
			iTempValue *= 3;
		}
		else if (iI == YIELD_PRODUCTION)
		{
			iTempValue *= ((AI_avoidScience()) ? 6 : 2);
		}
		else if (iI == YIELD_COMMERCE)
		{
			iTempValue *= ((AI_avoidScience()) ? 1 : 2);
		}

		iValue += iTempValue;
	}


Here are the asserts I get with the crashing save (the one I can open) by the way:

One is in CvPlayer::doGold

FAssert(isHuman() || isBarbarian() || ((getGold() + iGoldChange) >= 0) || isAnarchy());

If I understand correctly, this means the problem is
(!isHuman() && !isBarbarian() && ((getGold() + iGoldChange) < 0) && !isAnarchy())
?
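That reading is right: by De Morgan's law, negating an OR of conditions gives the AND of the negated conditions, and the negation of x >= 0 is x < 0. A small exhaustive check, with each sub-expression of the FAssert reduced to a hypothetical boolean parameter:

```cpp
#include <cassert>

// The FAssert condition from CvPlayer::doGold, with each sub-expression
// reduced to a boolean. goldStaysNonNegative stands for
// (getGold() + iGoldChange) >= 0.
bool assertHolds(bool isHuman, bool isBarbarian, bool goldStaysNonNegative, bool isAnarchy) {
    return isHuman || isBarbarian || goldStaysNonNegative || isAnarchy;
}

// The negation rewritten with De Morgan's law: the assert fires exactly
// when this returns true.
bool assertFires(bool isHuman, bool isBarbarian, bool goldStaysNonNegative, bool isAnarchy) {
    return !isHuman && !isBarbarian && !goldStaysNonNegative && !isAnarchy;
}

// Checks all 16 input combinations.
bool deMorganHolds() {
    for (int mask = 0; mask < 16; ++mask) {
        bool h = (mask & 1) != 0;
        bool b = (mask & 2) != 0;
        bool g = (mask & 4) != 0;
        bool a = (mask & 8) != 0;
        if (assertFires(h, b, g, a) != !assertHolds(h, b, g, a)) {
            return false;
        }
    }
    return true;
}
```

So the assert fires only for a non-human, non-barbarian player outside anarchy whose gold would drop below zero this turn.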

This no doubt happens because a faction was almost entirely blockaded by barbarians. I wonder if this can cause crashes, or if Firaxis just added it so they know if their AI is messing up. The faction should normally just go on strike, right?

Another is under CvPlot::changeBlockadedCount

FAssert(getBlockadedCount(eTeam) >= 0);

IIUC, this means the number actually goes negative??
I've been having this assert for quite some time actually, but I think I just now understood the cause. I changed a unit's blockade range from a fixed number to a value dependent on its movement points. But I did not let the updatePlunder function run before and after a unit got extra movement points, from a promotion for instance. So if the blockading unit got killed after being promoted, some plots would get a negative blockaded count.
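The mismatch described here can be reproduced in a toy model: if a unit adds +1 to every plot within its range, but the range grows before the matching -1 is applied, the outer plots end up at -1. A minimal sketch, assuming a hypothetical 1-D strip of plots and a hypothetical changeBlockade helper (not the actual CvPlot/CvUnit code); in Planetfall the refresh step would presumably be the updatePlunder call mentioned above:

```cpp
#include <cassert>
#include <map>

// Hypothetical model: blockade counters for a 1-D strip of plots, keyed by x.
using BlockadeMap = std::map<int, int>;

// Adds `delta` to every plot within `range` of the unit at unitX.
void changeBlockade(BlockadeMap& counts, int unitX, int range, int delta) {
    for (int x = unitX - range; x <= unitX + range; ++x) {
        counts[x] += delta;  // missing keys start at 0
    }
}

// Buggy sequence: the range grows (e.g. a promotion adds movement points)
// between adding and removing the blockade.
BlockadeMap buggySequence() {
    BlockadeMap counts;
    changeBlockade(counts, 0, 1, +1);  // unit starts blockading with range 1
    // promotion: range becomes 2, but the blockade is NOT refreshed
    changeBlockade(counts, 0, 2, -1);  // unit dies; removal uses the new range
    return counts;
}

// Fixed sequence: whenever the range changes, remove with the old range and
// re-add with the new one, so every +1 is matched by a -1 on the same plots.
BlockadeMap fixedSequence() {
    BlockadeMap counts;
    changeBlockade(counts, 0, 1, +1);
    changeBlockade(counts, 0, 1, -1);  // refresh on promotion: remove old...
    changeBlockade(counts, 0, 2, +1);  // ...and re-add with the new range
    changeBlockade(counts, 0, 2, -1);  // unit dies
    return counts;
}
```

In the buggy sequence the plots at distance 2 end at -1, the same symptom as the FAssert in CvPlot::changeBlockadedCount; in the fixed sequence every plot returns to 0.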

I made a mistake in some barbarian AI changes in the latest patch, which caused them to blockade much more frequently. Could all the resulting negative numbers cause inexplicable crashes and even a runtime error (whatever that exactly is)?? On the other hand, this problem already existed long ago, just less frequently.
 
Have you tried deleting the entire Final Release directory and then recompiling?
 
I don't know much about memory allocation failure. Aren't those usually caused by huge maps, large numbers of players, huge mod sizes, etc? Neither is the case with the crashing save I have been given. While the game was 700 turns in progress (Marathon), the map was of Standard size. Save file size is 377kb. The memory used by Civ4 while having the game open is under 500.000k. And Planetfall, while having lots of changes to the DLL, still has far fewer than many other projects out there, such as FfH.

A MAF can happen when the program needs too much memory, like you say.
But that doesn't mean it requires big maps.
It could also happen if you have too many variables in a function (though the number would have to be really, really large), and it can happen by accident if the program isn't cleanly coded.

You've added values to the function, am I right?
If so, create a new function that does the needed operations, and use only its result in your function.

But I don't really think this is the problem, so I wouldn't invest too much time in it.

On the other hand, I have been given another crashing save today, and I can't even open that one! It gives me a runtime error. Can't remember ever having that before with Planetfall.

Is this not one of the standard errors?


Stepping through the code? :confused: That's beyond my ability I'm afraid. :( To answer the question later in your post, my assert DLL is exactly the same as a normal DLL, except for, what is it, three lines of code in FAssert.h to enable the asserts.

"Stepping through the code" just means that you follow the program in the debugger one step at a time, look at what works and what doesn't, change things slightly and see what happens, until you find the error (or until you go mad).
 
I've been programming for 30+ years, and from my experience this sounds like a "memory clobber": an array or malloc overrun, or a bad pointer, hits some memory location but does not immediately cause a crash. The crash comes later. The symptoms often change, coming and going with code changes.

I am not familiar with Microsoft development (I have worked almost exclusively on Unix/Linux), but look into malloc-related debugging tools.
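The core trick behind such tools is guard zones: pad each buffer with a known value and check later whether the padding still holds it. The MSVC debug CRT does something similar around heap blocks (its _CrtCheckMemory function validates them). A minimal sketch with a hypothetical GuardedBuffer class, meant to illustrate the idea rather than replace those tools:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Known value written into the guard zones.
const int kGuardCanary = 0x7EADBEEF;

// Hypothetical helper: `size` usable ints surrounded by `pad` guard ints on
// each side. An overrun of up to `pad` elements lands in a guard zone
// (still inside the vector, so the write itself is well defined).
class GuardedBuffer {
public:
    GuardedBuffer(std::size_t size, std::size_t pad)
        : size_(size), pad_(pad), storage_(size + 2 * pad, kGuardCanary) {
        for (std::size_t i = 0; i < size_; ++i) {
            data()[i] = 0;  // usable part starts zeroed
        }
    }

    int* data() { return storage_.data() + pad_; }
    std::size_t size() const { return size_; }

    // True if both guard zones still hold the canary value.
    bool verify() const {
        for (std::size_t i = 0; i < pad_; ++i) {
            if (storage_[i] != kGuardCanary || storage_[pad_ + size_ + i] != kGuardCanary) {
                return false;
            }
        }
        return true;
    }

private:
    std::size_t size_;
    std::size_t pad_;
    std::vector<int> storage_;
};
```

An off-by-one loop such as for (i = 0; i <= size; ++i) writing through data() hits the trailing guard zone, and verify() then returns false. Calling verify() at several points narrows down when the overwrite happened.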

If you can get the debugger to stop at a crash and it is a memory error, dump memory (in hex) around the address and look for where the data was corrupted. Often the code overwriting the memory is writing a string. These things will usually reproduce as long as you don't recompile or change inputs. If you can get the crash quickly from a save, try setting watchpoints in the area where the memory fault occurs, or step through the code and watch that memory location until it changes.
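When the debugger's data breakpoints (Visual Studio's name for watchpoints) aren't practical, a crude software substitute is to snapshot the memory that gets corrupted and compare it at checkpoints. A sketch with a hypothetical MemoryWatch class; it only makes sense for memory that is not supposed to change between checkpoints:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical "software watchpoint": copies a memory region once, then
// check() compares the live memory against that copy at each checkpoint.
class MemoryWatch {
public:
    MemoryWatch(const void* addr, std::size_t len)
        : addr_(static_cast<const unsigned char*>(addr)),
          snapshot_(addr_, addr_ + len) {}

    // Returns true (and records the label) the first time a difference is
    // seen; the code between this checkpoint and the previous one is the
    // suspect.
    bool check(const std::string& label) {
        if (!firstChange_.empty()) {
            return false;  // already reported
        }
        if (std::memcmp(addr_, snapshot_.data(), snapshot_.size()) != 0) {
            firstChange_ = label;
            return true;
        }
        return false;
    }

    const std::string& firstChange() const { return firstChange_; }

private:
    const unsigned char* addr_;
    std::vector<unsigned char> snapshot_;
    std::string firstChange_;
};
```

Sprinkling check() calls between the major steps of the end-turn processing would report the first step after which the watched bytes differ.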

If you can get a repeatable failure that occurs quickly, set a breakpoint about halfway to the error and look for the memory overwrite. If it has already occurred, retry stopping 1/4 of the way; if not, 3/4 of the way. (Binary searching is much faster than single-stepping if you can quickly restart to the problem.)
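The halving strategy is an ordinary binary search over "steps since loading the save", assuming the corruption stays once it has happened. A sketch with a hypothetical isCorrupted callback standing in for "reload, run to this step, inspect memory":

```cpp
#include <cassert>
#include <functional>

// Returns the first step in [0, lastStep] for which isCorrupted(step) is
// true, assuming the results look like false...false true...true.
// Returns -1 if even lastStep is not corrupted.
int firstCorruptedStep(int lastStep, const std::function<bool(int)>& isCorrupted) {
    if (!isCorrupted(lastStep)) {
        return -1;
    }
    int lo = 0;
    int hi = lastStep;  // invariant: isCorrupted(hi) is true, steps < lo are clean
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (isCorrupted(mid)) {
            hi = mid;       // first bad step is at or before mid
        } else {
            lo = mid + 1;   // first bad step is after mid
        }
    }
    return lo;
}
```

For 1,000 steps this needs about a dozen reloads instead of up to a thousand single steps.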

The above ideas are useful for any "indirect crash", that is, anything that trashes the program but does not crash it immediately.
 