HOW TO: Isolate Crashes

Kael

Deity
Joined
May 6, 2002
Messages
17,401
Location
Ohio
[tab]Okay, you have finally done it. After many long hours you have the mod you always wanted to make. But then you, or someone else, try to play and it crashes. What do you do?


Isolation and Resolution

[tab]There are two steps to fixing problems, isolating to find out what piece of code is causing the problem and changing that code to resolve the issue. This article will only deal with isolation. Resolution is going to be specific to your code so I can’t offer general advice. The hardest part of fixing problems is isolating the issue, typically if you can do that resolution is easy.


The Changelog

[tab]Before we start talking through the steps it is important that you keep a changelog of the changes you are making between versions. Players will often give you a good indication of what was going on during the crash and if you have a good changelog you will be able to drill into the issue right away.

[tab]Changelogs become even more important as you start writing new functions. In these cases crashes may not be a problem in the function that crashes so much as in “bad” data passed to them. It is extremely frustrating to be working on a new crash in a function that hasn’t changed in forever and to have forgotten that you made a change to a related module that passed data to it.

Special note, the SDK changelog:

[tab]The changelog is good advice for modding, but if you are going to be making SDK changes it is mandatory. Not only is it required to help troubleshoot issues but you will need the changelog to convert all of your changes into the next SDK when Firaxis releases future patches.

[tab]The following is the format I use for SDK changes:

So that civs with the Scorched Earth Trait auto-raze cities (unless they created them)
Version: 0.12

CvPlayer.cpp CvPlayer::acquireCity()

Code:
	if (bConquest)
	{

//FfH: Modified by Kael 05/31/2006
//		if (((pOldCity->getHighestPopulation() == 1) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
//			  ((GC.getGameINLINE().getMaxCityElimination() > 0) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
//			  (GC.getGameINLINE().isOption(GAMEOPTION_ONE_CITY_CHALLENGE) && isHuman()))

		if (((pOldCity->getHighestPopulation() == 1) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
			  ((GC.getGameINLINE().getMaxCityElimination() > 0) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
			  (GC.getGameINLINE().isOption(GAMEOPTION_ONE_CITY_CHALLENGE) && isHuman()) ||
			  (hasTrait((TraitTypes)GC.getInfoTypeForString("TRAIT_SCORCHED_EARTH")) && pOldCity->getOriginalOwner() != getID()))

//FfH: End Modify

		{

[tab]It’s pretty self explanatory. It contains why the change was made, the version it was checked into, the file and function that was changed and the actual code. Notice when the code is changed the original code is always left in but commented out. I never remove “real” code, it is too important as a reference.


Step 1: Get a save

[tab]To fix a problem you are going to need a save game right before a reproduceable crash. Load up the save and verify that the crash occurs consistently. If the crash doesn’t happen then we can’t go any further. We will talk about intermittent issues later in this article.


The Little Hammer

[tab]Once we have a reproducible crash you can begin to isolate exactly what is going on in the save that is triggering the crash. I usually go through the following steps:

1. Remove all players except the player and one other in the Worldbuilder. See if the crash still occurs. If the Crash still occurs go to 2, if it doesn’t go to step 3.

2. If the crash still occurs with only the player and one other, we will need to delete more stuff. Remove all players except the player and one other again, but this time remove all of the units and cities for two existing players except for an empty city each. Make sure neither of these cities is finishing production the turn the crash occurs. If the crash still occurs go to the next section on ‘The Big Hammer’. If the crash doesn’t occur go to step 3.

3. Now that we have a save game where we can stop the crashes by removing a certain portion of the environment we have only to add pieces back until the crash happens again. This can get time consuming but I typically reload and only delete half of what I did the previous time (when the crash didn’t occur) and see of the crash still doesn’t happen. If it doesn’t repeat this step but delete even less. If it does repeat this step but only delete half of what I just added back in until I narrow down on the specific unit/action/city/etc that is causing the issue.


The Big Hammer

[tab]The Little Hammer doesn’t always work. Sometimes you can remove just about everything in the save game and it still crashes. The Big Hammer is another method of isolation that works at the mod level instead of within the save game. It is very useful for isolating python and SDK issues.

[tab]As above you will need a reproducible crash to test with.

1. If you are using an SDK mod rename the ‘CvGameCoreDLL.dll’ to ‘CvGameCoreDLL old.dll’. If the crash still occurs then you can assume that the problem isn’t in the SDK modified sections. If the crash no longer occurs then you have an SDK issue. You will have to go through your changelog (you do have a changelog right!) and review the changes that have been made in your latest version.

[tab]I keep all of the old CvGameCoreDLL.dll files from my versions so I can walk back through the changes until I find the first CvGameCoreDLL.dll that is broken, review the changes that were checked into that one and I should have my issue isolated.

2. If you aren’t using an SDK mod or removing the CvGameCoreDLL.dll didn’t fix your problem your next suspect is python. To rule out a python issue rename your python directory to ‘python.old’ then reload your save and see if the crash still occurs.

[tab]If it still occurs with your CvGameCoreDLL.dll and your python directory out of your way then you have to be looking at an XML issue and you should probably go back to the Little Hammer section and see what is still around to delete.

[tab]Assuming renaming the python directory keeps the crash from occurring you can start adding python file back until the crash returns and then see what changed in the last python file added since the last version. You could also swap in python directories from older version of your mod to see if those have the same issue (to prove if this crash is a new issue or if it has been around before).


Intermittent Issues

[tab]Intermittant issues are the worst! If they are extremely intermittent then they are almost impossible to trap because they may happen on only 1 out of 20 times. You may make a change that you think will fix it and load the save 40 times without a problem and you still wont know if you actually did anything to help with the issue or if you just got lucky.

[tab]Fortunatly since Civ4 saves the random generator these issues are less common. Even “random” events will occur exactly as before if no one consumes a random number they didn’t in the original.

[tab]For example I had a reproduceable crash that occured if I fortified my unit and ended the turn. If I used the unit to attack and kill a nearby barbarian the crash wouldn’t happen. So I spent hours trying to figure out what was up with that barbarian or the attacking unit that could have anything to do with the error. Then I tried to attack another unit with one of mine, again there was no crash. All I was doing with the attack was eating up some of the random numbers so that the actual random event that caused the crash didn’t occur. The lesson I learned from this was, if you have crashes related to random functions you need to make sure your actions are identical each time you reload the save, or your results won’t be meaningful.


Practice makes Perfect

[tab]I hope this helps. I plan on updating this article with me information and suggestions as I come across them. But don’t wait for me, post your own tips and tricks for isolating issues with mods in this thread!
 
AlCosta15 said:
Wow! Nice job Kael, should help out a lot of mod-makers.

Thanks, I know to some of the community this will be information they already know but I think it may be of value to new modders.
 
Kael said:
Thanks, I know to some of the community this will be information they already know but I think it may be of value to new modders.
True, but some of us already know because you've helped us out in the past ;)
 
For locating problems i usually add some code in the SDK.

I have a log message that reports when the program enters a major part of the code and it reports back when it leaves the code.

Eg. It reports when AI player 1 begins the turn, then it reporte when the cities are all worked, when the cities are finished, then when it starts doing units, when the units are finished, ..., and finally when the entire player has finished its turn. Then the same thing for the next player.

This way i simply can look which action was begin but was not finished. That often gives a good clue where to look for your bug.
 
Could you provide some example code?
 
This is my logging function that I use in the SDK. Just stick this at the top of the cpp file after all of the #include statements to use it. Also, if you use it in multiple files, you need to give it a different name for each file (or do it properly and give it a header file that is included, but this is simpler, especially if each file uses its own logfile).
Code:
void logMsg(char* format, ... )
{
        static char buf[2048];
        _vsnprintf( buf, 2048-4, format, (char*)(&format+1) );
        gDLL->logMsg("ai.log", buf);
}
This would log the messages that use this function to "Documents and Settings\<username>\My Documents\My Games\<whatever your civ4 dir is called>\logs\ai.log" not "program files\Firaxis Games\Civ4\anything".

you would call this exactly like printf - logMsg("message to log"); or logMsg("log:%d",aNumber);


Also, instead of using a changelog like Kael's, you might want to use version control if you can deal with it. It would allow you to save all the versions of the files so you know exactly what changes you made. I personally use Subversion for the spells code I wrote for FFH.
 
Could you shed some light on what the .log files tell us? How do we use the log files to isolate errors?

Thanks for this valuable thread, btw.
 
Well, its the standard printf method of debugging - you go to the area where you think the program is crashing + insert log statements like this:
Code:
someMethod(){
log("1");
<code>
log("2");
<code>
log("3");
...
}
where <code> is a few lines of the actual code. Then, after it crashes, you can go look at the log file. If the log shows
Code:
1
2
or even
Code:
1
2
3
1
2
3
...
1
2
you know that the CTD is occurring between the log("2") statement and the log("3") statement. (this is where java-style stacktraces, with line numbers would be really nice). However, this doesn't tell you exactly what the problem is (however, its almost certainly a null reference, the trick is finding out where and why), for that you need to do more debugging. You can print out a number like this: log("my number is:%d", mynumber) and it'll print something like this: "my number is:14". To print a string use %s instead of %d (for more info google for printf). If you want to tell if a given pointer is null you do one of these two things: either log("my pointer is this:%d", myPointer), which will print out 0 if it is null and some large number otherwise, or more properly log("my pointer is null:%d", myPointer==NULL), which will print "my pointer is null:1" if it is null and "my pointer is null:0" otherwise. However, even checking for nulls may not be enough as you might access a deleted object (this should rarely happen, as you should generally set any deleted references to NULL unless you are sure you don't have to). Then the best you can do is to try something like this:
Code:
someMethod(){
...
log("before");
log("%d", myPointer->someVariable); //or myPointer->someMethod();
log("after");
...
}
if it prints both "before" and "after" then the pointer is (probably) valid and the crash is somewhere else.

Isn't C++ debugging w/o a debugger fun? :crazyeye:
 
A simpler way of logging is to use gDLL->messageControlLog(char *), which prints to the MPLog. However, this only accepts char*and not wchar* which is what civ uses most of the time and converting one to the other can lead to some problems.
 
Instead of printing values and checks to the log, I have then print on screen in the game and to the ingame event log. That way you can see it right there without having to alt-tab and open the log file.
 
Only problem with that is that only so many messages will show up, and if one shows up, but then the program crashes immediately afterward, you may not see the message. Also, you can always run Civ in windowed mode, and have the file open in the background.
 
Thanks Kael for this wonderful thread.
And Many thanks talchas for your logging function. This is exactly where we at Total Realism are stuck. It is really hard to debug the SDK.

Regarding your code
void logMsg(char* format, ... )

What should the ... mean? More parameters?

Many thanks
Houman
 
That is one of the ways to define a variable argument list in C/C++ - I copied it from somewhere else in the SDK.
 
Houman said:
Thanks Kael for this wonderful thread.
And Many thanks talchas for your logging function. This is exactly where we at Total Realism are stuck. It is really hard to debug the SDK.

Np, if you get some time you should write a how-to on building an automated installed for a mod. Yours is really cool since it creates an icon on the desktop, adds an entry to add/remove programs and allows the player to select options during the install. I would love to know how to do all that.
 
Thanks Kael,

Yeah definitely its worth an artikel. I will publish it as soon as I fix these CTDs and prepare the Warlord release. Right now the community is close to throwing rotten eggs at us. :)
 
Gys,

I have problems: Some of the files are fine with the debugger function, others say:

cannot convert 'this' pointer from 'const Unit' to 'Unit &

I.e.:
Code:
bool CvUnit::isNukeVictim(const CvPlot* pPlot, TeamTypes eTeam) const
{
	CvPlot* pLoopPlot;
	int iDX, iDY;

	// Dale - MAD: don't declare war on self
					logMsg("CvUnit.cpp - 16");
	if(getTeam() == eTeam)
	{
		return false;
	}
.
.
.

I have already declared the method in each header file:
void logMsg(char* format, ... );


WHat shall I do please?
 
Try making that const void logMsg...
 
talchas said:
Try making that const void logMsg...


Ok by making the function itself to const void logMsg(...)

how do I call a const function from code?

Because now it says:
'CvUnit::logMsg' : cannot convert 'this' pointer from 'const CvUnit' to 'CvUnit &'

Thanks
Houman
 
Top Bottom