[tab]Okay, you have finally done it. After many long hours you have the mod you always wanted to make. But then you, or someone else, try to play and it crashes. What do you do?
Isolation and Resolution
[tab]There are two steps to fixing a problem: isolating it, to find out which piece of code is causing it, and changing that code to resolve it. This article deals only with isolation. Resolution is going to be specific to your code, so I can’t offer general advice. Isolation is usually the hardest part; once you have managed that, resolution is typically easy.
The Changelog
[tab]Before we start talking through the steps, one piece of groundwork: keep a changelog of the changes you make between versions. Players will often give you a good indication of what was going on at the time of the crash, and with a good changelog you will be able to drill into the issue right away.
[tab]Changelogs become even more important as you start writing new functions. In these cases the problem may lie not in the function that crashes so much as in “bad” data passed to it. It is extremely frustrating to be chasing a new crash in a function that hasn’t changed in forever, having forgotten that you changed a related module that passes data to it.
Special note, the SDK changelog:
[tab]Keeping a changelog is good advice for any modding, but if you are going to make SDK changes it is mandatory. Not only does it help you troubleshoot issues, you will also need it to port all of your changes into the next SDK when Firaxis releases future patches.
[tab]The following is the format I use for SDK changes:
So that civs with the Scorched Earth Trait auto-raze cities (unless they created them)
Version: 0.12
CvPlayer.cpp CvPlayer::acquireCity()
Code:
if (bConquest)
{
//FfH: Modified by Kael 05/31/2006
//    if (((pOldCity->getHighestPopulation() == 1) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
//        ((GC.getGameINLINE().getMaxCityElimination() > 0) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
//        (GC.getGameINLINE().isOption(GAMEOPTION_ONE_CITY_CHALLENGE) && isHuman()))
    if (((pOldCity->getHighestPopulation() == 1) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
        ((GC.getGameINLINE().getMaxCityElimination() > 0) && !(GC.getGameINLINE().isOption(GAMEOPTION_NO_CITY_RAZING))) ||
        (GC.getGameINLINE().isOption(GAMEOPTION_ONE_CITY_CHALLENGE) && isHuman()) ||
        (hasTrait((TraitTypes)GC.getInfoTypeForString("TRAIT_SCORCHED_EARTH")) && pOldCity->getOriginalOwner() != getID()))
//FfH: End Modify
    {
[tab]It’s pretty self-explanatory. It contains why the change was made, the version it was checked into, the file and function that were changed, and the actual code. Notice that when code is changed, the original code is always left in but commented out. I never remove “real” code; it is too important as a reference.
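[tab]Because every change is wrapped in the same pair of comment markers, the changes can also be located mechanically when it is time to port them to a new SDK. Here is a rough sketch of that idea in Python. It is an illustration only, not a tool from the mod: the function name find_marked_changes is made up, and it assumes every change uses exactly the ‘//FfH: Modified by <name> <date>’ and ‘//FfH: End Modify’ markers shown in the example above.

```python
import re

# Assumption: every change opens with this marker, as in the example entry.
START_MARKER = re.compile(
    r"//\s*FfH: Modified by (?P<author>\S+) (?P<date>\d{2}/\d{2}/\d{4})"
)
# Assumption: every change closes with this exact marker.
END_MARKER = "//FfH: End Modify"


def find_marked_changes(source_text):
    """Return (start_line, end_line, author, date) for each marked block.

    Line numbers are 1-based. A start marker without a matching end
    marker is ignored rather than guessed at.
    """
    changes = []
    open_block = None  # (start_line, author, date) of the block being read
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        match = START_MARKER.search(line)
        if match:
            open_block = (lineno, match.group("author"), match.group("date"))
        elif END_MARKER in line and open_block is not None:
            start_line, author, date = open_block
            changes.append((start_line, lineno, author, date))
            open_block = None
    return changes
```

[tab]Running this over the text of each modified .cpp file gives a list of line ranges to compare against your changelog before you start merging into the new SDK.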
Step 1: Get a save
[tab]To fix a problem you are going to need a save game from right before a reproducible crash. Load up the save and verify that the crash occurs consistently. If the crash doesn’t happen, we can’t go any further. We will talk about intermittent issues later in this article.
The Little Hammer
[tab]Once you have a reproducible crash, you can begin to isolate exactly what in the save is triggering it. I usually go through the following steps:
1. In the Worldbuilder, remove all players except the player and one other, then see if the crash still occurs. If it does, go to step 2; if it doesn’t, go to step 3.
2. If the crash still occurs with only the player and one other, we need to delete more. Again remove all players except the player and one other, but this time also remove all of the units and cities of the two remaining players, leaving each with a single empty city. Make sure neither of these cities finishes production on the turn the crash occurs. If the crash still occurs, go to the next section, ‘The Big Hammer’. If it doesn’t, go to step 3.
3. Now that we have a save where removing a certain portion of the environment stops the crash, we only have to add pieces back until the crash happens again. This can get time consuming, so I typically reload and delete only half of what I deleted the previous time (when the crash didn’t occur) and see if the crash still doesn’t happen. If it doesn’t, I repeat this step but delete even less. If it does, I repeat this step but delete only half of what I just added back in, and so on until I have narrowed it down to the specific unit, action, city, etc. that is causing the issue.
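[tab]The halving in step 3 is really a binary search. Here is a minimal sketch of it in Python, assuming exactly one item is responsible and the crash is fully reproducible. The names isolate_culprit and crashes_with are hypothetical; crashes_with stands in for the manual work of reloading the save, deleting everything except the listed items in the Worldbuilder, and ending the turn.

```python
def isolate_culprit(items, crashes_with):
    """Narrow a non-empty list of items down to the one that triggers the crash.

    items        -- units, cities, etc. that were present when the save crashed
    crashes_with -- hypothetical callback: given the items you kept, returns
                    True if the save still crashes

    Assumes a single culprit. If two items must both be present for the
    crash to occur, this simple halving will not find them.
    """
    suspects = list(items)
    while len(suspects) > 1:
        half = len(suspects) // 2
        first, second = suspects[:half], suspects[half:]
        # Keep only the first half. If the crash is still there, the culprit
        # is in that half; otherwise it must be in the half we deleted.
        suspects = first if crashes_with(first) else second
    return suspects[0]
```

[tab]The point of halving is the number of reloads: with 64 things in the save you need about 6 tests instead of up to 64.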
The Big Hammer
[tab]The Little Hammer doesn’t always work. Sometimes you can remove just about everything in the save game and it still crashes. The Big Hammer is another method of isolation that works at the mod level instead of within the save game. It is very useful for isolating Python and SDK issues.
[tab]As above, you will need a reproducible crash to test with.
1. If you are using an SDK mod, rename ‘CvGameCoreDLL.dll’ to ‘CvGameCoreDLL old.dll’. If the crash still occurs, you can assume the problem isn’t in the sections you modified in the SDK. If the crash no longer occurs, you have an SDK issue, and you will have to go through your changelog (you do have a changelog, right?) and review the changes made in your latest version.
[tab]I keep all of the old CvGameCoreDLL.dll files from my versions so I can walk back through them until I find the first CvGameCoreDLL.dll that is broken. Reviewing the changes that were checked into that version should isolate the issue.
2. If you aren’t using an SDK mod, or removing the CvGameCoreDLL.dll didn’t fix your problem, your next suspect is Python. To rule out a Python issue, rename your python directory to ‘python.old’, then reload your save and see if the crash still occurs.
[tab]If it still occurs with both your CvGameCoreDLL.dll and your python directory out of the way, then you have to be looking at an XML issue, and you should probably go back to the Little Hammer section and see what is still around to delete.
[tab]If renaming the python directory keeps the crash from occurring, you can start adding Python files back until the crash returns, and then see what changed in the last file you added since the previous version. You could also swap in python directories from older versions of your mod to see if those have the same issue (to find out whether this crash is new or has been around for a while).
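[tab]Walking back through old versions, whether of CvGameCoreDLL.dll or of the python directory, follows the same pattern each time. Here is a small sketch of it in Python. The names first_broken_version and crashes are hypothetical; crashes stands in for swapping that version’s files into the mod, loading the save, and seeing whether it crashes.

```python
def first_broken_version(versions, crashes):
    """Walk back from the newest version to find where the crash first appears.

    versions -- version labels ordered oldest to newest
    crashes  -- hypothetical callback: returns True if the save crashes
                with that version's files swapped in

    Returns the oldest version in the unbroken run of crashing versions
    ending at the newest one, or None if the newest version does not crash.
    """
    first_broken = None
    for version in reversed(versions):
        if not crashes(version):
            break  # this version works, so the one after it introduced the bug
        first_broken = version
    return first_broken
```

[tab]The version it returns is the one whose changelog entries you want to review first. If it returns the oldest version you kept, the crash has been around longer than your archive goes back.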
Intermittent Issues
[tab]Intermittent issues are the worst! If they are extremely intermittent, they are almost impossible to trap because they may happen only 1 time out of 20. You may make a change that you think will fix it, load the save 40 times without a problem, and still not know whether you actually helped or just got lucky.
[tab]Fortunately, since Civ4 saves the random number generator with the game, these issues are less common. Even “random” events will occur exactly as before, as long as nothing consumes a random number that wasn’t consumed the first time.
[tab]For example, I had a reproducible crash that occurred if I fortified my unit and ended the turn. If I used the unit to attack and kill a nearby barbarian, the crash wouldn’t happen. So I spent hours trying to figure out what it was about that barbarian or the attacking unit that could have anything to do with the error. Then I tried attacking a different unit with one of mine, and again there was no crash. All the attack was doing was eating up some of the random numbers, so that the actual random event that caused the crash didn’t occur. The lesson I learned: if you have crashes related to random functions, make sure your actions are identical each time you reload the save, or your results won’t be meaningful.
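[tab]The effect is easy to reproduce outside the game. The sketch below uses Python’s own random module as a stand-in for the game’s generator; it does not use any Civ4 code, and end_turn_roll, seed and extra_actions are names made up for the illustration. The seed plays the part of the generator state stored in the save, and each extra action (such as a combat) consumes one number before the end-of-turn roll.

```python
import random


def end_turn_roll(seed, extra_actions=0):
    """Return the roll that decides the end-of-turn event.

    seed          -- stands in for the generator state stored in the save
    extra_actions -- how many random numbers the player's actions consume
                     before the turn ends (for example, one per attack)
    """
    rng = random.Random(seed)
    for _ in range(extra_actions):
        rng.random()  # a combat roll uses up the number the event would have got
    return rng.random()


# Same save, same actions: the roll is identical on every reload.
same_again = end_turn_roll(7) == end_turn_roll(7)

# Same save, one extra attack: the event now gets the next number instead.
shifted = end_turn_roll(7, extra_actions=1) != end_turn_roll(7)
```

[tab]Nothing about the barbarian mattered in my case; any action that drew a random number would have shifted the sequence in the same way.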
Practice makes Perfect
[tab]I hope this helps. I plan on updating this article with more information and suggestions as I come across them. But don’t wait for me: post your own tips and tricks for isolating issues with mods in this thread!