how to debug repeatable crash?

davidlallen

I am working on the Dune Wars mod, and I have a particular save-game. When I load the save-game and have my artillery unit attack a city, the game crashes to the desktop. I have tried deleting all the buildings out of the city; it still crashes. I have tried deleting all the enemy units but one out of the city; it still crashes. I have played several other games with the same mod and never encountered this; so it does not happen on all artillery attacks.

I have logging and Python alerts turned on; there is no information in the log files. I do not have an SDK build environment or debugger. The Dune Wars mod has a highly customized DLL with a lot of stuff in it; if anybody could get information from the save-game file, let me know and I will upload it.

Any suggestions on how to debug this?

(EDIT: even if I remove all the enemy units from the city with WB, and then step one unit into the city to take it unopposed, it crashes. The city is a holy city. I have tried removing the holycity flag from the city, and removing all the buildings plus all the units from the city, and still it crashes. There are a number of defenders and the crash still happens when I attack with a single unit, which would obviously not be enough to conquer the city. The same crash also happens when I attack with another unit type, not just artillery.)
 
If it is popping up the screen asking if you want a full or normal crashdump, then get WinDbg and your problems MAY resolve a bit quicker. Sometimes this will manage to point you right to the line of the DLL causing you problems.

Install that, get a crash to happen, then load WinDbg, hit CTRL+D, find the crashdump, and then type "!analyze -v" and it should inform you of what it knows.
 
Thank you for the suggestion. I am not quite sure how to make this work. I downloaded windbg from the link you provided and installed it. When the game crashes, I get a standard win-xp dialog which says, basically, "BTS has encountered a problem and needs to close. We are sorry for the inconvenience ... For more information about the error, click here". The screen is black everywhere else. I use the windows key to get the windows menu and launch windbg. But, I guess civ has the screen locked. I can see that windbg is running when I use the task manager, but I cannot get anything displayed on the screen besides the inconvenience dialog and the blackness.

Is there some step I may have missed?

The mod includes both a CvGameCoreDLL.dll and a CvGameCoreDLL_debug.dll; I have not taken any steps to use the debug one. Would that help? How do I use the debug one? Is there a step I should take to create an actual dump file which windbg could open later? If this is covered in a FAQ somewhere please just point me.
 
I would download and install the free Visual C++ Express Edition from Microsoft. Then switch to using the debug DLL and attach VC++ to it. I believe Refar had a miniguide to doing this, or maybe it was xienwolf. You'll want to get the source to the DLL for the mod so that when it points out where the problem is you can see the code itself.
 
Oops, forgot one step: make sure this line is changed in your ini:

Code:
; Create a dump file if the application crashes
GenerateCrashDumps = 1

Then it should offer you crashdumps which you can try to solve.

The guide for setting up Visual Studio is Refar's guide. Linked in the first post of the Idiot's Guide in my sig if you need to find it.
 
Install that, get a crash to happen, then load WinDbg, hit CTRL+D, find the crashdump, and then type "!analyze -v" and it should inform you of what it knows.

Stuck on step 5. Where is the crashdump file? I got the dialog asking me to dump one, and I selected "full dump", but I cannot find any file *.dmp in anyplace I looked.

(EDIT: Never mind. Apparently a "full dump" takes many minutes to finish, and I killed the job thinking it was done. I selected "normal dump" and it gives a popup which contains the file location.)
 
This thread isn't quite dead, and it's my thread, so I'll update it and ask for more help.

This is for Dune Wars version 1.2. The SDK author (keldath) updated it to 3.19, which involves about 10 different components including RevDCM and WOC. Now it runs, but it consistently CTDs around turn 300. He has tried removing different parts and has different theories about why, but the crash is very consistent.

Another sdk person, koma13, built a debug DLL and I ran windbg on it according to the instructions above in this thread. I replaced the original CvGameCoreDLL.dll with the debug one. It came along with files with extension .pdb, .lib, .ilk. I put all of these into the same directory as the dll, that is, <bts>/Mods/dune wars/assets.

Now I get a crash dump, bring it up in windbg, and type the magic phrase "!analyze -v". It complains for hundreds of lines about missing symbol files, and tells me that the crash is somewhere inside getTerrainInfo().

I was hoping for more, like a stack trace, and a reference to a line of a source file.

Is there more setup that I need to do? I have attached the entire output of windbg.
 
How about attaching to the process using VS 2008 while it's running and making it crash. Do you have asserts turned on with that debug build? That might allow you to see the problem.
 
Hm, no compiler. There are a dozen or so asserts which come during program initialization but to me, they don't seem to indicate much. Once the game actually starts, there are no asserts, then the CTD around turn 300.

Sadly, I have done another experiment to get to a crash point manually, which did not work. I autoplayed 250 turns, saved, and then autoplayed and saved every 10 turns. Once I got the crash, I went back to the previous save and manually played 10 turns. It did not crash. Of course I was making different decisions than the AI would have for those 10 turns, but I was still disappointed that it did not crash.
 
Each failed assert should indicate that something (input or state) is incorrect and must be fixed. What are those asserts? Where are they and why do they happen? Maybe they set some data up that doesn't get used until 300 turns later and then BOOM!

With that last save, auto play each turn and save until you crash again (should be less than 10). Now you have the very turn on which it will crash and a good repeatable test case. Maybe with that last save you can manually do whatever the AI does to crash the game.
 
Those asserts are from RevDCM, and have been reported. They don't cause any problems though. And they aren't related to this crash, as they happen in every RevDCM based mod, and many of these are stable.
 
It's generally a bad idea to have asserts that are not actual program failures because a) it wastes your time having to ignore them when testing and b) it trains you to start ignoring asserts so you might miss a real one. Is RevDCM still under active development?
 
Yes, it's still being actively developed by glider1 and jdog5000. I agree with you, but I don't have any power over their decision to leave in the couple of asserts. There are three that always happen. The first makes a reference to the path I compiled the gamecore in; it's new with the current build, and it says "ignore this" for the message. The second happens when starting a game and makes a reference to NO_IMPROVEMENT; this one has baffled jdog, he can't figure it out. The last one is making a call checking a city for the presence of NO_RELIGION. This is the main one I'm concerned about. It doesn't seem to cause any problems, but it's irritating, and definitely something is going wrong with it (there is no reason or logic behind asking a city if it has NO_RELIGION; if it really wants to know there is no religion in the city, there are better ways to do this, and I don't think checking for NO_RELIGION will even work). But it is being ignored, and I've pressed the issue a few times, but there is only so much I'm willing to say before I think I'm just being annoying and not really improving anything.
 
Yeah, at this point I'd say it's either find it and fix it yourself or let them work on their own pace.

The one with NO_RELIGION could definitely cause errors due to what I suspect is happening. If it is truly calling CvCity::isHasReligion(NO_RELIGION), it will return whatever happens to be stored in the memory address before the array holding the religions present in that city. The reason is that NO_RELIGION is -1, so it checks array element # -1 which is stored outside the bounds of the array.
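To make the out-of-bounds point concrete, here is a minimal sketch of a range-checked lookup. Everything in it (ReligionTypes with only two religions, CitySketch, isHasReligionChecked) is a hypothetical stand-in written for illustration, not the actual SDK or RevDCM code; the only thing taken from the discussion is that NO_RELIGION is -1.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the SDK enum and city class (assumptions, not real code).
enum ReligionTypes { NO_RELIGION = -1, RELIGION_A = 0, RELIGION_B = 1, NUM_RELIGIONS = 2 };

struct CitySketch {
    // One flag per real religion; valid indices are 0 .. NUM_RELIGIONS - 1.
    std::array<bool, NUM_RELIGIONS> hasReligion{};
};

// Range-checked lookup: NO_RELIGION (-1) or any other out-of-range value
// returns false instead of reading memory outside the array.
inline bool isHasReligionChecked(const CitySketch& city, ReligionTypes eReligion) {
    const int index = static_cast<int>(eReligion);
    if (index < 0 || index >= NUM_RELIGIONS) {
        return false;
    }
    return city.hasReligion[static_cast<std::size_t>(index)];
}

// Small self-check of the behaviour described above.
inline void demoReligionLookup() {
    CitySketch city;
    city.hasReligion[RELIGION_B] = true;
    assert(isHasReligionChecked(city, RELIGION_B));    // present religion is found
    assert(!isHasReligionChecked(city, NO_RELIGION));  // -1 never indexes the array
}
```

Without the range check, an index of -1 would address whatever sits just before the array, which is why the result can look random from one run to the next.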

The way to figure it out is to create a debug build and attach to it from the VS debugger. When the assert fires, you jump into the debugger and follow the callstack backwards until you find the code that is doing something wrong. Most likely this is a case of doing

Code:
pCity->isHasReligion(GET_PLAYER(pCity->getOwnerINLINE()).getStateReligion())

without first ensuring that

Code:
GET_PLAYER(pCity->getOwnerINLINE()).getStateReligion() != NO_RELIGION

It's an easy mistake to make, but it can cause random problems which are harder to diagnose. This is precisely the reason there are asserts. An assertion says, "If <x> is true, it might cause a problem. Don't let <x> ever be true!" It's very wise to heed this warning. :mischief:

</preaching-to-choir>
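As a rough sketch of the caller-side fix, assuming simplified stand-in types: PlayerSketch, CityStub and cityHasOwnerStateReligion are hypothetical names, and the standard assert is used here in place of whatever assert macro the mod's DLL actually uses. The callee asserts that it never receives NO_RELIGION, and the caller checks the state religion before making the call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins (assumptions for illustration, not the SDK classes).
enum ReligionTypes { NO_RELIGION = -1, RELIGION_A = 0, RELIGION_B = 1, NUM_RELIGIONS = 2 };

struct PlayerSketch {
    ReligionTypes stateReligion = NO_RELIGION;  // a player may have no state religion
};

struct CityStub {
    std::vector<bool> religions = std::vector<bool>(NUM_RELIGIONS, false);

    // The assertion documents the precondition: callers must pass a real religion.
    bool isHasReligion(ReligionTypes eReligion) const {
        assert(eReligion >= 0 && eReligion < NUM_RELIGIONS);
        return religions[static_cast<std::size_t>(eReligion)];
    }
};

// Guarded caller: check for NO_RELIGION first, then do the lookup.
inline bool cityHasOwnerStateReligion(const PlayerSketch& owner, const CityStub& city) {
    const ReligionTypes eStateReligion = owner.stateReligion;
    if (eStateReligion == NO_RELIGION) {
        return false;  // no state religion, so the city cannot have it
    }
    return city.isHasReligion(eStateReligion);
}
```

With the guard in place the assert should never fire from this call site, so if it still fires, the bad call is coming from somewhere else.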
 
When this assert fires and I use the step functions it gets into the assembler code very quickly. And the few snippets of source it shows are all references to python. I think this assert is somehow being caused by the inquisitions component.

Anyway, I'll go ahead and try this:

Code:
GET_PLAYER(pCity->getOwnerINLINE()).getStateReligion() != NO_RELIGION

And see if it works.
 
Ah, then perhaps the Python layer is calling CyCity::isHasReligion(). You can do a grep over the Inquisitions code for all places that call that function and make sure they check for a valid religion before calling it.
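If no grep is handy, the same search can be sketched in a few lines. This is only an illustration: findLinesContaining is a hypothetical helper, and it works on text already loaded into a string, so reading the Inquisitions Python files from disk is left out. It returns the 1-based line numbers that contain the search text.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the 1-based numbers of the lines in `source` that contain `needle`.
// Plain substring match, comparable to a fixed-string grep.
inline std::vector<std::size_t> findLinesContaining(const std::string& source,
                                                    const std::string& needle) {
    std::vector<std::size_t> hits;
    std::istringstream in(source);
    std::string line;
    std::size_t lineNo = 0;
    while (std::getline(in, line)) {
        ++lineNo;
        if (line.find(needle) != std::string::npos) {
            hits.push_back(lineNo);
        }
    }
    return hits;
}
```

Searching for "isHasReligion(" gives the list of call sites to inspect by hand; each one then needs a check that the religion passed in is not NO_RELIGION.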
 