Bug in getDescription usage

God-Emperor

Deity
Joined
Jul 18, 2009
Messages
3,551
Location
Texas
There is a bug in CvGameTextMgr.cpp regarding getting the description of a unit.

This could be the cause of various mystery crashes. It is probably the culprit in this post by SO, and it also turns up in various other threads (one or two of them also by SO). It could explain any crash pointing to CvMainInterface's updateHelpStrings function (on the line looking like "screen.setHelpTextString( CyInterface().getHelpString() )" in both C2C and RoM:AND.

The issue is the misuse of the getDescription method of the getDescription for the CvUnitInfo class.

Specifically, it hasn't got one so it uses the one inherited from the CvInfoBase. This has no clue about the existence of the "new" (if you consider something probably over 3 years old new) civilization specific unit names. Passing it a civilization type as the argument is wrong.

In CvGameTextMgr::setPlotListHelp when there is a stack of units on the plot that is greater in number than the set limit (15, currently) it runs some code that smaller stacks don't use - the code that shows multiples of units with aggregate promotions and average strength. The relevant line in the v26 code is line 2113, which is in an Afforess comment and looks like this:
Code:
/************************************************************************************************/
/* Afforess	                  Start		 08/25/10                                               */
/*                                                                                              */
/*                                                                                              */
/****************************************************************/************************************************************************************************/
/*
						szString.append(CvWString::format(SETCOLR L"%s" ENDCOLR, TEXT_COLOR("COLOR_UNIT_TEXT"), GC.getUnitInfo((UnitTypes)iI).getDescription()));
*/
						[B]szString.append(CvWString::format(SETCOLR L"%s" ENDCOLR, TEXT_COLOR("COLOR_UNIT_TEXT"), GC.getUnitInfo((UnitTypes)iI).getDescription(GET_PLAYER((PlayerTypes)iJ).getCivilizationType())));[/B]
/****************************************************************/************************************************************************************************/
/* Afforess	                     END                                                            */
/****************************************************************/************************************************************************************************/

The problem is that the getDescription function of the unit info takes as a parameter the "uiForm", that is the form of the text - like the singular masculine or plural female form. If you are playing as civilization 0 this will not be a problem since it will get a correct form. If you are playing as, say, civilization type 42 then the code in the CvInfoBase::getDescription will try to load another 42 forms, which I expect never exist, and load whatever happens to be past the end of the real data as if it were text strings. Most likely as null terminated text strings, so you could end up with some pretty long chunks of random text, or other junk, possibly from whatever is 42 entries later in the stored text data.

The CvUnit::getDescription function (added by Afforess) deals with the civ specific unit names, and does not need to be passed the civ number either - if you do pass it (it may be happening multiple times through the source code) then it will also be treated as the "uiForm" and do the same thing if the unit does not have a civ-specific unit name for your civ. Basically, you should never ever pass the civilization type to the getDescription call for a unit or unit info since the argument is never used for that.

This specific bug is in CvGameTextMgr in at least 4 locations.

In CvGameTextMgr it also does the lookup for the civ specific name correctly in at least one place, and also in the code for CvUnit.getDescription. A fixed version of the above line of code should look more like this, using the same style as the way it is already done in there:
Code:
						bFound = false;
						if (GC.getUnitInfo((UnitTypes)iI).getCivilizationName(GET_PLAYER((PlayerTypes)iJ).getCivilizationType()) != NULL)
						{
							if (!CvWString(GC.getUnitInfo((UnitTypes)iI).getCivilizationName(GET_PLAYER((PlayerTypes)iJ).getCivilizationType())).empty())
							{	
								bFound = true;
								szUnitName = gDLL->getText(GC.getUnitInfo((UnitTypes)iI).getCivilizationName(GET_PLAYER((PlayerTypes)iJ).getCivilizationType()));
							}
						}
						if (!bFound)
						{
							szUnitName = GC.getUnitInfo((UnitTypes)iI).getDescription();
						}

Or something like that (don't forget to define bFound as a bool). Yes, it's a lot more code. Yes, it should probably be made into a function.

I have no idea how many other places in the code outside CvGameTextMgr make this same mistake of passing a civilization type to getDescription for either a CvUnit or CvUnitInfo.

Another 3 places in CvGameTextMgr that do this:
line 14246, in CvGameTextMgr::setBuildingHelpActual
line 14414, in CvGameTextMgr::setBuildingHelpActual also
line 14454, in CvGameTextMgr::setBuildingHelpActual also also
 
Nice catch. Have you repaired this on the SVN source codes or are you just blowing the whistle on a pervasive problem you found? Great observation either way!
 
Just reporting it.

There is some chance that it won't actually do the gruesome stuff mentioned in the first paragraph after the first code chunk. It is possible that the gDLL->getObjectText won't mess up and grab random junk to put into the m_aCachedDescriptions vector, but that leaves the question of what is it putting in there? I have actually never noticed incorrect text from this so it may be defaulting all the rest past the end of the real data to the form 0 type, or doing a mod function to get it into the correct range, or some other thing that may avoid the extensive corruption. The function does apparently occasionally fail, sometimes just with a Python error reporting an "unidentifiable C++ exception" and sometimes it crashes, in a location that is doing this. That is why I looked: I had a game crash when running that Python mentioned in the second paragraph. It is possible there is something else wrong in there, but I stopped looking when I found this issue. It is something it is doing different in the case where the stack is big, and in the crash I had the stack of units I was moving another unit to had just gotten big enough that this code would be used.

In any event, it is wrong. At the very least it is wasting memory by adding extra things to the m_aCachedDescriptions vector for any unit info that is accessed in this way with the number of things depending on the owner's civilization number.
 
I've merged the fix, by having looked at this code (for handling large stack display) it BADLY needs a rewrite. It includes this section (bear in mind this is in generation of hover text that gets refreshed every frame the cursor is there):
Code:
for (int iI = 0; iI < (GC.getNumUnitInfos() * MAX_PLAYERS); ++iI
{
	m_apbPromotion.push_back(new int[numPromotionInfos]);
}
Do the arithmetic. There are approx. 1000 unit types in C2C (order of magnitude), and circa 500 promotion types. This loop allocates approx. 50,000 arrays of 500 ints!! That's about 100M allocated every frame you hover there (and deallocated too - it's not leeking, but still...). It's FANTASTICALLY memory hungry (transiently), and FANTASTICALLY inefficient.

I'll rework it sometime soonish.
 
Wow... and with so many more promos to come... ew! Nice spot! But at least it only occurs on player prompting, thus it does not impact turn times.

EDIT: I suspect you may have made a small mistake somewhere on this effort Koshling. Seems like since your last push we've had a LOT of CTD reports rolling in and I've made a lot of mistakes in the Text Editor enough to know its easy to accidentally cause a CTD there. This involves the plot mouseover so its fairly understandable why they would be reporting a varied amount of turns between crashes as well since if that's the issue it wouldn't manifest until they get this help box popup. I can take a look but I'm not sure I'm skilled enough to figure out what went wrong since I didn't code it myself.
 
Wow... and with so many more promos to come... ew! Nice spot! But at least it only occurs on player prompting, thus it does not impact turn times.

EDIT: I suspect you may have made a small mistake somewhere on this effort Koshling. Seems like since your last push we've had a LOT of CTD reports rolling in and I've made a lot of mistakes in the Text Editor enough to know its easy to accidentally cause a CTD there. This involves the plot mouseover so its fairly understandable why they would be reporting a varied amount of turns between crashes as well since if that's the issue it wouldn't manifest until they get this help box popup. I can take a look but I'm not sure I'm skilled enough to figure out what went wrong since I didn't code it myself.
Given where the CTDs occur (mostly in the exe or its libraries), it is more likely that Koshling was right and the memory allocator used here is not threadsafe and the multi-threaded property solver caused some double allocation and then released it while still used.
 
Wow... see.. that's why I wouldn't have been able to fix that if I tried... lol

But my suspicion that the change made here has brought forth the CTD reports at least seems accurate.
 
Given where the CTDs occur (mostly in the exe or its libraries), it is more likely that Koshling was right and the memory allocator used here is not threadsafe and the multi-threaded property solver caused some double allocation and then released it while still used.

I've been playing without incident on my own game, so I doubt it's that actually. It certainly isn't the stack display stuff, because that is only executed for stacks over 15 units, and people are reporting CTDs on turn 1. Since it wasn't happening fro me (latest DLL but not quite latest assets) I rather suspect an XML foul-up that the DLL mis-handles. If nobody has got to the bottom of it before I start modding today, I'll look into it (has anyone looked at the minidumps yet?)
 
I've been playing without incident on my own game, so I doubt it's that actually. It certainly isn't the stack display stuff, because that is only executed for stacks over 15 units, and people are reporting CTDs on turn 1. Since it wasn't happening fro me (latest DLL but not quite latest assets) I rather suspect an XML foul-up that the DLL mis-handles. If nobody has got to the bottom of it before I start modding today, I'll look into it (has anyone looked at the minidumps yet?)
Let's see if you find something and if not, then I can still implement some thread specific memory pool or lock all allocations and deallocations with a mutex.
 
That main vector is not allocated/deallocated every frame. The vector is a private member of the CvGameTextMgr class. It is allocated once, the first time the human looks at a plot with a stack big enough to trigger the grouping, and then stays allocated until CvGameTextMgr::DeInitialize() is run. So in C2C it is evidently about 100MB of (mostly) junk that is allocated once for the entire game.

How is this for inefficiency...

That main vector of arrays has every element in each array zeroed every time the function runs, which is not surprising (ill advised, but not surprising). There are also 3 other vectors of ints that are local variables which will be the same size, but at least they don't hold an array of ints the size of the number of promotions for every element (just an int) so they are each "only" in the neighborhood of 200KB each although they are allocated/deallocated every time. Each time the 3 of them are created it gets the entire "GC.getNumUnitInfos() * MAX_PLAYERS" number of ints (all 0) pushed onto each of the 3 vectors. If they know how big it was going to be every time, which they do, why did they not just use arrays? Big sparse arrays do not seem like the best way to do this, but it would be a lot faster than pushing bunches of zeros into 3 vectors every frame update just to achieve the same effect. In C2C each will be over 50,000 push operations for a total of 150,000+ just to load all the 0 values. If it averages 2 nanoseconds for each push that still makes it over 0.3 milliseconds per frame just for this pointless operation. That time assumes the vectors allocate enough space using very few expansions, which I doubt is the case - they will most likely do multiple allocations as the vectors grow (probably 10 allocations or more each, but I don't remember how it decides how much space to allocate when it grows or how much space they start with), each of which takes time particularly since I seem to recall that the vectors require contiguous memory space so if the current block of memory for each can't be extended it will reallocate elsewhere at the new target size and copy everything over (keeping the older shorter list allocated simultaneously until the copy is done, of course). (50,000+ entries in each vector for perhaps a couple dozen entries that are non-zero? Really? Really.)

I would not be surprised if just initializing these vectors and zeroing out the promotion related vector of arrays and then filling them with the little bit of data they hold was taking more than a millisecond per frame in C2C. Less in regular BtS, of course - but it is still a bad idea for that too. And then there's the waste of memory on top of the time issue (or vice-versa) just to compound the badness of it.

By the way, MAX_PLAYERS should be 51 for C2C since it includes 1 for the barbarians, so that's even more entries. I also just did a quick search of all the unit infos XML files in C2Cv26 and that indicates 1178 unit types (although a few are probably in modules that are not being loaded). 1178*51 = 60,078, so it is actually probably nearly 60,000 instead of around 50,000 entries in each vector. On the other hand there are apparently "only" about 478 promotions. That would still make the persistent promotion vector of arrays of ints (almost all of which are set to 0 at any given time) take up almost 110MB. That's a whole lot of zeros.
 
Given how many occurances of sparse arrays like this there are in the DLL, especially in the info classes, I'd say someone seems to be in love with them.
 
That main vector is not allocated/deallocated every frame. The vector is a private member of the CvGameTextMgr class. It is allocated once, the first time the human looks at a plot with a stack big enough to trigger the grouping, and then stays allocated until CvGameTextMgr::DeInitialize() is run. So in C2C it is evidently about 100MB of (mostly) junk that is allocated once for the entire game.

How is this for inefficiency...

That main vector of arrays has every element in each array zeroed every time the function runs, which is not surprising (ill advised, but not surprising). There are also 3 other vectors of ints that are local variables which will be the same size, but at least they don't hold an array of ints the size of the number of promotions for every element (just an int) so they are each "only" in the neighborhood of 200KB each although they are allocated/deallocated every time. Each time the 3 of them are created it gets the entire "GC.getNumUnitInfos() * MAX_PLAYERS" number of ints (all 0) pushed onto each of the 3 vectors. If they know how big it was going to be every time, which they do, why did they not just use arrays? Big sparse arrays do not seem like the best way to do this, but it would be a lot faster than pushing bunches of zeros into 3 vectors every frame update just to achieve the same effect. In C2C each will be over 50,000 push operations for a total of 150,000+ just to load all the 0 values. If it averages 2 nanoseconds for each push that still makes it over 0.3 milliseconds per frame just for this pointless operation. That time assumes the vectors allocate enough space using very few expansions, which I doubt is the case - they will most likely do multiple allocations as the vectors grow (probably 10 allocations or more each, but I don't remember how it decides how much space to allocate when it grows or how much space they start with), each of which takes time particularly since I seem to recall that the vectors require contiguous memory space so if the current block of memory for each can't be extended it will reallocate elsewhere at the new target size and copy everything over (keeping the older shorter list allocated simultaneously until the copy is done, of course). (50,000+ entries in each vector for perhaps a couple dozen entries that are non-zero? Really? Really.)

I would not be surprised if just initializing these vectors and zeroing out the promotion related vector of arrays and then filling them with the little bit of data they hold was taking more than a millisecond per frame in C2C. Less in regular BtS, of course - but it is still a bad idea for that too. And then there's the waste of memory on top of the time issue (or vice-versa) just to compound the badness of it.

By the way, MAX_PLAYERS should be 51 for C2C since it includes 1 for the barbarians, so that's even more entries. I also just did a quick search of all the unit infos XML files in C2Cv26 and that indicates 1178 unit types (although a few are probably in modules that are not being loaded). 1178*51 = 60,078, so it is actually probably nearly 60,000 instead of around 50,000 entries in each vector. On the other hand there are apparently "only" about 478 promotions. That would still make the persistent promotion vector of arrays of ints (almost all of which are set to 0 at any given time) take up almost 110MB. That's a whole lot of zeros.

I'll just rewrite it (sometime) to use a map with a compound key so it's not sparse at all.
 
Top Bottom