Single Player bugs and crashes v36 plus (SVN) - After the 24th of October 2015

Status
Not open for further replies.
Now I have even seen a case where the quality promo does nothing but reset the exp counter. Unfortunately, I don't have a save game.

Interesting...
 
I have CTDs because of air units. I had several CTDs during my game. But when I deleted city, which built air unit, I played normally then. Now again I have CTD and can't find solution (but I think, it air unit spawned somewhere). I use C2C svn with space colonization modmod and with manually enabled Atompunk tech (I enabled it by myself in XML).
Here is the save and minidump. Crash after pushing "end of turn" button.
 

Attachments

  • CTD.rar
    4.5 MB · Views: 63
I have CTDs because of air units. I had several CTDs during my game. But when I deleted city, which built air unit, I played normally then. Now again I have CTD and can't find solution (but I think, it air unit spawned somewhere). I use C2C svn with space colonization modmod and with manually enabled Atompunk tech (I enabled it by myself in XML).
Here is the save and minidump. Crash after pushing "end of turn" button.

You're saying it's that the air unit is being built, not that it's being made to do something right? This SOUNDS like a 'normal' graphic issue with the unit itself. What unit is it?
 
I deleted city which built early air unit (Blimp, I think). And then I have no ctds until now.
But now I don't know what causing CTD. I think, it is air unit somewhere. But I can't find it via world builder.
 
I deleted city which built early air unit (Blimp, I think). And then I have no ctds until now.
But now I don't know what causing CTD. I think, it is air unit somewhere. But I can't find it via world builder.

Ok. It will be solved in the order it was registered. Crashes do get priority and I've been working on one crash scenario all week already.
 
as advised i'm posting my problem here ...

the game crashes after ending the turn without any comment, no memory message, no entry in the logging files... nothing

version, latest svn
no special map

Resolved. Not PERFECTLY as I'd hoped, but I tracked it to the source as far as I could at least. Apparently, a unit had somehow taken on a duplicate ID in its plot list so when it was killed, the duplicate remained but the unit did not so the next loop through the units on that plot caused a crash when they found an ID that didn't have a unit attached to it. I suspect the jumptovalidplot function BUT that's only due to the unit being an exile and they always start off right away that way - but to defy that suspicion, there were many other exiles killed on that plot this round, not just this one. So more likely I simply don't have a clue how the problem initiates really, just that I've made it so that it cannot cause an issue nor persist if it ever does take place. Unfortunately, this MIGHT mean a slight slowdown in end turns but I don't think it should be noticeable.
 
Resolved. Not PERFECTLY as I'd hoped, but I tracked it to the source as far as I could at least. Apparently, a unit had somehow taken on a duplicate ID in its plot list so when it was killed, the duplicate remained but the unit did not so the next loop through the units on that plot caused a crash when they found an ID that didn't have a unit attached to it. I suspect the jumptovalidplot function BUT that's only due to the unit being an exile and they always start off right away that way - but to defy that suspicion, there were many other exiles killed on that plot this round, not just this one. So more likely I simply don't have a clue how the problem initiates really, just that I've made it so that it cannot cause an issue nor persist if it ever does take place. Unfortunately, this MIGHT mean a slight slowdown in end turns but I don't think it should be noticeable.

Newer versions of Windows are less forgiving about certain programming errors. Take a look at this thread that CTD took place in an similar scenario.
 
Now I know how to avoid (at least in most cases) the bug with super-big strength of units after adjusting their stats in WB (it exists in Size Matters, I am not sure if it exists without it).

If you enter WB and adjust unit properties, duplicate it etc., after the first promotion (or chosing a new promo after old one is obsolete) you end with a super-strong unit. The problem is that by default the field "Base strength" shows actual strength of the unit (after effects of Quality Up and promotions increasing strength) multiplied by 100. This way e.g. an Ambusher (Str 3) with three +Str (+1, +2, +2 IIRC, so +5 base Strength) and one Quality Up (Strength * 1.5, HP * 1.5) has 12 Strength and 150 HP. If you enter WB and look at the unit's properties, you will see Base Strength 1200. If you leave it, at the next promotion (and changing base Strength causes the unit to lose +Strength promotions, so it may happen immediately after leaving WB) you get an unit with 1200 base Strength - if you include Quality Up, you get 1800 Strength!

To avoid this, it is enough to put correct number in the "Base strength" field - in this case this is 3. I checked that and it works. I am not sure if it works for weakened units which start with e.g. 0.67 Strength - maybe one would need to put 1 instead.

S.
 
Newer versions of Windows are less forgiving about certain programming errors. Take a look at this thread that CTD took place in an similar scenario.
Now that was a fascinating thread! I may comment on this bug there. You should take a look at the rather simple change that this debug required. The thread you posted suggests that this may be an issue elsewhere throughout all vanilla based mods and when I found the issue I was left to wonder about that because the 'fix' was made in a function I do not suspect we've ever modified.

Mind you, the problem with the gamestate doesn't begin there. Where it actually begins is a bigger mystery. How would you get two duplicate unit IDs on the same plot in the first place? In this case, that may have even happened in rounds that took place before this save. It won't cause a problem until the unit then tries to move again (or gets killed which results in a move to an invalid plot.)

A little further investigation should be done on this still I think. Perhaps I should place an assert on CvPlot's add unit function to try and catch when a duplicate unit ID is added to the list.

The same thinking applies to the bug you were discussing in the other thread. A NULL pointer is only returned by the GetUnit function when cycling through a list, be it a plot list or a selection group list if something has gone horribly wrong. Even eliminating a unit mid-loop should still not allow such a problem since the 'next' pointer is re-established properly during the removal process. removeunit not being run while the unit is otherwise deleted would cause this problem. Most of these reasons would be painfully obvious immediately upon being incorrectly programmed.

But if you have a situation like this one where the unit was somehow added to the list twice, in this case, the plot list, you get a crash when the unit dies and the assumption there's only one instance of its ID makes it only removed once while the rest of the unit data is eliminated, making it apparent why you then get a crash the next time the unit data is asked for since the plot list is saying it should still exist given that only one of the two instances listed there was removed when the unit died.

Ultimately, my point is:
Code:
		if (pLoopUnit == NULL)
		{
			//	Koshling  - this really isn't a valid condition I think but it's been seen
			//	and until we can understand the underlying cause, mitigate by ignoring the
			//	invalid unit
			continue;
		}
as a fix is terribly flawed. As Nightinggale was pointing out, this is only addressing a symptom, not addressing the disease. There are so many places in the code where node lists are used that just patching up one instance, especially on plot node lists, of a null call will simply lead to another elsewhere. There's a reason this should be safe despite getUnit having a valid NULL response, and that is that if you get that NULL response, the problem is in a corruption in the node list, and that should never happen... unless there's a big problem elsewhere. If you don't fix that initial flawed gamestate creating error, you are going to set yourself up for later crashes and other larger problems down the line.

Only by following the debugger run very carefully was I able to figure out what happened as it was so far out of the box of any presumptions I might have had coming into the issue that I had to be forced to recognize what was actually taking place by watching the code process through every single step. Took me a while to narrow in on it. I would have thought it was impossible what was actually taking place.

In this case, the effort led to deeper analysis of what was happening here:
Code:
void CvPlot::removeUnit(CvUnit* pUnit, bool bUpdate)
{
	CLLNode<IDInfo>* pUnitNode;

	pUnitNode = headUnitNode();

	while (pUnitNode != NULL)
	{
		if (::getUnit(pUnitNode->m_data) == pUnit)
		{
			FAssertMsg(::getUnit(pUnitNode->m_data)->at(getX_INLINE(), getY_INLINE()), "The current unit instance is expected to be at getX_INLINE and getY_INLINE");
			m_units.deleteNode(pUnitNode);
at m_units.deleteNode(pUnitNode);

Code:
template <class tVARTYPE>
inline CLLNode<tVARTYPE>* CLinkList<tVARTYPE>::deleteNode(CLLNode<tVARTYPE>* pNode)
{
	CLLNode<tVARTYPE>* pPrevNode;
	CLLNode<tVARTYPE>* pNextNode;

	FAssert(pNode != NULL);

	pPrevNode = pNode->m_pPrev;
	pNextNode = pNode->m_pNext;

	if ((pPrevNode != NULL) && (pNextNode != NULL))
	{
		pPrevNode->m_pNext = pNextNode;
		pNextNode->m_pPrev = pPrevNode;
	}
When pNextNode also showed up as the same player and unit ID I did a double-take. Could there REALLY be a duplicate unit ID Node??? Sure enough there was. So the easiest 'fix' to correcting this was to back up and adjust removeUnit to:
Code:
	while (pUnitNode != NULL)
	{
		if (::getUnit(pUnitNode->m_data) == pUnit)
		{
			FAssertMsg(::getUnit(pUnitNode->m_data)->at(getX_INLINE(), getY_INLINE()), "The current unit instance is expected to be at getX_INLINE and getY_INLINE");
			m_units.deleteNode(pUnitNode);
			/*break;*///TBDebug: It has been found possible that a node may have somehow duplicated itself and thus the plot may have more than 1 of the node.  If these duplicates aren't destroyed, we get a crash as well.  Best case would be to somehow catch how this gets setup but in the debugging case that discovered this, it likely happened long beforehand.  Jumptonearestvalidplot is the suspected culprit as it was an exile that showed the issue.
		}
		//else
		//{
			pUnitNode = nextUnitNode(pUnitNode);
		//}
	}

I'd bet quite a bit that the same thing (a duplicate ID) was happening in the CVSelection remove Unit command in their mod.

Again though, now I need to figure out what happened to create this duplication!

That different OPs and compilers handle the issue differently isn't too surprising really. It's a matter of a difference in what would happen in the loop when it hits a null unit pointer ejecting from the while sequence rather than holding on and hitting a crash due to a null reference. Either that or the gamestate altering problem in their case was based on a random being generated that was varying on one system to the next.

Now I know how to avoid (at least in most cases) the bug with super-big strength of units after adjusting their stats in WB (it exists in Size Matters, I am not sure if it exists without it).

If you enter WB and adjust unit properties, duplicate it etc., after the first promotion (or chosing a new promo after old one is obsolete) you end with a super-strong unit. The problem is that by default the field "Base strength" shows actual strength of the unit (after effects of Quality Up and promotions increasing strength) multiplied by 100. This way e.g. an Ambusher (Str 3) with three +Str (+1, +2, +2 IIRC, so +5 base Strength) and one Quality Up (Base Str * 1.5, HP * 1.5) has 12 Strength and 150 HP. If you enter WB and look at the unit's properties, you will see Base Strength 1200. If you leave it, at the next promotion (and changing base Strength causes the unit to lose +Strength promotions, so it may happen immediately after leaving WB) you get an unit with 1200 base Strength - if you include Quality Up, you get 1800 Strength!

To avoid this, it is enough to put correct number in the "Base strength" field - in this case this is 3. I checked that and it works. I am not sure if it works for weakened units which start with e.g. 0.67 Strength - maybe one would need to put 1 instead.

S.
Ok, yeah, that makes total sense based on how the math works. You may want to post that in the tips and tricks section somewhere.
 
Even eliminating a unit mid-loop should still not allow such a problem since the 'next' pointer is re-established properly during the removal process. removeunit not being run while the unit is otherwise deleted would cause this problem. Most of these reasons would be painfully obvious immediately upon being incorrectly programmed.

That was exactly the mistake because the pUnitNode pointer is invalid as soon as the unit with the id that pointer is pointing to is removed from that plot for whatever reason during the loop. The memory segment that pointer points to is deallocated then the unit gets removed from the plot. From that point on that memory segment can be overwritten with other data and the pointer points to something else.

But this problem only became visible in newer windows versions because they have more aggressive memory handling to reduce the memory usage.

The fix that i posted in that thread works around the problem in that function. But there can be lots of other possible issues and duplicates can be possible too.
 
That was exactly the mistake because the pUnitNode pointer is invalid as soon as the unit with the id that pointer is pointing to is removed from that plot for whatever reason during the loop. The memory segment that pointer points to is deallocated then the unit gets removed from the plot. From that point on that memory segment can be overwritten with other data and the pointer points to something else.

But this problem only became visible in newer windows versions because they have more aggressive memory handling to reduce the memory usage.

The fix that i posted in that thread works around the problem in that function. But there can be lots of other possible issues and duplicates can be possible too.

Theory:
Given that the node list is readjusted properly when a unit is removed from the list, the problem is not so much that a unit was removed mid-loop, but that the loop was programmed incorrectly, which you proved with your adjustment. However, you did a very interesting trick to proxy the loop to get around it. But could it be as simple as this...

The original loop code started in this manner:
Code:
	while (pUnitNode != NULL)
	{
		pLoopUnit = ::getUnit(pUnitNode->m_data);
		pUnitNode = nextUnitNode(pUnitNode);
By putting this line : pUnitNode = nextUnitNode(pUnitNode);
immediately after : pLoopUnit = ::getUnit(pUnitNode->m_data);
we've already switched pUnitNode over to the NEXT node before the actual processing in the loop takes place.

So could their error have been more simply addressed by placing 'pUnitNode = nextUnitNode(pUnitNode);' at the END of the While brackets, AFTER the processing has already taken place?

I'm almost sure I've had to make this change a time or two already to fix some previous crashes and while it made sense at the time, I didn't bring it up in the forums because I would have assumed it was a pretty basic issue.

Perhaps if I see another crash like that I'll try to address it in that way before freaking out over it otherwise.
 
For awhile now, in Python, we have been coding to cover for the call to be for a NULL whatever. We check at the start of each function that all the parameters we are going to be working with are not NULL before we do anything. If any are we do nothing and return.

edit naturally we are only doing that in new versions of modules or ones we edit. Old ones are probably still coded assuming that all objects are not NULL.
 
For awhile now, in Python, we have been coding to cover for the call to be for a NULL whatever. We check at the start of each function that all the parameters we are going to be working with are not NULL before we do anything. If any are we do nothing and return.

edit naturally we are only doing that in new versions of modules or ones we edit. Old ones are probably still coded assuming that all objects are not NULL.

You gotta do that most places but its really intended that in a unit linked list and in a 'where' loop such as this, you shouldn't have to or you have a problem where the linked list doesn't match the rest of the game state (aka a unit exists or doesn't exist when and/or where it should or shouldn't.)

This is certainly a unique case from the usual if (xstuff != NULL) then proceed fine that happens in thousands of places in the code for good reason.
 
Theory:
Given that the node list is readjusted properly when a unit is removed from the list, the problem is not so much that a unit was removed mid-loop, but that the loop was programmed incorrectly, which you proved with your adjustment. However, you did a very interesting trick to proxy the loop to get around it. But could it be as simple as this...

The original loop code started in this manner:
Code:
	while (pUnitNode != NULL)
	{
		pLoopUnit = ::getUnit(pUnitNode->m_data);
		pUnitNode = nextUnitNode(pUnitNode);
By putting this line : pUnitNode = nextUnitNode(pUnitNode);
immediately after : pLoopUnit = ::getUnit(pUnitNode->m_data);
we've already switched pUnitNode over to the NEXT node before the actual processing in the loop takes place.

So could their error have been more simply addressed by placing 'pUnitNode = nextUnitNode(pUnitNode);' at the END of the While brackets, AFTER the processing has already taken place?

I'm almost sure I've had to make this change a time or two already to fix some previous crashes and while it made sense at the time, I didn't bring it up in the forums because I would have assumed it was a pretty basic issue.

Perhaps if I see another crash like that I'll try to address it in that way before freaking out over it otherwise.

To handle all possible scenarios the fix that i posted there is the only working solution. It doesn't make a difference where you put the 'pUnitNode = nextUnitNode(pUnitNode);' because wherever it is something can always go wrong if units are removed from the plot in the while loop. Removing the current unit is a problem if the nextUnitNode call is at the end of the loop and removing the next unit is a problem if it is where it was.
 
Sorry if i ask a dumb question. I already searched the description...
I re-installed C2C. I installed the SVN Program. I downloaded the files and get a file with a green symbol. I exported the files there to another diretcory and then copied it to my C2C - but then i get only errors for missing files. What did i do wrong?

Thanks

Moderator Action: Moved as requested by SO.
 
Sorry if i ask a dumb question. I already searched the description...
I re-installed C2C. I installed the SVN Program. I downloaded the files and get a file with a green symbol. I exported the files there to another diretcory and then copied it to my C2C - but then i get only errors for missing files. What did i do wrong?

Thanks

Is your BtS/Mods folder have Caveman2Cosmos in it or did you name it C2C? It must be Caveman2Cosmos.

And if you used the SVN why did you re-install an old Caveman2Cosmos before doing the Export?

Delete all contents of the BtS/Mods/Caveman2Cosmos folder and then copy paste the appropriate folders and files into it. Starting with Assets. Do not copy the .svn folder into Caveman2Cosmos folder.

Finally any further trouble with your installation Please use the Bug and Crash thread to report it. This thread is for the Modders documentation of changes.

Thanks
JosEPh
 
Removing the current unit is a problem if the nextUnitNode call is at the end of the loop
Would that not just come up with a NULL response and thus break the While loop rather than trying to call to unit data on a NULL pointer thereafter?

BTW, I ask this as a question to help myself understand but otherwise, I have to say how impressed I am by the depth of your solution overall. It's extremely innovative. It also begs the question if in-general, we should be applying that sort of loop setup in all linked list unit loops throughout the code? I suppose it's not necessary unless the list can change due to the processing IN the loop...
 
@T-brd,
The EoT wait after turn 200 on any GS is getting Bad. at turn 200 it's already over 1min 15 sec. But by turn 260 it's over 3+ mins, almost triple the EoT wait in 60 turns! This is getting bad.

I reduced the Num City Pipeline Multithread value from 4 down to 2 (Afforess A_New_DawnGlobalDefines) and it jumped to over 5+ min for EoT wait. I put it back to 4 asap! I've not see it this bad since last year's Ren Era game when the seige rams were exploding everywhere.

Save game attached so you can load it up and see what I mean.

JosEPh

You said a bit ago that your turn times improved quite a bit. I still took a look at a debug run on this and found nothing too out of the ordinary or too greatly delaying. Nothing special really. It did point out to me an interesting phenomenon about units losing promos though and how I could solve it. When a unit is given a promotion as free on its own unit data, even if it isn't enabled by the unit's combat classes, which might be cause for investigation into the combat class access structure for that promotion, it still shouldn't lose that promotion - it was assigned specifically to that unit for free right?

So seeing how many LE units were losing a promotion they were being given for free, and remembering how there are a huge # of cultural units losing their free promos, I figured I'd make it so that unitcombat causes to lose a promo don't count IF that unit gets the promotion for free on its unit xml. Should solve a few things immediately there.

As for speedups, I do have a few ideas but it would be much later development cycle stuff to try out. I'm glad earlier fixes solved the great delays you were having here though because just running a check as I did here would not have produced much in the way of results.

I'll have to do some real time profiling before we do a release and see if there's some other standout places I can speed up some though.
 
Would that not just come up with a NULL response and thus break the While loop rather than trying to call to unit data on a NULL pointer thereafter?

BTW, I ask this as a question to help myself understand but otherwise, I have to say how impressed I am by the depth of your solution overall. It's extremely innovative. It also begs the question if in-general, we should be applying that sort of loop setup in all linked list unit loops throughout the code? I suppose it's not necessary unless the list can change due to the processing IN the loop...

The problem is that the data that pointer points to could be changed to anything. That makes it possible that the GetUnit call returns units which are on other plots or even units from other players. In the other thread there was a problem with strange unit movements after the null check was added. Everything is possible really that is why the null check isn't enough.
 
First of all, English is not my first language, I defend myself more bad than good, with the help of google translate :crazyeye:

I can't load my savegame from yesterday. Any attempt ends with a CTD after 10-15 minutes of loading.

Last night I stopped playing and Save My game without any problem. Although the last days the game crash to the desktop from time to time :(

A summary of the game options would be:
- C2C_PerfectWorld2df
- Gigantic Map
- Eternity Speed
- Turn 2100, approximately.
- About 15 AI players, maybe more, beacuse the barbarians transform into civs.

PC specs:
- CPU: i5 2500k (no OC)
- AMD 7970
- 8GB RAM
- Windows 10 x64

I post here but the game version is V36 SVN 9230 from this thread:
http://forums.civfanatics.com/showthread.php?t=559944

The minidump file say "The thread tried to read from or write to a virtual address for which it does not have the appropriate access".

Perhaps the map is too large and the game has reached the limit of the mod?

Apart from that I have seen in the previous thread there's a new version (9265).

Maybe the only option is to start with a less ambitious size and the new version?... But the game has taken me some time and I do not like to lose it. :(

If there is a trick you can do in Windows 10 (I upgraded recently) that can solve this problem, it is very welcome.

I upload the savegame to dropbox:

https://www.dropbox.com/s/j5h3kxeh8llmesz/Bas_Imm_Gig_C2C_Ete_Pre_4760-BC_Tem_Low_Aug-02-2016_01-18-28.CivBeyondSwordSave?dl=0
My deepest apologies but I cannot fix this crash since the save has been made incompatible with the current assets at some point and cannot be loaded without crashing as a result. The source of that crash is not a bug but a change that breaks compatibility. I COULD try to back up my assets to that revision and try to fix it from there but I serve the mod's needs, and spending those kinds of hours on an out of date issue is not going to be an efficient use of my time towards that primary goal.

If I ever get a moment where I have nothing pressing to do, I'll take the time and post you a special dll once fixed but for now I must press on. Sorry.
 
Status
Not open for further replies.
Top Bottom