Strange hang on turn end in midgame using #8729 build

I'll take a look. Putting it on the pre-release bugfix list, though it's low on the totem pole at the moment since there are other big issues I'm sorting out. (So if someone wants to jump in on this one, have at it!)
 
@Thunderbrd Thank you! So you think you know what the actual issue is? It's the first time I've experienced a hang like that, with CPU utilization going up and 3 identical-looking threads appearing.
 
No idea. It might be one that's beyond me. But I can certainly take a look. It sounds like a standard infinite loop taking place somewhere.


EDIT: Ok, I didn't get a hang on this save.

From what you explained, you may have an issue with the number of threads your system can run being at odds with the # of threads the xml is set to use. You may want to go into A_New_Dawn_GlobalDefines.xml and change the NUM_CITY_PIPELINE_THREADS to a lower amount than the default of 4.

That's my best guess.
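
For anyone who would rather script that change than hand-edit the file, here is a minimal Python sketch. It assumes the file follows the usual Civ4 GlobalDefines layout (a Define element holding DefineName and iDefineIntVal); the sample string and the helper names local and set_int_define are hypothetical, and the real A_New_Dawn_GlobalDefines.xml may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the usual Civ4 GlobalDefines layout.
# The real A_New_Dawn_GlobalDefines.xml may differ.
SAMPLE = """<Civ4Defines>
    <Define>
        <DefineName>NUM_CITY_PIPELINE_THREADS</DefineName>
        <iDefineIntVal>4</iDefineIntVal>
    </Define>
</Civ4Defines>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def set_int_define(root, name, value):
    """Set the integer value of the named define; return True if it was found."""
    for define in root.iter():
        if local(define.tag) != "Define":
            continue
        fields = {local(child.tag): child for child in define}
        name_el = fields.get("DefineName")
        value_el = fields.get("iDefineIntVal")
        if name_el is None or value_el is None:
            continue
        if (name_el.text or "").strip() == name:
            value_el.text = str(value)
            return True
    return False


root = ET.fromstring(SAMPLE)
changed = set_int_define(root, "NUM_CITY_PIPELINE_THREADS", 2)
print(changed)
print(ET.tostring(root, encoding="unicode"))
```

To use it on the real file you would parse it with ET.parse, call set_int_define on the root, and write the tree back out. Keep a backup copy first, since ElementTree does not preserve comments by default.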
 
@Thunderbrd Thank you for looking into this! It's surprising that it does not hang on your end.

Even though I have a quite capable PC (Core i7 with 4 cores and hyperthreading, and 24 GB RAM), I did as you suggested and set NUM_CITY_PIPELINE_THREADS to 2. As a result, only 1 thread (instead of 3) was created, but the outcome was the same: the turn never completed, and I did wait quite a while. Here's a momentary call stack of that thread, in case it's in any way revealing:

Code:
0, ntoskrnl.exe!KeSetCoalescableTimer+0x716
1, ntoskrnl.exe!KeWaitForMultipleObjects+0x15e6
2, ntoskrnl.exe!KeWaitForMultipleObjects+0xd38
3, ntoskrnl.exe!KeWaitForSingleObject+0x385
4, ntoskrnl.exe!KeTestAlertThread+0x1103
5, ntoskrnl.exe!KeSetCoalescableTimer+0x800
6, ntoskrnl.exe!KeSynchronizeExecution+0x2543
7, CvGameCoreDLL.dll!CvMessageData::createMessage+0xabf4
8, 0x4883dc093392718
9, 0x93392718048841a5
10, CvGameCoreDLL.dll!CvMessageData::createMessage+0xa940
11, 0x7c3416f8042cfeac
12, 0x4ca1a80042cff14
13, CvGameCoreDLL.dll!CvStatString::`vftable'+0x30b08
14, 0x494ce4a042cff4c
15, 0x45e1191a258e578
16, 0x93392718042cfeeb
17, 0x9339271800000a58
18, 0x19f35c
19, 0x45e1f60042cff4c
20, 0x45e105c92b908a8
21, 0x19f35c04883dc0
22, 0x42cff4004883d01
23, 0xffffffff04ca1f18
24, 0x7c3494f6045e121d
25, 0x9292030092921698
26, 0x9339271804883dc0
27, 0x42cff240494ce40
28, 0x4ca1f38042cff70
29, 0x42cff8000000001
30, 0x19f35c7c349565
31, 0x7c3494f67c3494f6
32, 0x92920300
33, 0x42cff58
34, 0x7c34240d042cffcc
35, msvcr71.dll!__non_rtti_object::`vftable'+0x4b0
36, 0x75513744042cff94
37, 0x7551372092920300
38, 0x42cffdc312400cd
39, 0x92920300778ca064
40, 0xd0bca6d
41, 0x9292030000000000
42, CvGameCoreDLL.dll!CvMessageData::createMessage+0xabf4 (No unwind info)
43, CvGameCoreDLL.dll!CvPopupInfo::setData3+0xf1a (No unwind info)
44, msvcr71.dll!endthreadex+0xa0 (No unwind info)
45, kernel32.dll!BaseThreadInitThunk+0x24 (No unwind info)
46, ntdll.dll!RtlSetCurrentTransaction+0xd4 (No unwind info)
47, ntdll.dll!RtlSetCurrentTransaction+0x9f (No unwind info)

When I was refreshing the call stack in Process Hacker, it did look different but always there were "CvGameCoreDLL.dll!CvMessageData::createMessage" items.
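
To compare several refreshes more systematically, something like the following Python sketch could tally which module!symbol frames keep recurring. The regular expression and the count_symbols helper are my own assumptions about the dump format shown above, not anything Process Hacker provides. Also note that without debug symbols a name with a large offset (such as createMessage+0xabf4) may just be the nearest exported symbol rather than the function actually running.

```python
import re
from collections import Counter

# Matches frames such as "7, CvGameCoreDLL.dll!CvMessageData::createMessage+0xabf4".
# Frames that are bare addresses (no "module!symbol") are skipped.
FRAME = re.compile(r"^\s*\d+,\s*(?P<module>[^!\s]+)!(?P<symbol>.+?)\+0x[0-9a-fA-F]+")


def count_symbols(dump, module=None):
    """Count (module, symbol) pairs in a pasted call-stack dump."""
    counts = Counter()
    for line in dump.splitlines():
        m = FRAME.match(line)
        if not m:
            continue
        if module is not None and m.group("module").lower() != module.lower():
            continue
        counts[(m.group("module"), m.group("symbol"))] += 1
    return counts


# A few frames copied from the dump above
sample = """\
6, ntoskrnl.exe!KeSynchronizeExecution+0x2543
7, CvGameCoreDLL.dll!CvMessageData::createMessage+0xabf4
8, 0x4883dc093392718
10, CvGameCoreDLL.dll!CvMessageData::createMessage+0xa940
43, CvGameCoreDLL.dll!CvPopupInfo::setData3+0xf1a (No unwind info)
"""

counts = count_symbols(sample, module="CvGameCoreDLL.dll")
for (mod, sym), n in counts.most_common():
    print(n, mod, sym)
```

Feeding it the text of each refresh and comparing the counters would show which frames are always present and which only appear occasionally.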

Let me know if you need any logs that may be of use. I could also run the game with the VS debugger attached... I just don't know where to set breakpoints or what to look for.
 
You shouldn't have to bother to set breakpoints. Just rename the CvGameCoreDLL.dll in the assets to CvGameCoreDLL.dll.core and rename CvGameCoreDLL.dll.debug to CvGameCoreDLL.dll then run the game. Open VS and attach to process. You might find where you're having trouble as it should show up when the game goes to crash.
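
The rename itself can be sketched in Python as below. The file names come from the instructions above; the swap_in_debug_dll helper is hypothetical, and the demonstration runs against dummy files in a temporary directory rather than the real assets folder, whose path depends on your install.

```python
from pathlib import Path
import tempfile


def swap_in_debug_dll(assets_dir):
    """Park the release DLL as .core and put the .debug build in its place."""
    assets = Path(assets_dir)
    release = assets / "CvGameCoreDLL.dll"
    debug = assets / "CvGameCoreDLL.dll.debug"
    parked = assets / "CvGameCoreDLL.dll.core"
    if not release.exists() or not debug.exists():
        raise FileNotFoundError(
            "expected both CvGameCoreDLL.dll and CvGameCoreDLL.dll.debug"
        )
    if parked.exists():
        raise FileExistsError(
            "CvGameCoreDLL.dll.core already exists; the swap may already be done"
        )
    release.rename(parked)   # CvGameCoreDLL.dll -> CvGameCoreDLL.dll.core
    debug.rename(release)    # CvGameCoreDLL.dll.debug -> CvGameCoreDLL.dll
    return parked


# Demonstration with dummy files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "CvGameCoreDLL.dll").write_text("release")
    (Path(tmp) / "CvGameCoreDLL.dll.debug").write_text("debug")
    swap_in_debug_dll(tmp)
    active = (Path(tmp) / "CvGameCoreDLL.dll").read_text()
    parked = (Path(tmp) / "CvGameCoreDLL.dll.core").read_text()
    print(active, parked)
```

Reversing the two renames restores the regular DLL once you are done debugging.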

But if it doesn't do this while I run it here I'm not sure what more I can do. Multi-threading is a subject I stay far clear of. I don't work with the code that generates or works with that at all. You could try to change the number of city threads to 1 and turn off the other two threading bools and see if running with a single processor at least gets you past the round. (Search for the term THREAD in that same global file to find the other two global bools that set multi-processing.)
 
@Thunderbrd Thank you for the suggestions. I set USE_MULTIPLE_THREADS_SPAWNING and USE_MULTIPLE_THREADS_PROPERTY_SOLVER to 0, and set NUM_CITY_PIPELINE_THREADS to 1. As a result, no extra threads were created on turn end. However, it did not solve the issue. I waited for 1 hour, and it still would not finish the turn even though only the 1 main thread was effectively active. Here's a sample of its call stack:
Code:
0, ntoskrnl.exe!KeSetCoalescableTimer+0x716
1, ntoskrnl.exe!KeWaitForMultipleObjects+0x15e6
2, ntoskrnl.exe!KeWaitForMultipleObjects+0xd38
3, ntoskrnl.exe!KeWaitForSingleObject+0x385
4, ntoskrnl.exe!KeTestAlertThread+0x1103
5, ntoskrnl.exe!KeSetCoalescableTimer+0x800
6, ntoskrnl.exe!KeSynchronizeExecution+0x2543
7, python24.dll!PyString_Eq+0xb
8, 0xdf057001e052353
9, 0xdf1ede00df1ede0
10, 0xdf060300df08630
11, 0x3b44a3000000002
12, 0xdf060301e052617
13, 0x5b86c22d0df1ede0
14, 0x100000000
15, 0xdf060301e07f441
16, 0xd3aa2ec0df1ede0
17, 0xdf1ede01e19d1b0
18, 0xd3aa2ec
19, 0x1e07f0eb1e19d1b0
20, 0xdf1ede00d3aa2ec
21, 0xdf651b000000000
22, 0x1e019a9800000000
23, 0xdf1ede00d3aa2ec
24, 0x6dacd8c80d3aa2ec
25, 0xd3aa2ec00000000
26, 0x1e019b150d3aa2ec
27, 0xdf651b00d3aa2ec
28, 0x4b9a425000003e8
29, 0xdf651b00d3aa2ec
30, 0x100079630dc80120
31, 0xd3aa2ec0d3aa2ec
32, 0xe0bc8cc0019e9a4
33, 0x100079a3037486bc
34, 0xd3aa2ec04b7d5a5
35, 0x6dacd8c803746ea4
36, 0x4c7dc6504ee1f00
37, 0x6dacd8c80d3aa2ec
38, 0xe0c162800000002
39, 0xd3aa2ec00000000
40, 0x100146900019e9ac
41, 0xd3aa2e000000001
42, 0x6dacd8c81000ea8d
43, 0x19ea9c00000000
44, 0x19ea7c0019ea9c
45, 0x6dacd8c86dacd8c8
46, 0xe0c1628
47, 0xa24b605400000002
48, 0x200000002
49, 0x19ea440e0c1628
50, boost_python-vc71-mt-1_32.dll!boost::python::detail::wrapper_base::get_override+0x17d8
51, 0x6dacd8c81000eb45
52, 0x19ea8400000000
53, 0x19ea1b100136c1
54, 0xd00019ea9c
55, 0x200000000
56, 0x1a00000000
57, 0xa24ba23800000000
58, 0x778adab80019ea40
59, 0x19ea8000000000
60, 0x100179e80019ea70
61, 0x10013766ffffffff
62, 0x19ea840019ea9c
63, 0x6dacd8c8
64, 0x19ea5494a85bd8
65, 0x19f2700019ea54
66, 0x110017a08
67, 0x1000eb8c00000000
68, 0x19ea9c1000c7b0
69, boost_python-vc71-mt-1_32.dll!boost::python::objects::function::call+0x300
70, 0x1000ebd00019ea84
71, 0xe0c16280019eab4
72, 0x6dacd8c8
73, 0x19eb50
74, 0x1e0193cc00000000
75, 0x6dacd8c80e0c1628
76, python24.dll!PyString_Eq+0xb (No unwind info)
77, python24.dll!PyWrapper_New+0x753 (No unwind info)
78, python24.dll!PyDict_GetItem+0x57 (No unwind info)
79, python24.dll!PyObject_GenericGetAttr+0xe1 (No unwind info)
80, python24.dll!PyObject_GetAttr+0x7b (No unwind info)
81, python24.dll!PyObject_CallFunctionObjArgs+0x388 (No unwind info)

Strangely, it seems to be doing some sort of thread synchronization in any case, if I am interpreting these call stacks correctly. Not sure if that is the culprit, though.

I will run the debug version attached to VS as you suggested, though because the game does not crash (it just hangs), I'm not sure how it might help.
 
I ran the debug dll with "single thread" xml config options still in place, attached to VS.

Amazingly, I was able to finish the turn, but I was presented with quite a few Assertion Failed messages. Below are all of them with my answers (Ignore Once for the first few, then Ignore Always).

Code:
Assert Failed

File:  CvPlayer.cpp
Line:  18777
SVN-Rev:  8706
Expression:  getBuildingClassCount(eIndex) <= (GC.getBuildingClassInfo(eIndex).getMaxPlayerInstances() + GC.getBuildingClassInfo(eIndex).getExtraPlayerInstances())
Message:  BuildingClassCount is expected to be less than or match the number of max player instances plus extra player instances

----------------------------------------------------------
[Ignore Once]

Assert Failed

File:  CyPlayer.cpp
Line:  2454
SVN-Rev:  8706
Expression:  m_pPlayer->getID() != (PlayerTypes)ePlayer
Message:  shouldn't call this function on ourselves (Python)

----------------------------------------------------------
[Ignore Once]

Assert Failed

File:  CvPlayerAI.cpp
Line:  498
SVN-Rev:  8706
Expression:  false
Message:  UnitAI miscount

----------------------------------------------------------
[Ignore Once]

Assert Failed

File:  CvPlayerAI.cpp
Line:  498
SVN-Rev:  8706
Expression:  false
Message:  UnitAI miscount

----------------------------------------------------------
[Ignore Once]

Assert Failed

File:  CvPlayerAI.cpp
Line:  498
SVN-Rev:  8706
Expression:  false
Message:  UnitAI miscount

----------------------------------------------------------
[Ignore Always]

Assert Failed

File:  CvPlayer.cpp
Line:  18777
SVN-Rev:  8706
Expression:  getBuildingClassCount(eIndex) <= (GC.getBuildingClassInfo(eIndex).getMaxPlayerInstances() + GC.getBuildingClassInfo(eIndex).getExtraPlayerInstances())
Message:  BuildingClassCount is expected to be less than or match the number of max player instances plus extra player instances

----------------------------------------------------------


[Ignore Always]

Assert Failed

File:  CvSelectionGroup.cpp
Line:  4539
SVN-Rev:  8706
Expression:  false
Message:  Pathing failed to apparently recahable city

----------------------------------------------------------

[Ignore Always]

Assert Failed

File:  CvCityAI.cpp
Line:  17642
SVN-Rev:  8706
Expression:  AI_buildingValueThresholdOriginal(eBuilding, iFocusFlags, iThreshold, bMaximizeFlaggedValue) == 0
Message:  

----------------------------------------------------------
[Ignore Always]

Assert Failed

File:  CvArea.cpp
Line:  1139
SVN-Rev:  8706
Expression:  getNumAIUnits(eIndex1, eIndex2) >= 0
Message:  

----------------------------------------------------------
[Ignore Always]

Assert Failed

File:  CvPlayer.cpp
Line:  6768
SVN-Rev:  8706
Expression:  iNumSelectionGroups <= iNumUnits
Message:  The number of Units is expected not to exceed the number of Selection Groups

----------------------------------------------------------
[Ignore Always]

Assert Failed

File:  CvPathGenerator.cpp
Line:  2922
SVN-Rev:  8706
Expression:  !newNode->m_bIsKnownRoute || node->m_iCostTo + iEdgeCost == newNode->m_iCostTo
Message:  

----------------------------------------------------------

So, apparently in the debug version I was able to ignore something which was causing a hang in the normal version.
 
Those asserts are known.

The first comes from some unknown way that global buildings can be built more than once (and thus exist more than once, which is what the assert has detected and is alerting on).

The second is a python problem somewhere. Further investigation might help to determine where but it comes up in many games and is ignorable.

The third is a known, persistent bug that Koshling and I have been looking at for about 3 or 4 years without being sure what it is really trying to say. It probably has something to do with units with multiple AI settings in C2C.

Ignoring any of those would not get you around a problem. They are only warnings that something might be incorrect somewhere. All are common, long-outstanding bugs that aren't causing much in the way of known trouble.
 
The second indicates that somewhere in the Python a call is being made to something in the DLL that the DLL does not handle. It probably returns a null or invalid value to Python alongside the assert, and the Python knows what to do with the null or invalid value. Most likely the designers of that connection never defined what should happen at that point, so the programmers at both ends came up with a workaround of some sort.
 
@Thunderbrd Thank you for your help! After making it through that first turn, I switched back to the regular version and re-enabled "multithreading", and was then able to play without problems. It seems that whatever condition caused the hang existed only in that save game. Too bad we were not able to identify it, but hopefully this does not happen often.
 
I thought it might be something we could simply bypass. It certainly was strange that we couldn't find it, and it's possible the debug DLL doesn't have the problem... which is very unusual, to say the least.
 