DLL development discussions

IIRC Koshling tried this early last year (at least the part about moving the DLL to C++ 11.0) and found that without a LOT of work it wasn't going to go anywhere.

The shim DLL might work, but if I'm understanding correctly, it would have to take everything the EXE sends it, deallocate it, reallocate it in a form the C++ 11 DLL is OK with, and then forward it through another interface, so I can see performance issues there. Then there is the additional issue of how to handle Python calls to and from the DLL. Another question is whether the C++ 11 debugger chokes when you go and look at something happening in the shim.

The benefits would probably be worth it, though. Last year I tested one of alberts' DLLs compiled with the Intel C++ compiler, and turns went almost twice as fast.
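To make the shim overhead concrete, here is a minimal sketch. It assumes the EXE side can hand over plain data (a pointer plus a count) instead of an STL container; IntSpan, sumPlotValuesNew and shimSumPlotValues are hypothetical names, not part of the SDK. Every forwarded call pays for one copy into memory owned by the new runtime:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical plain-data view: no STL types cross the DLL boundary.
struct IntSpan
{
	const int* data;
	std::size_t count;
};

// Hypothetical implementation inside the newer-toolset DLL; it wants a
// std::vector allocated by its own runtime.
int sumPlotValuesNew(const std::vector<int>& values)
{
	int total = 0;
	for (std::size_t i = 0; i < values.size(); ++i)
	{
		total += values[i];
	}
	return total;
}

// Shim entry point: copies the caller's data into a container owned by the
// new runtime, then forwards. This copy is the per-call overhead.
int shimSumPlotValues(IntSpan span)
{
	std::vector<int> local(span.data, span.data + span.count);
	return sumPlotValuesNew(local);
}
```

The cost grows with the amount of data per call, so how much it matters depends on how often such calls actually cross the boundary.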
There are actually not that many calls across the DLL boundary that are problematic. The main issues are the few that use vectors or that pass or return CvString and similar classes.
It seems that some of the programmers at Firaxis were aware of the issues with passing STL classes across DLL boundaries and some were not.
The calls into and out of the DLL are generally not performance-critical, because the AI turns happen almost entirely inside the DLL.

Regarding Python: the calls from Python to the DLL do not touch the EXE at all, but currently they do go through a Boost Python DLL. In my tests I bypassed that by statically linking a newer version of Boost Python, which then communicates directly with Python24.dll.
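For the few calls that pass or return CvString, one common way to keep each allocation on one side of the boundary is to let the caller own the buffer. A minimal sketch, assuming a plain function interface; getUnitNameBuf and callerGetUnitName are hypothetical names, and the name table is a stand-in for a real lookup:

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical boundary-safe accessor: the caller owns the buffer, so no
// std::string/CvString is allocated by one runtime and freed by the other.
// Copies at most bufSize - 1 characters plus '\0' and returns the full length.
std::size_t getUnitNameBuf(int iUnit, char* buf, std::size_t bufSize)
{
	// Stand-in for the real lookup (hypothetical data).
	static const char* const kNames[] = { "Settler", "Warrior", "Worker" };
	const char* name = (iUnit >= 0 && iUnit < 3) ? kNames[iUnit] : "";
	std::size_t len = std::strlen(name);
	if (buf != NULL && bufSize > 0)
	{
		std::size_t n = (len < bufSize - 1) ? len : bufSize - 1;
		std::memcpy(buf, name, n);
		buf[n] = '\0';
	}
	return len;
}

// Caller side (e.g. the EXE): asks for the length first, then builds its own
// std::string, so the string's memory belongs to the caller's runtime.
std::string callerGetUnitName(int iUnit)
{
	std::size_t len = getUnitNameBuf(iUnit, NULL, 0);
	std::vector<char> tmp(len + 1);
	getUnitNameBuf(iUnit, &tmp[0], tmp.size());
	return std::string(&tmp[0], len);
}
```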
 
Can I make 512 civilizations instead of 50? Then I could make the script of my dreams on a huge map of medieval Europe.
 
Can you post the changes that you made to Boost Python? I want to do some testing with newer versions.
 
We have a rare CTD when a new city is founded. I've seen it happen a couple of times in the last 3 months, and it was never really reproducible. What could cause a CTD on the marked line?

Code:
void CvCity::FlushCanConstructCache(BuildingTypes eBuilding)
{
	//OutputDebugString(CvString::format("[%d] FlushCanConstructCache (%d), workitem priority = %08lx\n", GetCurrentThreadId(), eBuilding, (m_workItem == NULL ? -1 : m_workItem->GetPriority())).c_str());
	EnterCriticalSection(&m_cCanConstructCacheSection);

	if ( eBuilding == NO_BUILDING )
	{
		SAFE_DELETE(m_bCanConstruct);	// <-- marked line: the CTD happens here
	}
	else if ( m_bCanConstruct != NULL )
	{
		(*m_bCanConstruct).erase(eBuilding);
	}

	LeaveCriticalSection(&m_cCanConstructCacheSection);
}
 
The likely cause is that the m_bCanConstruct pointer is not initialized to NULL in the CvCity constructor.
 
m_bCanConstruct is a mutable std::map<int,bool>*. Should I initialize it to NULL or to new std::map<int,bool>()?
It doesn't crash every time, and when it does crash, I load the save from the previous turn and the city gets founded on the same plot by the same settler... without a crash.
 
Initialize it to NULL. That way SAFE_DELETE will not attempt to delete it (which is bad if the pointer holds a random non-NULL value left over in memory from before the city instance was constructed).
Some other piece of the code will create the map once it is needed.
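A minimal sketch of that advice, using a simplified stand-in for CvCity: the pointer starts as NULL in the constructor and the map is created lazily on first use. std::mutex replaces the Win32 CRITICAL_SECTION so the example runs anywhere, and the SAFE_DELETE definition below is an assumed equivalent of the SDK macro, not copied from it:

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// Assumed equivalent of the SDK macro; only defined if nothing else did.
#ifndef SAFE_DELETE
#define SAFE_DELETE(p) { if (p) { delete (p); (p) = NULL; } }
#endif

const int kNoBuilding = -1; // stands in for NO_BUILDING

class CityCacheSketch
{
public:
	// The fix: the pointer never holds leftover garbage.
	CityCacheSketch() : m_bCanConstruct(NULL) {}
	~CityCacheSketch() { SAFE_DELETE(m_bCanConstruct); }

	// Creates the map lazily the first time a result is cached.
	void cacheCanConstruct(int eBuilding, bool bValue) const
	{
		std::lock_guard<std::mutex> lock(m_cacheMutex);
		if (m_bCanConstruct == NULL)
		{
			m_bCanConstruct = new std::map<int, bool>();
		}
		(*m_bCanConstruct)[eBuilding] = bValue;
	}

	// Same shape as FlushCanConstructCache: safe even on a fresh object.
	void flushCanConstructCache(int eBuilding)
	{
		std::lock_guard<std::mutex> lock(m_cacheMutex);
		if (eBuilding == kNoBuilding)
		{
			SAFE_DELETE(m_bCanConstruct);
		}
		else if (m_bCanConstruct != NULL)
		{
			m_bCanConstruct->erase(eBuilding);
		}
	}

	std::size_t cachedCount() const
	{
		std::lock_guard<std::mutex> lock(m_cacheMutex);
		return (m_bCanConstruct == NULL) ? 0 : m_bCanConstruct->size();
	}

private:
	mutable std::map<int, bool>* m_bCanConstruct;
	mutable std::mutex m_cacheMutex;
};
```

Without the NULL in the constructor, the first flush with NO_BUILDING would call delete on whatever value happened to be in that memory, which fits a crash that only shows up occasionally and disappears on reload.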
 
I'm wondering if I can trust the results I'm getting by applying a new trick I've figured out with a minidump file.

If I can... this is a problem I cannot begin to hope to be able to self-determine a solution for and will need some advice. I believe we've seen this sort of thing before but I can't fathom how it happens nor how to determine that nor how to fix it.

When I pull up these minidumps I've been getting (which don't reproduce for me, but maybe that's BECAUSE I'M RUNNING THE DEBUGGER TO CHECK THEM), I get the following call stack:
Code:
	73616220()	
>	CvGameCoreDLL.dll!CvUnitAI::AI_update()  Line 430	C++
 	CvGameCoreDLL.dll!CvSelectionGroupAI::AI_update()  Line 293 + 0x7 bytes	C++
 	CvGameCoreDLL.dll!CvPlayerAI::AI_unitUpdate()  Line 1909 + 0xb bytes	C++
My presumption is that the last entry there (the bare address 73616220 at the top of the stack) is in the EXE?

This stack only shows up when I open the minidump with both the DLL and PDB files in the same folder as the minidump.

When I run it from the Final Release or Debug build folder, which contains all the build output and not JUST the PDB and DLL, I get a very different call stack that runs through a LOT of Python calls, then ends the same way but never gets back far enough to show anything happening in the DLL.

But here's the trick I discovered and finally tried tonight: when I double-click the last DLL call in the stack, it prompts me to locate the source files on my computer. Once I point it at the source folder, it immediately opens that file and shows where execution last was in the function:
Code:
#ifdef _DEBUG
	if ( NULL != getGroup() && !isDelayedDeath() )
	{
		getGroup()->validateLocations();
	}
#endif

	return (!isDelayedDeath() && AI_isAwaitingContract());
}
It points to the final return line in the AI_update() function.

Now HERE'S the VERY WEIRD THING: when I hover over isDelayedDeath(), the tooltip says:
0x05363be0 CvUnitInfo::isNoRevealMap(void)

Now... if I'm not mistaken, this is telling me that isDelayedDeath() returns whatever isNoRevealMap(void) happens to return.

How on Earth does this happen? And of course... how is it repaired?
 
I also noticed what you are saying about the minidump.

I think this could be a compiler bug, and if not... :confused: This is from the disassembly:
Code:
		// if no longer automated, then we want to bail
		return (!isDelayedDeath() && !getGroup()->isAutomated());
05732400  mov         ecx,esi  
05732402  call        CvUnitInfo::isNoRevealMap (05733BE0h)   ; <-- unexpected call target
05732407  test        al,al  
05732409  jne         $L228152+245h (0573241Dh)  
0573240B  mov         ecx,esi  
0573240D  call        CvUnit::getGroup (05697670h)  
05732412  mov         ecx,eax  
05732414  call        CvSelectionGroup::isAutomated (056613E0h)  
05732419  test        al,al  
0573241B  je          $L228152+29Ah (05732472h)  
0573241D  pop         edi  
0573241E  pop         ebp  
0573241F  pop         esi

Code:
	return (!isDelayedDeath() && AI_isAwaitingContract());
0573245C  mov         ecx,esi  
0573245E  call        CvUnitInfo::isNoRevealMap (05733BE0h)

It is calling CvUnitInfo::isNoRevealMap and not CvUnit::isDelayedDeath() :confused:
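One possible explanation, not confirmed anywhere in this thread, is identical COMDAT folding: with function-level linking (/Gy) and the linker's /OPT:ICF optimization, MSVC may merge functions whose machine code is identical, so several symbol names share one address and the debugger shows only one of them. Two one-line bool getters can qualify if their members happen to sit at the same offset in their classes. The classes below are hypothetical stand-ins, and whether the linker actually folds them depends on the build settings:

```cpp
// Hypothetical stand-ins for CvUnit and CvUnitInfo: each getter just loads a
// bool from the same offset (after one int) and returns it, so the compiled
// bodies can be byte-for-byte identical.
struct UnitSketch
{
	int m_iID;
	bool m_bDeathDelay;
	bool isDelayedDeath() const;
};

struct UnitInfoSketch
{
	int m_iCost;
	bool m_bNoRevealMap;
	bool isNoRevealMap() const;
};

// Defined out of line (not inline) so each is a separate function that the
// linker could fold when built with /Gy and /OPT:ICF.
bool UnitSketch::isDelayedDeath() const { return m_bDeathDelay; }
bool UnitInfoSketch::isNoRevealMap() const { return m_bNoRevealMap; }
```

If folding is what happened, the call would still be correct, because both bodies do exactly the same thing, and the crash would have to come from somewhere else.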
 
Hmm... a compiler error does sound like it might explain the situation. I'm glad you found further evidence confirming what I'd discovered there. Could this be because I'm using 7.0 and you're using 7.1? What else do you think could produce such a compiler error?

And of course, this is only one theory; who knows what else could explain it.
 
I'll look into this because it is very strange.

Edit:

With a recompiled DLL the game still crashes, but the disassembly looks fine.
This means the error has to be somewhere else.
 
I strongly suggest that from now on we always perform a full Rebuild when we build the DLL in the Release or Final_Release configurations. Just using Build can lead to problems like this one because of the compiler optimizations.
 
I've now gotten around to doing some more tests, and to reduce the complexity I tried the intermediate step of using a newer Boost library with the 2003 toolset.
Boost 1.55.0 caused some build issues, so I went with the older 1.44.0 you mentioned.

The most important changes are removing the new and delete overrides (linking statically to the newer Boost library causes some allocations before the gDLL pointer is set properly) and adding three lines to CvDLLPython.cpp.
If you are interested, I guess we could add a branch for this to the SVN, and then I could check that stuff in there.

While the game loads like that, there are several Python issues that need to be addressed. With those changes, the Boost Python code is split between the EXE and the DLL, so Boost Python on one side does not know about types that the other side has exposed.
As a result, some calls from Python to C++ fail, and vice versa.
Looks fixable, though.
 
I haven't looked deeply enough to say, but if that's the case, could the debug DLL be avoiding the crash by running this check, which the Final Release build skips?
Code:
#ifdef _DEBUG
	if ( NULL != getGroup() && !isDelayedDeath() )
	{
		getGroup()->validateLocations();
	}
#endif
If this is what makes the difference between crashing and not crashing, should we remove the #ifdef _DEBUG condition entirely?
 
If it makes the difference, then yes, but you have to test it both ways.
Run the debug build without this code and then with it, and check whether there is a difference.
Then do the same with the release build.

If this code makes the difference, the debug build should crash without it and the release build should not crash with it!
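To run those four test builds without editing the source each time, the block could be driven by a switch. A minimal sketch, assuming two hypothetical preprocessor symbols, FORCE_VALIDATE_LOCATIONS (run the check in Release/Final_Release) and SKIP_VALIDATE_LOCATIONS (drop it from Debug), with stub classes standing in for CvUnitAI and CvSelectionGroup:

```cpp
#include <cstddef>

// Hypothetical switches: FORCE_VALIDATE_LOCATIONS turns the check on in
// Release/Final_Release, SKIP_VALIDATE_LOCATIONS turns it off in Debug.
#if (defined(_DEBUG) && !defined(SKIP_VALIDATE_LOCATIONS)) || defined(FORCE_VALIDATE_LOCATIONS)
#define VALIDATE_UNIT_LOCATIONS 1
#else
#define VALIDATE_UNIT_LOCATIONS 0
#endif

// Stub standing in for CvSelectionGroup.
struct GroupSketch
{
	int validateCalls;
	GroupSketch() : validateCalls(0) {}
	void validateLocations() { ++validateCalls; }
};

// Stub standing in for the end of CvUnitAI::AI_update().
struct UnitAISketch
{
	GroupSketch* group;
	bool delayedDeath;
	bool awaitingContract;

	bool finishUpdate()
	{
#if VALIDATE_UNIT_LOCATIONS
		if (group != NULL && !delayedDeath)
		{
			group->validateLocations();
		}
#endif
		return !delayedDeath && awaitingContract;
	}
};
```

Building Debug with SKIP_VALIDATE_LOCATIONS defined and Release with FORCE_VALIDATE_LOCATIONS defined gives the two extra builds needed for the comparison, alongside the normal Debug and Release builds.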

And keep this in mind:
I strongly suggest that from now on we always perform a full Rebuild when we build the DLL in the Release or Final_Release configurations. Just using Build can lead to problems like this one because of the compiler optimizations.
 
Yeah, on that last point I always do anyway. (Well... sometimes I won't for a quick adjustment after the main rebuild, but from now on I'll always make sure to.)
 
Sorry for the late answer.
I'm interested in this, but first I want to finish the things I'm working on right now.
 
I'll be interested to see what you came up with, and I'd love it if you'd share the process by which you arrived at that as a possible solution (on the DLL discussion thread, of course). I'm really struggling with these, so any advice is helpful.

Two different CTDs have been reported. Today I had 2 AIAutoPlay games running and played one game myself, about 4000 turns in total, but I never saw either of them. The only thing I found was that NO_BUILDING sometimes gets built; I found three places in the code where this seems possible.
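For the places where NO_BUILDING can apparently end up being built, a cheap defensive check before the order is queued would at least stop the sentinel from reaching code that indexes per-building arrays. A minimal sketch; isValidBuildingIndex, tryQueueBuilding and the rejection counter are hypothetical, and real code would take the building count from the building infos:

```cpp
#include <vector>

const int kNoBuildingSketch = -1; // stands in for NO_BUILDING

// Hypothetical diagnostic counter in place of a debug assertion.
static int g_iRejectedBuildingOrders = 0;

// True only for a real building index.
bool isValidBuildingIndex(int eBuilding, int iNumBuildingInfos)
{
	return eBuilding > kNoBuildingSketch && eBuilding < iNumBuildingInfos;
}

// Hypothetical call site: refuse the order instead of queueing an invalid index.
bool tryQueueBuilding(int eBuilding, int iNumBuildingInfos, std::vector<int>& queue)
{
	if (!isValidBuildingIndex(eBuilding, iNumBuildingInfos))
	{
		++g_iRejectedBuildingOrders;
		return false;
	}
	queue.push_back(eBuilding);
	return true;
}
```

Counting the rejections (or logging them) would also show which of the suspected code paths actually produces NO_BUILDING during an AIAutoPlay run.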

The CTD reported by Joe is in CvUnit::killUnconditional, but it never happens for me, not even with the saves he posted. That makes it very difficult to say whether possible changes fix it or not...

The other one, reported by Strategyonly, is very :confused:. I have never had anything like it happen.
Code:
		// if no longer automated, then we want to bail
		return (!isDelayedDeath() && !getGroup()->isAutomated());
05732400  mov         ecx,esi  
05732402  call        CvUnitInfo::isNoRevealMap (05733BE0h)   ; <-- unexpected call target
05732407  test        al,al  
05732409  jne         $L228152+245h (0573241Dh)  
0573240B  mov         ecx,esi  
0573240D  call        CvUnit::getGroup (05697670h)  
05732412  mov         ecx,eax  
05732414  call        CvSelectionGroup::isAutomated (056613E0h)  
05732419  test        al,al  
0573241B  je          $L228152+29Ah (05732472h)  
0573241D  pop         edi  
0573241E  pop         ebp  
0573241F  pop         esi

Very strange. I would like to help more, but I'm very, very burned out this week. Sorry.
 