So I've introduced a deadlock between the Lua engine and the game core

S3rgeus

As the title says, I've discovered that I've introduced a deadlock between the game core and the Lua engine. My mod includes lots of fun changes that could have done this, but I'm particularly suspicious of my new Lua events.

First, a little bit of context. This issue manifests as a hang at the start of the game: when you press the "Begin your journey" button, the game locks up. Audio keeps playing, but the main game never makes any progress.

Now, looking in the debugger at where all of the currently running threads are sitting, about 95% of the threads show "No source found" (for everything in their callstacks). But there are two interesting and very suspicious threads. The main thread is here in CvDllGame.cpp:

Code:
CvDllGame::CvDllGame(CvGame* pGame)
	: m_pGame(pGame)
	, m_uiRefCount(1)
{
	if(gDLL)
		gDLL->GetGameCoreLock();
}

It's apparently stopped on that last line (the closing brace), so presumably it's still waiting inside the GetGameCoreLock() call. We also see a worker thread, here in CvLuaSupport.cpp:

Code:
//------------------------------------------------------------------------------
bool LuaSupport::CallHook(ICvEngineScriptSystem1* pkScriptSystem, const char* szName, ICvEngineScriptSystemArgs1* args, bool& value)
{
	// Must release our lock so that if the main thread has the Lua lock and is waiting for the Game Core lock, we don't freeze
	bool bHadLock = gDLL->HasGameCoreLock();
	if(bHadLock)
		gDLL->ReleaseGameCoreLock();
	bool bResult = pkScriptSystem->CallHook(szName, args, value);
	if(bHadLock)
		gDLL->GetGameCoreLock();
	return bResult;
}

It's on the bool bResult = ... line. Now, as I'm sure people will note, there's a helpful comment that pertains to exactly what's gone wrong here. Farther up the callstack from the CallHook function there is just a single unresolved function and then the callstack ends. That function is in some other DLL which I don't have source/pdb for. I imagine it's the UI DLL, since what else calls Lua hooks?
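For anyone not familiar with the pattern that comment describes, here's a minimal standalone sketch of it. This is not the game's code: g_coreLock, g_luaLock, t_hasCoreLock and all of the function names below are made-up stand-ins, and I'm assuming both locks behave like plain mutexes. The worker drops the core lock before it takes the Lua lock, so it never holds one while waiting for the other, even though the main thread takes them in the opposite order.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks (assumed to be plain mutexes).
std::mutex g_coreLock;                    // plays the role of the game core lock
std::mutex g_luaLock;                     // plays the role of the Lua lock
thread_local bool t_hasCoreLock = false;  // per-thread "do I hold the core lock?"

void GetCoreLock() { g_coreLock.lock(); t_hasCoreLock = true; }

void ReleaseCoreLock() {
    assert(t_hasCoreLock);  // only the owning thread may release
    t_hasCoreLock = false;
    g_coreLock.unlock();
}

// Worker side: same shape as CallHook. Release the core lock first so we
// never wait for the Lua lock while still holding the core lock.
bool CallHookSketch(int& counter) {
    const bool hadLock = t_hasCoreLock;
    if (hadLock) ReleaseCoreLock();
    {
        std::lock_guard<std::mutex> lua(g_luaLock);
        ++counter;  // stands in for running the Lua hook
    }
    if (hadLock) GetCoreLock();
    return true;
}

// Main-thread side: holds the Lua lock, then wants the core lock.
void MainThreadSketch(int& counter) {
    std::lock_guard<std::mutex> lua(g_luaLock);
    GetCoreLock();
    ++counter;  // stands in for game core work done from Lua
    ReleaseCoreLock();
}

// Runs both sides concurrently; every increment happens under g_luaLock.
int RunSketch(int iterations) {
    int counter = 0;
    std::thread worker([&counter, iterations] {
        for (int i = 0; i < iterations; ++i) {
            GetCoreLock();
            CallHookSketch(counter);
            ReleaseCoreLock();
        }
    });
    for (int i = 0; i < iterations; ++i) MainThreadSketch(counter);
    worker.join();
    return counter;
}
```

Calling RunSketch(1000) from a small main() should return once both threads finish. If you delete the ReleaseCoreLock()/GetCoreLock() pair inside CallHookSketch, the worker can end up holding the core lock while waiting for the Lua lock at the same time as the main thread holds the Lua lock and waits for the core lock, which is the classic lock-order deadlock.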

The value of szName is apparently <Bad Ptr> and bHadLock is false. I'm thinking the main thread acquired the core lock between when the worker thread queried whether it had it and when it tried to get the Lua lock. And then everything was sad.

Now, that looks like a timing issue to me. The question is how I've introduced it and how I can stop it from happening again. (I really hope I haven't just increased load times by turning off optimization and that's caused this, because that would be a pain.)

Does anyone know if calling Lua hooks too early in the loading process might cause this? Or has anyone seen anything similar?
 