DLL development discussions

Koshling

Vorlon
Joined
Apr 11, 2011
Messages
9,254
New thread for DLL coding discussions so we don't pollute other threads with long exchanges. Suggested by n47.
 
OK, so let's bore our selves a little bit with C++.

IDInfo
Why do I think tweaking this may help? Well, if CvUnit::getCommander() is called millions times, then at least those millions of times is called CvUnit::getUsedCommander(), bla bla bla, then millions of times is called FFreeListTrashArray::getAt().
Code:
T* FFreeListTrashArray<T>::getAt(int iID) const
{
	int iIndex;
	T*	result = NULL;

	if ((iID == FFreeList::INVALID_INDEX) || (m_pArray == NULL))
	{
		return NULL;
	}
	
	EnterCriticalSection(&m_cModifySection);

	iIndex = (iID & FLTA_INDEX_MASK);

	assert(iIndex >= 0);

	if ((iIndex <= m_iLastIndex) && 
		(m_pArray[iIndex].pData != NULL))
	{
		if (((iID & FLTA_ID_MASK) == 0) || (m_pArray[iIndex].pData->getID() == iID))
		{
			result = m_pArray[iIndex].pData;
		}
	}
	
	LeaveCriticalSection(&m_cModifySection);

	return result;
}
Quite a lot of work, just to get an object. And I believe there are many many more accesses to objects by ids. That is why, I think there is at least a chance, that tweaking this may seriously increase performance. But if you have reliable data saying, the total time of getAt() calls is small, then I will not go into it. It, indeed, is quite a lot of work.

4) Pointers are not reliable across turns (or even within turns across processing of different units sometimes) - if you went to something pointer-based, you'd have to reference count or (more probably) indirect via some sort of handle that can be reliably cleaned up on unit/city destruction
Do you mean the unreliability coming from possible destruction of objects or something more? The first I'm aware of.

1) They ARE used in the API to the game engine (e.g. - see CvDLLInterfaceIFaceBase.h), so they would need some sort of an adapter wherever those interfaces are used. It is **possible** that the game engine only uses them opaquely and just hands them back (so no adapter would be needed), but I have no evidence to support that, and frankly I doubt it.

2) They ARE passed to Python in a few APIs (e.g. - CyGetCity() in CyGameCoreUtils.cpp)
According to this, probably we shouldn't touch IDInfo class and at least the method prototypes exposed directly to the engine.
http://forums.civfanatics.com/showpost.php?p=12739262&postcount=2
However, IDInfo::iID has 4 bytes and pointers also have 4 bytes. :D
(I'm more and more close to say, we do not need FFreeListTrashArray to store cv-objects. :) )

3) In order to retain save game compatibility you'll have to dehydrate/re-hydrate whatever structure you change it to a compatible format in the read/write routines used on persisted objects
Where is the code?
 
OK, so let's bore our selves a little bit with C++.

IDInfo
Why do I think tweaking this may help? Well, if CvUnit::getCommander() is called millions times, then at least those millions of times is called CvUnit::getUsedCommander(), bla bla bla, then millions of times is called FFreeListTrashArray::getAt().
Code:
T* FFreeListTrashArray<T>::getAt(int iID) const
{
	int iIndex;
	T*	result = NULL;

	if ((iID == FFreeList::INVALID_INDEX) || (m_pArray == NULL))
	{
		return NULL;
	}
	
	EnterCriticalSection(&m_cModifySection);

	iIndex = (iID & FLTA_INDEX_MASK);

	assert(iIndex >= 0);

	if ((iIndex <= m_iLastIndex) && 
		(m_pArray[iIndex].pData != NULL))
	{
		if (((iID & FLTA_ID_MASK) == 0) || (m_pArray[iIndex].pData->getID() == iID))
		{
			result = m_pArray[iIndex].pData;
		}
	}
	
	LeaveCriticalSection(&m_cModifySection);

	return result;
}
Quite a lot of work, just to get an object. And I believe there are many many more accesses to objects by ids. That is why, I think there is at least a chance, that tweaking this may seriously increase performance. But if you have reliable data saying, the total time of getAt() calls is small, then I will not go into it. It, indeed, is quite a lot of work.
99.9% of the time (ish) units do not have commanders in range, whuich is why CvUnit::getCommander() begins with:
Code:
	//	This routine gets called a HUGE number of times per turn (100s of millions in large games!)
	//	so short-circuit the most common case of the unit having no commander when we can
	//	Similarly protect against calls during initialisation of a unit (before it has a plot set)
	if ( m_iCachedCommander == NO_COMMANDER_ID || plot() == NULL )
	{
		return NULL;
	}
Access by id only occurs when the unit has already found a commander on a previous call and it (the id) is cached.

I don't have solid evidence that getAt() consumes a small amount of time, but I do have indirect evidence. Basically it's a bit too low a level of granularity to profile well with our current profiling technology (the overhead of the profiling is comparable with the routine itself, which makes the signal-to-noise ratio a bit awkward when trying to profile very small routines). However, based on profiling of the routines a level up from here (all the CvUnitAI, CvUnit, etc. routines) times are dominated by things outside of anything that calls getAt() for the most part (pathing, building/promotion/unit value calculations etc.). Based on this I think the scope for improvement from changes to getAt() are limitted (max of perhaps 5% of total time, but probably more like 1%). Of course, this assessment **may** be incorrect due to indirect effects, such as high non-locality of memory access causing a lo of cache eviction from the CPU cache. Bottom line is that I cannot be totally certain, but suspect the gains to be had are not that great.
Do you mean the unreliability coming from possible destruction of objects or something more? The first I'm aware of.
Yes. Many things hold references to entities via IDInfos (e.g. - to units), and nothing cleans these up on destruction of the target unit (or whatever) concerned, instead relying on getAt() to resolve to NULL when it is next referenced. A level of indirection would solve that of course (i.e. - replace IDInfo's with handles in effect)
According to this, probably we shouldn't touch IDInfo class and at least the method prototypes exposed directly to the engine.
http://forums.civfanatics.com/showpost.php?p=12739262&postcount=2
However, IDInfo::iID has 4 bytes and pointers also have 4 bytes. :D
(I'm more and more close to say, we do not need FFreeListTrashArray to store cv-objects. :) )


Where is the code?
Where is which code? I'm not sure which you're referring to here.
 
Code:
	//	This routine gets called a HUGE number of times per turn (100s of millions in large games!)
	//	so short-circuit the most common case of the unit having no commander when we can
	//	Similarly protect against calls during initialisation of a unit (before it has a plot set)
	if ( m_iCachedCommander == NO_COMMANDER_ID || plot() == NULL )
	{
		return NULL;
	}
Access by id only occurs when the unit has already found a commander on a previous call and it (the id) is cached.
Well, by this code and that I have missed the another assignment of it, I thought that NO_COMMANDER_ID is set when the GAMEOPTION_GREAT_COMMANDERS option is off. ;)
Code:
void CvUnit::clearCommanderCache(void) 
{
	if (!GC.getGameINLINE().isOption(GAMEOPTION_GREAT_COMMANDERS))
	{
		m_iCachedCommander = NO_COMMANDER_ID;
	}
	else
	{
		m_iCachedCommander = -1;
	}
}
But still have a feeling, there should be terribly lot of getAt() calls. Yup, this function isn't heavy, but each operation which interacts with any object other then its parent, will probably call it. ... Aren't there terribly lot of such operations?

Hmm... I guess, I have no other options, then put counters into it. For now I will use OutputDebugString, but it would be nice, if you point how to use logs and some in-game notification.

Where is which code? I'm not sure which you're referring to here.
The code which runs the pink elephant safari, what else could I mean? :sniper:
Ok, ok, I mean the code responsible for the read/write routines for persistents. And would probably be nice to see the code which explains, how the persistence is used here. I know the mechanism only from the old legends ... well, not exactly, from my toy applications in Java too. :) Still, would rather like to see, how it is done here, then fly in imaginations.
 
Well, by this code and that I have missed the another assignment of it, I thought that NO_COMMANDER_ID is set when the GAMEOPTION_GREAT_COMMANDERS option is off. ;)
Code:
void CvUnit::clearCommanderCache(void) 
{
	if (!GC.getGameINLINE().isOption(GAMEOPTION_GREAT_COMMANDERS))
	{
		m_iCachedCommander = NO_COMMANDER_ID;
	}
	else
	{
		m_iCachedCommander = -1;
	}
}
But still have a feeling, there should be terribly lot of getAt() calls. Yup, this function isn't heavy, but each operation which interacts with any object other then its parent, will probably call it. ... Aren't there terribly lot of such operations?
It's also set to NO_COMMANDER_ID at the end of the first call that is made (for that unit in that turn) to getCommander():
Code:
		m_iCachedCommander = (pBestCommander == NULL ? NO_COMMANDER_ID : pBestCommander->getID());
Hmm... I guess, I have no other options, then put counters into it. For now I will use OutputDebugString, but it would be nice, if you point how to use logs and some in-game notification.
OutputDebugString is #defined to nothing in the 'FinalRelease' build. It works normally in either Release or Debug builds. If you want to get detailed profiles (counts, times, times with descendants in call tree, etc.) use the Release build and do the following to get a profile:
  1. Enable logging if you haven't already in your INI file (I'm sure you already have logging enabled though)
  2. Enable 'chipotle' mode (cheatcode=chipotle in the INI file). This enables lots of extra debug functionalty
  3. Load the game you want to profile a turn of
  4. hit ctrl-alt-D - in the resulting screen switch to thwe 'Profile' tab and tick 'Show DLL Profile' and 'Profile Logging'
  5. Hit end-turn
  6. When the turn completes exit Civ
  7. In your Logs folder will be a file called IFP_Log.txt - load into some text editor:
  8. The file contains a sequence of tab-separated tables, one for each time slice, with the end-turn time-slices all amalgamated into one (which is the only one you are really interested in). Each sectio begins with the table headers row (search for '(mS)' is a good way to find these. There will be a bunch of tiny sections like this which represent the time slices between when you enabled profiling and hit end-turn. Delete these so that the header row for the large section (which will be the end-turn processing) is the top line. From there find the NEXT header line (i.e. - the bottom of the large section containing the end turn table) and delete it and all subsequent lines. Save the file
  9. Open the file in a spreadsheet - it should be a single table with columns of method name, time, time with children, number of calls, etc. Usually sort on time, though it depends exactly waht you're looking for
This sound complicated but you get a LOT of useful information from it - try it - it's well worth it.
The methods in the table are those that include the PROFILE_FUNC macro. Other named sections represent instances of PROFILE(<name>) in the code (which must be unique in a scope)
The code which runs the pink elephant safari, what else could I mean? :sniper:
Ok, ok, I mean the code responsible for the read/write routines for persistents. And would probably be nice to see the code which explains, how the persistence is used here. I know the mechanism only from the old legends ... well, not exactly, from my toy applications in Java too. :) Still, would rather like to see, how it is done here, then fly in imaginations.
Persistence is handled by the read() and write() calls in the classes that need persistence (CvUnit, CvCity, CvSelectionGroup, CvGame, CvPlot, etc.). Mostly they use a tagged save format mediated by CvTaggedSaveFormatWrapper, though a few simple ones just write directly to the stream (if those are ever changed we'll have save version incompatability, since the backward compatibility relies on teh tagging, which is lacking in the simple structures that just dump their binary image to the stream)
 
A btw question, does this fastdep work fine with the project? I know the compilation is great time for doing Sudoku ... but there is the limit of how many Sudokus can be solved a day. :twitch:
 
A btw question, does this fastdep work fine with the project? I know the compilation is great time for doing Sudoku ... but there is the limit of how many Sudokus can be solved a day. :twitch:

Works ok for me, yes.
 
I'm disappointed. Averagely -- 1.5M of getAt calls and 37-70M CPU instructions per turn for a large PerfecrMongoose map, 12 civilizations and Medieval age. Maybe for larger maps and longer gameplay, or if there would be many calls from different threads at once, tweaking it would be noticeable -- like half a second -- but I'm afraid, I'm not crazy enough to redo whole dll for this. Still, if we care so much about the plot size, I wouldn't use IDInfo-s to store references in them. 5 bytes should be enough -- 1 for player, 4 for index in the array (3 would do, but 1 byte is used by FFreeListTrashArray for some magical id spaces, which looks like being always 1). Or am I wrong and there can be more then 254 players + 1 barbarians?

Btw, I remind there are 68 bytes per plot for landmarks. Which for me looks like hell, if we care even about 4.
 
Or am I wrong and there can be more then 254 players + 1 barbarians?

I'm not a coder just long time player/tester, But...
We've been told for years that the Civ IV BtS engine can only handle 48 AI + 1 barbarian + player = 50 total. In fact a specially DLL was made to enable the 50 total back around the time of the Official 3.13 patch release (may have been the 3.17 release). That was when the Mega Leader and Civ packs began to proliferate here on the forum. So you think the game can handle 254?

JosEPh
 
I'm disappointed. Averagely -- 1.5M of getAt calls and 37-70M CPU instructions per turn for a large PerfecrMongoose map, 12 civilizations and Medieval age. Maybe for larger maps and longer gameplay, or if there would be many calls from different threads at once, tweaking it would be noticeable -- like half a second -- but I'm afraid, I'm not crazy enough to redo whole dll for this. Still, if we care so much about the plot size, I wouldn't use IDInfo-s to store references in them. 5 bytes should be enough -- 1 for player, 4 for index in the array (3 would do, but 1 byte is used by FFreeListTrashArray for some magical id spaces, which looks like being always 1). Or am I wrong and there can be more then 254 players + 1 barbarians?

Btw, I remind there are 68 bytes per plot for landmarks. Which for me looks like hell, if we care even about 4.

Yeh, I plan to look at the landmarks when I get time. 4 bytes as a one-off isn't a big deal, but the problem is a tendency to keep adding 4 bytes here, 4 bytes there, and over time it builds up. For that reason I try to be skeptical about new additions to CvPlot or CvUnit if I can see another way to do it (or the gain in functionality/performance is likely to be small). Of course, there are always exceptions.

If you'd like to have a go at the 68 byte landmarks issue (or at memory usage more generally) please be my guest. I was planning to spend a few weeks on memory optimization, including looking into the possibility of some rather radical options (paging stuff out via AWE or a cooperating 64-bit process linked via a named pipe or a small shared-memory window or something), but that won't be until about October I don't think.

The really big (and not obviously tractable) user of memory is the graphical entities, which the main game engine does not handle very efficiently (e.g. - identical plots are fully instantiated as independent graphical entities without any reuse). There isn't a lot we can do (directly) about this since the game engine is a closed black box, but there are approaches to 'fooling it' so that it thinks it only has to manage (and render) a subset. Viewports are one such approach, but there are other possibilities, such as virtualizing graphical entities to the null entity (or in some cases where there has to be SOMETHING a minimal null graphic), when they are off screen (part of the problem is that we don;t get events for things like scrolling, so dynamic determination of what is off screen is tricky).
 
So you think the game can handle 254?
Maybe. But I'm not quite sure, would we find a player who can be happy waiting an hour or a day ... or maybe more :) per turn.

@Koshling, I believe, can take care about the landmarks. At last, I would stretch my fingers. :bump: Put the landmarks into CvMap or have a better place? I would introduce SmallIDInfo class for holding references in the dll too, if 3 bytes per reference would do a change.

The really big (and not obviously tractable) user of memory is the graphical entities, which the main game engine does not handle very efficiently (e.g. - identical plots are fully instantiated as independent graphical entities without any reuse). There isn't a lot we can do (directly) about this since the game engine is a closed black box, but there are approaches to 'fooling it' so that it thinks it only has to manage (and render) a subset. Viewports are one such approach, but there are other possibilities, such as virtualizing graphical entities to the null entity (or in some cases where there has to be SOMETHING a minimal null graphic), when they are off screen (part of the problem is that we don;t get events for things like scrolling, so dynamic determination of what is off screen is tricky).
Don't say me, the engine originally store a whole map at once. :faint: Ok, I'm a hard girl. I will withstand everything ... if it is not a spider!! :run: Ok, but explain me, how the viewport works as in my understanding, if it works, it should already know were is the player watching.
 
Maybe. But I'm not quite sure, would we find a player who can be happy waiting an hour or a day ... or maybe more :) per turn.

@Koshling, I believe, can take care about the landmarks. At last, I would stretch my fingers. :bump: Put the landmarks into CvMap or have a better place? I would introduce SmallIDInfo class for holding references in the dll too, if 3 bytes per reference would do a change.
Doesn't really matter where you put them provided the APIs to access them stay the same (i.e. - methods on CvPlot, persisted through CvPlot:read/write). I'd be inclined to add a static member to CvPlot to hold a singleton dictionary of them or something, but it's up to you. In order to be backward compatible with saves though, you need to handle their persistent via CvPlot::read and write, so it makes sense to keep them encapsulated within that class.
I don't think the small Id Info works (in 3 bytes anyway) because even if the player element can be shrunk to 1 byte, the id element cannot be shrunk to 2 (it's 4). You cannot just use the index into the array within the freelist because you have to guarantee not to reuse ids when the array slot gets reused (otherwise a reference to a deleted unit can incorrectly resolve to a new one that happens to be in the same array slot)
Don't say me, the engine originally store a whole map at once. :faint: Ok, I'm a hard girl. I will withstand everything ... if it is not a spider!! :run: Ok, but explain me, how the viewport works as in my understanding, if it works, it should already know were is the player watching.
Viewports work by telling the game engine that a small subset of the map IS the entire map, so it CONSTRAINS what the user CAN look at (without using meta-UI to move the viewport). Without viewports nothing tells us what region of the map is visible at the user's current zoom and centering, or triggers any reporcessing as the user scrolls round.
 
I don't think the small Id Info works (in 3 bytes anyway) because even if the player element can be shrunk to 1 byte, the id element cannot be shrunk to 2 (it's 4).
Eh, no one understands me. I'm going to cut myself. :cry:
I meant we can save 3 bytes per id. The question is, how many ids are stored in the dll. There are 3 per plot and 23 or something like this per unit, more I didn't count. Is it enough to fight for?

Viewports work by telling the game engine that a small subset of the map IS the entire map, so it CONSTRAINS what the user CAN look at (without using meta-UI to move the viewport). Without viewports nothing tells us what region of the map is visible at the user's current zoom and centering, or triggers any reporcessing as the user scrolls round.
So you was saying to nullify even those graphics which are inside the viewport? -- In the other case, I do not know what would be the advantage of this virtualization, when we have already the viewport.
 
Eh, no one understands me. I'm going to cut myself. :cry:
I meant we can save 3 bytes per id. The question is, how many ids are stored in the dll. There are 3 per plot and 23 or something like this per unit, more I didn't count. Is it enough to fight for?
Marginal I'd say, but if it looks easy and there are not too many places you'd have to add translations to pass full IdInfos across interfaces to Python or the game engine might be worth a go. I'd start with the landmarks stuff though - that looks like a definite win.

So you was saying to nullify even those graphics which are inside the viewport? -- In the other case, I do not know what would be the advantage of this virtualization, when we have already the viewport.
Viewports are a compromise. They are awkward to play with and get a bit annoying when you have units all over the map. A solution that removed the need to use viewports, or delayed it/allowed larger viewports would still be highly beneficial.
 
interesting topic and surprisingly i can follow it easily :)

but the game just has a Seperate graphical "unit" (sorry i am not very known to all the technical terms) for everything even if there are 2 or more of it?? (i think i will need some simpler explanations on that)
so it goes through the whole process of creating that "unit" not once but every time it needs it? (i need some coffee :p)
 
but the game just has a Seperate graphical "unit" (sorry i am not very known to all the technical terms) for everything even if there are 2 or more of it?? (i think i will need some simpler explanations on that)
It seams that way. I'm afraid the programers were totally bad paid or were at the level of prehistoric era at game programming.

so it goes through the whole process of creating that "unit" not once but every time it needs it? (i need some coffee :p)
Actually I too, but already had one this day. :( Nevertheless, no, they are probably created once when they first time appear in the game. The problem is, there is a copy of data describing the graphics per each "unit", even if units look the same.

200 archers in game -- 200 data packages describing their graphics. Even when they are off the screen. :rolleyes:

edit
@Koshling, oh, is it possible to add a new field to the persistence format, but not in the end of the object and save the compatibility like this?
Code:
int idHint, idSeq = -1;
if (wrapper.Expect("CvMap::LandmarksCount", isHint, idSeq, SAVE_VALUE_TYPE_INT))
    stream->Read(&landmarks_count)
else
    do some magic
Btw, what are the mysterious idHint, idSeq?

Maybe a good idea would be adding something like tryRead to the wrapper?
 
edit
@Koshling, oh, is it possible to add a new field to the persistence format, but not in the end of the object and save the compatibility like this?
Code:
int idHint, idSeq = -1;
if (wrapper.Expect("CvMap::LandmarksCount", isHint, idSeq, SAVE_VALUE_TYPE_INT))
    stream->Read(&landmarks_count)
else
    do some magic
Btw, what are the mysterious idHint, idSeq?

Maybe a good idea would be adding something like tryRead to the wrapper?

Use the macros, they hide most of the complexity.

Code:
WRAPPER_READ(wrapper, "CvMap", &landmarks_count);

on the read side; and

Code:
WRAPPER_WRITE(wrapper, "CvMap", landmarks_count);

on the write side, is all you need. The effect will be add the data on write (generating a tag name from the provided class name and the name of the variable). On the read side it will read the tagged value if present, else skip. For the most part, provided you initialize members to default values in the constructor (0 I presume in this case?) the skip-if-not-present (ie - old format save) will leave the correct result. If you need to take a specific action on old-format, then the idiom used is this:
Code:
landmarks_count = -1; // some invalid value that cannot legitimately be in a save
WRAPPER_READ(wrapper, "CvMap", &landmarks_count);
if ( landmarks_count == -1 )
{
    // Take whatever action you need to in order to set this up for an old format save
}
 
@Koshling, tricky, tricky. I thought, Expect consumes a tag and if it fails the tag is lost. OK. It is reasonable, to no go backward on fail and cache what is read. But wouldn't be nicer to add a bool return value to Read functions, then those tricks with an invalid value, to check was reading successful?
 
so even without coffee i managed to nail my conclusion :) i hope we could one day fix that Prehistoric method and make those 200 archers just one data package probably that would enhance things significantly probably
 
@Koshling, tricky, tricky. I thought, Expect consumes a tag and if it fails the tag is lost. OK. It is reasonable, to no go backward on fail and cache what is read. But wouldn't be nicer to add a bool return value to Read functions, then those tricks with an invalid value, to check was reading successful?

Could, but it was motivated by having to convert several thousand existing calls when I first added tagged saving to the original BTS code, so the macros are a fairly direct replacement for the (void) calls. However, you're quite right that the method (and indeed the macro) could return a bool, which would typically be ignored, but which could be used more cleanly in the cases where you need to know. Please feel free...
 
Top Bottom