Optimizing CvPlot data

Leoreth

Blue Period
Moderator
Joined
Aug 23, 2009
Messages
37,060
Location
東京藝術大学
It seems to me that Firaxis was very conservative about member variables of CvPlot, using only a minimal selection of values and using short wherever possible. That makes sense considering how many plots in particular large maps can have. In my mod, the map I am about to use has 15000 plots, so anything that is being added gets multiplied by that number and can have a big impact.

Especially in my mod, I have to keep track of a "settler value" and "war value" (controlling AI city site and war target selection) for each plot for each civilization. It seems that it's worth thinking about optimizing this, considering how memory is at a premium for Civ4.

But how? Does it actually help to go down from int fields to shorts (I think with a better selection of the range of possible values I can even bring them down to byte size)? A cursory read of some discussion on SOF suggests that in practice a smaller type does not actually use less memory because it still occupies an integer sized register, which introduces processing overhead rather than memory saving, and the overall recommendation seems to be not to bother with sub-int types.

The exception seems to be arrays. Some of these plot fields are already arrays (like settler/war value as per-civilization arrays), but others aren't. Does that mean using smaller types is actually worth it? Is it worth it to go one step further and instead of storing these as members of the CvPlot class, to create one big short/byte array in CvMap that is index by plot ID?

Another aspect is that roughly half of the map is water where these values do not matter but memory still has to be reserved for them. Is it worth optimizing for that, and how could I do that?
 
If you want to save memory, one way is to start using bit fields. For optimization purposes one variable can only be in a single 32 bit section so multiple 10 bit variables in a row will only use 30 bits and then place the next in the next 32 bit section. Also you can't mix variables of different sizes into the same 32 bit.

PHP:
class A
{
    TerrainTypes aa : 16;
    short b;
};

class B
{
    TerrainTypes aa : 16;
    int b : 16;
}
Here sizeof(A) = 8 (sizeof reports in bytes at compile time) while sizeof(B) = 4. The reason is that TerrainTypes is 4 bytes while short is 2 so they don't mix. TerrainTypes and int are both 4 bytes so they can be set to fill 2 bytes each of the same 4 byte block.

PHP:
const int SIZE_OF_PLOT = sizeof(CvPlot);
This is a good one to use when trying to shrink the size of a class instance. It will calculate the size at compile time and (at least in MSVC 2017) the compiler will report the size by just hovering the mouse over the variable name.

Does it actually help to go down from int fields to shorts (I think with a better selection of the range of possible values I can even bring them down to byte size)? A cursory read of some discussion on SOF suggests that in practice a smaller type does not actually use less memory because it still occupies an integer sized register, which introduces processing overhead rather than memory saving, and the overall recommendation seems to be not to bother with sub-int types.
Yes they will align as ints so if you have two ints and a short, you use the memory as 3 ints. However if you have 3 shorts, you will place two shorts in the place of one int and then the next will get the memory of an int so you went down from 3 to 2 ints in memory. This means you have to consider the order and how you declare the variables. On a side note you should consider the order anyway because they are stored in the order they are declared. This means if you have two variables, which are always used together as a pair (like x,y coordinates) declare them next to each other as odds are that if one goes into the CPU cache, the other one does too. A cache miss takes like 1000 CPU cycles (heavily CPU and RAM dependant, but 1k is realistic). This means fitting more data in the cache can easily improve performance more than it hurts with the extra overhead. On the other hand writing a 32 bit variable is just a write and then the CPU doesn't care about it anymore after it is sent to the RAM. If it is 32 bit containing two shorts, then it has to read them from memory, replace one when writing and then write because writing is always done in blocks of 32 bit (or 64 for 64 bit applications, but that doesn't affect us). This is why bools by default are 32 bit as it saves reading from memory whenever you write to a bool. In other words performance impact improves when reading data, but it can slow down writing when packing the variables.

The exception seems to be arrays. Some of these plot fields are already arrays (like settler/war value as per-civilization arrays), but others aren't. Does that mean using smaller types is actually worth it? Is it worth it to go one step further and instead of storing these as members of the CvPlot class, to create one big short/byte array in CvMap that is index by plot ID?
What I have done is We The People is creating a class, which stores an array of ints, each used to store 32 bools. This is then used to store arrays like if the plot is explored by team X. With 64 players it will use 8 bytes while vanilla bool arrays will use 260 bytes (including the 4 byte pointer). Additionally the stored data is then static rather than based on a pointer meaning rather than reading a pointer from CvPlot and then looking up the address in memory, it just reads from CvPlot. Skipping looking up the pointer removes a likely cache miss, which in turn boosts performance. Sadly this class is near impossible to move to another mod and it will be easier to rewrite it from scratch. It's not like it's a complex class.

Another aspect is that roughly half of the map is water where these values do not matter but memory still has to be reserved for them. Is it worth optimizing for that, and how could I do that?
You use the concept of unions. Simply put, they are like classes, but rather than having all the variables in a row in memory, every single variable starts at the same address meaning they share memory. It will then make sense to use a union of classes/structs to get a set for each condition, like
PHP:
union
{
    struct land
    {
        // land only variables
    }
    struct water
    {
        // water only variables
    }
};
This way you end up with two structs for land and water, but you can only use one at a time. This means outside the union you need a way to determine quickly which case the plot is. The beauty of this is obvoiusly that the union will only take up max(sizeof(land), sizeof(water)) rather than sizeof(land)+sizeof(water) like it does right now.

One thing to watch out for when using unions is if there are pointers, which the deconstructor needs to clear. You will have to conditionally clear memory based on which type you are using. Also when setting (switching) type you should set all pointers to NULL.
 
Thanks for this detailed response! There's a lot for me in there. Especially the idea to use a constant based on sizeof for the class will definitely help me a lot because right now I feel like I am guessing what the impact of my choices would be based on half-informed impressions.

I think I want to stay at a lower level of some of your suggestions because I don't have enough confidence in my C++ experience to do them correctly, and I feel that it would be prematurely optimizing on performance while I am still in the process of implementing the feature. I guess another way to put it is that I want to make sure that I am using the built-in mechanisms correctly and not making things needlessly wrong before trying more custom solutions.

My takeaway in that regard is broadly, and correct me if I'm wrong:
  • I can get benefits from using unsigned char or short instead of int as long as I take care to declare them in such a way that they can share a block
  • For the above, the order of declaration matters, not the initialization.
  • There is a performance overhead for writing - I think I can disregard that concern because the values impacted by this are rarely updated, some of them only at the start of the game
I am still wondering about arrays though. Let's look at a concrete example:
Code:
class CvPlot
{
    [...]

    // Leoreth: initialized by Python at the beginning of the game
    bool* m_abCore;
    short* m_aiSettlerValue;
    short* m_aiWarValue;
    short* m_aiReligionSpreadFactor;
}
and
Code:
CvPlot::CvPlot()
{
    [...]

    m_abCore = new bool[NUM_CIVS];
    m_aiSettlerValue = new short[NUM_CIVS];
    m_aiWarValue = new short[NUM_CIVS];
    m_aiReligionSpreadFactor = new short[NUM_RELIGIONS];
}
How should I actually think about the allocation of memory for these arrays? In the CvPlot declaration, it is only a pointer and the size of the array isn't known, so I assume this only occurs during the constructor call. Doesn't that also mean the above sizeof metric does not account for it?

When the requested number of array fields is allocated then I can safely assume that those share blocks as much as possible?

Looking at only e.g. CvPlot.m_aiSettlerValue, do I gain anything if I defer the array allocation to the setter method, i.e. instead of doing it in the constructor I have something like:
Code:
void CvPlot::setSettlerValue(CivilizationTypes eCivilization, short iNewValue)
{
    if (m_aiSettlerValue == NULL)
    {
        m_aiSettlerValue = new short[NUM_CIVS];
   }

    m_aiSettlerValue[eCivilization] = iNewValue;
}

short CvPlot::getSettlerValue(CivilizationTypes eCivilization)
{
    if (m_aiSettlerValue == NULL)
    {
        return 0;
   }

    return m_aiSettlerValue[eCivilization];
}
The idea being that I can avoid even allocating the short * NUM_CIVS for plots that never have a value set, including all water tiles.

Going one step further, is it worth considering replacing the array entirely with an STL map type? Not all civilizations are used the entire game, so some of the array entries are very likely to remain unused. Like I said, I care primarily about memory consumption, followed by read efficiency. Write efficiency is not really a concern.
 
I can get benefits from using unsigned char or short instead of int as long as I take care to declare them in such a way that they can share a block
Yeah they can reduce memory if ordered correctly, like char, int, char will use 12 bytes while char, char, int will use 8 as the two chars next to each other can share a 4 byte "block".
For the above, the order of declaration matters, not the initialization.
Static memory allocation for a class is purely about declaration. Initialization won't affect how much memory is used.
Static: memory used without following pointers.

How should I actually think about the allocation of memory for these arrays? In the CvPlot declaration, it is only a pointer and the size of the array isn't known, so I assume this only occurs during the constructor call. Doesn't that also mean the above sizeof metric does not account for it?
sizeof will tell the amount of memory, which is needed when you allocate a single instance without considering the constructor. This means it will add the 4 bytes used by the pointer itself, but not anything it points to as the array data will be allocated outside of the memory allocated for the class itself.

When the requested number of array fields is allocated then I can safely assume that those share blocks as much as possible?
Memory allocation is done in blocks of 4 bytes. If we assume new char[7] it will allocate 8 bytes and you might as well use char[8]. Won't make a difference in memory usage. And yes the computer will try to use as little memory as possible.

Looking at only e.g. CvPlot.m_aiSettlerValue, do I gain anything if I defer the array allocation to the setter method, i.e. instead of doing it in the constructor I have something like:
The idea being that I can avoid even allocating the short * NUM_CIVS for plots that never have a value set, including all water tiles.
Obviously if you never allocate memory for the array then you will save that memory so yeah you can reduce memory usage by not allocating arrays you don't need. Vanilla does this to some extend and I have modded code, which is more aggressive in that regard than vanilla is.

Going one step further, is it worth considering replacing the array entirely with an STL map type? Not all civilizations are used the entire game, so some of the array entries are very likely to remain unused. Like I said, I care primarily about memory consumption, followed by read efficiency. Write efficiency is not really a concern.
That will kill performance. For some reason STL maps are dead slow when used with our compiler (not sure why, but it might be a bad compiler in that regard). Arrays are optimized for performance because a read is pointer+offset, like pointer+5 for team 5. This will generate a new temp pointer, which points to the correct variable with no regard to whatever else is in the array. As such this is the read approach with the least overhead you can do.

If you really want to save on memory, you can use std::list and make a double linked list. It will be slower and fragment the memory some more, but you can skip storing whatever is default values. For instance if you make a class containing TeamTypes and an int, it's 8 bytes (or 4 if you use : 16 on both) and then it adds 8 bytes for the pointers in the list. That's 12 bytes for each non-default short value while 12 bytes would save 6 shorts in an array. This means in theory you could save memory if less than 1/6 of the values are non-default. It doesn't seem worth the extra code or overhead to save a few bytes. Now if the list saves class instances of say 40 bytes, then the list approach would make sense if you can save say 3 instances instead of 60, but I don't see a need for a list with objects that large in CvPlot.

In general std::list is rarely a good option because it comes with a lot of memory I/O overhead, which hurts performance. std::vector is sequential as well as C style arrays and they will always win on performance.
Code:
m_abCore = new bool[NUM_CIVS];
sizeof(bool) = 4 while sizeof(char) = 1. Change bool to char and you will save 3*NUM_CIVS bytes. Change the set function to something like
Code:
m_abCore[iIndex] = bNewVal ? 1 : 0;
This will ensure that the data will always be 1 or 0, hence fit in a byte. The C++ standard says bools should be 0 or 1, but I can easily write code, which violates that standard so I wouldn't rely on it being followed. Read is just returning m_abCore[iIndex] with bool as a return type. No typecasting needed.
 
You can use a dynamic array.

Pros:
- allocate memory only for civs that are actually in the game
- use performance of arrays

Cons:
- you have to allocate the arrays each time a player is added or removed from the game

something like this:
actualCivs = CalculateNumberOfCivilizationsInGame() // loop over all players and check if they are alive
m_aiSettlerValue = new short[actualCivs]

then to access to the settlervalues you need a new DynamicPlayerId for each player

So your players might have
Id: 0, 1, 3, 6, 10
DynamicPlayerId: 0, 1, 2, 3, 4

The DynamicPlayerId you have to set each time a player is added/removed along with the m_aiSettlerValue arrays.
 
You can use a dynamic array.
If you want, you could use an std::vector as that is internally an array, which you can add data to. When you add, it will allocate new memory, copy all of the data and then free the old array. The issue with this approach is that the implementation is aware that there is an overhead allocating and freeing so it won't allocate one at a time. It will allocate extra so it can add later. Going from 4 to 5 entries might make it jump from 4 to 8 in terms of allocation. You could manually tell it to allocate for X entries, but that's extra code.
Id: 0, 1, 3, 6, 10
DynamicPlayerId: 0, 1, 2, 3, 4
So you are saying that CvPlayer should have a getDynamicID(). That's an interesting idea and it might work fairly well. If so I would make a new enum and call it DynamicPlayerTypes to make it clear which one a function expects and cause a compiler error if the wrong one is used.

Another thing to consider is making a class to store multiple variables and then make an array of those. Like
PHP:
struct PlotVisibilityStuff
{
    PlayerTypes eOwner : 8;
    bool bRevealed : 1;
    int iVisibilityCount : 16;
    int whatever : 7;
};
BOOST_STATIC_ASSERT(sizeof(PlotVisibilityStuff) == 4);
Doing this will turn it into 4 byte blocks, but you can store so much more in those 4 bytes. Also this will reduce memory I/O (hence latency) by putting values commonly used together into the same index in an array as the CPU will fetch the whole 4 bytes into the CPU cache, something it won't do if it reads the same index from two different arrays.

BOOST_STATIC_ASSERT will cause a compile error if the argument is false. Here is why sizeof being set at compile time really becomes useful. You just add and if it compiles then the size is ok. If not, then you won't forget to check. I once did this with a template argument because the code in the function would only work as intended if the variable used 4 bytes. It helps prevent current or future bugs by adding such checks. BOOST_STATIC_ASSERT will do nothing at runtime meaning unlike FAssert it will not affect performance. Yes you can disable asserts when compiling, but then you won't check. Also you can't disable BOOST_STATIC_ASSERT meaning it's checked even when compiling a release.
 
Thanks for the detailed explanations and additional suggestions. Sorry for asking basic questions and asking for clarification - I don't have much experience with C++ or other close-to-memory programming language and what I have learned for better or worse comes from the Civ4 code base, so I want to make sure I don't make wrong assumptions and take the wrong thing away from it.
 
Sorry for asking basic questions and asking for clarification - I don't have much experience with C++ or other close-to-memory programming language and what I have learned for better or worse comes from the Civ4 code base
That goes for most modders and if nobody asks then the answer won't appear on the forum. Without forum posts everybody will have to learn everything the hard way :badcomp:
 
[...] I think I want to stay at a lower level of some of your suggestions because I don't have enough confidence in my C++ experience to do them correctly, and I feel that it would be prematurely optimizing on performance while I am still in the process of implementing the feature. [...] Looking at only e.g. CvPlot.m_aiSettlerValue, do I gain anything if I defer the array allocation to the setter method [...]
I think this is a good approach. Apart from avoiding memory wasted on water plots, the late allocation should place the contents of the arrays far away from the CvPlot instances on the heap, so I don't suppose there should even be a particular concern about "messing" with a performance-critical class (CvPlot). If a timed test could be run before and after adding the first of those arrays, that might set your mind at ease more (but that's easier said than done). Regarding unused civs, I would only consider a more complex data structure than an array (or vector) if the data were very sparse, i.e. needed only for a couple of civs (or usually for none at all).
 
Yeah. After thinking about it for a bit the room for optimizing around unused civs isn't as big as I thought - there is a reason that I decided to store civs in the first place instead of players (which still can be dead or alive but are already a subset of all civilizations). There are parts of the logic that need to consider civilizations that are currently not assigned to player, mostly to decide whether that civilization should be assigned to a player. Of course you could still optimise around that by doing so a lot smarter than I am currently doing - but then any memory optimisation is predicated on that work.
 
After thinking about it for a bit the room for optimizing around unused civs isn't as big as I thought
I'm not surprised. Often when the language has something optimized like array for quick access, outsmarting it is much harder than it first look. Often you need a significant gain, like saying a tech enabling a free promotion for for a specific unit combat type. Rather than making a 2D array of NUM_UNIT_COMBAT*NUM_PROMOTIONS with an int for each, store a list of (UnitCombatTypes, PromotionTypes, int allow) entries because rather than needing the full array, the amount given is usually 1-3 if any. Also xml data is mostly read only. There is little room for optimization here because this is the vanilla approach to this specific type of problem.

Optimizing an array of ints for each player/team is harder as there isn't the massive amount of data to save in the first place. Much less "idle/+0 entries". If you really want to save memory, you could do something like saying that the game in question has the current number of teams (say 25), disallow making new teams while the game is running and then use max number of teams ever used in the current game instead of MAX_PLAYERS for array length. Not sure if this saves enough to justify the work or risk of introducing bugs through and it will mess with savegames.

I will say that stuff like using char arrays instead of bool arrays will be able to save memory with relatively little effort or risk of bugs. Outsmarting C++ with your own implementation of arrays can be done, but it requires more effort.

On a site node, the reason why bool arrays use 4 bytes for each entry is it's more efficient when writing to it as it doesn't have to read first. Also since 4 bytes is written with each write, two threads writing to different variable within the same 4 byte block is not thread safe. Not a problem for single threaded games, but if somebody wants to use multithreading, that is something to consider. Not sure if thread safe or performance is the reason for using 4 bytes to store a bool, but both are valid arguments in general. Neither of those arguments apply to a read only array through.
 
Yeah, you're right. Premature optimization is never a good idea. Especially when the consideration becomes read performance vs write performance vs memory - makes little sense to think about that on the drawing board when you're not even ready to do benchmarks of the alternatives.

I'm just a little paranoid about memory right now because I am moving to a larger map and more plots feels like a significant multiplier on what I am currently using when there already people who are struggling with the current memory needs.

But I have already learned a lot of patterns in this thread that seem to be unambiguous wins which already makes me feel good about it. I can try more involved strategies later on when I have the time to actually measure the results.
 
Last edited:
Oh, I thought you already managed to get 15k plots working in your mod. Yes, paranoid you should be. You will face the endboss of civ4 modding. Many tried over the years to use giant maps in civ4 with mixed results.

The cvplot data will be the least of your concerns. The graphic engine stores the whole map in memory for performance reasons and it needs way more memory per plot than a couple arrays in cvplot. As far as I know the graphic engine is still 32 bit so there are limits to what it can handle.

You might want to try out if base BTS can handle 15k plots by just increasing height and width of worldsize huge.
 
You might want to try out if base BTS can handle 15k plots by just increasing height and width of worldsize huge.
We The People has a world size of 100*215, which is 21500 plots. The game engine can handle it. I tried 350*148 (51800 plots) once and while the game started fine, it did use 3 GB of memory meaning it would have crashed if I didn't expand the allowed memory usage to 4 GB. Also performance also left something to be desired, particularly considering that it was early game. Big maps are possible, but they will eat up memory.

Let's say we change a bool array to a char array. That's 3 bytes saved for each entry. Say the length is 64 (could be MAX_PLAYERS). Multiply this with 15k plots and we get 3 bytes * 64 * 15 = 2880000 or roughly 2.75 MB. Yeah there is potential to save memory in how the map is stored, but with a bigger map there will be room for more cities, more units and they all use memory.
 
We The People has a world size of 100*215, which is 21500 plots. The game engine can handle it. I tried 350*148 (51800 plots) once and while the game started fine, it did use 3 GB of memory meaning it would have crashed if I didn't expand the allowed memory usage to 4 GB. Also performance also left something to be desired, particularly considering that it was early game. Big maps are possible, but they will eat up memory.
Oh this is great news.
 
Oh, I thought you already managed to get 15k plots working in your mod.
Working is a poorly defined term. It's in the game. It's not lategame tested.
 
Top Bottom