I know that mods with a lot of content run into this problem sooner than later. I don't know what is the situation with Caveman 2 Cosmos, but other mods such as Realism Invictus directly state that they don't support 32 bit operative systems. Since it happens in mods with a lot of game entities, each one of them with a lot of extra tags, I reckon that eventually the 3 GB limit would be surpassed. I don't think this would be a problem with FFH2, though. I never did actual memory tests but I don't believe I ever saw ExtraModMod going over 2 or even 1 GB. Since it has less game entities, More Naval AI should have lower memory usage.
I don't really understand the thing that 64 bit doesn't crash but 32 bit does, but I'm sure you can't use more memory than the 32 bit limitation, so we should try to stay below 3 GB, and shouldn't value performance too high over memory.
I propose a dynamic size. I have added a more detailed description of this class to the end of the proposal. It includes a very rough performance analysis of how efficient it would be (my knowledge on those is a bit rusty so corrections are welcome). Memorywise it should be great for entities with a lot of different types but very few types actually assigned, such as promotions.
O Notation! Didn't think of that. I was thinking about a finer analysis (not that I'm an expert in these things, but O Notation is pretty rough), but it's a really good idea to find potential O(n^2) first.
However, I think the O(n^2) complexity isn't necessary (if I understood your example correctly). We just have to loop through all promotions once, check if the array says -1 or 1 on that promotions, check if the unit has it, thus O(N) (compared to O(n) with a list of required promotions in the spell).
I'm not sure if I understood CvConditionArray. I got the impression that CvConditionArray is a wrapper for std::map<int, bool>, defining additional methods, is that right?
I'll write down my thoughts on this. Again, I'm no expert and it's mostly guessing (in fact, I wouldn't have known what O Notation is three months ago). Especially the memory consumption varies greatly with different environments AFAIK, so don't take that too serious.
I'll use your definitions of N as the total number of entities of a type and n as the number of entries in a list. M is the limit defined for entities stored in array lists (such as NUM_SPELL_XXX).
I'll use t_a for the time required to access a specific entity in an array (very fast), t_v to access a specific entity in a std::vector (fast), t_lm to loop/iterate over a single entity in a std::map (fast), t_m to access a specific entity in a map (slow).
We have two cases of "entity lists" (promotion/unit/building/... lists):
1. We know the entity, and want to know if it's in the list. Examples: UnitOrPrereqs, UnitClassOrPrereqs, UnitCombatOrPrereqs
2. We don't know a specific entity, but instead have two lists of Entities to check against each other. The first "list" is implemented as a bool array with size N. Examples: TechPrereq (a player has a bool-array "list" of techs)
3. Same as 2, but we have another ("negative", could also something else) list of the entities. Examples: PromotionAndPrereqs, PromotionAndBlocks (a unit has a bool-array "list" of promotions)
We have roughly two strategies to store the spell/effect prerequisites:
A. An array of bools with size N. Direct access to an entity: t_a. Looping all entities in the list (that are true in the array) requires N single accesses, N*t_a. Size is N byte.
B. A list with size n.
- Ba: std::vector<int>. Just a list. Access to a single element is, in the worst case, n*t_v, since we have to search all elements in an unsorted vector. Looping over a single entity is t_v, all elements is n*t_v.
- Bb: int array, a non-dynamic list. Needs a limit M. Access to a single element is, in the worst case, (n+1)*t_a ( as we can stop iterating after we found the first "NONE"/-1 after n+1 iterations). Looping all elements is (n+1)*t_a. Size is 4*M bytes.
- Bc: std::map<int, bool> (I assume this is CvConditionArray. It simulates two lists, a positive (true) and a negative (false) prerequisite. Access to a single element is t_m. Looping a single element is t_lm, but considering we can check two prereqs at a time it's t_lm/2, all elements is n*t_lm/2. Size is, as you stated, good for small lists, bad for big ones because of ELEMENT_OVERHEAD. We have 5 byte size because we have pairs of values (int, 4b and bool, 1b), but since we simulate two lists, we're at 2.5*n + n*ELEMENT_OVERHEAD/2 + CONTAINER_OVERHEAD/2 byte.
- Bd: int array AND bool array, like a array of pair<int,bool> and similar to std::map. Needs a limit M (most probably higher than Bb, since two different prereqs are stored). Access to a single element is, in the worst case, ((n+1)*t_a) (we need two accesses but check two prerequisites at the time). Looping all elements is (n+1)*t_a. Size is 2.5*M bytes ((1+4)*M/2...).
So...
The obviously best solution for 1 is A. We only need direct access, and we get it at the best possible performance. Total size is f.e. numPromotions*numSpells, in MNAI about 70 KB.
Code:
for unit in units
for spell in spells
if( spell.isUnitOrPrereq( unit.type ) )
OK
For 2 it's more difficult:
2A: Performance (loop): N*t_a; Size: N
2Ba: Performance (loop): n*t_v; Size: depends. AFAIK it allocates always 2^x fields at once.
2Bb: Performance (loop): (n+1)*t_a; Size: 4M; Requires M, therefore either more inelegant (constant M) or more complex (M determined at loading XML)
I did some testing and it seems accessing a vector is up to 2 times slower than accessing an array (using an vector::iterator to loop a vector is
much slower). That means the vector is only less efficient if n >= N/2, which is highly unlikely. Bb is the fastest, but also most problematic solution.
Code:
// 2A
for unit in units
for spell in spells
for promotion in promotions
if( spell.isPromotionOrPrereq( promotion ) and unit.hasPromotion( promotion ) )
OK
// 2B
for unit in units
for spell in spells
for promotion in spell.promotionOrPrereqs
if( unit.hasPromotion( promotion ) )
OK
3 is much like 2 (performance and size is for one tag, so the Bc and Bd is halved):
3A: Performance (loop): N*t_a; Size: N
3Ba: Performance (loop): n*t_v; Size: depends. AFAIK it allocates always 2^x fields at once.
3Bb: Performance (loop): (n+1)*t_a; Size: 4*M; Requires limit of M
3Bc: Performance (loop): n*t_lm/2; Size: n + n*ELEMENT_OVERHEAD/2 + CONTAINER_OVERHEAD/2 (overheads for std::map)
3Bd: Performance (loop): (n+1)*t_a; Size: 2.5*M (2*M/2...); Requires limit of M
Iterating over a key-value pair of a map (reading both in a two new variables) was about 8-10 times slower than accessing an array, and about 5 times slower than accessing a vector. This would means Ba is 2.5 times faster than Bc.
What I didn't take into account is that we need only one method call for the map variant (f.e. "getPromotionAndPrereqBlock(int index)"), but two for the vector variant (get...Prereq(int index) and get...Block(int index)). The map is probably still worse than the vector, since the vector requires 2 times an array access, and what makes the vector actually slower is the method call AFAIK (since it uses an array internally), so a method call costs roughly 1 array access, but the map is 4 array accesses slower than the vector.
Again, A is only faster than Ba if we have half of all promotions in the list, which is unlikely.
Bb and Bd are clearly faster than the other options. Bb is less complex, but Bd could be faster, since it only requires one method call. Bd requires more memory (since its M is higher).
EDIT/Correction: Bd could actually require less memory.
My performance measures are probably pretty inaccurate, but it seems using a map, hasn't better performance nor memory (while I really liked the idea). Using two arrays (int and bool) instead could be faster if I'm right about the method call costs.
It's also worth noting we're talking about differences of at most 1.5 microseconds (array access - vector access), so the amount of saved time (provided we use 1A, 2Bx, 3Bx) for one tag is 1.5e-6 * units * (total entities in lists of spells). Let's say most spells (200 of 270) need a promotion, so we have 0.0003 seconds saved per unit for the PromotionAndPrereqs tag. I'm not sure how many units a later game normally has, but this change would not be noticeable until you have about 1000 units (that can cast). If that is common in late games, we should try several methods and measure performance. If my calculation is correct and the spell prerequisites take 10% of turn time, I think details like these could affect performance noticeable.
If something is unclear, I'll code some examples.
EDIT: I made a mistake valuing an int 1 byte, while it's 4 bytes. This makes strategies using bools a bit better in comparison.