Caveman 2 Cosmos (ideas/discussions thread)

Thunderbrd · Jun 8, 2019

Anq said:
@Thunderbrd,
I'm using C-style arrays because once the engine has settled everything inside a reference object (info object), their contents never change. A vector has the advantage of easier expansion, but we only use that during the startup process (the XML loading stage). I have demonstrated how to copy these arrays and concatenate two of them, and it is easy to do the calculation to get the sizes right.

Are you suggesting that it then becomes faster to reference the information? I might need some syntax examples for all the steps of use to fully follow. A struct also has an advantage we've never used, which AIAndy brought up recently when we were talking about ways to save memory: when an int or bool member is declared, you can apparently limit how many bits are allocated to it. I'm still not 100% clear on its usage or how it might change things.
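For readers following the bit-limiting point: in C and C++ a struct member can be declared as a bit-field with an explicit width, so several small flags share one int. A minimal sketch; the struct names and every member except bForceOverwrite are hypothetical, and exact packing and size are implementation-defined.

```cpp
#include <cassert>

// Hypothetical packed flags: the number after the colon is the bit width,
// so all three members can share a single unsigned int.
struct PackedFlagsSketch
{
    unsigned int bForceOverwrite   : 1;  // a flag needs only 1 bit
    unsigned int bHypotheticalFlag : 1;
    unsigned int iSmallValue       : 4;  // holds 0..15
};

// The same data without bit-fields, for a size comparison.
struct UnpackedFlagsSketch
{
    bool bForceOverwrite;
    bool bHypotheticalFlag;
    int  iSmallValue;
};
```

Reads and writes look like ordinary member access (f.iSmallValue = 15;); the compiler generates the masking and shifting. The cost is that values outside the declared width cannot be stored.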

Thunderbrd · Jun 8, 2019

Anq said:
It will be only one additional boolean value in every info-object. I don't want to apply this to every property inside an object. I may have put it that way in my initial post, but I soon realized that and edited it out.

If you're limiting the boolean to one per info-object, that's the same as a global tag on that object then right? So how do you specify which tags it applies to and which it doesn't unless you're including it for each multi-list tag on each info object?

bForceOverwrite is in essence one bool per info object so I'm unclear how you're talking about distinguishing what tags get overwritten and which don't. I don't THINK bForceOverwrite is a true global tag - its intent, if I recall correctly, is to be applied to an individual info object. It may not work quite right for most of them right now, I don't know.

Dancing Hoskuld · Jun 8, 2019

Anq said:
Why is ForceOverwrite a global setting? It's not doing what I'd like. Such a switch should be granular to individual objects, and I'd like to make it so that with this switch on, you don't have to copy everything to your XML and redefine the entire object. I want to make the engine copy the values that your XML doesn't change, and use your values over the defaults when you specify them, including null values (to cancel the default).

In the XML it is not about what a programmer or the reading program likes; it is about what the people who have to write the code like. Your suggestion will make it much harder to write XML and be confident that what you have done will happen. Also, since we don't have 6 people trying to change XML at the same time, it appears to be wasted effort.

Anq · Jun 8, 2019

Thunderbrd said:
Are you suggesting that it then becomes faster to reference the information? I might need some examples of syntax here on all the steps of use to fully follow. A struct has an advantage we've never used as well and that was brought up recently by AIAndy when we were talking about ways to save on memory usage, where when the int or bool is declared, apparently you can limit how many bits of data are enabled to be assigned to those tags. I'm still not 100% clear on its usage or how it might change things.

It does not improve access time. It's only a matter of programming style. In my opinion plain old data is easier to work with, since I don't have to modify the header to replace every pointer with some vector... You know that when you modify the header even a little, your build time suffers a lot. Vectors have little advantage over plain old C arrays if we only modify the data during the startup process rather than every now and then. A vector is a waste of space and capability and adds another (albeit thin) layer between the name and the data.

Thunderbrd said:
If you're limiting the boolean to one per info-object, that's the same as a global tag then right? So how do you specify which tags it applies to and which it doesn't unless you're including it for each multi-list tag on each info object?

~~OK, I misunderstood the process. It's fine to use a global switch because the objects are read from XML only one at a time. Still the behavior can be more refined...~~

Wait. The flag is set when an info object is read. But I want to check this flag in CopyNonDefaults. It has to pertain to the object, or otherwise there will be no way to know if the flag is set for the current object.

Suppose it reads objects A, B, C, D and the flag is set for D. When it later tries to merge two definitions of A, it looks up the wrong flag. The global approach doesn't work.
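A stripped-down model of that reading order, to show why a single global flag loses information. Everything here (InfoObjectSketch, g_bForceOverwriteSketch, readAllSketch) is hypothetical and far simpler than the real XML loader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of the XML reading order.
struct InfoObjectSketch
{
    std::string id;
    bool bForceOverwrite;  // per-object: stays attached to this object
};

// Global flag: overwritten by every object read, so it only
// reflects the most recently read object.
bool g_bForceOverwriteSketch = false;

std::vector<InfoObjectSketch> readAllSketch(const std::vector<InfoObjectSketch>& xml)
{
    std::vector<InfoObjectSketch> read;
    for (std::size_t i = 0; i < xml.size(); ++i)
    {
        g_bForceOverwriteSketch = xml[i].bForceOverwrite;
        read.push_back(xml[i]);
    }
    return read;
}
```

After reading A, B, C, D with only D flagged, g_bForceOverwriteSketch is true while A's own flag is false, which is exactly the mismatch a later merge of A would run into.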

Dancing Hoskuld said:
In the XML it is not about what a programmer or the reading program likes; it is about what the people who have to write the code like. Your suggestion will make it much harder to write XML and be confident that what you have done will happen. Also, since we don't have 6 people trying to change XML at the same time, it appears to be wasted effort.

The current code forces us to copy the full definition of an object if we want to ForceOverwrite with it, and this makes it hard for me to see what has changed in the overwriting object. It'll be clearer if we allow the overwriting object to omit the values that are the same as the default. This makes reviewing differences much easier, and since you don't have 6 people trying to change XML simultaneously anymore, you'll want reviewing to be easy.

PS. (Pardon for my ignorance here, I haven't grasped the whole process yet...)

Thunderbrd · Jun 8, 2019

Anq said:
It does not improve access time. It's only a matter of programming style. Plain old data is more compatible to use in my opinion, so that I don't have to modify the header to replace every instance of pointer with some vector... You know when you modify the header just a bit, your build time is going to suffer a lot. Vectors have little advantage over plain old C arrays if we're not modifying the data every now and then but just the startup process. It's a waste of space and capability and adds another layer (although thin) between the name and the data.

However, the vector method does have the advantage of being faster to loop through. Rather than having to loop through all the possible entries in an info, you're usually just looping through the entries stored in the vector for that tag. I can show you a comparison later; heading out at the moment. You may be able to show me how to use an array faster than seems possible to me, so I'm still keeping an open mind, provided CLEAR examples can be given of your method for every step of its application so I can procedurally replicate it, including interacting with it properly. I've gotten a little used to the methods we have.

Thunderbrd · Jun 9, 2019

I'll take some time tomorrow to try to absorb it all. It's rather fascinating.

AIAndy · Jun 9, 2019

The force overwrite is only global because otherwise it would have to be in the root object of the info class hierarchy, and iirc there is some issue with the exe accessing that, so you can't easily add values there. So globals are used to communicate between the specific info XML reading code and the main XML reading code.
The XML objects are always first read as if it was a new object and only afterwards it is merged with an existing object of the same ID. This merging is the copyNonDefaults step and inserts any info from the old object that is still at default value in the new object. With force overwrite the merging step is skipped and the new object overwrites the old one. You mainly need that if you want to set some value back to the default value.
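A simplified sketch of the merge step as described here, assuming every field's default is 0. BuildingInfoSketch and addInfoSketch are hypothetical stand-ins, not the real info classes; the point is that without force overwrite a field can never be set back to its default, because the default reads as "not specified".

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an info class; 0 is the default for every field.
struct BuildingInfoSketch
{
    int iCost;
    int iHappiness;
    BuildingInfoSketch() : iCost(0), iHappiness(0) {}

    // Mirrors the described copyNonDefaults step: any field the new object
    // left at its default is filled in from the previously loaded object.
    void copyNonDefaults(const BuildingInfoSketch& old)
    {
        if (iCost == 0)      iCost = old.iCost;
        if (iHappiness == 0) iHappiness = old.iHappiness;
    }
};

// The freshly read object is merged with an existing one of the same ID,
// unless force overwrite is set, in which case it simply replaces it.
void addInfoSketch(std::map<std::string, BuildingInfoSketch>& infos,
                   const std::string& id, BuildingInfoSketch fresh,
                   bool bForceOverwrite)
{
    std::map<std::string, BuildingInfoSketch>::iterator it = infos.find(id);
    if (it != infos.end() && !bForceOverwrite)
        fresh.copyNonDefaults(it->second);
    infos[id] = fresh;
}
```

A mod that only sets iCost keeps the base iHappiness when merged; to reset iHappiness to 0 the mod has to force overwrite, which then also requires restating every other field.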

In regards to your data type suggestions: Remember this is C++ so you are better off turning your data structures into classes (probably even templated). That way they become easier to use. The infos are indeed mostly immutable so there are some optimization possibilities compared to the std data types. Compared to std::vector you mainly save some bytes in the overhead as you only need one length instead of separate capacity and length.
I would suggest sorting your flat set and map variants so that faster access is possible. That mostly also requires changing the places that use the specific info, as far too often the DLL and Python code loop over all possible IDs, as you noticed up there. Remember that Python needs to be checked as well if you replace the access method, so you might be better off renaming it or adding a separate method for the more efficient access, as it does not interpret the given parameters in the same way.
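One way to read the sorted flat map suggestion: keep (key, value) pairs in a std::vector sorted by key, built once at load time, and look keys up with std::lower_bound for O(log N) access. A sketch only; FlatIntMapSketch is a hypothetical name, written in C++03 style since the thread mentions VC2003.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <utility>
#include <vector>

class FlatIntMapSketch
{
public:
    typedef std::pair<int, int> Entry;  // (enum key, value)

    // Built once at load time, then treated as read-only.
    explicit FlatIntMapSketch(const std::vector<Entry>& entries)
        : m_entries(entries)
    {
        std::sort(m_entries.begin(), m_entries.end());
    }

    // Binary search by key; keys that are not stored return iDefault.
    int get(int iKey, int iDefault = 0) const
    {
        // (iKey, INT_MIN) sorts before any stored pair with key iKey.
        std::vector<Entry>::const_iterator it = std::lower_bound(
            m_entries.begin(), m_entries.end(), Entry(iKey, INT_MIN));
        return (it != m_entries.end() && it->first == iKey) ? it->second : iDefault;
    }

    // Position-based iteration over stored pairs only.
    int size() const { return (int)m_entries.size(); }
    int keyAt(int i) const { return m_entries[i].first; }
    int valueAt(int i) const { return m_entries[i].second; }

private:
    std::vector<Entry> m_entries;
};
```

Note that get() takes a key while keyAt() takes a position, so if both were exposed to Python they would need distinct names, in line with the advice about not silently changing how parameters are interpreted.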

Anq · Jun 9, 2019

Three kinds of functions:

Input--index value. Output--enum key at that index. eg. int CvBuildingInfo::getPrereqOrBonuses(int)
- int* m_piPrereqOrBonuses is as large as GC.getNUM_BUILDING_PREREQ_OR_BONUSES().
Input--enum key to look up. Output--integer value for that key. eg. int CvBuildingInfo::getSpecialistCount(int)
- int* m_piSpecialistCount is as large as GC.getNumSpecialistInfos().
Input--enum key to look up, Output--whether that key exists in the set. eg. bool CvBuildingInfo::isBuildingClassNeededInCity(int)
- bool* m_pbBuildingClassNeededInCity is as large as GC.getNumBuildingClassInfos()

(While I am talking about enum here, they are actually stored as integers, not those uppercase enum words.)
Now I change them:

Input--index value. -1 for count. Output--enum key at that index. eg. int CvBuildingInfo::getPrereqOrBonuses(int)
- int* m_piPrereqOrBonuses is now N+1 in length with N enum keys stored.
Input--enum key to look up. -1 for count. Output--integer value for that key. eg. int CvBuildingInfo::getSpecialistCount(int)
- int* m_piSpecialistCount is now 2N+1 in length with N key-value pairs stored.
Input--enum key to look up, Output--whether that key exists in the set. eg. bool CvBuildingInfo::isBuildingClassNeededInCity(int)
I made a copy of them to aid with iterating. New functions look like this, and are the same as category 1:
Input--index value. -1 for count. Output--enum key at that index, eg. int CvBuildingInfo::getBuildingClassNeededInCity(int)
- For both, I changed the internal array to int* m_piBuildingClassNeededInCity which is now N+1 in length with N enum keys stored.
- I made those copies exposed to python too, and tried my best to adapt other code components to use getX instead of isX for looping.
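A sketch of the first-category layout described above, assuming (the post does not say) that the count sits in slot 0 and the N keys follow it. makePackedList and getPackedListEntry are hypothetical helpers, not the actual CvBuildingInfo code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout: piList[0] = N, piList[1..N] = the enum keys.
// The caller owns the returned array and frees it with delete[].
int* makePackedList(const int* piKeys, int iCount)
{
    int* piList = new int[iCount + 1];
    piList[0] = iCount;
    for (int i = 0; i < iCount; ++i)
        piList[i + 1] = piKeys[i];
    return piList;
}

// getX(-1) returns the count; getX(i) for i in 0..N-1 returns the i-th key.
int getPackedListEntry(const int* piList, int i)
{
    if (piList == NULL)
    {
        assert(i == -1);  // an absent list is treated as empty
        return 0;
    }
    if (i == -1)
        return piList[0];
    assert(i >= 0 && i < piList[0]);
    return piList[i + 1];
}
```

Looping then reads: for (int i = 0; i < getPackedListEntry(p, -1); ++i) { int eKey = getPackedListEntry(p, i); ... }.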

How to loop through these lists:

getPrereqOrBonuses(-1) returns the number of keys stored in the array, and you use getPrereqOrBonuses(i) where i=0 to N-1 to loop through the members.
getSpecialistCount(-1) returns the number of pairs stored in the array, but you still have to loop through all enums if you wish to look up all values. A possible improvement is to use std::pair<int, int>s or some struct instead of a flattened map (talking std::map here), so that it's easier to loop through the stored keys.
getBuildingClassNeededInCity(-1) (new function) returns the number of keys stored in the array, and you loop through the members exactly the same way as the first category.
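For the second category, a sketch assuming the 2N+1 array stores the count first and then interleaved key/value pairs (that layout is an assumption). It shows both the per-key lookup and iterating only the stored pairs, which is the improvement mentioned for getSpecialistCount; all function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical layout: piMap[0] = N, then N interleaved (key, value) pairs:
// piMap[1 + 2*i] = key i, piMap[2 + 2*i] = value i.
int getPackedPairCount(const int* piMap) { return piMap ? piMap[0] : 0; }

int getPackedPairKey(const int* piMap, int i)
{
    assert(i >= 0 && i < getPackedPairCount(piMap));
    return piMap[1 + 2 * i];
}

int getPackedPairValue(const int* piMap, int i)
{
    assert(i >= 0 && i < getPackedPairCount(piMap));
    return piMap[2 + 2 * i];
}

// Lookup by enum key: a linear scan over the stored pairs only.
// Keys that are not stored return 0, like the old fully sized array.
int getPackedValue(const int* piMap, int iKey)
{
    for (int i = 0; i < getPackedPairCount(piMap); ++i)
        if (getPackedPairKey(piMap, i) == iKey)
            return getPackedPairValue(piMap, i);
    return 0;
}
```

Code that wants every non-zero value can loop i from 0 to getPackedPairCount() - 1 instead of over every enum ID.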

For categories 1 and 3, the idea of sets (std::set) can apply, but I'd rather not sort them and ensure no duplicate member is added until I figure out an easy way to do those two jobs...
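In standard C++ both jobs are usually done with the sort/unique/erase idiom, after which membership can be tested with std::binary_search. A minimal sketch with hypothetical function names, operating on a std::vector filled during loading:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts the keys and removes duplicates in place.
void makeSortedSet(std::vector<int>& keys)
{
    std::sort(keys.begin(), keys.end());
    // std::unique moves adjacent duplicates to the end; erase drops them.
    keys.erase(std::unique(keys.begin(), keys.end()), keys.end());
}

// Membership test on the sorted list, an alternative to an isX(int) bool array.
bool containsKey(const std::vector<int>& sortedKeys, int iKey)
{
    return std::binary_search(sortedKeys.begin(), sortedKeys.end(), iKey);
}
```

Running makeSortedSet once after the XML is read keeps the lists compact and lets isX-style checks use binary search instead of a full-length bool array.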

Here is my list of changes:

Code:

[1st category] int CvBuildingInfo::getPrereqOrVicinityBonuses(int)
[1st category] int CvBuildingInfo::getPrereqOrRawVicinityBonuses(int)
[2nd category] int CvBuildingInfo::getUnitClassProductionModifier(int)
[3rd category] bool CvBuildingInfo::isPrereqOrBuildingClass(int) --> original still in use by CvCityAI, and by CvBuildingInfo itself
New function--> int CvBuildingInfo::getPrereqOrBuildingClass(int) -> CvCity and CvGameTextMgr are now using
[2nd category] int CvBuildingInfo::getTechHappinessChanges(int)
[2nd category] int CvBuildingInfo::getTechHealthChanges(int)
[3rd category] bool CvBuildingInfo::isPrereqOrTerrain(int) --> no more use except python
New function--> int CvBuildingInfo::getPrereqOrTerrain(int) -> CvCity and CvGameTextMgr are now using
[3rd category] bool CvBuildingInfo::isPrereqAndTerrain(int) --> no more use except python
New function--> int CvBuildingInfo::getPrereqAndTerrain(int) -> CvCity and CvGameTextMgr are now using
[3rd category] bool CvBuildingInfo::isPrereqOrImprovement(int) --> no more use except python
New function--> int CvBuildingInfo::getPrereqOrImprovement(int) -> CvCity and CvGameTextMgr are now using
[3rd category] bool CvBuildingInfo::isPrereqOrFeature(int) --> no more use except python
New function--> int CvBuildingInfo::getPrereqOrFeature(int) -> CvCity and CvGameTextMgr are now using
[2nd category] int CvBuildingInfo::getBuildingClassProductionModifier(int)
[2nd category] int CvBuildingInfo::getGlobalBuildingClassProductionModifier(int)
[2nd category] int CvBuildingInfo::getBonusDefenseChanges(int)
[3rd category] bool CvBuildingInfo::isPrereqNotBuildingClass(int) --> no more use (and was never exposed to python)
New function--> int CvBuildingInfo::getPrereqNotBuildingClass(int) -> CvCity and CvGameTextMgr are now using (I didn't add the python interface for this one)
[3rd category] bool CvBuildingInfo::isReplaceBuildingClass(int) --> original still in use by CvCity, CvCityAI, and CvGameTextMgr
New function--> int CvBuildingInfo::getReplaceBuildingClass(int) -> CvPlayer, CvCity(partly), CvCityAI(partly) and CvGameTextMgr(partly) are now using
[2nd category] int CvBuildingInfo::getUnitCombatExtraStrength(int)
[1st category] int CvBuildingInfo::getPrereqAndTechs(int)
[1st category] int CvBuildingInfo::getPrereqOrBonuses(int)
[2nd category] int CvBuildingInfo::getReligionChange(int)
[2nd category] int CvBuildingInfo::getSpecialistCount(int)
[2nd category] int CvBuildingInfo::getFreeSpecialistCount(int)
[2nd category] int CvBuildingInfo::getBonusHealthChanges(int)
[2nd category] int CvBuildingInfo::getBonusHappinessChanges(int)
[2nd category] int CvBuildingInfo::getBonusProductionModifier(int)
[2nd category] int CvBuildingInfo::getUnitCombatFreeExperience(int)
|--> removed bool m_bAnyUnitCombatFreeExperience
|--> adapted bool CvBuildingInfo::isAnyUnitCombatFreeExperience()
[2nd category] int CvBuildingInfo::getDomainFreeExperience(int)
|--> removed bool m_bAnyDomainFreeExperience
|--> adapted bool CvBuildingInfo::isAnyDomainFreeExperience()
[2nd category] int CvBuildingInfo::getDomainProductionModifier(int)
[2nd category] int CvBuildingInfo::getBuildingHappinessChanges(int)
[2nd category] int CvBuildingInfo::getPrereqNumOfBuildingClass(int)
[2nd category] int CvBuildingInfo::getImprovementFreeSpecialist(int)
[3rd category] bool CvBuildingInfo::isBuildingClassNeededInCity(int) --> original still in use by CvCityAI and CvGameTextMgr, and by CvBuildingInfo itself
New function--> int CvBuildingInfo::getBuildingClassNeededInCity(int) -> CvPlayer and CvCity(partly) are now using

With these changes memory use is now 388.8 MB vs 437.7 MB before (1 test). Tested using PPIO, opening up the pedia to the same building and waiting until steady state.
Almost 50 megabytes (48.9 MB) saved from these arrays!

Code samples that use macros for everything: Watch out, can hurt your eyes!
>> https://pastebin.com/6pxEkyqY <<

AIAndy · Jun 9, 2019

Good work, but I assume you come from a C background, right?
I had a look at it, so here are some suggestions for improvements:

The macros can be replaced by inline functions defined in a header. That way they will be less confusing while debugging, and they will still be inlined in release builds, resulting in the same assembly.
I would suggest adding a separate function for returning the size for iteration, as that states the intent more clearly than passing -1 to the function, and it will also work consistently for all kinds of data instead of only integers.
If you want to play a bit with templates, you could replace the array with a templated class or struct that contains the size and the array separately from each other (you'd have to give the size as 1 and just allocate enough when you actually create a variable of the type). There are several advantages if done right like you could add a specialization for bools that only store bits later without having to change the places where it is used. If you are interested I can describe it in some more detail.
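A sketch of the templated "size plus array" idea: the array member is declared with length 1 and the allocation is sized for the real element count, so one heap block holds both. This is the classic "struct hack", widely relied upon but not strictly sanctioned by the C++ standard. PackedArraySketch is a hypothetical name, it only suits trivially copyable element types, and an empty list would simply be a NULL pointer. It also shows the other two suggestions: an inline accessor instead of a macro, and a separate size() instead of passing -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

template <typename T>
struct PackedArraySketch
{
    int m_iSize;
    T   m_aData[1];  // declared with length 1, actually m_iSize long

    // Allocates the header and iCount elements in one block.
    static PackedArraySketch* create(const T* pSrc, int iCount)
    {
        assert(iCount > 0);  // empty lists stay NULL instead
        std::size_t bytes = sizeof(PackedArraySketch) + (iCount - 1) * sizeof(T);
        PackedArraySketch* p = static_cast<PackedArraySketch*>(std::malloc(bytes));
        assert(p != NULL);
        p->m_iSize = iCount;
        for (int i = 0; i < iCount; ++i)
            p->m_aData[i] = pSrc[i];
        return p;
    }
    static void destroy(PackedArraySketch* p) { std::free(p); }

    // Separate size function instead of passing -1.
    int size() const { return m_iSize; }

    // Inline function instead of a macro: type-checked, steppable in the
    // debugger, and still inlined in release builds.
    const T& operator[](int i) const
    {
        assert(i >= 0 && i < m_iSize);
        return m_aData[i];
    }
};
```

Because callers only use size() and operator[], a later specialization such as PackedArraySketch<bool> could switch to bit storage without touching them, which is the advantage mentioned above.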

Anq · Jun 9, 2019

AIAndy said:
Good work, but I assume you come from a C background, right?
I had a look at it, so here are some suggestions for improvements:

The macros can be replaced by inline functions defined in a header. That way it will be less confusing while debugging while it will still be inlined in release resulting in the same assembly.

I would suggest adding a separate function for returning the size for iteration as that states the intent more clearly than passing -1 to the function and it will also work consistently for all kind of data instead of only integers.

If you want to play a bit with templates, you could replace the array with a templated class or struct that contains the size and the array separately from each other (you'd have to give the size as 1 and just allocate enough when you actually create a variable of the type). There are several advantages if done right like you could add a specialization for bools that only store bits later without having to change the places where it is used. If you are interested I can describe it in some more detail.

Thank you for these suggestions. I'll try inline functions and templates and, most importantly, make a separate function for the size. I'm a novice in C, so I read documentation from Microsoft (VC2003) and c-faq.com, as jman2050 on the Discord channel suggested. Understanding pointers and arrays helps me a lot. But maybe a storage class wrapped around the values (but immutable to save space) is better to use. It's funny how every C/C++ programmer ends up reinventing the wheel to suit their own use, but that's part of the joy of programming too.

Toffer90 · Jun 9, 2019

@Anq: Your work here looks promising both in regards to memory savings but also in my opinion in regards to practicality/functionality.

Am I right in that the hard limits like GC.getNUM_BUILDING_PREREQ_OR_BONUSES(), and others like it, would disappear?
Replaced by a dynamic size array system that allows as many prereqs as one wants in a specific array for an object, where none of the object instances will have any null values in their array?
That would be nice. ^^
It annoyed me that I had to loop through a set amount of values for every object instance in the pedia python even if some, or all, the values could be null for most of those instances.

P.S. For those who want to follow this discussion without having much programming knowledge:
Dynamic size arrays are only troublesome if they need to be changed a lot during runtime as one would have to make a new array and copy all the values, one by one, from the old array to the new one each time it is changed.
This is time consuming for long arrays.
The arrays discussed here are only created once during game launch and never changed again while inside the game.
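To make the P.S. concrete: growing a plain dynamic array by one element means allocating a new block and copying every existing element. A minimal sketch (appendCopying is a hypothetical helper); since the info arrays are built once at startup, this cost is only paid while loading.

```cpp
#include <cassert>
#include <cstddef>

// Appends one value to a new[]-allocated array of iOldCount ints.
// Returns the new array; the old one is freed.
int* appendCopying(int* piOld, int iOldCount, int iNewValue)
{
    int* piNew = new int[iOldCount + 1];
    for (int i = 0; i < iOldCount; ++i)  // every append copies all old elements
        piNew[i] = piOld[i];
    piNew[iOldCount] = iNewValue;
    delete[] piOld;  // deleting NULL is a no-op
    return piNew;
}
```

Appending N elements this way copies about N*N/2 values in total, which is why it is fine for a one-time load but not for data that changes constantly during play.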

Thunderbrd · Jun 9, 2019

OK, well I'm completely out of my depth at this point, at least in large part. I can't comment on the pros and cons here in the least. Thank you @AIAndy for helping with this part, since you were the one who set things up in a way that let me follow the methods you'd established. I'm going to have to just trust this stuff works and learn by examples how to use it, just as I have with the rest of our data storage systems.

I think that 2 things are important with all of this:
1) We should release what we have before any of this comes in, in case there are any instabilities or bugs to sort out from it.

2) We need to get Anq on the team completely because I will be incapable of helping much with the issues it may introduce so he'll need to be willing to stick these changes out for the long haul.

Are you cool with those conditions Anq? Otherwise, I agree it's all sounding great and if you have AIAndy and Toffer's approval, you've certainly got mine. I can adapt. If you're capable of helping to figure things out to save memory at this stage of things, I can't wait to see how you'd improve memory for the units themselves - it's very promising to think we might be able to truly nip these MAF problems in the bud! So many projects are stalled with the concern that they may be useless without some major memory processing improvements taking place.

EDIT: This bit of advice actually was something that stood out to me as well:

AIAndy said:
I would suggest adding a separate function for returning the size for iteration as that states the intent more clearly than passing -1 to the function and it will also work consistently for all kind of data instead of only integers.

This would be more familiar to me to work with at least. I also thought it might interact better with the occasional bugged call parameter.

Whisperr · Jun 9, 2019

Wes_ said:
I was told to post it here. Some ideas I had whilst playing C2C for the first time.

New Wonder and Group Wonders

These are only rough ideas.

I will look into these Group Wonder ideas. As for the castle, that may fall into another planned category of Group Wonders.

alberts2 · Jun 9, 2019

@Anq
Your ideas go in the right direction; I'm just not sure why it wouldn't be possible to use std::vector?
You can look at the changes from 10591 for an example how to use an std::vector instead of an array.

Thunderbrd · Jun 9, 2019

alberts2 said:
@Anq
Your ideas go in the right direction; I'm just not sure why it wouldn't be possible to use std::vector?
You can look at the changes from 10591 for an example how to use an std::vector instead of an array.

If you see any arguments for the array form over his proposal, aside from familiarity with method for those of us established as having worked on this for a while, please, let's hear them! His is SOUNDING like a slight improvement but requiring some adaptation of style. I don't know enough to see which is better and I'm wondering if unfamiliarity is enough of a reason to argue to stay with the vector usage.

alberts2 · Jun 9, 2019

Thunderbrd said:
If you see any arguments for the array form over his proposal, aside from familiarity with method for those of us established as having worked on this for a while, please, let's hear them! His is SOUNDING like a slight improvement but requiring some adaptation of style. I don't know enough to see which is better and I'm wondering if unfamiliarity is enough of a reason to argue to stay with the vector usage.

std::vector basically works like a dynamic array, which makes hand-crafting one unnecessary.

Anq · Jun 9, 2019

OK, we may use std::vector instead. I was poking around in cpp.sh to make a prototype "vector" class that uses some tricks to make it easy to stitch two vectors together.

How about making the function return a reference to this object, and doing everything with this object, such as accessing its size, its values, and so on (but being careful not to modify it)?
(It's easy to add a lock mechanism)
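A sketch of returning the container by reference, with const doing the job of the lock: callers can read size() and elements but cannot modify the list. Stitching two lists together at load time is a single std::vector::insert. The class and method names are hypothetical variants, not the existing CvBuildingInfo signatures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical info class exposing a list by const reference.
class BuildingInfoVecSketch
{
public:
    // Read-only view: size() and operator[] work, push_back etc. do not compile.
    const std::vector<int>& getPrereqOrBonuses() const { return m_aiPrereqOrBonuses; }

    // Load-time only: append another definition's entries (e.g. when merging).
    void appendPrereqOrBonuses(const std::vector<int>& other)
    {
        m_aiPrereqOrBonuses.insert(m_aiPrereqOrBonuses.end(), other.begin(), other.end());
    }

private:
    std::vector<int> m_aiPrereqOrBonuses;
};
```

Callers then loop with for (size_t i = 0; i < info.getPrereqOrBonuses().size(); ++i), so no -1 convention is needed.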

AIAndy · Jun 9, 2019

alberts2 said:
std::vector basically works like a dynamic array, which makes hand-crafting one unnecessary.

The advantages compared to std::vector are lower overhead (4 bytes when empty, just 1 pointer; 8 bytes when non-empty, 1 pointer plus 1 size int; compared to usually 16 bytes for a vector) and guaranteed size (although a vector with proper use of reserve should achieve that as well in practice, despite the standard not guaranteeing it).
So it depends if those memory savings are considered sufficient for the effort.

alberts2 · Jun 10, 2019

AIAndy said:
lower overhead (4 byte for empty, 1 pointer, 8 bytes for non empty, 1 pointer and 1 size int, compared to usually 16 byte for vector)

It's possible to avoid having empty vectors and only create them when they are needed.

AIAndy said:
guaranteed size

Having a variable size instead of a fixed size to save memory was the whole point of what @Anq is trying to do.
If you say something like each array has a size of 20, you're no longer saving memory; in some cases you would use more.

AIAndy said:
So it depends if those memory savings are considered sufficient for the effort.

Compared to the amount of memory that is wasted at the moment, the difference in memory usage between a dynamic array and an std::vector can be ignored.

AIAndy · Jun 10, 2019

alberts2 said:
It's possible to avoid having empty vectors and only initiate them if they are needed.

Yes, but then you have two indirections instead of only one.
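What the lazy-creation variant looks like, and where the second indirection comes from: an empty list costs one NULL pointer, but every read goes from the pointer to the vector object and then to its heap buffer. LazyListSketch is a hypothetical name, written in C++03 style.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class LazyListSketch
{
public:
    LazyListSketch() : m_pList(NULL) {}
    ~LazyListSketch() { delete m_pList; }

    // The vector is only allocated when the first element arrives.
    void add(int iValue)
    {
        if (m_pList == NULL)
            m_pList = new std::vector<int>();
        m_pList->push_back(iValue);
    }

    int size() const { return m_pList ? (int)m_pList->size() : 0; }

    // Two indirections: m_pList -> vector object -> element buffer.
    int at(int i) const
    {
        assert(m_pList != NULL && i >= 0 && i < size());
        return (*m_pList)[i];
    }

private:
    LazyListSketch(const LazyListSketch&);             // non-copyable
    LazyListSketch& operator=(const LazyListSketch&);
    std::vector<int>* m_pList;
};
```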

Having a variable size instead of a fixed size to save memory was the whole point of what @Anq is trying to do.
If you say something like each array has a size of 20 you're no longer saving memory in some cases you would use more.

That is not what I meant. There is no guarantee that a vector only has the size of its content. It can be considerably larger (because of its allocation scheme). With reserve it might be exactly as large as necessary but the standard does not guarantee that (the specific implementation might though).
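A sketch of the size/capacity distinction: size() is the element count, capacity() the allocated slots, and growth by push_back typically leaves capacity() above size(). In C++03 (no shrink_to_fit) the copy-and-swap idiom is the usual way to trim the excess; as said above, the standard does not promise an exact fit, so only capacity() >= size() can be relied on. The function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Copy-and-swap: the temporary copy is usually allocated for exactly
// size() elements, then takes over v's contents via swap.
void trimToFit(std::vector<int>& v)
{
    std::vector<int>(v).swap(v);
}

// When the final count is known up front, reserve() before filling.
std::vector<int> buildReserved(const int* piSrc, int iCount)
{
    std::vector<int> v;
    v.reserve(iCount);  // requests at least iCount slots
    v.assign(piSrc, piSrc + iCount);
    return v;
}
```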

Compared to the amount of memory that is wasted at the moment the memory usage difference between using a dynamic array or an std::vector can be ignored.

I agree, I am just listing the arguments.
