Optimized DLLs for C2C rev9632

With original files back in place the turn times were 1 min 03 sec for the longest to 55 sec for the shortest. So your files reduced turn times by approximately 5 secs. But to load the game from the desktop shortcut your files took 30 secs longer; 2 min 2 secs vs original at 1 min 23 secs.

I run an i7 2600K CPU with 8GB of DDR3 1066 MHz RAM. My vid card is an Nvidia GTX 550 Ti with 1GB of GDDR5 VRAM.
 
With original files back in place the turn times were 1 min 03 sec for the longest to 55 sec for the shortest. So your files reduced turn times by approximately 5 secs.
That's a reduction of the turn wait time by 8 to 15%. Benchmarking a civ4 game is tricky, meaning we will likely have to just accept a normally unacceptable accuracy. However I would have to say that I had hoped for a more significant difference.

But to load the game from the desktop shortcut your files took 30 secs longer; 2 min 2 secs vs original at 1 min 23 secs.
This is very surprising and I would likely discard this result. The problem is Windows and the file cache. If you read a file, it is copied from the disk to a buffer in memory. If the computer doesn't need the memory for anything else, it will just keep the file there. If the file is read again, it can then be read from memory, making it a lot faster. The more RAM you have, the more files Windows will keep in memory.

To get consistent measurements for this, start the game, quit and start again. Once you get consistent start times, you have stabilized the disk cache and you can use the time. Replace DLL, start a few times to ensure the disk cache is set to provide a stable start time and use the measurement of a stable start time.
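
As a rough illustration of the procedure above, here is a minimal C++ sketch of the "is it stable yet?" check. The function name, the window size and the tolerance are made up for this example; the input is simply the start times you measured by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true once the last `window` start times (in seconds)
// differ from each other by at most `toleranceSec`.
bool isStartTimeStable(const std::vector<double>& startTimesSec,
                       std::size_t window, double toleranceSec)
{
    assert(window >= 2 && toleranceSec >= 0.0);
    if (startTimesSec.size() < window)
    {
        return false; // not enough runs yet
    }
    std::vector<double>::const_iterator first =
        startTimesSec.end() - static_cast<std::ptrdiff_t>(window);
    const double fastest = *std::min_element(first, startTimesSec.end());
    const double slowest = *std::max_element(first, startTimesSec.end());
    return (slowest - fastest) <= toleranceSec;
}
```

Keep restarting the game until this returns true for one DLL, note the stable time, then swap the DLL and repeat. Only the stable times should be compared.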

Naturally usage of XML cache also affects startup times.

Also personally I only optimize to reduce the wait for next turn. Code run during your turn doesn't matter unless it downright causes interface lag, and startup of the game happens only once per gaming session. Waiting for next turn will however take place every turn, meaning it will have the greatest impact on the overall gaming experience.

I run an i7 2600K CPU with 8GB of DDR3 1066 MHz RAM. My vid card is an Nvidia GTX 550 Ti with 1GB of GDDR5 VRAM.
Based on how the civ4 engine works, I would say the following about impact from hardware configurations (related to startup time and waiting for AI turn)
  1. Graphics card has no influence at all (they affect FPS and polygon details, not wait time)
  2. The amount of memory should be high enough to avoid paging. 8 GB is certainly enough and you will not benefit from having more.
  3. CPU speed is important
  4. RAM throughput doesn't matter much
  5. RAM latency is extremely important (time from CPU requests data from RAM until it arrives to the CPU)
  6. HD speed doesn't really matter once the game is done reading files at startup
Simple summary: CPU as fast as possible and being single threaded, the game prefers high clock frequency over number of cores (not unusual for games). RAM response time should be as low as possible. Everything else doesn't really matter, at least for AI wait time.

More technical (it's ok not to get this part)
The AI turn delay is 100% CPU driven in a single thread. Whenever data is needed from RAM, the thread stalls until the data arrives. The MB/s you can read from RAM doesn't really matter because usually only 32 bits are needed at a time (an int, a pointer or similar). The hardware is optimized for sequential reading, which reduces the effective memory latency. However vanilla is about as hostile to sequential reads as possible, making low latency memory even more important.

For instance if we have a list of unit infos and we loop through all of them to read a specific variable, the CPU will notice a pattern after reading 3-4 classes and start to request more of the list because it will likely be needed shortly. However vanilla doesn't store infos as a sequential list. It uses a vector (which is ok, that one stores sequentially), but the vector stores pointers, meaning each info class is at a completely random location in memory and the reads have no pattern that the hardware can predict.

It's everywhere. Plots are stored sequentially, meaning looping the plot array works as the hardware prefers. However looping all plots in range of a city doesn't. It could have been a nested loop where it loops rows and then for each row loops the plots in that row (which would be semi predictable). Instead it loops in some spiral pattern, which makes sense on paper for humans, but it makes plot reading unpredictable to the hardware. In some cases it makes the AI favor plots closest to the city (which is ok), but in other cases the order won't matter at all (does the city have a plot with bonus X, is there a water plot next to the city, etc).
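
To make the layout difference concrete, here is a small standalone C++ sketch. It is not taken from the DLL: UnitInfo, its members and all function names are hypothetical stand-ins, and the range loop scans a plain square without map wrapping, which is simpler than what the game does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an info class; the real ones hold far more data.
struct UnitInfo
{
    int iCost;
    int iCombat;
};

// Pointer layout: the vector itself is contiguous, but it only holds
// pointers, so each object can live anywhere on the heap.
int sumCostPointers(const std::vector<UnitInfo*>& infos)
{
    int iTotal = 0;
    for (std::size_t i = 0; i < infos.size(); ++i)
    {
        assert(infos[i] != NULL);
        iTotal += infos[i]->iCost; // one extra pointer hop per element
    }
    return iTotal;
}

// Contiguous layout: the objects sit back to back, so a loop walks
// through memory in address order.
int sumCostContiguous(const std::vector<UnitInfo>& infos)
{
    int iTotal = 0;
    for (std::size_t i = 0; i < infos.size(); ++i)
    {
        iTotal += infos[i].iCost;
    }
    return iTotal;
}

// Row by row scan of the square around a center plot on a row-major grid.
// Inside one row, consecutive plots are neighbours in memory.
int countWaterInRange(const std::vector<int>& isWater, int iWidth, int iHeight,
                      int iCenterX, int iCenterY, int iRange)
{
    assert(iWidth > 0 && iHeight > 0);
    assert(isWater.size() == static_cast<std::size_t>(iWidth) * iHeight);
    int iCount = 0;
    for (int iY = iCenterY - iRange; iY <= iCenterY + iRange; ++iY)
    {
        if (iY < 0 || iY >= iHeight) continue; // off the map
        for (int iX = iCenterX - iRange; iX <= iCenterX + iRange; ++iX)
        {
            if (iX < 0 || iX >= iWidth) continue; // off the map
            iCount += isWater[iY * iWidth + iX] ? 1 : 0;
        }
    }
    return iCount;
}
```

Both sum functions return the same number; the only difference is where the data lives in memory and therefore how predictable the reads are. The same goes for the range scan: when the order of plots doesn't affect the answer, the row order gives the same result as the spiral.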

Part of the reason is that it's an old game. CPUs get faster and faster, but it's a bit hard to make RAM respond faster. Instead we get hyper threading. Essentially each CPU core looks like two cores to the outside world, but only one can be active at once. The idea is that if thread A stalls due to waiting for memory, the core works on thread B instead. This is claimed to increase the work done by each CPU core by around 30% in real life situations where all cores are fully loaded. This issue with CPUs being too fast for the memory latency was already a problem with the hardware when civ4 was written, but it has only gotten worse since. Needless to say civ4 was written at a time when most people had single core CPUs and as such no effort was put into using more than one core, meaning modern tricks like hyper threading won't help at all.

DLL modder advice:
In addition to storing data sequentially (which could be tricky to add now), another trick is to just not read as many variables. Sometimes this can be done with caches (which may or may not be hard to implement), but other times performance can be gained surprisingly easily. Imagine a function which returns an int. First it adds up some data and then it applies some multipliers. Maybe it even has to loop techs because one tech tag can add a multiplier. One really simple way to optimize this is to check the number once it is known, before the multipliers are applied: if the number is 0, return 0 because 0*x=0. It doesn't matter what the multipliers are. It saves both on waiting for reading data from memory and on some calculations, and the result will always be the same.
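
A minimal standalone sketch of that early return, assuming the multipliers are plain percent modifiers. All names here are hypothetical and the lookup counter only exists to show that the expensive part is skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the (supposedly expensive) modifier lookup runs.
static int g_iModifierLookups = 0;

// Hypothetical: adds up percent modifiers, e.g. one entry per known tech.
int getTotalModifierPercent(const std::vector<int>& modifiers)
{
    ++g_iModifierLookups;
    int iPercent = 100;
    for (std::size_t i = 0; i < modifiers.size(); ++i)
    {
        iPercent += modifiers[i];
    }
    return iPercent;
}

int calculateYield(const std::vector<int>& baseValues,
                   const std::vector<int>& modifiers)
{
    int iBase = 0;
    for (std::size_t i = 0; i < baseValues.size(); ++i)
    {
        iBase += baseValues[i];
    }
    assert(iBase >= 0); // this sketch assumes yields are never negative

    // 0 * x == 0 for any multiplier, so skip the modifier work entirely.
    if (iBase == 0)
    {
        return 0;
    }
    return iBase * getTotalModifierPercent(modifiers) / 100;
}
```

The shortcut is only safe when everything after the check is a multiplication. If a later step adds a flat bonus, a base of 0 no longer guarantees a result of 0.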

One of the first optimizations I did on a mod was caching yield requirements for professions in RaR. Vanilla loops all yields, reads requirements from an info class and applies modifiers from founding fathers and traits. Since the input rarely changes, I just cached it in vectors and recalculate the cache when a new FF joins (which is roughly the same as getting a tech in BTS). That one change severely reduced the need to look up memory all over, and the wait for next turn dropped from 40 to 33 seconds (17.5%). Later I added more optimizations of a similar nature (mainly reducing memory I/O) and the combined reduction in wait is now 40%. That gives an indication of how much time is spent finding data. Some of the time spent searching for data is actually inside GC.getDefineINT. Caching the int value instead of calling that function over and over will increase performance, perhaps more than you might think. The easiest way to cache such an int is like this:

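Code:
int SomeClass::someFunc()
{
    // Runs only on the first call; later calls reuse the stored value.
    static int iCachedINT = GC.getDefineINT("some string");
    return iCachedINT; // the rest of the function would use iCachedINT here
}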
This line will run the first time the function is called and since the int is static, the value will not be deleted once the program exits the function. The next time the function is called, the static int still has the old value. This approach will always increase performance and if it's a frequently called function it might make a difference reducing just one getDefineINT call. I think my record was to reduce the wait by 1% by just caching one int used in one line, but called over and over by the AI.
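
Going back to the profession yield cache mentioned earlier, the idea can be sketched in standalone C++ roughly like this. It is not the RaR code: the class name, the single percent modifier and the recalc counter are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache: stores the final requirement per yield and only
// recalculates when the input changes.
class ProfessionYieldCache
{
public:
    explicit ProfessionYieldCache(const std::vector<int>& baseRequirements)
        : m_base(baseRequirements), m_iModifierPercent(0), m_iRecalcCount(0)
    {
        recalculate();
    }

    // Rare event that changes the input, e.g. a new founding father joins.
    void addModifierPercent(int iPercent)
    {
        m_iModifierPercent += iPercent;
        recalculate();
    }

    // Hot path used by the AI: a single indexed read, no loops.
    int getRequirement(std::size_t eYield) const
    {
        assert(eYield < m_cache.size());
        return m_cache[eYield];
    }

    int getRecalcCount() const { return m_iRecalcCount; }

private:
    void recalculate()
    {
        m_cache.resize(m_base.size());
        for (std::size_t i = 0; i < m_base.size(); ++i)
        {
            m_cache[i] = m_base[i] * (100 + m_iModifierPercent) / 100;
        }
        ++m_iRecalcCount;
    }

    std::vector<int> m_base;   // unmodified requirements
    std::vector<int> m_cache;  // requirements with modifiers applied
    int m_iModifierPercent;
    int m_iRecalcCount;
};
```

The important part is that every event which can change the input has to trigger the recalculation. A missed event leaves stale values in the cache, which is the usual risk with this kind of optimization.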

This turned out to be longer than planned and might be partially off topic (not directly related to the optimized DLL). However I feel like it should be mentioned in a thread about performance and I would like all DLL modders to be aware of this (or any programmer in general). Also this post has some general civ4 performance info which to my knowledge is not on the forum.
 
CPU speed is a factory default of 3.8 GHz. The K series can be overclocked significantly. Even though the 2600K is only a 2nd gen i7, it is still sought out avidly by gamers. eBay listings for them go quickly and average around the $200 mark used. New ones are still selling in the $350 range.

EDIT:
This is very surprising and I would likely discard this result. The problem is Windows and the file cache. If you read a file, it is copied from the disk to a buffer in memory.

Yes 1 load per .dll is way too narrow a sample. So it can be discarded. No problem.

The more RAM you have, the more files Windows will keep in memory.

And that is why I have a "custom" paging file set up. To keep as much of my ram out of Windows grubby little hands.
 
So has interest in this ended completely?
I'll let T-brd give you his take on this subject as it's more his area.

But I would not say ended. Other parts of the Mod are demanding attention and this has been put on the back burner, so to speak. We have some big issues we are dealing with right now and they are consuming all our modding time.
 
Updated to rev 9632

@Nightinggale I have access only as an employee.

Moderator Action: Removed the link. Please do not link to, advocate for or in any other way support pirated versions of software. This site has a zero tolerance policy regarding pirated software. leif
Please read the forum rules: http://forums.civfanatics.com/showthread.php?t=422889
 
Updated to rev 9632 Moderator Action: snipped link. leif
Please read the forum rules: http://forums.civfanatics.com/showthread.php?t=422889

@Nightinggale I have access only as an employee, so you can get it from the internet in an illegal way or be part of some firm that bought it a long time ago.
It used to be that even suggesting a "pirated" or "illegal way" of getting any program was an instant ban from the forum. Be careful what you post.

And just because your firm "bought it" does not mean you have the right to publicly distribute any work coming from it. Again, this is very shaky ground you are walking on. Tread lightly.
 