Increase Memory on 64bit Machines

Nightinggale · Jul 27, 2017

Oh while we are on the topic of performance, I might as well add this:

Civ4 was written in the days when most people had just one CPU core. This means it lacks code to utilize many cores and will primarily use a single core. It will use another core for graphics and the like if one is available, but you will not gain anything from having more than two. However, it needs to get the most out of the core it's using, meaning it greatly favors high clock speed. This means the ideal CPU at any price point is likely the one with the fewest cores. This is actually true for most games, including most new ones. I would however recommend 4 cores if you buy new hardware today, because software is starting to use more cores. It also means the most expensive CPUs aren't the ones that play Civ4 the fastest. Instead they have many cores, which benefits specific tasks (often not gaming).

Civ4 is horribly written when it comes to reading from memory. It doesn't really follow any of the guidelines that are meant to maximize the effect of the CPU cache. As a result, it has very frequent cache misses and often needs to read straight from memory. Data throughput isn't important, as it's often a matter of just 4 bytes, but latency is. In fact, memory latency is likely the dominating factor in how fast a computer can play Civ4, and it becomes more and more important as CPU speed increases. It's a myth that high-bandwidth memory always has high latency, but it can be the case, so picking the right RAM can get complex.
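To make the cache-miss point concrete, here is a minimal sketch that does the same work twice and changes only the order in which memory is visited. The function names and the default size are made up for illustration. Python's interpreter overhead hides much of the gap, so treat the timings as a rough indication only; the same experiment in compiled code typically shows a far larger difference.

```python
import random
import time


def sum_in_order(values, order):
    """Add up values[i] for every index i in `order`."""
    total = 0
    for i in order:
        total += values[i]
    return total


def compare_access_patterns(n=2_000_000, seed=1):
    """Return {pattern: (sum, seconds)} for sequential and scattered index orders."""
    values = list(range(n))
    sequential = list(range(n))
    scattered = sequential[:]
    random.Random(seed).shuffle(scattered)  # same indices, unpredictable order

    results = {}
    for name, order in (("sequential", sequential), ("scattered", scattered)):
        start = time.perf_counter()
        total = sum_in_order(values, order)
        results[name] = (total, time.perf_counter() - start)
    return results


if __name__ == "__main__":
    for name, (total, seconds) in compare_access_patterns().items():
        print(f"{name}: sum={total} in {seconds:.3f}s")
```

Both runs return the same sum; only the time differs, and how much depends on the machine's cache sizes and memory latency.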

I don't expect all of you to go buy new hardware after reading this, but this too is something I haven't seen written anywhere, and since it's known, we might as well spread the knowledge. I'm quite sure most people are unaware that it's not the most expensive hardware that plays the fastest.

Leoreth · Jul 27, 2017

What do you think the most common bottleneck is these days? CPU or RAM?

Nightinggale · Jul 27, 2017

Leoreth said:
What do you think the most common bottleneck is these days? CPU or RAM?

The biggest bottleneck is RAM latency. CPU speed keeps on increasing, meaning waiting for RAM takes more and more CPU cycles. Hyperthreading was added to reduce the problem. It means one CPU core acts as two, and while one is waiting, it switches to do tasks on the other. When that one starts waiting, it switches back to the first and hopes the data has arrived. This increases processing power by around 30% on a fully loaded CPU with even load on all threads. Multiple CPU cores are also a tool to hide latency, as 2 cores @ 2 GHz will waste less time waiting than one core @ 4 GHz, even though in theory they have the same CPU power.
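The arithmetic behind this can be sketched as follows. The 80 ns latency and the 200 cycles of work per memory access are assumed figures picked for easy numbers, not measurements, and the model assumes the work splits perfectly across cores.

```python
def stall_cycles(latency_ns, clock_ghz):
    """CPU cycles that pass while one memory access is outstanding."""
    return latency_ns * clock_ghz


def ns_per_item(compute_cycles, clock_ghz, latency_ns, cores=1):
    """Average time per work item when every item costs one cache miss.

    Assumes the work splits perfectly across `cores` (an idealization).
    """
    one_core = compute_cycles / clock_ghz + latency_ns
    return one_core / cores


if __name__ == "__main__":
    print(stall_cycles(80, 2), stall_cycles(80, 4))   # cycles lost per miss
    print(ns_per_item(200, 4, 80))                    # one core @ 4 GHz
    print(ns_per_item(200, 2, 80, cores=2))           # two cores @ 2 GHz
```

With these assumed numbers, the same 80 ns wait costs 160 cycles at 2 GHz and 320 cycles at 4 GHz, and the two slower cores average 90 ns per item against 130 ns for the single fast core. That advantage only exists if the work really can be split, which is the catch for games.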

The problem with all those tricks is that they require many threads, ideally with even load, to work well. Servers and databases with multiple simultaneous users in particular scale very well on multiple cores, while games still tend to rely heavily on a single core. Because of this, low-frequency 16-core CPUs exist, and they should never be used for gaming.

Having said that, you can't really single out just one thing and say "focus on this", because if you focus on one thing and buy the cheapest of something else, the cheap component will become the bottleneck. CPU speed is really important in gaming, both for turn based games (wait time) and for games where you want a high frame rate. GPUs are insanely fast at some tasks, but can't do other stuff. The CPU is general purpose, meaning it can do everything, but it will never do as well as a chip optimized for a specific task. This means the GPU relies on the CPU for certain tasks and can stall while waiting for the CPU. This work is often single threaded, and if you buy a high end GPU and run some 3D game at 4K, you might find that upgrading the GPU does not increase the frame rate because the bottleneck is a CPU thread using 100% of one core.

I would say for gaming, you would want a high CPU clock speed and low latency RAM. If the game is heavy on graphics, you want a fast GPU and RAM with high data throughput as well. Remember that the memory controller (part of the north bridge on older chipsets, integrated into the CPU on newer ones) handles data transfer to and from memory and as such counts towards RAM quality.

Lots of memory is also good, even more than you need. I have seen people complaining about Windows using a certain percentage of RAM, and that if they double the RAM, Windows will just use more. It's actually not far from the truth. When you write to a file, you write to RAM and then the application carries on. Some thread in Windows will then write the contents from RAM to the actual disk. Likewise, when you request data from the disk, Windows will copy the data into RAM and the application will read it from RAM. What's important here is that those file copies stay in RAM until the space is needed, and the more RAM you have, the more files can stay there. If you happen to need a file that already has a copy in RAM, it will just read that and things will be a lot faster. You can see the result of this. If you start a mod, it can take ages. If you quit once you reach the main menu and then start the mod again, the game will start a lot faster the second time if your RAM cache is working as intended. As a result, plenty of RAM will help hide slow HDDs, though it will even improve on SSD reading speed.
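The file cache effect can be checked with a sketch like the one below, which reads the same file twice and times each pass. The helper names are invented for this example. Note that `demo()` uses a scratch file that was just written and is therefore most likely cached already, so it only shows the mechanics; to see a cold versus warm difference, call `read_twice` on a large existing file that has not been touched since boot.

```python
import os
import tempfile
import time


def timed_read(path):
    """Read the whole file and return (bytes_read, seconds)."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        data = handle.read()
    return len(data), time.perf_counter() - start


def read_twice(path):
    """Time two back-to-back reads; the second can be served from RAM."""
    return timed_read(path), timed_read(path)


def demo(size=1024 * 1024):
    """Self-contained run on a scratch file (warm from the start, see above)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"\0" * size)
        return read_twice(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    (size, first), (_, second) = demo()
    print(f"{size} bytes: first read {first:.4f}s, second read {second:.4f}s")
```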

Gianmarco Silvestro · Nov 20, 2019

Hello everybody, sorry to resurrect this thread. Is this file for 64 bit systems still working and useful?
Thank you in advance

Nightinggale · Nov 20, 2019

Gianmarco Silvestro said:
Is this file for 64 bit systems still working and useful?

I think so. At least the problem it's meant to fix is still present and the link appears to work too.

Leoreth said:
What do you think the most common bottleneck is these days? CPU or RAM?

Last time I answered RAM and I will do so again, but this time the explanation will be a bit longer.

Imagine having the calculation A+B+C+D. The CPU can only add two numbers at once, meaning it will do:
X = A+B
X = X+C
X = X+D
A modern CPU will reorder and do this:
X = A+B and Y=C+D
X = X+Y
Same result, but in two cycles instead of three. Each generation of CPU gets even better at reordering instructions to run more and more in parallel like this. As a result, even if the RAM latency is the same in terms of CPU cycles, the number of instructions the CPU can do while waiting for the RAM to respond will grow.
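The step counts in the A+B+C+D example can be reproduced with a small sketch. It only counts dependent steps, assuming every addition takes one cycle and enough adders are free; it does not model whether the regrouping comes from the compiler or from the hardware. The function names are invented for illustration.

```python
def chained_sum(values):
    """((A+B)+C)+D: every addition needs the previous result."""
    total, steps = values[0], 0
    for value in values[1:]:
        total += value
        steps += 1
    return total, steps


def pairwise_sum(values):
    """(A+B)+(C+D): additions in the same round do not depend on each other."""
    level, rounds = list(values), 0
    while len(level) > 1:
        paired = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd element carries over to the next round
        level, rounds = paired, rounds + 1
    return level[0], rounds


if __name__ == "__main__":
    print(chained_sum([1, 2, 3, 4]))   # (10, 3)
    print(pairwise_sum([1, 2, 3, 4]))  # (10, 2)
```

With eight numbers the gap widens to 7 steps against 3, which is why a wider CPU gets more out of code with short dependency chains.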

You really want a decent amount of CPU cache as well as low latency memory.

Nightinggale said:
This means the ideal CPU at any price point is likely the one with the fewest cores.

This is no longer true. AMD released Ryzen and now the war is on between AMD and Intel for having the best/fastest CPU. What they do now is test the CPUs and try to overclock them. The ones that turned out well are sold under another name: the same chip, but at a higher speed. AMD produces CPU cores on chiplets, meaning they can sort chiplets according to the speed they can reach. They place the fastest chiplets in the most expensive CPUs, meaning a 16 core CPU has better (faster) chiplets than an 8 core CPU. This isn't really a problem for heat either, as they change the speed according to the CPU load and do so on a per-CCX basis (one chiplet has 2 CCXs; one CCX has 3 or 4 cores).

This means to get the fastest civ performance you either have to go with 8 core Intel or 16 core AMD. I would however question if going for those CPUs is cost efficient because we are talking about expensive CPUs here, like $750+ price range. You can get something a whole lot cheaper with minor performance loss.

It's however still true that you should care about the single core performance rather than the number of cores for civ4 purposes.

KeeperOT7Keys · Nov 21, 2019

Nightinggale said:
A modern CPU will reorder and do this:
X = A+B and Y=C+D
X = X+Y
Same result, but in two cycles instead of three. Each generation of CPU gets even better at reordering instructions to run more and more in parallel like this. As a result, even if the RAM latency is the same in terms of CPU cycles, the number of instructions the CPU can do while waiting for the RAM to respond will grow.

Isn't this done by the compiler though? I don't think CPUs can reorder the instructions on the fly. There is no way that a processor can auto-parallelize the code, since unless you do this for really large instructions it would result in more overhead and be risky for the processor.

JHLee · Nov 21, 2019

KeeperOT7Keys said:
Isn't this done by the compiler though? I don't think CPUs can reorder the instructions on the fly. There is no way that a processor can auto-parallelize the code, since unless you do this for really large instructions it would result in more overhead and be risky for the processor.

AFAIK, the hardware does not 'remake' the given instructions, but it can 'reorder' them.
I don't think Nightinggale was directly referring to assembly instructions with his example (but to relatively higher-level operations).
But having said that, perhaps a more intuitively appropriate example would be the following set of instructions.
X=A*B
Y=C+D
X=X+Y

If multiplication takes a lot of cycles, the CPU will execute the second instruction without waiting for the first to finish.
By doing so, the third instruction can start almost as soon as the first instruction has finished.
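The three-instruction example can be put into numbers with a tiny model. The latencies (3 cycles for the multiply, 1 for each add) are assumptions chosen for illustration, and the model assumes a free execution unit is always available. The two results named X in the example are called `X1` and `X2` here so each has its own entry.

```python
def serial_cycles(ops):
    """Each instruction waits for the previous one to finish completely."""
    return sum(latency for _, latency, _ in ops)


def overlapped_cycles(ops):
    """Each instruction starts as soon as the results it depends on are ready.

    `ops` is in program order; dependencies must name earlier entries.
    """
    finish = {}
    for name, latency, deps in ops:
        start = max((finish[dep] for dep in deps), default=0)
        finish[name] = start + latency
    return max(finish.values())


# (name, assumed latency in cycles, names of results it needs)
EXAMPLE = [
    ("X1", 3, []),           # X = A*B
    ("Y", 1, []),            # Y = C+D, independent of the multiply
    ("X2", 1, ["X1", "Y"]),  # X = X+Y
]

if __name__ == "__main__":
    print(serial_cycles(EXAMPLE), overlapped_cycles(EXAMPLE))  # 5 4
```

Under these assumptions the addition Y=C+D is hidden entirely behind the multiply, so the sequence takes 4 cycles instead of 5.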

Nightinggale · Nov 22, 2019

KeeperOT7Keys said:
I don't think CPUs can reorder the instructions on the fly. There is no way that a processor can auto-parallelize the code.

Early on CPUs did execute the assembly directly (or rather the resulting machine code, but for this purpose it's the same thing). However it turns out that x86 isn't a good system for fast CPUs where one instruction can take multiple cycles. Also the x86 instruction set has become bloated with more than 1500 different instructions, something which would require an insane amount of hardware if you add hardware for each instruction. What happens instead is the CPU decodes the x86 instructions into some internal more primitive instructions meaning one x86 instruction can be converted into multiple internal instructions if needed.

Precisely how this is done on the fly is very complex, depends on the CPU and is a trade secret for each CPU company.

KeeperOT7Keys said:
isn't this done by compiler though?

Yes and no. The compiler will optimize a whole lot, but the out of order execution optimization depends on the CPU. For instance, if your CPU has 3 adders, then it can do 3 additions in a single cycle. However, the same code also has to work on a (possibly future) CPU with 4 adders. That CPU will then do out of order execution where it aims at 4 additions each cycle. For this reason you can't really make the compiler generate code that is optimized for all CPUs. The compiler can do the optimizations that help on all CPUs, but the CPUs themselves have to do the last part of optimizing for that specific CPU.
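The 3 adders versus 4 adders point can be sketched as a toy scheduler. It assumes every addition takes one cycle and that the only limits are data dependencies and the number of adders; real CPUs have many more constraints. The function name and data layout are invented for this example.

```python
def cycles_needed(deps, adders):
    """Cycles to run all additions given `adders` units.

    `deps` maps each operation to the set of operations whose results it needs.
    An operation can be issued once all of those finished in an earlier cycle.
    """
    remaining = dict(deps)
    done = set()
    cycles = 0
    while remaining:
        ready = [op for op, needs in remaining.items() if needs <= done]
        if not ready:
            raise ValueError("circular dependency")
        issued = ready[:adders]  # hardware limit: one addition per adder per cycle
        for op in issued:
            del remaining[op]
        done.update(issued)
        cycles += 1
    return cycles


if __name__ == "__main__":
    independent = {i: set() for i in range(12)}
    chain = {0: set(), 1: {0}, 2: {1}, 3: {2}}
    print(cycles_needed(independent, 3), cycles_needed(independent, 4))  # 4 3
    print(cycles_needed(chain, 3), cycles_needed(chain, 4))              # 4 4
```

The same twelve independent additions finish in 4 cycles with 3 adders and 3 cycles with 4, without changing the program. The dependent chain takes 4 cycles either way, which is the part no amount of extra hardware fixes.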

Note that I wrote adder, not ALU. That's because an adder is part of an ALU and it's not uncommon to have more adders than full sized ALUs. Because adders are cheap in hardware, it makes sense to "spam" them and have many for parallel execution. Integer addition is actually used really frequently, most commonly when reading/writing data in classes, because the memory address is the pointer to the class instance plus a compile time offset for the variable in question. This means whenever you access member data in classes, you need hardware to add two numbers. Integer additions are therefore far more common than, say, integer division, and odds are the hardware reflects this.
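The "pointer plus compile time offset" addition can be made visible with `ctypes`, which lays out fields the way a C compiler would. The `Unit` structure and its fields are purely hypothetical and are not taken from the Civ4 source.

```python
import ctypes


class Unit(ctypes.Structure):
    """Hypothetical game object with three integer members."""
    _fields_ = [
        ("x", ctypes.c_int),
        ("y", ctypes.c_int),
        ("strength", ctypes.c_int),
    ]


def member_address(obj, field_name):
    """Address of a member = address of the object + fixed offset of the field."""
    base = ctypes.addressof(obj)
    offset = getattr(type(obj), field_name).offset  # known from the layout alone
    return base + offset


def read_int_at(address):
    """Read a C int straight from a raw memory address."""
    return ctypes.c_int.from_address(address).value


if __name__ == "__main__":
    unit = Unit(3, 5, 7)
    print(Unit.strength.offset)                           # fixed for every Unit
    print(read_int_at(member_address(unit, "strength")))  # 7
```

Every member access in compiled code boils down to this one integer addition before the actual load or store.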

You can google IPC or "Instructions per cycle" for more on this, but while it's easy to find many pages about it, it's tricky to find good information. Usually hits are like "8% or 15% higher IPC than last generation" and that's all you get because the actual IPC for a CPU is not in the documentation meaning what you get will be a relative number.

While on this topic, it's also worth adding that hyperthreading (as in two logical cores for each physical core) is implemented as two cores sharing the units that do the actual workload. This means if the core has 4 ALUs, it can be 2 for each core, but it can also be 4 for one core if the other one doesn't need any ALU that cycle. Because it's really hard for a CPU to use all the hardware in parallel, allowing another core to use the unused hardware will increase the performance by around 30% without actually adding more hardware to do calculations. It's more of a tradeoff than a pure win though, because not only can one program spy on another (like try to use all the hardware at once and then watch what gets queued, and that way figure out which hardware the other program is using), it also bottlenecks the cache usage and increases the temperature. There are multiple valid reasons why you might want to turn hyperthreading off. It entirely depends on what you are doing. The Civ4 engine will not gain anything from hyperthreading, as it only works well with certain types of multithreaded workloads.

Enyavar · Dec 1, 2019

Has someone thought to put this tool to the recommended software list for running the mod/game?

This new tool could seriously help with loading times. And seems to not actually be "new".

Leoreth · Dec 1, 2019

It's linked in the pinned thread.

Playsoneasy · Feb 29, 2020

Late to the party, but I've got to say this really does seem to work.

I'm currently playing Rise of Mankind - A New Dawn, industrial era, on a giant continents map with more than 2 dozen civs and barb cities to boot, and turn times and the general feel of the game are a lot faster. Same goes for loading, saving and exit times too.

This can certainly make playing Civ4 in late eras on bigger map sizes a more viable option, even with mods.

Hickman888 · Feb 29, 2020

Playsoneasy said:
Late to the party, but I've got to say this really does seem to work.

I'm currently playing Rise of Mankind - A New Dawn, industrial era, on a giant continents map with more than 2 dozen civs and barb cities to boot, and turn times and the general feel of the game are a lot faster. Same goes for loading, saving and exit times too.

This can certainly make playing Civ4 in late eras on bigger map sizes a more viable option, even with mods.

Thanks for the bump, had no idea this existed!

Theophilos · Mar 1, 2020

Mods for Third Age Total War in fact require people to do this; they have it labeled as "large address aware". If it's not done, the mods will crash eventually.

So, for people on the fence about using this, I recommend it.

(at least it helps for that game, I do not know how it affects this game)
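For anyone who wants to verify whether an executable is already marked "large address aware", the flag lives in the PE file header and can be read without changing anything. This is a read-only sketch, and it assumes the tool in this thread works by setting that same flag, as the label above suggests. The function name is invented for this example.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF header's Characteristics


def is_large_address_aware(image):
    """Check the flag in the raw bytes of a Windows .exe (headers are enough)."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The DOS header stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the signature; Characteristics is
    # its last field, 18 bytes in.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
```

To use it, read the game's executable in binary mode (for example `open(path, "rb").read()`, with `path` pointing at your own install) and pass the bytes in.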

Akbarthegreat · Apr 25, 2020

need my speed said:
And while reading through a random thread a few days ago, I stumbled upon this, which is said to vastly increase the inbetween turn time speed (among other things, I guess?). I haven't tried it yet, but it sounds interesting.

rigo92 said:
On the other hand, RAMDISK seems to have a lot of potential. I installed it and created a 4 GB RAMDISK (the largest possible in the freeware version), however after copying vanilla Civ 4 and BTS I am unable to start BTS as it always gives an error along the lines of "failed to initialize python". Anyone managed to get it working? If somebody helps me out I will post a speed comparison pre and post RAMDISK.

Has anyone managed to get RAMDisk to run Civ4 successfully? I haven't run two different installations of civ4 before, so how does the game cope with multiple installations that (presumably) refer back to the same Documents/My Games folder?

Nightinggale · Apr 25, 2020

Akbarthegreat said:
Has anyone managed to get RAMDisk to run Civ4 successfully? I haven't run two different installations of civ4 before, so how does the game cope with multiple installations that (presumably) refer back to the same Documents/My Games folder?

I haven't tried and I assume it won't really benefit much. The issue is that it increases disk read/write speed, but the game turn delay is caused by CPU power and latency between CPU and RAM. There is no disk activity, at least not until it might autosave. It won't even increase reading speed during game startup because disk I/O isn't causing the slowdown. I tested that already.

What you can do to speed up the game is most likely to get low latency memory and/or tighten the memory timings. The CPU cache is optimized for sequential memory reads, and the Civ4 engine and most (all?) mods seem to avoid sequential reads whenever possible. The turn wait time is mainly about the old code design and old compiler not being optimized for modern hardware, and the code in general could have been optimized better from the start.

Akbarthegreat · Apr 26, 2020

Nightinggale said:
I haven't tried and I assume it won't really benefit much. The issue is that it increases disk read/write speed, but the game turn delay is caused by CPU power and latency between CPU and RAM. There is no disk activity, at least not until it might autosave. It won't even increase reading speed during game startup because disk I/O isn't causing the slowdown. I tested that already.

What you can do to speed up the game is most likely to get low latency memory and/or tighten the memory timings. The CPU cache is optimized for sequential memory reads, and the Civ4 engine and most (all?) mods seem to avoid sequential reads whenever possible. The turn wait time is mainly about the old code design and old compiler not being optimized for modern hardware, and the code in general could have been optimized better from the start.

Surprisingly, my problem isn't actually turn times. It's the large loading time for the game to start, and more importantly, very frequent crashes during the first turn after loading a save game. From what I've read online RAMDisk does seem to help with improving game startup times, but I'm not sure whether it will help with the crashes.

Nightinggale · Apr 26, 2020

Akbarthegreat said:
Surprisingly, my problem isn't actually turn times. It's the large loading time for the game to start, and more importantly, very frequent crashes during the first turn after loading a save game. From what I've read online RAMDisk does seem to help with improving game startup times, but I'm not sure whether it will help with the crashes.

First obvious question: do you have issues with any other games? It would make sense to first check if your computer is having issues if you suffer from issues worse than other people. Really slow disk access and crashes could be a corrupted disk or some other nasty error.

As for loading times, modern games are big. Say you have a 20 GB game (they exist) and you have to load 3D graphics into memory. This is usually bottlenecked by disk performance. Say you have either a mechanical disk with 100 MB/s bandwidth or a RAMdisk with, say, 50 GB/s bandwidth. Now the game starts up and needs to read 10 GB, process it and then place some of it in the VRAM (the GPU RAM). Which disk will perform the task the fastest? That's where a RAMdisk is very beneficial, particularly if disk access is needed during the game. Decent NVMe drives will also give a big boost here, though a RAMdisk would be even faster, and cheaper if you already have the memory.
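The numbers in that comparison work out as follows. This is pure transfer time at the stated bandwidths, using 1 GB = 1000 MB and ignoring seek time and processing; the function name is invented for this example.

```python
def transfer_seconds(size_gb, bandwidth_mb_per_s):
    """Seconds to move `size_gb` gigabytes at a sustained bandwidth in MB/s."""
    return size_gb * 1000 / bandwidth_mb_per_s


if __name__ == "__main__":
    print(transfer_seconds(10, 100))     # mechanical disk: 100.0 seconds
    print(transfer_seconds(10, 50_000))  # RAMdisk: 0.2 seconds
```

So for a game that really is waiting on the disk, the 10 GB read drops from 100 seconds to a fraction of a second.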

However, disk access isn't the issue with the Civ4 engine. The disk is nearly idle. The slow start is a mystery because the startup, which can take minutes, is not bottlenecked by CPU, RAM, disk or anything else I have measured. It looks like the computer is just sitting there mostly idle, waiting for a timeout or something before moving on to the next step, which then also has to time out.

Starting, quitting and then starting again will make the game start much faster. This time it's CPU bottlenecked as a single CPU core will get 100% load.

My current setup (used for WTP) starts up the game consistently in 30 seconds and not faster if repeated. I'm not sure how I fixed it, but for some reason I don't have to wait minutes anymore.

Leoreth · Apr 26, 2020

Interesting context on the slow initial startup, or at least reassurance that it seems to be the way things are.

Panopticon · Apr 27, 2020

By start, do people mean the loading of a game save file? I always load, wait for Setup Map, then hit Windows key, program gets minimised, then click back into the program and it has usually skipped the longest loading step. If that helps.

Nyayr · Apr 29, 2020

More games have this issue when they have big mods with a lot of data being read at the start or during setup; it just takes long (the modded Stalker series, for example). There too, starting a game, quitting and restarting makes it go a lot faster (about 2-3 times) and also reduces the time needed to load saves (2-3 times), as well as the chance of running out of memory, for example. As for loading saves in game: unless it is the classical age or earlier (there quick load works fast), I generally quit to the main menu and load from there, because from medieval times onward loading directly while playing takes literally 10-12 times longer.

Maybe closing a game and loading a new one at the same time is heavy on the CPU, since it runs on only 1 core (if I'm not mistaken)?

Even with late games I sometimes get a CTD when going to the main menu after playing for a long time. I'm unsure what causes this; I assume a memory issue.

Panopticon said:
By start, do people mean the loading of a game save file? I always load, wait for Setup Map, then hit Windows key, program gets minimised, then click back into the program and it has usually skipped the longest loading step. If that helps.

True, loading is faster that way. Similarly, when you for example play a 3000BC game with a much later nation, setting it all up and then minimizing it 'seems' to run faster to get to the age the civ spawns, rather than watching the ticks. I'd have to time it, but it feels faster minimized.
