I have some ideas on how to improve performance. For starters, I want to flatten the memory layout of the xml data. The idea is to leave the xml files and the code that uses the xml data untouched, and change only how the xml data is stored in memory. The reasoning is the same as behind Factorio's fluidbox optimization, which actually happened after I came up with the idea. It will improve speed in cases like the AI looping through all BuildInfos to figure out what a pioneer should build. Currently each build info is read from an essentially random location in memory. If the data is placed linearly, the CPU's hardware prefetcher can detect the access pattern and fetch data before the CPU needs it, which means less time waiting for data to arrive.
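To make the layout idea concrete, here is a rough C++ sketch. `BuildInfo` is a cut-down hypothetical struct, and `ScatteredStore`, `FlatStore` and `cheapestBuild` are names made up for illustration, not the DLL's real classes. The point is only that the calling code stays identical while the storage behind it changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, cut-down stand-in for the real build info data.
struct BuildInfo {
    int iTime;
    int iCost;
};

// Scattered layout: every info is its own heap allocation, so
// neighbouring indices can end up far apart in memory.
class ScatteredStore {
public:
    ScatteredStore() {}
    ~ScatteredStore() {
        for (std::size_t i = 0; i < m_infos.size(); ++i) delete m_infos[i];
    }
    void add(int time, int cost) {
        BuildInfo* info = new BuildInfo;
        info->iTime = time;
        info->iCost = cost;
        m_infos.push_back(info);
    }
    const BuildInfo& get(std::size_t i) const { return *m_infos[i]; }
    std::size_t size() const { return m_infos.size(); }
private:
    ScatteredStore(const ScatteredStore&);             // not copyable
    ScatteredStore& operator=(const ScatteredStore&);  // not copyable
    std::vector<BuildInfo*> m_infos;
};

// Flat layout: all infos sit back to back in one block, so a loop
// over every index walks memory in a straight line.
class FlatStore {
public:
    void add(int time, int cost) {
        BuildInfo info = { time, cost };
        m_infos.push_back(info);
    }
    const BuildInfo& get(std::size_t i) const { return m_infos[i]; }
    std::size_t size() const { return m_infos.size(); }
private:
    std::vector<BuildInfo> m_infos;
};

// The "game code": unchanged no matter which store is behind it.
// Returns the index of the cheapest build, or -1 if the store is empty.
template <class Store>
int cheapestBuild(const Store& store) {
    int best = -1;
    for (std::size_t i = 0; i < store.size(); ++i) {
        if (best < 0 ||
            store.get(i).iCost < store.get(static_cast<std::size_t>(best)).iCost) {
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

Both stores give the same answer from `cheapestBuild`. Only the flat one guarantees that info `i + 1` sits right after info `i` in memory. How much faster that is in practice would have to be measured in the actual game.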
Another concept is hardcoding xml data into the DLL. I made something of that nature in Medieval Conquest. Instead of looping X times where X is read from memory, the compiler knows the loop will run, say, 3 times and can optimize accordingly, for example by writing out the loop body 3 times and dropping the loop overhead (loop unrolling). The tradeoff is that it will break if anybody changes certain xml data, meaning there would have to be one DLL version for xml editing and one for fast gameplay.
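A small sketch of the difference, assuming a made-up constant `NUM_YIELDS_HARDCODED` and made-up function names (none of this is the actual Medieval Conquest code). Whether the compiler really unrolls the second loop depends on the compiler and its settings. The load-time check is one possible guard against the xml and the DLL drifting apart.

```cpp
#include <cassert>

// Runtime version: the loop count comes from data loaded at startup,
// so the compiler has to emit a general loop.
int sumYieldsRuntime(const int* yields, int numYields) {
    int total = 0;
    for (int i = 0; i < numYields; ++i) total += yields[i];
    return total;
}

// Hypothetical value baked into the DLL at compile time.
const int NUM_YIELDS_HARDCODED = 3;

// Hardcoded version: the trip count is a compile-time constant, which
// lets the compiler unroll the loop if it decides that is worthwhile.
int sumYieldsHardcoded(const int* yields) {
    int total = 0;
    for (int i = 0; i < NUM_YIELDS_HARDCODED; ++i) total += yields[i];
    return total;
}

// Guard to run once while loading xml: returns false when the xml no
// longer matches what the DLL was compiled for.
bool xmlMatchesHardcoded(int numYieldsFromXml) {
    return numYieldsFromXml == NUM_YIELDS_HARDCODED;
}
```

With a check like `xmlMatchesHardcoded`, the fast DLL could at least refuse to start on edited xml instead of silently misbehaving.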
I also want to add a memory pool to reuse memory allocations instead of releasing and allocating all the time. When arrays of the same size are requested frequently (mainly one int for each yield), the same memory blocks get used over and over, which avoids most of the allocation cost. Making it part of JustInTimeArray means the code using it won't have to be updated.
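Roughly what such a pool could look like, as a sketch only. `IntArrayPool` and its methods are hypothetical names, and nothing here reflects how JustInTimeArray is actually implemented. It also assumes single-threaded use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool for int arrays that all share one fixed length
// (for example one int per yield).
class IntArrayPool {
public:
    explicit IntArrayPool(std::size_t length) : m_length(length) {}
    ~IntArrayPool() {
        for (std::size_t i = 0; i < m_free.size(); ++i) delete[] m_free[i];
    }

    // Hand out a zeroed array, reusing a released one when available.
    int* acquire() {
        int* block;
        if (m_free.empty()) {
            block = new int[m_length];
        } else {
            block = m_free.back();
            m_free.pop_back();
        }
        for (std::size_t i = 0; i < m_length; ++i) block[i] = 0;
        return block;
    }

    // Give an array back to the pool instead of freeing it.
    void release(int* block) {
        if (block != 0) m_free.push_back(block);
    }

    std::size_t freeCount() const { return m_free.size(); }

private:
    IntArrayPool(const IntArrayPool&);             // not copyable
    IntArrayPool& operator=(const IntArrayPool&);  // not copyable

    std::size_t m_length;
    std::vector<int*> m_free;  // released blocks waiting to be reused
};
```

An array wrapper would call `acquire()` where it currently allocates and `release()` where it currently frees, so callers of the wrapper would not notice the change.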
That's 3 ways of boosting performance without touching the code itself, only "support code". It's not like we don't have ideas on what to do regarding performance.
I think the main problem that WTP has right now is the lack of active programmers
This seems to be the main issue.
unless some genius invests a massive amount of time re-coding everything...
Not everything. What would be nice is some clever person (perhaps a genius) who can rewrite pathfinding. It's A* pathfinding, which is widely used in games and in principle isn't too bad. The issue is that pathfinding always seems to be the slow part in games, and getting it to run fast, ideally on multiple CPU cores, is hard. To get an idea of the problem, I propose reading about the Better Pathfinder mod for Rimworld. It shows A* pathfinding and the number of plots each approach needs to look at. Many plots = slow. Few plots = stupid pathfinder with weird and slow routes.
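To show what "number of plots looked at" means, here is a toy grid search in C++. It is a sketch under simple assumptions (4-way movement, every step costs 1, made-up names `findPath` and `PathResult`) and is not the game's pathfinder. With `heuristicWeight` 0 it searches blindly in all directions, with 1 it is plain A* using the Manhattan distance to the goal.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct PathResult {
    int cost;      // steps on the found route, -1 if unreachable
    int expanded;  // plots taken off the open list and examined
};

// blocked[y * width + x] marks impassable plots.
PathResult findPath(int width, int height, const std::vector<bool>& blocked,
                    int startX, int startY, int goalX, int goalY,
                    int heuristicWeight) {
    const int INF = 1 << 30;
    std::vector<int> best(width * height, INF);     // cheapest known cost per plot
    std::vector<bool> closed(width * height, false);
    typedef std::pair<int, int> Entry;              // (priority, plot index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > open;

    const int start = startY * width + startX;
    const int goal = goalY * width + goalX;
    best[start] = 0;
    open.push(Entry(0, start));

    PathResult result = { -1, 0 };
    const int dx[4] = { 1, -1, 0, 0 };
    const int dy[4] = { 0, 0, 1, -1 };

    while (!open.empty()) {
        const int plot = open.top().second;
        open.pop();
        if (closed[plot]) continue;  // outdated duplicate entry
        closed[plot] = true;
        ++result.expanded;
        if (plot == goal) {
            result.cost = best[plot];
            return result;
        }
        const int x = plot % width;
        const int y = plot / width;
        for (int d = 0; d < 4; ++d) {
            const int nx = x + dx[d];
            const int ny = y + dy[d];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            const int next = ny * width + nx;
            if (blocked[next] || closed[next]) continue;
            const int g = best[plot] + 1;
            if (g < best[next]) {
                best[next] = g;
                // Manhattan distance estimate of the remaining steps.
                const int h = std::abs(goalX - nx) + std::abs(goalY - ny);
                open.push(Entry(g + heuristicWeight * h, next));
            }
        }
    }
    return result;  // goal not reachable
}
```

On an empty 21x21 map, going 10 plots straight east, both settings find the same 10-step route, but the blind search examines well over a hundred plots while A* only examines the plots along the line. Real maps with terrain costs and obstacles narrow that gap, which is where the tuning problem starts.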
If we can have a proper pathfinder, most of the performance issues would likely be gone.
Note that some Civ4 mods like Caveman 2 Cosmos did implement multithreading for unit processing but for various reasons they did not achieve any meaningful improvements in performance.
Looks like the main culprit was thread locks, with threads apparently spending their time waiting for each other. Quoting C2C from a week ago:
The multithreading in C2C is really inefficient, causing lots of waits, and it is no longer needed. It was added years ago when the turn times were a lot longer than they are now. The reason for those long turn times was really inefficient coding in some places, which I optimized a lot.
That multithreading will be removed in a big commit including some other performance improvements some time after v39 is released.
That's not saying multithreading is bad. It's awesome if done correctly. What's important to remember is that there are correct and incorrect ways of implementing threads. Doing it correctly is a difficult task, and the approach @devolution brings to the table seems to have a good chance of boosting performance. The question is whether we can apply it successfully to the slow parts.