[IMPORTANT] Multi-threading in WTP mod - [required to run the mod]

raystuttgart

Civ4Col Modder
Joined
Jan 24, 2011
Messages
9,638
Location
Stuttgart, Germany
Hi guys,

please note that the DLL you can currently build from "New Hope" already uses multi-threading now, since it was merged with "develop2".
(It is supposed to dramatically increase performance to values currently unknown by any other Civ4BTS and Civ4Col mod.)

This branch uses Intel Threading Building blocks to achieve concurrent AI calculations in order to speed up the AI (inter)turn.
Thus you need to follow the instructions so the DLL will work once you want to use "New Hope".

-----

Instructions:

Copy tbb.dll and
tbbmalloc.dll from "\Project Files\tbb" to the directory where Colonization.exe resides.

-----

This is currently unrelated to our Players, since new Hope is just used internally.

Once New Hope will be published, this is then of course also relevant to our Players then.
 
Last edited:
Wow ... I just tested the performance change of our new Multi-Threading in Autoplay (with "New Hope") ... :eek:

I have never seen 300 turns Autoplay executed that fast ... I would estimate that it was twice as fast as before.
(I did not measure the time - also I do not have a time measurement of a "before-value" to compare.)

Edit:
50 Turns Autoplay
in about 10 seconds !!!!!
(I have not yet tested longer Autoplay sessions with time measurment. Also just tested with MapSize "Normal".)

:bowdown:@devolution :bowdown:
 
Last edited:
This seems really great! Already quoted it at AdvCiv in case it may be ported there too.

Btw reading some of the features implemented/planned, have seen performance limitations at multiple places. Or discussions like "better to not implement X thing that way, even if it would be better, because it would produce more slowdowns between turns".

Have you had any discussion about revisiting features considering the new performance gains? i.e. if it's really a 50% improvement, that leaves a lot of room to use part of the gains to include features previously discarded.
 
i.e. if it's really a 50% improvement, that leaves a lot of room to use part of the gains to include features previously discarded.
Multi-threading cannot be applied to anything, it is actually pretty tricky and a lot of effort to do so.
In many cases it is also not possible to actually gain any performance from it, since the task cannot be separated to run independently in parallel.

----

Currently Multi-Threading is e.g. applied for:
(At least that is what I understood.)

1) some of the performance heavy City Logic (e.g. Professions logic)
2) some of the performance heavy Unit Logic (e.g. Pathfinding logic)

-----

In other words it is not a tool you can use for everything. Parts of the code cannot be parallellized.
And to use it requires to "reprogram" the according logic to actually make use of Mulit-Threading.

-----

It is not a magic plug&play that just makes everything faster with little ease.
It is actually quite a lot of effort, needs a lot of design thinking and is potentially also risky to use.

------

So yes, we might apply Multi-Threading to more features in the future. :)
But it is a lot more effort and complexity involved to do so than it may have sounded.

You really need to consider wisely for which part of the code you want to invest all the effort.
(There are huge parts of the code that need to be treated sequentially or otherwise it will break the mod.)

------

But @Nightinggale and @devolution can explain this better. :thumbsup:
 
In order to understand multi-threading, first I will explain single threading and how it's important for the game to work properly.

The code assumes same input, same output. If you load a savegame 1000 times and do the same each time, you expect the same thing to happen (assuming same random seed). Single player and in particularly network games relies on this assumption.

Now let's assume the AI has 900 gold and it wants to buy two units, A for 500 and B for 600. When single threaded, it will consider A, buy it and when considering B it has (900-500=400) so not enough and it ignores B. Now if considering A and B in two threads, we have no idea which one will go first meaning sometimes it will buy A and sometimes it will buy B. In network games some computers will buy A while the rest buys B and we have a desync. Even worse if they by chance happen to run at precisely the same time, this is what happens:
  1. C1 reads money (900)
  2. C2 reads money (900)
  3. C1 buys A and writes 900-500 = 400 as money
  4. C2 buys B and writes 900-600 = 300 as money
Now because both threads used a local storage for money the player ended up with both A and B and 300. Same thing can happen where it ends up with 400. To solve this case, money can be locked to only be used by one thread, but then a thread can stall because it's waiting in queue. Also to lock data like that, the thread locking it announces to the other CPU cores that it is locking and then it has to wait to see if another one complains because two cores tried to lock at the same time. This communication when locking can eat up more time than multi-threading saves if you aren't careful. Also it doesn't solve the issue of sometimes A is bought and sometimes B.

To get around this, the game is essentially still single threaded, but once in a while there is a single task, which is split into multiple threads. This can be when the AI decides what to build in a city. It loops through all candidates to determine the value of buying each unit, each building etc. If the resulting values are stored in a list, then we will not care which order the list is filled. Also since such calculations will not alter game state, it will not write to shared memory and no locks will be need. Do note that using random numbers will write to shared memory that a random seed was used and requesting a new one. This means two threads both calling a random number will get random seeds depending on which one asks first.

This means to benefit from multi-threading, you will need to identify parts of the code, which can be executed out of order and won't fight each other for memory access. On top of that it needs to be slow enough to justify the overhead of splitting into multiple threads. Such places in the code will be highly mod dependent and usually can't be transferred from one mod to another. I wouldn't even assume being able to copy paste into the RAR source code even through it is in many ways similar, it's no longer similar enough to copy paste.

In short while the concept can be copied as well as the two tbb dll files needed to run it, the actual implementation, which makes the difference is a rewrite, not a copy paste.
 
I believe the develop2 branch also introduced the K-Mod pathfinder(?), plus maybe some other optimizations, so it might be that the bigger portion of the speed gained is not actually due to multi-threading.

There is one place, in KmodPathFinder::ProcessNode, that looks quite (back-)portable to any BtS mod based on K-Mod:
KmodPathFinder.cpp#L510
That would fall under this item:
2) some of the performance heavy Unit Logic (e.g. Pathfinding logic)
Correctness is not immediately obvious to me, but I suppose you would've been getting units stuck in loops and other issues if the pathfinder were behaving inconsistently. But it's impossible to say, from looking at the code, how much time this one optimization saves (if any). Given that pathfinding in my mod currently takes up "only" something like 15% of an AI turn, I'd expect to gain at most a few percentage points, not enough to justify the implementation effort (and introducing an external library). I would indeed have to go through this (rather challenging) process:
This means to benefit from multi-threading, you will need to identify parts of the code, which can be executed out of order and won't fight each other for memory access. On top of that it needs to be slow enough to justify the overhead of splitting into multiple threads. Such places in the code will be highly mod dependent and usually can't be transferred from one mod to another.
 
Given that pathfinding in my mod currently takes up "only" something like 15% of an AI turn
Back when we used vanilla pathfinding and just one thread, AI would spend more time on pathfinding than all other tasks combined. This likely has to do with a number of the WTP maps being rather big. Pathfinding calculations are anything but linear with size.
 
In order to understand multi-threading, first I will explain single threading and how it's important for the game to work properly.

The code assumes same input, same output. If you load a savegame 1000 times and do the same each time, you expect the same thing to happen (assuming same random seed). Single player and in particularly network games relies on this assumption.

Now let's assume the AI has 900 gold and it wants to buy two units, A for 500 and B for 600. When single threaded, it will consider A, buy it and when considering B it has (900-500=400) so not enough and it ignores B. Now if considering A and B in two threads, we have no idea which one will go first meaning sometimes it will buy A and sometimes it will buy B. In network games some computers will buy A while the rest buys B and we have a desync. Even worse if they by chance happen to run at precisely the same time, this is what happens:
  1. C1 reads money (900)
  2. C2 reads money (900)
  3. C1 buys A and writes 900-500 = 400 as money
  4. C2 buys B and writes 900-600 = 300 as money
Now because both threads used a local storage for money the player ended up with both A and B and 300. Same thing can happen where it ends up with 400. To solve this case, money can be locked to only be used by one thread, but then a thread can stall because it's waiting in queue. Also to lock data like that, the thread locking it announces to the other CPU cores that it is locking and then it has to wait to see if another one complains because two cores tried to lock at the same time. This communication when locking can eat up more time than multi-threading saves if you aren't careful. Also it doesn't solve the issue of sometimes A is bought and sometimes B.

To get around this, the game is essentially still single threaded, but once in a while there is a single task, which is split into multiple threads. This can be when the AI decides what to build in a city. It loops through all candidates to determine the value of buying each unit, each building etc. If the resulting values are stored in a list, then we will not care which order the list is filled. Also since such calculations will not alter game state, it will not write to shared memory and no locks will be need. Do note that using random numbers will write to shared memory that a random seed was used and requesting a new one. This means two threads both calling a random number will get random seeds depending on which one asks first.

This means to benefit from multi-threading, you will need to identify parts of the code, which can be executed out of order and won't fight each other for memory access. On top of that it needs to be slow enough to justify the overhead of splitting into multiple threads. Such places in the code will be highly mod dependent and usually can't be transferred from one mod to another. I wouldn't even assume being able to copy paste into the RAR source code even through it is in many ways similar, it's no longer similar enough to copy paste.

In short while the concept can be copied as well as the two tbb dll files needed to run it, the actual implementation, which makes the difference is a rewrite, not a copy paste.
this explanation + example should be included in college
 
If I run -CPUCount:1 (or any number less than or equal to available cores) as the launch options for the colonization exe; how will the effect multi threading?
I know multi threading is not the the same as multi core, but just wondering how this works in practice.

Thanks.

Edit: After rereading I realize that in fact it is only ever using one core, and this is some advanced optimizing code. So the launch option would make no difference. Correct?
 
Last edited:
Edit: After rereading I realize that in fact it is only ever using one core, and this is some advanced optimizing code. So the launch option would make no difference. Correct?
Yeah most of the time only a single thread is used. For certain tasks all cores are used and I don't think you can disable that without altering the code itself. I also don't see the need to introduce a feature, which will make the game run slower as the default is to use all cores.
 
Top Bottom