
Civ 6 Multicore performance


Cromagnus

I have both a 6-core (Westmere) and a 12-core (Haswell-EP), so I did some utilization comparisons.

One caveat: the GPU in my 12-core is an older AMD, so I ran the game at 1600x900 with MSAA (and a few other full-screen passes) turned off. I didn't want to turn off any rendering feature that would affect CPU utilization, so I left most things turned on.

It's possible the utilization wasn't 100% because the game was still GPU-bound. I'll have to go back and check the GPU utilization to know for sure. I'll update this thread after I do.

Anyway, I found that the benchmark fully utilized (at times) all 12 hyperthreads in my 6-core. The utilization in my 12-core maxed out at 20 hyperthreads, and was below 16 most of the time.

So, at the moment, at least, the sweet spot seems to be between 12 and 16 logical cores. But, keep in mind that DX12 might change this, and they could throw in future optimizations at any time.

Also, I have no idea how much bearing the benchmark has on actual performance. I haven't invested the time necessary to take a Huge map on Deity to the late game. (I would assume the most CPU resources would be used on Deity... if, as I would hope, the AI uses more CPU time on Deity than say Prince...)

I did play a Huge Deity map to turn 100 on Quick, and the game never even came close to fully utilizing *any* core, let alone all of them, even when waiting for the next turn. Mostly, I assume, that's because Quick Movement isn't instant... :(

Is there a way to run an AI-only game, so you can just let it run? If someone can show me how, I'll do comparisons on the late game. I really only bought the game to benchmark it... I'm waiting for the first major balance/bugfix patch to dive in.

Also, hello to all my old Civ5 compatriots. :)
 
There's a big gap between not using any core and using 20 hyperthreads but not to their fullest, which isn't unexpected, as the game is specced for a 4th Gen Core i5 (4 cores) or, as a minimum, a Core i3 (2 cores, 2 HT threads).

The multicore support is there, as I can navigate menus and even pull up the diplo screens while the game is processing a turn.

This is where more cores is probably not better. A Skylake i5 6600K might outperform your 6-core, 12-thread Westmere CPU on the strength of its per-core performance.

I'm certainly hopeful they throw people with lots of cores a bone in the future. Civ5 launched without multicore support, IIRC; it was added later.
 
PS If someone has a late game Huge map save they could upload... where turn times and rendering are really chugging, I'd love to run some perf comparisons... would save me the trouble of playing 10 hours+!

Thanks
-Cro
 

Yeah, there is a big gap, but I'm not particularly concerned with how a 4-core 8HT machine performs. I mean, most games take advantage of a system like that nowadays. The real question for me is how scalable Civ6 is, because turn times were by far my biggest frustration with Civ 5.

If a 10-core 20HT 3.3GHz outperforms a 4-core 8HT 3.3GHz of the same generation, this is very relevant information to me.

Also, IMHO they've already thrown us a bone by maxing out 12 hyperthreads. A game that wasn't multicore-friendly wouldn't even max out 8. I'm just curious to see how far it goes. :)

My second, ulterior motive is that I use a 12C/24HT machine for compiling, and a 6C/12HT for gaming right now. I'd love to ditch my old 6-core, and move my GTX1070 to the 12-core, but I won't until I see definitive proof that my 12-core will at least match it, performance-wise, in games. (My 12-core can't be OC'd, so only outperforms my 6-core when all 24 hyperthreads are engaged)
 

I may have a save like that. I played for about 16 hours, won a Culture victory, and went on playing so I could nuke England because... well, nukes! I still have that save.
 
Further experimentation shows that the game appears to suffer from synchronization stalls. Watching the GPU and CPU perf while zooming in and out, I found the following:

Framerate doubled when I zoomed in, even though neither the GPU nor any single core ever went above 60% utilization. In my experience, you see situations like this when two or more tasks are running in lockstep.

For example, let's take the common case (with naive rendering implementations) where the CPU builds up an entire frame of render commands before sending anything to the GPU.

The GPU will be idle while it waits for those commands, unless it is allowed to run in parallel while the CPU builds up the next frame of commands. If the CPU waits for the GPU to finish before building the next frame, you end up with the CPU idle for half the frame and the GPU idle for half the frame.

This situation is improved by multithreaded rendering, because it takes the CPU less time to build up the render commands. In DX11 mode, Civ6 appears to use two threads for rendering, as I see two threads spike to 60% when I zoom out.

Now, I'm just guessing here, but usually when the CPU and GPU run in lockstep it's because something in the game isn't properly double-buffered. Sometimes the problem is as simple as an erroneous GPU Wait command. (Waiting for the GPU to finish rendering the current frame, instead of waiting for it to finish rendering the previous frame)
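
To make the lockstep idea concrete, here's a toy sketch (my own illustration, not anything from Civ6's actual renderer) that fakes the CPU's command building and the GPU's work with 10 ms sleeps. Waiting on the frame you just submitted forces the two into lockstep; waiting on the previous frame's submission lets them overlap and roughly halves the frame time:

```cpp
// Toy sketch (not Civ6's renderer): "CPU work" and "GPU work" are both faked
// with 10 ms sleeps. Waiting on the frame you just submitted serializes the
// two (~20 ms/frame); waiting on the previous frame's submission lets them
// overlap (~10 ms/frame).
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

using namespace std::chrono;

static void build_commands() {                       // CPU: build draw calls
    std::this_thread::sleep_for(milliseconds(10));
}
static std::future<void> submit_to_gpu() {           // GPU: execute them
    return std::async(std::launch::async,
                      [] { std::this_thread::sleep_for(milliseconds(10)); });
}

static double ms_per_frame(bool wait_on_previous_frame, int frames = 50) {
    std::future<void> previous;                      // "fence" for the prior frame
    auto start = steady_clock::now();
    for (int i = 0; i < frames; ++i) {
        build_commands();
        std::future<void> current = submit_to_gpu();
        if (wait_on_previous_frame) {
            if (previous.valid()) previous.wait();   // GPU runs frame i while the
            previous = std::move(current);           // CPU builds frame i+1
        } else {
            current.wait();                          // lockstep: CPU idles here
        }
    }
    if (previous.valid()) previous.wait();
    return duration<double, std::milli>(steady_clock::now() - start).count() / frames;
}

int main() {
    std::cout << "lockstep:  " << ms_per_frame(false) << " ms/frame\n";
    std::cout << "pipelined: " << ms_per_frame(true) << " ms/frame\n";
}
```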

The good news is that this is a relatively easy problem to fix, so I would expect it to get resolved in a patch. And no matter what CPU/GPU you're using, if and when they do fix it, the framerate should improve dramatically.

What's bizarre to me, after experimenting with a variety of resolutions, is that the benchmark framerate is actually noticeably worse for me at 1024x768 than it is at 1600x900... :p
 
I would be surprised if they are ever able to process the End Turn itself in multiple threads, because under the current design the AI is deterministic, which means that unless you execute the series of end-turn actions sequentially, there would be no way to maintain sync in multiplayer games. Another way of looking at this: for any given function, if the result of a calculation is bound to the result of a previous calculation, then all calculations within that function must be processed sequentially.
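
A tiny illustration of why (made-up numbers and a toy state hash, nothing from the actual game): when every combat roll consumes the next value from the same seeded RNG, the end-of-turn state depends on the order of the rolls, so every machine in a multiplayer game has to run them in exactly the same sequence to stay in sync:

```cpp
// Toy illustration (hypothetical, not Civ6 code): each unit's attack consumes
// the next roll from a shared, seeded RNG, so changing the order of the
// calculations changes the outcome. Multiplayer sync therefore needs every
// machine to execute the end-turn sequence in exactly the same order.
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

int main() {
    auto run_turn = [](const std::vector<int>& attack_order) {
        std::mt19937 rng(12345);                       // shared game RNG, fixed seed
        std::uniform_int_distribution<int> roll(1, 100);
        std::uint64_t state_hash = 0;
        for (int unit_id : attack_order)               // each attack consumes one roll
            state_hash = state_hash * 31 + roll(rng) * unit_id;
        return state_hash;
    };

    std::uint64_t a = run_turn({1, 2, 3, 4});          // same units, same rolls...
    std::uint64_t b = run_turn({4, 3, 2, 1});          // ...different order
    std::cout << a << (a == b ? " == " : " != ") << b << "\n";   // prints "!="
}
```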
 

Well, that assumes that each step can't be parallelized. It also assumes that the majority of AI decisions *would appear noticeably less intelligent* if they were relying on information that wasn't up-to-date.

So, for example, deciding which tiles each city should work could be parallelized in multiple ways.

1) Process cities with non-overlapping boundaries simultaneously (see the sketch at the end of this post).

2) Process multiple AIs simultaneously, then resolve conflicts when each AI's turn comes up, recalculating if necessary.
This kind of conflict would occur if you allow predictive tile-buying near other AI territory: another civ could expand into the tile, invalidating your action and forcing that one city to recalculate its tile assignments.

As for relying on old information, there are many decisions that wouldn't appear noticeably less intelligent because of it. Troop movements might, but a research decision (like starting on a new unit when you get attacked) being one turn behind wouldn't be as noticeable.

So, while I agree that turn-based gameplay does limit parallelization options, it's not as bad as you suggest.
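
Here's the sketch I promised for idea #1 above, with invented types and a stand-in assignment function (nothing from the actual game): cities whose workable tiles don't overlap get their tile assignments computed on separate threads, and the handful that do overlap are deferred to a serial pass afterwards:

```cpp
// Sketch of idea #1 (invented types, stand-in assignment logic, not Civ6 code):
// cities with non-overlapping workable tiles are processed in parallel; the
// ones that conflict with an earlier city fall back to a serial pass.
#include <future>
#include <iostream>
#include <set>
#include <vector>

struct City {
    int id;
    std::set<int> workable_tiles;   // tiles this city could work
    std::vector<int> assigned;      // result of the (expensive) assignment
};

static bool overlaps(const City& a, const City& b) {
    for (int t : a.workable_tiles)
        if (b.workable_tiles.count(t)) return true;
    return false;
}

// Stand-in for the real per-city optimization pass.
static std::vector<int> assign_tiles(const City& c) {
    return std::vector<int>(c.workable_tiles.begin(), c.workable_tiles.end());
}

int main() {
    std::vector<City> cities = {
        {0, {1, 2, 3}, {}}, {1, {10, 11, 12}, {}}, {2, {3, 4, 5}, {}}};

    // Partition: cities that overlap an already-accepted city get deferred.
    std::vector<City*> parallel, deferred;
    for (City& c : cities) {
        bool conflict = false;
        for (City* p : parallel)
            if (overlaps(*p, c)) { conflict = true; break; }
        (conflict ? deferred : parallel).push_back(&c);
    }

    std::vector<std::future<void>> jobs;
    for (City* c : parallel)            // independent cities run concurrently
        jobs.emplace_back(std::async(std::launch::async,
                                     [c] { c->assigned = assign_tiles(*c); }));
    for (auto& j : jobs) j.wait();

    for (City* c : deferred)            // conflicting cities run afterwards, in order
        c->assigned = assign_tiles(*c);

    for (const City& c : cities)
        std::cout << "city " << c.id << ": " << c.assigned.size() << " tiles assigned\n";
}
```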
 
Remember, with hyper-threading, it is only an illusion.

The AI on Deity is the same dumb AI as on Settler; there is no extra "processing". It's not like the AI uses the processor to think... it's scripted.

Cromagnus --- the FPS doubled when zoomed in because you are effectively NOT displaying the other areas of the map. The GPU is only drawing what's on the screen.
As for the Skylake vs. Westmere comparison, I guarantee you the i5 Skylake is faster than the Westmere, probably by 20%, but I doubt more. If anything, it might come out about the same.
 

You're half right. The FPS doubled because the CPU wasn't generating draw calls for the part of the map that wasn't displayed. As I mentioned, neither the CPU nor the GPU ever exceeded 60% utilization.

And it was the CPU utilization that jumped up more significantly, not the GPU's, when I zoomed out. Yes, they both increased, but the CPU usage increased more.

To oversimplify, leaving synchronization out of it for the moment, you can boil rendering down to two things:

1) The CPU cost of generating commands for drawing an object.
2) The GPU cost of drawing that object.

If your CPU generates commands faster than your GPU can process them, you're GPU-bound.
If your GPU processes commands faster than your CPU can generate them, you're CPU-bound.

In this case, my CPU wasn't generating calls as fast as the GPU was processing them.

However, the fact that the GPU and CPU couldn't operate in parallel was by far the bigger issue.

Since my framerate went from 60fps to 30 fps, and there was only 60% utilization of the CPU and 50% utilization of the GPU, you can do basic math to calculate how much faster the game would have run if the synchronization stalls were removed:

The GPU is less busy, so the 40% idle time of the CPU will be filled up before the GPU runs out of free time.

Therefore, the speedup from removing the stall would take the game from 30fps to (30*(1/0.6)) or 50fps. At that point, the GPU would be active for 5/6 of the time, or be 83% utilized. Assuming my math is right.
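
For what it's worth, here's that same back-of-the-envelope arithmetic as a snippet (the 30 fps, 60% CPU, and 50% GPU figures are just my measurements from above): let whichever side is busier become the bottleneck once the CPU and GPU are allowed to overlap, then scale the GPU's utilization by the framerate gain:

```cpp
// Back-of-the-envelope only: my measured numbers, not anything from the game.
#include <algorithm>
#include <iostream>

int main() {
    double fps = 30.0;          // measured framerate, zoomed out
    double cpu_util = 0.60;     // busiest single CPU thread
    double gpu_util = 0.50;     // GPU utilization

    // Once CPU and GPU overlap, the busier side becomes the bottleneck.
    double bottleneck = std::max(cpu_util, gpu_util);
    double pipelined_fps = fps / bottleneck;                  // 30 / 0.6 = 50 fps
    double new_gpu_util = gpu_util * (pipelined_fps / fps);   // 0.5 * 5/3 = ~0.83

    std::cout << "pipelined fps:   " << pipelined_fps << "\n";
    std::cout << "GPU utilization: " << new_gpu_util * 100 << "%\n";
}
```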
 

Determinism was the crux of callan's post.
You can't go processing AI decision trees using arbitrarily out-of-date information because you destroy determinism.
 
That's not how multi-threading works.
Eh, the devs stated specifically when Civ5 received multicore support that it allowed players to navigate menus between turns.

So at least one core is dedicated to handling menu and UI functions, and that works independently from the cores processing the turn.

There is some debate upthread about how much multi-core/multi-thread can reduce turn times, but that is a separate issue. My post is only confirming that they assigned a core to handle UI, allowing players to flick through menus and even enter diplomacy screens while the game is crunching AI moves in Civ6.

Not that hard to understand.
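
That split is easy to sketch (a generic illustration, not Firaxis's actual code): run the end-turn crunch on a worker thread and let the main thread keep servicing the UI until the worker signals it's done:

```cpp
// Generic sketch (not Firaxis code): crunch the turn on a worker thread while
// the main thread keeps servicing the UI, so menus stay responsive between turns.
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    std::atomic<bool> turn_done{false};

    // Worker thread: stands in for the long end-turn / AI computation.
    std::thread turn_thread([&turn_done] {
        std::this_thread::sleep_for(std::chrono::seconds(2));   // pretend AI work
        turn_done = true;
    });

    // "UI" thread: keeps pumping events (here just a heartbeat) while waiting.
    while (!turn_done) {
        std::cout << "UI still responsive: you could open the diplo screen here\n";
        std::this_thread::sleep_for(std::chrono::milliseconds(500));
    }
    turn_thread.join();
    std::cout << "Turn finished.\n";
}
```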
 
True, determinism is the constraint: you can't let information be arbitrarily out of date, so you have to be selective about what you're OK with being stale. But games are merely simulations, and the only thing that matters is the approximation of intelligence, so there's some leeway.

Simultaneous decision-making with correction is perfectly valid. Players do it all the time. You look at the cards in your hand when you draw and start planning your next action. By the time your turn comes around, some things may have occurred that will change your final decision, but parts of your internal decision tree are still valid. Hence the correction element. Just because events have invalidated parts of the tree doesn't mean I have to recalculate everything.

Traversing the tree structure is in part expensive because of the calculations at each decision point, not just the number of decisions. Some of those calculations will not be affected by changes. If the AI can avoid recalculating things that haven't changed, then there is value in simultaneous calculation. So, I planned to send a worker to this tile. By the time my turn came around, someone attacked me and that tile is occupied. That decision was invalidated. But maybe the workers in my other 12 cities still do what they already planned to do. Such a system would be quite sophisticated but not impossible.

The other thing to consider is that each AI has a limited sphere of awareness, just like the player. They don't have to react to every event. Players miss things all the time when the board is very active.

So, if you have idle cores, why not kick off the next AI's turn without waiting, then make corrections. It's not perfectly parallel, but it can still be a big win. IMHO.
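
Here's roughly what I mean, as a toy sketch (made-up structures, not the game's AI): kick off an AI's plan speculatively on a spare core during the other players' turns, and when its turn actually arrives, recompute only the decisions that events in the meantime have invalidated:

```cpp
// Toy sketch (made-up structures, not the game's AI): plan an AI's turn
// speculatively on a spare core, then, when the turn actually starts, redo
// only the decisions that later events invalidated.
#include <future>
#include <iostream>
#include <set>
#include <vector>

struct Plan {
    std::vector<int> worker_targets;   // tiles the workers intend to improve
};

// Expensive planning pass, run against a snapshot of the world.
static Plan plan_turn(std::vector<int> candidate_tiles) {
    return Plan{candidate_tiles};
}

int main() {
    std::vector<int> candidates = {3, 7, 11, 15};

    // Kick off planning while other players are still taking their turns.
    std::future<Plan> speculative =
        std::async(std::launch::async, plan_turn, candidates);

    // ...meanwhile, an enemy unit occupies tile 7 before this AI's turn starts.
    std::set<int> invalidated = {7};

    Plan plan = speculative.get();
    int kept = 0;
    for (int& tile : plan.worker_targets) {
        if (invalidated.count(tile)) {
            tile = 99;                 // recompute just this one decision
            std::cout << "re-planned one worker around the occupied tile\n";
        } else {
            ++kept;
        }
    }
    std::cout << "kept " << kept << " of " << plan.worker_targets.size()
              << " speculative decisions\n";
}
```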
 
I can confirm that the game only uses 2 cores at about 50% each, and the other 2 cores just sit at 20-30% each. When a turn is processing, all 4 get used, but none ever spike crazy high. Also, the game has 32 threads running.

i5 4670k @ stock
EVGA GTX 1070 SC
 

What you're saying is, have the AI do their turn in the background while you are moving your units and blah blah blah? That sounds good, but in reality it would be horrible. The AI is reactive to your units' positions and what you do.
 
"Remember in hyper-threading it is only an illusion."

This couldn't be further from the truth. 2 hyper-threaded cores (4 logical) are provably much faster than 2 single-threaded cores in most real-world scenarios. I won't go into the details of instruction pipelining or cache & memory latency, but hyper-threading usually benefits most games because of the way their code is structured.

The only programs that don't benefit from hyper-threading are those with perfect cache prediction and pipelining, which basically never happens in real life except on compute farms. It's a nice goal, and we should all endeavor to write our code (and compilers) with this in mind, but that just ain't the way it goes down with non-linear data structures.

For years AMD tried to get away with splitting one core in half and calling it a "two core", but those cores had reduced access to the resources they need, meaning that the best case performance when the other core is idle is half what it would be with hyper-threading. Thank god AMD finally woke up and implemented hyperthreading. Maybe now that they have Zen they can become competitive again. Hyper-threading. It's what's for dinner.
 

I'm saying they can start doing the work ahead of time. And not just during your turn, during each other's turns as well. If your computer has the spare cycles that is. Yes, your actions may invalidate some of their decisions but not all of them.

Plus, a lot of an AI's turn can be done in parallel with its other actions. The long turn times are mostly an issue on Huge maps with tons of cities and units. As a player, I'm forced to perform actions one at a time. The AI doesn't have this limitation. If an AI has 20 cities, and 100 units in 3 distinct theatres of war plus 50 more in defensive positions, there is great potential for it to calculate a lot of those actions in parallel: multiple pathfinding queries, build queue decisions, builder actions, tile assignments... You can also calculate branches of a decision tree in parallel. It's no cure for overly complex decision trees, but think about it: if traversing the tree takes 20 seconds, and you can break the tree down into four sub-trees that you traverse at once, you can conceivably traverse it 4x faster. This is a non-trivial improvement, despite being a brute-force solution.
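
A minimal sketch of that last point, using an invented toy tree rather than anything like the game's real decision structures: score four independent sub-trees on their own threads and take the best result, which on four idle cores can approach the 4x win, assuming the sub-trees really are independent:

```cpp
// Toy sketch (invented tree, not the game's real decision structures): score
// independent sub-trees on separate threads and keep the best result. With four
// sub-trees and four idle cores, the traversal can approach 4x faster.
#include <algorithm>
#include <future>
#include <iostream>
#include <vector>

struct Node {
    int score;
    std::vector<Node> children;
};

// Serial traversal of one sub-tree: best cumulative score along any path.
static int best_score(const Node& n) {
    int best = n.score;
    for (const Node& c : n.children)
        best = std::max(best, n.score + best_score(c));
    return best;
}

int main() {
    // Four independent sub-trees hanging off the root decision.
    std::vector<Node> subtrees = {
        {1, {{5, {}}, {2, {}}}},
        {3, {{1, {}}}},
        {2, {{2, {{4, {}}}}}},
        {0, {{7, {}}}},
    };

    std::vector<std::future<int>> jobs;
    for (const Node& s : subtrees)      // one traversal per thread
        jobs.emplace_back(std::async(std::launch::async,
                                     [&s] { return best_score(s); }));

    int best = 0;
    for (auto& j : jobs) best = std::max(best, j.get());
    std::cout << "best branch score: " << best << "\n";   // 8 for this toy tree
}
```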

There's tons of room for improvement. IMHO.
 
As for hyper-threading:
"Remember in hyper-threading it is only an illusion."

This couldn't be further from the truth. 2 hyper-threaded cores (4 logical) are provably much faster than 2 single-threaded cores in most real-world scenarios. I won't go into the details of instruction pipelining or cache & memory latency, but hyper-threading usually benefits most games because of the way their code is structured.

The only programs that don't benefit from hyper-threading are those with perfect cache prediction and pipelining, which basically never happens in real life except on compute farms. It's a nice goal, and we should all endeavor to write our code (and compilers) with this in mind, but that just ain't the way it goes down with non-linear data structures.

For years AMD tried to get away with splitting one core in half and calling it a "two core", but those cores had reduced access to the resources they need, meaning that the best case performance when the other core is idle is half what it would be with hyper-threading. Thank god AMD finally woke up and implemented hyper-threading. Maybe now that they have Zen they can become competitive again. Hyper-threading. It's what's for dinner.
Negative, Ghost Rider... Hyper-threading is the illusion of an additional core to the OS. The only time hyper-threading is even close to double speed is when there are two operations going on at the same time that utilize different parts of the core. The premise of hyper-threading originally was that Intel found only about 60-80% of the core was being used at any one time, so hyper-threading fools the OS into thinking there are more cores.

If hyper-threading were really that good, I would have spent the extra $100 on the 4770K, but it's not worth it. In fact, I specifically didn't get the hyper-threading model because of issues with stuttering in gaming. I suggest you look up hyper-threading for gaming; I can list titles with known performance degradation when hyper-threading is turned on, if you want.
 
Their calculations are done in parallel when you click the turn button. The End Turn button might as well be called the Calculate Turn button, because that is exactly what it is. Yes, there is room for improvement, but how? It's turn-based, not RTS-style. The AI doesn't 'think', so there is no reason to start processing its turn while you're doing yours. That would be a nightmare to implement.
 