
Civ 6 Multicore performance

Discussion in 'Civ6 - General Discussions' started by Cromagnus, Nov 3, 2016.

Thread Status:
Not open for further replies.
  1. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    I ran the game at 1600x900. The GPU wasn't a factor; it never exceeded 50% utilization.

    The fact that neither the CPU nor the GPU reaches 100% utilization is a strong indicator that the bottleneck threads are just waiting around, pointing to a poor multithreading implementation.

    Even after turning off hyper-threading and dropping down to 6 cores, no core got close to 100% utilization on my 4.4GHz chip. That's just not how it should work. If a game is built with proper thread synchronization, there should always be at least one resource (a CPU core or the GPU) at 100% utilization unless you have V-Sync turned on. If that isn't the case, you have a game that could dramatically benefit from threading improvements. Which doesn't surprise me terribly; after all, it did just come out.

    What really makes me sad is the AI benchmark never exceeding 50% utilization on any core. That's probably partly due to "Quick Combat" and "Quick Movement" not being instantaneous. But it could also indicate that between-turn AI performance is bottlenecked by synchronization with rendering. :p
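    The rule of thumb above can be written down as a tiny check. The function name and the utilization numbers are mine, purely for illustration; they are not measured from Civ 6:

```python
# Heuristic from the post above: if no CPU core is pinned at 100%, the GPU
# isn't either, and V-Sync is off, the bottleneck is threads waiting on
# each other rather than raw compute.

def looks_sync_bound(core_utils, gpu_util, vsync=False):
    """Return True when neither any CPU core nor the GPU is saturated."""
    return (not vsync) and max(core_utils) < 1.0 and gpu_util < 1.0

if __name__ == "__main__":
    # Roughly the situation described: ~40-50% per core, GPU under 50%.
    print(looks_sync_bound([0.45, 0.50, 0.40, 0.48, 0.42, 0.46], 0.50))  # True
    # A saturated core means the machine is genuinely compute-bound:
    print(looks_sync_bound([1.00, 0.30, 0.25, 0.20, 0.15, 0.10], 0.50))  # False
```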
     
    TheMeInTeam likes this.
  2. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    I played around with clock speed, core count and hyper-threading, and came up with some recommendations based on the current state of the game... or rather, they haven't changed: 4-core w/ HT or 6-core w/o HT is the sweet spot. Anything more than that is a waste for this game right now. The minimum clock speed before you'll see performance issues at 60fps (no matter how many cores) is roughly 3.2GHz on an older Intel, 2.5GHz on a newer Intel.

    So, basically, as usual, Skylake is your best bet, for now, which is a disappointment. I'll have to hope they improve the threading.

    In my testing, the 12-core Haswell-EP (2014) hit 100% utilization on one thread at 2.5GHz. On the 6-core Westmere (2010) I had to reduce the clockspeed to 3.2GHz to get one core to hit 100% utilization. (Only one core was at 100% in these tests)

    Above those clockspeeds, no single core was ever maxed out... at 18-19sec/turn for the AI benchmark, running at 4.4GHz the average utilization was 40-50% per core. In a well-synchronized game, it should have been at 100% on at least one core...

    Reducing core count didn't hurt the frame rate on either machine until I got down to 4 cores (8 HT) and 6 cores (no HT) respectively. So basically, if you have a 4-core, run it with hyper-threading turned on. If you have a 6-core or better, run it with hyper-threading turned off. If you have an 8-core or better that has a variable turbo based on the number of active cores, you might be better off only enabling 6 cores, unless you also have thermal-drive core switching active in the BIOS.

    FYI I found that DX12 ran slightly faster than DX11 for me. (10% maybe)

    In GPU-bound scenes, I saw a much bigger improvement though. (Both DX11 and DX12 ran faster for me than DX11 did prior to the patch)

    In CPU-bound scenes, DX11 scored exactly the same for me.

    My old AMD card (7870) saw the biggest performance improvement in DX12. My GTX1080 benefited more in DX11.
     
  3. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    Here are some issues that would hurt perf like this...

    1) The game could be mutex-bound. A lot of older engines that were retrofitted for multi-threading suffer from this. Because the engine wasn't designed to be thread-safe, parts of it must be protected inefficiently. When one thread is accessing such a resource, the others cannot. If there are enough of these waits, both threads can spend 50% of the frame waiting. (Since only one can be active at a time)

    2) The threads might run in lockstep. The worst possible way to thread a game is to not let the render thread and the main thread run at the same time. This can also apply to the GPU. The main thread runs, then waits. Then the render thread runs, then waits. Then the GPU runs, then waits... then the main thread runs again. This is necessary if assets aren't double or triple-buffered.

    3) The engine might be sleep-bound. In this scenario, one thread kicks off a worker thread, keeps processing, then waits for the worker to finish. If the worker thread has significantly more work to do, the main thread sits idle until the worker finishes before it can proceed.

    There are a lot of other more arcane scenarios but I doubt it's that complicated. My guess is that in the case of Civ6 it's all three. :p
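    A toy model of scenario 2 helps show why lockstep hurts so much. The stage costs below are made-up illustrative numbers, not Civ 6 measurements: in lockstep, frame time is the sum of all stages, while with double/triple buffering it approaches just the slowest stage.

```python
# Toy model of scenario 2 (lockstep threads) vs a pipelined frame loop.
# Stage costs in milliseconds are invented for illustration.
STAGES = {"main": 8.0, "render": 6.0, "gpu": 10.0}

def lockstep_frame_ms(stages):
    # Each stage waits for the previous one, so frame time is the sum.
    return sum(stages.values())

def pipelined_frame_ms(stages):
    # With double/triple-buffered assets the stages overlap, so the
    # steady-state frame time is just the slowest stage.
    return max(stages.values())

print(lockstep_frame_ms(STAGES))   # 24.0 ms -> ~41 fps
print(pipelined_frame_ms(STAGES))  # 10.0 ms -> 100 fps

# Side effect of lockstep: each thread is busy only for its own slice of
# the frame (e.g. the main thread at 8/24 = ~33% utilization), which looks
# a lot like the sub-50% core utilization reported earlier in this thread.
```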
     
    Last edited: Nov 18, 2016
    Mglo likes this.
  4. Victoria

    Victoria Regina Supporter

    Joined:
    Apr 11, 2011
    Messages:
    11,375
    I did some tests a few weeks back. As I remember, 31 threads were created. Thread 0 was the heaviest user, followed by the thread I was using. From careful observation it seems each civ had its own thread, and there were graphical threads.
    In essence: 2 heavy threads and many smaller ones of varying degrees. I just used Perfmon, a great built-in tool.

    I used GPU-Z to ensure my graphics card was not getting in the way. It's a simple, free download I would recommend.

    As this game is turn-based, the only mutex contention I would imagine being an inconvenience would be graphical. I saw no spiky evidence of lock-wait states.

    In essence you need an average or better graphics card, or that will be the bottleneck; the listed specs seem to be about right. The better the clock speed the better, with about 2GB of video RAM; not sure more helps. (Haven't verified that, as mine is 2GB)
    For the CPU, the more threads you have the better (up to maybe 36), and typically it's 2 per core on Wintel. And naturally, limit what else is running at the time. If a thread can stay on one core it does help performance, maybe for the smaller-usage threads; not so you would notice, but technically it does.

    What was very clear is that the time between turns was mainly down to graphical displays of units moving and fighting, so turning those down/off, and not having so many allied states/cities, sped up the turn significantly.

    While this game remains turn-based, 2 threads are likely to be used more than the others.
    Will clock speed help? Yes, but not so you would notice, just like threads swapping back to the same core helps, but we're talking micro differences. Do more cores help? Well, obviously you need a few threads based on what I have seen, and perhaps a sweet spot of 4 cores is good enough. But more will help in micro ways, and as I said, often there are other tasks running on your computer, and life is a fair share to a large degree.
     
    Last edited: Nov 18, 2016
  5. Gumbolt

    Gumbolt Phoenix Rising

    Joined:
    Feb 12, 2006
    Messages:
    23,318
    Location:
    UK
    The reality is the game makers have not planned games to run on more than 4-8 cores, as such CPUs have only just started to exist. For the last 10 years most CPUs have had between 1-4 cores. Even now, most 8-16 core CPUs belong to HEDT users or business servers. Most old games will not even know how to use multiple cores properly. Civ 6 seems to be badly designed for this from what you have suggested above. Not sure any patch or expansion will resolve this?

    It will be at least 12-24+ months before Intel moves to 6-8 cores as standard. These Skylake+ CPUs will certainly not be mainstream. Will developers really design a game that could use 8-16 cores/threads when most won't have a PC to support this? I think the overall design of games needs to change to allow for these changes, but I wouldn't expect miracles overnight. The specs for Civ 6 for basic play could cover systems at least 6-8 years old. This is where the mass market currently is. This is where the sales start too.

    That being said, buying a Skylake CPU now limits upgrade chances later, as I don't see Intel releasing a 6-8 core Skylake or Kabylake that will fit on the existing socket? Why would they, with socket 2066 planned for Skylake+? Coffee Lake is rumoured to be 6-core. What socket might this use? Cannon Lake is rumoured to be based on a new design too. In some ways this could put me off Skylake and Kabylake if I wanted a 6-8 core HT CPU. Of course, like most things, there are trade-offs. You can never completely future-proof a system. Performance of Zen is far from guaranteed.
     
  6. Victoria

    Victoria Regina Supporter

    Joined:
    Apr 11, 2011
    Messages:
    11,375
    If you mean me, that is not what I meant to suggest.

    The benefit you will gain from more than 4 cores on this game you will barely notice. This is because it is turn-based. Yes, it can use up to 16 cores to be most effective, BUT you will not notice the benefit in game. It is too small, and 8 of those 31 threads do not work simultaneously because each player is turn-based.

    As I understand it, Civ IV was no different; they were coding 30+ threads back then.

    I am quite happy to eat my hat if someone can prove to me it is so. It's just that the less-used program threads have to share the core threads more. My 8-core did not run over 20% overall, and only one core was heavily utilized.
     
  7. Gumbolt

    Gumbolt Phoenix Rising

    Joined:
    Feb 12, 2006
    Messages:
    23,318
    Location:
    UK
    No, not meant at you. All general comments. Of course, these comments are purely focused on Civ 6. Other non-turn-based games may well be more efficient with cores. Otherwise, why buy multi-core CPUs?
     
  8. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    There's still hope. I don't need Civ6 to take advantage of 8-12 cores. What I want is two things:

    1) I want the game to run at 60fps on my 2.0GHz 12-core when it's not GPU-limited, just like Doom, Rise of the Tomb Raider, Battlefield, and a lot of recent games that have been designed to remove the CPU rendering bottleneck.
    2) I want "Next Turn" times to be limited by AI processing, not animation speed.

    Yes, in an ideal world, I'd like the AI processing to scale with more cores. But I'd settle for the above.

    EDIT: Think for a moment about how many games run at 60 fps on PS4, which is basically a 1.6GHz AMD Jaguar, and probably has about 1/3 the processing power per core of a Haswell-E at the same clock speed...

    This is why I think it's not unreasonable to expect games to run at 60fps on the CPU side with a 2.0GHz Haswell-E.
     
    Last edited: Nov 18, 2016
    Mglo likes this.
  9. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    Here are the benchmarks I ran. Specs are:
    CPU: X5670 @ 4.4GHz
    GPU: GTX 1070
    1600x900 (GPU never exceeded 70% utilization)

    AI benchmark
    4C/4HT: 18.3 seconds/turn
    4C/8HT: 19.1 seconds/turn
    6C/6HT: 18.2 seconds/turn
    6C/12HT: 19.5 seconds/turn

    DX11 Graphics Benchmark:
    4C/4HT: 15.1ms (66 fps)
    4C/8HT: 12.8ms (78 fps)
    6C/6HT: 12.7ms (79 fps)
    6C/12HT: 12.8ms (78 fps)

    DX12 Graphics Benchmark:
    4C/4HT: 14.3ms (70 fps)
    4C/8HT: 12.6ms (79 fps)
    6C/6HT: 12.1ms (83 fps)
    6C/12HT: 11.8ms (85 fps)

    As you can see, there's a slight improvement going from DX11 to DX12.
    The AI benchmark runs better with hyper-threading turned off, and doesn't even fully utilize 4 cores. (Which makes sense: if the benchmark could use more than 4 cores' worth of threads, 4C/8HT wouldn't run slower than 4C/4HT.)
    The graphics benchmark shows that DX11 only really needs 4C w/ HT or 6C w/o HT; there's no improvement after that.
    The game behaves better with DX12... performance improves with 6 cores, and improves further with 6C/12HT, which means for DX12 the sweet spot is probably 8C/8HT. (I'm just guessing on that)

    I didn't bother to take full benchmarks with my 12C/24HT because it ran the AI test in 21.5 seconds/turn, and couldn't hold 60 fps with the DX12 benchmark. Still, DX12 was an improvement. (Up to ~50fps from ~44fps)
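    A quick way to sanity-check the ms and fps columns above: fps is just 1000 divided by the frame time in milliseconds.

```python
# Convert an average frame time in milliseconds to frames per second.
def ms_to_fps(frame_ms):
    return 1000.0 / frame_ms

if __name__ == "__main__":
    print(round(ms_to_fps(15.1)))  # 66, matching the 4C/4HT DX11 row
    print(round(ms_to_fps(12.8)))  # 78, the 4C/8HT rows
    print(round(ms_to_fps(11.8)))  # 85, the 6C/12HT DX12 row
```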
     
  10. Gumbolt

    Gumbolt Phoenix Rising

    Joined:
    Feb 12, 2006
    Messages:
    23,318
    Location:
    UK
    Is this using the new patch??

    Looks like 6C/12HT is quicker on DX12. Also some gain in performance from using more cores, albeit just over 20%.
     
  11. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    Yes this is with the patch. FYI to use DX12 I had to right-click on the game in the Steam Library. The DX12 .exe ends up launching DX11 for me. YMMV.
     
  12. snake_xiongyang

    snake_xiongyang Chieftain

    Joined:
    Nov 8, 2008
    Messages:
    9
    The tests indicate the implementation is very bad.
    I'd guess it uses multithreaded AI, but all the threads try to acquire one global mutex. (Multithreaded, but only one works at a time)
    That's even worse than single-threaded AI.
    2K should pay more for better coders.
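    That guess is easy to demonstrate in miniature. Everything below is a hypothetical sketch, not Civ 6 code: eight "AI" threads exist, but because they all serialize on one global lock, at most one is ever doing work.

```python
import threading

global_lock = threading.Lock()   # the hypothetical single global mutex
active = 0                       # workers inside the "AI work" section now
max_active = 0                   # the most workers ever observed at once

def ai_worker(civ_id):
    global active, max_active
    for _ in range(100):
        with global_lock:        # every worker fights for the same lock
            active += 1
            max_active = max(max_active, active)
            # ... per-civ AI work would happen here ...
            active -= 1

threads = [threading.Thread(target=ai_worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Eight threads were created, but the global lock means the effective
# parallelism is 1 -- plus lock overhead that a single thread wouldn't pay.
print(max_active)  # 1
```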
     
  13. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    If I were to hazard a guess, based on experience (I did AI programming for about 5 years), the issue isn't needing "better coders"... usually deadlines and priorities are the limiting factor.

    No game has ever shipped bug-free. Therefore, shipping on time is an exercise in triage. You prioritize and fix the most important bugs first. Since multithreading optimization only affects 25% or less of their customers, it's probably not a huge priority compared to say, the glaring bugs the game shipped with. :p

    Also, when you make a game, run-time performance is never the first priority. The first priorities are usually making it fun and making it look good, because a fast yet boring/ugly game doesn't sell. And making it fun is almost always in direct conflict with making it fast. Giving designers flexibility and fast turn-around during prototyping usually means the engine can't make a lot of the assumptions that would speed things up.

    When a game is being made, the game design necessarily is like a temporary structure. Once it's finalized you can harden it. And for a game like Civ, which requires significant post-launch tweaking, you just can't harden everything.

    By the way, although I've expressed disappointment in this thread over the inefficient multi-threading, my bar is set unreasonably high. I was hoping Civ6 would be groundbreaking in terms of AI multithreading. With the exception of AotS, no game that I'm aware of has ever scaled to fully utilize an 8-core machine with hyper-threading. I was just hoping Civ6 would be the first. :p
     
    Last edited: Nov 21, 2016
    Mglo likes this.
  14. tekjunkie28

    tekjunkie28 Prince

    Joined:
    Oct 20, 2016
    Messages:
    334
    Gender:
    Male
    I like the work you did on the DirectX 12 benchmarks. It confirms what I 'feel' in the game. I played for a long time last night and turn times were excellent as always, but I'm comparing to Civ 5. That REALLY sucks about hyper-threading; I too was hoping for more core usage, as for my next build I was hoping to get hyper-threading or more cores. I really want more physical cores, but that build is probably at least 2 years away. This may be the longest I have kept the same processor, but mainly because Intel has no competition and each gen is only like 10% faster.
     
  15. Kwami

    Kwami Emperor

    Joined:
    Oct 3, 2010
    Messages:
    1,914
    Well, OK, but how many generations behind are you? If each generation really is 10%, then five generations would get you a 61% increase.
     
  16. tekjunkie28

    tekjunkie28 Prince

    Joined:
    Oct 20, 2016
    Messages:
    334
    Gender:
    Male
    I have a 4670K. I'm not anywhere close to this becoming my bottleneck. The processor I had before was a Q9550 that I probably could have kept for a while longer. Depending on which game you are referring to, the 6700K is anywhere from not as fast to slightly faster than the 4670K or the 4770K. Once you overclock the 4670K, you're at about the same speed anyway.
     
  17. Gumbolt

    Gumbolt Phoenix Rising

    Joined:
    Feb 12, 2006
    Messages:
    23,318
    Location:
    UK
    No need to replace a Haswell CPU any time soon. Haswell to Skylake was about a 5-10% jump. The jump to Kabylake will mainly be higher due to clock speed. If you are already overclocking the CPU, Haswell should be good for many more years. Coffee Lake is due in the 2nd half of 2018. I don't see Skylake+ being worth it due to the cost. The reality is Intel has not really advanced much in the last 3-4 years. Still on 14nm technology. 3-4 generations on 14nm is not good news.
     
  18. TheMeInTeam

    TheMeInTeam Top Logic

    Joined:
    Jan 26, 2008
    Messages:
    25,835
    That doesn't follow. If you completely strip graphics, you can't see the game. If each turn takes four hours to process, there wouldn't be many people left playing Civ 6. Neither of those extremes sounds fun. The "almost always" is a reach; there is a thread on this subforum that talks about the game being boring, and indeed that boils down to a combination of alpha-strat plus too high a percentage of the game's time being spent on mundane actions.

    What is the "fun" in civ 6? It's a mix of the historical theme, presentation (including but limited to graphics), and meaningful decisions you make. Because you spread these decisions over 400 turns, you will have a substantial portion of turns have trivial decisions only.

    Things like shoddy UI, design wrt stuff like moving units, and turn times all intermingle as elements that can reduce timely access to meaningful decisions. You can't sell the farm to maximize performance, but neglecting it knowing the limitations of design is a bad call too.
     
  19. Cromagnus

    Cromagnus Deity

    Joined:
    Sep 11, 2012
    Messages:
    2,272
    Re: Making it fun vs making it fast - The more flexible your game engine, the less efficient it is, generally speaking. The more control a game designer has to change the way things work, the less ability programmers have to optimize the code. As a game gets close to ship it usually hardens... but Civ has to stay flexible because it changes so much after launch. You could argue that's only necessary because it ships when it's "not done", but the game is so complex it's virtually impossible for it to ship "done". IMHO.

    This is a generalization but it's a fairly accurate one. A few examples from other games: if you limit the game to a top-down view, you can greatly improve performance. If you require all levels in a game be designed like a linear canyon that you can't climb out of, you can greatly improve performance. If you don't allow players to customize their gear 100 different ways, you can greatly improve performance. If you limit the AI so that they can't follow the player outside of a tethered area, or limit the AI to walking on the ground vs 3-D movement, it improves performance. If you bake the lighting to a fixed state instead of having moving or changing lightsources, it greatly improves performance.

    Now, in contradiction to my comment about performance, there is a minimum bar yes. If during development the game runs at 10 fps, it makes it very difficult for designers to test "the fun", and can even make it impossible. Like, badly hitching framerate would obviously make tuning the aim controls impossible. So, yes, you can't ignore performance. I'm just saying that the more dynamic a game is, the harder it is to optimize.

    EDIT: And I shouldn't say perf is never the top priority. Some twitch games have a golden rule that the framerate must never drop below 60 on the target platform, even if the design must be changed. But in that context, perfectly smooth framerate is a necessity for fun, because the game is competitive and dropped frames are a competitive disadvantage. So, while it's ok for Dragon Age Inquisition to drop frames, it's not ok for online fighting games or twitch E-sports. And that's usually only an issue with fixed console hardware. With an almost infinite variety of PC hardware configs out there ranging from really slow to really fast, you can't please everyone, so the devs usually target a rough minimum bar, and the players who care about perfect framerate usually shell out big bucks to ensure perf... and unlike console, by the time you finish the game, the min spec has usually gone up, so it's a moving target. Some studios use automated testing on target min spec hardware to enforce a minimum framerate during development, but that's the exception to the rule, sadly. :p

    Not to go off on a tangent, but making games is kind of like making music in that you can get lucky and have your first single be a top ten hit, and then spend the rest of your career trying to recreate that initial success. It's very difficult to consistently ship fun games, so anything (like focusing on performance) that would make that harder has to play second fiddle. It's not done until it's fun. Many a successful game has shipped with performance issues. But games that aren't fun are dead in the water.
     
    Last edited: Nov 21, 2016
  20. TheMeInTeam

    TheMeInTeam Top Logic

    Joined:
    Jan 26, 2008
    Messages:
    25,835
    My point is that it's more a scale than black/white. Unused or under-utilized flexibility will make the game less fun outright, on the basis that it will play slower without adding anything to the experience to offset that cost. I gave the four-hour-turn example to illustrate the extreme; everyone draws their interest line somewhere differently, and that can vary by mood even for one person. However, when you make something that could have taken 60 minutes necessarily take 120 minutes or more, without adding decisions, that's a very real impact on the experience and breaks the "faster --> less fun as a result of necessary concessions" conclusion. Fact is, we don't know for sure which concessions would cost more than the performance gives back.

    There should be, at least in principle, a theoretical optimal balance between speed performance and flexibility in design/engine, even if that's not realistic to approach. There is a line at which optimization becomes more important even to the average buyer than another piece of gear or dynamic light sources.

    When the core gameplay is the decisions the player makes in the environment presented, making those decisions require more waiting, and presenting that information poorly or making it inaccessible, should both be viewed as active detriments to the fun of the game, full stop. Maybe better graphics add more fun than one of those removes, but only to a point. Most people can't keep up with the speed at which I play, but at the same time, the number of people who refuse to play on huge maps (a default feature in the game) based solely on the anticipated time spent waiting to be able to do something is less niche.

    If (for example) we're talking 80% of players don't run huge maps due to wait time on "recommended specs", the developers are actually not delivering viable huge maps due to performance. It's a feature in theory, but one rendered so inaccessible by wait time in that scenario that the majority of the player base won't enjoy using it. Same goes for marathon speed. The users are so much faster than the game that an increasing percentage of their time spent in front of the screen is spent doing nothing. I'm more sensitive to it, but almost everyone will feel it eventually.

    I might be an outlier for standard maps (which notably have crept to "small" with fewer civs as the default since civ 4!), but this effect is pretty easy to observe more widely as you increase wait time. What percentage of the player base has "fun" on huge + marathon in civ 6? If they don't find it fun, why isn't it fun? I think we can guess the answer.
     