1. We have added a Gift Upgrades feature that allows you to gift an account upgrade to another member, just in time for the holiday season. You can see the gift option when going to the Account Upgrades screen, or on any user profile screen.
    Dismiss Notice

Profiling for turn time processing speed evaluation

Discussion in 'Civ4 - Caveman 2 Cosmos' started by Thunderbrd, Jul 17, 2019.

  1. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,251
    Gender:
    Male
    Location:
    Las Vegas
    This post is for the programmers here. I'll start with the discussion that Koshling and I had a long time back:

    I think the conversation ended there and I have probably not done any further profiling for myself since then because well... I did not know where or how to set the define he was talking about. Perhaps someone can clarify that for me and hopefully all of this serves as a good enough start to the conversation about how to profile CivIV properly.
     
  2. KaTiON_PT

    KaTiON_PT Emperor

    Joined:
    Jan 2, 2010
    Messages:
    1,078
    Location:
    Portugal
    Thanks for starting this thread.

    There's also an alternative called Very Sleepy brought forth by @Nightinggale a while ago in this thread which has shown very good results even compared to the profiler that exists in Visual Studio. @billw2015 was able to discern all functions without having broken ones.
     
    Nightinggale likes this.
  3. billw2015

    billw2015 King

    Joined:
    Jun 22, 2015
    Messages:
    765
    Although caveats are required, just because it is resolving most function calls, doesn't mean it got it correct, and the results it gives show some weird artifacts. It appears to be treating wait operations as cpu time, causing most of the top results in that image I think. In one way this is useful, assuming it is correct: 6 seconds waiting at critical section is pretty bad over a 130s end turn time, it indicates some fairly bad contention potentially.

    I'm going to try the inbuilt profiler now and see what it looks like.

    I definitely wouldn't worry about separately profiling final release vs release. I find about a 5% difference in performance between them (based on just timing a round, so take it with a grain of salt), which means nothing in there is likely to show up much different on a profile, it is probably mostly the profile code itself that is making the difference. It is debug vs release that is the real change and the internal profiler is enabled for both of these currently.
     
    Last edited: Jul 17, 2019
  4. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,251
    Gender:
    Male
    Location:
    Las Vegas
    We were talking about final release vs debug.
     
  5. billw2015

    billw2015 King

    Joined:
    Jun 22, 2015
    Messages:
    765
    Yeah I meant forget profiling final release full stop, the only difference between release and final release (that I can see) is that release has profiling enabled. This makes sense, it is pretty common in games to have a build that has profiling (+ other dev tools in some cases) enabled, which are then disabled in shipping build, and for there to be no other difference.
    All I needed to do is enable cheatmode, Ctrl+D and this:
    upload_2019-7-17_22-53-2.png

    Here are my results from the internal profiler:
    https://docs.google.com/spreadsheets/d/1S5vmLsKS1hnL1ipSzOMw0bjc4b3A6GNJsE5SyVPWN6E/edit?usp=sharing
     
    KaTiON_PT likes this.
  6. Nightinggale

    Nightinggale Deity Supporter

    Joined:
    Feb 2, 2009
    Messages:
    4,138
    You should set the root to CvGame::doTurn (I think that's the one, didn't look it up right now) and then profile while hitting end turn. Stop when the next turn starts and you have a measurement of the most time critical part of the game: the wait for the AI. Since the active CPU core is supposed to be 100% loaded during the AI turn, counting waiting time shouldn't be an issue.

    I admit I haven't actually tried measuring multithreaded code. Maybe it doesn't work if threads are waiting for each other if it counts the wait as CPU load. This is actually an interesting question I haven't considered until right now. So far mods for the civ4 engine have all been single threaded when I looked at them.

    It makes no sense to profile debug. You need to know which parts of the code causes slowdowns after the compiler has run optimization. The debug DLL skips optimization because the debugger can't link code to source lines if optimized. You need to profile the release DLL. I added a profile DLL in Makefile 2, which is just as optimized as the release, but it has the symbols, meaning profiling can be linked to the lines in the code. That would be the ideal DLL for profiling.
     
    KaTiON_PT likes this.
  7. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,251
    Gender:
    Male
    Location:
    Las Vegas
    We have a very unique Makefile in use in our setup to help with enabling different file paths in use in an SVN team environment. Someone would need to adapt yours into ours for this but I'd invite them to do it!

    I think Alberts2 then did some things to enable this with the final release. As I understand it we only use the final release and debug dlls. Previous to this discussion with Alberts2, Koshling had instructed me on using the debug dll for the profile.
     
  8. billw2015

    billw2015 King

    Joined:
    Jun 22, 2015
    Messages:
    765
    Yeah all my profiles are just from when I click end turn to when the next turn starts, this is what I have posted here.

    Release in the Makefile, appears to be what I would usually call a profile build: it has optimization on compiler, and /DEBUG and pdb on linker, and the internal profiler enabled. Can you please post your Makefile2 or at least the compile and linker flags so I can compare if it isn't too much trouble?
     
  9. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,251
    Gender:
    Male
    Location:
    Las Vegas
    A link is in his signature line.
     
    billw2015 likes this.
  10. billw2015

    billw2015 King

    Joined:
    Jun 22, 2015
    Messages:
    765
    Yeah I will have a go, it looks good. At least I can steal the nice features anyway like multicore build! I would like to put the fastdep.exe into the repo as well, it is tiny and makes no sense you have to download it separately to build the dll.
     
  11. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,251
    Gender:
    Male
    Location:
    Las Vegas
    Cool... works for me. I imagine @alberts2 won't have any issue with that either but I figure he should say if he has any objection.
     
  12. billw2015

    billw2015 King

    Joined:
    Jun 22, 2015
    Messages:
    765
    What is the purpose of Release build vs Final_Release? It looks like it is meant to be the profile build really?
     
  13. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,251
    Gender:
    Male
    Location:
    Las Vegas
    That's possible. I have never used the release build myself so it's something another modder has been using, I presume Alberts2.
     
  14. Nightinggale

    Nightinggale Deity Supporter

    Joined:
    Feb 2, 2009
    Messages:
    4,138
    It should be good. It took ages to get right even though I at that time was already experienced with GNU make. Most of the remaining imperfections are due to limitations with nmake rather than the makefile itself. Stuff like copy pasted lines instead of writing it once and use wildcards. Apparently nmake is primitive when it comes to wildcards.
    Go for it. I wrote the makefile to be used. If you feel like you need to modify it, then do so, but do make sure the top of the file clearly identifies which makefile it is. I do have a modified version of Makefile 2.5 as well (mod specific) and the top states it is mod specific and not the general distributed makefile.

    Be aware that compiling with symbols is single threaded, meaning it boosts compile speed of release, but not debug or profile. Also it compiles the precompiled header (first file) and linking (last step) on a single core as well. Between that, it can compile one file on each core. Having said that, I tend to compile twice as fast when using 4 cores than when using a single core.

    Btw I highly recommend the assert DLL build. It's optimized like the release, but it has all the assert checks. Playing with that one is fast enough to play for fun (like 30% slower) and you will be notified whenever an assert check fails. This makes it much easier to locate bugs and make the game run more stable.
     
  15. Thunderbrd

    Thunderbrd C2C War Dog

    Joined:
    Jan 2, 2010
    Messages:
    26,251
    Gender:
    Male
    Location:
    Las Vegas
    Interesting - some of these asserts we have that have been going on forever may actually find some resolution if we start using this.
     
  16. billw2015

    billw2015 King

    Joined:
    Jun 22, 2015
    Messages:
    765
    Yeah definitely I will add it. Probably I won't bother with the multicore actually seeing how much extra stuff it requires (i.e. not just a compile flag!) and that it doesn't work for the builds we are regularly compiling anyway!
     
    KaTiON_PT likes this.
  17. f1rpo

    f1rpo plastics

    Joined:
    May 22, 2014
    Messages:
    486
    Location:
    Germany
    Is C2C already being compiled with whole program optimization (/GL)? If not, then @devolution's final release configuration could be of interest as well: GitHub link
    As a result, we've seen a ~10% speed-up of AI turns.
     
  18. alberts2

    alberts2 Emperor

    Joined:
    Aug 16, 2012
    Messages:
    1,940
    Gender:
    Male
    Location:
    Germany
    Doesn't work in C2C because C2C uses parts of boost 1.32 which break whole program optimization.
     
    f1rpo likes this.
  19. alberts2

    alberts2 Emperor

    Joined:
    Aug 16, 2012
    Messages:
    1,940
    Gender:
    Male
    Location:
    Germany
    The multithreading in C2C is really inefficent causing lots of waits and it is no longer needed. It was added years ago them the turn times where alot longer as they are now. The reason for those long turn times was really inefficient coding in some places which i optimized alot.

    That multithreading will be removed in a big commit including some other performance improvements some time after v39 is released .
     
  20. billw2015

    billw2015 King

    Joined:
    Jun 22, 2015
    Messages:
    765
    Thats good, can I get these changes from a branch or something? I don't want to do much coding if I don't have an accurate profile, or I will get a bunch of conflicts!
     

Share This Page