Measuring Mod Performance in SDK

Discussion in 'Civ4 - SDK/Python' started by stmartin, Jul 21, 2009.

  1. PieceOfMind

    PieceOfMind Drill IV Defender Retired Moderator

    Joined:
    Jan 15, 2006
    Messages:
    9,319
    Location:
    Australia
    Subscribed to this thread. :)

    I'm very interested in the discussion here, and if possible if some breakthroughs are made with improving the code efficiency I'll try to get them included in PIG mod. :D

    Bigv32 has recently been starting a couple of threads around the Civ4 General Discussions area asking about how to improve the speed of the code as well, so I might point him to this thread for a good read.

    Nice work, by the way, stmartin. :goodjob:
     
  2. stmartin

    stmartin aka. poyuzhe

    Joined:
    Aug 4, 2007
    Messages:
    640
    Location:
    Shanghai
    I was working on this for some time, have some good result with HoTK. However, when I tested it with BTS today, the result is no significant speed change. What a surprise. Here is a little comparison result with AD1000 scenario, the first 90 turns are original BTS, the second 90 turns are with my modification.

    Spoiler :
    ---------------Current Game Turn is 160.------------------
    CurrenTurnTime = 0.000005, TotalTime = 0.000005
    ---------------Current Game Turn is 170.------------------
    CurrenTurnTime = 22.553091, TotalTime = 22.553097
    ---------------Current Game Turn is 180.------------------
    CurrenTurnTime = 21.516636, TotalTime = 44.069733
    ---------------Current Game Turn is 190.------------------
    CurrenTurnTime = 32.906067, TotalTime = 76.975800
    ---------------Current Game Turn is 200.------------------
    CurrenTurnTime = 39.070580, TotalTime = 116.046379
    ---------------Current Game Turn is 210.------------------
    CurrenTurnTime = 42.242577, TotalTime = 158.288956
    ---------------Current Game Turn is 220.------------------
    CurrenTurnTime = 53.782028, TotalTime = 212.070984
    ---------------Current Game Turn is 230.------------------
    CurrenTurnTime = 59.855103, TotalTime = 271.926086
    ---------------Current Game Turn is 240.------------------
    CurrenTurnTime = 72.126282, TotalTime = 344.052368
    ---------------Current Game Turn is 250.------------------
    CurrenTurnTime = 81.681183, TotalTime = 425.733551

    ---------------Current Game Turn is 160.------------------
    CurrenTurnTime = 0.000006, TotalTime = 0.000006
    ---------------Current Game Turn is 170.------------------
    CurrenTurnTime = 22.034870, TotalTime = 22.034876
    ---------------Current Game Turn is 180.------------------
    CurrenTurnTime = 18.692419, TotalTime = 40.727295
    ---------------Current Game Turn is 190.------------------
    CurrenTurnTime = 25.743469, TotalTime = 66.470764
    ---------------Current Game Turn is 200.------------------
    CurrenTurnTime = 35.685280, TotalTime = 102.156044
    ---------------Current Game Turn is 210.------------------
    CurrenTurnTime = 48.671822, TotalTime = 150.827866
    ---------------Current Game Turn is 220.------------------
    CurrenTurnTime = 61.286240, TotalTime = 212.114105
    ---------------Current Game Turn is 230.------------------
    CurrenTurnTime = 66.084290, TotalTime = 278.198395
    ---------------Current Game Turn is 240.------------------
    CurrenTurnTime = 70.925018, TotalTime = 349.123413
    ---------------Current Game Turn is 250.------------------
    CurrenTurnTime = 77.284454, TotalTime = 426.407867


    I've tested it only once, there sure was some random errors. But, still more than enough to persuade me to stop wasting out time.:crazyeye:
     
  3. The_J

    The_J Say No 2 Net Validations Super Moderator Supporter

    Joined:
    Oct 22, 2008
    Messages:
    34,009
    Location:
    DE/NL/FR
    That's sad :sad:.
    Most modders would have been interested in the performance increase.
     
  4. stmartin

    stmartin aka. poyuzhe

    Joined:
    Aug 4, 2007
    Messages:
    640
    Location:
    Shanghai
    If you want, I could share the progress I've made. :)
     
  5. The_J

    The_J Say No 2 Net Validations Super Moderator Supporter

    Joined:
    Oct 22, 2008
    Messages:
    34,009
    Location:
    DE/NL/FR
    I'm not an SDK modder, i would have been interested in an optimized dll.
    But you say it yourself: The result is not really great :(.
    Any idea, if the impact could be bigger on bigger maps with more civs?
     
  6. stmartin

    stmartin aka. poyuzhe

    Joined:
    Aug 4, 2007
    Messages:
    640
    Location:
    Shanghai
    Well, after much ado about nothing, I've finally managed to increase the speed of BTS for about 15% under my experiment setting: AD1000, China, Emperor, 90 turns of AI Autoplay.

    I'll release the damn thing after the next HoTK release.
     
  7. The_J

    The_J Say No 2 Net Validations Super Moderator Supporter

    Joined:
    Oct 22, 2008
    Messages:
    34,009
    Location:
    DE/NL/FR
    That's nice to hear :).
    15% is good, consider the people, who are playing on giant maps.
     
  8. stmartin

    stmartin aka. poyuzhe

    Joined:
    Aug 4, 2007
    Messages:
    640
    Location:
    Shanghai
    Hi guys, with all your help and motivation, especially xienwolf, emperorFool and davidallen. we've managed to increase BTS speed by 15% while keeping everything intact, include AI ability and game rule.

    So we released CAR Mod for BTS, check it out here!
     
  9. jdog5000

    jdog5000 Revolutionary

    Joined:
    Nov 25, 2003
    Messages:
    2,601
    Location:
    California
    This is quite interesting, thanks to xienwolf and stmartin for explaining how the profiler works! Hopefully this will help BBAI ...
     
  10. jdog5000

    jdog5000 Revolutionary

    Joined:
    Nov 25, 2003
    Messages:
    2,601
    Location:
    California
    stmartin -

    Could you explain briefly what you understand the different columns in the profiler output to mean?

    How did you pick which functions to address?
     
  11. stmartin

    stmartin aka. poyuzhe

    Joined:
    Aug 4, 2007
    Messages:
    640
    Location:
    Shanghai
    I do not know all of them. I only know 'Time:' means the total time, and 'Num:' means number of times called. The only useful is the 'Time:', because what's costing you the most time, you take care of them first. 'Num' can also be useful sometimes, since if the 'num' is small, you can afford to put big loops in that function.

    Also, I think this somehow represent turn time: 'DBG: Total Frame MS: 5815.0'.
     
  12. EmperorFool

    EmperorFool Deity

    Joined:
    Mar 2, 2007
    Messages:
    9,633
    Location:
    Mountain View, California
    Here's my educated guess based on other profiling tools I've used.

    Code:
    Time : Ave : Min% : Max% : Num : Profile Name
    -----------------------------------------------------
    25605.0 : 062.4 : 057.5 : 067.6 : 024 : CvPlayer::doTurn
    003.0 : 000.0 : 000.0 : 000.0 : 127 : CvPlayer::hasBusyUnit
    894.0 : 002.2 : 001.7 : 002.7 : 024 : CvPlayerAI::AI_doTurnPre
    007.0 : 000.0 : 000.0 : 000.0 : 702 : CvPlayer::canDoCivics
    668.0 : 001.6 : 001.6 : 001.6 : 254 : CvPlayerAI::AI_civicValue
    001.0 : 000.0 : 000.0 : 000.0 : 515 : CvCityAI::AI_yieldMultiplier
    21560.0 : 052.5 : 000.0 : 054.1 : 40594 : CvPlayerAI::AI_getAttitudeVal
    
    • Time: Total time spent in this function, (I think) including functions it calls
    • Ave: Average time spent in this function (Time / Num)*
    • Num: Number of times this function was called
    • Min%/Max%: Min/Max percentage of time spent in this function during any particular "frame" (turn?)
    Because doTurn() does very little work on its own and consists of mostly calls to other functions yet takes a significant percentage of total run time, I believe that all times count from when the function starts to when it exits, including time spent in other functions called from the timed function. In other words, A's time includes B and C's time if A calls B which calls C.

    Ave is more significant than Time because a small average run time leaves little room for improvement. Looking at the line for AI_getAttitudeVal() makes me wonder how Ave is calculated. 20000 / 40000 = 0.5, yet the Ave is shown as 52.5. :confused:

    Min% and Max% are just as confusing. In at least one "frame" (is this a turn?) it consumed 54.1% of the total run time! How is this possible given how much goes on in a single turn?

    * Now that I look at the numbers, I suspect Ave should actually be called Ave% and is not a measure of time.
     
  13. jdog5000

    jdog5000 Revolutionary

    Joined:
    Nov 25, 2003
    Messages:
    2,601
    Location:
    California
    This is exactly why I asked ... Ave is definitely not Time divided by Calls!

    I think your %'s explanation is right on, eyeballing it looks right ... so Ave, Min%, and Max% would then be the average, lowest, and highest percentage of each timed frame taken up by the function across all measured frames.

    As for what a frame is, my guess is that during automation a frame is a turn, but perhaps there could be several frames during the human player's turn or maybe the human player's turn would be have its own frame. Something like that.
     
  14. EmperorFool

    EmperorFool Deity

    Joined:
    Mar 2, 2007
    Messages:
    9,633
    Location:
    Mountain View, California
    Now I'm thinking a frame is a fixed measure of time, a time slice. Each turn may consist of many frames. My reason for this is that I cannot imagine that 54% of an entire turn could be spent in AI_getAttitudeVal()! I'm I'm correct, I think the % values are pretty meaningless which of course argues a little against my theory.

    Hmm, no I don't think that's actually what is going on.

    What we really want is the % of time spent in a given function. Then you'd just target the highest % functions for optimization. This would require being able to subtract out subfunction calls, though, else you'd target doTurn() which is probably cannot be optimized since it's just a bunch of function calls IIRC.
     
  15. jdog5000

    jdog5000 Revolutionary

    Joined:
    Nov 25, 2003
    Messages:
    2,601
    Location:
    California
    Well, I ran automation for 100 turns and the resulting dllprofile had exactly 100 uses of "Total Frame MS:" ... so a frame is definitely a turn, and the total frame MS seems to be a measure of turn time.
     
  16. xienwolf

    xienwolf Deity

    Joined:
    Oct 4, 2007
    Messages:
    10,589
    Location:
    Location! Location!
    I am curious how much overhead is missed by the profiler. If you do such a test again, maybe run a stopwatch at the same time and see if the results are roughly equivalent? (Though deciding precisely when to start the stopwatch for turn 1 would be the tricky bit, so just enter in the Automation command ASAP, and anything within ~10 seconds should be considered a match).
     
  17. davidlallen

    davidlallen Deity

    Joined:
    Apr 28, 2008
    Messages:
    4,743
    Location:
    California
    Based on my experience with profilers, do not expect much correlation between the CPU time reported in the profiler, and "wall clock" / stopwatch time. Even on a single user system, lots of things consume wall clock time which do not show up in CPU time. For example, disk access, graphics draw time, vertical refresh, etc. If the test is done with the civ window minimized, a lot of graphics draw time vanishes, but it is still not zero. For example, a test I did a few weeks ago showed that even when the window is minimized, up to 10% of the total wall clock time was due to graphics effects -- replacing a complex unit graphic using attached particles by an empty graphics file improved the runtime by 10%. That will not show up in the profiler.

    The profiler time is useful to find which are the biggest contributors, but it may not be possible to get much correlation to "wall clock" time.

    As usual YMMV.
     
  18. jdog5000

    jdog5000 Revolutionary

    Joined:
    Nov 25, 2003
    Messages:
    2,601
    Location:
    California
    A quick stopwatch test indicates that the Total Frame MS time reported is total time taken, in ms, for each turn and quite accurate. Everything in the profile log is reported as falling inside of CvGame::update, which only takes some fraction of the total reported time ... looks like CvGame::update averages 60 - 90% of turn time roughly when the game is minimized (ie, graphics might be ignored).

    I imagine the rest of the total time is spend on things outside the DLL, graphics and other engine pieces.
     
  19. xienwolf

    xienwolf Deity

    Joined:
    Oct 4, 2007
    Messages:
    10,589
    Location:
    Location! Location!
    Cool, so basically if you subtract CvGame::update from Total Frame MS, you get a measure of the EXE/Graphics time which you can do little about (well, you can do a lot about the graphics part, but we can't measure that section precisely). So if you want to know if it is worthwhile to sit down and optimize your mod, you can compare those two numbers to know what net gain is possible (if Total Frame MS is MUCH greater than CvGame::update, then you need to talk with the graphic boys, but if CvGame::update is a significant portion, you need to talk with the codemonkeys)
     
  20. Nor Me

    Nor Me Chieftain

    Joined:
    Feb 22, 2003
    Messages:
    50
    I can't get this to work like it does for other people and I don't know why. Specifically, The profiler only seems to get the last call to CvGame::update in any turn, the one with CvGame::doTurn. I've been working round this by moving the start and stop profiling dll calls to CvGame::update and then adding up the thousands of results per turn. That worked fine but it appears to be unnecessary.

    What exactly do you need to do to get the profiler to work properly?

    This is what I've done:

    1) Built a release or final debug version of the dll. Both seem to work the same but release gives more accurate timings.

    2) Go into CivilizatioIV.ini and set CheatCode = chipotle and LoggingEnabled = 1.

    3) In the game, press Ctrl-D, go to the profile tab and set something(s). I can't find this:
    Even with other people's code, the above will only profile the last call to CvGame::update. Yet stmartin's log shows he didn't have that problem. What am I doing wrong?

    Talking of ways to speed things up, I re-implemented the A* pathfinding. That was the only thing I've found that made a significant difference and only sped up the late game on huge maps by 15s a turn out of 1 min 30s.
     

Share This Page