Measuring Mod Performance in SDK

Subscribed to this thread. :)

I'm very interested in the discussion here, and if possible if some breakthroughs are made with improving the code efficiency I'll try to get them included in PIG mod. :D

Bigv32 has recently been starting a couple of threads around the Civ4 General Discussions area asking about how to improve the speed of the code as well, so I might point him to this thread for a good read.

Nice work, by the way, stmartin. :goodjob:
 
I was working on this for some time, have some good result with HoTK. However, when I tested it with BTS today, the result is no significant speed change. What a surprise. Here is a little comparison result with AD1000 scenario, the first 90 turns are original BTS, the second 90 turns are with my modification.

Spoiler :
---------------Current Game Turn is 160.------------------
CurrenTurnTime = 0.000005, TotalTime = 0.000005
---------------Current Game Turn is 170.------------------
CurrenTurnTime = 22.553091, TotalTime = 22.553097
---------------Current Game Turn is 180.------------------
CurrenTurnTime = 21.516636, TotalTime = 44.069733
---------------Current Game Turn is 190.------------------
CurrenTurnTime = 32.906067, TotalTime = 76.975800
---------------Current Game Turn is 200.------------------
CurrenTurnTime = 39.070580, TotalTime = 116.046379
---------------Current Game Turn is 210.------------------
CurrenTurnTime = 42.242577, TotalTime = 158.288956
---------------Current Game Turn is 220.------------------
CurrenTurnTime = 53.782028, TotalTime = 212.070984
---------------Current Game Turn is 230.------------------
CurrenTurnTime = 59.855103, TotalTime = 271.926086
---------------Current Game Turn is 240.------------------
CurrenTurnTime = 72.126282, TotalTime = 344.052368
---------------Current Game Turn is 250.------------------
CurrenTurnTime = 81.681183, TotalTime = 425.733551

---------------Current Game Turn is 160.------------------
CurrenTurnTime = 0.000006, TotalTime = 0.000006
---------------Current Game Turn is 170.------------------
CurrenTurnTime = 22.034870, TotalTime = 22.034876
---------------Current Game Turn is 180.------------------
CurrenTurnTime = 18.692419, TotalTime = 40.727295
---------------Current Game Turn is 190.------------------
CurrenTurnTime = 25.743469, TotalTime = 66.470764
---------------Current Game Turn is 200.------------------
CurrenTurnTime = 35.685280, TotalTime = 102.156044
---------------Current Game Turn is 210.------------------
CurrenTurnTime = 48.671822, TotalTime = 150.827866
---------------Current Game Turn is 220.------------------
CurrenTurnTime = 61.286240, TotalTime = 212.114105
---------------Current Game Turn is 230.------------------
CurrenTurnTime = 66.084290, TotalTime = 278.198395
---------------Current Game Turn is 240.------------------
CurrenTurnTime = 70.925018, TotalTime = 349.123413
---------------Current Game Turn is 250.------------------
CurrenTurnTime = 77.284454, TotalTime = 426.407867


I've tested it only once, there sure was some random errors. But, still more than enough to persuade me to stop wasting out time.:crazyeye:
 
I've tested it only once, there sure was some random errors. But, still more than enough to persuade me to stop wasting out time.:crazyeye:

That's sad :sad:.
Most modders would have been interested in the performance increase.
 
Well, after much ado about nothing, I've finally managed to increase the speed of BTS for about 15% under my experiment setting: AD1000, China, Emperor, 90 turns of AI Autoplay.

I'll release the damn thing after the next HoTK release.
 
Hi guys, with all your help and motivation, especially xienwolf, emperorFool and davidallen. we've managed to increase BTS speed by 15% while keeping everything intact, include AI ability and game rule.

So we released CAR Mod for BTS, check it out here!
 
stmartin -

Could you explain briefly what you understand the different columns in the profiler output to mean?

How did you pick which functions to address?

I do not know all of them. I only know 'Time:' means the total time, and 'Num:' means number of times called. The only useful is the 'Time:', because what's costing you the most time, you take care of them first. 'Num' can also be useful sometimes, since if the 'num' is small, you can afford to put big loops in that function.

Also, I think this somehow represent turn time: 'DBG: Total Frame MS: 5815.0'.
 
Here's my educated guess based on other profiling tools I've used.

Code:
Time : Ave : Min% : Max% : Num : Profile Name
-----------------------------------------------------
25605.0 : 062.4 : 057.5 : 067.6 : 024 : CvPlayer::doTurn
003.0 : 000.0 : 000.0 : 000.0 : 127 : CvPlayer::hasBusyUnit
894.0 : 002.2 : 001.7 : 002.7 : 024 : CvPlayerAI::AI_doTurnPre
007.0 : 000.0 : 000.0 : 000.0 : 702 : CvPlayer::canDoCivics
668.0 : 001.6 : 001.6 : 001.6 : 254 : CvPlayerAI::AI_civicValue
001.0 : 000.0 : 000.0 : 000.0 : 515 : CvCityAI::AI_yieldMultiplier
21560.0 : 052.5 : 000.0 : 054.1 : 40594 : CvPlayerAI::AI_getAttitudeVal

  • Time: Total time spent in this function, (I think) including functions it calls
  • Ave: Average time spent in this function (Time / Num)*
  • Num: Number of times this function was called
  • Min%/Max%: Min/Max percentage of time spent in this function during any particular "frame" (turn?)
Because doTurn() does very little work on its own and consists of mostly calls to other functions yet takes a significant percentage of total run time, I believe that all times count from when the function starts to when it exits, including time spent in other functions called from the timed function. In other words, A's time includes B and C's time if A calls B which calls C.

Ave is more significant than Time because a small average run time leaves little room for improvement. Looking at the line for AI_getAttitudeVal() makes me wonder how Ave is calculated. 20000 / 40000 = 0.5, yet the Ave is shown as 52.5. :confused:

Min% and Max% are just as confusing. In at least one "frame" (is this a turn?) it consumed 54.1% of the total run time! How is this possible given how much goes on in a single turn?

* Now that I look at the numbers, I suspect Ave should actually be called Ave% and is not a measure of time.
 
Ave is more significant than Time because a small average run time leaves little room for improvement. Looking at the line for AI_getAttitudeVal() makes me wonder how Ave is calculated. 20000 / 40000 = 0.5, yet the Ave is shown as 52.5. :confused:

Min% and Max% are just as confusing. In at least one "frame" (is this a turn?) it consumed 54.1% of the total run time! How is this possible given how much goes on in a single turn?

* Now that I look at the numbers, I suspect Ave should actually be called Ave% and is not a measure of time.

This is exactly why I asked ... Ave is definitely not Time divided by Calls!

I think your %'s explanation is right on, eyeballing it looks right ... so Ave, Min%, and Max% would then be the average, lowest, and highest percentage of each timed frame taken up by the function across all measured frames.

As for what a frame is, my guess is that during automation a frame is a turn, but perhaps there could be several frames during the human player's turn or maybe the human player's turn would be have its own frame. Something like that.
 
Now I'm thinking a frame is a fixed measure of time, a time slice. Each turn may consist of many frames. My reason for this is that I cannot imagine that 54% of an entire turn could be spent in AI_getAttitudeVal()! I'm I'm correct, I think the % values are pretty meaningless which of course argues a little against my theory.

Hmm, no I don't think that's actually what is going on.

What we really want is the % of time spent in a given function. Then you'd just target the highest % functions for optimization. This would require being able to subtract out subfunction calls, though, else you'd target doTurn() which is probably cannot be optimized since it's just a bunch of function calls IIRC.
 
I am curious how much overhead is missed by the profiler. If you do such a test again, maybe run a stopwatch at the same time and see if the results are roughly equivalent? (Though deciding precisely when to start the stopwatch for turn 1 would be the tricky bit, so just enter in the Automation command ASAP, and anything within ~10 seconds should be considered a match).
 
Based on my experience with profilers, do not expect much correlation between the CPU time reported in the profiler, and "wall clock" / stopwatch time. Even on a single user system, lots of things consume wall clock time which do not show up in CPU time. For example, disk access, graphics draw time, vertical refresh, etc. If the test is done with the civ window minimized, a lot of graphics draw time vanishes, but it is still not zero. For example, a test I did a few weeks ago showed that even when the window is minimized, up to 10% of the total wall clock time was due to graphics effects -- replacing a complex unit graphic using attached particles by an empty graphics file improved the runtime by 10%. That will not show up in the profiler.

The profiler time is useful to find which are the biggest contributors, but it may not be possible to get much correlation to "wall clock" time.

As usual YMMV.
 
A quick stopwatch test indicates that the Total Frame MS time reported is total time taken, in ms, for each turn and quite accurate. Everything in the profile log is reported as falling inside of CvGame::update, which only takes some fraction of the total reported time ... looks like CvGame::update averages 60 - 90% of turn time roughly when the game is minimized (ie, graphics might be ignored).

I imagine the rest of the total time is spend on things outside the DLL, graphics and other engine pieces.
 
Cool, so basically if you subtract CvGame::update from Total Frame MS, you get a measure of the EXE/Graphics time which you can do little about (well, you can do a lot about the graphics part, but we can't measure that section precisely). So if you want to know if it is worthwhile to sit down and optimize your mod, you can compare those two numbers to know what net gain is possible (if Total Frame MS is MUCH greater than CvGame::update, then you need to talk with the graphic boys, but if CvGame::update is a significant portion, you need to talk with the codemonkeys)
 
I can't get this to work like it does for other people and I don't know why. Specifically, The profiler only seems to get the last call to CvGame::update in any turn, the one with CvGame::doTurn. I've been working round this by moving the start and stop profiling dll calls to CvGame::update and then adding up the thousands of results per turn. That worked fine but it appears to be unnecessary.

What exactly do you need to do to get the profiler to work properly?

This is what I've done:

1) Built a release or final debug version of the dll. Both seem to work the same but release gives more accurate timings.

2) Go into CivilizatioIV.ini and set CheatCode = chipotle and LoggingEnabled = 1.

3) In the game, press Ctrl-D, go to the profile tab and set something(s). I can't find this:
(CTRL+D) and click on the "enable profiler" checkbox as well to trigger things off.

Even with other people's code, the above will only profile the last call to CvGame::update. Yet stmartin's log shows he didn't have that problem. What am I doing wrong?

Talking of ways to speed things up, I re-implemented the A* pathfinding. That was the only thing I've found that made a significant difference and only sped up the late game on huge maps by 15s a turn out of 1 min 30s.
 
Back
Top Bottom