Mini-engine progress

<Nexus> · Saturday at 11:53 AM

So what is ID allocation limit? Can it be increased?

snowern · Sunday at 5:00 PM

<Nexus> said:
So what is ID allocation limit? Can it be increased?

8K.

Code:

#define FLTA_ID_SHIFT                (13)
#define FLTA_MAX_BUCKETS        (1 << FLTA_ID_SHIFT)

I could either change it to 16 bits for both the index and ID (it's actually the index that's limited), or go 64-bit. 64-bit is your future proofing, but would require extra changes all around the DLL.

Don't need to do it yet. But once the AI starts defending thousands of cities, I might have to. If I get that far.

But optimisation is getting increasingly difficult. Got pirating under control, but it's still 16s for turn 522.

Pirating is really the worst case. Bunch of units pathing all around the map. Spy move is almost entirely mission execution. You'd need faster pathing, or path caching, or a better parallel unit update to speed that up.

Irrigation update and AI_updateRouteToCity are next, but getting mid-game turn times under control is not going to be possible for 10x Huge. The AI stuff will just need a whole rework. Or I extend the parallel unit update implementation to cover most AI, but... ehh... Maybe I should just find the biggest playable map size that doesn't require herculean parallel AI efforts.

<Nexus> · Sunday at 11:49 PM

snowern said:
8K.

Wow! That's a lot. I guess we run out of city names much earlier.
And what's the limit on units? If it's also 8K than...

snowern said:
Maybe I should just find the biggest playable map size that doesn't require herculean parallel AI efforts.

Just wanted to ask about that.
Maybe limit the game to a map size that won't exceed the limits?

devolution · 2025-07-07T01:56:40-0400

Let me start out by saying that I’m very impressed by your efforts and skill.

... before diving straight into the nerdy stuff:

What are the average- and worst-case sizes of your open list?

I’m asking because the K-Mod Pathfinder (refactored and improved by @f1rpo for the Advanced Civ Mod: https://github.com/f1rpo/AdvCiv/blob/1.12/CvGameCoreDLL/KmodPathFinder.h
) stores its open list in a std::vector. That means that the two (or three depending on your viewpoint) critical open list operations have the complexity:

1) Finding the minimum element requires a linear scan (O(n)). Quite a bit worse than the heap but extremely cache friendly.
2) Appending a new element is a constant-time push_back (O(1)).
3) Removing an element is just swapping with and popping the last element (ok since the list is not ordered and we're already committed to a linear scan)

I'm curious what numbers you are seeing. Also note this comment:

/* advc.opt: In a test over a few turns on a Giant map with 36 civs,
the mean list size at the start of processNode was 80, and the
maximal list size 333. In a longer test on a Huge map with 18 civs,
reserving memory for 128 nodes was faster than for either 32, 200 or 256
(power of 2 does seem to help). K-Mod didn't reserve any memory. */

However, it very likely that you are seeing much worse numbers considering your 1000x1000 map! Still it would be interesting to know the inflection point of a heap vs. a vector with respect to the open list.

See this as well

/* ^Looks like K-Mod has already tried a heap (std: priority_queue). Or maybe
it was abandoned w/o a test. Would have to re-heap after recalculateHeuristics.
And the functor should take m_stepMetric.getMaxPath() as a contructor argument
so that nodes within that limit can receive absolute priority. But, given the
modest number of open nodes, I doubt that a heap will be worthwhile. */

Finally, the linear find-min scan could itself be sped up by using branchless AVX2 code, effectively pushing the heap-vs-vector inflection point even higher. Have you (or anyone) actually measured where that sweet spot lies?

snowern · 2025-07-07T13:08:28-0400

devolution said:
Let me start out by saying that I’m very impressed by your efforts and skill.

... before diving straight into the nerdy stuff:

What are the average- and worst-case sizes of your open list?

I’m asking because the K-Mod Pathfinder (refactored and improved by @f1rpo for the Advanced Civ Mod: https://github.com/f1rpo/AdvCiv/blob/1.12/CvGameCoreDLL/KmodPathFinder.h
) stores its open list in a std::vector. That means that the two (or three depending on your viewpoint) critical open list operations have the complexity:

1) Finding the minimum element requires a linear scan (O(n)). Quite a bit worse than the heap but extremely cache friendly.
2) Appending a new element is a constant-time push_back (O(1)).
3) Removing an element is just swapping with and popping the last element (ok since the list is not ordered and we're already committed to a linear scan)

I'm curious what numbers you are seeing. Also note this comment:

However, it very likely that you are seeing much worse numbers considering your 1000x1000 map! Still it would be interesting to know the inflection point of a heap vs. a vector with respect to the open list.

See this as well

Finally, the linear find-min scan could itself be sped up by using branchless AVX2 code, effectively pushing the heap-vs-vector inflection point even higher. Have you (or anyone) actually measured where that sweet spot lies?

The nerdy stuff. Open set size goes up to around 2300 for 1040x640 on turn 520. I have tried different types of heaps. Currently using a quaternary heap in an array. Popping is the problem, so I think it's slow because it has to bubble down an element from the top to the bottom. And it does it for each active element in the SIMD vector.

I think my complicated SIMD hybrid heap thingy was a little faster still. But I don't like it because it's complicated. I think I need to update it too. And it really needs Clang's SIMD optimisation. So the fastest heap depends on which compiler you use.

Another heap was the Radix Heap. This was slightly faster than the typical binary heap, but requires monotonic pops, which doesn't quite happen because Civ4's path costs are weird.

The best heap is no heap at all. Which you can do if you only have a small number of distinct F costs at any one time. Then you can bucket them. This is trivial for a uniform cost grid (SEA UNITS). And this is what you can do if you redo the AI. You could respecify path costs to let you use bucketing (and then use an expensive pathfinder for accurate nearby pathing).

Updating the heuristic doesn't show up on the profiler. I also don't have a heuristic right now anyway, so I could just remove that code. A good heuristic would help, but I don't know of one. The vanilla heuristic underestimates too much to be useful, and the 16x landmark heuristic was very good, but needed to be kept up to date. I might try it out again though. Or at least, use the simple distance heuristic for sea units.

Scanning the open set with AVX512 is an idea. Scanning it for the smallest 16 elements would be slow, but I could scan it for the smallest element in each lane. It would be an approximation, which is fine as long you get the true minimum in one of the lanes. Alternatively, SIMD "partial sort".

Anyway.

Trying 5x Huge and CvPlotGroup::recalculatePlots has started to show up on the profiler, at about 55% CPU time on the main thread. Turn times are reaching ~20s at turn 536.

<Nexus> said:
Maybe limit the game to a map size that won't exceed the limits?

Limit!? Maybe there could be a warning message.

Also, parallelism idea. In case you have a gigantic map with a hundred AI players, you'd want to parallelise across players too. This may be possible by adding a vulkan command queue to your game update function. If you can specify every operation on game state in terms of tasks with read/write dependencies, you can then have some task graph lib maximise parallelism. The difficult part though will of course be unit updates, as units logically depend on the whole map. But what would possibly, be possible, is to specify unit update writes as the set of all possible moves, with plot danger radius, and when some unit (of any player) wants to do some pathfinding, it could co_await updated pathfinding plot info when a pending plot shows up at the top of the open set. This may be insanely complex to implement.

*Using arrays instead of linked lists for plot group plot lists took 4s off turn time. And they're still slow.
*Uninitialised variable. I'm gonna fix the hundreds of clang warnings now...

Mini-engine progress

<Nexus>

Traveler of the Multiverse

snowern

Warlord

<Nexus>

Traveler of the Multiverse

devolution

Prince

snowern

Warlord

Similar threads