Let me start out by saying that I’m very impressed by your efforts and skill.
... before diving straight into the nerdy stuff:
What are the average- and worst-case sizes of your open list?
I’m asking because the K-Mod Pathfinder (refactored and improved by
@f1rpo for the Advanced Civ Mod:
https://github.com/f1rpo/AdvCiv/blob/1.12/CvGameCoreDLL/KmodPathFinder.h
) stores its open list in a std::vector. That means that the two (or three depending on your viewpoint) critical open list operations have the complexity:
1) Finding the minimum element requires a linear scan (O(n)). Quite a bit worse than the heap but extremely cache friendly.
2) Appending a new element is a constant-time push_back (O(1)).
3) Removing an element is just swapping with and popping the last element (ok since the list is not ordered and we're already committed to a linear scan)
I'm curious what numbers you are seeing. Also note this comment:
However, it very likely that you are seeing much worse numbers considering your 1000x1000 map! Still it would be interesting to know the inflection point of a heap vs. a vector with respect to the open list.
See this as well
Finally, the linear find-min scan could itself be sped up by using branchless AVX2 code, effectively pushing the heap-vs-vector inflection point even higher. Have you (or anyone) actually measured where that sweet spot lies?
The nerdy stuff. Open set size goes up to around 2300 for 1040x640 on turn 520. I have tried different types of heaps. Currently using a quaternary heap in an array. Popping is the problem, so I think it's slow because it has to bubble down an element from the top to the bottom. And it does it for each active element in the SIMD vector.
I think my complicated SIMD hybrid heap thingy was a little faster still. But I don't like it because it's complicated. I think I need to update it too. And it really needs Clang's SIMD optimisation. So the fastest heap depends on which compiler you use.
Another heap was the Radix Heap. This was slightly faster than the typical binary heap, but requires monotonic pops, which doesn't quite happen because Civ4's path costs are weird.
The best heap is no heap at all. Which you can do if you only have a small number of distinct F costs at any one time. Then you can bucket them. This is trivial for a uniform cost grid (SEA UNITS). And this is what you can do if you redo the AI. You could respecify path costs to let you use bucketing (and then use an expensive pathfinder for accurate nearby pathing).
Updating the heuristic doesn't show up on the profiler. I also don't have a heuristic right now anyway, so I could just remove that code. A good heuristic would help, but I don't know of one. The vanilla heuristic underestimates too much to be useful, and the 16x landmark heuristic was very good, but needed to be kept up to date. I might try it out again though. Or at least, use the simple distance heuristic for sea units.
Scanning the open set with AVX512 is an idea. Scanning it for the smallest 16 elements would be slow, but I could scan it for the smallest element in each lane. It would be an approximation, which is fine as long you get the true minimum in one of the lanes. Alternatively, SIMD "partial sort".
Anyway.
Trying 5x Huge and
CvPlotGroup::recalculatePlots
has started to show up on the profiler, at about 55% CPU time on the main thread. Turn times are reaching ~20s at turn 536.
Maybe limit the game to a map size that won't exceed the limits?
Limit!? Maybe there could be a warning message.
Also, parallelism idea. In case you have a gigantic map with a hundred AI players, you'd want to parallelise
across players too. This may be possible by adding a vulkan command queue to your game update function. If you can specify every operation on game state in terms of tasks with read/write dependencies, you can then have some task graph lib maximise parallelism. The difficult part though will of course be unit updates, as units logically depend on the whole map. But what would possibly, be possible, is to specify unit update writes as the set of all possible moves, with plot danger radius, and when some unit (of any player) wants to do some pathfinding, it could
co_await
updated pathfinding plot info when a pending plot shows up at the top of the open set. This
may be insanely complex to implement.
*Using arrays instead of linked lists for plot group plot lists took 4s off turn time. And they're still slow.
*Uninitialised variable. I'm gonna fix the hundreds of clang warnings now...