Wasn't sure if that was even possible but it sounds like from your comment that it is(?) "You'd then update the heuristic between turns during the human's turn."
To avoid further suboptimality, you could defer only the updates that were not needed by the AI. This kind of thing probably isn't worth it though. Extra complexity, small gain.
Currently, I'm thinking a blend of landmarks/exact heuristics. If you have enough landmarks, then it's basically almost exact. It seems, by the demos, one landmark
near (or behind) the start of a path is perfectly fine. So you could have at most one landmark for every X plots, every couple of cities. And because the landmark is near the path, updating it can't be much slower in the worst case than normal A* anyway.
And if the heuristic is near exact, then fingers crossed, that alone should make pathing much faster, before even considering hierarchical or blockwise A*. Instead of visiting, like, 50% of land area, it'll be more like 5%. And the great thing about a heuristic is that it doesn't have to care about
pathValid
.
I'm hoping the final pathing system will be fast enough to not need path caching. With hierarchical or blockwise A*, maybe one side of the map to the other will take less than 1ms. Suboptimal either way, though. But blockwise A* could be made (possibly) optimal if you take into account remaining movement points upon entering a block, but that means more computation and memory usage.
There is another optimisation. You could do speculative pathing, a "dry run" of all units of one player to cache all expected paths, by imitating the vanilla unit update. And because it's just a dry run, all pathing is done with the initial game state, which means it can be done in parallel if the pathing system can handle it. Then afterwards in the serial unit update, it would hopefully use those cached paths. Pretty close to the magical parallel unit update, but requires path caching, and probably won't be all that useful if pathing is fast.
*Okay, results in (from the test program): Block A*
is fast.
Code:
Basic AStar: 38.4655ms (127033 expanded, 302.799ns ns/expand, 981533 edges)
34620
No SIMD:
Block AStar: 14.9763ms (805 blocks visited, 18604.1ns/block, 82.6849ns/plot)
34620
AVX-512:
Block AStar: 0.8841ms (805 blocks visited, 1098.26ns/block, 4.88116ns/plot)
34620
And it was faster without SIMD, which is surprising. And with 70% of land area visited, just the basic heuristic.
16x16 blocks is optimal. Perfect for AVX-512 and only a little slower with AVX2. AVX-512 32x32 is slower than AVX2 16x16.
Surprising bug in the paper, afaik: You must push the block onto the heap unconditionally if any one of its sides has a g-cost reduced. I assume this is because the f-cost you get from the updated edge might be greater than the f-cost after the whole block is updated.
Promising result, but this is with a purely additive cost function. No movement points, just plot movement costs. Taking into account movement points would require an LDDB lookup per boundary plot, and the LDDB would no longer be a simple 2D array.
*Maybe this would be better as the heuristic, if necessary. The LDDB might get out of control if used for pathfinding. 23MB per player, per path flags, per unit type... Deduplication and caching may get complex.
*Landmarks aren't as effective as I'd hoped. They are much better if you assume symmetric costs, but that may cause slight suboptimality. With 4 landmarks and symmetric costs, plotwise A* is ~10ms. Performance doesn't really improve with more landmarks.
*8 landmarks with non-symmetric costs is about the same actually.
*Going nuts with 16 landmarks and AVX-512 to evaluate gets it down to 6.3ms. But that's a lot of landmarks to keep up to date.
*<3ms with AVX-512 plotwise A* and basic heuristic, ignoring movement points:
SimdAStar: 2.3767ms (125122 visited, 18.9951ns/visited, 2.37438ns/edge)
*Landmark heuristic may not have any performance problems. If you build a road or chop a forest, you may be able to simply compute max path cost reduction and add that to a global heuristic bias. This should allow you to use an arbitrarily stale heuristic and update it whenever you feel like.
*~1.5ms for AVX-512 plotwise A* and 16x landmark heuristic. ~2.6ms for a longer path from one side to the other. Heuristic evaluation is expensive as it's a 16x gather basically. But using a per-block heuristic selection lookup isn't much of an improvement.
*Also, the big downside of using block A*, LDDB updates, may also be ignored. Could defer and use plotwise A* instead.
*AVX-512 plotwise A* on WSL is much faster at ~0.5ms. Slightly faster with
-march=alderlake
. MSVC codegen isn't all that great. Longer path is 1.3ms. Probably about time to start integrating this new path finder...