have you taken a look with a profiler though? most of the time is actually spent in the pathfinder - if you could find a way to speed that up, that would be a real breakthrough. speed is quite good in CP right now, but with a faster pathfinder the AI could use more complex algorithms to generate its moves ...
Last time I looked at the mod with a profiler, I remember seeing that the majority of A* calls were about whether a unit could enter a tile (CvUnit::CanMoveOrAttackInto() and its cousins). Besides your neighbors caching thing and a myriad of minor optimizations I made via reordering certain calls, the three biggest A* performance optimizations I implemented were enabling proper caching, eliminating redundant calls for the CvUnit::CanMoveOrAttackInto() function family, and implementing an A* turn limiter. If they aren't in v9, you can find all three in the experimental branch, and I'm fairly certain those changes are stable enough for porting.
An odd thing I noticed about how the pathfinder uses instances of AStarNode is that, well... caching only exists on paper. AStarNode and its siblings have a boatload of members meant to cache the results of the more expensive, path-independent calls for each node, so that these calls are only made the first time the pathfinder hits a node, not every time it runs through that node. The pathfinder functions store these values alright, and they fetch them from the cache wherever they're needed... but every time the pathfinder enters the plot validation function, it recalculates the cached values, even if they were already calculated the last time the A* node was validated.
My solution: I added a new flag to the AStarNode classes that gets set when values are cached for the node and gets unset when the object is reset (i.e. when a new path is calculated with a different unit, from a different starting plot, or with reuse disabled). The pathfinder then checks this flag whenever it would want to (re)calculate the cached values, and simply skips the expensive function calls if the flag is set. I also had to make some other modifications to the pathfinder functions to cover my bases, e.g. the path's starting plot needs to be cached before the function returns true.
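To make the idea concrete, here's a rough standalone sketch of the "is cached" flag. None of these names come from the actual DLL: Node, NodeCache, EnsureCached and the two Expensive* functions are hypothetical stand-ins for AStarNode, its cache members and the costly CvUnit/CvPlot queries.

```cpp
#include <cassert>

// Hypothetical per-node cache; the real AStarNode cache members differ.
struct NodeCache {
    bool bIsCalculated = false;   // the new flag: set once the values below are filled in
    bool bCanEnterTerrain = false;
    int  iMoveCost = 0;
};

struct Node {
    int x = 0;
    int y = 0;
    NodeCache cache;

    // Called when a new path starts with a different unit, a different
    // starting plot, or with reuse disabled: clears the flag too.
    void Reset() { cache = NodeCache(); }
};

static int g_expensiveCalls = 0;  // counts how often the slow queries run

// Stand-ins for expensive, path-independent checks.
bool ExpensiveCanEnter(int x, int y) { ++g_expensiveCalls; return (x + y) % 7 != 0; }
int  ExpensiveMoveCost(int x, int y) { ++g_expensiveCalls; return 1 + (x * 31 + y) % 3; }

// Fill the cache only if the flag says it hasn't been filled yet.
void EnsureCached(Node& node) {
    if (node.cache.bIsCalculated)
        return;  // already cached for this pathfinding run, skip the slow calls
    node.cache.bCanEnterTerrain = ExpensiveCanEnter(node.x, node.y);
    node.cache.iMoveCost = ExpensiveMoveCost(node.x, node.y);
    node.cache.bIsCalculated = true;
}

// Plot validation reads from the cache instead of recalculating every time.
bool ValidatePlot(Node& node) {
    EnsureCached(node);
    assert(node.cache.bIsCalculated);
    return node.cache.bCanEnterTerrain;
}
```

Validating the same node three times in a row only hits the two slow functions once each; after Reset() they run again on the next validation.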
The CvUnit::CanMoveOrAttackInto() function and its siblings make a LOT of redundant calls, i.e. they check the same values two or three times over the course of a single execution. Worse still, the pathfinder functions that call them can do so multiple times: IIRC, there are cases where a pathfinder will end up calling CvUnit::canEnterTerrain() six times for a single plot. Since these functions are fairly expensive, cutting down on pointless calls speeds things up considerably.
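As an illustration of the kind of redundancy I mean, here's a toy version (not the real CvUnit code; PlotInfo, UnitInfo and every function below are made up for the example). The "naive" family re-runs the terrain check in every helper, while the "hoisted" version runs it once and passes the result down.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for CvPlot and CvUnit.
struct PlotInfo { bool isWater; bool hasEnemyUnit; };
struct UnitInfo { bool isNaval; bool canAttack; };

static int g_terrainChecks = 0;  // counts calls to the "expensive" check

// Stand-in for an expensive terrain query.
bool CanEnterTerrain(const PlotInfo& p, const UnitInfo& u) {
    ++g_terrainChecks;
    return p.isWater == u.isNaval;
}

// Naive family: every function repeats the terrain check on its own.
bool CanMoveIntoNaive(const PlotInfo& p, const UnitInfo& u) {
    return CanEnterTerrain(p, u) && !p.hasEnemyUnit;
}
bool CanAttackIntoNaive(const PlotInfo& p, const UnitInfo& u) {
    return CanEnterTerrain(p, u) && p.hasEnemyUnit && u.canAttack;
}
bool CanMoveOrAttackIntoNaive(const PlotInfo& p, const UnitInfo& u) {
    if (!CanEnterTerrain(p, u))
        return false;
    return CanMoveIntoNaive(p, u) || CanAttackIntoNaive(p, u);
}

// Hoisted family: the terrain result is computed once and handed down.
bool CanMoveIntoGiven(bool terrainOk, const PlotInfo& p) {
    return terrainOk && !p.hasEnemyUnit;
}
bool CanAttackIntoGiven(bool terrainOk, const PlotInfo& p, const UnitInfo& u) {
    return terrainOk && p.hasEnemyUnit && u.canAttack;
}
bool CanMoveOrAttackIntoHoisted(const PlotInfo& p, const UnitInfo& u) {
    const bool terrainOk = CanEnterTerrain(p, u);  // the only terrain check
    return CanMoveIntoGiven(terrainOk, p) || CanAttackIntoGiven(terrainOk, p, u);
}

// Debug check: the refactor must not change the answer.
void AssertSameAnswer(const PlotInfo& p, const UnitInfo& u) {
    assert(CanMoveOrAttackIntoNaive(p, u) == CanMoveOrAttackIntoHoisted(p, u));
}
```

For a land plot holding an enemy unit and a land unit that can attack, the naive version runs the terrain check three times and the hoisted one runs it once, with the same result.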
The A* turn limiter is my dirty way of making sure units that are 20 turns away from a target don't waste time checking whether they can reach it within a turn. It's optional without my A* road fix, since you can easily rely on a heuristic; but because my A* road fix can potentially increase the heuristic's range by 3x, the turn limiter becomes almost mandatory to cut down on processing time. It works in a fairly simple fashion: as the pathfinder constructs possible paths to the target, it will not accept tiles whose turn count exceeds whatever value the turn limiter is set to. If a unit is just outside of the turn limit, the pathfinder does run slower, since instead of completing the path, exiting the pathfinder, and returning false, it will branch out along all its other paths until it runs out of valid tiles before returning false. In all other cases though, things do speed up: a unit 3 turns away no longer has to construct a 3-turn-long path just to check whether the target is 1 turn away.
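Here's a minimal sketch of the turn limiter idea on a toy grid, just to show where the check sits. This is not the DLL's AStar class: FindPath, SearchResult, the open 4-connected grid, the flat cost of 1 move per step and the Manhattan heuristic are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct SearchResult { bool found; int turns; int nodesExpanded; };

// A* on an open grid where every step costs 1 move and the unit gets
// movesPerTurn moves per turn. maxTurns <= 0 means "no turn limit".
SearchResult FindPath(int width, int height, int sx, int sy, int tx, int ty,
                      int movesPerTurn, int maxTurns) {
    assert(movesPerTurn > 0);
    typedef std::pair<int, int> Entry;          // (f = g + h, node index)
    std::vector<int> best(width * height, -1);  // best known g per node, -1 = unseen
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > open;
    auto h = [tx, ty](int x, int y) { return std::abs(x - tx) + std::abs(y - ty); };

    SearchResult result = {false, 0, 0};
    best[sy * width + sx] = 0;
    open.push(Entry(h(sx, sy), sy * width + sx));
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    while (!open.empty()) {
        const Entry e = open.top();
        open.pop();
        const int x = e.second % width;
        const int y = e.second / width;
        const int g = best[e.second];
        if (e.first - h(x, y) != g)
            continue;  // stale queue entry, a cheaper route was found later
        ++result.nodesExpanded;
        if (x == tx && y == ty) {
            result.found = true;
            result.turns = (g + movesPerTurn - 1) / movesPerTurn;
            return result;
        }
        for (int d = 0; d < 4; ++d) {
            const int nx = x + dx[d];
            const int ny = y + dy[d];
            if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                continue;
            const int ng = g + 1;
            const int turns = (ng + movesPerTurn - 1) / movesPerTurn;
            if (maxTurns > 0 && turns > maxTurns)
                continue;  // turn limiter: reject tiles beyond the allowed turn count
            const int n = ny * width + nx;
            if (best[n] != -1 && best[n] <= ng)
                continue;
            best[n] = ng;
            open.push(Entry(ng + h(nx, ny), n));
        }
    }
    return result;  // open list ran dry: target not reachable within the limit
}
```

With a target 20 tiles away and 2 moves per turn, a call with maxTurns = 1 gives up after expanding only the handful of tiles reachable in one turn, while the unlimited call has to walk the whole 10-turn path before it can answer.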
Word of warning about readability: I reorganized the AStar class so that it uses FastDelegates instead of function pointers, because that way I can reference members of the particular A* object without having to constantly pass an AStar pointer through arguments. If you're keeping AStar's function pointer setup, make sure you add a pFinder-> in front of any calls to internal AStar functions or members.
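To show the difference between the two setups, here's a small sketch. I'm using std::function as a stand-in for FastDelegate (a different library with a different API), and the Finder class, FreeCost and the cost formula are invented for the example.

```cpp
#include <cassert>
#include <functional>

class Finder;

// Function-pointer style: the callback is a free function, so the finder
// has to be passed in explicitly and dereferenced with pFinder->.
typedef int (*CostFuncPtr)(int iFrom, int iTo, const Finder* pFinder);

class Finder {
public:
    explicit Finder(int iBaseCost) : m_iBaseCost(iBaseCost), m_pfnCost(nullptr) {}

    int GetBaseCost() const { return m_iBaseCost; }

    // Member version: can touch m_iBaseCost directly, no pFinder argument.
    int MemberCost(int iFrom, int iTo) const { return m_iBaseCost + (iTo - iFrom); }

    void SetCostFuncPtr(CostFuncPtr pfn) { m_pfnCost = pfn; }

    // Delegate style: bind the member function to this particular object.
    // The lambda captures this, so the Finder must not be copied or moved
    // after binding.
    void BindMemberCost() {
        m_costDelegate = [this](int iFrom, int iTo) { return MemberCost(iFrom, iTo); };
    }

    int CostViaPointer(int iFrom, int iTo) const {
        assert(m_pfnCost != nullptr);
        return m_pfnCost(iFrom, iTo, this);  // finder passed along on every call
    }
    int CostViaDelegate(int iFrom, int iTo) const {
        assert(static_cast<bool>(m_costDelegate));
        return m_costDelegate(iFrom, iTo);   // object is already bound
    }

private:
    int m_iBaseCost;
    CostFuncPtr m_pfnCost;
    std::function<int(int, int)> m_costDelegate;
};

// Free-function callback: every access to finder state needs pFinder->.
int FreeCost(int iFrom, int iTo, const Finder* pFinder) {
    return pFinder->GetBaseCost() + (iTo - iFrom);
}
```

Both routes give the same cost; the only difference is whether the callback body has to go through pFinder-> or can use the object's members directly.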
I also messed about with trying to get OpenMP to work with the pathfinder, mainly by seeing if I could somehow parallelize the child validator, but I was not successful: even when the game did not freeze up on me, it refused to validate any tiles outside of ones lying straight NE of the starting plot. It might be worth a look though.
come to think of it, the other big problem is that multiplayer mode doesn't work (at least in CP, but it wasn't much better in vanilla i've heard, so i assume AUI has the same issue). there are desyncs all the time. those really have gazebo and me stumped, so if you want to take a look into it you're more than welcome
I haven't really been getting reports about desyncs, but I haven't heard much about how well the mod works in multiplayer either: I'll get a comment or two on the workshop page every month that alludes to the person either using the mod in multiplayer or planning on using it, then never hear from them again. From what I can tell, a lot of the time those re/desyncs are caused by Lua interactions, though I can't really explain why; v9 doesn't use any new Lua code, which might explain why the mod doesn't cause desyncs (if it works in multiplayer), and I'm hoping the 4 new event systems I added in v10 don't cause sync issues, either (they're for dynamically changing the flavor of a building, unit, project, or process via Lua code, e.g. Petra getting Growth and Production flavor for every desert tile within city radius of the city in question).
If the desync is caused by the RNG though, you could fiddle with CvDllGameContext::RandomNumberGeneratorSyncCheck() (if you haven't already tried) to have it pause the DLL before the resync happens so that you can have a good look at the game state (or, if you want to analyze other people's resyncs, have the DLL create a minidump right before the resync).