Mini-engine progress

olduser · Nov 13, 2024

snowern said:
Many games have got reimplementations. But Civ4 will be different from most as you have the DLL source available.

Other companies don't even like having their games reverse engineered but maybe Firaxis doesn't even care about having their source code used in a project like this. Requiring players having the original assets because they own the game would certainly help to avoid trouble. Once this becomes a standalone game things could get heated.

snowern · Nov 13, 2024

Ordinarily, original assets are required, but because this is a console implementation, art assets are not needed at all (except optional sounds). You could in theory provide your own XML info files and scripts.

A reason to rush towards a graphical implementation I guess.

snowern · Nov 26, 2024

Turn times are now 2-3s for my three ultra huge "early"-game test cases (FASTER!)

But, I've reached the bedrock of optimisation where AI_foundValue is taking up 43% of time on the main thread for save Giant Pangea begin T209.

A lot of little things are threaded now. The two bumps are normal players calculating found values, and then the barbarians. Or the other way around.

Unfortunately, I don't think there's anything I can do about that. AI_foundValue is a big complex function that can't be incrementally updated easily. At least barbarians don't found cities all the time.

But other things can be incrementally updated. Land targets, stolen city plots. Those have disappeared from the profiler.

Here's the flame graph for Giant Pangea begin T200, which only takes 1.8s:

Which is odd, as it's the save that has 3K barb animals. But you can see AI_animalMove.

Exploration is another big improvement. Instead of looking at every plot, AI_explore and AI_exploreRange will perform a localised search that should stay reasonably close to original mechanics. AI_explore is an interesting case, as to avoid searching the whole map, you can exploit the behaviour of its value function to know when to stop a local search (in most cases).

A specific case is Charlemagne's 627ms work boat that could not explore anywhere because it was stuck between barb galleys. But it would still pathfind all over the map, eating precious CPU cycles in AI_plotDanger, defeating my plot danger early-out code because there's 800 barb galleys.

Still haven't finished an actual Huge+++ map yet. Really should get around to that, to find the rest of the slow paths, check things work.

AOS9001 · Nov 27, 2024

Not only is this a cool project with huge potential for the future, it's also a fascinating look into what's going on deep under the hood of the game.

olduser · Nov 27, 2024

Are you using the vanilla BTS source code or a modded version?

Edit:
Have you tried tried using Clang as Compiler?

valergrad · Nov 27, 2024

Are you telling that 43% of time game is spending on AI trying to calculate the value of founding a new city? That's ridiculous)
May be this function is too complicated to optimize itself, but we can optimize how often it is called? How many times this function has been called at this save? Looking at signature - it is called for each plot. But I assume it is called only for plots that are achievable by that AI to settle, only when it has settlers, and only once for each plot for each AI. If any of this is not true somehow ( i have seen coding like this!) - it could be a huge improvement)

snowern · Nov 27, 2024

alberts8 said:
Are you using the vanilla BTS source code or a modded version?

Edit:
Have you tried tried using Clang as Compiler?

It is the original DLL code, with modifications to make it work for a modern compiler and the new engine. The mechanic-changing performance modifications are enabled by ENABLE_FASTER_ULTRA_HUGE_MAPS.

Compiling on Linux/WSL is something I hope to do. Code will need to be made portable, and I will have to figure out CMake. It would be a kind of fun. Should be faster on native Linux. On Windows, a lot of time is spent just printing to the console.

valergrad said:
May be this function is too complicated to optimize itself, but we can optimize how often it is called? How many times this function has been called at this save?

Yes, from a second look, it doesn't seem all that bad. It could probably be called less, as recomputation appears to be unconditional right now. And most of the function could be locally updated. But it has global dependencies, like the number of cities, and the nearest city.

snowern · Dec 11, 2024

The pathing was fast because it was wrong!

Good thing I had this verification code lying around to find out. I just didn't use it because it was slow.

:cry:

The caching system was wrongly caching some pathing state objects that depended on destination coordinate, which is a minor problem (I can probably just ignore that for long range pathing). And some pathing state objects did not invalidate when "max turns" changed, which had the effect of aborting long range pathing queries early because max turns increased.

Now, I have pathing state cache verification, which is much more rigorous, and pathing state reuse no longer depends on max turns.

Also, the coarse heuristic is inconsistent, which makes A* inefficient.

Back to the drawing board! I still have a few tricks up my sleeve...

I wonder if the Factorio heuristic is consistent. It is likely admissible, but consistency is important too. Even if my heuristic is admissible, it may be picking different estimations, causing the heuristic to jump about in an inconsistent way.

Below looks like a 30% visitation, but the logs say 51% of land was visited... the varying opacity of the overlay indicates an inconsistent heuristic.

2024y11m27d - Inconcistent heuristic.png

onkelm · Dec 21, 2024

Hi,
any pre Christmas news ?

snowern · Dec 21, 2024

onkelm said:
Hi,
any pre Christmas news ?

Oh, I'm just stuck in pathfinding hell at the moment.

How do you make pathfinding fast? There are many choices.

Jump Point Search - A well-known method, but it's for uniform grids.
Weighted Jump Point Search - This could work, but is greatly complex and might not be all that fast because of how noisy and scattered features on the map tend to be.
Rectangle Symmetry Reduction - Simply partition the map into uniform cost rectangles and reduce the branch factor with symmetry reduction. This might get you a speed up, but might not really be enough. Plus, it might get worse in the late game as the AI leaves holes in its road networks.
8-DOP/General Convex Symmetry Reduction - I could extend the idea to increase the effectiveness of the method, but too complex for the improvement I think you'd get.
Exact Heuristic - How this would work is you divide the map into blocks, and then compute the shortest path between every pair of blocks. This would be great, except that precomputation and updates would be slow. It might actually be good enough in practice, though, if you pick a large block size.
Bidirectional Search - This seems like a reasonable idea in the face of dijkstra-like search. You will get some very very slight suboptimality though.
The Factorio heuristic - I've found some info in a forum post and it's simply (number of chunks to goal) * (chunk size) + (shortest possible distance to the next chunk). It's probably neither admissible nor consistent, but must be good enough for them. I might not use it for the heuristic, but I could use it to guess the shape of the path...
Landmarks/Differential Heuristic - Near-exact. You just place a couple of landmarks around the edge of an area, coastal plots, and do dijkstra for the whole area. Then your heuristic is a couple of table lookups.
But, because the heuristic covers the whole map, any route or feature change will invalidate the heuristic. Lifelong Planning A* could be used to minimise the amount of updates. The worst case is the AI building a road near a landmark, which could have cascading effects. Another improvement is to cheat: allow the use of a stale heuristic if the expected "path shape" does not go near changes. This may create slightly sub-optimal paths.
Just make it faster - Some comment somewhere said that you can do 512x512 dijkstra in less than 1ms. Of course I was interested. It turns out that with great optimisation, I couldn't do it faster than 23ms (~100ns/node). But if you do it on a uniform grid, it turns into BFS and you can do it in 1.6ms (~6ns/node). Maybe that's what happened. It wasn't dijkstra at all.
Hierarchical A* - The tried and true method. What you can do is exact pathfinding up to X blocks, and then switch to hierarchical. This method would be very fast, but clearly suboptimal.
Optimal block-wise pathing - This is what I'm going to do next. You can simply divide up the map into blocks, compute boundary all-pairs shortest paths, and then pathfind on the boundaries only. Skipping over the interior of blocks, you will obviously vastly reduce the number of nodes in the search and still be optimal.
But the big downside is that you have massively increased the branch factor. A 16x16 block covers 256 plots and has 60 plots on the boundary. Each boundary plot connects to the other 59 boundary plots, so the number of edges coming out of one side of a block is almost a thousand. Not great! But... it might be possible to use AVX to accelerate this.
The second "disadvantage" is that these blocks will have to encode full validity info, which means I need to fully cache pathValid, which is a complex function. But, I should probably do that anyway, as it will give a decent speed up for even basic A*.
The third disadvantage is memory usage. All-pairs paths is a lot. 3600 16-bit weights per 16x16 block, 7.2KB per block, ~16MB for the whole map. And it's per player, per path flags, per unit movement spec, per unit traits. It's gonna add up. This could be improved by deduplicating blocks.
*Might be slightly suboptimal as the full pathCost function won't fit into 16 bits. Will have to divide the all-pairs costs by 1000, which mostly divides away the cost from defence bonus.

Also, I considered using C++ modules. You're lucky that didn't work out well. Maybe next time!

*I just found a magical paper: https://ojs.aaai.org/index.php/AAAI/article/view/7813. 1: Why didn't I think of that. 2: I hope it's not too good to be true.
*Too good to be true. It has the same combinatorial problem as above, so I don't know how they get their timings. But the structure of the algorithm is neat.

Zarniw00p · Dec 22, 2024

Just a thought (complete non-expert here). With the "Exact Heuristic", you say "precomputation and updates would be slow". How about:

1. On game start (4000BC) it precomputes only the blocks that players/AI can reach in a couple of turns or so.
2. Then during idle time (game is waiting while human thinks), it could begin precomputing the next set of blocks, radially.
3. All computed paths get added to a lookup table.
4. As the civs expand, more blocks are computed while waiting for player input (which could be quite a lot of time).
5. When roads are added to a tile, the relevant blocks are recomputed and the lookup table updated.

Then most of the hard lifting is done while the human is thinking, so they're not waiting ages between turns. They might notice the fan spin though!

(I'm sure there are plenty reasons this wouldn't work but wanted to type it out as I thought of it!)

Paul

snowern · Dec 22, 2024

Zarniw00p said:
Just a thought (complete non-expert here). With the "Exact Heuristic", you say "precomputation and updates would be slow". How about:

1. On game start (4000BC) it precomputes only the blocks that players/AI can reach in a couple of turns or so.
2. Then during idle time (game is waiting while human thinks), it could begin precomputing the next set of blocks, radially.
3. All computed paths get added to a lookup table.
4. As the civs expand, more blocks are computed while waiting for player input (which could be quite a lot of time).
5. When roads are added to a tile, the relevant blocks are recomputed and the lookup table updated.

Then most of the hard lifting is done while the human is thinking, so they're not waiting ages between turns. They might notice the fan spin though!

(I'm sure there are plenty reasons this wouldn't work but wanted to type it out as I thought of it!)

Paul

If the map didn't change, computing the exact heuristic for all-block-pairs would be easy at startup. And you can thread it. And you can let the computation run until it fills the map, giving you all heuristic values from one block in one go.

But the updates, those are the problem. You can't just defer them. When a unit wants a path, it needs updated values if you want optimality. And because this kind of heuristic is global, it's invalidated by distant changes.

But, it could be optimised. A path only needs the heuristic values coming from the goal, so one A* search is enough to update the heuristic for one path.

I said it might be okay in practice. But it's probably more computation than necessary. The Landmarks heuristic should be more performant overall as map updates will invalidate less stuff, and landmarks should be plenty accurate.

But if you allow sub-optimality, then you can arbitrarily defer updates. You could maybe do something really efficient like a one-turn lag for all heuristic values. You'd then update the heuristic between turns during the human's turn.

Fast pathing isn't all that difficult, ultimately. I could just slap Hierarchical A* on it and call it a day. I might have to, but I will resist being defeated by this foreboding comment!

Sephi said:
Sounds about right. Games with bigger maps these days usually use algorithms that are much faster than A* (but non optimal)

<Nexus> · Dec 22, 2024

snowern said:
Fast pathing isn't all that difficult, ultimately. I could just slap Hierarchical A* on it and call it a day. I might have to, but I will resist being defeated by this foreboding comment!

So it isn't so much an issue of coding but more of an issue of ego?

Zarniw00p · Dec 30, 2024

snowern said:
If the map didn't change, computing the exact heuristic for all-block-pairs would be easy at startup. And you can thread it. And you can let the computation run until it fills the map, giving you all heuristic values from one block in one go.

But the updates, those are the problem. You can't just defer them. When a unit wants a path, it needs updated values if you want optimality. And because this kind of heuristic is global, it's invalidated by distant changes.

But, it could be optimised. A path only needs the heuristic values coming from the goal, so one A* search is enough to update the heuristic for one path.

I said it might be okay in practice. But it's probably more computation than necessary. The Landmarks heuristic should be more performant overall as map updates will invalidate less stuff, and landmarks should be plenty accurate.

But if you allow sub-optimality, then you can arbitrarily defer updates. You could maybe do something really efficient like a one-turn lag for all heuristic values. You'd then update the heuristic between turns during the human's turn.

Fast pathing isn't all that difficult, ultimately. I could just slap Hierarchical A* on it and call it a day. I might have to, but I will resist being defeated by this foreboding comment!

Thanks for replying; interesting to see your thoughts

One of my ideas (buried in my list of steps) was the use of idle time (human player looking around and planning) to build/update the values, rather than having it take extra time while the AI turns are calculated. So the human is not sitting waiting for the work to get done, as it's done while the game is waiting for the human.

Wasn't sure if that was even possible but it sounds like from your comment that it is(?) "You'd then update the heuristic between turns during the human's turn."

Very interested in how all this turns out! No experience in pathfinding myself yet, but will encounter the need in a little side-project I'm doing (far simpler than this). Gonna review those Stanford resources you linked to.

Paul

snowern · Dec 30, 2024

Zarniw00p said:
Wasn't sure if that was even possible but it sounds like from your comment that it is(?) "You'd then update the heuristic between turns during the human's turn."

To avoid further suboptimality, you could defer only the updates that were not needed by the AI. This kind of thing probably isn't worth it though. Extra complexity, small gain.

Currently, I'm thinking a blend of landmarks/exact heuristics. If you have enough landmarks, then it's basically almost exact. It seems, by the demos, one landmark near (or behind) the start of a path is perfectly fine. So you could have at most one landmark for every X plots, every couple of cities. And because the landmark is near the path, updating it can't be much slower in the worst case than normal A* anyway.

And if the heuristic is near exact, then fingers crossed, that alone should make pathing much faster, before even considering hierarchical or blockwise A*. Instead of visiting, like, 50% of land area, it'll be more like 5%. And the great thing about a heuristic is that it doesn't have to care about pathValid.

I'm hoping the final pathing system will be fast enough to not need path caching. With hierarchical or blockwise A*, maybe one side of the map to the other will take less than 1ms. Suboptimal either way, though. But blockwise A* could be made (possibly) optimal if you take into account remaining movement points upon entering a block, but that means more computation and memory usage.

There is another optimisation. You could do speculative pathing, a "dry run" of all units of one player to cache all expected paths, by imitating the vanilla unit update. And because it's just a dry run, all pathing is done with the initial game state, which means it can be done in parallel if the pathing system can handle it. Then afterwards in the serial unit update, it would hopefully use those cached paths. Pretty close to the magical parallel unit update, but requires path caching, and probably won't be all that useful if pathing is fast.

*Okay, results in (from the test program): Block A* is fast.

Code:

Basic AStar: 38.4655ms (127033 expanded, 302.799ns ns/expand, 981533 edges)
34620

No SIMD:
Block AStar: 14.9763ms (805 blocks visited, 18604.1ns/block, 82.6849ns/plot)
34620

AVX-512:
Block AStar: 0.8841ms (805 blocks visited, 1098.26ns/block, 4.88116ns/plot)
34620

And it was faster without SIMD, which is surprising. And with 70% of land area visited, just the basic heuristic.

16x16 blocks is optimal. Perfect for AVX-512 and only a little slower with AVX2. AVX-512 32x32 is slower than AVX2 16x16.

Surprising bug in the paper, afaik: You must push the block onto the heap unconditionally if any one of its sides has a g-cost reduced. I assume this is because the f-cost you get from the updated edge might be greater than the f-cost after the whole block is updated.

Promising result, but this is with a purely additive cost function. No movement points, just plot movement costs. Taking into account movement points would require an LDDB lookup per boundary plot, and the LDDB would no longer be a simple 2D array.

*Maybe this would be better as the heuristic, if necessary. The LDDB might get out of control if used for pathfinding. 23MB per player, per path flags, per unit type... Deduplication and caching may get complex.

*Landmarks aren't as effective as I'd hoped. They are much better if you assume symmetric costs, but that may cause slight suboptimality. With 4 landmarks and symmetric costs, plotwise A* is ~10ms. Performance doesn't really improve with more landmarks.
*8 landmarks with non-symmetric costs is about the same actually.
*Going nuts with 16 landmarks and AVX-512 to evaluate gets it down to 6.3ms. But that's a lot of landmarks to keep up to date.
*<3ms with AVX-512 plotwise A* and basic heuristic, ignoring movement points: SimdAStar: 2.3767ms (125122 visited, 18.9951ns/visited, 2.37438ns/edge)
*Landmark heuristic may not have any performance problems. If you build a road or chop a forest, you may be able to simply compute max path cost reduction and add that to a global heuristic bias. This should allow you to use an arbitrarily stale heuristic and update it whenever you feel like.
*~1.5ms for AVX-512 plotwise A* and 16x landmark heuristic. ~2.6ms for a longer path from one side to the other. Heuristic evaluation is expensive as it's a 16x gather basically. But using a per-block heuristic selection lookup isn't much of an improvement.
*Also, the big downside of using block A*, LDDB updates, may also be ignored. Could defer and use plotwise A* instead.
*AVX-512 plotwise A* on WSL is much faster at ~0.5ms. Slightly faster with -march=alderlake. MSVC codegen isn't all that great. Longer path is 1.3ms. Probably about time to start integrating this new path finder...

Blake00 · Jan 26, 2025

snowern said:
Continuing on from https://forums.civfanatics.com/thre...ed-in-large-games.688441/page-3#post-16622683

snowern said:
Getting close to a playable state. And then you'd just need a main menu to start/load without fiddling with main.cpp.

onkelm said:
Awesome !
So just to make it clear for me - you startet your work at April 5. this year till now - lets say 6 month - and you reverse engineered/decompiled CIV4 as a single person show ?
Where have you been the last 20 years ?

This is remarkable work! Am so happy that my original thread asking some questions about the single thread slow AI turns issue has kicked off something as awesome as this!

Gameboy2396 said:
Definitely a priority on our side is to spread the word, especially to C2C devs. The gameplay in C2C is too overwhelming and too much to take in at once for me personally, but the user interface, such as the research screen, tech discovery window, etc., is absolutely top notch. There's already great talent out here, and plenty of it.

I can help with that. Part of the work I do here is working with @The_J to share cool Civ news (indluding fan reconstruction projects) to our various news pages across social media and then cross-share to as many other Civ communities as possible out there. As a result I've built up a list of all Civ4 communities I've found on FB/Reddit/Discord and can share your work there. I've actually been thinking of doing a combined Civ1, Civ2, Civ3 reverse engineering reconstruction projects awareness post which I could now add this Civ4 project into which would then allow me to share to all the many fan communities for those Civ games too (which no doubt even more Civ4 fans reside in).

To better facilitate the above and attracting of new talent, I'd recommend you guys edit the OP starting this thread to make it more 'new visitor friendly' by including some project summary/goals/needs/repository links/requests for talent etc at the very top (no need to kill the original post just stick the new project info stuff above and put a ----- horizontal line between them). Or if Snowern is busy I can edit it if need be if I'm supplied with what you guys want said.

Oh and speaking of a public github repository link, apologies if I've missed it in this big thread but is there a repository yet? Or is the work being kept private for now?
.

snowern · Jan 27, 2025

Blake00 said:
Oh and speaking of a public github repository link, apologies if I've missed it in this big thread but is there a repository yet? Or is the work being kept private for now?

Not yet. Kind of got side tracked making big maps go faster.

Currently integrating the new vectorised pathfinder with vectorised landmark heuristic. Hopefully, I can make heuristic updates have zero cost by doing it in the background. Takes a few hundred milliseconds for 16 landmarks x 7 unit movement types x num land plots. Then the vectorised pathfinder should take just a few milliseconds for long paths by using vectorised bit masking for pathCost/pathValid. Probably the most optimised pathfinder you can make without doing hierarchical or block A*, and it should still be optimal. But this kind of code is incredibly bug prone. Must keep verification enabled.

Blake00 said:
To better facilitate the above and attracting of new talent, I'd recommend you guys edit the OP starting this thread to make it more 'new visitor friendly' by including some project summary/goals/needs/repository links/requests for talent etc at the very top (no need to kill the original post just stick the new project info stuff above and put a ----- horizontal line between them). Or if Snowern is busy I can edit it if need be if I'm supplied with what you guys want said.

This is quite a long post. Maybe I might make a new post for the eventual github link big day, and link this post. That day will be chaos. For me. Or maybe I will just stick it in this post as there might be more progress. It's definitely not going to be something average joe can run anyway. It will be buggy and incomplete. It will require compilation and tweaking. But finishing a game will be possible in theory.

Blake00 · Jan 27, 2025

Well I guess it depends how much detail you want to go into, but hey on the bright side when you do setup a github it'll need a readme summary on the main page so you can kill 2 birds with 1 stone and just copy & paste it for the summary here heh.

Bugs and all, every great fan community game rebuild project started somewhere and it will be exciting to see what people will do with your work and whether it will eventually lead to a full rebuild (that still needs some of the original game files to avoid any legal issues) with better graphics & GUI, capabilities, fixes, and extensions etc. So many possibilities... someone needed to do the hard work to get the ball rolling and you've done it mate!

.

Blake00 · Jan 27, 2025

Oh and another suggestion is at that later stage we setup a subforum for a community rebuild project and a pinned wishlist thread for people to pitch all the things they would want added into an enhanced Civ 4 fan rebuild. The C7 Civ3 rebuild project guys did something similar and people came forward with a lot of cool stuff, many of which they're looking at adopting. Eg one of my favourites there was the suggestion of multiple linked worlds like what Civ 2 Test of Time had (& other games like Master of Magic, Age of Wonders, & Warlock series) as the multimaps on Test of Time suffered from Civ2's 7 player limit making things a bit sparse but that would not be an issue in Civ3 and Civ4 due to the higher player count. People could create some pretty crazy scenarios with linked worlds. However something like that would be even more dependent on some huge AI turn multi-threading speed improvements haha!

oldbay · Feb 6, 2025

@snowern i'm is delighted with your project!
Last year i'm began to make a similar project, but "from the other end": I rebuild CvGaemeCoreDLL(Realism Invictus mod) using boost 1.74 & python 3.9 for linux arh x64. In future planing create cross-platform build (linux/windows).
Today:
* the project use cmake for build.
* It can be imported to python3.
* I work on Python Assets

Github link: https://github.com/oldbay/civ4_ri_rebuild
CvGaemeCoreDLL: https://github.com/oldbay/civ4_ri_rebuild/tree/master/GameCore/CvGameCoreDLL

I propose to join Yours efforts.
My experience:
* Python (First programming language)
* SQL (Advanced level)
* С++ (Second programming language, base level)

Mini-engine progress

Emperor

Warlord

Warlord

Slightly over 9000.

Emperor

Warlord

Warlord

Warlord

Warlord

Warlord

Chieftain

Warlord

Traveler of the Multiverse

Chieftain

Warlord

CFC Mod Archivist & Social Media Helper

Warlord

CFC Mod Archivist & Social Media Helper

CFC Mod Archivist & Social Media Helper

Chieftain

Similar threads