Mini-engine progress

I love compiler memes. Half the conversations on the BFBB decomp server are everyone pulling their hair out over floats.
 
At least Civ4 doesn't do floats... Well, it does a little, in CvPlot::shouldProcessDisplacementPlot. Oh, and getCombatOdds. I hope that's not going to mess up my perfectly deterministic world.
 
Just be glad you know you have the compiler the 4K devs used. BFBB's decomp team spent years trying to figure out if the memes were the result of some obscure minor revision of the compiler or if the HI devs genuinely cast all their floats to volatile.
 
So, how is this project going?
Need any help?
Are we gonna have Civ4 64-bit in a few years? :)
It's playable right now. Just finishing touches. Verification of gigantic map optimisations against "live" data, verification of the regular build against the Firaxis DLL, actually play a gigantic map (hopefully nothing pops up), and will attempt Linux. Linux would be interesting. Probably can't use VS, the compiler probably won't be able to handle MSVC-specifics, you got case-sensitivity, and the terminal will work differently.

Future things post-release could be Python 3/latest pybind (likely to require script changes), get rid of CvString/CvWString, complete and total refactoring to remove almost all globals, fix all clang warnings, GRAPHICS, multiprocess mapfinding, new Civ4 machine-learning environment (like Freeciv) for ML devs. Plenty of things for people to do if they're crazy enough. I'll probably try some refactoring. C++ modules sound fun.
 
Can't wait, we'll finally be able to mod the code for rivers and the Great Wall!!! :bounce:
 
if i could suggest one thing that may need to be murdered, dissected and rebuilt, it's the plotgroup system (which is the system handling which tiles are connected for resource access and trade purposes).

From what I've had to interact with, it seems very inefficient, with groups being completely deleted and rebuilt tile by tile each time an effect impacts them (a pillaged road or improvement, discovered resources, ...)

(storage of the network is the key part to make that work, as of now, it's just a list of all the tiles in the group, without better understanding of the links between them)
 
The CvPlotGroup code is included in the DLL.
Various mods made changes to that system and if you want to rebuild it completely today you could.
 
That's a mystery to me. How is the Great Wall encoded in the save file, and how do you build the wall out of NIFs? It's probably some list of coordinates.
Is there something I can contribute?

If this project becomes what it aims to be,
it could be a revolution for Civ fans and bring glory to Civ4 and its perfection.
A "fun" thing to do would be a total refactoring of the DLL to reduce globals. But there will be a point of friction because a python interpreter is naturally a big global entity, and the scripts expect globals. Something that would be nice is if the game itself and other in-game-only classes were no longer globals. That way, initialising and destroying a game would be more elegant in code.

There's also the unicode refactoring. I'd probably just use wchar_t as-is for the Linux build, and do conversion in serialisation. That's the quick way. The more portable way is to do away with wchar_t and switch to char8/16/32_t. char8_t won't work well with C++ IO, but I'd like the type safety that separates it from non-"text" strings. Check that XML parsing handles encoding correctly too.
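
A minimal sketch of the "convert in serialisation" idea, assuming the stream needs 16-bit code units (as a Windows `wchar_t` would produce) while the in-memory text on Linux is 32-bit. The helper name `encodeUtf16` and the choice to substitute U+FFFD for invalid values are illustrative assumptions, not something taken from the DLL.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode 32-bit code points as UTF-16 code units,
// e.g. right before writing a string to a stream that stores 16-bit units.
std::u16string encodeUtf16(const std::u32string& text)
{
    std::u16string out;
    out.reserve(text.size());
    for (char32_t cp : text)
    {
        // Lone surrogates and values above U+10FFFF are not valid scalar
        // values; this sketch replaces them with U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;

        if (cp < 0x10000)
        {
            out.push_back(static_cast<char16_t>(cp));
        }
        else
        {
            // Split into a high/low surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    // Every code point produces at least one code unit.
    assert(out.size() >= text.size());
    return out;
}
```

Text inside the Basic Multilingual Plane passes through unchanged, so only characters above U+FFFF would differ between the two builds' in-memory forms.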

And another thing is missed optimisations. No doubt some slow paths remain. Find them in a profiler.

And ultimately, the graphical engine implementation. It would be easier in some ways as the DLL is designed for a realtime engine. But much more difficult in other ways. One random thing I've seen is that some NIFs have spline animations encoded in some format I don't know of.

If you are already moving away from Boost Python skip Pybind and use Nanobind instead.

🤣🤣🤣 Only if you like to deal with Compiler bugs because sadly no C++ Compiler has complete and stable module support.
Pybind has python embedding. Ie, starting up the python interpreter and running code. That's nice to have. But if nanobind were to support implicitly convertible int enums, that would be worth switching over for. Because that's how boost python did enums, and how scripts expect it. Pybind enums are strongly typed and I have to hack them to make them work with scripts. It is possible to make something custom, but you have to use class_ or enum_ to get the runtime registration behaviour.

I have actually got a modules branch working. Everything modularised except the engine and gigantic map optimisations. Just about. You only need to do a number of silly workarounds, and Intellisense barely works, and it's basically one big giant module. But compared to includes, the debug rebuild is twice as fast. But compared to the PCH, it's not much faster. I would like to use modules over PCH though, if the dev experience wasn't so terrible.
Plot groups were one of the slow paths I sped up. Haven't seen them on the profiler since. Yet. I need to get a fully populated gigantic map to stress test everything.

Anyway, uhh... the barbs have been making too many animals.
2025y06m25d - Cv4MiniEngine - Unit allocation failure.png

Code:
#define FLTA_ID_SHIFT                (13)
#define FLTA_MAX_BUCKETS        (1 << FLTA_ID_SHIFT)
Code:
╭───────────────────────────────────────────────────────────────────────
│                                                  MILITARY ADVISOR    
├┬─────────────────────┬─────────────────────────────────────╥──────────
││[ ] Julius Caesar    │░░ ╭──────────────────────────────┬─╮║        
╡╰─────────────────────╯░░ │Location                      │▼│║        
┤╭─────────────────────╮░░ ╰──────────────────────────────┴─╯║        
││[ ] Hammurabi        │░░ ╭──────────────────────────────┬─╮║        
│╰─────────────────────╯░░ │Unit Type                     │▼│║        
│╭─────────────────────╮░░ ╰──────────────────────────────┴─╯║        
││[ ] Zara Yaqob       │░░ [ ] Show individual units         ║        
│╰─────────────────────╯░░                                   ║        
│╭─────────────────────╮░░ [ ] ALL UNITS (8190)              ║        
││[ ] Tokugawa         │░░    [ ] Neutral Territory (8190)   ║        
│╰─────────────────────╯░░       [ ] Lion (4465)             ║        
│╭─────────────────────╮░░       [ ] Bear (774)              ║        
││[ ] Ramesses II      │░░       [ ] Panther (1241)          ║        
│╰─────────────────────╯██       [ ] Wolf (1710)             ║        
│╭─────────────────────╮██                                   ║        
╡│[ ] Frederick        │██                                   ║        
│╰─────────────────────╯██                                   ║        
│╭─────────────────────╮██                                   ║        
││[●] Barbarian        │██                                   ║        
│╰─────────────────────╯██                                   ║        
│
(it's a TUI, so I can just paste in the text, right?)

Maybe I'll need to upgrade the ID allocation system to 64-bit or put a hard cap on barbarians. Capping barbs would be the simplest option if possible. Surprised I didn't catch this earlier, it's only turn 32. Is this supposed to happen?

*Ahh... I think it's because I left the difficulty on deity, which has a lower getUnownedTilesPerGameAnimal.
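
For context, a rough sketch of why a 13-bit shift caps the number of live objects near 8,192. It assumes the common free-list layout where the low bits of an ID hold the slot index and the bits above hold a reuse counter; the names `makeId`, `slotOf` and `generationOf` are made up for illustration and are not from the DLL.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kIdShift = 13;                          // mirrors FLTA_ID_SHIFT
constexpr std::int32_t kMaxBuckets = 1 << kIdShift;   // mirrors FLTA_MAX_BUCKETS
constexpr std::int32_t kIndexMask = kMaxBuckets - 1;

static_assert(kMaxBuckets == 8192, "13 index bits give 8192 slots");

// Assumed layout: slot index in the low 13 bits, reuse counter above it.
inline std::int32_t makeId(std::int32_t slot, std::int32_t generation)
{
    assert(slot >= 0 && slot < kMaxBuckets);
    assert(generation >= 0);
    return (generation << kIdShift) | slot;
}

inline std::int32_t slotOf(std::int32_t id) { return id & kIndexMask; }
inline std::int32_t generationOf(std::int32_t id) { return id >> kIdShift; }

// With this layout, one more object fits only while a slot is still free.
inline bool canAllocate(std::int32_t liveCount) { return liveCount < kMaxBuckets; }
```

Under that assumption, raising the shift (or widening the ID to 64 bits) is what lifts the cap, at the cost of touching everything that stores or serialises IDs.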
 
Can't wait for Caveman2Cosmos being ported to your engine. Finally playing it without running out of memory and dreadful turn times almost sounds too good to be true.
That would take some effort. The TUI world view currently hard-codes a lot of things, so it's not very expandable right now. Or rather, this is what a graphical engine is ideal for: then you can just use the mod's assets as-is, after you merge in the DLL changes.

*Okay... so I do need to accelerate multi-unit pathfinding...
Code:
Path requests: 344947, acc=129.035s, max=0.10659s
Path requests (FAStar): 111694, acc=111.462s, max=0.106584s
Path requests (lite verify reachability): 87227, acc=0.173779s, max=0.0001732s
Path requests (multi-unit): 22886, acc=91.6555s, max=0.10659s
Path requests (other flags): 4467, acc=0.105721s, max=0.0034276s
Path requests (other unsupported): 84341, acc=19.8126s, max=0.052788s
Path requests (siege move through enemy): 0, acc=0s, max=0s
Path step distance: 101841, acc=0.436463s, max=0.0008209s
Turns: 32
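
Counters in this `name: count, acc=…, max=…` shape can be collected with a small accumulator plus a scoped timer. This is only a sketch of one way to do it; `TimingStats` and `ScopedTimer` are hypothetical names, not the mini-engine's own classes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <map>
#include <sstream>
#include <string>

struct TimingStat
{
    long count = 0;    // number of recorded calls
    double acc = 0.0;  // accumulated seconds
    double max = 0.0;  // slowest single call in seconds
};

class TimingStats
{
public:
    void record(const std::string& name, double seconds)
    {
        assert(seconds >= 0.0);
        TimingStat& stat = stats_[name];
        ++stat.count;
        stat.acc += seconds;
        stat.max = std::max(stat.max, seconds);
    }

    const TimingStat& get(const std::string& name) const { return stats_.at(name); }

    // One line per counter, sorted by name because std::map is ordered.
    std::string report() const
    {
        std::ostringstream out;
        for (const auto& entry : stats_)
            out << entry.first << ": " << entry.second.count
                << ", acc=" << entry.second.acc << "s, max=" << entry.second.max << "s\n";
        return out.str();
    }

private:
    std::map<std::string, TimingStat> stats_;
};

// Records the lifetime of the object under the given counter name.
class ScopedTimer
{
public:
    ScopedTimer(TimingStats& stats, std::string name)
        : stats_(stats), name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
    ~ScopedTimer()
    {
        const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start_;
        stats_.record(name_, elapsed.count());
    }

private:
    TimingStats& stats_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing a `ScopedTimer` at the top of a pathfinding entry point would feed a counter such as "Path requests (multi-unit)". This version is not thread-safe, so a parallel update would need per-thread instances or a lock.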
 
Started doing AI autoplay on a 1040x640 map, with no verification, for performance testing.

Turn times are slowish. Typically, from my T310 save, they vary between 5 and 10 seconds. Turn 431 on marathon shot up to 37 seconds, as the AI decided to do some pirating which I haven't accelerated yet.

CPU usage fluctuates between 7% and 60% of my 12 thread 12600, and will dip down to single thread when calling into python for random events (that may loop over all plots), for parallel unit update serial fallbacks, and unaccelerated slow paths in the AI.

Stats:
Code:
MyCvDLLEngineIFace::DoTurn: 431
Parallel unit update serial fallback for "AI_barbAttackMove isGoody": 41462, acc=11.2366s, max=1.11736s
Parallel unit update serial fallback for "AI_cityDefenseMove danger": 97, acc=0.809573s, max=0.0339984s
Parallel unit update serial fallback for "AI_guardCity AI_ejectBestDefender": 4356, acc=56.7682s, max=1.57043s
Parallel unit update serial fallback for "AI_pillageValue Dependency on plot reveal state.": 601, acc=13.1929s, max=1.38578s
Parallel unit update serial fallback for "AI_update unimplemented UnitAI attack city": 207, acc=1.85828s, max=0.0323374s
Parallel unit update serial fallback for "AI_update unimplemented UnitAI city counter": 359, acc=0.0260293s, max=0.0212741s
Parallel unit update serial fallback for "AI_update unimplemented UnitAI counter": 7, acc=0.11472s, max=0.0225251s
Parallel unit update serial fallback for "AI_update unimplemented UnitAI worker": 208, acc=2.3853s, max=0.0852043s
Parallel unit update serial fallback for "Exceeded kMaxParallelPasses": 269, acc=2.583s, max=0.571581s
Parallel unit update serial fallback for "canMoveInto isFriendlyCity not implemented": 363, acc=0.0279276s, max=0.0001978s
Parallel unit update serial fallback for "canStartMission Mission unimplemented": 24, acc=0.224009s, max=0.209854s
Parallel unit update serial fallback for "generatePathFromTo getNumUnits() != 1": 126, acc=0.0141302s, max=0.0003299s
Parallel unit update serial fallback for "groupAttack Attacking not implemented": 667, acc=0.82096s, max=0.0444635s
Parallel unit update serial fallback for "groupMove splitting the group": 25, acc=6.91546s, max=0.791066s
Parallel unit update serial fallback for "startMission bDelete": 5880, acc=0.163702s, max=0.0076654s
Path requests: 9835262, acc=570.522s, max=0.13167s
Path requests (FAStar): 454, acc=0.0073364s, max=7.88e-05s
Path requests (lite verify reachability): 0, acc=0s, max=0s
Path requests (multi-unit): 825462, acc=66.1773s, max=0.052154s
Path requests (other flags): 454, acc=0.0096554s, max=9.06e-05s
Path requests (other unsupported): 0, acc=0s, max=0s
Path requests (siege move through enemy): 0, acc=0s, max=0s
Path step distance: 663350, acc=39.7337s, max=0.0305177s
Turns: 121
computeFoundValuesRow: 6400, acc=0.145318s, max=0.0196332s
computeRevealedFoundValuesRow: 1393920, acc=12.9158s, max=0.0046825s
prepareForFoundValueQuerying: 2188, acc=8.69285s, max=0.0199555s
566 cities (27 barbarian)
13968/201524 plots owned on biggest land area (7%)
33.6479s since the last turn.

That's a lot of pathfinding. And from the parallel unit update, the "AI_guardCity AI_ejectBestDefender" serial fallback is the slowest. Not because of ejecting, but the unit will go on to run some slow AI procedure later.

The AI is at least doing something. 566 cities, 7% land coverage. If you extrapolate, you get about 8713 cities once the whole area has been settled. This exceeds ID allocation capacity. But that's not a problem as long as they don't all belong to one player, afaik.

Oh, and barbs are capped to 4000. That's enough barbs for now. That's about 50 plots per barb, if they were all on the main landmass.

Flame graph for ~15 turns:
2025y07m04d - Cv4MiniEngine - Gigantic map 15 turns profile (after updates).png

The vectorised pathfinder took about 44% of CPU time (multi-threaded) in total. For comparison, 29% of CPU time was spent inside main. And you can see the heap ops in the pathfinder at the bottom are taking up most pathing time. That's annoying to look at. Not even computing costs, just popping the heap!

The serial updates are taking up a lot of time overall. It's all barbarians. And the AI_update before it is civs. Spies, workers, targeting barb cities. Then to the right is the barbarian's parallel update.

That AI_targetCity should really be in the parallel update. It may have been triggered by "AI_guardCity AI_ejectBestDefender".

The parallel unit update though. It may be theoretically sound and speed things up a lot, but the implementation is bug prone and inelegant compared to a proper AI overhaul. My constraint is that I'm strictly keeping to original AI behaviour, but an overhaul would not have that constraint.

So if anybody out there feels like somehow replacing it with an elegant parallel AI rewrite, I wouldn't mind that at all.

Vanilla AI is not easily parallelisable at a high level because you do everything for one group, then do everything for the next. You can't just parallel loop that, there would be too many data conflicts. Unless you do it the way I currently do and track the data conflicts in a separate implementation of AI, which is basically a crazy idea.

Issues with current implementation:
  • Units can depend on and change the number of missions targeting a plot or unit. The way these units will be parallelised is that one unit will be allowed to change the count, and all the others that read that data are deferred to the next parallel pass. The worst case would be a bunch of workers checking plots all over the map.
  • Pathing invalidation and dependencies are not a massive problem currently. Basically plot danger, afaik. But the fact that a unit can potentially take a read dependency on the whole map irks me. It is possible for plot danger to be invalidated by moving your own units. Animals depend on plot props too, but tend to be very localised. There is overlap though, which is why I've seen two or more large parallel updates for animals.
  • Current implementation is littered with serial fallbacks, and always will be.
  • An AI procedure may trigger serial fallback for something that is not slow, like attacking, but the group may then go on to run more expensive AI procedures. For my implementation to deal with this problem, it would need a small rearchitecturing to run only the next AI procedure in serial and then try the group again in the next parallel pass, instead of running the group's update to the end.
  • You've got a whole entire duplicate AI implementation so you can track data conflicts without modifying live game state. This is a maintenance nightmare and it's why I'm not particularly enthused about continuing with this parallel AI implementation.
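
The deferral rule in the first bullet can be sketched as a pass planner. This is a simplified model, not the actual implementation: it assumes each unit's read and write sets (keyed by plot index here) are known up front, whereas the post describes discovering them by running a duplicate AI. `UnitTask`, `Plan` and `planPasses` are hypothetical names. A deferred unit still blocks later units that touch the same data, so the result stays equivalent to running the units in their original serial order.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

struct UnitTask
{
    int unitId;
    std::set<int> reads;   // plots whose mission count this unit reads
    std::set<int> writes;  // plots whose mission count this unit changes
};

struct Plan
{
    std::vector<std::vector<int>> passes;  // unit ids that may run together
    std::vector<int> serialFallback;       // left over after maxPasses
};

static bool intersects(const std::set<int>& a, const std::set<int>& b)
{
    for (int key : a)
        if (b.count(key) != 0)
            return true;
    return false;
}

Plan planPasses(std::vector<UnitTask> pending, std::size_t maxPasses)
{
    Plan plan;
    while (!pending.empty() && plan.passes.size() < maxPasses)
    {
        std::set<int> seenReads, seenWrites;  // accesses of all earlier units
        std::vector<int> admitted;
        std::vector<UnitTask> deferred;
        for (const UnitTask& task : pending)
        {
            const bool conflict = intersects(task.reads, seenWrites)
                || intersects(task.writes, seenWrites)
                || intersects(task.writes, seenReads);
            if (conflict)
                deferred.push_back(task);
            else
                admitted.push_back(task.unitId);
            // Record accesses even when deferred, to preserve serial order.
            seenReads.insert(task.reads.begin(), task.reads.end());
            seenWrites.insert(task.writes.begin(), task.writes.end());
        }
        assert(!admitted.empty());  // the first pending unit never conflicts
        plan.passes.push_back(admitted);
        pending = std::move(deferred);
    }
    for (const UnitTask& task : pending)
        plan.serialFallback.push_back(task.unitId);
    return plan;
}
```

The `maxPasses` cutoff plays the role that the "Exceeded kMaxParallelPasses" fallback in the stats appears to play: whatever is still pending after the last pass runs serially.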
So, a sane parallel AI overhaul would be great. I'd imagine you'd design the AI code like a GPU pipeline. A sequence of stages that each process groups in parallel. Like, worker management, city defence, retreat to city, scouting, missionary spread, pillaging... for all relevant unit groups at once. Cheap rare things like attacking, join/split group, and declaring war can be done in serial.

But anyway, the parallel AI right now is enabled for barbarians only. Don't trust it enough for all AI, and so I also don't have to implement so many AI procedures. I have thought about adding verification, but that wouldn't be so simple because the RNG will be different.
 
If an AI overhaul is done, perhaps it'd be a good idea to rework turn order to be parallel, such that players and AI set what they want their units to do and then those actions are all carried out at once at the end of the turn.

For the conflicts I can foresee, here's some off-the-top-of-my-head preemptive troubleshooting. Combat could apply a penalty when a unit defends against a unit in a tile it didn't plan to attack. If two improvements were to be built on a tile in the same turn because the user misclicked at some point, whichever improvement the AI would prefer gets built and the other worker(s) remain idle. Likewise, if more workers are building an improvement than needed, the extra workers count as idle. And if combat unit(s) move into a tile that enemy unit(s) are also moving into, they enter combat with each other, maybe with the exception of recon units.

Order of unit actions could be recon > attacking aerial combat > aerial relocation + moving aerial combat > aerial city bombarding > attacking land combat + spies > workers > settlers > great people + missionaries > moving land combat > land city bombarding > attacking naval combat > moving naval transport > moving naval combat > naval workers > naval city bombarding, with the combat phases being serial and the rest parallel.
 
That's not just a different AI, that's a different game! Very simultaneous turns.

But that is what will happen internally for a parallel unit update. If you do AI decisions in parallel, a unit may end up doing something that's a bad idea or not possible given the actions of other units. But, the idea is that if you work on smaller subproblems, it should be easier to deal with.

Like barb animals. AI_animalMove is incredibly simple, relatively. You could easily decide all actions for animals in parallel, but, some actions may conflict. What if two animals attack the same unit, or two animals end up sharing a plot. But for this restricted subproblem, you could easily check that stuff in parallel.
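
One way to make the "two animals end up sharing a plot" check deterministic is to decide every move independently and then resolve clashes by a fixed rule. The sketch below uses a deliberately conservative rule of my own choosing, not the DLL's behaviour: a move is accepted only if the target is not any animal's starting plot and the mover has the lowest unit ID among those wanting that plot. `MoveProposal` and `resolveMoves` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct MoveProposal
{
    int unitId;
    int fromPlot;  // plot index the animal currently stands on
    int toPlot;    // plot index it decided to move to (may equal fromPlot)
};

// Returns unit id -> final plot. The result does not depend on the order of
// `proposals`, so the proposals may be produced by threads in any order.
std::map<int, int> resolveMoves(const std::vector<MoveProposal>& proposals)
{
    std::set<int> startPlots;
    for (const MoveProposal& p : proposals)
        startPlots.insert(p.fromPlot);
    // Precondition of this sketch: no two animals share a plot to begin with.
    assert(startPlots.size() == proposals.size());

    std::map<int, int> winner;  // target plot -> lowest unit id wanting it
    for (const MoveProposal& p : proposals)
    {
        if (startPlots.count(p.toPlot) != 0)
            continue;  // conservative: never move onto a starting plot
        auto it = winner.find(p.toPlot);
        if (it == winner.end() || p.unitId < it->second)
            winner[p.toPlot] = p.unitId;
    }

    std::map<int, int> finalPlot;
    for (const MoveProposal& p : proposals)
    {
        auto it = winner.find(p.toPlot);
        const bool won = it != winner.end() && it->second == p.unitId;
        finalPlot[p.unitId] = won ? p.toPlot : p.fromPlot;
    }
    return finalPlot;
}
```

Accepted targets are unique and never a starting plot, and everyone else stays put, so no plot ends up shared. The same lowest-ID tiebreak could plausibly be applied to two animals attacking the same unit.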

And workers. You might have some simple worker allocation routine. Something quick to find target plots and allocate workers to them. Then do all the pathfinding in parallel. And if you're redoing the AI, you can respecify pathfinding to make it faster, or use approximate pathfinding for AI decisions. And as an optimisation, you could flood-fill the map once to know which plots workers can reach.
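
The flood-fill optimisation could look roughly like this: label every connected region of passable plots once, after which a reachability query is just a label comparison. The sketch assumes a plain non-wrapping grid with 8-way adjacency and a precomputed `passable` flag per plot; `labelRegions` and `canReach` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Gives every passable plot a region label (0, 1, 2, ...); impassable plots
// get -1. Plots are indexed row by row: index = y * width + x.
std::vector<int> labelRegions(const std::vector<bool>& passable, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(passable.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    std::vector<int> label(passable.size(), -1);
    int nextLabel = 0;
    for (int start = 0; start < width * height; ++start)
    {
        if (!passable[start] || label[start] != -1)
            continue;

        // Breadth-first flood fill from an unlabelled passable plot.
        std::queue<int> frontier;
        label[start] = nextLabel;
        frontier.push(start);
        while (!frontier.empty())
        {
            const int plot = frontier.front();
            frontier.pop();
            const int x = plot % width;
            const int y = plot / width;
            for (int dy = -1; dy <= 1; ++dy)
            {
                for (int dx = -1; dx <= 1; ++dx)
                {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height)
                        continue;
                    const int next = ny * width + nx;
                    if (passable[next] && label[next] == -1)
                    {
                        label[next] = nextLabel;
                        frontier.push(next);
                    }
                }
            }
        }
        ++nextLabel;
    }
    return label;
}

// Two plots are mutually reachable exactly when they share a region label.
bool canReach(const std::vector<int>& label, int from, int to)
{
    return label[from] != -1 && label[from] == label[to];
}
```

A worker allocator could then discard unreachable target plots with `canReach` before spending any time on real pathfinding.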

I'd also wonder that if somebody replaces the AI, they might as well replace most of the mechanical parts of the DLL to make it more thread-friendly. Like, use mutexes, atomics, get rid of lazily calculated values, some ability to execute missions and moves in parallel. Be careful not to introduce non-determinism though.
 
I guess my builder is showing. Always considered Civ IV as defined by its units and its cities. You could remove the RNG from combat, make players take turns moving 1 unit at a time, and make all tiles grassland cities on hills and I'd consider it a minor mod, but remove the catapult line and you just made a total conversion lol
 
By the law of large numbers, you can remove RNG by having 10x as many units. Give the barracks a big chunky production bonus.

Anyway, I've now seen that CvPlayerAI::AI_plotTargetMissionAIs is taking 33.4% of CPU time. It doesn't even path, it just counts missions. Maybe that's another thing that needs caching. On the other hand, CvUnitAI::AI_pirateBlockade is taking 45% of time on the main thread on turn ~520. And I've already sorted target plots by distance and turn-limited the pathing, so, uh oh. How many privateers are there exactly...
Code:
AI_pirateBlockade: { [256, 117] with Huayna Capac Privateer PLOT_OCEAN Coast } going to [263, 287]
AI_pirateBlockade: { [761, 149] with Montezuma Privateer PLOT_OCEAN Ocean } going to [322, 491]
AI_pirateBlockade: { [735, 121] owned by Montezuma with Montezuma Privateer PLOT_OCEAN Ocean Fishing Boats } going to [318, 489]
AI_pirateBlockade: { [753, 134] with Montezuma Privateer PLOT_OCEAN Ocean } going to [304, 490]
AI_pirateBlockade: { [749, 126] with Montezuma Privateer PLOT_OCEAN Ocean } going to [817, 223]
AI_pirateBlockade: { [745, 118] with Montezuma Privateer PLOT_OCEAN Ocean } going to [309, 488]
AI_pirateBlockade: { [737, 110] with Montezuma Privateer PLOT_OCEAN Ocean } going to [493, 449]
AI_pirateBlockade: { [733, 106] city of Huexotla owned by Montezuma with Montezuma Privateer PLOT_LAND Tundra Road } going to [341, 157]
AI_pirateBlockade: { [745, 133] with Montezuma Privateer PLOT_OCEAN Ocean } going to [286, 486]
AI_pirateBlockade: { [737, 125] with Montezuma Privateer PLOT_OCEAN Ocean } going to [483, 452]
AI_pirateBlockade: { [773, 161] with Montezuma Privateer PLOT_OCEAN Ocean } going to [497, 449]
AI_pirateBlockade: { [765, 153] with Montezuma Privateer PLOT_OCEAN Ocean } going to [467, 459]
AI_pirateBlockade: { [750, 136] with Montezuma Privateer PLOT_OCEAN Ocean } going to [328, 489]
AI_pirateBlockade: { [801, 216] with Montezuma Privateer PLOT_OCEAN Ocean } going to [814, 227]
AI_pirateBlockade: { [810, 193] with Montezuma Privateer PLOT_OCEAN Ocean } going to [334, 476]
AI_pirateBlockade: { [860, 211] with Montezuma Privateer PLOT_OCEAN Ocean } going to [810, 229]
Not actually a whole lot. But they are pathing across the world.
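
On the mission-counting cost mentioned above: if the count really is recomputed by scanning groups on every query, one caching approach is to maintain it incrementally. The sketch below is an assumption-heavy illustration: it presumes there are hook points where a group sets or clears its mission target, and it only answers exact-plot queries, so any range-based lookup would need more work. `MissionTargetCounts` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Keeps a running count of missions per (target plot, mission AI type), so a
// query becomes a map lookup instead of a scan over every group.
class MissionTargetCounts
{
public:
    // Hypothetical hook: call when a group starts targeting a plot.
    void onMissionTargetSet(int plotIndex, int missionAI)
    {
        const Key key(plotIndex, missionAI);
        ++counts_[key];
    }

    // Hypothetical hook: call when that group stops targeting the plot.
    void onMissionTargetCleared(int plotIndex, int missionAI)
    {
        const Key key(plotIndex, missionAI);
        auto it = counts_.find(key);
        assert(it != counts_.end() && it->second > 0);
        if (--it->second == 0)
            counts_.erase(it);  // keep the map small
    }

    int count(int plotIndex, int missionAI) const
    {
        const Key key(plotIndex, missionAI);
        auto it = counts_.find(key);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    typedef std::pair<int, int> Key;  // (plot index, mission AI type)
    std::map<Key, int> counts_;
};
```

The catch is the usual one for caches in this codebase: every code path that changes a mission target has to go through the hooks, or the counts drift from the live game state.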
 
The AI is at least doing something. 566 cities, 7% land coverage. If you extrapolate, you get about 8713 cities once the whole area has been settled. This exceeds ID allocation capacity. But that's not problem as long as they don't all belong to one player, afaik.
That would ruin my world conquering plans...
 