AI League 2.0

Thrasybulos


Spoiler Current Elo Ratings Listing (Updated 2025.07.21) :

Elo_Listing_S2_R10.png




After my first attempt was aborted, the time has come to reboot the League format.

The basic idea remains the same: a swiss-system open tournament.
  • Each Series (Season, Tournament) will be played over ten rounds.
  • Each round, each leader will be assigned to one of eight pools (Group, Arena, Table).
    • Pools 1 to 4 (the "top" pools) will be 6-leader pools.
    • Pools 5-8 (the "bottom" pools) will be 7-leader pools.
  • Pool assignation:
    • The leaders are sorted 1 to 52, with leader 1 getting the 1st slot on Pool 1, leader 2 the second slot on Pool 1, etc.
    • For the first round, random.org will do the sorting (no tournament seeding).
    • For subsequent rounds, the leaders will be sorted by:
      • number of wins
      • reverse previous order
        Spoiler Example :
        For instance, Suleiman starts round 3 with 5 wins in position 8 (Pool 2, slot 2).
        Genghis starts round 3 with 3 wins in position 23 (Pool 4, slot 5).
        Suleiman wins one game while Genghis has a great round and wins 3.
        At the end of round 3, they're both tied: but since Genghis started lower, he wins the tie and will be ranked higher than Suleiman for the round 4 pool assignation.
  • Eight maps (four 6-player maps & four 7-player maps) will be randomly drawn from the AI Survivor maps, and each assigned to a pool for the duration of a Series.
  • Each round will feature several games:
    • Game 1 will have the leaders playing from the starting positions matching their "slot": 1st leader plays as Team 1, 2nd leader as Team 2, etc.
    • Then the leaders will be rotated for the next games: leader 1 will start as Team 2 in game 2, with leader 2 moving to Team 3; game 3 will have leader 1 as Team 3 on the map, leader 2 as Team 4, etc.
    • Once every leader has played from every starting position, the games end for that pool. So the top pools will play six games, while the bottom pools will play seven.
So 52 games will be played each round, and each Series will thus feature 520 games.
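
As a rough illustration of the pool assignment, tie-break and rotation rules above, here is a minimal Python sketch. The names (`POOL_SIZES`, `pool_and_slot`, `next_round_order`, `team_for`) are made up for this example and are not taken from the actual League tooling; the wrap-around from the last Team back to Team 1 is an assumption implied by every leader playing every starting position exactly once.

```python
# Pools 1-4 hold 6 leaders each, pools 5-8 hold 7 each (52 leaders in total).
POOL_SIZES = [6, 6, 6, 6, 7, 7, 7, 7]


def pool_and_slot(position):
    """Map a 1-based ranking position (1..52) to a 1-based (pool, slot) pair."""
    remaining = position
    for pool, size in enumerate(POOL_SIZES, start=1):
        if remaining <= size:
            return pool, remaining
        remaining -= size
    raise ValueError("position must be between 1 and 52")


def next_round_order(leaders):
    """Sort (name, total_wins, previous_position) tuples for the next round.

    More wins come first; ties go to the leader who started lower,
    i.e. the one with the larger previous position.
    """
    return sorted(leaders, key=lambda entry: (-entry[1], -entry[2]))


def team_for(slot, game, pool_size):
    """Starting position (Team, 1-based) of the leader in `slot` for `game`.

    Assumption: the rotation wraps around, so the last Team is followed by Team 1.
    """
    return (slot - 1 + game - 1) % pool_size + 1


# The Suleiman / Genghis example: both reach 6 wins after round 3.
order = next_round_order([("Suleiman", 6, 8), ("Genghis", 6, 23)])
games_per_round = sum(POOL_SIZES)  # one game per starting position in each pool
```

With these definitions, position 8 maps to Pool 2, slot 2 and position 23 to Pool 4, slot 5, which matches the example in the spoiler.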

Settings

The games will be played with my now standard settings:
  • AI Survivor S5 rules: Aggressive AIs, no Tech trading, no Vassals, no Huts, no Events, no Apostolic Palace, Deity AIs but no bonus techs.
  • In addition to that: no UN, no barbs.
And a final minor point: since the AIs don't start with Archery tech, and with no barbs to threaten their early cities, their 4 starting archers have been turned into warriors. Barring the case of a super early DoW (turn 40ish), this should have zero impact on the games; it's just for consistency's sake.

Contrary to the first version of the League where I ran games manually and wrote mini-reports for each game, these games will be run with game.aiplay.
So you'll get the data, not the stories. :p

Scoring

Leaders will score one point for a win, plus one point for every win their opponents got in the previous rounds.
Spoiler :

Since everyone starts round 1 with zero wins, scores for round one will equal the number of wins.
But let's say a round 2 pool features Gandhi (2 wins), Shaka, Suleiman, Mao, Zara (1 win each), and Bismarck (0 wins).
A Bismarck win would get him 1 + (4 x 1 + 2) = 7 points.
A Shaka win would get him 1 + (3 x 1 + 2) = 6 points.
While Gandhi would only get 1 + (4 x 1) = 5 points for a win.
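
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: `win_points` and the `pool` dictionary are hypothetical names introduced here, not part of the actual spreadsheet or tooling.

```python
def win_points(winner, wins_before_round):
    """League points for one win: 1 point, plus the wins every pool
    opponent had accumulated in previous rounds."""
    opponents_wins = sum(
        wins for name, wins in wins_before_round.items() if name != winner
    )
    return 1 + opponents_wins


# Wins held by each leader of the example pool at the start of round 2.
pool = {"Gandhi": 2, "Shaka": 1, "Suleiman": 1, "Mao": 1, "Zara": 1, "Bismarck": 0}

for leader in ("Bismarck", "Shaka", "Gandhi"):
    print(leader, win_points(leader, pool))
```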

The stakes will thus get higher and higher as the tournament progresses, with wins in the top pools worth more points.

Now, this scoring system is meant only for League ranking purposes.
Should it end up as a decent leader rating system in its own right, that would be through serendipity, not through design.

As a side note: a point I'd already mentioned for my first league is that the total number of wins is a distorted number here.
The best leaders should be grouped at the top, where each game will offer only one of them a win, while the worst leaders will tend to gravitate to the bottom pools, where each game will still have one of them get a win.
So, in terms of wins, the best leaders will underperform, while the worst leaders will overperform.

Ratings

Leader ratings will instead be achieved through their "elo" ratings.
See here for a short explanation of the system used (and why those "elos" are actually no longer elos. :D).
Spoiler "Wins" determination :

The issue here is that such a system is based on pair-wise comparisons.
Which the Horatius League allowed, but which isn't the case for a game featuring 6 or 7 different leaders and a single winner.
My previous attempts had tried to fill in the gaps with a comparison system based on survival and endgame scoring, but I've come to the conclusion it was both flawed and pointless.

So let's consider a game featuring Huayna Capac, Shaka, and Pacal (and 3 other leaders we'll ignore).
In that game, Shaka launches an early attack on HC and ends up wiping him out.
But he can't snowball fast and hard enough: Pacal wins by Space.

There are imo two equally valid ways to score such a game.

The first way is to say: the game featured HC, Shaka, Pacal. Pacal won. So Pacal gets a win versus both Shaka and HC.
This has the advantages of being simple to track, and of yielding ratings with a simple meaning: a better rating means a better chance of winning the game.
But it doesn't take into account the actual game narrative. In our hypothetical game, there was no interaction between HC and Pacal, so why should Pacal get a win vs HC?

So the second way of scoring that game would be: Shaka beat HC by killing him, so Shaka gets a win vs HC while Pacal gets a win vs Shaka: he beat him by winning the game.
And my preference went to that second option (again, purely a matter of preference, as I don't think one view is "better" than the other).
Such a system would adequately take into account each game's narrative, but would make the ratings a fuzzier notion: a better rating means a better chance of "beating" an opponent, which may or may not imply winning the game.

The real issue with that second option, though, lies in the data collection process.
Sullla has a very simple and effective system for tracking kills: whoever gets the last city of an opponent gets the kill credit.
That's a perfectly OK system for AI Survivor's purposes, but here it just wouldn't do. Apart from the infamous "kill steals" (and boy, do they happen a lot!), there are shared kills to consider, "relayed"-kills (where a civ is crippled by one civ, then killed off by another), and our considered system would introduce on top "win steals" (where a civ is in the process of eliminating an opponent when another civ achieves a win condition, and gets the win vs that opponent).
So I needed a better system to track kills.
And found one.
I would go for "partial wins" based on city captures (all kinds: standard military captures, but also culture flips and cities given away in peace treaties).
Trouble was... that would need automation. Keeping track manually of each city exchange would mean that recording the results of a game would take longer than running the game!
I was in luck, though, as someone had just posted a savegame parsing API. Except that... looking at what a savegame contained, I realized I would be missing some data.
So I'd need to mod the game to get that data.
I'm neither familiar with C++/Python nor with the Civ4 codebase, so while I believe I would have got there in the end, it would have taken a lot of time, and let's be honest, more effort than I was willing to put in.
So in the end, and in order to get things moving, I've dropped the idea of going for the second approach.

So my ratings will be based solely on wins.
Pacal gets his win vs HC, after all.

Long story short: the underlying scoring system will be based on wins only. Whoever wins the game will score a "win" versus every other player.
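
To make the difference between the two approaches concrete, here is a small Python sketch that turns a single game into pairwise "wins". It says nothing about how the ratings themselves are computed afterwards. The function names and the `kills` mapping (eliminated leader -> leader credited with the kill) are assumptions made for this illustration only.

```python
def pairwise_wins_simple(players, winner):
    """Retained approach: the game winner beats every other player."""
    return [(winner, p) for p in players if p != winner]


def pairwise_wins_with_kills(players, winner, kills):
    """Dropped approach: a killer beats the leader it eliminated, and the
    game winner beats every leader still alive at the end.

    `kills` is a hypothetical input mapping each eliminated leader to the
    leader credited with the elimination.
    """
    pairs = [(killer, victim) for victim, killer in kills.items()]
    pairs += [(winner, p) for p in players if p != winner and p not in kills]
    return pairs


players = ["Huayna Capac", "Shaka", "Pacal"]  # the 3 other leaders are ignored
simple = pairwise_wins_simple(players, "Pacal")
narrative = pairwise_wins_with_kills(players, "Pacal", {"Huayna Capac": "Shaka"})
```

In the first list Pacal gets a win vs both HC and Shaka; in the second, Shaka gets the win vs HC and Pacal only gets a win vs Shaka.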
 
Series 1, Map Selection

As already mentioned elsewhere, the maps are slightly edited versions of Sullla's maps:
  • The ice has been almost totally removed.
  • "Illegal" BFC features have been fixed: desert/tundra tiles turned into plains, jungles removed, mountains turned into hills. In some rare cases fresh water access had to be provided, usually by slightly altering the course of a nearby river.
  • Map errors have been fixed (those I spotted anyway): riverside oases, non-river floodplains, wrong river endpoints, coastal ocean tiles, etc.
  • On the other hand, although the temptation was great, I haven't fixed the wrong resource clusters left by Sullla when moving capitals around.

Spoiler Pool 1 :

Pool1_S3_G1.jpg

Season 3, Opening Round Game 1

Spoiler Pool 2 :

Pool2_S5_PO1.jpg

Season 5, Playoff 1

Spoiler Pool 3 :

Pool3_S3_PO2.jpg

Season 3, Playoff 2

Spoiler Pool 4 :

Pool4_S5_G5.jpg

Season 5, Opening Round Game 5

Spoiler Pool 5 :

Pool5_S5_WDC.jpg

Season 5, Wildcard Game

Spoiler Pool 6 :

Pool6_S8_G4.jpg

Season 8, Opening Round Game 4

Spoiler Pool 7 :

Pool7_S6_G6.jpg

Season 6, Opening Round Game 6

Spoiler Pool 8 :

Pool8_S2_G6.jpg

Season 2, Opening Round Game 6
 
Series 1, Rounds 1-5 results

Spoiler Round 1 :

S1_R1_P1-4_Games.png

S1_R1_P5-8_Games.png

This format somewhat differs from what I usually do, in the sense that individual game results are not provided (victory type and date, and how each leader performed in each game - although technically that last information is present through the diagonals of the result tables).
Instead, you get the two important data points: how each leader performed (rows), and how strong each starting position was (columns).


Leaders performance in the round and next round pool assignation:
S1_R1_Results.png

Leaderboard:
S1_R1_Leaderboard.png


Ignore the elo ratings for now: I've only included them because they're part of the template.
It'll take a few rounds before they stop wildly fluctuating, and probably a coupla complete series at least before they start being truly meaningful.

The "EloK" value is the elo as I had intended to calculate it (you get "wins" against opponents you eliminate and against the survivors when you win a game), but using the flawed (for this purpose) Sullla method for kill attribution.
Since the underlying data is unreliable, this is an approximation at best, but it'll be interesting to see where that approximation leads...

Spoiler Round 2 :

S1_R2_P1-4_Games.png

S1_R2_P5-8_Games.png


S1_R2_Results.png

S1_R2_Leaderboard.png


Spoiler Round 3 :

S1_R3_P1-4_Games.png

S1_R3_P5-8_Games.png


S1_R3_Results.png

S1_R3_Leaderboard.png


Spoiler Round 4 :

S1_R4_P1-4_Games.png

S1_R4_P5-8_Games.png


S1_R4_Results.png

S1_R4_Leaderboard.png


Spoiler Round 5 :

S1_R5_P1-4_Games.png

S1_R5_P5-8_Games.png


S1_R5_Results.png

S1_R5_Leaderboard.png

 
Series 1, Rounds 6-10 results

Spoiler Round 6 :

S1_R6_P1-4_Games.png

S1_R6_P5-8_Games.png


S1_R6_Results.png

S1_R6_Leaderboard.png


Spoiler Round 7 :

S1_R7_P1-4_Games.png

S1_R7_P5-8_Games.png


S1_R7_Results.png

S1_R7_Leaderboard.png


Spoiler Round 8 :

S1_R8_P1-4_Games.png

S1_R8_P5-8_Games.png


S1_R8_Results.png

S1_R8_Leaderboard.png


Spoiler Round 9 :

S1_R9_P1-4_Games.png

S1_R9_P5-8_Games.png


S1_R9_Results.png

S1_R9_Leaderboard.png


Spoiler Round 10 :

S1_R10_P1-4_Games.png

S1_R10_P5-8_Games.png


S1_R10_Results.png

Of course, the "next pool" column is a lie here, since this is the last round. ;)
I've left it in as an indicator of how "stable" the situation has become by the end of the series.

And this is the final leaderboard:
S1_R10_Leaderboard.png



Oh, look who ends up at the top. What a surprise! :rolleyes:

All in all, pretty much expected stuff.
A few mild surprises, though: Gandhi redeemed himself somewhat in the very last round, but he had a pretty bad time overall, as was the case for Hannibal. Joao and Sury seem to have overperformed. Both the Persian and Chinese leaders seem to have somehow swapped positions. Shaka may surprise some, but ever since he won my first "Jumbled Rumble", I have great expectations for him. :lol:

One thing which might bear watching: I seem to be getting a lot more early eliminations than I remember when running AH for instance.
It could be just an impression (as of the end of round 9, the median date for FTD was turn 157, which doesn't seem *that* wild), or it could be an effect of the "no barbs" setting: that the AIs should get into war mode earlier in the absence of barbarians to keep them busy after their initial expansion phase is something I would expect. That those early wars should be largely successful... not so much.
I guess I'll have to try and watch a few of those games to get an idea of what's really happening (if something's actually happening).

Oh, and the Golden Spear award goes to...
... De Gaulle !
He overtook Shaka in the very last round.
 
This doesn't seem to generate a lot of interest, so I think it's better if I skip the details (and in particular the round-to-round progression) from now on.
New reporting format in the next post.

I'll probably move the map data to a dedicated section at some point, for easier referencing (and maybe add some stuff like seeing whether there's a correlation between starting techs and performance on some starts, average finish date per map / per starting position, other stuff?).
I'm also planning on providing more stats / graphs at some point (for instance, people seem to really want to rate the various traits. Could also do that for starting techs, can-or-cannot-plot-at-pleased, peaceweight, tech preferences, etc.).

I'm away for the whole of next week, so further updates shall have to wait a bit anyway. ;)
 
Series 1 Results

Spoiler Results :

S1_Results.png


Spoiler Awards :

S1_Awards.png


Spoiler Map Data :

Spoiler Pool 1 Map (S3 Game 1) :

Pool1_S3_G1_Stats.png


Spoiler Pool 2 Map (S5 Playoff 1) :

Pool2_S5_PO1_Stats.png


Spoiler Pool 3 Map (S3 Playoff 2) :

Pool3_S3_PO2_Stats.png


Spoiler Pool 4 Map (S5 Game 5) :

Pool4_S5_G5_Stats.png


Spoiler Pool 5 Map (S5 Wildcard) :

Pool5_S5_WDC_Stats.png


Spoiler Pool 6 Map (S6 Game 6) :

Pool6_S6_G6_Stats.png


Spoiler Pool 7 Map (S8 Game 4) :

Pool7_S8_G4_Stats.png


Spoiler Pool 8 Map (S2 Game 6) :

Pool8_S2_G6_Stats.png


 

De Gaulle's a KILLER!
 
Yeah. :lol:
I guess his propensity for joining dogpiles makes him good at racking up kill credits...
 
Series 2 Results

Spoiler Results :

S2_Results.png


At the end of round 9, HC was only in 4th position (with Suleiman at the top!), so I was getting ready to talk about how he seemed to struggle on that map (he did) and about how even in a format with a lot of games played, he wasn't invincible.
And then... he won three games in the last round, eking out a first place, however narrow! :shake:

On the watchlist:
Cathy and Sury, performing way better than my previous experiments had led me to expect (although in Cathy's case, I had the first inklings with my "AH Rankings" runs).
Qin, for the opposite reason: so far he'd always proved a strong leader, but he seems to be struggling a lot here...

Spoiler Awards :

S2_Awards.png


Spoiler Map Data :

Spoiler Pool 1 Map (S2 Game 7) :

Pool1_S2_G7_Stats.png


Spoiler Pool 2 Map (S7 Game 7) :

Pool2_S7_G7_Stats.png


Spoiler Pool 3 Map (S4 Game 3) :

Pool3_S4_G3_Stats.png


Spoiler Pool 4 Map (S6 Playoff 1) :

Pool4_S6_PO1_Stats.png


Spoiler Pool 5 Map (S6 Game 8) :

Pool5_S6_G8_Stats.png


Spoiler Pool 6 Map (S4 Game 6) :

Pool6_S4_G6_Stats.png


Spoiler Pool 7 Map (S3 Game 8) :

Pool7_S3_G8_Stats.png


Spoiler Pool 8 Map (S7 Game 2) :

Pool8_S7_G2_Stats.png


 
Another fresh start but this time to stay? :lol:
It seems it takes 520 games to complete one series. After 7 series are completed, this will be the biggest data set, so your results will be reliable soon.
Whatever scoring system you decide on, always keep it simple and easy to follow. I think just wins is acceptable.
 
Yup, this time no reason not to see it through.
I'm happy with the format, and since the games are run in the background, no burnout risk.

As for the score, as I've mentioned, the League score is mainly to determine each Series winner without having to devise a tie-breaker system. But that's not important. In a way, that's mainly for my enjoyment: I'm not just performing data collection, there's an actual tournament there, where I can "root" for some leaders, and wince when they fail hard, or go "what ???" when a bad leader has a good round. :lol:

The main "scoring" system is the Elo rating.
Calculations are complex, but I'm doing them for you.
Reading it is easy: bigger = better. :p

Wins are not a very good indicator here, though, as the system is rigged against the good leaders and in favour of the poor-performing leaders: winning is important, but what's more important is against whom the win is.
And that is translated into the Elo rating.
 