AI Jumbled Rumble : another set of AI Survivor Alternate Histories, with a twist

Thrasybulos

As I've already said, Sullla's AI Survivor (btw, let me add mine to the thanks for running that) is a concept which mixes several aspects:
- It's an AI competition, aiming at ultimately providing a ranking of the AIs.
- It's a prediction contest.
- It's a Live Show.

The Alternate Histories for Sullla's AI Survivor series provide information aiming at showing the different ways a game could have gone, and at determining how predictable the outcome actually was.
In other words, they're about the prediction contest : they are replays of the exact same setups, leaving essentially unanswered the question of whether the results should be explained in terms of AI strength or in terms of the context of the game (starting position, neighbours).

That led me to the notion of providing another set of Alternate Histories, but this time with a focus on the first aspect: ranking the AIs.
In order to achieve that, I'll be replaying the maps... while shuffling around the AIs on the map.
As with the Alternate Histories, I'll be playing 20 iterations of each game (same map, same AIs), but with a different permutation of the starting positions for each game.
That should provide an idea of the relative strength of each leader in that game's subset, regardless of its starting position and specific set of neighbours.

It'll also fulfill a secondary objective: determining how balanced (or unbalanced) each map was, and thus how much that influenced the game outcome.
Spoiler :
Now, if my maths are not too rusty, there are 6! = 720 possible permutations for a 6-player game, and 7 times as many for a 7-player game.
So 20 games is only a small subset of the possibilities, but I'll try to make it as fair as possible.
Doing 18 reruns of each 6-player game and 21 reruns of each 7-player game would allow each leader to play each position exactly 3 times, but that would cause issues with the first objective, so I'll leave it at 20 runs and do my best.
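
As a quick sanity check of the arithmetic above, here is a small Python sketch. The helper name `balanced_run_counts` is made up for this illustration and is not part of any tool used for the league.

```python
from math import factorial

def balanced_run_counts(players, max_runs):
    """Run counts (up to max_runs) at which every leader can play
    every starting position the same number of times."""
    return list(range(players, max_runs + 1, players))

for players in (6, 7):
    print(players, "players:",
          factorial(players), "possible permutations;",
          "balanced run counts:", balanced_run_counts(players, 24))
```

With 20 runs, neither a 6-player nor a 7-player game can give every leader the same number of games in each position, which is why the spread of positions can only be approximately fair.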

Instead of Sullla's Power Rating (5 pts for a win / 2 pts for a runner-up result / 1 pt per kill), I'll use an Elo rating to rank the AIs.
Spoiler :
Now, of course, the Elo System is far from a perfect fit:
- it is designed for 1v1 games, these are FFA games
- it assumes the outcome of a game depends solely on the strengths of the players involved, while here we know that external/random factors (starting position, opponents' peaceweights, religion spreads, etc.) play a big role.

For the second point, well, let's just hope that playing a lot of games with different contexts will somewhat make those external and random elements cancel each other.
As for the first point, a lot of multiplayer game system designers have provided their own answers to that. I looked at what they did, and came up with the implementation described below.

Why not simply use Sullla's system ?
Having another system will make it possible to compare the two (I'll still be keeping track of the Power Rating). Also, there are potential shortcomings with the Power Rating (not taking the opponents' strength into account being the main one, but it's also very punishing for opening round eliminations, and then there are the infamous kill steals galore...).

Spoiler Elo Implementation :

Each match (or "run": one of the 20 replays of an AI Survivor game) is scored like a mini closed tournament, where each participant gets a result (win = 1, loss = 0, or a point split for an intermediate result) vs each other participant.

The Elo system provides the expected result between two opponents, based on their rating.
There is an arbitrary coefficient involved, set at 400 for Chess for instance. This is the "sensitivity" of the system. While I expect to confirm that some AIs perform better than others, I don't expect the performance level differences to be drastic. So in order to have a wider range of values, I used 800 for that coefficient here.
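
The post doesn't spell out the expected-result formula, so the following is only a sketch assuming the standard logistic Elo curve, with the coefficient exposed as a `scale` parameter (that name is mine). It shows how going from 400 to 800 flattens the curve: the same 200-point gap translates into a smaller expected advantage.

```python
def expected_score(rating_a, rating_b, scale=800):
    """Expected result of A vs B under the usual Elo logistic curve.
    scale=400 is the Chess convention, 800 the value chosen here."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

print(round(expected_score(1800, 1600, scale=400), 3))  # Chess-style sensitivity
print(round(expected_score(1800, 1600, scale=800), 3))  # flatter curve used here
```

Put differently, with 800 a given difference in results needs a wider rating gap to be "explained", which is what spreads the ratings over a wider range.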

To determine whether a player's rating needs to be adjusted, the expected result is compared with the actual result of the game... which we need to calculate.

Actual result (score) calculation:
The winner of a match gets a win (1-0) vs each of its opponents.
The other leaders which don't get eliminated get a win vs each of the eliminated civs.

That leaves us with scoring the survivors amongst themselves, and the eliminated civs amongst themselves.
I could have simply gone for a draw (0.5-0.5), and honestly, that probably would have been enough.
But since the point split doesn't have to be 50/50, I thought it would be nice to distinguish between a civ which made it to the end with a couple of tundra cities and a civ which became an overwhelming juggernaut well on its way to Domination, only to lose to a successful Culture attempt by a much smaller civ.
While not perfect, the in-game score could certainly serve for that purpose.
So say two civs survive in addition to the winner, one with 1,000 points and the other with 3,000 points: the result point will be split 0.25-0.75 (1,000/(1,000+3,000) vs 3,000/(1,000+3,000)), which seems fair enough.
Now how about the dead civs? Ideally we'd use something like the highest in-game score they'd reached, but getting that information would require a python mod or a tool to extract it from the replay file... way too much effort. What we have is the turn of elimination. But the extreme values tend to be too close (a T100 elimination vs a T300 elimination would result in a 0.25-0.75 split, while it should be more pronounced). So I basically squared the values before the comparison(*), which yields results closer to what I would expect.

(*) The exact calculation sets a "turn score" by adding the turn number each turn to the score.
So the turn score = 1 + 2 + 3 + ... + Turn of elimination = (T Elim)*(T Elim + 1)/2.

So:
- Game winner gets a perfect score.
- Survivors get a win vs eliminated civs, score with other survivors based on their in-game score.
- Eliminated civs score amongst themselves based on their turn of elimination.
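
The three rules above can be sketched in Python as follows. This is an illustration rather than the Excel macro itself: the function names and the leader labels ("W", "A", "B", "X", "Y") are made up, and the figures are the 1,000 vs 3,000 points and T100 vs T300 examples given earlier.

```python
def turn_score(turn):
    """1 + 2 + ... + turn of elimination."""
    return turn * (turn + 1) // 2

def split(mine, theirs):
    """Share of the result point earned against one opponent."""
    return mine / (mine + theirs)

def score_run(winner, survivors, eliminated):
    """survivors: {leader: in-game score}, not including the winner.
    eliminated: {leader: turn of elimination}.
    Returns each leader's total score for the run."""
    totals = {winner: float(len(survivors) + len(eliminated))}
    for name, points in survivors.items():
        total = float(len(eliminated))      # a win vs each eliminated civ
        for other, other_points in survivors.items():
            if other != name:
                total += split(points, other_points)
        totals[name] = total                # nothing gained vs the winner
    for name, turn in eliminated.items():
        totals[name] = sum(
            split(turn_score(turn), turn_score(other_turn))
            for other, other_turn in eliminated.items() if other != name)
    return totals

totals = score_run("W", {"A": 1000, "B": 3000}, {"X": 100, "Y": 300})
for leader, total in totals.items():
    print(leader, round(total, 2))
```

In this example the T100 vs T300 pair splits roughly 0.10-0.90 instead of the 0.25-0.75 a plain turn ratio would give, and the totals of a 5-player run add up to 10, one point per pairing.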


Elo progression:
A player's Elo rating is adjusted according to the difference between its actual score vs its expected score.

Every AI will have an initial Elo rating = 1600.
Winning a 6-player game means a score of 5, vs an expected score of 2.5.
So in that case, it's an Elo gain of (5 - 2.5) * K.
"K" is an arbitrary factor.
The bigger it is, the faster a player reaches its "true" rating, but the more volatile the system is.
Since there are 6-player games (5 opponents) and 7-player games (6 opponents), I had initially gone for K=6 in 6-player games, and K=5 for 7-player games, yielding an identical value (total K = 30 in both cases).

After the season, and in light of the results, I made the following adjustments:
- I halved the value for the opening round and wildcard pool games.
- I used a quarter of the value for the playoffs games.
- I used a fifth of the value for the championship.

Also, the new Elo for an AI is bulk-calculated after the 20 runs of a game, and not adjusted after each run.
I initially used the latter method (which has the advantage of limiting dramatic gains or losses), but that led to an AI with a better total score having a lower rating than another which had had a better late run. And while that would make sense for human players or true AIs, it makes little sense here, where the "AIs" are just a bunch of static parameters with no capacity for learning whatsoever.
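
Here is a rough Python sketch of that bulk adjustment. It assumes the standard logistic expected score with the 800 coefficient, and that "bulk" means every run is compared against the ratings held before the game; the function names and the data layout are mine, not those of the Excel macro.

```python
def expected_score(rating_a, rating_b, scale=800):
    """Expected result of A vs B (standard Elo curve, assumed here)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def bulk_update(ratings, run_totals, k):
    """ratings: {leader: Elo before the game}.
    run_totals: one {leader: actual score} dict per run.
    All runs use the pre-game ratings, so run order doesn't matter."""
    updated = {}
    for name, rating in ratings.items():
        expected = sum(expected_score(rating, other_rating)
                       for other, other_rating in ratings.items()
                       if other != name)
        gain = sum(run[name] - expected for run in run_totals)
        updated[name] = rating + k * gain
    return updated

# One run of a 6-player game between unrated AIs, K = 6:
ratings = {name: 1600 for name in "ABCDEF"}
run = {"A": 5.0, "B": 2.0, "C": 2.0, "D": 2.0, "E": 2.0, "F": 2.0}
print(bulk_update(ratings, [run], k=6))
```

The winner's gain matches the (5 - 2.5) * K figure given above. The reduced values for the different rounds would simply be passed in as a smaller `k`.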

Tournament Format:

Season 1 will be based on Season 4 of AI Survivor.

I'll be following AI Survivor's format, with some changes:
- No change to the Opening Round.
- The Playoffs will have different participants (those who score best here, as opposed to those which made it in the live games).
- The Championship will have different participants and use a different map.
- The biggest change concerns the Wildcard game: for Season 1, it's replaced with a Wildcard League. The 36 AIs which don't make it to the playoffs are split into two pools of 18 AIs. For each pool, three 6-player games are played, and the best two from each game move on to a 6-player Pool Finals. The winner of that game gets a wildcard to the Playoffs.
The idea is to get more play-time for the weaker AIs, and to send two "Elo-bags" to the playoffs.

Tournament Rules:

I'll use the same settings and rules as AI Survivor, with two exceptions.
- The big one: No UN.
Diplomatic Victory is disabled.
Spoiler Rationale :

Four reasons :
1- This is an attempt at ranking the AIs, and the UN is a feature they can't use. If there is any kind of logic programmed there, it's virtually indistinguishable from an RNG call.
The AI won't call for the UN victory when it could win, it'll keep calling for the vote when it cannot win. It won't pursue any kind of gameplan involving a Diplo win. And it'll randomly put resolutions up for voting, and cast its vote randomly on them (voting "No" for Free Speech when running the Culture Slider and still in Bureaucracy?? Voting "No" to end a war where it's getting slaughtered??).
2- In the same vein, the UN shouldn't help make up for an AI's bad decisions. If it's choking under unhealthiness because it researched and built every source of pollution while skipping Biology and Medicine, it shouldn't get bailed out by the UN.
3- What's that nonsense about UN peacekeeping and banning nukes? We want blood!
4- Technical reason: on my ancient PC, pop-up dialog calls at the end of the AIs' turn processing slow the game dramatically. I think a war has been declared, and it's just another Open Borders request... or a UN vote call, then its result.
No UN ends up speeding up games which run late by a noticeable margin.

- Enforced peace when an AI is at war with an opponent reduced to a single city hidden behind another civ with closed borders.
Spoiler Rationale :

This is a bug, plain and simple, and it can completely alter the outcome of a game.
The AI won't sign peace, it won't plot another war, it won't launch an amphibious assault (same landmass): it gets stuck.
That situation had me consider disabling barbs altogether (not the only reason, but it happens 90-95% of the time because of an early barb city capture behind enemy lines).
In the end, I left them in because I think that no barbs could significantly alter the AI's performance.
And I got lucky: the situation didn't arise as frequently as with my standard Alternate Histories runs. But then a frankly ludicrous instance happened (with *two* AIs locked in war with a single-city AI), and I decided to broker peace through the worldbuilder, and to make it a rule henceforth.

"Test Protocol":

- Each game is run from the worldbuilder save (for practical reasons, and to get new peaceweights each time).
- Permutations are performed by changing the team number for the AIs, not by moving their units around. That means turn order is tied to each starting position, not to each AI.
- No Great Spy infiltrations to unlock demographics. Ok, they do have an impact (I seem to observe far fewer instances of an AI completely tanking its early eco - guess being free of that 20% spending on Espionage helps; conversely, those AIs which choose to spend on Espionage target their actual opponents), but probably nothing major. The main reason is that they're a hassle: since I'm running the game from the wb file, I would have to re-add them each time.
They serve two purposes:
- Contact with the AIs: done through the wb file instead.
- Enabling graphs: done through a simple change to CIV4EspionageMissionInfo.xml, assigning 0 to the cost of the see demographics mission. That permanently enables them as long as you have at least one EP spent vs a civ. So I just run the espionage slider for the first turn of the game, and I'm done.

For each game, I'll also provide an archive containing:
- An Excel file with the detailed game results (the macro is used for the Elo calculations).
- A second Excel file with graphs about the game.
- Minimap pictures of the game start and end.
- The worldbuilder files used for the 20 runs.
- The replay files for each run.
 
The first game saw a surprise win from Isabella and an even more surprising Roosevelt making it to the runner-up spot.
The AH were more in line with the community's expectations, with Cyrus as the top leader, and Cathy right behind.

I expected roughly the same here, but Qin's starting position was clearly atrocious, so I was curious as to how he'd fare.

AI_League_S1_Opening_G1_Results.jpg


Cyrus did come on top, but Cathy was outperformed by Qin.
I was at first at a loss at how to explain how Cathy could perform worse than in the Alternate Histories, especially since it turns out her starting position was one of the worst... but I believe Qin to be the explanation: when not stuck with that horrible start, he proved a tough competitor who could prevail over her.
Qin is a balanced leader, an economy-focussed leader who doesn't neglect his military. His performance was rarely impressive, but usually solid.
Cathy seems more erratic: she could be an unstoppable force, especially when starting from Cyrus's start (A), or face a very early elimination as in her first game here.

AI_League_S1_Opening_G1_Leaders.jpg


Elizabeth and Roosevelt were the marked AIs here, owing to their peaceweight, and to the fact that their would-be peaceweight ally, Isabella, would often run a different religion, and we know what that means.

On the whole, no AI proved dominant here.

AI_League_S1_Opening_G1_Positions.jpg

...and the explanation may stem from the map imbalance. Cyrus's initial start (position A) accounts for half the wins there!
Whichever AI started there would become a powerhouse unless it was severely dogpiled. Even Elizabeth and Roosevelt managed to win from that spot!

Isabella's initial position (D) and Elizabeth's (C) provided a fighting chance, but they were still far inferior. It should be noted, though, that Isabella actually performed better from her initial spot than from Cyrus's, where her zealotry got her into more trouble.
As expected, Qin's (E) was the very worst spot to start from. The Chinese leader was the only one to almost get a win from there, but instead set a record that I'm 100% sure won't be beaten: in a crazy game which ended with a Time victory, he was eliminated on turn 495!
Roosevelt's start (F) proved better than it looked: if not for winning, at least for survival.
Combine that with Isabella's preference for start D, and the live game's outcome, if not the most likely, at least starts making sense.


AI_League_S1_Opening_G1_Summary.jpg


Cyrus and Qin move on to the playoffs, the rest will play in the Wildcard League.
 

Attachments

  • 01 - Opening Round - Game 1.zip
    3.9 MB · Views: 17
- Permutations are performed by changing the team number for the AIs, not by moving their units around. That means turn order is tied to each starting position, not to each AI.
Do all AIs start with the exact same units (Scouts/Warriors/Quechuas/Workers/Fast Workers)?
- No Great Spies.
Why? This one confuses me and seems super random.

It's also not entirely clear to me if/how you account for neighbours (Gandhi next to Montezuma or Gandhi next to Mansa Musa are very very different games for Gandhi).
 
Roosy has not just tundra but also ice starting 3 tiles away from his capital :lol:
(first time I saw the original starting position)

Worst land (if we can even call it that) ever... Sure, he's not a strong AI leader in this setup anyway, but come on, would Sullla start fan favorites like HC or Mansa there?
 
@need my speed
If an AI is entitled to special units, I try and make sure it gets them (I did mess up a coupla times, but mostly got it right). For instance if Gandhi was on Team 2, and plays the next game as Team 4, I edit the worker for Team 4 to make it a Fast Worker, and demote Team 2's Fast Worker to a mere Worker.

By "No Great Spies" I mean that I don't infiltrate Observer Civ Great Spies in each AI Capital at the start of the game as Sullla does. Not that Great Spies as a unit are removed from the game. :)
(I've reworded it, hoping it's clearer now).

I can't ensure total fairness regarding positions played and neighbours. As I said, I'd need to play all permutations for that, and well... No.
I do keep track of the permutations that were played, so that data is available to qualify the results. I also try to have permutations as varied as possible (eg, no runs differing by a single swap).
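
For what it's worth, the "no runs differing by a single swap" check is easy to automate. A minimal Python sketch, where a permutation is the list of leaders in position order (the function names and the sample line-ups are purely illustrative, not the actual runs):

```python
def differs_by_single_swap(perm_a, perm_b):
    """True if the two line-ups are identical except for two leaders
    who traded places."""
    changed = [i for i, (a, b) in enumerate(zip(perm_a, perm_b)) if a != b]
    return len(changed) == 2

def too_similar(candidate, played):
    """True if the candidate repeats, or is one swap away from,
    a permutation that has already been played."""
    return any(candidate == old or differs_by_single_swap(candidate, old)
               for old in played)

played = [["Cyrus", "Cathy", "Elizabeth", "Isabella", "Qin", "Roosevelt"]]
print(too_similar(["Qin", "Cathy", "Elizabeth", "Isabella", "Cyrus", "Roosevelt"], played))
print(too_similar(["Qin", "Cyrus", "Cathy", "Roosevelt", "Isabella", "Elizabeth"], played))
```

Since both lists hold the same leaders, two line-ups that differ in exactly two positions can only be a swap of those two leaders.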

@Fippy
Qin's was worse : completely boxed in, in jungle. Not helped either by the fact that every AI sent its second settler to the SE coastal spot. Dry Rice and jungle. Couldn't figure that one out. Why that spot??
And have a look at Mao's start in Playoff 2. :lol:
 
Ideally we'd use something like the highest in-game score they'd reached, but getting that information would require a python mod or a tool to extract it from the replay file... way too much effort.
Civ4ScreenShot0004.jpg

My utopian ideal would be to give them points based on the area under their score curve, maybe except for winners (especially culture winners). Sadly that isn't really possible unless I roughly try to draw the curves myself in AutoCAD.

Your Excel file is very professional and detailed, so let me try to understand it.
Your first game resulted in:

Domination-Turn 323
Cyrus 7665 points
Isabella 4656 points
Elizabeth Turn 312
Qin Shi Huang Turn 252
Roosevelt Turn 237
Catherine Turn 136


Now, the winner, Cyrus, gets a clean 5.00 points.
Isabella got 4.00 points simply by being the only survivor, 1 point off.
the turn score = 1 + 2 + 3 + ... + Turn of elimination = (T Elim)*(T Elim + 1)/2
I assume no eliminated leader can get more than 3.00 points, because survivors win against eliminated leaders 1-0.
From there, how exactly do you calculate how many points they should get?
I mean, in terms of how much Elizabeth should get compared to Catherine, 312 / 136 = 2.29 times more.
Or, going by turn scores, 48,828 / 9,316 = 5.24 times more points for Elizabeth than whatever Cathy gets. But here I see Elizabeth got 3.25 times more points than Catherine. I mean, I could just take the game end date (turn 323) as a clean 3.00 points and give each of them a share proportional to that. I really needed something simpler than this :D
Here I see:
Elizabeth got 2.08 points. She died on turn 312, so 312 x 313 / 2 = 48,828
Qin Shi Huang got 1.69 points. He died on turn 252, so 252 x 253 / 2 = 31,878
Roosevelt got 1.59 points. The sum gives 28,203
Catherine got 0.64 points. That gives her 9,316

Clearly there is more to work through: the games with more than 2 survivors, and the Elo adjustments, which are a different matter entirely. I will try to understand them too.
 
I'll use the same settings and rules as AI Survivor, with two exceptions.
- The big one: No UN.
Diplomatic Victory is disabled.
I understand your choice in the context of these tests while also wondering what to do about Diplomatic Victory in the general sense of AI Survivor. I like having it in the game not least of all because it rewards good relations, can resolve otherwise intractable late-game messes, and elicits terrific odds from Fippy.

One possible solution with added victory micro (as we already see when checking for Domination, for instance) would be to check for votes whenever the UN meets. This would have the benefit of rewarding the AI for friendly relations even if they are unaware of how to leverage that into victory. If the AI is unable to navigate the UN in its own interests, then I see it as less of an issue to award Diplomatic Victory in this manner. After all, not all games result in diplomatic relations that are compatible with victory. An issue I have observed, however, is where AI vote for Diplomatic Victory with no obvious friendly attitude.
 
There is software that will generate a random latin square (e.g. 6 permutations where each leader occurs in each position once). The random part is important because some latin square generators will create squares with a relatively regular structure (e.g. players 1 and 2 would move in each iteration but always be neighbors in terms of starting position), which you want to avoid. It also allows you e.g. to generate 3 random latin squares for 18 iterations.
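
If no such software is at hand, here is a minimal Python sketch of the idea. It builds each square row by row, redrawing random permutations until one fits, which is fast enough for 6 or 7 leaders. It is not guaranteed to pick every possible Latin square with equal probability, and the function names are made up for this example.

```python
import random

def random_latin_square(n, rng):
    """n rows, each a permutation of 0..n-1, with no value repeated
    in any column. A partial square can always be completed, so the
    redraw loop is guaranteed to find a fitting row eventually."""
    rows = []
    while len(rows) < n:
        candidate = list(range(n))
        rng.shuffle(candidate)
        if all(candidate[col] != row[col] for row in rows for col in range(n)):
            rows.append(candidate)
    return rows

def schedule(leaders, squares, seed=0):
    """One line-up per run: schedule[run][position] is a leader.
    Each square gives every leader every position exactly once."""
    rng = random.Random(seed)
    runs = []
    for _ in range(squares):
        for row in random_latin_square(len(leaders), rng):
            runs.append([leaders[i] for i in row])
    return runs

for line_up in schedule(["L1", "L2", "L3", "L4", "L5", "L6"], squares=3):
    print(line_up)
```

Three squares give 18 runs for a 6-player map, with each leader starting from each position exactly three times.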

I understand the point about 6 vs. 7 player maps, but given that you're using ELO does it matter if it's 20 iterations each time? In fact, given that each iteration is treated as a mini round robin tournament for ELO purposes, then it's like if each AI is playing 6 "games" on a 7-player map vs. 5 in a 6-player map. Therefore running for example 24 iterations of the 6-player maps (120 ELO duels for each Civ) is close to 21 iterations of the 7-player maps (126 ELO duels for each Civ).
 
@Keler

Providing an expanded view of the score was (still is, but fell way down the list) on my TODO list.

Let me try and detail it here.
Let's take as an example game 12, which has 3 eliminated and 3 survivors.
Roosevelt, winner.
Cyrus, 2nd, lives with 2649 pts.
Isabella, 3rd, lives with 2535 pts.
Cathy, 4th, eliminated on turn 379, "turn score" = 379 * 380 / 2 = 72,010
Qin, 5th, eliminated on turn 287, turn score = 41,328
Elizabeth, last, eliminated on turn 129, turn score = 8,256

Cyrus gets vs Isabella 2649 / (2649 + 2535) = 0.51
Isabella gets the remainder 2535 / (2649 + 2535) = 1 - Cyrus's score = 0.49

Qin's score vs Elizabeth is 41,328 / (41,328 + 8,256) = 0.83
Elizabeth's is 1 - 0.83 = 0.17.
And so on...

Which gives :

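vs        | Roosevelt | Cyrus | Isabella | Cathy | Qin  | Elizabeth | Total
Roosevelt | XXX       | 1     | 1        | 1     | 1    | 1         | 5
Cyrus     | 0         | XXX   | 0.51     | 1     | 1    | 1         | 3.51
Isabella  | 0         | 0.49  | XXX      | 1     | 1    | 1         | 3.49
Cathy     | 0         | 0     | 0        | XXX   | 0.64 | 0.90      | 1.54
Qin       | 0         | 0     | 0        | 0.36  | XXX  | 0.83      | 1.19
Elizabeth | 0         | 0     | 0        | 0.10  | 0.17 | XXX       | 0.27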

I only display the last, total value, in the file.

But that's how the detailed score is calculated.

Hope that clears it. :)
 
@Saxo Grammaticus

I'm not suggesting that Sullla should remove the UN. That's part of the "Live Show" aspect, providing many a facepalm moment. :D
But for my purpose here, it's random noise.
Interesting idea about a "forced diplo victory"... simplest way to implement it would probably be through a mod which forces the victory vote as the first resolution after each secretary general election. But Sullla's very mod-reluctant...

@antimony

Good suggestions... Since there are only four 7-player games in the tournament format, I'm not sure I'd like to increase the game count to 24 for most games, but keeping the same idea I could on the contrary decrease it to 18 runs for 6-player games, and 15 runs for 7-player games (14 "fair" games where each AI gets to play in each position twice, then a "final" where those still in the race get assigned a "good" position). That would be exactly 90 Elo duels in each situation.
Food for thought... :thumbsup:
 
Game 2 was a Gandhi cultural win which also featured Willem at last emerging from the mediocrity pit he'd wallowed in so far, and a very early and humiliating Pacal defeat, courtesy of the Troll King himself.
The AH have revealed Willem to be the dominant AI on this map, with Louis a solid second, and Gandhi managing to pull in some wins.

For the shuffled replays, purely based on past performance and impressions, I expected Willem and Pacal to be the best performers here. But as Season 5 champion, could Mehmed also be a contender? And what of Louis (this was before this season's finale)? What would be Gandhi's win / FTD ratio? And could Victoria, when starting in a better spot, live up to her strengths?
So lots of questions...

AI_League_S1_Opening_G2_Results.jpg


... and no real surprise, in fact. Willem and Pacal did end up as the better AIs in this group.
It was also a group where peaceweight mattered: a standard 4 low peaceweights vs 3 high peaceweights where although the high peaceweights managed 5 wins, the field ended up dominated by the low peaceweights.

AI_League_S1_Opening_G2_Leaders.jpg


Gandhi confirmed his status as a feast or famine leader: he got 3 wins, and died early most of the rest of the time.
Victoria (a mild disappointment) and Wang Kon weren't able to achieve much.
Mehmed just couldn't compete with the much better techers out there.
Louis was generally solid, and a real beast when starting from his original position, winning all 3 such games.
Willem may have got lucky a few times by surviving when he should have been eliminated, but otherwise played the high risk, high reward game we've come to know him for.
Pacal... I remember thinking at the time: he's a strong leader, but a bad leader. Meaning his strong traits and preferences carry him along, but his usual decisions and gameplan tend to be anything but impressive.

When it appeared towards the end that Willem and Pacal would be in a tight race for the top spot, I "rigged" the last game so that both would start in a strong position.
And had that game been on the live stream, it would have been something! :wow:
In a nutshell: Willem did get to Rifling early, but then turned on the Culture slider... way too early. It seemed he'd thrown the game. Meanwhile, Pacal did deliver, and got a crushing tech and territory lead. His spaceship would land way before Willem achieved 3 Legendary cities, and that's if he didn't simply conquer his way to Domination before.
And then, somehow, Willem managed to drastically increase his culture output, and it appeared he would actually beat Pacal's space victory by about 5 turns!
Pacal declared on him. Rifles don't do too well against mechs and modern armors. Willem's outer cities quickly fell, and Pacal's main stack moved next to one of Willem's Legendary candidates.
Pacal signed peace. :wallbash:
Willem dropped the slider. :smoke:
And turned it back after a few turns. He won by Culture on the very turn before Pacal's ship was due to land!!


AI_League_S1_Opening_G2_Positions.jpg

As was the case for the previous map, this map wasn't balanced, with Louis's starting spot (C) accounting again for 50% of the wins.
It shows that it was a central position as it basically led to a win or die result.
The second strongest position was Willem's (G), with Mehmed's (D) decent as well.
The two worst starting positions were Pacal's (A) and Victoria's (E), although for different reasons. Pacal's was boxed in, with no Copper, and opponents on 3 sides. It has the highest elimination ratio as a result. Victoria's had a nice backline, although barb territory, but with no food and jungle-choked: it led to a very slow development.

I initially tried for this game a shuffling algorithm based on kills (basically killer swaps with victim). It was the most fun, and at least made kill steals interesting instead of annoying, but I had to drop the attempt quickly as it became apparent it couldn't lead to fair permutations. So if you're wondering why for instance Louis played his 3 games from position C so close together, that's why.

AI_League_S1_Opening_G2_Summary.jpg
 

Attachments

  • 02 - Opening Round - Game 2.zip
    3.8 MB · Views: 17
No matter what the settings are, low peaceweight leaders ALWAYS outperform high peaceweight leaders under equal enough conditions. The big surprise is how high peaceweight leaders like Gandhi and Mansa Musa overperformed in their live games when they really should not have, ending up as the best leaders in Sullla's ranking. That took an extreme amount of luck on their part. In fact, there are way too many live games where a 5% chance outcome happened, and the alternate history replays of those same games pretty much support that. And here, with shuffled starts where every leader gets to play 3 or 4 times from every capital, the high PW leaders' scores pretty much say it all. I almost never see Gandhi doing well in my single-player games either. I wouldn't consider Willem a low peaceweight leader; he should not be on good terms with Mehmed or Pacal. This is another evenly distributed PW game, just like game 1, where the high PW leaders proved unable to do anything. They aren't into conquering anything, and they are still techs away from Rifles when they die to a warmonger's grenadier rush, because their AIs are too busy building wonders and buildings. Their early game expansion can turn into a Pyramids and missionaries disaster too.

And as for the map, I didn't realise at the time that Victoria's start (E) would be that hopeless. Then again, I never see starts as terrible as Pacal's (A) when I generate maps on my computer. I also thought at first glance that Gandhi's (B) would do better than this.
 
High peaceweight leaders are outnumbered in the game, so the odds are stacked against them from the start. What seals the deal, I believe, is the fact that most of them cannot plot at Pleased. Add that to a usually low attack rating, and you've got a really bad combination for AI Survivor settings.
Mansa would be an exception: I find him to be fairly aggressive, and he can plot at Pleased. With his strong eco, that makes him a serious contender. At least for the Opening Round. The peaceweight situation in the Playoffs tends to make the situation pretty bad for him, barring extreme luck... which he seemed to have enjoyed on a few occasions indeed.
Here Gandhi got the expected 3 wins for 20 iterations of a 7-player game, so in that respect, his performance was simply average. But usually, he needs a high peaceweight field to perform well... which he got an undue number of times (Seasons 3, 5, 7 come to mind, and I might have forgotten some).

Outlier results are to be expected sometimes, but I must confess that this season (Season 7) seems to have been particularly rife with them.

Willem is peaceweight 4 (low end of "neutral"), Mehmed and Pacal are PW 2. Random modifiers at the start of the game can certainly widen the gap a lot, but a base gap of 2 isn't that bad. I'd say PW 0 leaders would be more of an issue for him.
 
Game 3's story was a simple one, whether for the live game or the AH: a complete and overwhelming domination by Julius Caesar.
His starting position seemed particularly good, and especially suited to his traits. So could he achieve similar results from different starts?

AI_League_S1_Opening_G3_Results.jpg


Well... No.
Caesar debunked ? Possibly. He didn't perform poorly (he does finish second after all), but he certainly didn't live up to expectations.

This is a game which got me worried on two accounts.
First, there's always the possibility that this experiment yields a result we'd rather not see: all AIs actually perform about the same, and it's all about the game context.
And it surely felt that way here: until about run 10 or so, all 6 AIs had about the same score, and each run's result seemed dictated by the starting positions.
Then some started to pull ahead, and some to fall behind, and that gave rise to a second concern.
Look at the final results: Suleiman got 5 wins. Caesar got 5 wins. Joao got 5 wins. And Shaka, with only 3 wins, finishes 1st.
Is my assessment method just wrong? Note that Sullla's gets to the same result too.
My "score" heavily emphasizes survival (it figures, for "AI Survivor"). Sullla's less so, but it rewards kills. And Shaka's typical game involved conquering 3 opponents while another AI conquered only one but seriously out-teched the Zulus along the way. Then came the time of the last war for the Zulus, the one that would ensure their Domination... and it went poorly. Massed cavs are impressive... until they face mechs, modern armors, and nukes. Boy, did Shaka get to glow in the dark game after game! But still, that approach works here: as the sole survivor besides the game winner, he would score high. And score high on Sullla's Power Rating too, through his numerous kills.
But what if a similar scenario happened all throughout? Would it be "right" for the ultimate champion to be decided not through getting more wins, but through dying less often?
If we look at sports or other competitions... the answer is "yes". A player or team which consistently finished second would certainly be ranked World n°1 (unless they always lost in finals to the same opponent).
But let's hope it doesn't happen: however "right" it might be, it still wouldn't feel very satisfying.

AI_League_S1_Opening_G3_Leaders.jpg

So, in the end, Shaka came on top through outliving his victims (by definition).
For a while, Suleiman looked as if he'd be the runner-up. He had a series of good games as a well-rounded leader: good eco, fairly aggressive, goes after all three victory conditions. But he was eliminated a lot, often a consequence of founding a religion and failing to have it spread to the military leaders. Suleiman would actually have got more wins if he didn't insist on running Bureaucracy when going for culture. That's not even his favoured civic, so why? There was even one game where he was running Free Speech when he turned the slider on... only to revolt a few turns later into Bureaucracy! :smoke:
Joao performed comparably to Caesar, and was even more impressive than the Roman leader when starting from that godly spot. But he ultimately paid the price for his highish peaceweight.
De Gaulle was mediocre as expected.
Frederick was the only leader who couldn't pull a win, even when starting from Caesar's original OP position. Although to be fair, he did get to become the dominant AI on at least one occasion, only to be beaten to the punch by Ottoman Culture.
And as mentioned, JC makes it to the playoffs, but not very convincingly.
All in all, it doesn't seem like we have Championship material with these leaders, but we'll see.

AI_League_S1_Opening_G3_Positions.jpg

The map turned out to be even less balanced than the previous ones, with JC's original position (A) being completely OP and accounting for 60% of the wins, while Suleiman's spot (B) in the middle of the map was an absolute death sentence. Only Shaka has managed to get a win from there.
Joao's (E) and Shaka's (F) were little better.
Frederick's (D), although better, was along the lines of Roosevelt's in game 1: safely tucked out of the way, offering good survival prospects but scant chances at winning.
Only De Gaulle's spot (C) really gave a fighting chance, and that's why he was the second strongest leader in the AH (far behind JC).


AI_League_S1_Opening_G3_Summary.jpg


The same two warmongers as in the live event get sent to the playoffs, but in reverse order.
I suspect they won't fare any better, though...
 

Attachments

  • 03 - Opening Round - Game 3.zip
    3.1 MB · Views: 14
First off, I appreciate that your original post has the answers to most of my questions--very thorough! I'm just fascinated with the rating system.

Game 3 piques my interest in just how survival should be valued. At first glance, it seems baffling that Shaka would advance over Joao and Suleiman when they had more first place finishes. Of course, I can identify a soft spot on my part for the two leaders based on their performance in my tests this past season. There is also the nature of Sullla's system where first place finishes are weighted over second place finishes, with the added factor of kills. Is it wrong for Shaka to be rewarded for consistent survival? Hmm...

In your particular rating system, each match runs as a tournament with wins, losses, and "draws" for survivors and those eliminated. I wonder how much it makes sense to award a win to survivors against those eliminated. I can think of a number of scenarios where the survivors are effectively rump states. Your point split for survivors would seem to address the issue of several contenders with 2,000+ pts each as well as a 3,000 pt gap between 2nd and 3rd, for instance. Where I struggle, however, is the somewhat arbitrary nature of who stays and goes with a runaway calling the shots. How much is it a victory to not get chosen for elimination in the last ten turns? Are each of those victories over the eliminated worth the same as the winner's? Of course, Survivor's the name of the game :lol:

I also see merit in the way you evaluate eliminations via turn score. The earliest elimination we have seen in the contest is T80, so really a T100 elimination does not compare directly to one at T300.

My utopic ideal would be to give them points based on their score area, maybe except for winners (especially culture winners). Sadly it is not as possible unless I roughly try to draw them by myself on autocad.
I am intrigued by this as well, though I have a hunch it could disproportionately reward "mid-game score leaders" who dominate the score board for much of the game with tech, wonders, and empire-building but end up eliminated. (This could be moderated by the absorption of their lands and the continuation of the game). I suppose this plus the above speak to how to evaluate the performance of the AI who place other than first, while distinguishing between this project and the contest at large.

As though this isn't already way too much thought given to the series :mischief: I have a couple other questions.

-How do you randomize the starting positions?

-How will you calculate expected score once ELOs are "floating," i.e. from Wildcard on?
 
I have a hunch it could disproportionately reward "mid-game score leaders" who dominate the score board for much of the game with tech, wonders, and empire-building but end up eliminated
right, or proportionately reward "was going to win but dogpiled to death" leaders who did not get a kill :lol: And the worst thing is when there are 2 giant civs and 1 rump state civ left on the map, and one of the giants destroys the other or pushes him below that "slept all game" rump state in score late in the game. Fair, right?


I still don't know how ELO is calculated for opening rounds; below average gets minus points. And the winner of either a 6- or 7-player game gets 7.50 points. Good stuff, I am going to stay tuned for that. I was just going to ask what Elo stands for, but it turns out there is a Wikipedia page for the Elo rating system with a man named Arpad Elo!
 
-How will you calculate expected score once ELOs are "floating," i.e. from Wildcard on?
I still don't know how ELO is calculated for opening rounds; below average gets minus points. And the winner of either a 6- or 7-player game gets 7.50 points. Good stuff, I am going to stay tuned for that. I was just going to ask what Elo stands for, but it turns out there is a Wikipedia page for the Elo rating system with a man named Arpad Elo!
iirc the Wikipedia page is a good source, but I've used this as a reference : https://www.omnicalculator.com/sports/elo
Basically, the Elo formula gives the expected result between two players. So I calculate it for each "duel" and then sum it all. That gives me the total expected result.
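For anyone who wants to see that summation spelled out, here is a minimal sketch. It assumes the standard Elo expectation formula with the usual 400-point scale; the function names and the ratings are made up for illustration and are not the actual values from the spreadsheets.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation for A in a single A-vs-B duel (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def total_expected_score(ratings: dict[str, float], leader: str) -> float:
    """Sum the duel expectations of `leader` against every other AI in the game."""
    return sum(
        expected_score(ratings[leader], other_rating)
        for name, other_rating in ratings.items()
        if name != leader
    )


# Hypothetical ratings, for illustration only: everyone starts equal.
ratings = {
    "Shaka": 1500, "Caesar": 1500, "Joao": 1500,
    "Suleiman": 1500, "De Gaulle": 1500, "Frederick": 1500,
}

# With equal ratings each duel is worth 0.5, so 5 duels give 2.5.
print(total_expected_score(ratings, "Shaka"))
```

Once ratings "float", the same function applies unchanged: a higher-rated leader simply gets more than 0.5 expected from each duel against a lower-rated one.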

-How do you randomize the starting positions?
I tried different "algorithms" based on the game results:
- Swap 1st and last, rotate the others
- Killer swaps with victim
- winner takes FTD's place, 6th takes 5th's place, 5th takes 4th's place, etc...
None of those guaranteed a fair distribution, so all had to be abandoned along the way.
I also tried (Game 3) pre-generating the 20 permutations.

What I did for the last games of the Opening Round was divide each series (for 6-player games, idea is similar for 7-player games) into 3 "rotations" of 6 games: each AI must have played every position inside a "rotation". So after 6 games, each AI has played from each starting position once, twice after 12 games, 3 times after 18 games.
I applied the last algo I'd tried (1 -> 6, 6 -> 5, 5 -> 4, etc...) for the first games of each rotation until it yielded an "illegal" result and filled up the rest "manually". With the constraints in place, it's like a sudoku game. ;)
The last two games are "rigged" to ensure fairness for the AIs still in the run: if two AIs are still competing for the 2nd, qualifying place for instance, it wouldn't do to have one play from two strong positions and the other from two "death spots".
Then, sometime during the Wildcard League, I started simply copy-pasting the same permutations for the first 18 games.
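The "rotation" constraint described above can be sketched in a few lines. Note that this is only an illustration: it builds a rotation with a plain cyclic shift, which is not the result-based shuffle plus manual fill-in actually used for the series, and the function names are hypothetical. The checker, on the other hand, expresses the constraint as stated: inside a rotation, every AI starts from every position exactly once.

```python
def is_valid_rotation(games: list[list[str]]) -> bool:
    """games[g][p] is the AI starting from position p in game g.

    A rotation of n games is valid when every game is a permutation of the
    same n AIs and every position is played once by each AI.
    """
    n = len(games)
    if n == 0 or any(len(game) != n for game in games):
        return False
    leaders = set(games[0])
    if len(leaders) != n:
        return False
    # Each game must field every AI exactly once.
    if any(set(game) != leaders for game in games):
        return False
    # Each position must be played by every AI across the rotation.
    return all({game[p] for game in games} == leaders for p in range(n))


def cyclic_rotation(leaders: list[str]) -> list[list[str]]:
    """Simplest valid rotation: shift the line-up by one seat each game."""
    n = len(leaders)
    return [[leaders[(p + g) % n] for p in range(n)] for g in range(n)]


leaders = ["Shaka", "Caesar", "Joao", "Suleiman", "De Gaulle", "Frederick"]
rotation = cyclic_rotation(leaders)
print(is_valid_rotation(rotation))
```

Three such rotations cover the first 18 games, leaving the last two games free to be assigned by hand as described.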

In your particular rating system, each match runs as a tournament with wins, losses, and "draws" for survivors and those eliminated. I wonder how much it makes sense to award a win to survivors against those eliminated. I can think of a number of scenarios where the survivors are effectively rump states. Your point split for survivors would seem to address the issue of several contenders with 2,000+ pts each as well as a 3,000 pt gap between 2nd and 3rd, for instance. Where I struggle, however, is the somewhat arbitrary nature of who stays and goes with a runaway calling the shots. How much is it a victory to not get chosen for elimination in the last ten turns? Are each of those victories over the eliminated worth the same as the winner's? Of course, Survivor's the name of the game :lol:
Sullla's system has "kill steals"; I do indeed have cases of "survival steals". Particularly annoying are the instances when one AI conquers another, but then gets attacked and conquered by a 3rd, stronger AI, and the initial victim lives on with a couple of cities left.
It had me wondering for a while whether I should have a civ5-like conquest system: if an AI has lost its capital, it's considered eliminated when the game ends, with the turn of elimination being the last turn of the game.
After all, we had a case when an AI won the game after losing its capital and getting it back (hey, I've started season 2 and I had such a case yesterday evening: Willem, 20 turns away from a Cultural Victory, declares on a much, much stronger Kublai. He loses his capital, which had already gone Legendary. Utrecht is also Legendary, Rotterdam is the one lagging. Willem seems like he's on his way out, but then Ramesses, the game leader, attacks Kublai as well and destroys his armies. Rotterdam goes Legendary. Willem recaptures Amsterdam and wins instantly. As a side note, if Willem's attack was stupid, so was Ramesses', since bailing out Willem was the only way he could fail to win the game...).
But we've never had a game where an AI won a game after losing its capital and failing to get it back.
But that would be adding another system the AI doesn't understand. In one game, I had an AI lose its capital, then regain it. In a later war, it gave it away for peace (it was just a standard city for it).
In the end, I decided it wasn't worth it, for a few fringe cases.

If (and that's a big if) I find a way to read the data in the replay files (or the save files: since the replay is only created at the end, I guess the relevant data must also be present in the save files), I've got a few ideas about getting something close to what Keler has described. But that's entirely contingent on the ability to extract that data.
 
Game 4 saw Justinian use the AP to rein in a dominant Peter and claim victory for himself, with Charlie in his wake, to the infamous ultimate result we know.
The alternate histories (played without the AP) have shown that while Justinian was indeed strong in that game (7 wins), Peter was even better (11 wins). But the latter's starting position, next to the doomed Asoka and boxed-in, jungle-choked Sury, was suspected to have played a major role in those results.
And I indeed expected that when shuffling the AIs around, Peter would be nothing special, and Justinian would prove the better AI in that field. Who would be the second best performer? Ask the community, and I'm fairly certain most would have answered Sury without hesitating. That was my case too, even though I had the example of the AH for S4's wildcard game that I'd run, where Saladin outperformed Sury.
Another question mark was Monty: was he really that bad?

AI_League_S1_Opening_G4_Results.jpg


:wow:
Justinian blew away my expectations by absolutely crushing the competition. If the previous game had given rise to doubts about whether some AIs were actually really better than others, this game laid them to rest immediately.

AI_League_S1_Opening_G4_Leaders.jpg

Justinian has two main weaknesses: being unable to plot at Pleased is the major one, of course, often preventing him from making a decisive move. The other is that, the way the Culture victory is coded, he basically never goes for it, even though he usually has a very strong culture. But in spite of those weaknesses, his performance was stellar, combining a very strong economy with a strong military. And while he may at times be a tad late getting his cataphracts into play, there's no Rifling-eschewing nonsense from him.
Asoka and Charlie were doomed by their peaceweight: two high peaceweights vs 5 low peaceweights. And in that group, religious tensions would also add fuel to the animosity. To his credit, Asoka managed one miraculous win. But that's it.
Monty did better: he got two wins. But was eliminated as often as Asoka: 17 times. I'm afraid that he's unfortunately as bad as he's said to be.
Peter got as many wins as Monty, and survived as many times as Charlie: in other words, he performed rather poorly, which I pretty much expected.
What I did not expect, on the other hand, was that Sury would basically be as bad as Peter. I had a gnawing suspicion that Sury was one of those leaders, like Cathy or Kublai, that a few good performances in AI Survivor had led people in general to somewhat overestimate... but this still came as a surprise.
Now, one group of leaders isn't enough of a sample to evaluate the overall performance of an AI, but that seemed to be a perfect group of AIs for him to do well.
Saladin was a nice surprise. Now, I played those games before Season 7's playoffs and Championship, which illustrated that he certainly could be a solid leader. His results here confirm that. He's not Justinian-tier, but certainly seems above average.

AI_League_S1_Opening_G4_Positions.jpg

Peter's position (F) was indeed very strong, especially on account of having the extremely weak position B (Sury's) as a backline.
Justinian's position (A) was decent: sheltered, good land, with some room.
So was Charlie's (C), which is a bit more of a surprise, as it was central, with very close neighbours. Proximity with usual dogpile-bait locations?
Saladin's (G) was dependent on barb city captures, with lots of jungle: decent land, but slow development. Too slow.
Asoka's (C) had decent land, but it lacked production and was the most central position: not healthy in a group of religious fanatics.
Montezuma's (E) position seemed similar to but better than Charlie's, and yet it led to poorer results. I guess the metal situation was to blame: no copper, and iron availability iffy (closest could easily be stolen by whoever was at Charlie's spot).
Sury's position (B) was confirmed as awful: even Justinian couldn't make it work.

AI_League_S1_Opening_G4_Summary.jpg


While I was harbouring doubts about the AIs that the previous game sent to the playoffs, here, with Justinian, we had definite Championship material.
 

Attachments

  • 04 - Opening Round - Game 4.zip
    3.4 MB · Views: 15
Sury's position (B) was confirmed as awful: even Justinian couldn't make it work.
but he made it work by coming up as a well-deserved runner-up twice! And so is rewarded for being the only leader not starting there 3 times :) This is so much fun to read, thanks for sharing all your efforts.
 