AI Jumbled Rumble : another set of AI Survivor Alternate Histories, with a twist

Saxo Grammaticus · Aug 4, 2023

@Thrasybulos Thank you for the detailed reply!

That sounds like a good compromise for positions on 6-AI maps. Hope you had some fun devising the first 18! I appreciate that you're rotating positions, as it also serves to sharpen our map-reading skills.

I kind of see "survival steals" as the core issue of survival vs. the fringe cases. As far as AI Survivor goes, survival matters in terms of getting into the Wildcard Game. From the Wildcard Game on, survival has no evident reward. De-emphasizing survival could involve a point split for every AI other than the winner, while keeping survival would maintain the survival/elimination distinction. Scenarios like the one Keler described with the two contenders vs. the rump state definitely came to mind when I was mulling over this question earlier.

Keler's idea of taking the area under the score curve is growing on me at least to compare with the current turn score method. Would it be easier to record score every turn and sum up as total score per game?

Calculating expected score seemed potentially tedious to do on this scale, based on my preliminary reading. I'm curious whether that can be automated at all.

I see the formula for performance score in your spreadsheet and am curious about a few points. Where do the 2 and -0.5 come from in the wins/losses component? Does Total Score approximate wins/losses? I am also curious whether you keep a win-loss-draw record for each AI, as that could be interesting in the long run in terms of matchups.

Thrasybulos · Aug 4, 2023

Saxo Grammaticus said:
I see the formula for performance score in your spreadsheet and am curious about a few points. Where do the 2 and -0.5 come from in the wins/losses component? Does Total Score approximate wins/losses?

Yes: since I'm using partial results, 2*(score-0.5) is a way to turn the [0, 1] scale into the [-1, 1] scale demanded by "wins - losses".
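
For illustration, that rescaling can be written as a few lines of Python. This is only a sketch of the arithmetic: the function name is made up here and it is not the actual spreadsheet cell.

```python
def wins_minus_losses(score: float) -> float:
    """Rescale a partial result in [0, 1] to the [-1, 1] 'wins - losses' scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return 2 * (score - 0.5)


# A full win (1) maps to +1, a full loss (0) to -1, and an even split (0.5) to 0.
for s in (0.0, 0.25, 0.5, 1.0):
    print(s, wins_minus_losses(s))
```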

Saxo Grammaticus said:
I am also curious whether you keep a win-loss-draw record for each AI, as that could be interesting in the long run in terms of matchups.

I've stalled by using Excel for now, but realistically there are only two ways this can end: I burn out and move on to something else... or everything ends up in a database and I develop an application around it.

That said, I have a different AI tournament format in mind, so the current plan (very early stages though) would be to wrap up things for the current format as they are, and to invest into the heavy duty programming for the new idea.

Saxo Grammaticus said:
Keler's idea of taking the area under the score curve is growing on me at least to compare with the current turn score method. Would it be easier to record score every turn and sum up as total score per game?

As I said earlier, I would need a way to extract the data as there's no way I'm gonna keep track manually of score turn by turn.
Preliminary research seems to show that while extracting the data from the replay or save file is possible, it would require quite a lot of work and SDK exploring, so the Python interface mod might be the way to go.
Assuming that is done, and the data is available, I've thought of several possibilities:

Use the highest game score reached. Represents the peak a civ achieved. Doesn't take into account duration (but does it matter?), nor survival.
Use the sum of each turn's score: that would indeed be a very close approximation of what Keler described. But the way score accumulates over time, it probably wouldn't differ too much from what I have in place, as it would greatly reward survival: even a runt civ scores about 1,500-2,000 pts per turn in the end game, while a great mid game civ will be at about 3,000 pts.
Flatten the curve (basically, instead of measuring the area on the score graph, measure the area on the score replay): instead of summing turn score, sum score%. That would remove score inflation from the equation, keeping only relative score. I'm afraid it might advantage mid game leaders too much, though.
A middle solution between the previous two: use relative scores, but weighted by turn number. On paper, I'd favour this solution, but would have to check actual numbers for actual scenarios.
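
To make the four options above easier to compare, here is a rough Python sketch. It assumes the per-turn scores have already been extracted somehow (which is exactly the unsolved part), stored as one list per civ with 0 for every turn after elimination. All names are hypothetical, the toy numbers are invented, and the linear turn weight in the last option is just one possible reading of "weighted by turn number".

```python
from typing import Dict, List

Scores = Dict[str, List[float]]  # civ name -> score on each turn (0 once eliminated)


def peak_score(scores: Scores) -> Dict[str, float]:
    """Option 1: highest game score each civ ever reached."""
    return {civ: max(per_turn) for civ, per_turn in scores.items()}


def summed_score(scores: Scores) -> Dict[str, float]:
    """Option 2: sum of raw turn scores (area under the score graph)."""
    return {civ: sum(per_turn) for civ, per_turn in scores.items()}


def relative_shares(scores: Scores) -> Scores:
    """Each civ's share of the total score on every turn (score%)."""
    n_turns = len(next(iter(scores.values())))
    shares: Scores = {civ: [] for civ in scores}
    for t in range(n_turns):
        total = sum(per_turn[t] for per_turn in scores.values())
        for civ, per_turn in scores.items():
            shares[civ].append(per_turn[t] / total if total else 0.0)
    return shares


def summed_share(scores: Scores) -> Dict[str, float]:
    """Option 3: sum of score% per turn (area on the score replay)."""
    return {civ: sum(s) for civ, s in relative_shares(scores).items()}


def turn_weighted_share(scores: Scores) -> Dict[str, float]:
    """Option 4: score% weighted by turn number (linear weight is an assumption)."""
    return {
        civ: sum(turn * share for turn, share in enumerate(s, start=1))
        for civ, s in relative_shares(scores).items()
    }


# Made-up four-turn game: civ "A" leads early, then is eliminated on the last turn.
toy = {"A": [10, 30, 60, 0], "B": [10, 20, 40, 100]}
for metric in (peak_score, summed_score, summed_share, turn_weighted_share):
    print(metric.__name__, metric(toy))
```

Even on this tiny example the turn-weighted variant penalises the early leader's elimination more heavily than the plain score% sum does, which is the trade-off that would need checking against real games.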

Thrasybulos · Aug 4, 2023

Game 5 was warmonger territory, featuring both Mongol leaders, Alex, and Toku. The live game saw Alex rise to prominence, only to be taken down by Toku who surprised everyone with that win.
Sullla's alternate histories for that game yielded pretty similar results to my own:

Kublai was the favourite, with Alex and Toku both having a decent shot as well.

Would that picture hold if we shook things up?

Well... No.
I did not expect that. As I've already mentioned, I suspect Kublai to be a tad overrated, but I didn't think this would be a game to show that.
In a world of crazies, turns out Temujin is the better Khan, after all.

As the series started, I scratched my head to explain how Toku could be so dominant in that field: he had won 3 of the first 6 games, got 2 runner-up places, for a single elimination. Wow. But how?
Turned out I didn't need an explanation... because Toku wasn't dominant after all: he won a single game in the next 14, and was eliminated 10 times out of 14. Goes to prove that even repeating a game a few times can yield a totally wrong picture. Even 20 times is most certainly not enough, but oh well.
So, thanks to his early excellent performance, Toku finished 3rd, with a slightly better than average result.
But the real achievers in this group were Alex, the best at making it alive (3 wins, 8 runner-up), and Genghis, the best at winning (6 wins, 3 runner-up).
We've had instances of Alex getting his snowball rolling (season 5, season 7) but Genghis has been so far one of the most underperforming warmongers in AI Survivor (if not *the* most: even Monty has won a game!). So it was a pleasant surprise to see him be effective for a change.
So, what happened to Kublai then? Well, it's not that he played poorly, but in this group of lunatics, he was... too sane? He would often open up a serious tech lead over his rivals, only to be relentlessly pummelled to death. There was one game in particular where he was more than one full era ahead (he ended up with mechs vs cavs) but his two opponents (Alex and Genghis iirc) sent soooo many units his way, that even with the tremendous losses they suffered, they ultimately prevailed.
Bismarck and Zara were the sacrificial victims in this group: they stood no chance. It is actually remarkable that they survived sometimes, and even got to win a few.
Between my AH and Sullla's, Zara won 3 times out of 40. The first run of my games acts as one extra AH... and Zara won it. Starting with an outlier result, way to go!

It was during these replays that I truly noticed for the first time the AI actively using sabotage missions to delay their opponents' spaceship. There was one instance in particular, with a pretty tight race, where I thought one AI had :smoke: when it started running the espionage slider, dropping behind in tech as a result. And yet... it won the race: its opponent had to rebuild engines 8 times (!!) and was still rebuilding when it launched.
In his AH for S7 Game 3, Sullla mentioned an infuriating game where Justinian lost because he kept restarting a ship component in different cities... I now believe that what he witnessed was Justinian getting sabotaged again and again, and slotting the missing component into the first available build queue.

OK, for the first time, the results I got here were not in line with my in-game observations.
According to those results, we had yet another unbalanced map, with one starting position (C, Alex's in the live game) accounting for half the wins.
And yet, this map felt like the most balanced so far.
Positions B and D (Zara's and Bismarck's) also allowed for strong games (and if we add up wins and survivals, they're indeed on par with C on the graph).
Position F (Toku's) had that Gold + Gems start (Sullla does a good job, but there's one thing I wish he'd do differently with these maps: when he moves the observer civ to the ice and some AIs to different spots, he leaves intact the deserted starting positions which had been loaded with resources by the start equalizer. Some of those resources should really be removed.) but it could be boxed-in, had a lot of land under jungle, and could easily miss its Copper. So, on the whole, it was weaker.
Position E (Genghis') was clearly the weakest, although I can't really explain it. The AIs always struggled from there, but there doesn't seem to be anything obviously wrong with it. It has room, resources, easy Copper. A lack of production?
Position A (Kublai's). Here, clearly a death spot: 0 wins, 4 survivals, 16 eliminations. But how come, then, that Kublai won 17 out of 40 games from that starting position in the alternate histories?

I think that in this game, the land of the starting position mattered very little compared to the neighbours it came with.
Zara and Bismarck were the two marked men. Start next to one of them, and you've got a shot. Start next to them both, and you're the clear game favourite. Start next to neither (ie, have two crazies as neighbours)... pray.
The very last game illustrated this perfectly. A clear Toku win, where his Science bias allowed him to turn his conquests into an overwhelming tech and military lead. Except... the game didn't start that way. It appeared it would be yet another Genghis win: Toku and Genghis carved up Zara, with Genghis getting the most out of it. Then Genghis fought Kublai, but as he had destroyed Kublai's armies and was about to conquer him, he signed peace. Meanwhile, Toku started a slow and gruelling conquest of a pretty strong Bismarck to his south. As Genghis was about to remedy his mistake and renew hostilities with Kublai, Alex, pleased with Genghis, annoyed with Bismarck, and bordering them both, declared on... Genghis. Alex was very strong, and with Genghis's troops massed on the opposite border, it definitely hurt. Genghis did prevail in the long struggle which ensued. But instead of Genghis quickly conquering Kublai and reaching a critical mass, it was Toku who had time to conquer Bismarck, then Kublai, then develop their lands.
Toku had started the game next to Zara and Bismarck, Genghis next to Alex. Toku won. That simple.
In the live game and the AH, Kublai started... next to Zara and Bismarck. He won. That simple.

Position A here ended up as a death trap, not because it was inherently weaker, but because, as the central position, it guaranteed you had crazies on your borders.
The live game and AH were an exception, with the two most direct neighbours being the two "victims", and the psychos further out, with Genghis in particular being attributed the only truly weak position.

Game 3 has sent Shaka and Caesar to the playoffs. They're now joined by Genghis and Alex.
I don't think the playoffs are going to be boring...

Qgqqqqq · Aug 5, 2023

Really enjoy these. One minor point of feedback - it can be hard reading your posts sometimes, because the paragraphs aren't separated. Can I suggest either adding a line between them, or having them as bullet points? Thanks!

I agree re Sullla's balancing by the by. What he really should do is roll a six-player map, and then manually edit in an AI at your starting position (and move yourself to the Ice). It's not too difficult to do with Platy worldbuilder, or in Notepad.

Saxo Grammaticus · Aug 5, 2023

Saxo Grammaticus said:
Calculating expected score seemed potentially tedious to do on this scale, based on my preliminary reading. I'm curious whether that can be automated at all.

OK, came up with my own working solution here. As you have the header Elo ratings on the Scoring sheet, you can for Elo Rating x add expected scores against each AI using the above ratings, then subtract 0.50 for expected score against self to get expected score per game. If you have versions for six and seven AI games, then the whole business of expected score for floating Elo Ratings can be contained on the Scoring sheet without additional reference tables.
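
As a sketch of what I mean, in Python rather than spreadsheet cells: this assumes the standard Elo expected-score curve with the usual 400-point scale, which may or may not match what the Scoring sheet actually uses. The function names and the sample ratings are invented for illustration.

```python
from typing import List


def expected_vs(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B (scale of 400 is an assumption)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def expected_per_game(rating: float, field_ratings: List[float]) -> float:
    """Sum expected scores against the whole header row, self included,
    then subtract the 0.5 'scored' against oneself."""
    return sum(expected_vs(rating, r) for r in field_ratings) - 0.5


# Invented ratings for a six-AI game; each AI's own rating is part of the field.
field = [1500, 1600, 1400, 1550, 1450, 1500]
for r in field:
    print(r, round(expected_per_game(r, field), 3))
```

Since each pairing's two expected scores add up to 1, the expected scores of a six-AI field sum to 15, one point per pairing.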

Thrasybulos said:
I've stalled by using Excel for now, but realistically there are only two ways this can end: I burn out and move on to something else... or everything ends up in a database and I develop an application around it.

Well, before the burnout takes hold, let me just say that as I read more about Elo Ratings, I find your effort to adapt it to "multiplayer" Civ impressive. Gives me a lot of appreciation for all the records in sports and other games that have to be calculated somehow.

Thrasybulos said:
That said, I have a different AI tournament format in mind, so the current plan (very early stages though) would be to wrap up things for the current format as they are, and to invest into the heavy duty programming for the new idea.

I'm definitely curious to hear about your alternate format.

Thrasybulos said:
Flatten the curve (basically, instead of measuring the area on the score graph, measure the area on the score replay): instead of summing turn score, sum score%. That would remove score inflation from the equation, keeping only relative score. I'm afraid it might advantage mid game leaders too much, though.

The potential simplicity of this appeals to me. Is it even something you could give to an AI to get a breakdown on proportions?

I find it funny to use score at all, as it is a suspect measure, but it does allow for useful comparisons here.

Saxo Grammaticus · Aug 5, 2023

Thrasybulos said:
Position E (Genghis') was clearly the weakest, although I can't really explain it. The AIs always struggled from there, but there doesn't seem to be anything obviously wrong with it. It has room, resources, easy Copper. A lack of production?
Position A (Kublai's). Here, clearly a death spot: 0 wins, 4 survivals, 16 eliminations. But how come, then, that Kublai won 17 out of 40 games from that starting position in the alternate histories?

My intuition is that map position roughly falls between the periphery and the middle. Season 7 featured a number of peninsulas that served as semi-isolated pockets. Some AI, like Mehmed, seemed well-suited to breaking out of their corner. Other AI, like Imperialistic Suleiman and Joao, seemed to do better playing the middle, where they could take land off of their opponents. Positions A and E in Game 5 are obviously playing from the middle, with lots of exposure to a bloodthirsty crowd, which I suspect favors the periphery.

Not sure how to interpret Kublai Khan's successes in the Alternate Histories. He did score the earliest Domination win in my tests all season at T253. The only AI to beat that score was Ramesses with a string of Cultural wins in the 240s. This map also has an incredibly large river system, so I wonder if that plus dogpiles on the higher peace weights blunted the violent impulses of this lot. In that case, I would say Kublai Khan has the best economy of the warmongers.

Thrasybulos said:
Position F (Toku's) had that Gold + Gems start (Sullla does a good job, but there's one thing I wish he'd do differently with these maps: when he moves the observer civ to the ice and some AIs to different spots, he leaves intact the deserted starting positions which had been loaded with resources by the start equalizer.

Seconding Qgqs above: simply swapping out the observer and editing in an AI sounds like the way to address that.

Thrasybulos · Aug 5, 2023

Qgqqqqq said:
Really enjoy these. One minor point of feedback - it can be hard reading your posts sometimes, because the paragraphs aren't separated. Can I suggest either adding a line between them, or having them as bullet points? Thanks!

Thanks, and sure I can try, but could you be more specific about what bothers you? Could you provide an example of what you think should be formatted differently?

Qgqqqqq said:
What he really should do is roll a six-player map, and then manually edit in an AI at your starting position (and move yourself to the Ice). It's not too difficult to do with Platy worldbuilder, or in Notepad.

Saxo Grammaticus said:
Seconding Qgqs above: simply swapping out the observer and editing in an AI sounds like the way to address that.

That's what I did when I rolled my own maps for the Wildcard League, but more on that when we get there.

Saxo Grammaticus said:
OK, came up with my own working solution here. As you have the header Elo ratings on the Scoring sheet, you can for Elo Rating x add expected scores against each AI using the above ratings, then subtract 0.50 for expected score against self to get expected score per game. If you have versions for six and seven AI games, then the whole business of expected score for floating Elo Ratings can be contained on the Scoring sheet without additional reference tables.

I'm not sure I'm getting the issue there?
I already have the formula for the expected score in place, and it's completely independent of the actual score calculation.
I may have missed your point, though.

Saxo Grammaticus said:
I'm definitely curious to hear about your alternate format.

Need to test and refine it first.

It would be more "fun" and less "rat labby" though, achieving "fairness" through repetition rather than design.

Saxo Grammaticus said:
I find it funny to use score at all, as it is a suspect measure, but it does allow for useful comparisons here.

It is suspect for the human player, but it's not that bad for the AI as everything it takes into account is indeed an indicator of how well the AI is doing. Scoring Wonders is debatable (they do help the AI, though) but everything else (land, techs, population) matters. I'd say the one element missing is Military power.

Saxo Grammaticus said:
The potential simplicity of this appeals to me. Is it even something you could give to an AI to get a breakdown on proportions?

What do you mean?
I've thought about it a bit more, and the conclusion is that... more thought is needed.

There's one major issue common to all the ideas I've listed: they're a function of victory date.
An AI which rose to early preeminence through an early conquest, but died to a dogpile on turn 220, would get a very good score if the game ended with a turn 240 Cultural Victory, a mediocre score if it ended with a turn 350 spaceship, and close to zero if it ended with a Time victory.
And that won't do.
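
A quick back-of-the-envelope sketch of that dependence, in Python. The 40% share while alive is an invented figure, and turn 500 for the Time victory is an assumption; only the turn 220 death and the turn 240 and 350 endings come from the example above.

```python
def average_share(share_while_alive: float, death_turn: int, end_turn: int) -> float:
    """Average score% over the whole game for a civ that holds a constant
    share until it is eliminated, and 0 afterwards."""
    alive_turns = min(death_turn, end_turn)
    return share_while_alive * alive_turns / end_turn


# Same civ, same play, same death on turn 220: only the victory date changes.
for end_turn in (240, 350, 500):
    print(end_turn, round(average_share(0.40, 220, end_turn), 3))
```

The civ's result shrinks as the game drags on, even though nothing about its own game has changed. Any turn weighting that favours later turns would make the drop steeper still.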

Saxo Grammaticus said:
Not sure how to interpret Kublai Khan's successes in the Alternate Histories. He did score the earliest Domination win in my tests all season at T253. The only AI to beat that score was Ramesses with a string of Cultural wins in the 240s. This map also has an incredibly large river system, so I wonder if that plus dogpiles on the higher peace weights blunted the violent impulses of this lot. In that case, I would say Kublai Khan has the best economy of the warmongers.

Well, I did end up offering an explanation: having both Zara and Bismarck as his direct neighbours in the particular setup of the live game (and AH). Swap one of them with one of the crazies, and Kublai won't perform the same.

Thrasybulos · Aug 5, 2023

Game 6 shocked everyone with the early exit of the game's overwhelming favourite, Mansa Musa, and Darius doing something for a change: winning.
Sullla's alternate histories and mine:

are pretty much in line (I have Mao doubling up as the favourite for runner-up, while second place is more of a mixed match in Sullla's sample): Mao was the actual favourite for that game, not Mansa.
Would that hold from different starting positions?

Hold on a second, there: Hatty? Hatty was actually the best AI in that group!?
I suppose that since then, we've had Sullla's alternate histories for S7 game 7, so Hatty being the dominant AI in a given game is less of a shocker, but still...

This was a 4 high peaceweights vs 3 low peaceweights matchup, which is actually pretty balanced, as the more "dynamic" playstyle of the "baddies" usually makes up for their numerical inferiority.
But here, it was a roflolstomp for the high peaceweights: the "baddies" only got 3 wins.
So what happened?

Brennus offered the worst performance of an AI so far in my tests. He survived exactly once, dying 19 times out of 20, being first to die 6 times in a row in the last 6 tests. He was the only AI in that group who couldn't get a win out of the best position on the map. Even Sitting Bull did! His economy lagged, his attacks weren't decisive, religion sometimes put him at odds with his natural allies... Either he's a far worse AI than we suspected, or this definitely wasn't a matchup for him.
Gilgamesh on the contrary performed decently in the face of adversity. He got two wins, and made it to the end nearly half the time. Sure, a lot of those times he was in the process of being eliminated, but that was after a solid mid-game.
Mao won the first game, a repeat of the alternate histories. And then... didn't show much. He wasn't as bad as Brennus, but he felt a bit of a fraud nonetheless. He did have a few games where he grew to a solid position, but that was never enough.

So there you go: team Evil fielded one decent AI, one very poor AI, and a middling AI.
What of Team Good?

Mansa Musa was, as expected, their strongest element. Of course a great economy, fairly aggressive (he can plot at pleased), once he got started, he was unstoppable... except by a faster victory achieved by one of his teammates.
Darius was on the whole a Mansa bis on this map. He was far more active than his usual "let-me-tech-alone" self, declaring early wars and acting as the second "enforcer" of the team. Mansa seemed on the whole a bit stronger (and got better results), but a solid performance by Darius here, including a mind-boggling T266 Spaceship victory (that's a turn 256 launch!).
Sitting Bull, on the other hand, was for the most part a complete non-entity. He basically just tagged along, his main contribution being the occasional lure for one of the baddies. On more than one occasion, he even got murdered at the end of the game by a Mansa seemingly exasperated by his lack of contribution.
And then, there was Hatshepsut. As we've seen in the AH for S7 G7, her main game plan is Culture, Culture, Culture. But she also reached positions where she would have won by Domination had her Cultural victory not been so fast, and she also was able to beat Mansa at a space race, no mean feat. There were times when Mansa acted as a protective big brother, shielding her from aggression. But she could also field a large army at times and lay the smack down on her own. She could even take one for the team: there was a game where she started in Mansa's spot, surrounded by the 3 baddies... and quickly found herself in a 1 vs 3 situation. She held the fort long enough for Mansa and Darius to become dominant and wipe the floor with the 3 miscreants. She succumbed in the end, but job done.

So... Team Good fielded for its part three strong AIs and only one weak AI.
For once, the odds were stacked in favour of the high peaceweights.

As the AH might lead one to guess, Mao's initial position (B) was by far the strongest position on the map, accounting for half the wins (funny how that exact proportion keeps repeating). Every AI but Brennus got at least one win from there.
Darius's (D), Brennus' (C) and Gilgamesh's (E) were also good starting positions, totalling 8 wins and decent survival odds.
Sitting Bull's (G) was a bad central spot. Hatty achieved a very dominant win from there, following an excellent land grab phase, but she wasn't able to renew that feat, and no other AI could win from there.
And then there were the two stinkers: Mansa's (A) and Hatty's (F). Boxed-in, with poor land quality.

And that explains the results from the alternate histories: the low peaceweights started from the best spots on the map, while the two best high peaceweight AIs (Mansa and Hatty) inherited the two death traps.

So Hatty and Mansa join the likes of Shaka, Genghis, Alex, Caesar in the playoffs.
Uh... they'd better get some friends to join up or I suspect this could get nasty. :trouble:

Qgqqqqq · Aug 5, 2023

E.g. break this text up:

OK, for the first time, the results I got here were not in line with my in-game observations.

According to those results, we had yet another unbalanced map, with one starting position (C, Alex's in the live game) accounting for half the wins.

And yet, this map felt like the most balanced so far.

Positions B and D (Zara's and Bismarck's) also allowed for strong games (and if we add up wins and survivals, they're indeed on par with C on the graph).

Position F (Toku's) had that Gold + Gems start (Sullla does a good job, but there's one thing I wish he'd do differently with these maps: when he moves the observer civ to the ice and some AIs to different spots, he leaves intact the deserted starting positions which had been loaded with resources by the start equalizer. Some of those resources should really be removed.) but it could be boxed-in, had a lot of land under jungle, and could easily miss its Copper. So, on the whole, it was weaker.

Position E (Genghis') was clearly the weakest, although I can't really explain it. The AIs always struggled from there, but there doesn't seem to be anything obviously wrong with it. It has room, resources, easy Copper. A lack of production?

Position A (Kublai's). Here, clearly a death spot: 0 wins, 4 survivals, 16 eliminations. But how come, then, that Kublai won 17 out of 40 games from that starting position in the alternate histories?

Saxo Grammaticus · Aug 6, 2023

Thrasybulos said:
I'm not sure I'm getting the issue there?
I already have the formula for the expected score in place, and it's completely independent of the actual score calculation.
I may have missed your point, though.

I was tinkering around to understand everything and somehow missed the formula :lol:

Glad you have it all sorted.

Thrasybulos said:
What do you mean?
I've thought about it a bit more, and the conclusion is that... more thought is needed.
There's one major issue common to all the ideas I've listed: they're a function of victory date.
An AI which rose to early preeminence through an early conquest, but died to a dogpile on turn 220, would get a very good score if the game ended with a turn 240 Cultural Victory, a mediocre score if it ended with a turn 350 spaceship, and close to zero if it ended with a Time victory.
And that won't do.

I was thinking of the percentage score chart at the bottom left of the timeline. Perhaps you could use a color histogram to get the proportions from there instead of any file extraction. As a general rule, I am not as bothered by victory date in this case. Part of this has to do with reevaluation of win-loss records on my part: the idea of doing win-loss records for each AI (especially against every other AI) with 120 games per map is kind of overwhelming! Devaluing outright survival in favor of score across the game, you could even use the percentage score chart approach to derive point splits for all the non-winning AI.
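
Roughly what I have in mind, as a Python sketch. It assumes the chart has already been read into rows of RGB tuples (loading a real screenshot would need an image library, not shown here), that each civ's colour is known, and that pixels match those colours exactly, which anti-aliasing would likely break. The function name, colours and tiny "chart" are all invented.

```python
from collections import Counter
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def colour_proportions(pixels: List[List[RGB]], civ_colours: Dict[str, RGB]) -> Dict[str, float]:
    """Share of the chart area covered by each civ's colour.
    Pixels in any other colour (background, borders) are ignored."""
    counts = Counter(px for row in pixels for px in row)
    total = sum(counts[colour] for colour in civ_colours.values())
    return {
        civ: (counts[colour] / total if total else 0.0)
        for civ, colour in civ_colours.items()
    }


# Invented 2x5 "chart": six red pixels, two blue, two black background pixels.
RED, BLUE, BLACK = (255, 0, 0), (0, 0, 255), (0, 0, 0)
chart = [
    [RED, RED, RED, BLUE, BLACK],
    [RED, RED, RED, BLUE, BLACK],
]
print(colour_proportions(chart, {"Alex": RED, "Toku": BLUE}))
```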

As for Game 6, that means Hatshepsut debuts second only to Justinian :wow:

Brennus is funny, as I have seen him win in the tests, but only in an erratic fashion. Regarding the Villains at large, this game leads me to wonder whether the nature of Elo ratings will at times simply facilitate peace weight robbery :think:

Thrasybulos · Aug 7, 2023

Qgqqqqq said:
E.g. break this text up:

It appears then you're not asking me to improve the way my posts are structured, but on the contrary to remove all forms of structure by adding line spacing after each sentence.
I'm not going to do that, sorry: even though that's apparently not the case for you, most people find it harder to read when there's too much line spacing and they have to mentally rebuild the text structure.

Thrasybulos · Aug 7, 2023

Game 7 was an overwhelming victory by Huayna Capac. Not only was he already emerging as seemingly one of the best AIs, his starting position appeared a tad too generous.
Sullla's alternate histories and mine differ just enough to significantly alter the secondary narratives about Washington winning when HC doesn't and Churchill being the favourite for runner-up status, but the main story remains the same: a crushing domination by the Incan leader.

Would that hold true here, or would HC disappoint as Julius Caesar did in game 3?

Well, HC is no JC: he easily confirmed his status as the best AI in this group, scoring 9 wins.

But for a change, let's talk about the map first.

Huayna Capac's initial position (A) was indeed completely overpowered: 13 wins out of 20! Every AI but Boudica was able to win from there, with Augustus even equalling HC with a 100% win rate.
It had everything: resources, room, commerce, production. The funny thing is that Sullla created this monster: the map had been generated with that starting position along the coast. It was probably an attempt at rebalancing the north which backfired. Not throwing stones here: balancing these maps for the AI is hard without extensive testing (more about this later, as I experimented with it for the Wildcard League).

The map was so dominated by that position that getting a read on the others is difficult.
Churchill's (D) looks like it was the second best... but that's 100% Huayna Capac's doing. That position had the most room available, and was the least exposed to an early conflict. That was apparently enough for the Incan leader to get the ball rolling, but not for any of the other leaders (the Copper situation could be iffy, too).
Boudica's (B) and Napoleon's (E) were right on top of each other, often leading to a mutually destructive conflict.
Washington's (F) suffered mainly from being the natural target of whoever was in position A.
Augustus' (C) was jungle-choked, which led to a stunted development. Churchill was the only one to get a win from there, but only on account of getting the most spoils out of a dogpile on Boudica.

Huayna Capac was thus the best achiever on this map.
Apart from having to deal with and survive whichever monster spawned from position A, he had two further hurdles to overcome:
Boudica would often found a different religion and hate him for it. That conflict would most of the time spell doom for the Celtic Queen, as others would pile in, but it would also drag HC down.
And then, there's the late Classical / Medieval era.
HC would often start a war then... and seemingly lose interest in it. His initial stack would not be reinforced, and once it was spent, no further units would be sent at his enemy. And that's because all of his build queues would be focussed on Wonders and missionaries. So those wars would end up backfiring as he just wouldn't fight in a conflict he'd started! :rolleyes:

As for the other leaders, it seemed for a while that one of the high peaceweights would get the runner-up spot, but Napoleon, as HC's sidekick and enforcer, got there in the end.
Boudica, as mentioned, would often be in a team of her own (her peaceweight setting her at odds with the high peaceweights, her different religion leading her to fight HC and often Napoleon too), which would spell her doom. She wasn't as bad as Brennus was in the previous game, but that still led her to finishing last here.
Augustus had the best performance from position A, but that wasn't enough to secure him a spot in the playoffs.
Basically, if a high peaceweight leader got a good game from position A, the others tended to do well too. Napoleon, meanwhile, could do well both when he started from position A himself and when HC did well. So in a way, he simply rides HC's coattails into the playoffs.

Mansa and Hatty needed a friend... they get Napoleon.
Oh boy...

Thrasybulos · Aug 7, 2023

Saxo Grammaticus said:
I was thinking of the percentage score chart at the bottom left of the timeline. Perhaps you could use a color histogram to get the proportions from there instead of any file extraction.

That is indeed an option, although that would probably be harder for me. :lol:

Saxo Grammaticus said:
As a general rule, I am not as bothered by victory date in this case. Part of this has to do with reevaluation of win-loss records on my part: the idea of doing win-loss records for each AI (especially against every other AI) with 120 games per map is kind of overwhelming! Devaluing outright survival in favor of score across the game, you could even use the percentage score chart approach to derive point splits for all the non-winning AI.

Well, I am bothered by that.

The longer the game goes, the lower the score for the eliminated civs, but by transfer, the higher the score for the remaining civs. So we end up with a system which rewards incompetence, and I definitely don't like that.

If we consider the Shaka runner-up games: he ends up with a score of 4, after having murdered almost everyone and having risen to a dominant position... but not dominant enough to secure a win. He gets rolled back by a more advanced civ which wins the game and scores 5. The remaining dead civs share the remaining 6 points amongst themselves.
The current system works fine there imo, and the proposed system would work along the same lines, so it would be OK too.
Now, let's consider a Shaka victory. He murders 3 civs, 2 remain besides him. One of them gets the hammer, the other lives. It may even be that the one that gets killed had played a much better game so far than the one which survives. In that case, the current system is too generous to the survivor, and unfair by comparison to the last eliminated civ. The proposed system would, on the other hand, better reflect the game situation.

But that would be at the cost of introducing new flaws, for other situations.
If I'm to change the current system, especially for one which requires a lot more work on my part, I'd like it to be clearly superior: not better in some cases, worse in others.

Furthermore, if we devalue survival (which could make sense, not arguing that), dead civs which score 0 against survivors under the current system, will score higher. That means survivors will score lower. So the score range for the non winners will be narrower. And at some point, the differences would stop being significant.

Which got me thinking along a different line... :think:

The system I have in place currently means an AI's ranking is an attempt at measuring how likely an AI is to "do well". "Doing well" being defined as:
- Winning.
- Not dying.
- Getting a good in-game score at the end of the game.

The current discussion somehow revolves around that definition of "doing well".
In sports, only the final result counts. A soccer team may have 80% possession of the ball and shoot 15 times at the opposing goal, hitting the post 10 times: if it never scores while the opposing team scores with their one attack of the game, they lose the game, and get ranked accordingly. The fact they played "better" doesn't matter.
So maybe this is the wrong approach, and instead of figuring out how best to attribute an "artistry score", the focus should be purely on the actual result? Instead of trying to evaluate how likely the AIs are to "do well", maybe the ranking system should only be about how likely they are to win?

That would mean here that the winner gets a win vs each of the other AIs and... that's it.
So while the elo gain for the winner would take into account the whole game composition, the elo loss for the others would only take into account the winner's elo. The other participants would be ignored.
(Also, but that's a side issue, another system would have to be put in place to determine tournament progression, as draws would be likely.)
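
A minimal sketch of that "winner only" idea, assuming the textbook Elo expected-score formula rather than the actual spreadsheet formula. The function names, the K-factor of 32 and the ratings are all made up for illustration; only the leader names come from the thread.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def winner_only_update(ratings, winner, k=32.0):
    """Return new ratings when only 'winner beat X' pairings are scored.

    ratings: dict mapping leader name -> current rating.
    k: K-factor; 32 is a placeholder, not the value used in the spreadsheet.
    """
    new_ratings = dict(ratings)
    for leader, rating in ratings.items():
        if leader == winner:
            continue
        # Points the winner takes from this opponent for the head-to-head win.
        # Pairings between two non-winners are deliberately ignored.
        transfer = k * (1.0 - expected_score(ratings[winner], rating))
        new_ratings[winner] += transfer
        new_ratings[leader] -= transfer
    return new_ratings


# Hypothetical ratings, for illustration only.
ratings = {"Hannibal": 1550.0, "Stalin": 1500.0, "Ragnar": 1450.0}
updated = winner_only_update(ratings, winner="Stalin")
```

Under these assumptions the winner's gain depends on every opponent's rating, while each loser's loss depends only on the gap to the winner, so the higher-rated loser gives up more points.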

It might be that simple? The nice thing is that, contrary to the other propositions, I already have the data available. So I can test it, see what kind of results it yields for the whole season run by comparison.

Keler · Aug 7, 2023

Saxo Grammaticus said:
color histogram

That's very interesting, but do they only create Red Green Blue data? If only magic wands could say the area of selection.
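
On the "area of selection" point: counting pixels per colour gives the area directly. A rough sketch, assuming the score chart has already been decoded into a grid of (R, G, B) tuples by some image library (not shown) and that each civ is drawn in one flat colour. `colour_shares` and the toy chart are hypothetical.

```python
from collections import Counter


def colour_shares(pixels):
    """Return each colour's share of the total pixel count.

    pixels: list of rows, each row a list of (R, G, B) tuples.
    """
    counts = Counter(colour for row in pixels for colour in row)
    total = sum(counts.values())
    return {colour: n / total for colour, n in counts.items()}


RED, BLUE = (255, 0, 0), (0, 0, 255)
# Toy 2x4 "chart": 6 red pixels, 2 blue pixels.
chart = [
    [RED, RED, RED, BLUE],
    [RED, RED, RED, BLUE],
]
shares = colour_shares(chart)
```

A real screenshot would likely have anti-aliased edges and grid lines, so colours would need to be matched with some tolerance instead of exactly.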

Thrasybulos said:
and instead of figuring out how best to attribute an "artistry score", the focus should be purely on the actual result? Instead of trying to evaluate how likely the AIs are to "do well", maybe the ranking system should only be about how likely they are to win?

I mean you already have Sullla's ranking (Power Rating) approach as well, but it surely comes to my mind too. For example, Augustus winning 4 times is still not enough for him to go beyond Napoleon with 2 victories. Or Hatshepsut winning 9 times puts her barely over Mansa Musa with 4 victories in score. Alexander with 3 victories ends up getting more points than Genghis Khan, who won twice as often (6 times). Maybe your ranking kind of reflects the luck factor in terms of winning. Unimpressive culture winners, fast first-to-die leaders, all combined.

Saxo Grammaticus · Aug 8, 2023

I seem to recall reading about photography that allowed for color histograms with specific hues other than standard RGB... niche use? :lol:

I agree that the Power Ranking more or less gives us the winner-takes-all approach with token concessions to the wheel of survival (second) and the carnival of violence (kills). While I think survival/not dying is certainly a big factor, I suspect what we are looking for is more akin to who the contenders are. For instance, if Louis loses one of the three cultural cities moments before victory, he is more or less a contender but will probably end up eliminated. As much as evaluating contenders would be nice, survival is a more generous measure than winner-takes-all. Valuing survival is certainly part of what makes this project interesting!

Thrasybulos · Aug 9, 2023

Game 8 had Stalin go on a murder spree to secure a spot in the playoffs, with Hannibal tagging along.
Sullla's alternate histories (well, Myth's actually, let's give due credit) confirm that narrative.
Mine produced a slightly different outcome:

Hannibal came out on top, but victory was more or less evenly split between 4 leaders: Stalin, Hannibal, Ramesses, Pericles. The difference essentially boils down to Pericles's performance: he won 5 games in my runs, but only one in Myth's, those wins going to Stalin and Lincoln.
Both sets agree on Hannibal as the runner-up favourite, but Myth has Ragnar the heavy favourite for FTD while I have Lincoln slightly ahead.

I was looking forward to replaying this game, jumbled edition, essentially for one leader: Ragnar. The Viking leader, as the only Financial pure warmonger, ought to be one of the better warmongers out there, with the Financial trait overcoming one of their main weaknesses: they tend to fall critically behind in tech.
But that's not been the case: throughout AI Survivor's history, Ragnar has been one of the worst performing AIs. On this map, he clearly had been dealt a bad hand, with a very poor starting position. His season 7 game also comes to mind, with a dreadful start. So is Ragnar just unlucky, just bad, or both?

I was also interested in Stalin and Pericles, both of whom have a good, if irregular, track record: I suspected both were frauds, and hoped this game would confirm it.
I had no doubts about which leader would come on top here: Hannibal was clearly the best in the field, no room for debate.

... or maybe there was room for debate, in fact!
The Egyptian leaders were on a roll, it would seem.

This game had a similar composition to Game 6, with 4 high peaceweights vs 3 low peaceweights, and it yielded a more balanced outcome which conformed better to expectations: 8 wins for the baddies, 12 wins for the nice guys. And if we consider that Ramesses pulled a miracle Culture win in a game where his team mates were decimated, it's even more balanced.
Funnily, the distribution of those wins wasn't balanced: it started with 4 wins for the high peaceweights, then the low peaceweights had a long series of successes, and then the high peaceweights (well, Ramesses) concluded with their own long series. Goes to show once again that we need to play a large sample of games to get a somewhat truthful picture.

Hannibal only got two wins but has the best survival rate. Financial low peaceweights tend to do well in general, and he's no exception. He could be considered as the successful version of Ragnar (a military-focussed leader with good economy) if his one glaring weakness didn't disqualify him from membership in the warmongers' club: he cannot plot at Pleased. This game had enough high peaceweight "targets" that it wasn't crippling, but it still prevented him from making some winning moves.
Financial also proved a curse on at least one occasion: as one of the traits which favours a Cultural victory, it led him to throw at least one game (should have taken better notes, I played those games a coupla months ago: I know it happened at least once, not sure if it was more than that) by pulling the slider when he could have easily won Space, and losing as a result.

Stalin... failed to fail. Contrary to my expectations, he proved solid, winning two very convincing games, and achieving a 60% survival rate in a hostile environment. And that's in spite of the very first game being an outlier: in a repeat of the AH position, Stalin uncharacteristically founded an early religion... which spread to exactly no one, and led to an early and massive dogpile of the kind Gandhi would be more familiar with.
Stalin was actually neck and neck with Hannibal for second place, right to the end. So either I'm overrating Hannibal, or Stalin's AI Survivor's successes are down to more than mere luck.

As for Ragnar... Let's be generous and say the jury's still out. He did get the most wins out of the low peaceweights (3 wins), so that's definitely something. But he was also eliminated a lot more often. "Erratic" is, I believe, a term that's been employed to describe him, and sure does seem to apply. He can have good games, where he plays to his strengths and ends up winning or at least in a good position... and he can also launch doomed, pointless, across-the-map expeditions or suicidal attacks which quickly put an end to his game.
So when granted better starting positions than the one he was stuck with in the AH, he certainly performed better overall. But "better" doesn't necessarily imply "well".

So Team Evil fielded a better team here than in game 6: two solid-to-good leaders, and a madman with flashes of brilliance.
What of Team Good ?

Lincoln was clearly the weaker member of that team. He sometimes achieved a tech lead, he sometimes fought back well. But the fact he was the only leader never to achieve a win is no coincidence. He was a priority target for the low peaceweights, which often led to his elimination. His generally peaceful nature meant he wouldn't press his occasional successes. And while a decent eco leader, that was never close to enough to have a shot at beating Ramesses' Culture in a race.

Hammurabi is basically the less successful, high peaceweight version of Pacal: a mostly passive leader who loves building every wonder, but shoots for Space instead of Culture. His tech preferences meant he would often found a religion, and thus be at odds with Ramesses, while failing to spread his religion and get allies. Not a healthy situation.
Still, teching in your corner while the world is at war is something that often works for Pacal, and it did work a few times here for Hammurabi too.

Pericles is a bit of a mixed bag. He was often, after Ramesses, the strongest member of Team Good, and thus a crucial element in their conflict against the low peaceweights. But as for winning... he would often turn the slider late, with a less than adequate Culture output. 120 culture/turn in your 3rd Legendary candidate? With the slider on? You do know the game ends on turn 500, don't you? :confused:

Now, it might simply be that with Ramesses and Hammurabi nabbing most of the wonders and religions, his go-to game plan just couldn't work in the context of that particular game. Or just that he's in fact not very good at it.

Ramesses was the success story of that game. He achieved a similar result to Hatty's, but in a much more hostile environment: with stronger opponents, and weaker allies. His gameplan was essentially the same: Culture, with the odd Space victory. But it usually went through a military dominance phase first. He had to secure the attempt himself, and couldn't count on his allies to shield him while he went for it. Hatty did it too at times, but she mostly had Mansa and Darius clear the way for her, and Culture was often the only way she could win. Here, Ramesses could often also have won by Domination or Space had he not chosen to go for Culture.
So he felt stronger than Hatty.

Ragnar's (F) start was confirmed as awful, leading to 14 eliminations and no win. A boxed-in coastal start, with the only prospect for expansion being into the jungle. And with no Copper for an attempt at an early breakout.
Nearly as bad was Hammurabi's (C): again, a boxed-in coastal start. To make matters worse, the occupants of these two awful spots would often fight one another in the early game, weakening them further. Essentially, two starting positions had been assigned to an area of the map which could only accommodate a single civilization. Only Ramesses was able to get a win from there, and it was a miracle Culture attempt which he was lucky to pull off while never being able to develop beyond his limited initial core.
Lincoln's (E) wasn't much better, and that's harder to explain. The land and room to expand are decent, and it borders the two weakest spots to the East and North-East. I suppose it owes to being central and bordered on the other sides by stronger spots? Or it could just be a quirk of the dataset.

Ramesses' (G) was just a notch better (one fewer elimination, one more win). It suffered from one weakness mainly, but a major one: no Copper, while sitting next to a spot (B) which had Copper at the Capital. In the AH, that spot "performed" better, because it was the best AI (Ramesses) occupying it, but also because Pericles in B wasn't very likely to exploit the situation. Hannibal or Stalin for instance, on the other hand, had no such qualms. Starting with Mining, they could connect that Copper very early, and DoW at a very early date.

Pericles' (B) position was thus one of the best positions on the map, mainly on account of that metal situation, and on account of its weak neighbours.
Stalin's (A) led to the most wins. It was position E's better twin: bottom center of the map too, but just better. Better land, more room to expand. Stronger neighbours, and that may explain in part the situation: whoever was in position E was the natural target on account of being the weaker neighbour, which in turn led to poorer results from E?
Finally, Hannibal's (D) position was also a strong starting position: good land, sheltered. Its one issue was that expansion usually meant trying to conquer the strong tenant of position A.

So Mansa gets a strong friend to join him in the playoffs, brightening his prospects. If the wildcard goes to another high peaceweight, we could have one of the playoffs where Evil would be vanquished.

Thrasybulos · Aug 9, 2023

So, we now have 16 participants for the playoffs:

We're missing two, and those will have to fight hard to earn their wildcard.
As I've already mentioned, this will be done through a 2-round "Wildcard League".

The remaining 36 leaders will play 6 six-player games, the first 3 games making up "Pool 1", the last 3 "Pool 2" (I should have chosen a different term, as it bears no relation here with the "Leader Pools" of Sullla's AI Survivor. Sorry for any confusion it might entail).
The Winner and runner-up of each game will then play a "Pool Finals", and only the winner of that game will get a wildcard.

The first two spots for each game were seeded according to the Opening Round results and Elo rating (for season 1, both are the same).
random.org was in charge of filling the rest, and that yielded the following match-ups:

Now, what about the maps used?

My initial plan had been to use Season 4's wildcard game map (conveniently a 6-player game) for the finals and to generate new random maps for the pool games, respecting the settings Sullla had chosen for Season 4 of AI Survivor.
Since I'm rotating the starting positions, map balance wouldn't be a major concern, so I'd pick the first one that came up with the proper "shape", without editing it other than to add the observer civs afterwards through the worldbuilder file: no need thus to move civs around.

That's what I did for the first two games, although I had to reroll a lot more than anticipated: I kept getting the kinds of maps where 4 AIs would be bunched up in a narrow strip of land to the south, while the last two had the run of the rest of the map, or maps with a central inner sea limiting interactions, etc...
I did that too for the 3rd game. In the first run, Mehmed was severely boxed-in, but he played a superb game when he broke out by launching a successful long-range attack, and snowballed from there.
I... discarded that game (sorry, Mehmed).
Because the second run revealed just how imbalanced that map was, which Mehmed's success had somewhat obscured. One starting position was completely boxed-in. Three lacked metal. One had an obscene amount of land as a backline.
Because I had rerolled a lot, and I liked the overall shape of that map, I decided to edit the map instead of rerolling one. And thus completely discarded the rule I'd set out with (no edits).

So for Pool 2, I had a rethink.
What's the point of providing map data on maps which are edited, and that no one has seen before?
People have watched the games on AI Survivor maps, so providing data on those can be of interest to them.
Providing data on "raw" maps generated by the game might be of passing interest too.
But edited, unknown maps?

I could reuse the Opening Round maps instead, providing a second dataset on those. Problem was... apart from Game 5's map, the picture on the others was pretty clear already, and they were fairly imbalanced.
So in the end, what I opted for, was to play edited versions of the Opening Maps: I would try my hand at fixing the imbalances which had been revealed.

We'll see how successful I was...

Keler · Aug 10, 2023

Thrasybulos said:
although I had to reroll a lot more than anticipated: I kept getting the kinds of maps where 4 AIs would be bunched up in a narrow strip of land to the south, while the last two had the run of the rest of the map, or maps with a central inner sea limiting interactions, etc...

This is exactly why I like Balanced maps more than Pangea. Always proper shaped.

Thrasybulos said:
What's the point of providing map data on maps which are edited, and that no one has seen before?

In your tournament context it makes more sense to use any of the maps Sullla used, but overall it would still be interesting. All you need to do is show the map, which you will still need to do anyway as you modify Sullla's Pangea maps to make all starting positions fairer. In fact I am planning to do some tournament tests myself too, to see the differences between map scripts. But I need to learn more from you first. Increasing the number of maps to 20 (8+6+2+3+1) from 13 (8+1+3+1) was a good idea.

So exactly how do the wildcard league games affect the ranking here? Would the two leaders making it back to the playoffs end up earning more points than the leaders who qualified directly in the first round, simply because they appeared in more games? I can see you work with a complex Elo system and its arbitrary choices, but I suppose you won't really use Sullla's ranking for the overall ranking, only as a 20 test result for comparison? Because otherwise those wildcard winners get more points than opening round winners.

Thrasybulos · Aug 10, 2023

Keler said:
This is exactly why I like Balanced maps more than Pangea. Always proper shaped.

Something to keep in mind. :thumbsup:

Keler said:
So exactly how do the wildcard league games affect the ranking here? Would the two leaders making it back to the playoffs end up earning more points than the leaders who qualified directly in the first round, simply because they appeared in more games? I can see you work with a complex Elo system and its arbitrary choices, but I suppose you won't really use Sullla's ranking for the overall ranking, only as a 20 test result for comparison? Because otherwise those wildcard winners get more points than opening round winners.

I'll talk about it more later, as now would involve spoilers, but:

Can't use Sullla's Power Rating for ranking indeed as it would make it unfair. Unless I simply don't take into account the Wildcard League results.
The idea of having those extra games was twofold:
- Getting the eliminated civs to play more games to get a more accurate ranking for them.
- Sending two leaders with inflated elos to the playoffs, so that there'd be more points available for the playoffs (you don't gain elo points out of thin air, you take them from your opponents).
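
The "no points out of thin air" remark can be illustrated with a small sketch, assuming a plain two-player Elo update with the same K-factor on both sides. The actual spreadsheet formula may differ; `play`, the K-factor and the ratings are placeholders.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play(rating_a, rating_b, score_a, k=32.0):
    """Return both ratings after one pairing.

    score_a is A's result against B: 1 for a win, 0 for a loss,
    or anything in between for a partial result.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses: the total stays the same.
    return rating_a + delta, rating_b - delta


# Made-up ratings: a wildcard entrant with an inflated elo loses to a seed.
wildcard, seed = 1600.0, 1500.0
new_wildcard, new_seed = play(wildcard, seed, score_a=0.0)
```

In this sketch the sum of the two ratings is unchanged by the game, and beating the inflated 1600 entrant is worth more to the seed than beating an equally rated opponent would be.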

Thrasybulos · Aug 10, 2023

Alright, first Wildcard League game.

Let's start with the map:

Spoiler :

So would Peter and Ragnar work together and defeat the high peaceweights one by one, game after game?
Or would Darius's teching prevail?

Well, that went extremely poorly for the low peaceweight leaders!
The common opinion is that low peaceweight leaders tend to perform better not only because they outnumber the high peaceweight leaders, but also because they tend to passively watch as their allies get conquered. That wasn't the case here: the dogpiles against Peter and Ragnar just kept coming.
Even worse: far from working together, Ragnar and Peter would often initiate the dogpile against the other on the few occasions where one of them had a good initial game!

Ragnar and Peter are hard to tell apart as both had a very hard time. Their scores are very similar, with Ragnar usually being eliminated earlier, but surviving 3 games vs only once for Peter. Not much to say, really: they were dogpiled game after game, and that was it.

Hammurabi has the worst performance amongst the high peaceweight leaders, which I didn't expect (I would have cast Roosevelt in that role). I believe one of the main reasons for that is that he'd often found a religion he wouldn't spread, and thus be the odd man out in that team. Another reason is that while he's a decent eco leader... the others were just better at it.
Roosevelt was the best at not dying, which places him in 3rd position, right behind Darius. But to have a shot at winning, he needed to get big, and that required lucking out on the spoils from the dogpiles against Peter and Ragnar, as he would rarely if ever initiate a conflict after their elimination.
Darius had a solid performance. As expected, he tended to have the best economy. Somewhat surprisingly, and for the second game in a row, he was a tad more active than anticipated. But he did somewhat disappoint, though, as I expected him to be the best performer in that group if the low peaceweights faltered.

He wasn't the best... because Victoria was. Which, on paper, was to be expected: Financial + Imperialistic, the only high peaceweight who can plot at Pleased in that group (meaning that once Ragnar and Peter were eliminated, she had control over events), of course she'd be the favourite. Except that season after season of AI Survivor, she's consistently failed to deliver (apart from one game which the alternate histories debunked).
But lo and behold, in the right context, she can deliver.
...against her will, though. She apparently hates winning and will try her best to throw her games.
Remember Season 3's Wildcard game, where she turned the slider a few techs away from completing the tech tree and ended up losing the game as a result?
Well, that was no fluke. She's done it here on more than one occasion too.
So as we've seen with Hannibal in the previous game, Financial can actually be a handicap. A Financial AI will typically go for late conquests, when its tech edge finally translates into a decisive military edge. And those conquests may include Holy Cities, which may then trigger a Culture attempt starting from scratch. Victoria just seems a more extreme example, as she will sometimes turn the slider on with only a coupla techs remaining on the tech tree! :wallbash:

But that's not all.
You know how a very dominant AI will never, ever, sign peace with a one-city civ it's at war with?
Well, Victoria will. There was one game where she was at 62% land area, at war with an opponent reduced to a single city that her armies were converging on. Capturing that city would have pushed her to the Domination threshold.
She signed peace.
And then gifted back a city.
Un-be-lie-va-ble!! :hammer2:

So Vicky, in the right context (high peaceweight field), can be the strong leader she's supposed to be. But boy, can she be frustrating as well.

This was a completely unedited map.
It turned out to be rather decent, with 3 positions (A, E, F) offering a good shot at victory.
Position B suffered from poor land quality: it was essentially tundra and ice.
Position D had no backline and led to being boxed-in hard and early.
Position C was interesting: it could be made to work, but was unforgiving. If its occupant did not ace its expansion phase, it would be severely boxed-in (3 or 4 cities), with no metal access!

(Summary on the next post, because attachment limit reached...)
