AI League 2.0

Not at home atm, I'll try and have a look at the map and replay tonight (but yeah, when I recorded the result, an Immortal rush is what I attributed it to).
 
Hard to say exactly from the replay, but yes, it was certainly an Immortal rush: Persepolis had Horses in its BFC, Sury had no Copper (only Iron, and not that close). War was declared on turn 37. :wow:

I remember that way back then, when I started doing AHs, I tried keeping the initial save so that interesting games could be replayed: with the "new random seed" option off and no player interaction, the games should have played out the same.
And yet, they always started diverging at some point, so I dismissed the idea (and the practice).

But back then I was running those games "manually", so maybe, just maybe, I wasn't consistent in refusing/denying all the various AI demands?
Maybe I should give it another try... And even if the games still diverge at some point, early exceptional events like this one should be replayable, so it might still be worth it?
 


Yes, I ran a test yesterday after posting this, and using aiplay, there was no divergence.
So that's definitely something, and I should try and keep those initial saves from now on.

Unfortunately, I suspect it won't allow replaying the games "manually" (but I'll test it to confirm anyway): the observer civ not being dead means extra random rolls (for demands) and might impact other decisions as well, so there's little chance the games would play out exactly the same.
 
Hmm... something weird going on here:

Filesize.png


Why is the size of the initial saves growing??

Each game is launched from a different WB file.
I'm not shutting down Civ4 after each game... looks like there's stuff kept in memory which somehow finds its way into the next game?
:confused:
 
Right, so there is indeed some kind of "information leak" happening:

file_diff.png


On the left, the first initial save.
On the right, the initial save for the second game run without closing civ4.
This is in a player section of the save: somehow the "MapEspionnageHistory" (and the same happens for Culture) of the previous game is preserved for each player and injected into the new game's save?
 
Do you have an overall power ranking? One thing I've been working on over the past few weeks is what would happen in a theoretical "draft", where the bottom leaders get to pick their civs and so on and so forth. I just did this with Sullla's leaderboard, and I want to compare with other rankings (e.g. Keler's, this one's).
 
Yes: the Elo ranking is kept up-to-date in the top post of this topic.
(By the way, the next update should happen in the next few days: on Sunday or Monday, most likely.)
 
Series 9 Results

Spoiler Results :

S9_Results.png


The last time HC had lost every game in round 1 was Series 3, which he ended up losing.
Well... not here.
He recovered and became dominant, once again. :rolleyes:
(And of course, here again it turns out he won a majority of Cultural victories. Sigh... Quite a few of them were "Domination shortcuts", but certainly not all of them.)

That said, Ramesses was at the top with a significant lead in victories for quite some time. But then all the other high peaceweights dropped out, and he had to face an all low-peaceweight field for the remaining rounds. Amazingly, he did win some games (a combination of rolling low + starting in a good spot?), which allowed him to stay the whole time in Pool 1, a feat which no high(ish) peaceweight had managed so far.

Overall, the high peaceweights had a tough time this series: Pool 8 was most of the time a high peaceweight field, but it was Pool 7 which was their reserved domain, with either an all high-peaceweight field, or a 6 vs 1 situation.
The high peaceweight leaders are already at a numerical disadvantage, but with them staying grouped at the bottom, it meant those few who ventured in the higher pools had a very tough time...
In round 10 for instance: Ramesses was alone in Pool 1, Hatty alone in Pool 2, and Mansa alone in Pool 3...
One of the reasons for them staying grouped at the bottom was that those games tended to yield unbalanced results. For instance, Gandhi won 5 games out of 7 in a single round when he found himself in such a favourable environment (and he won only 3 games total in the remaining 9 rounds...).
But if a single AI wins most games in a single round, the other Pool participants don't make it higher...

Hannibal did much better this time than last series. Suleiman had a bit of a slump in the middle but finished very strong.
Justinian and Willem had a bad series, so the gap between them and HC keeps widening...

I haven't commented on the Qin / Mao situation for a while: Qin has (at last?) been doing better than Mao, so it's now looking a bit more like I thought it would. But that's still a far cry from what I expected, though.
One explanation might come from the stats I posted earlier in answer to Keler: Qin would seem to be pretty good at surviving. And all my previous "experiments" rewarded survival to some extent, while here only winning matters. That might explain why I overestimated him?

And of course, I jinxed Nappy last time: saying he was the best of the crazies and an outright strong leader meant he underperformed here, with Shaka clearly outperforming him.

Spoiler Awards :

S9_Awards.png


(I unfortunately do not have the initial save for that turn 64 elimination: it was a round 2 game, I only started keeping those initial saves from round 3 onward. :hammer2: )

Spoiler Elo Ratings :

S9_Elos.png


After more than 4K games played, things are stabilizing a bit, but we're still seeing some largish variations.
Peter and Stalin did very poorly and lost a substantial number of elo points.
Sitting Bull keeps digging...
Mansa had Hatty in his sights, but he's still behind.

Spoiler Map Data :

There were only two 7-player maps left from the AI Survivor set, so for the remaining slots I started a new rotation of the 7-player maps, with the added rule that a map cannot be re-used for the same Pool it was previously used for.

Spoiler Pool 1 (S4 Playoff 1) :

Pool1_S4_PO1_Stats.png


Spoiler Pool 2 (S3 Game 3) :

Pool2_S3_G3_Stats.png


Spoiler Pool 3 (S6 Game 3) :

Pool3_S6_G3_Stats.png


Spoiler Pool 4 (S8 Game 1) :

Pool4_S8_G1_Stats.png


Spoiler Pool 5 (S8 Game 6) :

Pool5_S8_G6_Stats.png


Spoiler Pool 6 (S2 Game 8) :

Pool6_S2_G8_Stats.png


This was the first map which had already been used (Series 4).
Here were the results I got previously:
pool5_s2_g8_stats-png.740267


Well, there's one glaring difference: Position 3 had been slightly better than average in that previous set, but it's deathspot territory in the new one!
In the new set, Position 5 is even more dominant, with Position 4 also doing better than previously: I believe it might explain how Position 3, located between these two, saw a dramatic drop in fortunes.

Spoiler Pool 7 (S1 Game 6) :

Pool7_S1_G6_Stats.png


Spoiler Pool 8 (S5 Game 6) :

Pool8_S5_G6_Stats.png


This is the second map which had already been used, and funnily enough, random.org gave me here a map I'd just used in the previous series.
Here were the results I'd got then:
pool6_s5_g6_stats-png.744914

Pretty similar, but not exactly the same: Position 6 had done better, and Position 4 had also done slightly better, with Position 3 apparently faring a bit worse as a direct consequence.

 
Stalin doing good work setting new records. Unfortunately they're the kinds of records he doesn't want to set :crazyeye:.
 
Oh, forgot to mention...
In the last round, Louis managed something that I only remember witnessing once before: he won by Domination... and then proceeded to get wiped out (by JC). :lol:
 
It is fun to look at your results, you forgot to mention Stalin, probably stuck with high peaceweights forever?

Do you have a graph or something that shows elo rating of leaders by the end of each series, how they shift around, go up and down over series?
Like you said, things are stabilizing with more series.
 
It is fun to look at your results, you forgot to mention Stalin, probably stuck with high peaceweights forever?
Sorta, but not entirely: he got stuck in Pool 8 which usually had a majority of high peaceweights, but it was a weak majority (4 to 3 most of the time). And in the last round for instance, it was 4 to 3 in favour of the low peaceweights... and that didn't help.
Pool 7 was the real death trap for low peaceweights.

Do you have a graph or something that shows elo rating of leaders by the end of each series, how they shift around, go up and down over series?
Not ready yet, but definitely planned.
I'm actually planning on providing those values in table format: I thought about it a bit, and I'm not sure a graph with the Elo values would be that useful. I think the graph I'll provide will show, for each leader, the variation in Elo ranking.

Like you said, things are stabilizing with more series.
Of course, the issue is there are two possible explanations:
  1. The values are stabilizing around the "true" values.
  2. We're simply getting into hard diminishing returns (Series 2 doubled the dataset, Series 3 added 50%, Series 4 added 33%, Series 5 added 25%, etc...).
(1) would be great, (2) not so much.
I guess it's a bit of both. :)
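
For illustration, the percentages in point 2 follow directly from each series adding one more block of games to the pile. A minimal sketch, assuming every series contributes the same number of games (the function name `relative_growth` is made up for this example):

```python
from fractions import Fraction


def relative_growth(series_number):
    """Fraction by which the dataset grows when the given series is added.

    Assumption: every series contributes the same number of games, so
    series n is added on top of (n - 1) equally sized earlier series.
    """
    if series_number < 2:
        raise ValueError("growth is only defined from the second series onward")
    return Fraction(1, series_number - 1)


if __name__ == "__main__":
    for n in range(2, 11):
        print(f"Series {n}: +{float(relative_growth(n)):.0%}")
```

Under that assumption, each new series shifts the averages less than the one before it, whether or not the values are actually converging on the "true" ones.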
 
Now that we start having a lot of game results, here is a table comparing some scoring systems (win ratio, elo, league score, eloK):

Spoiler Scoring Table :

ScoreComparison.png


As you can see, there's very little difference between the first three systems.
In particular, the fact that win ratio and Elo yield very similar lists could simply mean that attempting to include the leaders' relative strength into the scoring system turns out to be irrelevant. :(
Or it could mean that the League's format, which was tailor-made for such a system, is a tad too efficient...
Or it could mean that there's simply too much noise in the data (unbalanced maps, unbalanced peaceweight situations, outlier game results, ...).
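
One way to put a number on "very similar lists" would be a rank correlation between two of the orderings. The sketch below is not something used for the table above; it is only an illustration, using Spearman's rho for rankings without ties, and the leader labels are placeholders rather than real results:

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items.

    Both arguments are lists ordered from best to worst, containing the
    same items exactly once each (no ties). Returns a value in [-1, 1]:
    1 means identical order, -1 means exactly reversed order.
    """
    if len(ranking_a) != len(set(ranking_a)) or len(ranking_b) != len(set(ranking_b)):
        raise ValueError("rankings must not contain duplicates")
    if set(ranking_a) != set(ranking_b):
        raise ValueError("both rankings must contain the same items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to compare")
    position_b = {item: index for index, item in enumerate(ranking_b)}
    d_squared = sum((index - position_b[item]) ** 2
                    for index, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


if __name__ == "__main__":
    # Placeholder labels only, not actual league data.
    by_win_ratio = ["A", "B", "C", "D"]
    by_elo = ["A", "C", "B", "D"]
    print(spearman_rho(by_win_ratio, by_elo))
```

A value close to 1 for every pair of systems would back up the impression that they barely disagree, though it would not tell which of the three explanations is the right one.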

On the subject of noise and outliers, the last column stands out: the "EloK".
As a reminder (cf. the top post where this is explained in more detail): the Elo rating is computed by treating the winner of the game as getting a "win" versus every other player, whereas the EloK value is computed with the winner getting a win versus surviving players only, and killers getting a win versus their victims.
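
To make the difference concrete, here is a minimal sketch of which (winner, loser) pairs each system would extract from a single game. The function names and the example game are hypothetical, and how the pairs are then turned into rating changes (see the top post) is deliberately left out:

```python
def elo_pairs(players, winner):
    """Plain Elo: the game winner gets a "win" versus every other player."""
    return [(winner, player) for player in players if player != winner]


def elok_pairs(players, winner, kills):
    """EloK: the winner beats surviving players only; killers beat their victims.

    `kills` is a list of (killer, victim) tuples, as credited by the game.
    """
    eliminated = {victim for _, victim in kills}
    pairs = [(winner, player) for player in players
             if player != winner and player not in eliminated]
    pairs.extend(kills)
    return pairs


if __name__ == "__main__":
    # Hypothetical game, not an actual league result.
    players = ["HC", "Shaka", "Gandhi", "Mansa"]
    kills = [("Shaka", "Gandhi")]
    print(elo_pairs(players, "HC"))
    print(elok_pairs(players, "HC", kills))
```

In this sketch, whoever is credited with the kill takes the "win" over the victim, which is why a stolen kill moves a result from one leader to another in the EloK data.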
Before I started running these and paying a bit more attention to it, I would have estimated the "kill steal" ratio at about 20%.
Now, I would say it's closer to 50%. :wow:

KillSteal_Example.jpg


These happen all the time, game after game!
So the EloK values are computed with a ton of noise.
And what's surprising is that in spite of that, the leader ranking list they yield isn't crazy. Sure, it's different, with the more peaceful leaders getting dropped down, but that's a feature, not a bug.
If Shaka and Napoleon get ranked higher, Monty remains at the bottom.

So with a better system for scoring kills (like the one I had imagined, but which is too impractical to use without some heavy coding), it could work...

Now, Sullla doesn't have that issue with his system: who kills whom doesn't matter in his case, it's just about the number of kills.
And sure enough, kill steals do not result in Gandhi scoring more kills than Shaka. :)

That said, I don't think that "things balance out" completely either.
For instance, after playing all those games and paying some attention to that aspect, I can tell that HC is getting a lot more kills stolen from him than he steals from others.
These two examples were taken from the same game:
Spoiler :

KillSteal_HC1.jpg


KillSteal_HC2.jpg



I suppose it can be explained by the fact that HC doesn't have a "killer instinct": he usually takes his own sweet time when conquering opponents, sometimes even losing interest in his wars (too much other stuff to build).

But another leader who's getting a lot more kills stolen from him than he takes from others might be more surprising: Shaka! :crazyeye:
In his case, I suppose that's because he's always killing someone (when he's not being killed), so any time someone else kills his target, it's obviously right before he would have done so himself? :lol:
 