A statistical analysis of which start conditions improve the likelihood of winning

Just wanna say, big respect for the work you're doing with this Civ science project. Keep up the good work, it'll be good to have a statistical basis to reference for some claims, a classic one of which is Player Skill > Starting Dirt > Civ
 
Thanks! That means a lot!

Already from the project it looks like starting dirt > civ.

Much harder to address player skill with this project as there isn't the data. There aren't any inept players who have systematically uploaded their games to youtube!
 
Excellent work! I made a thread recently about this exact subject and asked whether or not and how often people restart when they get a "bad start".

This is exactly the type of info I was looking for.

Well done.
 
I would like to see some more details. I belong to the annoying fraction of the human race that dislikes nice-looking articles that summarise and simplify. I much prefer tons of tables, alternative models and residual plots. We usually spend our lives quarrelling about the Appendix :)

Things I would like to see:

Coefficient-table w/ SE
corr(X)
Formal model (you don't even specify it)
Alternative specifications (do you have omitted variable bias? is a variable significant due to its correlation? Is including cross-terms a better fit?)
Discussion of errors

Do you have a large enough sample? Logit-models can misbehave if you have lots of ind. variables relative to sample size. I can't really tell from your article how many independent variables you have.

You might of course have a thorough analysis, but it doesn't really show from the article.
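To make the sample-size worry concrete, here is a rough sketch of the "events per parameter" check that is often quoted as a rule of thumb for logit models (roughly 10 or more of the rarer outcome per fitted coefficient). The function name and all the numbers are made up for illustration; none of them come from the article.

```python
def events_per_parameter(n_wins, n_losses, n_params):
    """Count of the rarer outcome divided by the number of fitted coefficients."""
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    return min(n_wins, n_losses) / n_params


# Hypothetical figures only: 120 wins, 80 losses, 8 slope coefficients.
epp = events_per_parameter(n_wins=120, n_losses=80, n_params=8)
print(f"events per parameter: {epp:.1f}")  # 80 / 8 = 10.0
```

A value well below 10 would not prove the model is wrong, but it would be a reason to look harder at the standard errors.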
 
I would have liked your dataset to be collected from multiple sources rather than just from FilthyRobot. Nonetheless, keep up the good work.
 
I disagree with sugar daddy. One player keeps the test "fair". I think the results would be different for single-player.
 
A single source leads to sampling bias, specifically, selection bias. Samples from inthesomeday, for example, could reveal very different results in a regression than it would for me simply because we have different play styles (and win rates). The best way to remove this would be a large dataset with randomly selected samples. Obviously, obtaining such isn't feasible in this case.
 
Disagree with random sampling - it might show the player is more important than the starting dirt + civ. It would work if players were the same strength - find another person that wins over half their games. Perhaps the most accurate would be some GLM.

One issue I had was that when tiering civs he didn't (appear to) take into account the type of map. E.g. England is a top-tier civ on water maps, no doubt. But is it top-tier on Pangaea?

It's possible the same is true for coastal dirt - not as useful on Pangaea as on a continents map. A mountain is always useful. I was very surprised about the river. Perhaps if the generator thinks your best location is by a river, you're less likely to be by the coast/mountain. A GLM would sort this out. I like starting on a hill. I'm surprised that doesn't make a difference.
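The river-versus-coast guess is something you could check directly before reaching for a GLM. A minimal sketch, assuming each start is recorded as a (river, coast) pair of 0/1 flags: the phi coefficient is just the Pearson correlation for two yes/no variables. The data below is invented to show the mechanics, not taken from the article.

```python
import math


def phi_coefficient(pairs):
    """Phi coefficient (correlation of two 0/1 variables) from (x, y) pairs."""
    n11 = sum(1 for x, y in pairs if x and y)
    n10 = sum(1 for x, y in pairs if x and not y)
    n01 = sum(1 for x, y in pairs if not x and y)
    n00 = sum(1 for x, y in pairs if not x and not y)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("one of the variables never changes")
    return (n11 * n00 - n10 * n01) / denom


# Invented starts as (river, coast): 1 both, 4 river only, 4 coast only, 1 neither.
starts = [(1, 1)] + [(1, 0)] * 4 + [(0, 1)] * 4 + [(0, 0)]
print(round(phi_coefficient(starts), 3))  # -0.6
```

A clearly negative value would support the idea that river starts tend to come without coast, in which case the river coefficient partly reflects "not on the coast".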
 
On the other hand, as much as the statistician me appreciates real statistics on the web, the civ5 player me finds it a bit unsurprising that mountain, coast and mining luxuries improve your win probability...

One thing that quickly came to mind for me is to what extent having trapping resources suggests unfortunate other factors (poor tundra terrain, no coast, or slow yield starts) in tandem, versus it just being the need for trapping itself. Maybe the player is just really bad with trapping resources for some reason, but I'm not picturing what "being bad with trapping resources while good with coastal starts" looks like from the standpoint of player skill. Maybe someone else could envision why that might be, but I'm suspecting the start quality for the moment.
 
Disagree with random sampling - it might show the player is more important than the starting dirt + civ.

If a random sample showed this to be true, then it would confirm that Captain_Wozzeck's own sample was biased and couldn't be used to represent the gameplay of the general population. Also, if the correlation between starting location and wins is statistically insignificant, that's very important to know. The point here is to follow the data to whichever conclusion it takes us.
 
Nice analysis. But I have some suggestions about the statistical approach. In other parts of this forum, it is often claimed that what matters is not whether you win or lose (top players should be able to win from any start) but how quickly. So, you could test how the starting factors affect time-to-win using survival analysis techniques. The only difficulty with this idea is deciding how to treat losses -- presumably as censoring events.
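For anyone curious what that would look like, here is a bare-bones Kaplan-Meier sketch with losses treated as censored at the turn the game ended. Here "survival" means "has not won yet". The (turn, won) format and the five games are assumptions for illustration, not real data, and treating a loss as plain censoring assumes it tells you nothing about when the win would have come, which is debatable.

```python
def kaplan_meier(games):
    """Kaplan-Meier estimate of P(not yet won by turn t).

    games: list of (turn, won) tuples; won=False means the game was
    lost at that turn and is treated as censored.
    Returns [(turn, estimate)] for each turn with at least one win.
    """
    at_risk = len(games)
    surv = 1.0
    curve = []
    for turn in sorted({t for t, _ in games}):
        wins = sum(1 for t, w in games if t == turn and w)
        ended = sum(1 for t, _ in games if t == turn)
        if wins:
            surv *= 1 - wins / at_risk
            curve.append((turn, surv))
        at_risk -= ended  # wins and censored losses both leave the risk set
    return curve


# Invented games: (turn the game ended, True if it was a win).
games = [(150, True), (170, False), (180, True), (200, True), (200, False)]
for turn, s in kaplan_meier(games):
    print(turn, round(s, 3))
```

Comparing the curves for, say, coastal versus non-coastal starts would be the survival-analysis version of the article's win/loss comparison.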
 
Personally my take on things is hammers are the most important thing in the early game.

Whether it's pumping out settlers, units, or buildings it comes into play with all of them.

I've had starts on flat ground in jungle, takes forever to get anything done. And there is absolutely nothing you can do about it until you get metal casting (and even then it may not make a big difference, but at least you can do something).

To tell you the truth (as was said in the article), the capital is by far the most important city of the empire. Big hammers in the capital at the beginning is my favorite kind of start. Unless you get really lucky it takes a while for founded cities to start producing anything (units, settlers, whatever). You pretty much have to live with whatever your capital produces till what I'd call the midgame starts.
 
Disagree with random sampling - it might show the player is more important than the starting dirt + civ. It would work if players were the same strength - find another person that wins over half their games. Perhaps the most accurate would be some GLM.

So to shed some light on this debate, I think having the data consistent for one player is actually really critical for this sort of analysis. There is an enormous amount of variance in player skill and style, and so it's important to keep that variable constant.

In order to do as you say and control for player ability, we would need a huge number of games for each player, so that we could still make statements about starting conditions that weren't masked by variability in player skill.

Secondly, there is an important practical point. FilthyRobot has a huge number of games on youtube and importantly, he uploaded all his wins and losses (and I checked with him about this). Most channels have far fewer games, and the vast majority cherry pick the exciting games for upload (Yoruus, for example, only uploads a fraction of his games). So this sort of data frankly doesn't exist from anyone else!

One issue I had was that when tiering civs he didn't (appear to) take into account the type of map. E.g. England is a top-tier civ on water maps, no doubt. But is it top-tier on Pangaea?

It's possible the same is true for coastal dirt - not as useful on Pangaea as on a continents map. A mountain is always useful. I was very surprised about the river. Perhaps if the generator thinks your best location is by a river, you're less likely to be by the coast/mountain. A GLM would sort this out. I like starting on a hill. I'm surprised that doesn't make a difference.

The data is from a set of games that were always Pangaea, always 6 player. The tier list was also designed by FilthyRobot for specifically these conditions, so I think it fits the bill quite accurately.

As for hills, I thought it would be interesting, but basically FilthyRobot nearly always settles on a hill when he can (very common in multiplayer for the added defense), so there wasn't really a reliable non-hill group to compare to.
 
One thing that quickly came to mind for me is to what extent having trapping resources suggests unfortunate other factors (poor tundra terrain, no coast, or slow yield starts) in tandem, versus it just being the need for trapping itself. Maybe the player is just really bad with trapping resources for some reason, but I'm not picturing what "being bad with trapping resources while good with coastal starts" looks like from the standpoint of player skill. Maybe someone else could envision why that might be, but I'm suspecting the start quality for the moment.

Well, calendar resources were also associated with poorer performance, just not to such a strong degree as trapping. I actually think the "truth" might be that all non-mining resources are worse, but we don't see the statistical signal for coastal resources, because that is offset by the added bonus of being on the coast.

I think there is a point to be made that trapping might come with poor terrain. Furs spawn on flat tundra (bad) and truffles can be in forest or jungle (slow start). Ivory is often near flat desert (bad). I think this might be why trapping did so much worse than calendar. Another reason might be that at least calendar is on the way to philosophy, whereas trapping is a bit "out of the way" for a quick tech path to National College.
 
On the other hand, as much as the statistician me appreciates real statistics on the web, the civ5 player me finds it a bit unsurprising that mountain, coast and mining luxuries improve your win probability...

Well, they might not be a big surprise to experienced players, but I thought they were worth sharing. It's still useful to know which advantages outweigh ones that people often perceive as important (like which civ they are playing).

Secondly, coast is definitely a debated topic in multiplayer, because of the vulnerability to attack. I've known players to move off the coast just so they don't have to worry about frigates. This data at least shows that despite the disadvantages, coast is still a big bonus in multiplayer.

Obviously in single player the bonus is unquestionably good, because the AI sucks at naval invasions.
 
I would like to see some more details. I belong to the annoying fraction of the human race that dislikes nice-looking articles that summarise and simplify. I much prefer tons of tables, alternative models and residual plots. We usually spend our lives quarrelling about the Appendix :)

Well there is nothing wrong with being that sort of person :)

I do stand by how I wrote the article. I wanted this to be approachable by most civ fans, and if I had gone fully into technical details I might have lost a lot of people!

The model I fit that gave the numbers in the article fitted Coast (Y/N), Mountain (Y/N), River (Y/N), Natural wonder (Y/N) and luxury tech (5 categories).
However, I ran a large number of permutations eliminating different variables to make sure that the results I got were robust to changes in variable selection (I also tested for possible interactions, but found none, so didn't pursue that much further).
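For the people asking for a formal model: written out, a main-effects specification with those terms would look something like the following. This assumes a standard logit link and treats one luxury tech category as the reference level; the symbols and the choice of reference level are illustrative, not something stated in the article.

```latex
\[
\Pr(\mathrm{win}_i = 1) = \frac{1}{1 + e^{-\eta_i}},
\]
\[
\eta_i = \beta_0
       + \beta_1\,\mathrm{Coast}_i
       + \beta_2\,\mathrm{Mountain}_i
       + \beta_3\,\mathrm{River}_i
       + \beta_4\,\mathrm{Wonder}_i
       + \sum_{k=2}^{5} \gamma_k\,\mathbf{1}[\mathrm{LuxTech}_i = k]
\]
```

Under those assumptions that is nine coefficients in total: the intercept, four yes/no terms, and four dummies for the five luxury tech categories.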

If I get time later today I'll put up the coefficients table as an addendum to the article
 