A statistical analysis of which start conditions improve the likelihood of winning

Well, calendar resources were also associated with poorer performance, just not to such a strong degree as trapping. I actually think the "truth" might be that all non-mining resources are worse, but we don't see the statistical signal for coastal resources, because that is offset by the added bonus of being on the coast.

I think there is a point to be made that trapping might come with poor terrain. Furs spawn on flat tundra (bad), and truffles can be in forest or jungle (slow start). Ivory is often near flat desert (bad). I think this might be why trapping did so much worse than calendar. Another reason might be that calendar is at least on the way to philosophy, whereas trapping is a bit "out of the way" for a quick tech path to National College.

I agree, but there's always a danger to this kind of conjecture. It might be a case where "these starts result in you dying to chariot rushes" or something too...just as an example. Maybe there's correlation to strategic resource availability or in a given scenario a given player mistakenly attempts a path for a wonder that hurts more than helps.

It's still interesting data, but conclusions must be drawn with care. I have a much easier time envisioning why mining resources and coast are particularly advantageous (fast yields, and powerful internal trade routes in an environment where external ones are less viable, respectively). Is the variance in calendar starts really explained just by delaying the early output from worked tiles until calendar is researched? Is that really enough to make you so much less likely to win? If not, it might be pointing to strategy mistakes he makes when presented with that scenario, which would be insightful in itself.
 
Once again, it's the hammers.

Look, you are going to get at least one luxury resource in your starting location.

If there are any luxuries that don't produce gold as well, I'm drawing a blank now.

So effectively the only difference between a start with cotton (assume one "kind" of luxury resource) and one with gold is pretty much the hammers.

And I think that has a snowball effect that lasts the whole game, since effectively your starting city is the only one that has meaningful production in the first part of the game, where arguably things are most important.

I mean heck, depending on how close the nearest civ (or two) is, I don't even bother putting out a settler. I go straight for taking that neighboring civ. In my experience, early production is the biggie for pulling this off fairly easily.
 
Actually, I believe that so much that maybe you ought to just look at one thing: compare the number of hammers around each starting location. One variable.

Then see how it correlates.
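A minimal sketch of that one-variable check, assuming you have already counted the hammers available around each start (the `hammers` and `won` lists below are hypothetical placeholders, not data from the games). With a yes/no outcome, Pearson's r against the win indicator is the point-biserial correlation.

```python
import math
from typing import Sequence


def point_biserial(hammers: Sequence[float], won: Sequence[int]) -> float:
    """Pearson correlation between a numeric predictor and a 0/1 outcome.

    When one of the two variables is binary, Pearson's r is exactly the
    point-biserial correlation.
    """
    if len(hammers) != len(won) or len(hammers) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(hammers)
    mean_x = sum(hammers) / n
    mean_y = sum(won) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(hammers, won))
    var_x = sum((x - mean_x) ** 2 for x in hammers)
    var_y = sum((y - mean_y) ** 2 for y in won)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant variable has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy input: hammers around each start, and 1 = that game was won.
hammers = [4, 6, 9, 5, 11, 7]
won = [0, 0, 1, 0, 1, 0]
print(round(point_biserial(hammers, won), 3))
```

A positive r means hammer-rich starts tend to be the winning ones; with a few dozen games the value would still need a significance test before reading much into it.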
 
The data is from a set of games that were always Pangaea, always 6 player. The tier list was also designed by FilthyRobot for specifically these conditions, so I think it fits the bill quite accurately.

Okay, so now I would like to ask you to revisit your previous article, where you reasonably concluded that civ choice did not statistically matter.

If you only pick games with mining luxes, assuming there are enough of them, can you statistically validate FilthyRobot's tiers? Or at least see some statistical difference between the top and bottom tiers? With six tiers, I would be amazed to find a statistical difference between any two neighbors. But surely between the best and the worst? (Controlling as best you can for other variables, at least.)
 
That is actually something I'm interested in doing in the future :)

There were 58 mining games, which should be a large enough sample to see something.
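With 58 games split across six tiers, each tier gets only about ten games, so a large-sample test is shaky. A minimal sketch of one option for a top-tier versus bottom-tier comparison is Fisher's exact test on the 2x2 table of wins and losses; the counts in the usage line are hypothetical placeholders, not figures from the dataset.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows: top-tier civ / bottom-tier civ; columns: won / lost.
    The p-value sums the probabilities of every table with the same row and
    column totals that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so floating-point ties are still counted.
        if p <= observed * (1 + 1e-9):
            total += p
    return min(1.0, total)


# Hypothetical counts: top tier 6 wins / 4 losses, bottom tier 1 win / 8 losses.
print(fisher_exact_two_sided(6, 4, 1, 8))
```

The exact test makes no normal approximation, which matters when some cells hold only one or two games.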
 
I agree, but there's always a danger to this kind of conjecture. It might be a case where "these starts result in you dying to chariot rushes" or something too...just as an example. Maybe there's correlation to strategic resource availability or in a given scenario a given player mistakenly attempts a path for a wonder that hurts more than helps.

It's still interesting data, but conclusions must be drawn with care. I have a much easier time envisioning why mining resources and coast are particularly advantageous (fast yields, and powerful internal trade routes in an environment where external ones are less viable, respectively). Is the variance in calendar starts really explained just by delaying the early output from worked tiles until calendar is researched? Is that really enough to make you so much less likely to win? If not, it might be pointing to strategy mistakes he makes when presented with that scenario, which would be insightful in itself.

You make fair points. Of course, I am not getting at the "truth", just finding associations between certain variables and winning. Trapping resources might, for example, be associated with bad terrain, low production, etc. I do have a tendency to overstate conclusions when I'm writing for a general audience, because articles with too many ifs, buts and disclaimers tend not to be appreciated as much (a sad truth, but one need only look at science journalism to see it in action). I recognize that this is something to improve in my writing!

One hypothesis (that I think is a good one) that Sunbeam put forward is that it simply relates to how many hammers are in the terrain. Trapping and Calendar luxuries occur on flat land, and only provide a single hammer if they are on plains. On the other hand mining luxuries are often on hills, and might be an indicator of a more hammer-heavy start (which is critical in multiplayer) in general.

I would add, though, that after trawling through the games I did get a decent sense of the player, and I would say he makes remarkably few bad strategic decisions, at least as far as I can tell. It's not like he panics at a bad start and goes on a suicidal rampage like some players do, or attempts to compete for the Great Library with hammer-poor land.
 
One hypothesis (that I think is a good one) that Sunbeam put forward is that it simply relates to how many hammers are in the terrain. Trapping and Calendar luxuries occur on flat land, and only provide a single hammer if they are on plains.

If it's really about hammers, that's a measurable outcome too. Wouldn't you just optimize for hammers ASAP then? You need food to sustain a gold mine and still grow. Something like cotton can be food neutral, so you can work a regular mine in addition. What's the functional difference between a gold mine + farms versus cotton + farm + mine?

You can certainly work the gold mine earlier than you can the cotton, but is that alone enough to explain a major swing in win probability? Maybe it's actually just Filthy delaying mining or something when presented with a calendar start?
 
The model that produced the numbers in the article included Coast (Y/N), Mountain (Y/N), River (Y/N), Natural wonder (Y/N) and luxury tech (5 categories).
However, I ran a large number of permutations eliminating different variables to make sure that the results I got were robust to changes in variable selection (I also tested for possible interactions, but found none, so didn't pursue that much further).

Good to know :) However, I don't like 9 independent variables with n=180. I haven't studied logit models in a while, so I need to think about it a bit more. A simple simulation should be sufficient as a test, though.
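One such simple simulation, sketched under loudly stated assumptions: 180 games, 9 yes/no predictors that have no effect at all, and a 1-in-6 baseline win rate (six players) chosen purely for illustration. As a simplification it tests each predictor separately with a two-proportion z-test instead of fitting a joint logit, and counts how often at least one irrelevant variable still comes out "significant".

```python
import math
import random


def two_prop_p_value(w1: int, n1: int, w0: int, n0: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    if n1 == 0 or n0 == 0:
        return 1.0
    pooled = (w1 + w0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0
    z = (w1 / n1 - w0 / n0) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def null_false_positive_rate(n_games=180, n_predictors=9, win_rate=1 / 6,
                             alpha=0.05, reps=500, seed=1):
    """Share of simulated datasets in which at least one predictor that has
    no effect on winning still passes the significance threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        won = [rng.random() < win_rate for _ in range(n_games)]
        total_wins = sum(won)
        for _ in range(n_predictors):
            x = [rng.random() < 0.5 for _ in range(n_games)]  # irrelevant Y/N variable
            n1 = sum(x)
            w1 = sum(1 for xi, wi in zip(x, won) if xi and wi)
            if two_prop_p_value(w1, n1, total_wins - w1, n_games - n1) < alpha:
                hits += 1
                break
    return hits / reps


print(null_false_positive_rate())
print(1 - 0.95 ** 9)  # analytic rate for 9 independent tests at alpha = 0.05
```

For independent tests the analytic figure is about 0.37, so roughly one analysis in three would flag something purely by chance; that is the same worry that motivates the robustness checks over model permutations discussed further down.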

Just have a look at this thread and see that most questions revolve around alternative specifications and their validity, which is very common when publishing statistical analyses. Of course, some concern missing data, but no interactions should in principle (given all assumptions) mean that, e.g., Mountain is not favourable due to an increased likelihood of a mining lux.
 
^ Just because some of the factors don't interact doesn't preclude it happening in some cases.

If for example starting with coastal literally guaranteed you 4 or more horses, but you don't test for the presence of horses, and horses are actually the decisive factor, you'd overrate coastal starts.

That one is pretty unlikely, but it should illustrate what I mean.
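The horses scenario can be made concrete with a toy simulation. Every probability below is invented for illustration: coastal starts get horses 80% of the time versus 20% inland, horses raise the win rate from 10% to 30%, and the coast itself has no direct effect. Comparing coastal and inland win rates naively shows a gap; comparing within horse and no-horse starts makes it vanish.

```python
import random


def simulate_confounding(n=100_000, seed=0):
    """Toy world: coast makes horses likelier; only horses affect winning."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        coastal = rng.random() < 0.5
        horses = rng.random() < (0.8 if coastal else 0.2)
        won = rng.random() < (0.3 if horses else 0.1)
        rows.append((coastal, horses, won))
    return rows


def win_rate(rows, coastal, horses=None):
    """Win rate for coastal/inland starts, optionally within a horse stratum."""
    sel = [r for r in rows
           if r[0] == coastal and (horses is None or r[1] == horses)]
    return sum(r[2] for r in sel) / len(sel)


rows = simulate_confounding()
naive_gap = win_rate(rows, True) - win_rate(rows, False)
gap_with_horses = win_rate(rows, True, True) - win_rate(rows, False, True)
gap_without_horses = win_rate(rows, True, False) - win_rate(rows, False, False)
print(f"naive coastal gap:   {naive_gap:+.3f}")
print(f"within horse starts: {gap_with_horses:+.3f}")
print(f"within no-horse:     {gap_without_horses:+.3f}")
```

Under these made-up numbers the naive gap should come out near +0.12, while both stratified gaps stay close to zero. Stratifying only works if horses were recorded, which is exactly the point about untracked variables.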
 
Yes, I agree. Perhaps I wrote it a bit unclearly, but I meant the variables actually in the dataset. Two variables in the dataset that don't interact have insignificant correlation by definition. This is only valid for the population, though, if your sample is representative and large enough for the law of large numbers (LLN) to apply.

Of course you cannot really make any inference about hammers, horses, jungle or desert using this data. So if any of these are relevant, you can neither analyse nor discuss them within the realms of current dataset.
 
Of course, some concern missing data, but no interactions should in principle (given all assumptions) mean that, e.g., Mountain is not favourable due to an increased likelihood of a mining lux.
That is actually a very good example, since one would reasonably presume the map generator algorithm correlates the presence of mountains with hills. So Mountain ends up being an indicator of more hammers, even though one cannot work mountains and their primary benefit is unlocking wonders and maybe the observatory.

Then TheMeInTeam points out there could be any number of hidden programmatic correlations that are not suspected. But don't the statistical tools tend to expose any such correlations? Still, I think the point is valid that you need to track benchmarks you think are irrelevant (e.g., the presence of horses and other resources).
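For two yes/no variables that are both in the dataset, such as Mountain and mining lux, the correlation can be checked directly from a 2x2 count table. A minimal sketch, assuming you tally the four cells yourself (no counts are given here): the phi coefficient is Pearson's r for two binary variables, and n times phi squared is Pearson's chi-square statistic with one degree of freedom.

```python
import math


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient from the 2x2 count table

                       mining lux   no mining lux
        mountain           a              b
        no mountain        c              d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / denom


def phi_p_value(a: int, b: int, c: int, d: int) -> float:
    """Approximate two-sided p-value: n * phi**2 is Pearson's chi-square
    (1 df, no continuity correction), and P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    chi2 = n * phi_coefficient(a, b, c, d) ** 2
    return math.erfc(math.sqrt(chi2 / 2))
```

This only exposes correlations between variables that were recorded; a hidden driver like horses never shows up in the table.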
 
Yes indeed, the theory can expose all correlations and their (statistical) significance - as long as they are in the dataset :)

To expose a latent correlation (or more interestingly, causation), one needs a structural model. This adds a new layer of "arbitrariness", and a throng of identification problems appear. The famous example of a structural model gone wrong is the "epicycle theory" of planetary movements, aka the Ptolemaic model, which can be fitted to any dataset.

Of course, when studying a game all this becomes much easier, since we can read the code. One cannot do the same in the "real world", since the "code" is inaccessible to us.
 
Good to know :) However, I don't like 9 independent variables with n=180. I haven't studied logit models in a while, so I need to think about it a bit more. A simple simulation should be sufficient as a test, though.

Just have a look at this thread and see that most questions revolve around alternative specifications and their validity, which is very common when publishing statistical analyses. Of course, some concern missing data, but no interactions should in principle (given all assumptions) mean that, e.g., Mountain is not favourable due to an increased likelihood of a mining lux.

So I did run the analysis without the luxury tech in the model, as well as a regression looking to fit only luxury techs, as well as half a dozen other combinations.

The way I've always approached tests like logit regression is to see how robust the results are to these changes. If reducing the number of variables makes significant effects disappear, that is a big red flag for me. As I say, all the results I reported were robust in the sense that they weren't greatly altered by fitting different permutations of models.

Now to a professional statistician, this approach might seem a bit crude, but I've found it to be useful.
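A minimal sketch of that kind of robustness check: enumerate every specification that keeps the variable of interest plus any subset of the other predictors, refit, and see whether the coefficient keeps its sign and significance. The `fit` callable is a hypothetical interface; in practice it would wrap a real logistic regression (statsmodels' `Logit` is one option), and the stub below exists only to exercise the harness.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: fit(predictors) -> {name: (coefficient, p_value)}.
FitFn = Callable[[Sequence[str]], Dict[str, Tuple[float, float]]]


def specifications(target: str, controls: Sequence[str]) -> List[List[str]]:
    """Every model that keeps `target` plus any subset of `controls`."""
    return [[target, *subset]
            for k in range(len(controls) + 1)
            for subset in combinations(controls, k)]


def robustness(target: str, controls: Sequence[str], fit: FitFn,
               alpha: float = 0.05) -> dict:
    """Refit across all specifications and report whether the target's
    coefficient keeps its sign and stays below `alpha` every time."""
    results = []
    for spec in specifications(target, controls):
        coef, p = fit(spec)[target]
        results.append((tuple(spec), coef, p))
    return {
        "n_specs": len(results),
        # A coefficient of exactly 0 is grouped with the negatives here.
        "sign_stable": len({coef > 0 for _, coef, _ in results}) == 1,
        "always_significant": all(p < alpha for _, _, p in results),
        "results": results,
    }


def stub_fit(spec):
    """Placeholder for a real logit fit: same estimate in every model."""
    return {name: (0.4, 0.01) for name in spec}


# "LuxuryTech" stands for the 5-category variable; a real fit would expand
# it into dummy columns.
report = robustness("Mountain", ["Coast", "River", "NaturalWonder", "LuxuryTech"],
                    stub_fit)
print(report["n_specs"], report["sign_stable"], report["always_significant"])
```

With four other predictors this gives 2^4 = 16 specifications per variable of interest, which is a systematic version of the "half a dozen other combinations" described above.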
 
Of course, when studying a game all this becomes much easier, since we can read the code. One cannot do the same in the "real world", since the "code" is inaccessible to us.

A lovely way of summarizing science!

The scientist in me would actually love to run real experiments now. I have wondered about loading IGE and playing the same map with and without a mountain 5 times each or something. Or one permutation with truffles, the other with silver, something like that...

Unfortunately, this would be very boring and time-consuming for the player (me!), and I would always worry about the sort of "butterfly effect" things that could happen and give the games vastly different outcomes.
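To put a number on how many games such an experiment would need, here is a sketch of the standard normal-approximation sample-size formula for comparing two win rates. The effect size is a hypothetical assumption (a mountain lifting the win rate from 20% to 40%), not a result from the data.

```python
import math
from statistics import NormalDist


def games_per_arm(p_without: float, p_with: float,
                  alpha: float = 0.05, power: float = 0.8) -> int:
    """Games needed in each arm to detect a change in win rate from
    p_without to p_with with a two-sided test (normal approximation)."""
    if p_without == p_with:
        raise ValueError("win rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_without * (1 - p_without) + p_with * (1 - p_with)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_with - p_without) ** 2)


# Hypothetical effect: a mountain raises the win rate from 20% to 40%.
print(games_per_arm(0.20, 0.40))
```

Even for that fairly large hypothetical effect the formula asks for 79 games per arm, so 5 games with and 5 without a mountain would be very unlikely to show anything, butterfly effects aside.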
 
Haha, yes. It's a lot more fun to create ideas and dabble in the possibilities than to actually gather data :) This obsession with truth creates a lot of tedious work.

Become successful (or rich), so that you can hire research assistants :D

If Civ weren't so resource-heavy, it would be possible to simulate. In chess, for example, AI vs AI games are an acceptable way of testing claims. But I doubt anyone will create a database of 15 million Civ games :p
 
This is really interesting! I think it might show that we're rating civs wrong, and that start bias may be far more important than a civ's UA or UUs. I have tended to notice that when I play Poland, I get salt a lot, which is an extremely dominant start.

If you look at civs by start bias instead of by tiering, do you get a significant result? That might be a question to ask for a future installment.
 
This is really interesting! I think it might show that we're rating civs wrong, and that start bias may be far more important than a civ's UA or UUs. I have tended to notice that when I play Poland, I get salt a lot, which is an extremely dominant start.

If you look at civs by start bias instead of by tiering, do you get a significant result? That might be a question to ask for a future installment.

Yeah, someone commented something similar on the article, and I think it's an interesting thought. I've often felt that many of the "weak" civs (Carthage, Denmark, Byzantium, Ottomans) are redeemed by a coastal start.

The reason I think it would be really hard to show, though, is that start bias itself doesn't really say much about terrain quality. We all know that tundra can be awful, but it can also put you nice and defensively on the edge of the map. Similarly, desert can be a Petra dream with nice floodplains, or flat with little food.

I think coast is the only start bias that is likely to really change the outcome based on what I've learnt so far, but it certainly might be worth a further look.
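If start-bias groups do get a further look, several groups will have only a handful of games, so a confidence interval per group is more informative than a bare win rate. A minimal sketch using the Wilson score interval; the grouping and any counts you feed it are up to the analysis, and none are given here.

```python
import math
from statistics import NormalDist


def wilson_interval(wins: int, games: int, confidence: float = 0.95):
    """Wilson score interval for a win rate. It behaves better than the plain
    normal interval when a group has only a few games or a rate near 0 or 1."""
    if games <= 0 or not 0 <= wins <= games:
        raise ValueError("need games > 0 and 0 <= wins <= games")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical group: 5 wins in 10 games with a given start bias.
print(wilson_interval(5, 10))
```

For 5 wins in 10 games the interval runs from roughly 0.24 to 0.76, which shows how little a ten-game group can say on its own about whether a start bias helps.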
 
So, to shed some light on this debate, I think having the data consistent for one player is actually really critical for this sort of analysis. There is an enormous amount of variance in player skill and style, so it's important to keep that variable constant.

I don't want to drag this point on any longer, but this is a clear example of selection bias. It is precisely because of the "enormous amount of variance in player skill" that randomly sampling from many different sources is even more necessary. Running a regression on the dataset of one player only tells us which variables correlate with wins for that one particular person. We cannot generalize to the rest of the population from one source; regressing over a different player could lead to entirely different results.

Nonetheless, I certainly understand the logistical difficulty in trying to find enough players to sample from.
 