Disagree with random sampling - it might show the player is more important than the starting dirt + civ. It would work if players were the same strength - find another person that wins over half their games. Perhaps the most accurate would be some GLM.
So to shed some light on this debate, I think having the data consistent for one player is actually really critical for this sort of analysis. There is an enormous amount of variance in player skill and style, and so it's important to keep that variable constant.
In order to do as you say and control for player ability, we would need a huge number of games for each player, so that we could still make statements about starting conditions that weren't masked by variability in player skill.
Secondly, there is an important practical point. FilthyRobot has a huge number of games on YouTube and, importantly, he uploaded all his wins and losses (and I checked with him about this). Most channels have far fewer games, and the vast majority cherry-pick the exciting games for upload (Yoruus, for example, only uploads a fraction of his games). So this sort of data frankly doesn't exist from anyone else!
One issue I had was that when tiering civs he didn't (appear to) take into account the type of map. E.g. England is a top-tier civ on water maps, no doubt. But is it top-tier on Pangaea?
It's possible the same is true for coastal dirt - not as useful on Pangaea as on a continents map. A mountain is always useful. I was very surprised about the river. Perhaps if the generator thinks your best location is by a river, you're less likely to be by the coast/mountain. A GLM would sort this out. I like starting on a hill. I'm surprised that doesn't make a difference.
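As a rough sketch of what I mean by a GLM: a logistic regression of win/loss on the starting features estimates each feature's effect while holding the others fixed, so a river start that merely tends to come without a coast is not penalised for it. Everything below is an assumption for illustration: the games are simulated rather than taken from the real data set, the effect sizes are made up, and the feature names (coast, mountain, river) and the `fit_logistic` helper are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, wins, steps=400, lr=1.0):
    """Fit win ~ intercept + features by gradient ascent on the log-likelihood.

    Returns [intercept, coef_1, coef_2, ...] on the log-odds scale.
    """
    weights = [0.0] * (len(rows[0]) + 1)  # weights[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, wins):
            p = sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))
            err = y - p
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w + lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Simulated games (NOT real data). Made-up true effects on the log-odds
# of winning: coast +0.8, mountain +1.2, river 0.0.
random.seed(0)
rows, wins = [], []
for _ in range(1200):
    coast = 1 if random.random() < 0.4 else 0
    mountain = 1 if random.random() < 0.5 else 0
    # Assumed confounding: river starts are rarer when the start is coastal.
    river = 1 if random.random() < (0.2 if coast else 0.7) else 0
    log_odds = -0.8 + 0.8 * coast + 1.2 * mountain + 0.0 * river
    rows.append((coast, mountain, river))
    wins.append(1 if random.random() < sigmoid(log_odds) else 0)

weights = fit_logistic(rows, wins)
for name, w in zip(["intercept", "coast", "mountain", "river"], weights):
    print(f"{name:>9}: {w:+.2f}")
```

On real data the same idea would use one row per game, with civ tier added as a further column. A library such as statsmodels would also report standard errors, which this hand-rolled fit does not.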
The data is from a set of games that were always Pangaea, always 6 player. The tier list was also designed by FilthyRobot specifically for these conditions, so I think it fits the bill quite accurately.
As for hills, I thought it would be interesting, but basically FilthyRobot nearly always settles on a hill when he can (very common in multiplayer for the added defense), so there wasn't really a reliable non-hill sample to compare to.