How scoring works, with graphs!

carterba · Jan 27, 2006

I posted this in another thread, but I thought it might be of interest to GOTM people:

How scoring works:
There are four components: population, land, tech, wonders. The contribution to game score is the percent of each that you control times a multiplier: 4000 for population, 2000 for land, 2000 for tech, and 1000 for wonders.

In the final score calculation, each component is divided by an exponential function of the maximum possible. The maximum possible is determined by the map; it seems that maximum possible population on a standard map is around 750 and maximum land is around 1000. Max tech is 300.

At 2050, the exponent is 1, so when time runs out final score = game score. If you're at turn t and there are T total turns in the game, the exponent is t/T. You would divide your population by the maximum population raised to the t/T power and multiply by 4000 to get the contribution of population to the game score. So if your population score is 331/750 (44%), the contribution of population to the final score is 4000*331/(750^(t/T)). As t gets closer to T, the divisor gets larger, and the score gets smaller.

(I left out a few details, but that's the gist of it.)

So that's why early finishes score higher. Your score decays exponentially with the turn number.
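The formula described above can be sketched in a few lines. This is only an illustration of the description in this post, not the game's actual code; the function name and parameter names are made up here.

```python
def final_score_component(raw, max_raw, multiplier, turn, total_turns):
    """Contribution of one component (population, land, tech, wonders).

    Follows the description in the post: divide the raw value by the
    maximum possible raised to the power turn/total_turns, then multiply
    by the component's multiplier. All names are illustrative.
    """
    exponent = turn / total_turns
    return multiplier * raw / (max_raw ** exponent)


# Population example from the post: 331 out of a maximum of 750.
at_time_limit = final_score_component(331, 750, 4000, turn=430, total_turns=430)
at_half_time = final_score_component(331, 750, 4000, turn=215, total_turns=430)
```

At the time limit the exponent is 1, so the result is just 4000 * 331/750. Halfway through, the divisor is only the square root of 750, so the same holdings are worth far more.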

Attached is a graph of turn # vs final score, assuming Noble difficulty, normal game speed, and an approximately standard-sized map. Max population is 800; max land is 1000. I make the following simplifying assumptions: population increases at a flat rate of one point per turn up to 56%, land holding increases at a rate of one point per turn up to 45%, tech score increases at a rate of 2/3 per turn (I'm not sure how tech score is calculated, so I don't know if this is realistic) up to 300 (the maximum), and no wonders are built.

The highest possible score achievable under these assumptions is 184,691 at turn 69, controlling 8.6% of the max population, 6.9% of the land, and having a tech score of 45 (which I think is not very realistic). At 2050 (the final turn), the final score is 5145 (which is pretty close to my usual winning game scores).
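A rough sketch of this kind of model follows. It does not try to reproduce the exact figures above: the total number of turns is not stated in this post, so 430 (the time-victory turn reported for GOTM1 later in the thread) is used as an assumption, and "one point per turn" is read as one raw population or land point per turn.

```python
def component(raw, max_raw, multiplier, turn, total_turns):
    # Raw value divided by max^(turn/total_turns), times the multiplier.
    return multiplier * raw / max_raw ** (turn / total_turns)


def modeled_score(turn, total_turns, max_pop=800, max_land=1000, max_tech=300):
    """Final score at `turn` under the simplifying growth assumptions."""
    pop = min(turn, 0.56 * max_pop)      # +1 per turn, capped at 56% of max
    land = min(turn, 0.45 * max_land)    # +1 per turn, capped at 45% of max
    tech = min(turn * 2 / 3, max_tech)   # +2/3 per turn, capped at the max
    return (component(pop, max_pop, 4000, turn, total_turns)
            + component(land, max_land, 2000, turn, total_turns)
            + component(tech, max_tech, 2000, turn, total_turns))  # no wonders


TOTAL_TURNS = 430  # assumption, see the note above
best_turn = max(range(1, TOTAL_TURNS + 1),
                key=lambda t: modeled_score(t, TOTAL_TURNS))
```

With straight-line growth, a single component r*t / M^(t/T) peaks at t = T / ln(M). For M between 300 and 1000 and T = 430 that is roughly turn 62 to 75, which is why a model like this puts the best finish so early.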

If anyone has a better model of population, land, tech, and wonder increases, let me know and I'll plug 'em into my program.

DaveMcW · Jan 27, 2006

carterba said:
I make the following simplifying assumptions: population increases at a flat rate of one point per turn up to 56%, land holding increases at a rate of one point per turn up to 45%, tech score increases at a rate of 2/3 per turn (I'm not sure how tech score is calculated, so I don't know if this is realistic) up to 300 (the maximum), and no wonders are built.

Those are pretty big assumptions! I would love to gain 1 pop/turn at 4000BC.

The growth curve is not a straight line, but an exponential curve that increases even faster than the finish divisor. Near the domination limit the growth curve flattens out, and the optimal finish date is where the second derivative of the growth curve intersects the second derivative of the finish divisor.

We have a collection of over 600 growth curves for GOTM1, if you want to crunch the numbers and generate a model.

Gyathaar · Jan 27, 2006

Some info:
Max land score is the number of land tiles on the map.

Max population, I believe, is the sum of the food all tiles on the map give with no improvements (grass tiles 2 food, salt water tiles 1, fresh water tiles 2 and so on), divided by the amount of food each citizen eats, in other words 2.

Ancient age techs are worth 1 point each, classical age techs are worth 2, and so on up to 7 points each for future techs.

All wonders, both small and great, are worth 5 points each.

It is possible to go over the max value for techs and population.
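As a sketch of how these rules could be tallied: the tile food values are the three given above (a real map has more tile types), the era names between classical and future are filled in here as an assumption, and all function names are made up for illustration.

```python
# Food per unimproved tile; only the three values quoted in the post.
TILE_FOOD = {"grassland": 2, "salt_water": 1, "fresh_water": 2}
FOOD_PER_CITIZEN = 2

# 1 point for ancient techs up to 7 for future techs. The intermediate
# era names are an assumption; the post only says "and so on".
ERA_POINTS = {"ancient": 1, "classical": 2, "medieval": 3, "renaissance": 4,
              "industrial": 5, "modern": 6, "future": 7}
WONDER_POINTS = 5


def max_population(tile_counts):
    """Total unimproved food on the map divided by food eaten per citizen."""
    food = sum(TILE_FOOD[tile] * n for tile, n in tile_counts.items())
    return food / FOOD_PER_CITIZEN


def tech_score(techs_by_era):
    """Sum of era point values over the techs known in each era."""
    return sum(ERA_POINTS[era] * n for era, n in techs_by_era.items())


def wonder_score(wonder_count):
    return WONDER_POINTS * wonder_count
```

For example, a toy map with 100 grassland and 50 salt water tiles would have a max population of (200 + 50) / 2 = 125.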

AlanH · Jan 27, 2006

If you visit this post you'll find a spreadsheet containing all the final scores and dates and VCs for GOTM 01. It was a Normal Continents map, with Rome as the player civ. I have plotted the results as a scatter graph on an exponential scale, and they come out distributed around a pretty linear trend line.

You will have to wait for the powers that be to reboot the file server before you can download the spreadsheet or view the graph.

bshumbera · Oct 14, 2006

AlanH said:
I have plotted the results as a scatter graph on an exponential scale, and they come out distributed around a pretty linear trend line.

Looking closely at that plot, it almost looks like the trend line may have a slightly different slope for the different victory conditions, most likely due to the population and land area components. The conquest and domination victories appear to be steeper. On the other hand, cultural looks like it has the lowest slope. Wouldn't you agree?

AlanH · Oct 14, 2006

You could use the spreadsheet to create the separate trend lines. I'm no statistician, so I wouldn't like to offer any opinions that may expose my total ignorance of the subject ...


Thrallia · Oct 14, 2006

I'd speculate that that is because cultural and space race victories are achieved within a much narrower time period, not because of any inherent scoring differences between them. The fastest space race victory is generally about 300 years ahead of the last one, and cultural victories generally span about 400 years, so it's almost the same thing. Conquest and domination tend to have gaps of 1000 years or more, causing a steeper trend line (not to mention that each turn is weighted more heavily at first than at the end, which by itself gives any early finish a steeper curve than a later finish).

StevenJoyce · Oct 19, 2006

AlanH said:
You could use the spreadsheet to create the separate trend lines. I'm no statistician, so I wouldn't like to offer any opinions that may expose my total ignorance of the subject ... .

I, on the other hand, am pretty good at statistics. I ran an OLS regression of Log(Score) on Turn using the GOTM1 results. I allowed each victory condition to have a different intercept and slope. (Obviously, there is no slope for the time victory condition.) I then tested various hypotheses of the form "these two victory conditions have the same intercept/slope." I used heteroskedasticity-robust standard errors.
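For anyone who wants to try this kind of fit outside Stata, here is a minimal sketch using NumPy. It only produces point estimates (no robust standard errors), the function name is made up, and the data in the usage lines is synthetic, not the GOTM1 results.

```python
import numpy as np


def fit_by_victory(turns, scores, victory):
    """OLS of log(score) on turn, with one intercept and slope per victory type.

    Returns {victory_type: (intercept, slope)}. If every game of a type
    ends on the same turn (as with time victories), its two columns are
    collinear and lstsq returns a minimum-norm solution for that type.
    """
    turns = np.asarray(turns, dtype=float)
    log_score = np.log(np.asarray(scores, dtype=float))
    victory = np.asarray(victory)
    labels = sorted(set(victory.tolist()))
    columns = []
    for label in labels:
        dummy = (victory == label).astype(float)
        columns.append(dummy)          # type-specific intercept
        columns.append(dummy * turns)  # type-specific slope
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, log_score, rcond=None)
    return {label: (beta[2 * i], beta[2 * i + 1])
            for i, label in enumerate(labels)}


# Synthetic, noise-free data just to show the call.
t = np.array([150.0, 200.0, 250.0, 300.0, 320.0, 360.0])
kind = np.array(["conquest", "conquest", "conquest", "space", "space", "space"])
score = np.where(kind == "conquest",
                 np.exp(14.0 - 0.012 * t), np.exp(13.5 - 0.011 * t))
fitted = fit_by_victory(t, score, kind)
```

Because each victory type gets its own dummy and its own interaction with turn, the point estimates are the same as fitting a separate line to each type.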

Results:

Number of obs = 585
R-squared = 0.9282
Root MSE = .18235

[pre]
----------------------------------------------------------------
                     |                   Robust
                     |       Coef.     Std. Err.    t-statistic
----------------------------------------------------------------
Conquest intercept   |    14.04529     .5915946          23.74
Cultural intercept   |    12.39093     .4274544          28.99
Diplomatic intercept |    13.13844     .2307633          56.93
Domination intercept |    13.45518       .07267         185.15
Space intercept      |    13.58356     .1133032         119.89
Time intercept       |    8.486896     .0228713         371.07

Conquest slope       |   -.0128118     .0015442          -8.30
Cultural slope       |   -.0090372     .0012823          -7.05
Diplomatic slope     |   -.0103058     .0006411         -16.08
Domination slope     |   -.0105715     .0002136         -49.48
Space slope          |   -.0117623     .0003013         -39.04
----------------------------------------------------------------
[/pre]

Example of how to read this table: For conquest victories, the estimated intercept is 14.045 and the estimated slope is -0.0128. We can be 95% certain that the true intercept and slope are within 1.96 standard errors of the estimated values (provided that the model is correctly specified). (More precisely, if the model is specified correctly, and if we were to repeat the analysis on many different independently generated data sets, 95% of the time the true slopes and intercepts would be within 1.96 standard errors of the estimated values.) For the conquest intercept, the standard error is 0.59, so the true intercept probably lies between 12.88 and 15.21.
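The interval arithmetic is easy to check. A small sketch, using the conquest intercept and its standard error from the table (the function name is made up):

```python
def confidence_interval(coef, std_err, z=1.96):
    """Approximate 95% interval: estimate plus or minus 1.96 standard errors."""
    half_width = z * std_err
    return coef - half_width, coef + half_width


# Conquest intercept and robust standard error from the table.
low, high = confidence_interval(14.04529, 0.5915946)
```

This gives an interval of about 12.89 to 15.20, in line with the range quoted above up to rounding.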

Which intercepts are significantly different?

The time intercept is statistically significantly lower than all of the other estimated intercepts.

The cultural intercept is marginally significantly lower than the conquest, domination, and space intercepts.

None of the other pairs of intercepts are statistically different.

Which slopes are significantly different?

Looking at the estimated slopes, you might guess that conquest and cultural would be significantly different. You would be wrong. There simply weren't enough conquest and cultural victories in the data to pin down the slopes precisely enough to say that they're statistically significantly different.

What types of victories were there a lot of in the data? Domination and Space.

Space victories lose significantly more points per turn than cultural, diplomatic, or domination victories.

None of the other pairs of slopes were significantly different.
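A back-of-the-envelope version of these comparisons can be done straight from the table. The sketch below treats the two estimates as independent (zero covariance between them), which is an assumption made here and not necessarily how the tests above were computed; the names are illustrative.

```python
from math import sqrt

# (coefficient, robust standard error) for each slope, from the table.
SLOPES = {
    "conquest":   (-0.0128118, 0.0015442),
    "cultural":   (-0.0090372, 0.0012823),
    "diplomatic": (-0.0103058, 0.0006411),
    "domination": (-0.0105715, 0.0002136),
    "space":      (-0.0117623, 0.0003013),
}


def z_difference(a, b, table=SLOPES):
    """z-statistic for the difference of two estimates, assuming independence."""
    (coef_a, se_a), (coef_b, se_b) = table[a], table[b]
    return (coef_a - coef_b) / sqrt(se_a ** 2 + se_b ** 2)


conquest_vs_cultural = z_difference("conquest", "cultural")
space_vs_domination = z_difference("space", "domination")
```

Under that assumption, conquest versus cultural comes out below the usual 1.96 cutoff in absolute value, while space versus domination comes out well above it, which agrees with the results listed above.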

If there are any other statistical models any of you would like me to check, let me know. If they're as easy to estimate as this one, I'd be happy to do it.

ShannonCT · Oct 19, 2006

StevenJoyce said:
The time intercept is statistically significantly lower than all of the other estimated intercepts.

Why is there even a time victory intercept in the table? If all of the time victory data points have the same value for the independent variable (t=430), how can OLS predict Log(score) for t=0?

StevenJoyce · Oct 19, 2006

ShannonCT said:
Why is there even a time victory intercept in the table? If all of the time victory data points have the same value for the independent variable (t=430), how can OLS predict Log(score) for t=0?

When the only explanatory variable is an intercept term, the OLS estimate is a horizontal line passing through the mean of the dependent variable (log(score) in this case). Obviously, if we interpret the model literally, this is simply silly (it suggests that the score for a time victory is independent of the finish date, and in particular, that a time victory at t=0 would have the same expected score as a time victory at t=430). But there's a better interpretation of the time victory intercept coefficient -- it's the mean log(score) for time victories.

Alternatively, I could force the time victory intercept to be zero, and estimate a slope for time victories. The key point is that because there is no variation in the dates of time victories, it's impossible to estimate both a slope and an intercept.

[pre]
Variable               | Coefficient   Std. Error   t-stat
Time victory intercept |           0
Time victory slope     |     .019737     .0000532   371.07
[/pre]

The other coefficients remain the same as in my previous post. Obviously, this slope is significantly different from the slope for the other victory conditions -- it's positive. It has to have a positive slope in order to fit the positive log(score) at t=430 and the forced zero log(score) at t=0.

I prefer the first version of the model for the following reason. For time victories, the only meaningful information in the data is the mean log(score). That value is calculated as the intercept term. The estimated slope with a forced zero intercept contains the same information (the mean log(score) equals 0+slope*430), but reporting it as a slope obscures the mean log(score).
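The equivalence between the two versions is just a division by 430, as this small check shows (the variable names are made up; the numbers are the time victory row from the first table):

```python
FINAL_TURN = 430              # every time victory ends on this turn
mean_log_score = 8.486896     # time "intercept" from the first table
mean_std_err = 0.0228713

# Forcing the intercept to zero turns the mean into a slope through the origin.
slope = mean_log_score / FINAL_TURN
slope_std_err = mean_std_err / FINAL_TURN
t_stat = slope / slope_std_err  # same ratio as before, so the t-stat is unchanged
```

The slope and its standard error both scale by the same factor, which is why the t-statistic stays at 371.07 in both tables.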

ShannonCT · Oct 20, 2006

StevenJoyce said:
When the only explanatory variable is an intercept term, the OLS estimate is a horizontal line passing through the mean of the dependent variable (log(score) in this case). Obviously, if we interpret the model literally, this is simply silly (it suggests that the score for a time victory is independent of the finish date, and in particular, that a time victory at t=0 would have the same expected score as a time victory at t=430). But there's a better interpretation of the time victory intercept coefficient -- it's the mean log(score) for time victories.

So that value doesn't really belong in the same table with the other intercepts. It makes it appear that time victories are inherently worse than other victory types - they are probably only worse because they take so long.

We can't say that the difference in the intercepts for time victories and other victories is statistically significant, because they are expected values at different values of t. You could, however, compare the expected Log(score) at t=430 for each victory condition and test for a significant difference. I can't see any other way to draw valid conclusions about time victories.

StevenJoyce · Oct 20, 2006

ShannonCT said:
So that value doesn't really belong in the same table with the other intercepts. It makes it appear that time victories are inherently worse than other victory types - they are probably only worse because they take so long.

We can't say that the difference in the intercepts for time victories and other victories is statistically significant, because they are expected values at different values of t. You could, however, compare the expected Log(score) at t=430 for each victory condition and test for a significant difference. I can't see any other way to draw valid conclusions about time victories.

Good point. In the following table I report the same regression lines as in my first table, but with the turns recentered so that newturn = turn - 430. This means that the "intercept" is now the predicted score at newturn=0 (turn=430).

Time victories still have the lowest intercept, but now that intercept is statistically significantly lower than only the domination and diplomatic intercepts, and it is not significantly different from the intercept for conquest, cultural, or space victories. All of the other hypothesis tests remain the same as in my first post.

[pre]
----------------------------------------------------------------
                     |                   Robust
                     |       Coef.     Std. Err.              t
----------------------------------------------------------------
Conquest intercept   |    8.536224     .0923983          92.39
Cultural intercept   |    8.504912     .1384188          61.44
Diplomatic intercept |    8.706938     .0521258         167.04
Domination intercept |    8.909419     .0228844         389.32
Space intercept      |    8.525793     .0193514         440.58
Time intercept       |    8.486896     .0228713         371.07

Conquest slope       |   -.0128118     .0015442          -8.30
Cultural slope       |   -.0090372     .0012823          -7.05
Diplomatic slope     |   -.0103058     .0006411         -16.08
Domination slope     |   -.0105715     .0002136         -49.48
Space slope          |   -.0117623     .0003013         -39.04
[/pre]
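The recentered intercepts can be recovered from the first table, since moving the origin to turn 430 just adds slope * 430 to each intercept. A quick sketch (names made up, coefficients copied from the first table):

```python
FINAL_TURN = 430

# (intercept, slope) per victory type, from the first regression table.
ORIGINAL = {
    "conquest":   (14.04529, -0.0128118),
    "cultural":   (12.39093, -0.0090372),
    "diplomatic": (13.13844, -0.0103058),
    "domination": (13.45518, -0.0105715),
    "space":      (13.58356, -0.0117623),
}


def recentered_intercept(intercept, slope, center=FINAL_TURN):
    """Predicted log(score) at turn == center, i.e. the intercept after recentering."""
    return intercept + slope * center


recentered = {kind: recentered_intercept(a, b) for kind, (a, b) in ORIGINAL.items()}
```

The values agree with the recentered table to about four decimal places; the small differences come from the rounding of the published coefficients. The slopes are unaffected by recentering.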

bshumbera · Oct 21, 2006

I wonder if data tables exist for the other GOTMs. With all of that data, it should be possible to reduce the error in the analysis and might show whether there is a difference between victory types.

AlanH · Oct 21, 2006

bshumbera said:
I wonder if data tables exist for the other GOTMs. With all of that data, it should be possible to reduce the error in the analysis and might show whether there is a difference between victory types.

All the published game results exist on the results pages. A quick copy/paste into Excel will get you close to 10,000 records.

ShannonCT · Oct 21, 2006

bshumbera said:
I wonder if data tables exist for the other GOTMs. With all of that data, it should be possible to reduce the error in the analysis and might show whether there is a difference between victory types.

You could do regression analysis on each game, but you probably shouldn't mix the data, as scores are affected too much by map type and difficulty. Mixing the data would probably result in higher standard errors.

DaveMcW · Oct 22, 2006

Very interesting. After re-reading this thread for half an hour I think I understand what you are talking about.

Would you mind drawing the lines on top of AlanH's graph to make it 100x easier to understand?

AlanH · Oct 22, 2006

ShannonCT said:
You could do regression analysis on each game, but you probably shouldn't mix the data, as scores are affected too much by map type and difficulty. Mixing the data would probably result in higher standard errors.

Note also that the GOTMs have been played in different versions of the game software, which mapped dates and turns differently, and with different game speeds.

StevenJoyce · Oct 23, 2006

The following zip file contains four files that may be of interest.

1. A csv file with the data from the GOTM1 to GOTM8 results pages. I won't have time to do any analysis on this for at least a week, and possibly longer. So if anyone else wants to have a go at it, feel free.

2. A pdf file showing a scatterplot of the log(scores) against the final turn # for gotm1, along with the estimated regression lines.

3. Two pdf files showing the 95% confidence intervals for some of the victory conditions (cultural versus space and cultural versus conquest). I couldn't combine all of the confidence intervals into one graph due to clutter.

da_Vinci · Oct 24, 2006

ShannonCT said:
"So that value doesn't really belong in the same table with the other intercepts. It makes it appear that time victories are inherently worse than other victory types - they are probably only worse because they take so long."

I would think that domination victories tend to score higher for a given turn because they require specific land and population thresholds, and these factors (especially pop) are highly weighted in score.

This also seems to reduce the dispersion in domination scores in AlanH's scatter plot of GOTM 1. Since correlation is a blend of slope and collinearity (how close to the line the points are), I would expect that the correlation for domination is the highest.

The conquest, diplomatic, and cultural results look (from the plot) like there might be some high influence points: wondering if StevenJoyce did any checks for this.

Also, I wonder if conquest and diplomatic might be more of a spline pattern, with a more zero slope from turns 200 to 300 (if DaveMcW is removed as an outlier) and then a more negative slope from 300 onward.

@StevenJoyce: curious what stats package you are using?

dV

StevenJoyce · Oct 24, 2006

1. I used Stata 9.2 SE for this analysis. If I were to do a spline rather than OLS, I'd switch to Matlab, because I like its spline package better.

2. I didn't check for influential observations until da_Vinci suggested it. For each observation, I calculated how much each estimated coefficient would change if I deleted the observation, and divided this change by the coefficient's standard error. This reveals that ALL of the variables have an observation that changes the estimated coefficient by more than 0.2 standard deviations, and most have observations that change the estimated coefficient by about 0.7 standard deviations. With our sample size, this suggests that these results are highly influenced by outliers. Not much can be done about this, without either (a) a better model (something other than OLS or OLS with different explanatory variables) or (b) more data.
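For reference, a brute-force version of this kind of influence check could look like the sketch below. It refits the line with each observation dropped and scales the coefficient change by the full-sample standard error, using classical rather than robust standard errors, so it is not identical to what Stata reports. The data is synthetic, with one outlier planted at the end.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and their classical standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(sigma2 * np.diag(xtx_inv))


def leave_one_out_influence(X, y):
    """Row i holds (full-sample beta - beta without row i) / full-sample SE."""
    beta, std_err = ols(X, y)
    influence = np.empty_like(X, dtype=float)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta_without_i, _ = ols(X[keep], y[keep])
        influence[i] = (beta - beta_without_i) / std_err
    return influence


# Synthetic example: an exact line with one outlier at the last turn.
turns = np.arange(200, 400, 10, dtype=float)
log_score = 13.5 - 0.01 * turns
log_score[-1] += 1.0
X = np.column_stack([np.ones_like(turns), turns])
influence = leave_one_out_influence(X, log_score)
most_influential = int(np.abs(influence[:, 1]).argmax())
```

In this toy data the planted outlier is the observation that moves the slope the most, by well over 0.2 standard errors.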
