GOTM18 Statistics

Karasu

Here are some game summary plots for you to use as you look back at the Gotm18-Celts results summary and look forward to what you might try to do in Gotm20-Spain.

After only three months of presenting data to you in this format we start seeing some trends and some interesting changes.
What do you think?



Here is a pie chart showing the distribution of chosen victory conditions:

gotm18_Finishes1.jpg


Here is an interesting view of how the distribution of those victory conditions compares between Gotm18, Gotm17, and Gotm16.

gotm18_FinishesHistory.jpg




The distribution of final scores also continues to show how well the Jason scoring system has closed the gap between the well played games of all victory categories and those games in the outstanding high scoring categories that are also well played.

gotm18_Scores1.jpg


The distribution of scores grouped by choice of victory condition continues to show no significant overall scoring advantage for one victory condition compared to other similarly played games of other victory conditions.

gotm18_ScoresHistory.jpg


... and just an added perspective of victory dates:


gotm18_Dates.jpg




We continue to track a number of behavior statistics as a means of helping us to better serve the player community while validating and testing all the game submissions to make sure that the game results accurately represent player performance and the characteristics of the game that was played.

"Dulcis in fundo", the Turns Per Session stat in the last games.

No big changes here...

gotm18_TpSHistory1.jpg


As an extra, and just to show Creepster's titanic effort to coordinate the submission process and how Aeson coordinates the final scoring process so that you still get game results in a timely manner, take a look at how submissions come in during the month.

gotm18_Submissions1.jpg
 
Great job done here!
We see that around 50% of players used the Space Ship victory.
It would be interesting to see the percentage distribution of the Jason scores.


LeSphinx
 
FABULOUS! You're really putting in a lot of effort. Thank you.:goodjob:
 
Karasu: Have you run any statistical tests on this data? Have you run it through minitab or the like? I would be interested in seeing the mean and the median of the scores as well as some sort of a regression plot showing turns submitted versus score. With the histogram of scores above, you probably have a mean and a median already calculated....
 
Here you go.



stats.jpg


Edit: I can't do the regression plot as I don't have the turns/reload data. :(

The full data set was used and so includes wins and losses. If the wins and losses were separated into separate distributions then the two data sets might exhibit more normal behaviour. (Technically I suspect that the data set is bimodal.)

stats2.jpg


Yeah. Looks like people who won the game are reasonably normal ;)

Edit 2: Re-read your post and I now understand what you wanted. Again this is with victories and losses grouped together.
Quite enlightening as regards fairness of Jason score. Large pinch of salt required due to different best dates of course. The wins and losses are clearly defined though.

Observation order is from 1 = highest Jason Score to 160 = lowest Jason score. The Turn number is used as the predictor in the linear regression.

stats4.jpg


stats3.jpg
 
VERY VERY nice. I suspected a normal curve (minus losses). I don't know a lot about statistics. Most of what I know can fit in the standard descriptive statistical data that you posted.

I find it amazing that the curve is SO normal (I mean P=.4xx is about as normal as random data can get, IMHO) for winners. What would be interesting is to see if we can move the curve up over time. IOW, are players getting better over time, or will the curve remain the same. Or will we see something more bimodal as new players have a hump of lower scores and older/more experienced players have a higher hump. Then you could do some interesting prediction modeling (if you are into that kind of stuff :) ). All of it just in fun, of course.

I worded the regression plot statement poorly. I was wondering if jason scores could be plotted by number of GOTM submissions. If that data is not on the same spreadsheet/minitab file, doing such a thing would be too much work, but if the data is already there, because of the new ranking system (based on number of GOTM played) it would be interesting to see how experience affects score. I would think to see a loose positive correlation. That sort of data may not be available however.
 
Good questions Nightfa11. I wonder what the correlation of number of GOTMs played versus score would look like. That might be an indicator of experience and the resulting curves would be interesting to compare. For example, using Philip Martin's Buddhist classification to sort average scores.
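
For anyone who wants to try the experience-versus-score idea at home, here is a minimal sketch of a Pearson correlation in plain Python. The function name `pearson_r` and the sample numbers are invented purely for illustration; they are not GOTM data, and this is not necessarily how the staff would do it.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical (games played, Jason score) pairs, made up for this example.
games_played = [1, 2, 3, 5, 8, 12]
jason_scores = [2100, 2600, 2400, 3900, 4300, 5200]

r = pearson_r(games_played, jason_scores)
print(f"r = {r:.2f}")
```

A value of r near +1 would match the "loose positive correlation" guess, a value near 0 would suggest experience tells you little about score. Either way it only measures a straight-line relationship, so a plot is still worth looking at.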
 
Nightfall,

We monitor some of what you are talking about qualitatively a bit more than quantitatively.

The general perception of the process is that players enter at different inherent skill levels and most players go through a one or two game process of finding their bearings. Some players dive right in while other players lurk for a number of games before jumping in. We know that the number of players who download and start the GOTM game is roughly 3 to 4 times higher than the number of final submissions, just based on some bandwidth and download tracking (we will be getting better at this).

After players get their initial bearings as to what the process of participating in a GOTM is like, there is a critical game count between 1 and 3 where players sort of find where they might be in the players matrix and find what it is that they enjoy about the games. Players participate for different reasons and this can affect the performance response, but that is just fine if they find their comfort and participation zone.

The top end of the scoring distribution tends to be populated by players who can and will succeed under almost any circumstances. The mid range shows players who surge to the top and then who might appear back towards the middle depending on the skills match that the current game represents for the skills and personality of that player. We can look at the differences between performance in Gotm15-Russia and Gotm16-Rome as well as the differences between Gotm17-Carthage and see some elements of how different approaches may be required for different map types.

Looking at a sequence of games by individual players, we can see strong improvement in players who have developed an understanding of the discussion processes that lets them begin to really shine as individuals. This seems to occur at different rates but happens in the 3 to 6 game range, where players who were new suddenly morph into players who demonstrate some very powerful control over their game experiences, which really makes it look like they are having a lot more opportunities for both challenge and fun. Some recent good examples of these adaptive responses are players like Borealis, Renata, and ControlFreak (just to name a few, but there are many others).

If you want to look at the big picture of what is going on in a similar way to how the staff have been looking at things, you will see that Phil_Martin is looking to see if we are doing things in the overall process that make it possible for more players to get across the 2 to 3 game threshold and then again up to the 6 or 7 game threshold where they start to become self-actualized in whatever play modes they may personally choose.

Karasu is slicing across the process from a slightly different direction to see if the mix of games that we present generate individually valid statistical distributions of scores and play statistics. The big picture views of what the games look like over a sequence of months helps us to determine if the games present variety and challenge across the entire membership. The views that Karasu is generating from the data help us to see if the player responses to the scoring process and game conditions are supporting the overall goal of providing players lots of options for enjoyment and challenge while still keeping a common game reference.

Aeson also looks individually at the specific scoring responses as they correlate with map features that in the past have been ignored in almost all scoring systems.

I currently have been filling the "mapmaker" role and I have been tracking data like terrain type distributions and ratios of terrain to civs/rivals and traits as a means of understanding both the player performance response as well as the in-game AI performance. You can begin to see some of these features when you start to look at what civs have been placed in the recent games and how they have performed in various map positions and under the selected conditions of the game.

Creepster is also on the front end of the process looking at incoming submissions and how the submissions profile (number, date, who) may correlate with game performance statistics and the long term retention and participation patterns.

All of these efforts are really an attempt to understand who WE are as players and how to make sure that the process stays fresh while supporting the ongoing needs of the player community.
 
Wow...cracker writes a book :D. I'll be interested in the stuff Phillip Martin is working on. It sounds like what I was talking about.

My play style does not lend itself to increasing scores month after month. I won't get into it here as that would be threadjacking. However, that's why I would be interested in seeing how scores increase as #GOTM submitted increases to see if there are ppl playing roughly the same way I play.
 
Originally posted by Nightfa11
I don't know a lot about statistics. Most of what I know can fit in the standard descriptive statistical data that you posted.

I find it amazing that the curve is SO normal (I mean P=.4xx is about as normal as random data can get.

Same here, I just happen to have been on a Minitab course :)

The p value is the highest I can recall for a RL dataset, and I agree it's pretty incredible. Still, it's only a bit of fun as you say and doesn't mean much, as was carefully implied by a previous poster ;)

I can't do the other stuff you asked for quickly because getting the data from a web page into Minitab is a pain and sometimes I have to actually do stuff at work. Such is life. :)
 
Guys, you really blew me away! :thumbsup:

I'll see what I can do to set up the data in a better fashion and present it in more informative ways.

Mad-bax + Nightfa11, if you have more suggestions (in a bundle with some explanations for a limited mind...), go ahead and post them (or PM me). We can spend some time working together on that, provided it's when I am at the office... ;)
 
Karasu:

IMHO there is very little you could do to improve the presentation of the data you give. It is exceptionally well delivered and covers everything that people really want to know. You certainly don't need advice from someone like me!

But, since you asked (and I can't resist) here goes... (deep breath)

Giving averages and upper and lower quartiles is easy to do and does give people a perspective of their success in a particular game in terms of score, rather than just ranking.
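
Just to illustrate how little work that summary is, here is a rough sketch using Python's standard `statistics` module. The helper name `summarize` and the score list are hypothetical, made up for the example and not taken from any GOTM results; the "inclusive" quartile method is one of several conventions, so other tools may give slightly different cut points.

```python
import statistics


def summarize(scores):
    """Return mean, median and lower/upper quartiles of a list of scores."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "lower_quartile": q1,
        "upper_quartile": q3,
    }


# Hypothetical Jason scores, invented for illustration only.
example_scores = [1200, 2500, 3100, 3400, 3900, 4200, 4800, 5600, 7100]

for name, value in summarize(example_scores).items():
    print(f"{name}: {value:.1f}")
```

A player could then see at a glance whether their score sits in the bottom quarter, the middle half, or the top quarter of the field.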

You could publish trends, such as how average Jason score varies from game to game. But this could be a little difficult to do properly because each game is so different and the data would need to be "washed" so that you compare apples with apples. You can do it though because of the way the Jason formula works.

You could (just for fun) do some predictive modelling to take a guess at what the best actual score will be for each victory condition for the next game.

And there are things you can do just for fun like contour/surface plots against victory condition so people can see which victory condition gave (on average) the best score for a given game.

The problem with all of this is that the statistics would be dubious, whereas what you do already is not.

In short, I think what you do is great, and for geeks like me really interesting (yes, I am aware I need professional help). I would dearly like to help you out in any way I can, and the only thing I can suggest is that you give me access to the data you have and let me play with it. I'll give you some examples and if you like any of them then we can go from there.

This post is way too long. Sorry.
 
Originally posted by mad-bax


Edit 2: Re-read your post and I now understand what you wanted. Again this is with victories and losses grouped together.
Quite enlightening as regards fairness of Jason score. Large pinch of salt required due to different best dates of course. The wins and losses are clearly defined though.

Observation order is from 1 = highest Jason Score to 160 = lowest Jason score. The Turn number is used as the predictor in the linear regression.

stats4.jpg


stats3.jpg

These graphs look great, but what are "residuals"? I don't understand them but I would like to.
 
Offa: this is just a bit of fun on my part. Linear regression basically tries to fit a straight line through the data by using a least squares fitting technique. Since the data does not fit exactly on that line, each data point is a certain distance from it. The residual is a measure of how far a particular data point is from the line of best fit.
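
If it helps to see that as code, here is a minimal sketch of a least squares line fit and its residuals in plain Python. This is not the Minitab procedure behind the plots above; the function names and the numbers are hypothetical, chosen only to show the idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Observed y minus the y predicted by the fitted line, per data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical data that deliberately follows a curve (y = x squared),
# so the straight line cannot match it exactly.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
for x, res in zip(xs, residuals(xs, ys, slope, intercept)):
    print(f"x = {x}: residual = {res:+.1f}")
```

Because the made-up data follows a curve rather than a line, the residuals come out in a systematic pattern instead of scattering randomly around zero. That kind of pattern in a residual plot is the usual hint that a straight line is the wrong shape for the data.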

This particular regression is a joke, since if I were to fit the data using the curve represented by the Jason system all the residuals would be zero. I artificially generated variation by not considering the Firaxis score, and by not considering that there are different best dates for different victory conditions. Finally, the Jason system is not a linear function, which is why the residuals plot in an S shape.

Having said that it does highlight one or two things. You will notice that the residuals are small in the centre of the scoring range, but large at the two extremes. This is indicative (only) that the Firaxis score has a larger impact on the higher and the lower scoring games than it does on the average game. IIRC Aeson has already indicated as much himself.
I hope (without being too nerdy) that this has answered the question, if not just PM me and I can send you some electronic literature (written by someone who knows what he's talking about) that will be technically accurate.
 
Mad-Bax@ Interesting residuals for the fitted value. Since there were no losses above 4000pts the two "groups" seen must all be winners. What is your guess as to what set the lower branch "deviants" apart from the "mainstream"? I have two guesses.

The first (and most likely) is the people that managed to win while only taking the starting continent. I won diplomatically, eliminating the Celts and Carthage and never setting foot in a boat.

The second is possibly conquesters that razed as they went, but I wouldn't think there were that many of them.
 
Guys, it looks like there are hundreds of statistically-inclined (and skilled) people in this forum.

Why don't you put your wisdom together and write a treatise on the application of statistics principles to Civilization III...
 
Stats was required for my engineering degree. I think there are a lot of engineers here.:scan: :crazyeye:
 
Originally posted by Karasu
Tu quoque...
mens rea:rolleyes:

EDIT: The best response I could find on Latin translation sites. I'm not fluent like you.;)
 