GOTM 2.0 Brainstorming

My basic problem with the proposal is the requirement for players to decide at the outset whether or not to compete for a rating. It sounds to me like a way of splitting the community. Less-than-dedicated players will just download the non-rated save because (a) it's less hassle, (b) just by downloading it they may jeopardise their rating if they can't find time to play, and/or (c) they won't understand the whole concept. I would feel far more receptive to this suggestion if we could remove this requirement.

I understand that you want to treat a no-show as a loss, but I really don't think it adds useful information to a skill level assessment in this player group. Players may not submit for any number of reasons. Most of the reasons probably relate to their real lives or personal priorities. I suspect that few are skill related.

On a different issue, you state, but haven't explained for me, that the AI is a competitor and has to be measured. Why isn't the AI just part of the game environment? Like the golf course; or, in chess, the sum of (chess rules + tournament structure + whatever else affects player performances outside of their own skill sets)?

Each xOTM represents a number of matches between all players who shared the same VC goal. We don't know which VC the AI was targeting during their games, and we don't know which VC a lost submission was targeting, if any. We *do* know the relative performances of all the players who won by Conquest, for example. Each player in that set won against those with later dates, and lost against those with earlier dates. Players who submitted a loss or a retirement lost to all the winners and either drew with all the other losers, or beat those who lost/retired earlier. Players who didn't submit lost to everyone who did. Why doesn't this data set allow us to rate players using Elo without measuring the AI?
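
In code, those pairwise rules might look something like the minimal Python sketch below. The Submission record and its fields are illustrative only, and I've picked the date-ranked alternative for losses/retirements rather than drawing them all:

[code]
# Sketch of the pairwise scoring rules described above, for one VC.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Submission:
    player: str
    status: str                 # "win", "loss", "retired", or "no_submit"
    date: Optional[int] = None  # in-game finish date; None if never submitted

def score(a: Submission, b: Submission) -> float:
    """Return a's result against b: 1.0 win, 0.5 draw, 0.0 loss."""
    tier = {"win": 0, "loss": 1, "retired": 1, "no_submit": 2}
    ta, tb = tier[a.status], tier[b.status]
    if ta != tb:                     # winners > losers/retirees > non-submitters
        return 1.0 if ta < tb else 0.0
    if ta == 2 or a.date == b.date:  # non-submitters (or identical dates) draw
        return 0.5
    if ta == 0:                      # among winners, the earlier date wins
        return 1.0 if a.date < b.date else 0.0
    return 1.0 if a.date > b.date else 0.0  # among losers, surviving longer wins
[/code]

Feeding every such pairwise score into a standard Elo update would then rate the players against each other without ever rating the AI.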
 
AlanH,

That's pretty much exactly what I suggested yesterday. I did a quick sample of it just now. Let's assume that 6 players competed in quick-speed culture in one xOTM (this is totally random):

Player A CurrentRating=2356 FinishDate=1485
Player B CurrentRating=1644 FinishDate=1605
Player C CurrentRating=1500 FinishDate=1615
Player D CurrentRating=1981 FinishDate=1780
Player E CurrentRating=1592 FinishDate=1898
Player F CurrentRating=1171 FinishDate=1902

Assign draws between B & C and between E & F, since their finish dates are effectively identical.

"A" "B" "C" "D" "E" "F"

"A" / 1.0 1.0 1.0 1.0 1.0 5.0
0.9837 0.9928 0.8965 0.9878 0.9989 4.8597

"B" 0.0 / 0.5 1.0 1.0 1.0 3.5
0.0163 0.6960 0.1257 0.5743 0.9384 2.3507

"C" 0.0 0.5 / 1.0 1.0 1.0 3.5
0.0072 0.3040 0.0590 0.3706 0.8692 1.6100

"D" 0.0 0.0 0.0 / 1.0 1.0 2.0
0.1035 0.8743 0.9410 0.9037 0.9906 3.7218

"E" 0.0 0.0 0.0 0.0 / 0.5 0.5
0.0122 0.4257 0.6294 0.0963 0.9186 2.0822

"F" 0.0 0.0 0.0 0.0 0.5 / 0.5
0.0011 0.0616 0.1308 0.0094 0.0814 0.2843

The resultant ratings are (with a K-factor of 32):

Player A 2356+32(5.0-4.8597)=2361
Player B 1644+32(3.5-2.3507)=1681
Player C 1500+32(3.5-1.6100)=1561
Player D 1981+32(2.0-3.8131)=1923
Player E 1592+32(0.5-2.0822)=1541
Player F 1171+32(0.5-0.2843)=1178
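
For concreteness, here is the same arithmetic as a quick Python sketch. The 10-year tie window that reproduces the B/C and E/F draws is my own assumption, not part of the proposal:

[code]
# Elo update for the six sample players above, with a K-factor of 32.
K = 32
finishers = {"A": (2356, 1485), "B": (1644, 1605), "C": (1500, 1615),
             "D": (1981, 1780), "E": (1592, 1898), "F": (1171, 1902)}

def expected(r_a, r_b):
    """Standard Elo expected score of the first player against the second."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def actual(d_a, d_b, tie_window=10):
    """1 for an earlier finish, 0 for a later one, 0.5 within the tie window."""
    if abs(d_a - d_b) <= tie_window:
        return 0.5
    return 1.0 if d_a < d_b else 0.0

for p, (r_p, d_p) in finishers.items():
    s = sum(actual(d_p, d_q) for q, (_, d_q) in finishers.items() if q != p)
    e = sum(expected(r_p, r_q) for q, (r_q, _) in finishers.items() if q != p)
    print(p, round(r_p + K * (s - e)))  # matches the ratings above up to rounding
[/code]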

Now, if we assume that these six are a representative sample, and that the actual number of competitors in the xOTM was 26, we could assume that the rating fluctuations would actually be fivefold, since each player would then face 25 opponents instead of 5 (this is very crude, of course):

Player A 2381 (+25)
Player B 1829 (+185)
Player C 1805 (+305)
Player D 1691 (-290)
Player E 1337 (-255)
Player F 1206 (+35)

You'll notice that "A" rightfully remains way on top, but increases only marginally, since he just beat up on clearly weaker players. "B" and "C" (who's a newbie with an entry rating of 1500) get normalized, and jump ahead of the inflated "D". "E" and "F" get much closer. All reflecting their relative strength almost exactly.

Most chess websites consider 50 games or so a requirement for normalization. Playing the same VC in just 4 xOTMs gives you potentially over 100 matches as a sample. I would think it should work really well, and normalize within 3-5 xOTMs per VC (most of the dedicated players should already have more than that, if you back-calculate from GOTM 1). Also, it wouldn't be dependent on frequency of participation, and, just like in chess, could totally accommodate multi-tier competition. And it wouldn't make any sense to count non-submittals, since their only effect on weak players' ratings would be a very marginal decrease, while they would unfairly hurt good players for not having time to complete a game they can obviously win.
 
Great thread! It took me a long time to get through it all, but here are some responses to the posts I found most intriguing:

My favorite part of the XotMs is comparing games; I love things like the map showing where everyone settled, so if possible I would love to have some statistics collected. For instance, things like:

Date you got your 3rd, 6th, or 9th cities (if you did).
Date key techs were reached (Liberalism, Astronomy, Civil Service, Biology, etc.)
Date of First Great General (or just great person?).
Total number of great people generated.

That is all I could think of right now but I'm sure there are other things that would give people a good idea of things they can look at to improve their game.
Yes, I'd love to see some basic statistics like this collected for each game, even if it were only up to 1 AD or 500 AD, or only for a very limited number of stats. Data on turns/dates of the first 3-5 cities, and especially key tech dates, would be a feast in itself.

I'd like to see an award for the most informative/entertaining spoiler. Perhaps determined by a vote amongst everyone who submitted a completed game.
We really ought to do this, at least for the "entertaining" category. It would only require a poll and it might motivate more people to try a little harder. There have been some truly deserving posts lately, and even "nominations" from the academy, er, staff, could serve as a kind of reward/recognition in itself.

As noted before in SGOTM...ADVANCED STARTS!

After being Lord of the Flies and mud huts for the 1000th time, one wishes he, at least, started a little closer to indoor plumbing.
Well said. :lol: Indoor plumbing is good... and tanks can be fun every now and then. (You know, the kind that blow things up instead of just flush...)

Here's a not-too-specific design idea for the new geniuses (in particular Erkon):

Try making a game that has a second bump in the road. Something that throws even the experts for a loop just when they thought they had the game under control.
Ask Leif to send you an Immortal version of WOTM 22. (Is it after midnight yet? :shifty:) Moderator Action: No! Spoiler deleted.

Advanced Start
Different era starts, not just ancient
Allowing players to pick one of 2-3 leaders you deem most suitable for the VC in mind
Both great ideas. Don't know how hard they would be to implement. I've heard some people claim they've had girlfriends who were more flexible and easier to manage than WB. :p

Or offer a NOOB-Rescue challenge. When game difficulty is at Emperor or above, anyone using the adventurer save who is still alive at 500 AD may post their 500 AD save file. Anyone who has already submitted the game can take that save and try to salvage it.
I love the basic concept presented here. Maybe it could be a separate/side competition? Great idea though. :goodjob:

I would like to see more summaries of player actions along the lines of the map of start positions. I don't know what is easily extracted from the saves, but a chart of wonders built (with date) by player would be interesting, as would a chart of civics by date for each player, a chart of war declarations/peace by the player, etc. Some of this can be gleaned by going through the log files on the results page, but I think having it for all players in a chart form would be potentially very interesting.
Wow, this would be great if someone could create a program to extract this kind of information automatically (even just the tech dates as a pilot program). Never mind posting it in a table on CivFanatics -- just put it in a downloadable spreadsheet and make it available to anyone who wants it. It would be a regular part of my monthly download diet for sure.

Okay. Here's an off-the-wall idea: How about two-member teams? You get to team up with someone and play together (not exactly succession, just sharing ideas by PM, with either one playing turns whenever). EDIT: But no replays, etc., of course. Only one can play a certain set of turns.
Sounds great, but you get Erkon and I get Klarius, okay? ;)
(Sorry Erkon, hate to saddle you with LC, but I couldn't resist that! :lol:)

Here's a suggestion based on some of the discussion. It seems that a lot of players download and play, but don't finish games. That says that something like the quick start challenge would be of interest to a lot of players. Play the game out to 0 AD or something, then submit, and give awards based on various metrics. You could give the "jesusin" to the player with the most culture, the "erkon" to the player who razes the most cities, etc.
Great award names, but the QSC is the best idea. Something like this would be great and ought to be possible, no?
 
Ask Leif to send you an Immortal version of WOTM 22. (Is it after midnight yet? :shifty:) Moderator Action: No! Spoiler deleted.
Whoops. :blush: Apologies to the staff. :sad: I really thought it was after midnight (the time stamp on the post says 12:41 AM), but I'll be sure to give it a full 24 hours next time to avoid time zone issues. :wallbash:
 
The earth rotates on its axis, so there are at least 24 different midnights each day around the world, and yours isn't special. We wait until midnight on the International Date Line to bring down the curtain on each game.

Just check the GOTM Home Page if you want to know whether a game is still running.
 
Each xOTM represents a number of matches between all players who shared the same VC goal. We don't know which VC the AI was targeting during their games, and we don't know which VC a lost submission was targeting, if any. We *do* know the relative performances of all the players who won by Conquest, for example. Each player in that set won against those with later dates, and lost against those with earlier dates. Players who submitted a loss or a retirement lost to all the winners and either drew with all the other losers, or beat those who lost/retired earlier. Players who didn't submit lost to everyone who did. Why doesn't this data set allow us to rate players using Elo without measuring the AI?

Agree that "Each player in that set won against those with later dates, and lost against those with earlier dates".

(Addendum: Now, for score ratings, each player in the set won against those with lower scores, and lost against those with higher scores?

Hmm ... maybe a player has to specify in advance if they are going for a score rating or a speed rating in that game? Since performance on speed and score may be mutually exclusive for some VC? Or maybe they specify in the submission if they want it rated for speed or score? End addendum)


Not so sure we can operationalize "Players who submitted a loss or a retirement lost to all the winners and either drew with all the other losers, or beat those who lost/retired earlier." because as you say, "we don't know which VC a lost submission was targeting, if any." So if the ratings for speed are to be stratified by VC, we don't know which stratum should get the losses/retirements.

Same issue applies to the idea that "Players who didn't submit lost to everyone who did."

That is, unless the rated games always have a specified target victory condition. Then you would need VC-specific ratings for both speed and score.

The two questions are 1) does the AI have to be rated, and 2) do losses/retires/non-submissions have to be rated?

And a potential set of principles to try to adhere to might be

1) "a ratings system with minimal requried modification of the current system". Or minimal modification to some (imminent?) revised system

2) "Perfect is the enemy of good"

I think that if we can generate a system with controlled rating inflation, based on rating only submitted wins (each of which counts as an Elo win or loss depending on relative performance) and not rating the AI, that would be the way to go.

The fact that CIV has more than a dichotomous outcome (which makes it different from chess) should allow us to do this ... we have a continuous outcome (score) and a time-to-event outcome (speed).

dV
 
Agree that "Each player in that set won against those with later dates, and lost against those with earlier dates".

(Addendum: Now, for score ratings, each player in the set won against those with lower scores, and lost against those with higher scores?

Hmm ... maybe a player has to specify in advance if they are going for a score rating or a speed rating in that game? Since performance on speed and score may be mutually exclusive for some VC? Or maybe they specify in the submission if they want it rated for speed or score? End addendum)


Not so sure we can operationalize "Players who submitted a loss or a retirement lost to all the winners and either drew with all the other losers, or beat those who lost/retired earlier." because as you say, "we don't know which VC a lost submission was targeting, if any." So if the ratings for speed are to be stratified by VC, we don't know which stratum should get the losses/retirements.

Same issue applies to the idea that "Players who didn't submit lost to everyone who did."

That is, unless the rated games always have a specified target victory condition. Then you would need VC-specific ratings for both speed and score.

dV
Can't we normalise the separate ratings of the players within each VC so that the means and standard deviations for all VCs are the same, eliminating VC as a variable? Does this resolve the issue of not knowing which VC a loser was targeting?

Re. the score vs. speed issue. Maybe we work out a rating based on each, and take the higher of the two as the skill indicator for that player?

I'm still not sure what adding a ratings system will do for the greater enjoyment of the GOTM playing community ....
 
Can't we normalise the separate ratings of the players within each VC so that the means and standard deviations for all VCs are the same, eliminating VC as a variable? Does this resolve the issue of not knowing which VC a loser was targeting?
I think we could normalize ratings so that there could be a single speed rating across the VCs. But if determining performance in a game is based on knowing who a player beat and did not beat with a 1900 space win, which losers/retirees/non-submitters did he beat? I suppose all of them? Regardless of what VC they were pursuing?

Re. the score vs. speed issue. Maybe we work out a rating based on each, and take the higher of the two as the skill indicator for that player?
I think you would need to keep these separate ... you would not use someone's higher speed rating to calculate the score performance of players who scored higher or lower than that index person, would you?

dV
 
I think we could normalize ratings so that there could be a single speed rating across the VCs. But if determining performance in a game is based on knowing who a player beat and did not beat with a 1900 space win, which losers/retirees/non-submitters did he beat? I suppose all of them? Regardless of what VC they were pursuing?
You missed my badly expressed point. If the results for each VC are processed separately to generate player ratings within that VC, and those separate result sets are each normalised to the desired common mean and standard deviation, then the ratings can be merged into one set. I'm not certain of the maths - does the union of two distributions with the same means and SDs still have the same mean and SD? I think so, but if not it's just a matter of renormalising the SD of the combined set.
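
A minimal Python sketch of that merge step, assuming placeholder targets of mean 1500 and SD 200 (values picked purely for illustration):

[code]
# Rescale each VC's ratings to a common mean/SD, then merge the sets.
from statistics import mean, pstdev

def normalise_and_merge(ratings_by_vc, target_mean=1500.0, target_sd=200.0):
    merged = {}
    for vc, ratings in ratings_by_vc.items():
        mu = mean(ratings.values())
        sd = pstdev(ratings.values())
        for player, r in ratings.items():
            z = (r - mu) / sd if sd > 0 else 0.0
            merged[(player, vc)] = target_mean + target_sd * z
    return merged

# Two VCs with different spreads become directly comparable:
print(normalise_and_merge({
    "Conquest": {"A": 2300, "B": 1700, "C": 1500},
    "Culture":  {"D": 1900, "E": 1600, "F": 1450},
}))
[/code]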

I think you would need to keep these separate ... you would not use someone's higher speed rating to calculate the score performance of players who scored higher or lower than that index person, would you?

dV
Not sure, but keeping them separate is no different from our current tables of separate speed and score rankings. I was just suggesting an equivalent to our current "combined ranking" which would not recompute the entire data set, but just tabulate the better of each player's two ratings as a way of comparing the relative skills of a high scorer and a fast finisher.
 
You missed my badly expressed point. If the results for each VC are processed separately to generate player ratings within that VC, and those separate result sets are each normalised to the desired common mean and standard deviation, then the ratings can be merged into one set. I'm not certain of the maths - does the union of two distributions with the same means and SDs still have the same mean and SD? I think so, but if not it's just a matter of renormalising the SD of the combined set.


Not sure, but keeping them separate is no different from our current tables of separate speed and score rankings. I was just suggesting an equivalent to our current "combined ranking" which would not recompute the entire data set, but just tabulate the better of each player's two ratings as a way of comparing the relative skills of a high scorer and a fast finisher.
Sounds like you mean keeping each game's performance assessment separate by VC for speed, and keeping score separate from speed ... but eventually normalizing and/or combining them for some overall rating. That sounds right ...

The sets for rating a given game's performance have to be (I think) each VC for speed, but only one for score (to compete in score, pick the VC you can score best in). I still wonder if we need to have players select whether they want a result ranked for speed or for score, since a high score game may not be fast, and a fast game may be low score for peaceful VCs.

Once your rating is recalculated within a set, then we could have ways to normalize across the sets.

On the other hand, maybe it makes perfect sense to keep them separate ... one may be a highly rated diplo player but a lousy dominator. A bit like gymnastics individual event medals and the all-around medal; maybe both separate and normalized?

If I recall correctly, the variance of a sum is the sum of the variances, but that applies to adding independent random variables (statistics derived from the samples), not to pooling the samples themselves.

I think you are asking what is the variance of a new sample that is the union of two known samples, in terms of the variance of the known samples. I would assume that is highly dependent on the degree to which the known samples overlap. So any formula would have to invoke the difference of the means and of the variances, I would suspect.

For example, pooling two basketball teams might give a variance similar to that of each team individually. Pooling a basketball team with a group of jockeys would be another matter ... (of course that would be bimodal, but you get the point ...)

If we assume the two sets have the same mean and SD, it seems logical that the union would have the same mean and SD; the pooled-moments identity below bears that out.
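
For the union of two samples of sizes \(n_1\) and \(n_2\), the standard pooled-moments identity (quoted here for reference, not derived in this thread) is:

\[
\mu = \frac{n_1\mu_1 + n_2\mu_2}{n_1 + n_2}, \qquad
\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\mu_1 - \mu_2)^2}{(n_1 + n_2)^2}
\]

The cross term vanishes when \(\mu_1 = \mu_2\), so sets normalised to a common mean and SD do merge into a set with that same mean and SD - and the formula does indeed invoke the difference of the means, as suspected above.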

dV
 
I'm still not sure what adding a ratings system will do for the greater enjoyment of the GOTM playing community ....
I am still a bit unclear about the answer to this question as well. :hmm:
 
A related question would be: how many players find global rankings to be interesting at all? I myself am completely award-oriented, and I suspect it's that way for a lot of players.
 
I haven't really followed this thread (was away for most of it) but I just have to say that I will probably go to my grave never understanding this obsession with speed, and why even in the current ranking display we have a ranking based on how many minutes you spent with the game running. I pay so little attention to those rankings that I just leave my game running during dinner, when I get a phone call, or even overnight on a weekend. When I used to play when travelling a lot, using a laptop, I think I would just suspend the machine with the game running for days at a time.

Do people really save and exit their game every time they get a phone call? Do those of you who play chess (another turn-based game) really think the world chess federation ranking system is incomplete because it doesn't keep track of how much of the chess clock each player uses? Should I be deemed to have played a "better" chess game than an opponent who beat me, head-to-head, because I made my moves much more quickly than the material margin I lost by? Really? Can't you just go play any RTS on the market from the last 10 years (e.g. Age of Empires) if you want an emphasis on how fast you play? Why this compulsion to use the score/ranking system to make people feel they have to play turn-based Civ like an RTS when there is no logical reason for it in the design of the game itself? I don't get it.

I was one of those arguing [unsuccessfully, a year or two ago] for the addition of real-time based awards -- the reasoning was to provide an award where micro-management is a trade-off rather than a necessity in order to compete, so that those of us who do not have 10-20 hours to spend on the game (but only maybe 5 hours) are not instantly ruled out of contention in all categories.
 
A related question would be : how many players find global rankings to be interesting at all ? I myself am completely award-oriented, and I suspect it's that way for a lot of players.

I'm not in the slightest bit interested in them -- to encourage people to submit defeats, they always penalise missing more than, say, two months per year. That means it always ranks regular poor players far above infrequent skilled players. The rank-order for the games you have submitted is much more interesting if you are an irregular player, as the global ranking devolves to measuring your playing frequency over time, whereas the rank order in the individual games shows how well you are doing in them.
 
I'm not in the slightest bit interested in them -- to encourage people to submit defeats, they always penalise missing more than, say, two months per year. That means it always ranks regular poor players far above infrequent skilled players. The rank-order for the games you have submitted is much more interesting if you are an irregular player, as the global ranking devolves to measuring your playing frequency over time, whereas the rank order in the individual games shows how well you are doing in them.
agreed
I'd like to see the score position (Xth out of Y players) and the ranking in the VC (Nth date out of M players winning by the Z victory condition) in the tables you get to see on the global ranking pages.
 
The Global Rankings page has a lot of options. Maybe you haven't checked them all out. You can select any game as the "most recent game" - the one in the left hand column. And you can choose to sort by the rankings for that game only.

You may also find what you are looking for on the Results pages. Select the game of interest, and then sort the results. In the Sort menu, select Date first, then Victory Condition, and you will see the results separated by VC, with each VC sorted by date. Select Score first and then VC to change the second sort to score.
 
The Global Rankings page has a lot of options. Maybe you haven't checked them all out. You can select any game as the "most recent game" - the one in the left hand column. And you can choose to sort by the rankings for that game only.

You may also find what you are looking for on the Results pages. Select the game of interest, and then sort the results. In the Sort menu, select Date first, then Victory Condition, and you will see the results separated by VC, with each VC sorted by date. Select Score first and then VC to change the second sort to score.

OK, I knew that (and used it to check my own history).
But wouldn't it be great for everyone to have a history of those things without much looking around?
I mean, you can get a 12-month table with the points you made, but not the rankings.
 
I've mentioned previously, either here or in one of the other discussions around this subject, that I hope to provide a "Record of Achievement" page that would list one player's medals, awards, all-time highs (and lows), batting averages etc. That would also list all games you have played, and the score & date positions achieved in those games. Would that meet your requirements?
 
I've mentioned previously, either here or in one of the other discussions around this subject, that I hope to provide a "Record of Achievement" page that would list one player's medals, awards, all-time highs (and lows), batting averages etc. That would also list all games you have played, and the score & date positions achieved in those games. Would that meet your requirements?
:goodjob:
more than what I asked for!
 