Sorry for being silent for so long; I have been busy with other things, and the questions asked required some time to answer.
AlanH said:
My basic problem with the proposal is the requirement for players to decide at the outset whether or not to compete for a rating. It sounds to me like a way of splitting the community. Less-than-dedicated players will just download the non-rated save because (a) it's less hassle, (b) just by downloading it they may jeopardise their rating if they cant find time to play, and/or (c) they won't understand the whole concept. I would feel far more receptive to this suggestion if we could remove this requirement.
I'm not sure I see any split here. All players would be playing the same save; there would simply be some mark identifying the rated game. All game discussions, spoilers etc. would be common, and only the rating calculation would be reserved for those who want it. Where is the problem?
Removing the "registration" requirement would mean that players could decide not to submit a bad performance, knowing there is no penalty for doing so. Imagine a chess player who could decide after the game whether he wanted it rated! The only way around this would be to add a model predicting whether a non-submission is due to a bad performance or to other circumstances. This is obviously hard to do with any accuracy, but it might be possible to come up with a scheme that takes into account how frequently a player submits games. It should work such that players who seldom submit are not assumed to have lost the games they didn't submit. Any such scheme would reintroduce some bias towards rewarding participation, but probably less than in the existing speed rankings.
AlanH said:
I understand that you want to treat a no-show as a loss, but I really don't think it adds useful information to a skill level assessment in this player group. Players may not submit for any number of reasons. Most of the reasons probably relate to their real lives or personal priorities. I suspect that few are skill related.
Well, actually, this is exactly how the current rankings work. Since you get 0 points for not submitting, and some points even for a retired or lost game, a no-show is "rated" as a loss, or even worse than a loss. And nobody even asks whether the player ever intended to play the game. This probably explains why the current rankings have only a moderate correlation with playing skill.
AlanH said:
On a different issue, you state, but haven't explained for me, that the AI is a competitor and has to be measured. Why isn't the AI just part of the game environment? Like the golf course; or, in chess, the sum of (chess rules + tournament structure + whatever else affects player performances outside of their own skill sets)?
If we follow the chess analogy, a single-player Civ game should be compared to a game against a chess engine (AI), and chess engines do have ratings; you can check them out here: computer chess ratings. Putting it differently, when you play a GOTM you play against the AI, not against any other players. Your performance relative to other players can only be measured indirectly, through the result obtained against the AI. So any game model must have some notion of AI skill level. Whether you choose to estimate the AI's skill or not is another issue.
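To illustrate the point that a rated AI is all you need to rate a single-player game, here is a minimal sketch of the standard Elo expected-score formula applied to a player-versus-AI pairing. The function name and the idea of a per-difficulty AI rating are assumptions of this sketch, not a fixed part of the proposal:

```python
def expected_score(player_rating, ai_rating):
    """Standard Elo expected score for a player against a rated opponent.

    Here the opponent is the AI at a given difficulty level; the AI's
    rating would be estimated from past submissions (an assumption of
    this sketch, not something the proposal has pinned down).
    """
    return 1.0 / (1.0 + 10 ** ((ai_rating - player_rating) / 400.0))

# An evenly matched player and AI give an expected score of exactly 0.5:
print(expected_score(1500, 1500))  # 0.5
```

Once the AI has a rating, each submitted result can move the player's rating by comparing the actual outcome to this expectation, with no direct player-versus-player pairing needed.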
AlanH said:
Each xOTM represents a number of matches between all players who shared the same VC goal. We don't know which VC the AI was targeting during their games, and we don't know which VC a lost submission was targeting, if any. We *do* know the relative performances of all the players who won by Conquest, for example. Each player in that set won against those with later dates, and lost against those with earlier dates. Players who submitted a loss or a retirement lost to all the winners and either drew with all the other losers, or beat those who lost/retired earlier. Players who didn't submit lost to everyone who did. Why doesn't this data set allow us to rate players using Elo without measuring the AI?
You can do something like this, but there are certain drawbacks. One is that it is bad practice to adopt the update equations from a statistical model developed for an entirely different game; you then have no idea what the implicit statistical model of your own game is. More importantly, you throw away a lot of information if player A beating player B by 1 turn is rated the same as beating him by, say, 20 turns. It clearly takes more skill to outperform someone by 20 turns, while finishing 1 turn ahead means nothing given the randomness of the game. Both these issues are dealt with in the rating system I have proposed. As I have stated earlier, it is possible to derive the AI rating from old games, and it is probably also possible to derive updated AI ratings even if retired games are made eligible for the submission bonus.
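The margin-of-victory point can be sketched as an Elo-style update whose K-factor grows with the winning margin in turns. The logarithmic damping and all constants here are illustrative choices of mine, not the equations of the proposed system:

```python
import math

def elo_update(rating_a, rating_b, margin_turns, k_base=32, margin_scale=10):
    """Return rating_a after beating an opponent rated rating_b by
    `margin_turns` turns.

    A plain Elo update treats a 1-turn win and a 20-turn win identically;
    here the K-factor grows with the margin so bigger wins move the
    rating more. The log damping keeps a huge margin from moving the
    rating without bound. Constants are purely illustrative.
    """
    expected = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    k = k_base * (1 + math.log1p(margin_turns / margin_scale))
    return rating_a + k * (1.0 - expected)

# A 20-turn win moves the rating more than a 1-turn win:
print(elo_update(1500, 1500, 1))
print(elo_update(1500, 1500, 20))
```

With a plain win/loss update, both calls above would return the same number; that lost information is exactly what the paragraph is about.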
da_Vinci said:
Not so sure we can operationalize "Players who submitted a loss or a retirement lost to all the winners and either drew with all the other losers, or beat those who lost/retired earlier." because as you say, "we don't know which VC a lost submission was targeting, if any." So if the ratings for speed are to be stratified by VC, we don't know which stratum should get the losses/retirements.
Same issue applies to the idea that "Players who didn't submit lost to everyone who did."
This weakness is not present in my proposal, because the AI has a rating: you know the strength of the opposition (i.e. the AI), and that is all you need to rate the game.
da_Vinci said:
The two questions are 1) does the AI have to be rated, and 2) do losses/retires/non-submissions have to be rated?
1) Yes, the AI should be rated; any attempt not to do so is simply an attempt to hide the fact that the AI has some playing skill.
2) Yes, losses/retires/non-submissions must be rated to prevent the system from being exploitable.
da_Vinci said:
I think that if we can generate a system with controlled rating inflation, based on just rating wins (which are either wins or losses depending on relative performance) and not rating AI, that would be the way to go.
I don't see how this can work. It would mean that a sub-standard win is punished in terms of lost rating while a lost or retired game has no impact on rating. How will players react to this? And is it fair?
da_Vinci said:
Can't we normalise the separate ratings of the players within each VC so that the means and standard deviations for all VCs are the same, eliminating VC as a variable? Does this resolve the issue of not knowing which VC a loser was targeting?
Something like this is happening in the system I propose, through the calculation of the "required workload" for each VC.
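da_Vinci's normalisation idea can be sketched directly: z-score each VC's ratings so every victory condition ends up with the same mean and spread. The function name and the input shape (a dict mapping VC names to rating lists) are assumptions for illustration; the proposal's actual "required workload" calculation may differ:

```python
import statistics

def normalise_across_vcs(ratings_by_vc):
    """Z-score normalise per-VC ratings so every victory condition has
    the same mean (0) and spread (1), removing VC choice as a variable.

    `ratings_by_vc` maps a victory-condition name to a list of player
    ratings for that VC; this data shape is assumed for the sketch.
    """
    normalised = {}
    for vc, ratings in ratings_by_vc.items():
        mean = statistics.mean(ratings)
        stdev = statistics.pstdev(ratings) or 1.0  # guard against zero spread
        normalised[vc] = [(r - mean) / stdev for r in ratings]
    return normalised
```

After this transformation, a player one standard deviation above the mean in Conquest is directly comparable to a player one standard deviation above the mean in Space, whatever the raw rating scales were.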
AlanH said:
I'm still not sure what adding a ratings system will do for the greater enjoyment of the GOTM playing community ....
When the first mobile phone came out, I'm sure many people wondered whether they would ever want to own such a device. You obviously don't miss something you know nothing about and have never tried. Ask any chess player whether it would be a good idea to abolish the Elo rating system, and he would look at you in disbelief.
Summing up on the two main issues:
1) Handling retired games. If retired games are not allowed, it may result in games being lost fast on purpose. If retired games are given the same submission bonus as lost submitted games, fewer lost games will likely be submitted. I think it should still be possible to derive AI ratings from the submitted losses and from the wins, perhaps by using an estimated loss date for the retired games.
2) Handling non-submissions. The best way from a statistical point of view is to have people commit to submitting at game download time. What would chess ratings look like if players could decide AFTER the game whether they wanted it rated? Second best is to rate non-submissions as losses while taking into account how often the player submits. That is, if a player has been submitting frequently, a non-submission is rated fully as a loss; if he also misses the next submission, it is rated only as a "partial" loss; and after a few more non-submissions his rating ceases to decrease, since by then the non-submissions are likely due to something other than bad play. This could probably be coupled with a mechanism that lets players regain rating faster if they have lost some due to non-submissions.
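The "partial loss" scheme in point 2 could be implemented as a penalty that decays with each consecutive non-submission. The geometric decay below is one possible choice I am assuming for illustration; the proposal only requires that the penalty shrinks towards zero:

```python
def no_show_penalty(base_penalty, consecutive_no_shows, decay=0.5):
    """Rating penalty for the n-th consecutive non-submission.

    The first miss is rated fully as a loss (the full base_penalty);
    each further consecutive miss costs geometrically less, so a long
    absence is not treated as an endless string of defeats. The
    geometric decay is an illustrative assumption, not part of the
    original proposal.
    """
    return base_penalty * decay ** (consecutive_no_shows - 1)

# First miss costs the full 30 points, the third only 7.5:
print(no_show_penalty(30, 1))  # 30.0
print(no_show_penalty(30, 3))  # 7.5
```

A fresh submission would reset the consecutive-miss counter, and the "regain rating faster" idea could be handled symmetrically by temporarily raising the K-factor for players returning from a run of non-submissions.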