GOTM 2.0 Brainstorming

da_Vinci said:
I suspect he means for every submitted game.

Yes, replace "game" with "submitted game".

da_Vinci said:
Does the rating for losses recognize that losing later is better (even though winning sooner is better)?

No, it doesn't work like that. One reason is that a fast loss could also be the result of the AI making better decisions in a particular game (like attacking you early in a Deity game). In that case it's more the AI's better play that is responsible for you losing fast than anything in your own play. As it's done now, if you know you are losing you might as well play the last turns fast.
 
I'm confused. How do you rate a non-submission vs. a submitted loss?
 
AlanH said:
I'm confused. How do you rate a non-submission vs. a submitted loss?

Basically rate a non-submission as a loss. That should encourage people to submit. And add a small "submission bonus" to all submitted games. This will also serve as a handle that can be adjusted every now and then - say once per year - to keep inflation/deflation near zero. In chess you are also rated if you don't show up to a game without an acceptable reason. Retired games should not be given the submission bonus since there is no useful information in a retired game (We are missing the victory/loss date).
 
Here's a suggestion based on some of the discussion. It seems that a lot of players download and play, but don't finish games. That says that something like the quick start challenge would be of interest to a lot of players. Play the game out to 0 AD or something, then submit, and give awards based on various metrics. You could give the "jesusin" to the player with the most culture, the "erkon" to the player who razes the most cities, etc.

But here's a twist. Have a second competition based on the submitted saves. Either pick one save, or allow players to pick from a few saves, or allow them to pick any save, and then finish the game. Award medals, rankings as in the current xOTMs. I think it would be fun and educational to play out a save from another player's game.
 
Basically rate a non-submission as a loss. That should encourage people to submit. And add a small "submission bonus" to all submitted games. This will also serve as a handle that can be adjusted every now and then - say once per year - to keep inflation/deflation near zero. In chess you are also rated if you don't show up to a game without an acceptable reason. Retired games should not be given the submission bonus since there is no useful information in a retired game (We are missing the victory/loss date).
The big difference between chess ratings and CIV ratings would be our attempt to define degrees of winning and losing.

And perhaps our attempt to evaluate the AI play in a particular game, rather than the AI rating over several games, as a reference point for player performance? (If that is what Fb is talking about with early loss being sometimes better than later loss).

On the point of what losses are better than others ... I think that longer survival has to mean something. Which would be an incentive not to retire, but to play to the bitter end? Currently, the conquest of one's empire deflates one's score, creating an incentive to retire at the outset of the indefensible invasion.

Now taking another notion from chess ... is there a "draw" in CIV, from a ratings perspective? If you are beaten to space by a turn, or to that last legendary city by a turn, one could argue that the player performed essentially equal to the AI ... and does that provide any useful way to segregate losses into better or worse ones?

dV
 
da_Vinci said:
The big difference between chess ratings and CIV ratings would be our attempt to define degrees of winning and losing.

Yes, a win in 300 turns is not the same as a win in 200 turns, even given that the same VC is achieved. And it's even more complicated than this, with several ways of achieving victory which are not equivalent and require different numbers of turns on average.

da_Vinci said:
And perhaps our attempt to evaluate the AI play in a particular game, rather than the AI rating over several games, as a reference point for player performance? (If that is what Fb is talking about with early loss being sometimes better than later loss).

It seems logical to assume that the AI doesn't always play with the exact same strength, since many AI decisions are more or less random. One example is a Deity AI that decides to attack the human early, thus playing better than an AI that waits.

da_Vinci said:
On the point of what losses are better than others ... I think that longer survival has to mean something. Which would be an incentive not to retire, but to play to the bitter end? Currently, the conquest of one's empire deflates one's score, creating an incentive to retire at the outset of the indefensible invasion.

This is not clear cut, I think. The model underlying the proposed rating system assumes that the player is going for a victory and not just trying to postpone the AI victory as long as possible. I suppose it's debatable whether a defensive battle with no prospect of winning should be rewarded. A retired game should be counted as a non-submission, i.e. it should not get the submission bonus, since a retired game is useless from a rating perspective, while a submitted loss has value because we get a turn number for the AI victory.

The question of how well the model underlying a rating system fits the actual data can only be answered by testing with real game data. My starting point is a very simple model (as is the ELO rating model), and with more knowledge of real data it may be possible to refine the model for a better fit.

da_Vinci said:
Now taking another notion from chess ... is there a "draw" in CIV, from a ratings perspective? If you are beaten to space by a turn, or to that last legendary city by a turn, one could argue that the player performed essentially equal to the AI ... and does that provide any useful way to segregate losses into better or worse ones?

It would be really nice to know how close the players were to actually winning the game, but I doubt that this information can be extracted from the submitted saves.
 
A turn number for the AI victory may be meaningless. Before we allowed retired submissions, Civ3 players who decided they could not finish their games, but still wanted to submit, would commit suicide. They could give away all their cities, or leave them undefended. That doesn't give any useful information about how well the AI played.
 
Won't AI time to victory depend in part on how the human plays, at least in some cases?

And AI gets a conquest win by killing one civ (the human) ... and that can be highly dependent on how well the human handles power level and diplomacy to prevent or delay being attacked.

Lots of nuances ...

dV
 
AlanH said:
A turn number for the AI victory may be meaningless. Before we allowed retired submissions, Civ3 players who decided they could not finish their games, but still wanted to submit, would commit suicide. They could give away all their cities, or leave them undefended. That doesn't give any useful information about how well the AI played.

Such behavior could be disallowed. Finishing a game by basically just pressing enter could be allowed, but actively trying to end the game in favor of an AI could be made illegal. The cases you describe should be detectable.

Alternatively, the AI ratings could be estimated from old games and then frozen.

da_Vinci said:
Won't AI time to victory depend in part on how the human plays, at least in some cases?

And AI gets a conquest win by killing one civ (the human) ... and that can be highly dependent on how well the human handles power level and diplomacy to prevent or delay being attacked.

Lots of nuances ...

dV

In Chess, the outcome of a game certainly depends on how your opponent plays. That's why the opponent's rating is used when calculating your new rating from a given result.

The impact of the AI bias towards certain VCs is something to investigate a bit. But it doesn't make much sense to create a complex rating model before some data is available. And maybe a complex model is not desirable at all. In Chess, the result of a game also depends on numerous factors like the choice of opening, how well the players play the type of positions that develop on the board, fatigue, time trouble, etc. Yet, the model for the Chess player is that his performance in a given game can be described by a single number that is Gaussian distributed with a fixed standard deviation of 200 and a mean equal to the ELO rating.
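For readers less familiar with the chess model being referenced here, the following is a minimal illustrative sketch (not part of any proposal in this thread) of how that single assumption - a performance drawn from a Gaussian with standard deviation 200 around the rating - translates into an expected score; the function name and example ratings are placeholders:

```python
from math import erf, sqrt

# Minimal illustration of the Gaussian performance model described above:
# a player's performance in a single game is drawn from a normal distribution
# with standard deviation 200 and mean equal to the player's rating.
SIGMA = 200.0  # fixed performance standard deviation, as in the ELO model

def expected_score(rating_a, rating_b):
    """Probability that A's single-game performance exceeds B's.

    The difference of two independent N(R, SIGMA^2) draws is normal with
    mean (rating_a - rating_b) and standard deviation SIGMA * sqrt(2), so
    the win probability is the standard normal CDF of the scaled difference."""
    z = (rating_a - rating_b) / (SIGMA * sqrt(2.0))
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Example: a 100-point rating edge gives an expected score of about 0.64.
print(round(expected_score(1600, 1500), 2))
```

This is the same single-number performance assumption that the GOTM game model discussed later in the thread reuses for both the player and the AI.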
 
I guess my basic concern is that this system seems to impose significant new constraints on players and staff, just to be able to measure the non-performance of non-players.
 
It would also discourage me from downloading a game for inclusion in the ratings unless I was absolutely sure I could get it finished in time. I can't help feeling that once we're talking about 'illegality' and 'detection' of transgressions then we're a million miles away from what XOTM is about.

Let's face it, the rankings are meaningless and always will be. I'm #65 on the combined rankings and Vynd is #71. Does that make me a better player? No way! Vynd makes his ranking with just 3 submissions, whereas I've filed 13 games for my points. Shouldn't Vynd be above me? That would be a truer measure of skill, but it would discourage submissions. Unconquered Sun doesn't make the top 100 despite one of the most inspirational submissions and write-ups it's been my pleasure to read.

In my opinion only two things will encourage participation: a great sense of community on the board and an interesting and challenging series of games to play. I guess I'm just repeating what I said earlier in the thread (and I'm sure I'm breaking the rules again. ;))

nokem
 
In Chess, the outcome of a game certainly depends on how your opponent plays. That's why the opponent's rating is used when calculating your new rating from a given result.
Absolutely, but in chess there is no attempt to estimate your opponent's performance in THAT game in isolation; the rating is used, which represents an estimate of average performance over several to many games.

Unless I have misunderstood, I thought you were trying to generate a rating for the AI based on a single game (or a single save), rather than AI performance (within diff levels) over many games. Which seems to add a lot of complexity.

Assuming the AI does not learn ... its skill level should be pretty constant over time. Yes, it will have variability as it applies its algorithm to varying game conditions (the algorithm makes decisions that are better for some game situations than for others), but that is a pretty predictable and constant range of variation, I would think.

So if AI rating is estimated historically, how much complexity can be removed from your system?

Perhaps the real opponents are the other humans ... so finishing (winning) faster than player X is considered a "win" over player X ... and perhaps the gap in number of turns would give a variable margin of outperformance (it is fixed in chess since all wins are equal, IIRC). Finishing (winning) slower is a loss ... also with a variable margin of underperformance. With appropriate ceilings and floors on those margins.

Still remains how to rate a loss to the AI ... if AI has an historical rating, a loss could be a set margin of underperformance to that, which should be simple (but is it accurate for the system?).

Then how to rate an attempt with no submission ...

Hmm ... is it feasible to generate a ratings system based only on won games? Since those wins could be considered to be "wins" over all who finished slower, and "losses" to all who finished faster ... maybe consider the players immediately above and below the player to be rated (or the 2 or 3 above or below ... obviously issues for the top and bottom players)? Then no need to rate the AI at all.

That would remove any penalty for trying and failing, which might remove a lot of the reservations that are popping up in the thread.

dV
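A rough sketch, purely to make the "humans as the real opponents" idea above concrete (the window size, the margin cap and all names below are placeholder assumptions of this sketch, not anything proposed in the thread):

```python
# Rough sketch of the neighbour-only comparison idea: each finisher is scored
# only against the few players immediately above and below them in finish
# order, with a ceiling/floor on the counted turn gap. WINDOW and MARGIN_CAP
# are placeholder values.
WINDOW = 2        # how many neighbours above and below to compare against
MARGIN_CAP = 20   # cap on the turn-gap margin counted per comparison

def neighbour_scores(finish_turns):
    """finish_turns: {player: victory turn}. Returns a per-player score: the
    sum of capped turn gaps against nearby finishers (positive means the
    player finished faster than their neighbours on balance)."""
    ordered = sorted(finish_turns, key=finish_turns.get)   # fastest first
    scores = {p: 0 for p in ordered}
    for i, player in enumerate(ordered):
        lo, hi = max(0, i - WINDOW), min(len(ordered), i + WINDOW + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            gap = finish_turns[ordered[j]] - finish_turns[player]
            gap = max(-MARGIN_CAP, min(MARGIN_CAP, gap))   # apply ceiling/floor
            scores[player] += gap
    return scores

# Example: four players who won the same VC in 230, 233, 260 and 300 turns.
print(neighbour_scores({"A": 230, "B": 233, "C": 260, "D": 300}))
```

Note the edge effect for the top and bottom finishers that dV already flags: players near the ends of the list have fewer neighbours to score against, so the raw sums are not directly comparable, which is one of the details such a scheme would have to settle.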
 
Then no need to rate the AI at all.
Sounds reasonable to me. The AI is not the competitor here, the other players are. The AI is more like the golf course in this situation - a constant across one game, certainly, and actually a constant across many games. As dV says, it doesn't learn, until a new version of the game software is released.
 
Let's face it, the rankings are meaningless and always will be. I'm #65 on the combined rankings and Vynd is #71. Does that make me a better player? No way! Vynd makes his ranking with just 3 submissions, whereas I've filed 13 games for my points. Shouldn't Vynd be above me? That would be a truer measure of skill, but it would discourage submissions. Unconquered Sun doesn't make the top 100 despite one of the most inspirational submissions and write-ups it's been my pleasure to read.

nokem
The global rankings are NOT a SKILL ranking, they are a PERFORMANCE ranking. And so you have to perform (participate) to be in the running. Just like the best team does not always win the championship (tournament, playoffs, etc.) due to weather, luck, injuries, etc.

So the globals do what they are designed to do ... we need something different to do something different.

The things that make the current global rankings not a skill ranking are the penalties for non-participation, and the subordination of one (or two) of two (or three) excellent results that fall in the same temporal bracket.

We could, using the same points systems for score and speed, generate

1) a lifetime best ranking, (best 10 results ever, not aged, no bracket issues), and

2) a recent best ranking (best 10 results in the last [year, half year, whatever], not aged, no bracket issues) Hmm ... this allows a "current year champion" (last 12 mos) and a champion for each calendar year ... if we want to spam these kinds of lists ...

Each of these three (global, lifetime, recent) represents something different, and thus the lists will be different.

The premise for a ratings system is to try a different approach, to use performance to estimate skill, and create a skill comparison.

dV, the tutor of tautology
 
So the globals do what they are designed to do ... we need something different to do something different
Yes. What *are* we trying to do? :p
 
Yes. What *are* we trying to do? :p
Well, I think that some players who can't submit every month find the globals no incentive to submit at all.

So maybe a lifetime best rank (based on the 6 to 10 best results ever), and a yearly champion based on, say, the best six submissions of the last 12 months, might motivate some never-submitters to become sometimes-submitters?

And, it might be useful to have a ranking list that is more a reflection of skill rather than participation ... not to replace the globals, but in addition to them.

Then you would have all of the added traffic of the debate over which system has more meaning ... think of all the increased advertising revenue! ;) :goodjob:

dV
 
Here's a suggestion based on some of the discussion. It seems that a lot of players download and play, but don't finish games. That says that something like the quick start challenge would be of interest to a lot of players. Play the game out to 0 AD or something, then submit, and give awards based on various metrics. You could give the "jesusin" to the player with the most culture, the "erkon" to the player who razes the most cities, etc.

But here's a twist. Have a second competition based on the submitted saves. Either pick one save, or allow players to pick from a few saves, or allow them to pick any save, and then finish the game. Award medals, rankings as in the current xOTMs. I think it would be fun and educational to play out a save from another player's game.

I like that idea.
Maybe not as the normal GotM, but 2 or 3 times a year, as some kind of SG.
It has a lot of educational value, and with more "awards", including variant awards (like most wonders in 1 AD, ...), it could give the top players an incentive to try something different, and to write about it.
Some guys including myself have played a few SGs called "trash" games, where we had to pick the worst save at some point and try to salvage it.
This could be a challenge too, although I noticed the best players tend to stick with their own games.
If the staff designates one (or more, based on VC pursued) "save this game" challenge, and gives a special award to this, I have the feeling some good players may want to try.

It was done in the RB too, where sullla took over a save in a very backward situation, and salvaged it.

The global rankings are NOT a SKILL ranking, they are a PERFORMANCE ranking. And so you have to perform (participate) to be in the running. Just like the best team does not always win the championship (tournament, playoffs, etc.) due to weather, luck, injuries, etc.

So the globals do what they are designed to do ... we need something different to do something different.

The things that make the current global rankings not a skill ranking are the penalties for non-participation, and the subordination of one (or two) of two (or three) excellent results that fall in the same temporal bracket.

We could, using the same points systems for score and speed, generate

1) a lifetime best ranking, (best 10 results ever, not aged, no bracket issues), and

2) a recent best ranking (best 10 results in the last [year, half year, whatever], not aged, no bracket issues) Hmm ... this allows a "current year champion" (last 12 mos) and a champion for each calendar year ... if we want to spam these kinds of lists ...

Each of these three (global, lifetime, recent) represents something different, and thus the lists will be different.

The premise for a ratings system is to try a different approach, to use performance to estimate skill, and create a skill comparison.

dV, the tutor of tautology

I think a few "see how good I am" statistics would make the not-so-frequent submitters feel better about it.
For instance, I was once in the top 100, but can't remember what my best ranking was.
I also never submitted a game that made the awards, but maybe I was in the top 10 at least once (not sure about it). Some record of this would make me feel better than my current "not on the list" ranking :(.

Edit: I checked, and my best global ranking was 36th, while my best ranking in any particular game was 13th.
Strangely, I recalled only the (Warlords) Rome Emperor domination game I had submitted, but it was far from my best ranking (many players went for domination in that game and did better than I did).
Maybe that information could pop up when you check the global ranking. It's only about bragging rights, but then again, what's GotM about?
 
But here's a twist. Have a second competition based on the submitted saves. Either pick one save, or allow players to pick from a few saves, or allow them to pick any save, and then finish the game. Award medals, rankings as in the current xOTMs. I think it would be fun and educational to play out a save from another player's game.
That could be fun a couple of times per year, but likely only if there were some discussion about good ways to proceed from the chosen save(s). I've never been in an SGOTM, but I gather advice from all the players helps the successor choose what to do next.

"Pick from a few saves" would be best for me. If I'm interested in learning more about cultural wins I'd pick the "jesusin" and if I were in a mood to crush my enemies and see them scatter before me, I'd pick the "Erkon".
 
Hi all,

I'm very new to Civ 4 and CFC (I never played the game 'till about two months ago). I think I'll submit the current Hatty game. In fact, I've been playing only Noble since I started Civ 4, and I just played and easily won my first Monarch game as practice for the GOTM (and enjoyed it much more than Noble) - so, I guess, I like having everyone compete on the same level. I ignored the current BOTM because of Immortal, but I think I might use these as a chance to try something new in the future.

You guys run these threads waaay too long. I only had time to read through about half, so if something like this has already been mentioned, feel free to ignore it...

I don't understand why your ranking system has to be so result-based, as opposed to player-comparison-based. It seems that the nature of the GOTM, where players all play the same game with the same settings, is conducive to a chess tournament / ELO rating system (per victory condition, anyway). I mean, if 20 players complete, say, a conquest win in one GOTM, could you not treat them as 20 chess players playing each other? Assign 1 for a win / 0.5 for a draw / 0 for a loss and then calculate ELO ratings from there. The only possible variation would be assigning a factor for draws (for instance, in a quick culture game finishing in 1705 or 1710 is probably equivalent, but 1705 / 1740 is probably a clear win/loss). You already have a fairly substantial database of past games, and you could assign a large factor for rating fluctuations to compensate for the relatively low number of games per player per victory condition.
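As a rough sketch of the all-play-all idea above (illustrative only: the K-factor, the draw window expressed in turns rather than calendar dates, and all names are assumptions of this sketch, not anything agreed in the thread):

```python
from itertools import combinations

# Hypothetical sketch of the "treat one GOTM's finishers as a round-robin
# chess tournament" idea above. The K-factor and the draw window are
# placeholders.
K = 32            # large update factor, since players get few games per VC
DRAW_WINDOW = 5   # finishes within this many turns of each other count as a draw

def elo_expected(r_a, r_b):
    """Standard logistic ELO expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def rate_gotm(finish_turns, ratings):
    """finish_turns: {player: victory turn}; ratings: {player: current rating}.
    Every pair of finishers is scored 1 / 0.5 / 0 by who finished sooner,
    then both ratings get the usual ELO nudge."""
    deltas = {p: 0.0 for p in finish_turns}
    for a, b in combinations(finish_turns, 2):
        gap = finish_turns[b] - finish_turns[a]   # positive: a finished sooner
        score_a = 0.5 if abs(gap) <= DRAW_WINDOW else (1.0 if gap > 0 else 0.0)
        deltas[a] += K * (score_a - elo_expected(ratings[a], ratings[b]))
        deltas[b] += K * ((1.0 - score_a) - elo_expected(ratings[b], ratings[a]))
    return {p: ratings[p] + deltas[p] for p in finish_turns}

# Example: three players who completed the same VC in 230, 233 and 260 turns.
print(rate_gotm({"A": 230, "B": 233, "C": 260},
                {"A": 1500, "B": 1500, "C": 1500}))
```

A larger K, as the post suggests, makes ratings move quickly despite the low number of games per player per victory condition.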
 
da_Vinci said:
Absolutely, but in chess there is no attempt to estimate your opponent's performance in THAT game in isolation; the rating is used, which represents an estimate of average performance over several to many games.

Unless I have misunderstood, I thought you were trying to generate a rating for the AI based on a single game (or a single save), rather than AI performance (within diff levels) over many games. Which seems to add a lot of complexity.

You have to differentiate between a single performance rating and the averaged rating. The performance rating indicates the player's performance in a single game or tournament, while the player rating is an average with an exponential forgetting factor. A single game can be rated, and I know for a fact that this is sometimes done in the Danish Chess Team Championship.

In my GOTM rating proposal, the AI and player ratings could similarly be calculated as both a performance rating and an averaged rating, with the latter being the end result of the rating calculation.

da_Vinci said:
Assuming the AI does not learn ... its skill level should be pretty constant over time. Yes, it will have variability as it applies its algorithm to varying game conditions (the algorithm makes decisions that are better for some game situations than for others), but that is a pretty predictable and constant range of variation, I would think.

So if AI rating is estimated historically, how much complexity can be removed from your system?

As far as I know the AI doesn't learn from experience and thus it will have a fixed playing strength until the SW is updated or modified.

The AI playing strength is an inherent part of the statistical game model I have used, so fixing the AI rating after estimating it on old games has little influence on complexity.

da_Vinci said:
Perhaps the real opponents are the other humans ... so finishing (winning) faster than player X is considered a "win" over player X ... and perhaps the gap in number of turns would give a variable margin of outperformance (it is fixed in chess since all wins are equal, IIRC). Finishing (winning) slower is a loss ... also with a variable margin of underperformance. With appropriate ceilings and floors on those margins.

Still remains how to rate a loss to the AI ... if AI has an historical rating, a loss could be a set margin of underperformance to that, which should be simple (but is it accurate for the system?).

If you want to discuss modifications to the rating system I'm proposing, we should start by discussing the underlying model. The rating updates more or less follow from the model. Looking at the model also has the advantage that you can discuss how well it may fit reality.

da_Vinci said:
Then how to rate an attempt with no submission ...

It must be rated as a loss - otherwise most players wouldn't submit any losses. Actually, there should be a small bonus for those who do submit a loss, as there already is in the existing ranking system.

da_Vinci said:
Hmm ... is it feasible to generate a ratings system based only on won games? Since those wins could be considered to be "wins" over all who finished slower, and "losses" to all who finished faster ... maybe consider the players immediately above and below the player to be rated (or the 2 or 3 above or below ... obviously issues for the top and bottom players)? Then no need to rate the AI at all.

That would remove any penalty for trying and failing, which might remove a lot of the reservations that are popping up in the thread.

dV

Since a loss is clearly a possible outcome of the game, I doubt that you can formulate any satisfactory game model that ignores this. It would also open the door to exploits - in some cases it would be better to lose or not submit than to submit a sub-standard win.

You can't have a ranking system that doesn't penalize a poor performance. Sometimes you lose, and that shows the limits of your skill, which is exactly what we are trying to measure. People who don't like to compete could be offered a "non-rated" game, as I have already suggested.

AlanH said:
Sounds reasonable to me. The AI is not the competitor here, the other players are. The AI is more like the golf course in this situation - a constant across one game, certainly, and actually a constant across many games. As dV says, it doesn't learn, until a new version of the game software is released.

The AI is certainly a competitor in the game itself, and that's why my game model has a notion of AI playing skill. The AI skill is constant for a given SW and level, but that doesn't mean that we shouldn't estimate it - either using old game data or by updating it like player ratings. The AI skill is something inherent in the model and not something that can be removed without fundamentally changing the model. I think you may have gotten the impression that the AI rating is an additional feature complicating matters, but that is not the case.

Since we are discussing possible modifications of the system I think I should explain the game model in more detail. That would also ensure that we speak the same language.

In my model I introduce the concept of playing power (or skill), denoted P, and the concept of a total workload (denoted W), which is the work required to achieve a certain VC. The VC is then achieved when the playing power integrated over a number of turns, T, exceeds the required workload, i.e. when:

P*T >= W

The playing power P is not constant, since no person plays equally well every time. This is modeled by drawing P from a normal distribution with a standard deviation of 200 and a mean R, which is a parameter in the model (the rating). These happen to be exactly the same assumptions as in Chess (ELO) rating. The workload, W, required to achieve victory will also vary depending on the map, the speed and the VC. I don't make any specific distributional assumptions for this variable.

So the victory date for the player in a particular game is:

T = W/P

Since the AIs are also participating in the game, there is some chance that they will finish first. Since the AI is playing the same map and speed, I have chosen to model the AI in the same way as the player, i.e. the victory date for the AI is:

T_AI = W/P_AI

Now the outcome of the game is:

Win in T turns if T <= T_AI
Loss in T_AI turns if T_AI < T

Given a game result (win/loss in T turns), it is then possible to derive estimates of the playing power in this particular game for both player and AI. The rating can then be updated as a running average of the estimated playing power with an exponential forgetting factor. Anyway, this is where it starts getting technical, and in order to discuss modifications to the rating system it's much more relevant to address changes to the model. The key point is that the model should be able to explain exactly how a win or loss in T turns is generated as a function of the player's skill, since these are the readily observable data. The rating update is then "only" a matter of estimating the "skill" parameters in this statistical model.
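To make the model above concrete, here is a small illustrative simulation in Python. The performance draws and the win/loss rule follow the description in this post; the workload value, the forgetting factor and the assumption that a workload estimate is already available are placeholders of this sketch - estimating W from submitted games is exactly the part left open here:

```python
import random

# Illustrative simulation of the game model described above. The performance
# draw (mean = rating, standard deviation 200) and the win/loss rule follow
# the post; the workload value, the forgetting factor ALPHA and the assumed
# workload estimate w_est are placeholders introduced for this sketch only.
SIGMA = 200.0   # fixed performance standard deviation, as in the ELO model
ALPHA = 0.2     # exponential forgetting factor for the running average

def play_game(r_player, r_ai, workload):
    """Draw single-game playing powers and return (outcome, turns)."""
    p = random.gauss(r_player, SIGMA)    # player's power P ~ N(R, 200^2)
    p_ai = random.gauss(r_ai, SIGMA)     # AI's power P_AI ~ N(R_AI, 200^2)
    t, t_ai = workload / p, workload / p_ai   # T = W/P and T_AI = W/P_AI
    if t <= t_ai:
        return "win", t                  # player reaches the VC first
    return "loss", t_ai                  # AI reaches the VC first

def update_rating(rating, turns, w_est):
    """Running average of estimated playing power with exponential forgetting.

    w_est is whatever estimate of the game's workload the system would use;
    obtaining it from submitted games is the open estimation problem."""
    p_est = w_est / turns                # estimated playing power this game
    return (1.0 - ALPHA) * rating + ALPHA * p_est

# Example: one game between an equally rated player and AI. The rating is only
# updated on a win here, because a loss only bounds the player's power from
# above (P < W / T_AI), and that estimation step is left open in the post.
result, turns = play_game(1500, 1500, workload=450_000)
if result == "win":
    print("win in", round(turns), "turns; new rating", round(update_rating(1500, turns, 450_000)))
else:
    print("loss; AI finished in", round(turns), "turns")
```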
 