Ghost Rating V2
Posted: Mon Feb 12, 2018 3:33 pm
Not to bury the lede, here are the GR2 ratings for February. Note that inactive players are filterable using the column on the far right.
Intro
As most of you know, Ghost Rating is currently the (official yet unintegrated) rating system we use at WebDiplomacy. We have had lots of discussion over the years about how the rating system could be changed or improved. I’ve made a script to calculate what I believe is a better rating system. My hope is that we move away from the old rating system and towards the new one, but I don’t want to be presumptuous. I look forward to hearing everyone’s feedback and discussion.
What’s in a name?
The new rating system is based on EIDRaS (ELO Inspired Diplomacy Rating System). EIDRaS is a terrible name and, because we may tweak some things, I think it is appropriate for us to choose a new one.
I propose we simply call it Ghost Rating V2 since Ghost Rating is a fucking badass name and it still honours The Ghost Maker who first implemented a rating system for us. I'm happy to find a different one if people would prefer.
What was Ghost Rating V1?
As a quick refresher, the way (the old) GR works is that everyone bets an equal fraction of their rating and that pot is divvied up among the winners according to the scoring system (e.g. winner takes all, etc.).
What is Ghost Rating V2?
In GR2, a player’s expected result is calculated based on their own rating and the rating of their opponents. After the game, everyone’s rating is adjusted based on the difference between their actual score and their expected score.
Some of the important properties of the system that differentiate it from GR1 are:
1. Winning against stronger players is more beneficial than winning against weaker players
2. Losing against stronger players is less detrimental than losing against weaker players
3. Playing against new players on the site affects your rating less. New players are often much better or much worse than the initial rating they start with. You shouldn’t be penalized as much for losing to some pro who just joined the site but doesn’t have a high rating yet.
4. Your rating fluctuates more quickly in your first few games. Unfortunately, people don’t play a ton of Diplomacy games. That means that it takes quite a while for people to reach their “true rating.” GR2 allows for people to rise or drop to their true rating more quickly. This also alleviates some of the issues addressed in point (3).
The Math
Feel free to stop reading here if you’re uninterested but I’m going to lay out some of the math and point to things that can be tweaked depending on what people here want.
After each game, a player’s rating changes by delta, where:
delta = V*E*(S-X)
S
S is simply the player’s score, expressed as a share of the pot. E.g. for a solo in WTA it’s 1, for a draw in WTA it’s 1/drawSize, etc.
X
X is the player’s expected score given by:
X = exp(c*Rating) ÷ Σj[exp(c*Ratingj)]
where Rating is the player’s rating and Ratingj is the rating of player j. c is a variable that describes the shape of the expected score curve (see the picture here). A higher value of c creates a steeper curve. The spread of scores in a game is affected by the scoring type: for SoS the spread is larger than for WTA, which in turn is larger than for PPSC.
The following values of c are used:
SoS – 0.0025
WTA – 0.0020
PPSC – 0.0015
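For anyone who wants to play with the numbers, here is a minimal Python sketch of the expected-score formula above. It is not the actual script; the function name `expected_scores` and the dictionary `C_BY_SCORING` are made up for illustration, and I am assuming the sum runs over every player in the game, including the player being rated.

```python
import math

# c values quoted above, keyed by scoring type.
C_BY_SCORING = {"SoS": 0.0025, "WTA": 0.0020, "PPSC": 0.0015}


def expected_scores(ratings, scoring="WTA"):
    """Return X for every player: exp(c*Rating) / sum_j exp(c*Rating_j)."""
    c = C_BY_SCORING[scoring]
    # Shifting every rating by the same amount leaves the ratios unchanged,
    # so subtract the highest rating to keep exp() well away from overflow.
    top = max(ratings)
    weights = [math.exp(c * (r - top)) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical table: one 1200-rated player against six 1000-rated players.
    for scoring in C_BY_SCORING:
        xs = expected_scores([1200] + [1000] * 6, scoring)
        print(scoring, [round(x, 3) for x in xs])
```

With seven equally rated players everyone’s expected score is 1/7. In the hypothetical WTA table above, the 1200-rated player’s expected score rises to roughly 0.20, and a larger c (SoS) pushes it higher still.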
E
E is a factor that adjusts based on a player’s experience. This is what allows your rating to fluctuate more quickly at the beginning and less later. The formula for E is:
E = 1+40/(GamesPlayed+10)
When you have played 0 games, E is 5 and it trends towards 1. The values of “40” and “10” can be changed however we want.
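To see how quickly E decays, here is a small Python sketch of the formula above. The function name `experience_factor` and its parameter names are hypothetical; the defaults are just the “40” and “10” from the formula, exposed so they are easy to tweak.

```python
def experience_factor(games_played, numerator=40.0, offset=10.0):
    """E = 1 + numerator / (games_played + offset)."""
    return 1.0 + numerator / (games_played + offset)


if __name__ == "__main__":
    for games in (0, 10, 30, 190):
        print(games, experience_factor(games))
```

So E starts at 5 with no games played, drops to 3 after 10 games, to 2 after 30 games, and is down to 1.2 after 190 games.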
V
V describes the value of the game and is calculated as
V = 50*A*P*R.
A
A is the adjustment per variant. The adjustments were chosen as follows:
Classic – 1
1v1 – Excluded (see note below)
All others (e.g. World, Modern, etc.) – 0.5
P
P is the adjustment for press. The following adjustments were chosen:
FP or Rulebook – 1
Public press – 0.5
Gunboat – 0.25
R
R is the adjustment for the number of provisional players and is given by the following equation:
R = 0.25+2*f
where f is the fraction of players in the game with a non-provisional rating.
We can discuss changing the structure of R as well as the number of games required to stop being provisional. I have chosen 7 for now. There’s no hard and fast way to choose when a provisional rating finishes but I think 7 is a nice number for Diplomacy because of the number of powers in a classic game.
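Putting the pieces together, here is a rough Python sketch of V and of the full delta = V*E*(S-X) update. It is only an illustration, not the real script: the names `game_value`, `rating_delta`, `VARIANT_ADJ` and `PRESS_ADJ` are made up, the press labels are my own shorthand, and I am assuming a player counts as non-provisional once they have played 7 or more games.

```python
# Adjustment tables from the post. Any ranked variant other than Classic gets 0.5.
VARIANT_ADJ = {"Classic": 1.0}
PRESS_ADJ = {"FP": 1.0, "Rulebook": 1.0, "Public": 0.5, "Gunboat": 0.25}


def game_value(variant, press, games_played_by_each, provisional_games=7):
    """V = 50*A*P*R, with R = 0.25 + 2*f."""
    if variant == "1v1":
        raise ValueError("1v1 games are excluded from GR2")
    a = VARIANT_ADJ.get(variant, 0.5)
    p = PRESS_ADJ[press]
    # Assumption: a player is non-provisional once they reach the threshold.
    established = sum(1 for g in games_played_by_each if g >= provisional_games)
    f = established / len(games_played_by_each)
    r = 0.25 + 2.0 * f
    return 50.0 * a * p * r


def rating_delta(v, games_played, score, expected):
    """delta = V*E*(S-X), with E = 1 + 40/(GamesPlayed + 10)."""
    e = 1.0 + 40.0 / (games_played + 10.0)
    return v * e * (score - expected)


if __name__ == "__main__":
    # Hypothetical game: Classic, full press, seven players with 20 games each.
    v = game_value("Classic", "FP", [20] * 7)
    # Equal ratings give X = 1/7; a solo gives S = 1.
    print(v, rating_delta(v, 20, 1.0, 1 / 7))
```

In that hypothetical game V comes out at 112.5 and the soloist gains about 225 points. The same result in a gunboat game would be worth a quarter of that, since P drops from 1 to 0.25.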
Other comments:
-GR2 excludes any games finished in 1903 or earlier. I believe GR1 meant to do this but accidentally set the cutoff at turn 3 (i.e. S02) instead.
-The game cutoff is just whenever the database was mined by our wonderful admins. That means the last games are from February 9th, not the 1st.
-Inactivity is defined as 90 days of inactivity, rather than the 3 months used by GR.
-Unranked games are ignored, and banned and provisional players are removed from the list.
-1v1 games are all currently listed as unranked so I’m not sure how people want to handle 1v1 games. I suggest we remove the requirement for them to be unranked so they can be included in GB and overall ranks.