webdiplomacy.net

Posted: **Mon Feb 12, 2018 3:33 pm**

Not to bury the lede, here are the GR2 ratings for February. Note that inactive players are filterable using the column on the far right.

Intro

As most of you know, Ghost Rating is currently the (official yet unintegrated) rating system we use at WebDiplomacy. We have had lots of discussion over the years about how the rating system could be changed or improved. I’ve made a script to calculate what I believe is a better rating system. My hope is for us to move away from the old rating system and towards the new one, but I don’t want to be presumptuous. I look forward to hearing everyone’s feedback and discussion.

What’s in a name?

The new rating system is based on EIDRaS (ELO Inspired Diplomacy Rating System). EIDRaS is a terrible name and, because we may tweak some things, I think it is appropriate for us to choose a new one.
I propose we simply call it Ghost Rating V2 since Ghost Rating is a fucking badass name and it still honours The Ghost Maker who first implemented a rating system for us. I'm happy to find a different one if people would prefer.

What was Ghost Rating V1?

As a quick refresher, the way (the old) GR works is that everyone bets an equal fraction of their rating and that pot is divvied up among the winners according to the scoring system (e.g. winner takes all, etc.).
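
To make the pot mechanic concrete, here is a rough Python sketch. The 10% `bet_fraction` is a made-up value for illustration (this post doesn't state the fraction GR1 actually uses), and `gr1_update` is a hypothetical helper name, not the real GR1 code.

```python
def gr1_update(ratings, shares, bet_fraction=0.1):
    """Pot-style update as described above.

    ratings      -- each player's rating before the game
    shares       -- each player's share of the pot under the scoring
                    system (should sum to 1)
    bet_fraction -- hypothetical stake size, NOT GR1's real value
    """
    stakes = [r * bet_fraction for r in ratings]
    pot = sum(stakes)
    return [r - stake + pot * share
            for r, stake, share in zip(ratings, stakes, shares)]


# Seven players rated 100 each; player 0 solos in a WTA game.
before = [100.0] * 7
after = gr1_update(before, [1, 0, 0, 0, 0, 0, 0])
print(after)
```

Total rating is conserved in this sketch: whatever the winners gain is exactly what the table staked.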

What is Ghost Rating V2?

In GR2, a player’s expected result is calculated based on their own rating and the rating of their opponents. After the game, everyone’s rating is adjusted based on the difference between their actual score and their expected score.

Some of the important properties of the system that differentiate it from GR1 are:

1. Winning against stronger players is more beneficial than winning against weaker players

2. Losing against stronger players is less detrimental than losing against weaker players

3. Playing against new players on the site affects your rating less. New players are often much better or much worse than the initial rating they start with. You shouldn’t be penalized as much for losing to some pro who just joined the site but doesn’t have a high rating yet.

4. Your rating fluctuates more quickly in your first few games. Unfortunately, people don’t play a ton of Diplomacy games. That means that it takes quite a while for people to reach their “true rating.” GR2 allows for people to rise or drop to their true rating more quickly. This also alleviates some of the issues addressed in point (3).

The Math

Feel free to stop reading here if you’re uninterested but I’m going to lay out some of the math and point to things that can be tweaked depending on what people here want.

After each game a player’s rating changes by delta, where:

delta = V*E*(S-X)

S

S is simply the player’s score, i.e. their share of the pot. E.g., for a solo in WTA it’s 1, for a draw in WTA it’s 1/drawSize, etc.

X

X is the player’s expected score given by:

X = exp(c*Rating) / Σ_j exp(c*Rating_j)

where Rating is the player’s rating and Rating_j is the rating of player j. c is a variable that describes the shape of the expected score curve (see the picture here). A higher value of c creates a steeper curve. The spread of scores in a game is affected by the scoring type. For SoS, the spread is larger than for WTA, which is larger than for PPSC.

The following values of c are used:
SoS – 0.0025
WTA – 0.0020
PPSC – 0.0015
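
For anyone who prefers code to formulas, here is a minimal Python sketch of the expected score. It is not the actual script. The function name, the example ratings, and the assumption that the sum runs over every player at the table (including the player themselves) are mine.

```python
import math

# c per scoring system, as listed above.
C_BY_SCORING = {"SoS": 0.0025, "WTA": 0.0020, "PPSC": 0.0015}


def expected_scores(ratings, scoring="WTA"):
    """Return X for every player in one game.

    Assumes the sum in the denominator runs over all players in the
    game, including the player whose X is being computed.
    """
    c = C_BY_SCORING[scoring]
    # Shifting by the top rating avoids overflow in exp(); the shift
    # cancels out in the ratio, so X is unchanged.
    top = max(ratings)
    weights = [math.exp(c * (r - top)) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical table: one strong player and six equal opponents.
table = [1500, 1000, 1000, 1000, 1000, 1000, 1000]
for scoring in ("PPSC", "WTA", "SoS"):
    print(scoring, round(expected_scores(table, scoring)[0], 3))
```

The expected scores at a table always sum to 1, and the strong player's expected share grows as c increases from PPSC to WTA to SoS.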

E

E is a factor that adjusts based on a player’s experience. This is what allows your rating to fluctuate more quickly at the beginning and less later. The formula for E is:

E = 1+40/(GamesPlayed+10);

When you have played 0 games, E is 5 and it trends towards 1. The values of “40” and “10” can be changed however we want.
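
A quick Python sketch of how E decays with experience. The parameter names `numerator` and `offset` are my own labels for the "40" and "10".

```python
def experience_factor(games_played, numerator=40, offset=10):
    """E = 1 + 40 / (GamesPlayed + 10), with both constants tweakable."""
    return 1 + numerator / (games_played + offset)


for games in (0, 10, 30, 90):
    print(games, experience_factor(games))
```

With the default constants this gives 5 at 0 games, 3 at 10 games, 2 at 30 games and 1.4 at 90 games, so a brand-new player's rating moves five times as fast as it eventually will.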

V

V describes the value of the game and is calculated as

V = 50*A*P*R.

A

A is the adjustment per variant. The adjustments were chosen as follows

Classic – 1
1v1 – Excluded (see note below)
All others (e.g. World, Modern, etc.) – 0.5

P

P is the adjustment for press. The following adjustments were chosen:

FP or Rulebook – 1
Public press – 0.5
Gunboat – 0.25

R

R is the adjustment for the number of provisional players and is given by the following equation:

R = 0.25+2*f

; where f is the fraction of players in the game with a non-provisional rating.

We can discuss changing the structure of R as well as the number of games required to stop being provisional. I have chosen 7 for now. There’s no hard and fast way to choose when a provisional rating finishes but I think 7 is a nice number for Diplomacy because of the number of powers in a classic game.
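
Putting A, P and R together, a sketch of V in Python might look like this. The string labels ("Classic", "FP", "Public", and so on) and the function name are placeholders of mine, and I'm assuming a player counts as non-provisional once they have finished 7 games.

```python
PROVISIONAL_GAMES = 7  # games needed to stop being provisional (chosen above)

# Press adjustment P; the keys are placeholder labels.
PRESS_P = {"FP": 1.0, "Rulebook": 1.0, "Public": 0.5, "Gunboat": 0.25}


def game_value(variant, press, games_played):
    """V = 50*A*P*R for one game.

    games_played -- games already finished by each player at the table.
    Assumes a player is non-provisional once they have finished
    PROVISIONAL_GAMES games.
    """
    if variant == "1v1":
        raise ValueError("1v1 games are excluded from GR2")
    a = 1.0 if variant == "Classic" else 0.5
    p = PRESS_P[press]
    established = sum(1 for g in games_played if g >= PROVISIONAL_GAMES)
    f = established / len(games_played)
    r = 0.25 + 2 * f
    return 50 * a * p * r


print(game_value("Classic", "FP", [20] * 7))      # everyone established
print(game_value("Classic", "Gunboat", [0] * 7))  # everyone provisional
```

A classic full-press game between established players comes out at 112.5, while a classic gunboat game between seven brand-new players is worth only 3.125.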

Other comments:

-GR2 excludes any games finished in 1903 or earlier. I believe GR1 meant to do this but accidentally set the cutoff at turn 3 (i.e. S02) instead.

-The game cutoff is just whenever the database was mined by our wonderful admins. That means the last games are from February 9th, not the 1st.

-Inactivity is defined as 90 days without finishing a game, rather than the 3 months used by GR.

-Unranked games are ignored, and banned and provisional players are removed from the list.

-1v1 games are all currently listed as unranked so I’m not sure how people want to handle 1v1 games. I suggest we remove the requirement for them to be unranked so they can be included in GB and overall ranks.

Posted: **Mon Feb 12, 2018 3:43 pm**

Octavilous!

Posted: **Mon Feb 12, 2018 3:57 pm**

I like this one way better than the old one.

Ghost Rating is dead! Long live Ghost Rating V2!

Posted: **Mon Feb 12, 2018 3:59 pm**

Thanks, Yonni!

Did you change the math at all?

Last month I was one of the biggest underperformers relative to GR, whereas this month my rankings in FP and classic are almost identical to GR. I soloed a game last month against some not exceptional competition, which didn't shift my GR rankings at all, which I feel is probably more accurate than the considerable upswing I got here.

You can always ignore the unrankedness for the specific 1v1 variant ID if you really want to include them.

Posted: **Mon Feb 12, 2018 4:35 pm**

Thanks for putting this together. Is inactivity based on the date a player's last game finished? It would be really cool if there was a way to see how each game affected your rating.

Posted: **Mon Feb 12, 2018 4:38 pm**

ghug wrote: ↑
Mon Feb 12, 2018 3:59 pm
Did you change the math at all?

Yeah, it's been tweaked quite a bit from the last one I posted. I also noticed some errors in the code since last time, so just ignore that the old one ever existed.

ghug wrote: ↑
Mon Feb 12, 2018 3:59 pm
You can always ignore the unrankedness for the specific 1v1 variant ID if you really want to include them.

I might do that but am worried it might rub some people the wrong way if they thought it was not going to affect their GR. I do think, however, that Xorxes' dominance of 1v1 should show up somewhere in the Overall and/or GB rankings.
I think the best solution is for the admins to allow ranked 1v1 games and we just include them going forwards.

Posted: **Mon Feb 12, 2018 4:41 pm**

Aereaux wrote: ↑
Mon Feb 12, 2018 4:35 pm
Thanks for putting this together. Is inactivity based on the date a player's last game finished?

Yup, inactivity is based on last game finished.

Aereaux wrote: ↑
Mon Feb 12, 2018 4:35 pm
It would be really cool if there was a way to see how each game affected your rating.

I'd love to do that. We can make fun graphs and everything. However, keeping longitudinal track of everyone's rank is massively memory intensive using Matlab and my shitty programming skills. I might try to look into this in the future but not yet.

Posted: **Mon Feb 12, 2018 4:42 pm**

The reason 1v1s are unranked is that they become easy D-Point farms otherwise - grab a friend who has 100 points, play a 100 point game with them, have them concede, win 100 points, your friend gets refunded back up to 100 points, rinse & repeat.

Posted: **Mon Feb 12, 2018 6:15 pm**

Yeah, I don't think 1v1s are ever going to be ranked. It's such a different game from Diplomacy that I'm not sure it would even make sense to fold them into one rating (though I say the same about gunboat, so who knows).

As far as the math goes, adjusting C for the different rating systems is an interesting way to do it. I'm having trouble conceptualizing the actual result of that. I think making it double WTA's for SoS makes some intuitive sense. The issue with PPSC is still that a player infinitely better than her opponents can still only be expected to win ~50% of the pot, whereas this would expect 100%.
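
To illustrate the PPSC point, here's a small Python sketch of the expected-score formula for one player rated `gap` points above six equal opponents, using the PPSC value of c. The function name and the gaps are just illustrative.

```python
import math


def expected_score_vs_equal_field(gap, c, opponents=6):
    """X for a player rated `gap` points above `opponents` equal rivals."""
    # exp(c*R) / (exp(c*R) + n*exp(c*(R - gap))) simplifies to this:
    return 1 / (1 + opponents * math.exp(-c * gap))


for gap in (0, 500, 1000, 2000, 5000):
    print(gap, round(expected_score_vs_equal_field(gap, c=0.0015), 3))
```

The expected share starts at 1/7 and keeps climbing towards 1 as the gap grows, with nothing capping it at around half the pot.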

I'd be in favor of a lower number than 40 in the E term. It seems like once you're past 20 games or so, everything should be worth pretty much the same, but that's more a matter of preference.

Posted: **Mon Feb 12, 2018 6:28 pm**

ghug wrote: ↑
Mon Feb 12, 2018 6:15 pm
As far as the math goes, adjusting C for the different rating systems is an interesting way to do it. I'm having trouble conceptualizing the actual result of that.

Here's a graph of expected score (against average opponents) vs. rating for different values of c:
https://imgur.com/a/mPrgk

Posted: **Mon Feb 12, 2018 7:21 pm**

Would you mind publishing the script?

These are correct changes and I'm surprised the original was so weak (to be frank). Good shit.

Posted: **Mon Feb 12, 2018 7:37 pm**

Condescension wrote: ↑
Mon Feb 12, 2018 7:21 pm
Would you mind publishing the script?

https://drive.google.com/drive/folders/ ... sp=sharing

Posted: **Mon Feb 12, 2018 8:49 pm**

On balance I think I approve of these changes...

Posted: **Mon Feb 12, 2018 11:12 pm**

Had to filter me out to knock me out, eh?

I like this, puts me back as the best. I approve as well!

Posted: **Mon Feb 12, 2018 11:20 pm**

TheWiz has you beat in classic games...

Posted: **Mon Feb 12, 2018 11:28 pm**

Filtering out these quasi-retired players makes a lot of sense. I've no idea who most of these inactive players are. Granted, back in the day VillageIdiot was not without some talent, but who on earth is this zultar character?

Posted: **Tue Feb 13, 2018 12:03 am**

Yonni wrote: ↑
Mon Feb 12, 2018 11:20 pm
TheWiz has you beat in classic games...

Sounds like you’ve still got some bugs in your algorithm, but I trust you’ll iron them out.

Posted: **Tue Feb 13, 2018 12:14 am**

What is the average difference between the Actual Score & Expected Score for all of the players and all of the games for EIDRaS, Ghost Rating v1, and Ghost Rating v2? I want the absolute difference between Actual Score & Expected Score for each of the players per game (e.g., |Actual Score - Expected Score| = |Expected Score - Actual Score|).

That's the only way we'll know which one is better since the point of Ghost Rating is to get the Actual Score to equal the Expected Score every single game (of course this is impossible, but you're trying to get those two values to be as close to each other as possible).
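
In code, the number I'm asking for would be something like the Python sketch below. The function name and data layout are hypothetical, and it only covers the averaging step. Producing the expected scores for each rating system is up to whatever script generates the ratings.

```python
def mean_absolute_error(games):
    """Average |Actual Score - Expected Score| over all players and games.

    games -- iterable of (actual_scores, expected_scores) pairs,
             one pair of equal-length lists per game.
    """
    total = 0.0
    count = 0
    for actual, expected in games:
        for s, x in zip(actual, expected):
            total += abs(s - x)
            count += 1
    return total / count


# Toy data: one solo where every player was expected to score 1/7.
toy = [([1, 0, 0, 0, 0, 0, 0], [1 / 7] * 7)]
print(mean_absolute_error(toy))
```

The lower this number is across the whole game history, the better the system is at predicting results.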

Posted: **Tue Feb 13, 2018 1:43 am**

I also don't want to do the math, ChippeRock, but it can be put as simple as this example:

(Game 1) VI calls some village idiots to play a game, and he settles his way to an easy two way draw with his favourite cousin.

(Game 2) VI plays hard to find his place in a four way draw against six other high ranked players.

In normal GR, VI would gain more points from Game 1 than from game 2.

In GRV2, VI would gain more points from Game 2 than from game 1.

Posted: **Tue Feb 13, 2018 3:33 am**

CCR wrote: ↑
Tue Feb 13, 2018 1:43 am
I also don't want to do the math, ChippeRock, but it can be put as simple as this example:

(Game 1) VI calls some village idiots to play a game, and he settles his way to an easy two way draw with his favourite cousin.

(Game 2) VI plays hard to find his place in a four way draw against six other high ranked players.

In normal GR, VI would gain more points from Game 1 than from game 2.

In GRV2, VI would gain more points from Game 2 than from game 1.

I understand math. Of course the expected score would be higher in the first game. What I'm asking for is the average absolute difference between the Expected Score and Actual Score for players per game for the 3 different rating systems.

However he won't necessarily gain more points in Game 2 than Game 1. Let Yonni get me the numbers and we'll see how good the rating systems are at predicting the result of a game.
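
Here is a rough Python sketch of why it isn't guaranteed, using delta = V*E*(S-X) from the first post. All the ratings are made up, V is held at a hypothetical 50 for both games, and I'm assuming WTA scoring where a draw splits the pot equally among the players in it.

```python
import math


def gr2_delta(my_rating, opp_ratings, score, c=0.0020, value=50.0,
              games_played=50):
    """delta = V*E*(S-X) for one player (all inputs here are hypothetical)."""
    mine = math.exp(c * my_rating)
    total = mine + sum(math.exp(c * r) for r in opp_ratings)
    x = mine / total                   # expected score X
    e = 1 + 40 / (games_played + 10)   # experience factor E
    return value * e * (score - x)


vi = 1500
game1 = gr2_delta(vi, [1000] * 6, score=1 / 2)   # two-way draw, weak table
game2a = gr2_delta(vi, [2000] * 6, score=1 / 4)  # four-way draw, very strong table
game2b = gr2_delta(vi, [1800] * 6, score=1 / 4)  # four-way draw, strong table
print(game1, game2a, game2b)
```

With these numbers the four-way draw pays more than the two-way draw only against the 2000-rated table. Against the 1800-rated table it pays less, so the outcome depends on the ratings involved.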

Ghost Rating V2
