Ghost Rating V2

Use this forum to discuss Diplomacy strategy.
Forum rules
This forum is limited to topics relating to the game Diplomacy only. Other posts or topics will be relocated to the correct forum category or deleted. Please be respectful and follow our normal site rules at http://www.webdiplomacy.net/rules.php.
chluke
Posts: 63
Joined: Sun Dec 31, 2017 12:10 am
Karma: 7

Re: Ghost Rating V2

#21 Post by chluke » Tue Feb 13, 2018 4:02 am

Thanks for the work, Yonni.

Man, did I drop like a rock in GRv2 "Classic" games (#51) vs GRv1 Classic (#4)! And it's not just from inactive players. For example, I'm 94(!) spots ahead of "serginss" in GRv1 Classic, yet he is ahead of me in GRv2 Classic!?! :(

I'm going to take a wild guess and say that you're counting live games in "Classic", whereas GRv1's Classic covers only non-live games. (I've bombed many live games where I couldn't get my timeclock right, and I play even more terribly when I only have 3 minutes per turn.)

Any chance you can break out "live" games, which are a totally different animal (for me at least)?

Yonni
Posts: 126
Joined: Thu Oct 19, 2017 6:55 pm
Karma: 40

Re: Ghost Rating V2

#22 Post by Yonni » Tue Feb 13, 2018 6:16 am

@ Chluke

The classic rating excludes everything with phases under 12 hours. I'm not sure what GR1 uses as a cutoff, or whether that's the cause of the discrepancy. I can take a look to make sure I don't have any fuckups.

@ ChippeRock

There are definitely good ways to test the strength of a scoring system. However, I don't have the inclination to follow this up with a rigorous statistical analysis of the different systems. The code and data are all available if you want to have a go at it, and I can try to answer any questions you have about what I wrote.

ghug
Admin
Posts: 996
Joined: Mon Mar 20, 2017 3:51 pm
Location: Seattle
Karma: 323

Re: Ghost Rating V2

#23 Post by ghug » Tue Feb 13, 2018 7:33 am

@Yonni, I think GR defines live as an hour or less. There are so few games in that time frame that it doesn't really make a difference.

@Chippe, that's the way to test, but obviously it'll be some work to do. It should be noted, though, that there are only two systems, not three, which makes things slightly simpler.

@CCR, that's not exactly right. The two systems probably won't differ too much in the example you give. It's when he loses both games that the differences are stark: GR would punish him the same for both of them, whereas Yonni's would ding him less for the loss against tougher opponents.

Of course, VI never loses, so it's a moot point. #VIVACASCADIA

VillageIdiot
Posts: 177
Joined: Sun Dec 31, 2017 3:55 am
Karma: 92

Re: Ghost Rating V2

#24 Post by VillageIdiot » Tue Feb 13, 2018 9:39 am

”Your rating fluctuates more quickly in your first few games. Unfortunately, people don’t play a ton of Diplomacy games. That means that it takes quite a while for people to reach their “true rating.” GR2 allows for people to rise or drop to their true rating more quickly.”

How quick are we talking here? I ask because this was a pain point over at PlayDip for a while when they adopted a similar approach. It's since been modified, but for a few years it was set up in such a way that players could, on the strength of a string of 'beginner's luck' or whatever, shoot up to the top 10 in their first ten games, then lose their momentum or get a bit better known and suddenly shoot back down. Meanwhile, seasoned players faced the frustration of being constantly nudged up and down by all these overnight-sensation players who, over time, turned out to be nowhere near as good as their temporary score had led people to believe.

The other shift in culture you're going to see, should you adopt this approach, is this: if it is more rewarding for better players to get 4WDs against other strong players than solos against average joes, then you're inadvertently encouraging good players to stick to games with each other, frustrating the less established players looking to get into the 'big' games.

I'm not trying to discourage this at all. Having played on both sites, there are pros and cons I enjoy in both systems, and I actually enjoyed getting to play both sides. PlayDip's system (similar to this one) encourages more solo-seeking, aggressive games, yet it can be frustrating how much damage one fluke bad game (sometimes due to surrenders out of your control) against a low-ranked player can do to a rating. WebDip, meanwhile, allows for a gradual growth you can feel confident in, yet its lack of reward for the risky "best outcome possible", and its lack of a penalty for underperforming (accepting a 5WD when your stats show you should be able to consistently achieve a 3WD), does lead to a more conservative style of play.

There’s a sweet spot in the middle here I’m sure, if you’ve hit that spot I’m all in!

Yonni
Posts: 126
Joined: Thu Oct 19, 2017 6:55 pm
Karma: 40

Re: Ghost Rating V2

#25 Post by Yonni » Tue Feb 13, 2018 3:55 pm

@ CCR

Let's take a look at three concrete examples in WTA games.
(1) Average (R=1000) against 6 opponents with an average rating in the top 10 of classic (R = 1500).
(2) Average (R=1000) against 6 average opponents (R=1000)
(3) An average top 10 player (R=1500) against 6 average opponents (R=1000)

A 4-way draw in example (1), a 3-way draw in example (2), and a 2-way draw in example (3) are all about equal. However, even a 2-way draw in example (1) is not equal to a solo in example (3). This might actually point to a need to have a bigger spread in ratings. At a "top player" rating of 1950, the 2-way and solo become about equal. Though I'm sure there are varying opinions about whether or not that should be the case.

@VI

I'd love to comment more on the PlayDip rating system but, unfortunately, they don't publish it. Having a wealth of data from the PlayDip website and a unified rating system would be great for the hobby, but I don't believe that option is on the table. (I'm not one to pick at old wounds re: PlayDip, and I enjoyed playing there for the ODC, but I am a bit annoyed by this one.)

But, to get to the things we can control...

If you look at the math on the first page, there are two variables that are affected by a player's 'newness': E and R.

E is the bigger one and is calculated as E = 1 + 40/(GamesPlayed + 10). For a new player with 0 games played, E = 5. At 10 games, it goes down to 3. At 50 games, it goes down to about 1.7. I think it's not an unreasonable scale and shouldn't run into the issues you're describing.
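To make the decay concrete, here is a minimal sketch of the E formula quoted above, written in Python rather than the Matlab mentioned later in the thread. The function name experience_factor is hypothetical; only the formula itself comes from the post.

```python
def experience_factor(games_played: int) -> float:
    """E = 1 + 40 / (GamesPlayed + 10), as stated in the post.

    experience_factor is a hypothetical name used for illustration only.
    """
    if games_played < 0:
        raise ValueError("games_played must be non-negative")
    return 1 + 40 / (games_played + 10)


# Print E after 0, 10, 50 and 100 games (rounded to two decimals).
for n in (0, 10, 50, 100):
    print(n, round(experience_factor(n), 2))
```

E falls from 5 toward 1 as games accumulate, so a new player's rating moves several times faster than a veteran's. It roughly halves by the tenth game but never quite reaches 1.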

R actually works the other way a bit. R = 0.25 + 2*f, where f is the number of provisional players in the game. f will be at most 6/7 for new provisional players, but will often be lower because they're also playing other new people.
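The R term can be sketched the same way. The post doesn't say exactly how f is counted. It is described as the number of provisional players, but the 6/7 upper bound suggests a fraction of the seven seats. So this hypothetical helper simply takes f as an input and checks that it lies in [0, 1].

```python
def newness_term(f: float) -> float:
    """R = 0.25 + 2*f, as stated in the post.

    newness_term is a hypothetical name. f is passed in directly, because
    how it is derived from the players in a game is an assumption here.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f is assumed to be a fraction between 0 and 1")
    return 0.25 + 2 * f


# Upper bound mentioned in the post (f = 6/7) versus f = 0.
print(round(newness_term(6 / 7), 3))  # roughly 1.964
print(newness_term(0.0))
```

With f at its stated maximum of 6/7, R is close to 2. It drops toward 0.25 as f shrinks, for example when new players face mostly other new players.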

ChippeRock
Posts: 229
Joined: Thu Oct 19, 2017 5:36 pm
Karma: 66

Re: Ghost Rating V2

#26 Post by ChippeRock » Tue Feb 13, 2018 6:00 pm

Yonni wrote:
Tue Feb 13, 2018 6:16 am
@ ChippeRock

There are definitely good ways to test the strength of a scoring system. However, I don't have the inclination to follow this up with a rigorous statistical analysis of the different systems. The code and data is all available if you want to have a go at it - I can try answer any questions you have about what I wrote.
I've got the table with the Actual Scores, but not the Expected Scores for those games. Please post a spreadsheet with the player's Expected Score for all of the games.

ChippeRock
Posts: 229
Joined: Thu Oct 19, 2017 5:36 pm
Karma: 66

Re: Ghost Rating V2

#27 Post by ChippeRock » Tue Feb 13, 2018 6:05 pm

Additionally, I don't recall there being data on the Expected Scores for GR v1.

ghug
Admin
Posts: 996
Joined: Mon Mar 20, 2017 3:51 pm
Location: Seattle
Karma: 323

Re: Ghost Rating V2

#28 Post by ghug » Tue Feb 13, 2018 6:49 pm

GR has expected scores, but they're buried deep in the code. Someone inclined to mess with it could get them out, but that probably won't be me for a while.

Yonni
Posts: 126
Joined: Thu Oct 19, 2017 6:55 pm
Karma: 40

Re: Ghost Rating V2

#29 Post by Yonni » Tue Feb 13, 2018 7:53 pm

ChippeRock wrote:
Tue Feb 13, 2018 6:00 pm
I've got the table with the Actual Scores, but not the Expected Scores for those games. Please post a spreadsheet with the player's Expected Score for all of the games.
That doesn't exist right now. I've linked the Matlab code that calculates the expected score for each game. It will take some editing to make it output a table of expected scores for easy comparison. Let me know if you have any questions about it. Maybe I'll get to it at some point in the future, but it's going on the back burner because life is getting a bit busy these days.
