Ghost Rating V2


Re: Ghost Rating V2

by Yonni » Tue Feb 13, 2018 7:53 pm

ChippeRock wrote (Tue Feb 13, 2018 6:00 pm):
"I've got the table with the Actual Scores, but not the Expected Scores for those games. Please post a spreadsheet with the players' Expected Scores for all of the games."

That doesn't exist right now. I've linked the Matlab code that calculates the expected score for each game. It will take some editing to make it output a table of expected scores for easy comparison. Let me know if you have any questions about it. Maybe I'll get to it at some point in the future, but it's going to be put on the back burner because life is getting a bit busy these days.

Re: Ghost Rating V2

by ghug » Tue Feb 13, 2018 6:49 pm

GR has expected scores, but they're buried deep in the code. Someone inclined to mess with it could get them out, but that probably won't be me for a while.

Re: Ghost Rating V2

by ChippeRock » Tue Feb 13, 2018 6:05 pm

Additionally, I don't recall there being data on the Expected Scores for GR v1.

Re: Ghost Rating V2

by ChippeRock » Tue Feb 13, 2018 6:00 pm

Yonni wrote (Tue Feb 13, 2018 6:16 am):
"@ ChippeRock

There are definitely good ways to test the strength of a scoring system. However, I don't have the inclination to follow this up with a rigorous statistical analysis of the different systems. The code and data are all available if you want to have a go at it; I can try to answer any questions you have about what I wrote."

I've got the table with the Actual Scores, but not the Expected Scores for those games. Please post a spreadsheet with the players' Expected Scores for all of the games.

Re: Ghost Rating V2

by Yonni » Tue Feb 13, 2018 3:55 pm

@ CCR

Let's take a look at three concrete examples in WTA games:
(1) An average player (R = 1000) against 6 opponents rated at the classic top-10 average (R = 1500).
(2) An average player (R = 1000) against 6 average opponents (R = 1000).
(3) An average top-10 player (R = 1500) against 6 average opponents (R = 1000).

A 4-way draw in example (1), a 3-way draw in example (2), and a 2-way draw in example (3) all earn about the same rating gain. However, even a 2-way draw in example (1) still isn't worth as much as a solo in example (3). This might actually point to a need for a bigger spread in ratings: at a "top player" rating of 1950, the 2-way draw and the solo become about equal. Though I'm sure there are varying opinions about whether or not that should be the case.
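For reference, here is a minimal Python sketch of how winner-takes-all (WTA) scores are usually assigned: a solo scores 1, each member of an n-way draw scores 1/n, and everyone else scores 0. The function names and outcome labels are hypothetical, and this is not Yonni's Matlab code; it only assumes, as in Elo-style systems, that the rating change grows with actual minus expected score.

```python
from fractions import Fraction


def wta_score(outcome: str, draw_size: int = 1) -> Fraction:
    """Winner-takes-all score for one player in one game.

    outcome is one of the hypothetical labels "solo", "draw" or "loss";
    draw_size is the number of players sharing the draw.
    """
    if outcome == "solo":
        return Fraction(1)
    if outcome == "draw":
        if not 2 <= draw_size <= 7:
            raise ValueError("a draw needs between 2 and 7 players")
        return Fraction(1, draw_size)
    if outcome == "loss":
        return Fraction(0)
    raise ValueError(f"unknown outcome: {outcome!r}")


def surprise(actual: Fraction, expected: float) -> float:
    """Actual minus expected score (the quantity an Elo-style update scales with)."""
    return float(actual) - expected


# Actual WTA scores for the results compared in the post above.
for label, score in [
    ("4-way draw", wta_score("draw", 4)),
    ("3-way draw", wta_score("draw", 3)),
    ("2-way draw", wta_score("draw", 2)),
    ("solo", wta_score("solo")),
]:
    print(label, score)
```

Under this scoring a 4-way draw is worth 1/4, a 3-way draw 1/3 and a 2-way draw 1/2. If the update were simply proportional to actual minus expected, those three results could only come out about equal if the expected scores in examples (1) to (3) differed by about as much as the actual scores do.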

@VI

I'd love to comment more on the PlayDip rating system but, unfortunately, they don't publish it. Having a wealth of data from the PlayDip website and a unified rating system would be great for the hobby, but I don't believe that option is on the table. (I'm not one to pick at old wounds re: PlayDip, and I enjoyed playing there for the ODC, but I am a bit annoyed by this one.)

But, to get to the things we can control...

If you look at the math on the first page, there are two variables that are affected by a player's 'newness': E and R.

E is the bigger one and is calculated as E = 1 + 40/(GamesPlayed + 10). For a new player with 0 games played, E = 5. At 10 games, it goes down to 3, and at 50 games it goes down to about 1.7. I think that's not an unreasonable scale, and it shouldn't run into the issues you're describing.

R actually works the other way a bit. R = 0.25 + 2*f, where f is the fraction of provisional players in the game. f will be at most 6/7 for new provisional players but will often be lower because they're playing other new people too.
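A quick way to see how fast these factors shrink is to evaluate them directly. The Python sketch below just transcribes the two formulas as quoted above; the function names are hypothetical, f is bounded between 0 and 1 because its exact definition lives in the first-page math, and how E and R are then combined into the rating update is not shown.

```python
def newness_factor_e(games_played: int) -> float:
    """E = 1 + 40 / (GamesPlayed + 10), transcribed from the post."""
    if games_played < 0:
        raise ValueError("games_played must be non-negative")
    return 1 + 40 / (games_played + 10)


def provisional_factor_r(f: float) -> float:
    """R = 0.25 + 2 * f; f is the provisional fraction from the post (assumed 0..1)."""
    if not 0 <= f <= 1:
        raise ValueError("f must be a fraction between 0 and 1")
    return 0.25 + 2 * f


# How E decays with experience, and R at the 6/7 ceiling mentioned above.
for n in (0, 10, 50, 200):
    print(f"{n:>3} games: E = {newness_factor_e(n):.2f}")
print(f"f = 6/7: R = {provisional_factor_r(6 / 7):.2f}")
```

Running it prints E = 5.00, 3.00, 1.67 and 1.19 for 0, 10, 50 and 200 games, and R = 1.96 at f = 6/7. If E scales the rating change, as the reply implies, it approaches 1 only slowly, so a provisional player's swings stay noticeably larger than a veteran's for several dozen games.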

Re: Ghost Rating V2

by VillageIdiot » Tue Feb 13, 2018 9:39 am

“Your rating fluctuates more quickly in your first few games. Unfortunately, people don’t play a ton of Diplomacy games. That means that it takes quite a while for people to reach their ‘true rating.’ GR2 allows for people to rise or drop to their true rating more quickly.”

How quick are we talking here? I ask because this was a pain point over at PlayDip for a while when they adopted a similar approach. It’s since been modified, but for a few years it was set up in such a way that players could, on the back of a string of ‘beginner’s luck’ or whatever, shoot up into the top 10 within their first ten games, then lose their momentum or become better known and suddenly shoot back down. Meanwhile, seasoned players faced the frustration of being constantly nudged up and down by all these overnight-sensation players, who over time turned out to be nowhere near as good as their temporary scores had led people to believe.

The other cultural shift you’re going to see, should you adopt this approach, is this: if it is more rewarding for better players to get 4WDs against other strong players than solos over average Joes, then you’re inadvertently encouraging good players to stick to games with each other, frustrating the less established players looking to get into the ‘big’ games.

I’m not trying to discourage this at all. Having played on both sites, I see pros and cons in both systems and actually enjoyed getting to play both sides. PlayDip’s system (similar to this one) encourages more solo-seeking, aggressive games, yet it can be frustrating how much damage one fluke bad game (sometimes due to surrenders out of your control) against a low-ranked player can do to your rating. WebDip, by contrast, allows for gradual growth you can feel confident in, yet its lack of a frisky reward for the “best outcome possible,” or of any penalty for underperforming (accepting a 5WD when your stats show you should consistently be able to achieve a 3WD), does lead to a more conservative style of gameplay.

There’s a sweet spot in the middle here, I’m sure. If you’ve hit that spot, I’m all in!

Re: Ghost Rating V2

by ghug » Tue Feb 13, 2018 7:33 am

@Yonni, I think GR defines live as an hour or less. There are so few games in that time frame that it doesn't really make a difference.

@Chippe, that's the way to test, but obviously it'll be some work to do. It should be noted, though, that there are only two systems, not three, which makes things slightly simpler.

@CCR, that's not exactly right. The two systems probably won't differ too much in the example you give. It's when he loses both games that the differences are stark: GR would punish him the same for both of them, whereas Yonni's system would ding him less for the loss against tougher opponents.

Of course, VI never loses, so it's a moot point. #VIVACASCADIA

Re: Ghost Rating V2

by Yonni » Tue Feb 13, 2018 6:16 am

@ Chluke

The classic rating excludes everything under 12 hours. I'm not sure what GR1 uses as a cutoff, but I'm also not sure that's the cause of the discrepancy. I can take a look to make sure I don't have any fuckups.
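The two cutoffs mentioned in this thread can be written down as simple filters. This Python sketch is hypothetical: it assumes each game is summarized by its phase length, uses the 12-hour minimum for GRv2 classic stated above, and takes ghug's recollection (not confirmed) that GR1 treats games of an hour or less as live. It shows that games with phases between those two limits count as non-live in GR1 but are left out of GRv2 classic.

```python
from datetime import timedelta

# Cutoffs as described in the thread; the constant names are hypothetical.
GRV2_CLASSIC_MIN_PHASE = timedelta(hours=12)  # GRv2 classic excludes < 12 h
GR1_LIVE_MAX_PHASE = timedelta(hours=1)       # ghug's recollection, unconfirmed


def counts_in_grv2_classic(phase: timedelta) -> bool:
    """True if a game with this phase length would be kept in GRv2 classic."""
    return phase >= GRV2_CLASSIC_MIN_PHASE


def counts_in_gr1_classic(phase: timedelta) -> bool:
    """True if GR1 would treat the game as non-live (longer than the live cutoff)."""
    return phase > GR1_LIVE_MAX_PHASE


# A live game, a 6-hour game and a 1-day game.
for phase in (timedelta(minutes=5), timedelta(hours=6), timedelta(days=1)):
    print(phase, counts_in_gr1_classic(phase), counts_in_grv2_classic(phase))
```

The 6-hour game is the one the two filters disagree on.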

@ ChippeRock

There are definitely good ways to test the strength of a scoring system. However, I don't have the inclination to follow this up with a rigorous statistical analysis of the different systems. The code and data are all available if you want to have a go at it; I can try to answer any questions you have about what I wrote.

Re: Ghost Rating V2

by chluke » Tue Feb 13, 2018 4:02 am

Thanks for the work, Yonni.

Man, did I drop like a rock in GRv2 "Classic" games (#51) vs. GRv1 Classic (#4)! And it's not just from inactive players. For example, I'm 94(!) spots ahead of "serginss" in GRv1 Classic, yet he is ahead of me in GRv2 Classic!?! :(

I'm going to take a wild guess and say that you're counting live games in "Classic", whereas GRv1 has classic non-live (I've bombed many live games where I couldn't get my timeclock right, and I play even more terribly when I only have 3 minutes per turn.)

Any chance you can break out "live" games? They're a totally different animal (for me at least).

Re: Ghost Rating V2

by ChippeRock » Tue Feb 13, 2018 3:33 am

CCR wrote (Tue Feb 13, 2018 1:43 am):
"I also don't want to do the math, ChippeRock, but it can be put as simply as this example:

(Game 1) VI calls some village idiots to play a game, and he settles his way to an easy two-way draw with his favourite cousin.

(Game 2) VI plays hard to find his place in a four-way draw against six other high-ranked players.

In normal GR, VI would gain more points from Game 1 than from Game 2.

In GRV2, VI would gain more points from Game 2 than from Game 1."

I understand math. Of course the expected score would be higher in the first game. What I'm asking for is the average absolute difference between the Expected Score and Actual Score for players per game for the 3 different rating systems.

However, he won't necessarily gain more points in Game 2 than Game 1. Let Yonni get me the numbers and we'll see how good the rating systems are at predicting the result of a game.

Re: Ghost Rating V2

by CCR » Tue Feb 13, 2018 1:43 am

I also don't want to do the math, ChippeRock, but it can be put as simply as this example:

(Game 1) VI calls some village idiots to play a game, and he settles his way to an easy two-way draw with his favourite cousin.

(Game 2) VI plays hard to find his place in a four-way draw against six other high-ranked players.

In normal GR, VI would gain more points from Game 1 than from Game 2.

In GRV2, VI would gain more points from Game 2 than from Game 1.

Re: Ghost Rating V2

by ChippeRock » Tue Feb 13, 2018 12:14 am

What is the average difference between the Actual Score and Expected Score for all of the players and all of the games for EIDRaS, Ghost Rating v1, and Ghost Rating v2? I want the absolute difference between Actual Score and Expected Score for each player per game (i.e., |Actual Score - Expected Score| = |Expected Score - Actual Score|).

That's the only way we'll know which one is better, since the point of Ghost Rating is to get the Actual Score to equal the Expected Score every single game. (Of course this is impossible, but you're trying to get those two values as close to each other as possible.)
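The comparison being asked for is a mean absolute error over every (player, game) pair. Here is a minimal Python sketch, assuming the actual and expected scores for each rating system have already been extracted as pairs; the function names and the toy numbers are made up for illustration and are not results from EIDRaS or either version of GR.

```python
from statistics import mean


def mean_abs_error(pairs):
    """Average |actual - expected| over (actual, expected) score pairs."""
    diffs = [abs(actual - expected) for actual, expected in pairs]
    if not diffs:
        raise ValueError("need at least one (actual, expected) pair")
    return mean(diffs)


def rank_systems(scores_by_system):
    """Sort rating systems from lowest (best) to highest mean absolute error."""
    return sorted(scores_by_system, key=lambda name: mean_abs_error(scores_by_system[name]))


# Toy data only: a player who shared a 2-way draw and an eliminated player.
toy = {
    "system A": [(0.5, 0.30), (0.0, 0.10)],
    "system B": [(0.5, 0.45), (0.0, 0.05)],
}
print(rank_systems(toy))
```

Taking the absolute value is what keeps over- and under-predictions from cancelling each other out, which is why the request is for |Actual Score - Expected Score| rather than the signed difference.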

Re: Ghost Rating V2

by VillageIdiot » Tue Feb 13, 2018 12:03 am

Yonni wrote (Mon Feb 12, 2018 11:20 pm):
"TheWiz has you beat in classic games..."

Sounds like you’ve still got some bugs in your algorithm, but I trust you’ll iron them out.

Re: Ghost Rating V2

by Octavious » Mon Feb 12, 2018 11:28 pm

Filtering out these quasi-retired players makes a lot of sense. I've no idea who most of these inactive players are. Granted, back in the day VillageIdiot was not without some talent, but who on earth is this zultar character?

Re: Ghost Rating V2

by Yonni » Mon Feb 12, 2018 11:20 pm

TheWiz has you beat in classic games...

Re: Ghost Rating V2

by VillageIdiot » Mon Feb 12, 2018 11:12 pm

Had to filter me out to knock me out, eh?

I like this; it puts me back as the best. I approve as well!

Re: Ghost Rating V2

by Octavious » Mon Feb 12, 2018 8:49 pm

On balance I think I approve of these changes...

Re: Ghost Rating V2

by Yonni » Mon Feb 12, 2018 7:37 pm

Condescension wrote (Mon Feb 12, 2018 7:21 pm):
"Would you mind publishing the script?"

https://drive.google.com/drive/folders/ ... sp=sharing

Re: Ghost Rating V2

by Condescension » Mon Feb 12, 2018 7:21 pm

Would you mind publishing the script?

These are correct changes and I'm surprised the original was so weak (to be frank). Good shit.

Re: Ghost Rating V2

by Yonni » Mon Feb 12, 2018 6:28 pm

ghug wrote (Mon Feb 12, 2018 6:15 pm):
"As far as the math goes, adjusting C for the different rating systems is an interesting way to do it. I'm having trouble conceptualizing the actual result of that."

Here's a graph of expected score (against average opponents) vs. rating for different values of c:
https://imgur.com/a/mPrgk
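The exact GRv2 expected-score formula is on the first page and in the linked Matlab code, so the sketch below does not reproduce it. Instead, it assumes a hypothetical Elo-style share, S_i = 10^(R_i/c) / Σ_j 10^(R_j/c), purely to show the kind of curve such a graph plots: the expected score of one player against six average (R = 1000) opponents, for a few values of c. The function names are made up for illustration.

```python
def expected_shares(ratings, c):
    """Hypothetical Elo-style shares: 10**(R_i/c) / sum_j 10**(R_j/c).

    Assumed for illustration only; this is not the GRv2 formula.
    Ratings are shifted by their maximum first to avoid overflow.
    """
    top = max(ratings)
    weights = [10 ** ((r - top) / c) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]


def vs_average(rating, c, average=1000.0, opponents=6):
    """Expected share of one player against `opponents` players rated `average`."""
    return expected_shares([rating] + [average] * opponents, c)[0]


# One row per c: expected share at ratings 1000, 1500 and 1950.
for c in (400, 800, 1600):
    row = [f"{vs_average(r, c):.2f}" for r in (1000, 1500, 1950)]
    print(f"c = {c}: {row}")
```

Under this assumed form, larger values of c flatten the curve, so the same rating gap buys a smaller expected-score advantage; with equal ratings every player expects 1/7 regardless of c.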
