Forum
A place to discuss topics/games with other webDiplomacy players.
Page 918 of 1419
smcbride1983 (517 D)
13 May 12 UTC
I, Claudius Discussion Thread
Howdy, Webdiplo. I am coming to the end of The Satanic Verses and about to pick up I,Claudius by Robert Graves. If anyone would like to discuss the book with me feel free to pick up a copy and read along.
34 replies
Open
Alderian (2425 D(S))
23 May 12 UTC
Ghost Rating update
http://tournaments.webdiplomacy.net/theghost-ratingslist

By Category link added
Page 3 of 5
Frank (100 D)
24 May 12 UTC
This is a great addition. Not surprising at all to see Barn3tt at the top of the gunboat list. Great to see 5nk still so high on the all-time list, does anyone still remember him? He was a great player and funny forum poster back in the golden age of live games.
ghug (5068 D(B))
24 May 12 UTC
I'm really bad at gunboat...
Yonni (136 D(S))
24 May 12 UTC
Ghug, I think there's a fair number of us out there. Want to throw your hat in the ring for a few games to try to hone our skills?
achillies27 (100 D)
24 May 12 UTC
Well ghug, you seem to rank higher than me in everything but gunboat... you're pretty darn good.
Alderian (2425 D(S))
24 May 12 UTC
@CSteinhardt, I'll run it tonight without weighting and see how it affects things. Part of me thinks that everyone will be in the same order, just with the ghost rating values 4 times as far from 100 in one direction or the other. The other part of me isn't so sure.
Ooh, I'm 423 in classic full wta non-live. Sweet.
CSteinhardt (9560 D(B))
24 May 12 UTC
@Alderian: I can mathematically prove to you that won't be the case. PM me if you'd like.
zultar (4180 DMod(P))
24 May 12 UTC
Will this "proof" involve anything like .9999999999 repeating=1?

If it does, then I can tell you that the webdip community has already concluded that this is not possible. Furthermore, any math that cannot be intuited by everyone is obviously not acceptable. If our feelings are not going to be accounted for by this so-called mathematics, then it cannot and should not be accepted.

:)
Yonni (136 D(S))
24 May 12 UTC
Started writing out a proof and forgot how much I miss doing math (or academics, or using my brain). Then I realized how frustrating it could get and how much wine I've had tonight. And gave up.
Now I just want to ask someone else if they have the answers.

God damn, feels like university again.
Draugnar (0 DX)
24 May 12 UTC
1 / 3 = .33333333
.33333333 * 3 = .999999999
1 / 3 * 3 = .9999999999
3 / 3 = .999999999
1 = .999999999

:-)
orathaic (1009 D(B))
24 May 12 UTC
I start out writing 0.9999 repeating, but i keep writing 9s and thus never get to an '=' sign... My proof took too long to write, but intuitively i know it is true ;)

And no draug, i don't remember the details of how GR is calculated, also given that i've been active here (with only one game at a time) for the last three months, not being in the list is a bit of a surprise.
Draugnar (0 DX)
24 May 12 UTC
Didn't ask you to, Ora. I was just chiming in with the .9∞ =1 proof.
Alderian (2425 D(S))
24 May 12 UTC
@CSteinhardt, no need for a proof. Easier for me to just tweak the program to ignore the press weight when run for a specific press type. The numbers came out quite different. A lot of players are in similar positions but others are in very different positions.

I want to look at the specifics of a few people first so that I can explain it properly before making the change official since I've already posted the numbers.

I think it comes down to the difference between players that go W L W L W L and players that go W W W L L L. In the second case, the weighting was cushioning the fall. With the weighting removed, they plummeted.
CSteinhardt (9560 D(B))
24 May 12 UTC
That's one class of ratings where you'll notice it more strongly, but not the effect I was describing. The effect I was describing, rather, was this:

Imagine that you play every single game at the same level, a GR 1000 level. How many games do you have to play to reach GR 1000? Answer: you don't reach it, you just asymptote there. How many games do you have to play to reach GR 500? For the standard formula, if you are playing against similarly-rated players, each game takes you about 1/15 of the way to your true rating. Thus, you might go from 100 to 160, 216, 268, 317, 363, 405, 445, 482, and then you top 500 in 9 games.

What about using the 1/4 weighting? Now it goes from 100 to 115, 130, 144, 159, ..., and finally reaches 500 after 34 games, not 9.

And, this assumes you're able to find people of your rating to play against; in practice, that 9 and 34 ends up more like 30 and 110 games for a typical set of opponents.
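
The arithmetic above can be checked with a short sketch. It assumes the simplest reading of the description: every game moves the rating a fixed fraction of the remaining distance to the true rating (1/15 normally, 1/60 with the 1/4 weighting). The function names and this update rule are illustrative assumptions, not the actual GR formula.

```python
def trajectory(n_games, true_rating=1000.0, start=100.0, step=1 / 15):
    """Ratings after each of n_games, assuming each game closes a fixed
    fraction `step` of the gap to `true_rating` (a toy model, not GR itself)."""
    rating = start
    history = []
    for _ in range(n_games):
        rating += step * (true_rating - rating)
        history.append(rating)
    return history


def games_to_reach(target, true_rating=1000.0, start=100.0, step=1 / 15):
    """Number of games until the rating first reaches `target`."""
    if not (start <= target < true_rating):
        raise ValueError("target must lie between start and true_rating")
    if not (0 < step <= 1):
        raise ValueError("step must be in (0, 1]")
    rating = start
    games = 0
    while rating < target:
        rating += step * (true_rating - rating)
        games += 1
    return games


if __name__ == "__main__":
    print([round(r) for r in trajectory(8)])               # full weight
    print([round(r) for r in trajectory(4, step=1 / 60)])  # 1/4 weight
    print(games_to_reach(500), games_to_reach(500, step=1 / 60))
```

With an exact 1/15 step this toy model tops 500 in 9 games; with an exact 1/60 step it crosses 500 at game 35, one game later than the 34 quoted, which is within the rounding implied by "about 1/15".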

To put it another way, the problem is that for your lower-weighted ratings, almost every strong player is *below* their true skill-based rating. However, players who have played more games are not as far below their true rating. So, for example, a player with a true skill of 300 who has played 100 games will end up being rated equal to a player with a true skill of 400 who has played 40 games.

This whole issue, by the way, is one of the reasons that the GR formula should be reworked (I'd be happy to discuss a plan for that if you PM me). The brief answer is that for an ELO-based system, you don't keep the variance constant, but rather you do better to adjust it as information comes in. A player who has played 100 games should see one game have less of an impact than one who has played 10 games, because it's less new information as a fraction of the total. Again, while I'm happy to point you to good ways to solve this problem, what I'm telling you isn't my own, original idea -- this is a problem that's been solved before, and I'm just somebody familiar enough with the literature to point you towards a better solution.
Mr A (386 D)
24 May 12 UTC
I don't see the convergence quality too critical. A single diplomacy game's result is by far less representative for the players 'qualities' than e.g., Chess, or especially Go, where the results of single games are _very_ predictable if there is a relevant difference in the players' scores.

I don't think GR (or another rating system for diplomacy) needs to converge faster or 'better' than the current formula. At least any rating system which tries to improve the convergence should not overrate outlier results.
Draugnar (0 DX)
24 May 12 UTC
Given the multiplayer nature and the emotional aspects of Diplomacy, it is more in line with the "Any Given Sunday" team sports than with a one on one intellectual, pure strategy game.
Yonni (136 D(S))
24 May 12 UTC
Also, because of the sheer number of games played as GB as opposed to FP, I see no reason why they need to have the same convergence criteria.
Hellenic Riot (1626 D(G))
24 May 12 UTC
Wow, my live games are better than I thought. Third place on the current list and eighth on the all time list. Gives me something to cling on to!

Pretty good with the WTA Full Press normal games too, considering I play a fairly equal amount of PPSC too.

My gunboat ranking is hilariously bad though. Good reason to give up on those altogether, I think.
Gobbledydook (1389 D(B))
24 May 12 UTC
lol
I'm last place in non-live, full-press games. But I managed to get rank 78 in gunboat.
I suck at written communication I suppose, but my board tactics are much better.
orathaic (1009 D(B))
24 May 12 UTC
'To put it another way, the problem is that for your lower-weighted ratings, almost every strong player is *below* their true skill-based rating.'

but if every 'stronger' player is below their *true* skill rating then the whole system can be seen as deflated, and thus still represent the current skill.

The problem with a faster criteria is that a series of losses (bad luck, but it happens, i've had an entire year go by without a win) throws your skill-rating so completely off - whereas if the skill is just an estimate and everyone is somewhere near their *true skill* then it doesn't matter if on average they are approaching an asymptote, so long as they have played a similar number of games (against similarly skilled players) then their rankings will be close enough to accurate regardless of rating.

Of course ranking is less useful if the number of players is not constant over time, but that effect can be disregarded by looking at the % of players above/below your rank.
Alderian (2425 D(S))
24 May 12 UTC
Yes, and I think I need to add percentile to the rankings.

Regarding the gunboat weighting. Part of the reason it is weighted less in the normal ranking is so that it has less impact compared to a normal press game. But that is only part. The other part is because there is more luck involved.

In full press you may find that no one wants to work with you for reasons other than your press. But, that should be rare. If you are eliminated, odds are, your press played a large part in that.

In gunboat, I would think there is a higher chance of you being screwed no matter what you do and therefore an individual game's outcome is less informative about your skill level, and therefore should be weighted less.
Dejan0707 (1608 D)
24 May 12 UTC
Great work Alderian with this add on, it is really interesting to see statistics in different fields of play. I think I will have to work on my gunboat skills though.
orathaic (1009 D(B))
24 May 12 UTC
so wait, if we weight gunboat more heavily then we can get a measure of the person's luck instead of their true skill? interesting idea :)
CSteinhardt (9560 D(B))
25 May 12 UTC
"I don't see the convergence quality too critical. A single diplomacy game's result is by far less representative for the players 'qualities' than e.g., Chess, or especially Go, where the results of single games are _very_ predictable if there is a relevant difference in the players' scores. "

So, now you're getting to the core of the problem, which is that the current GR formula is poorly chosen regardless of what weighting you pick. Let me be a little bit more abstract for a moment, apologies in advance!

For a typical rating formula, we're going to make two assumptions:

1) Every player does have an underlying level of skill, and on average they play each game with that skill. However, because Diplomacy is not a deterministic game like chess or go, the results for a given skill are pretty disparate. In other words, in chess an 1800 will beat a 1400 a lot more often than a GR 180 will beat a GR 140 in Diplomacy.

2) That underlying level of skill does change over time. However -- and this assumption lies at the core of what we're going to do -- we assume that this change is relatively slow in the sense that it takes a number of games played to become substantially better or worse. If it didn't, then you would want the rating system to be based upon just the last few games, after all, because the others would be irrelevant.

OK, so given both of those things, how should a rating system be designed? First off, it should have two numbers, not one. The current system simply has a rating. However, we need to keep track of two numbers: a rating and an uncertainty. The rating should be, quite simply, our best attempt at describing the skill with which the player has played all of their games so far. So, if a player has played just one game in his life, but soloed against the top 6 players in the world, his rating would be the highest in the world, because that's the level of play he's shown over his entire career. Note that treating the rating this way also helps with another aspect of GR; currently it discourages you from playing newbies with good records, because they're probably stronger than GR gives them credit for. This isn't something we want to be actively discouraging, right?

The uncertainty tells you how likely we are to be wrong about our measurement of their skill. For a player who has played just one game, we are very, very uncertain. Using the current GR formula, our player who soloed against the top 6 in the world is unlikely to, in fact, turn out to be a GR 50 player, or even a GR 100 player. But, it wouldn't surprise you too much to see them, in the long run, as a GR 200 player or as a GR 600 player, right? So there's an enormous uncertainty in that rating. On the other hand, a player who has played 1000 games has a very low uncertainty: we know almost exactly how strong he is.

When you tabulate a new game, both of these numbers matter. You use the current ratings as a measure of the skill you are up against, but you also use the uncertainties. The more certain you are about the skill of the players you're up against, the stronger weight the game has, because it's better information. The more certainty in your own rating, the less weight the game has, because you already have a lot of information, so it's not adding as much that's new. Finally, note that uncertainty should be time-based, too. Your uncertainty should always be slowly growing over time, but reduced every game you play. This means that older games progressively fade into the background, which we wanted (see #2 above)

Finally, one last problem: the reason that GR was done this way, presumably, was that you didn't want a player who played one game and soloed against a strong table to end up atop the list. One option is just to only report players who have a low enough uncertainty, meaning they've played enough games recently. But, I understand wanting every player to appear on the list. Hence another idea: rather than reporting our rating, we report a number that is, say, 2 standard deviations *below* our measured rating, using the uncertainties we are also tracking. In other words, we are reporting not the level of skill they have shown, but the minimum level of true skill we are pretty certain that they have. And this would solve the problem of how to include players in the list who have played a small number of games, without having to distort the entire system because of it.

Finally, orathaic, you're right that gunboat has higher variance. How would we account for that in this system? Simple: your point is well taken that gunboat provides less information from one game than full press, so you treat it as less information. Meaning, it's information, but information which is not going to reduce the uncertainty as much as better information would. So, all things being equal, it will take more gunboat games to learn how strong a player is than full press games, which is the behavior you're going for.
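
A minimal sketch of the two-number idea described above, assuming a simple Gaussian (Kalman-style) update. Everything here is hypothetical illustration: the `Rating` class, the `age` and `update` functions, the numeric values, and especially the idea that one game yields a single `performance` number. How a seven-player result would be turned into such a number is left open, and none of this is the GR formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Rating:
    mean: float      # best estimate of the player's skill
    variance: float  # how uncertain that estimate is

    def conservative(self, k=2.0):
        """Reported value: k standard deviations below the estimate."""
        return self.mean - k * math.sqrt(self.variance)


def age(rating, days, drift_per_day=1.0):
    """Uncertainty grows slowly while a player is inactive (assumed linear)."""
    return Rating(rating.mean, rating.variance + drift_per_day * days)


def update(rating, performance, noise_variance):
    """Blend one game's performance estimate into the rating.

    noise_variance is how unreliable that single game is: larger for
    gunboat, or when the opponents' own ratings are uncertain.
    """
    gain = rating.variance / (rating.variance + noise_variance)
    new_mean = rating.mean + gain * (performance - rating.mean)
    new_variance = (1.0 - gain) * rating.variance
    return Rating(new_mean, new_variance)


if __name__ == "__main__":
    newcomer = Rating(100.0, 10000.0)  # one or two games: very uncertain
    veteran = Rating(100.0, 100.0)     # many games: well known
    print(update(newcomer, 400.0, 2500.0))   # moves a long way
    print(update(veteran, 400.0, 2500.0))    # barely moves
    print(update(newcomer, 400.0, 10000.0))  # noisier (gunboat-like) game
```

In this sketch the same result moves the newcomer much further than the veteran, a noisier game shrinks the uncertainty less, and `conservative()` gives the kind of "2 standard deviations below" figure that could be listed publicly.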
Personally, I'm happy to have GR in its current incarnation with whatever flaws it may or may not have. It's free (to me) and IMO a lot more valuable than the D system. I appreciate those who volunteer their time to provide this free service.

Re CSteinhardt's comment about not wanting a player who only played one awesome game atop the list, this could be solved with provisional ratings. You wouldn't get a real rating until you completed X number of games. However, in the meantime, you would get a provisional rating to give you a sense of where you stand, based on how you've done in your games completed thus far. Once you finish X games, you lose the provisional status and get on the official list. IIRC, that's how it worked with chess ratings back when I had one (and maybe how chess still works, I don't know).
14 Gary 7014 175.907374466528 63 15 ( 23% ) 12 ( 19% ) 21 ( 33% ) 15 ( 23% ) 0 ( 0% ) 2/5/2011

This guy should be number 1
CSteinhardt (9560 D(B))
25 May 12 UTC
@Hanged Man: I agree, it's better than the alternatives. But, it's also easy to improve. The basic problem is that we've taken a 2-dimensional system and made it 1-dimensional, without trying to adapt it. Let me try to explain the problem in non-mathematical terms.

Let's think about another rating system: you play with people, and as you do, you make your own judgment about how good they are, right? Let's imagine you've played a lot of games with The Czech and know he's a strong player, and tonight you play one where he's just awful. Are you going to conclude that he's an awful player? No, you're going to maybe think he's not quite as strong as you did before, but you've played with him enough to know that he's a strong player, and maybe just had an off night.

Now, imagine you play with me, and we've played once before and I didn't do anything too noticeable one way or the other, and tonight I'm horrible. You'd conclude that I'm probably a weak player, although of course it's possible that I'm a strong player and you caught me on an off night. If I'm actually a strong player, you'll end up revising your opinion over time as we play more games.

The reason you drew two different conclusions from the same result is that for The Czech, you have a lot more experience playing with him, so one game doesn't change your opinion very much. But for me, you don't have much information, so the new game changes your opinion a lot.

A good rating system should behave the same way: for a player it has seen play a lot of games, each new game shouldn't matter so much, but for a new player, it should try to form the best opinion it can pretty quickly, then revise it as it goes along.

Are there systems like GR which don't attempt to do this? I suspect GR was adapted from http://www.eloratings.net/, which also is essentially one-dimensional with weightings. The reason this works for international soccer/football, though, is that each team *does* play approximately the same number of matches each year, which means you can get away with ignoring how often the teams play since you have about the same amount of information about all of them. If this were true on webDip, we wouldn't need to keep track of this second number, and GR would be fine. But, some players do actually play much more often than others, and that's why GR ends up failing when you use it to compare players who have played very different numbers of games.
Yonni (136 D(S))
25 May 12 UTC
Perhaps what the previous few messages are looking for is an analogy to the 'k' value in the Elo system. The k value dictates how volatile the ratings are. Often the system is set up so low-ranked players have a high k value while high-ranked players have a low k value. However, sometimes it is set up on the basis of number of games played which I think would work very well for diplomacy. It would allow people to settle into a more appropriate rating quicker and then avoid sudden and massive swings. Combining this with the 'provisional' rating that THM described would probably be ideal.
But, really, when it comes down to it - it just takes far too long to amass the number of games required to make any rating system stable or predictable. When you're amassing 20 odd games a year (talking non-live full press because fuck GB ;) ) you simply don't have any great way of doing this. It's probably best to accept the rating system as a bit of a crude standings rather than a predictive metric.

IMHO, the biggest flaw with GR - as I've said before - is its inability to recognize the difference between losing to 6 n00bs or 6 1337 players. You always 'wager' the same amount of rating so a loss is the same no matter who you are playing.
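
For reference, a rough sketch of how a two-player Elo update handles both points: the k value can shrink with games played, and the expected score makes a loss to a weaker opponent cost more than a loss to a stronger one. The `expected_score` formula is the standard Elo one; the `k_factor` schedule and its numbers are made-up assumptions, and a seven-player Diplomacy game would need a different pairing scheme.

```python
def expected_score(rating, opponent_rating, scale=400.0):
    """Standard Elo expected score (between 0 and 1) against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / scale))


def k_factor(games_played, k_new=40.0, k_established=10.0, settle_after=30):
    """Hypothetical schedule: k falls linearly from k_new to k_established
    over the first `settle_after` games, then stays constant."""
    if games_played >= settle_after:
        return k_established
    fraction = games_played / settle_after
    return k_new + fraction * (k_established - k_new)


def elo_update(rating, opponent_rating, score, games_played):
    """New rating after one game; score is 1 for a win, 0 for a loss."""
    surprise = score - expected_score(rating, opponent_rating)
    return rating + k_factor(games_played) * surprise


if __name__ == "__main__":
    # Same player, same loss, different opposition.
    print(elo_update(1500.0, 1300.0, 0.0, games_played=50))  # big drop
    print(elo_update(1500.0, 1700.0, 0.0, games_played=50))  # small drop
    # Same loss for a newcomer: larger swing because k is higher.
    print(elo_update(1500.0, 1300.0, 0.0, games_played=0))
```

The amount "wagered" here depends on who sits across the board, which is the behaviour the post says is missing.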
Frickin'Zeus (85 D)
25 May 12 UTC
What CSteinhardt is describing seems to have a lot of resemblance with
http://research.microsoft.com/en-us/projects/trueskill/
MadMarx (36299 D(G))
25 May 12 UTC
This discussion makes me wish I had stuck with being a math major in college... But then I likely wouldn't be an architect and that's no good... Regardless, I'm enjoying the discussion very much, and that said, CSteinhardt, I challenge you to a 48 hour phase game, I want to see how quickly I can crudely assess your skill at a longer-phased full press classic game, and then, in the long run, see if I can do that better than the current GR system, which will obviously include much debate, but isn't that really the entire point?

139 replies
Frank (100 D)
30 May 12 UTC
Live Gunboat - 220
9 replies
Open
Victorious (768 D)
30 May 12 UTC
The Winter Gunboat Tournament has ended.
Congrats Manas.
9 replies
Open
krellin (80 DX)
30 May 12 UTC
Define "Work"
It's what I do between WebDiplomacy posts...
8 replies
Open
NigeeBaby (100 D(G))
18 May 12 UTC
The Galaxians
OK, game 1 & 5 are over and here are your positions:-
50 replies
Open
jwalters93 (288 D)
30 May 12 UTC
Is it just me...
Or does it seem like less and less people actually understand sarcasm?
14 replies
Open
Actaris (100 D)
30 May 12 UTC
Austria has been banned in Spring 1901
No moves yet played. 5 days for talking (with the default 24 hour extension). Standard map. Bet of five. Anyone welcome.
http://webdiplomacy.net/board.php?gameID=89765
0 replies
Open
Diplomat33 (243 D(B))
30 May 12 UTC
New Word Association Thread
Because they are fun! Let's make the first word

"banana"
5 replies
Open
redhouse1938 (429 D)
26 May 12 UTC
THE BEST PLAYERS
Who's the best player you ever played against on this site? Why? Share here.
78 replies
Open
SacredDigits (102 D)
30 May 12 UTC
Lost a player, need a replacement Russia
Not gonna lie, it's not the best position...AT seem pretty hellbent on him. But it's not the most horrible either.

gameID=89589
2 replies
Open
NigeeBaby (100 D(G))
30 May 12 UTC
Do Americans deserve Mitt Romney?
A Mormon or a black guy for President, no matter what you think of US politics, surely you wouldn't wish Romney on them. That's just mean.....
8 replies
Open
thatwasawkward (4690 D(B))
30 May 12 UTC
Help me put my brain back together.
High pot gunboat time. I just got back from a long weekend and have no games going... Anyone interested in something along the lines of 1500 point buy-ins, 25 hour turns, WTA?
2 replies
Open
NigeeBaby (100 D(G))
30 May 12 UTC
are you Flame-proof?
http://www.bbc.co.uk/news/technology-18238326

Flame, I'm gonna live for ever, I'm gonna learn how to spy
0 replies
Open
slyster (3934 D)
29 May 12 UTC
Short, concise messages, or long lasting novel in Full Press?
More details inside
21 replies
Open
Zmaj (215 D(B))
30 May 12 UTC
EoG: wta gunboat-165
I knew they would get bored eventually...
3 replies
Open
kestasjk (95 DMod(P))
28 May 12 UTC
Downtime
Apologies for the downtime. The web server's logs were taking up a huge amount of space, so the database had nowhere to write to.
24 hours have been added to all games, except short-phase games which will have been paused.
21 replies
Open
Sargmacher (0 DX)
30 May 12 UTC
We Need To Talk About Kevin
Watching the film right now. Has anyone else seen it or read the book? What do you think?
1 reply
Open
cteno4 (100 D)
29 May 12 UTC
Useful pre-game press
What have other players on this site wanted to learn before games start? I'm curious. Some of my favorite info-seeking bits are below.
16 replies
Open
carson87 (102 D)
29 May 12 UTC
anyone want to play a MED Game?
http://webdiplomacy.net/board.php?gameID=89936
0 replies
Open
MadMarx (36299 D(G))
28 May 12 UTC
august rush
Final password sent, will start tonight or Tuesday, player list inside:

http://webdiplomacy.net/board.php?gameID=89627
9 replies
Open
NigeeBaby (100 D(G))
29 May 12 UTC
Disgusting Double-Standards from the UK government
http://www.bbc.co.uk/news/uk-18245780
19 replies
Open
KingJohnII (1575 D(B))
29 May 12 UTC
Masters of the World
Starting a new game - Masters of the World http://webdiplomacy.net/board.php?gameID=90127

Looking for all good players - it's a 101 bet game. Should be fantastic if we can get the required players. Hope you want to join.
0 replies
Open
KingJohnII (1575 D(B))
29 May 12 UTC
Contacting the Moderators
The game 'Masters of Europe' got paused during the recent problem, and we need to get it re-started by the moderator. There are 2 inactive players who won't vote to resume, and we never voted to pause it. Anyone know how to contact the moderator, or perhaps they will see this?
thanks.
2 replies
Open
mapleleaf (0 DX)
27 May 12 UTC
mapleleaf is turning FIFTY this week!
Happy birthday to ME. In honour of myself and my tireless beneficial contributions to this Forum, I hereby present another installment of mapleleaf's Greatest Hits.
57 replies
Open
James Cartwright (400 D)
29 May 12 UTC
Can a mod please draw "We are Back in Black"
gameID=90088

Turkey is stalling and waiting for one of the rest of us to go offline. He has no tactical/strategic reason not to draw at this point.
4 replies
Open
abgemacht (1076 D(G))
24 May 12 UTC
TV no longer ubiquitous?
So, I'm currently moving to a new apartment and I'm probably not going to pay for TV service. Has anyone else taken this approach? It's so expensive for such a poor service, I'm having trouble justifying the cost.
49 replies
Open
Diplomat33 (243 D(B))
22 May 12 UTC
Can a tactically skilled player succeed?
Would a player who is highly tactical and has great and innovative moving abilities, but lacks on some of the finer Diplomacy skills, manage to succeed? Basically, a debate on tactics vs diplomacy and level of importance.
16 replies
Open
JimTheGrey (968 D(S))
22 May 12 UTC
22nd Annual World Diplomacy Championship
The 22nd Annual World Dip Con is coming to Chicago this summer. The dates are Aug. 10-12. Make your plans to be there!
9 replies
Open
CSteinhardt (9560 D(B))
25 May 12 UTC
Sitter possibly needed
Looking for a sitter for a maximum of one turn in nopress games.
6 replies
Open