"I don't see the convergence quality too critical. A single diplomacy game's result is by far less representative for the players 'qualities' than e.g., Chess, or especially Go, where the results of single games are _very_ predictable if there is a relevant difference in the players' scores. "
So, now you're getting to the core of the problem, which is that the current GR formula is poorly chosen regardless of what weighting you pick. Let me be a little bit more abstract for a moment, apologies in advance!
For a typical rating formula, we're going to make two assumptions:
1) Every player does have an underlying level of skill, and on average they play each game with that skill. However, because Diplomacy is not a deterministic game like chess or go, the results for a given level of skill are pretty disparate. In other words, in chess an 1800 will beat a 1400 far more reliably than, in Diplomacy, a GR 180 will beat a GR 140 (see the sketch right after this list).
2) That underlying level of skill does change over time. However -- and this assumption lies at the core of what we're going to do -- we assume that this change is relatively slow, in the sense that it takes a number of games played to become substantially better or worse. If it changed quickly, then you would want the rating system to be based on just the last few games, after all, because the older ones would be irrelevant.
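To make assumption #1 concrete, here's a toy simulation in Python (every number in it is invented for illustration). Model each game's performance as true skill plus per-game noise; the only difference between "chess" and "Diplomacy" below is how large that noise is:

```python
import random

def win_rate(skill_a, skill_b, noise, n=100_000):
    # Each game's performance = true skill + per-game noise;
    # count how often player A outperforms player B.
    wins = sum(
        random.gauss(skill_a, noise) > random.gauss(skill_b, noise)
        for _ in range(n)
    )
    return wins / n

# Same 40-point skill gap, two hypothetical noise levels:
print(win_rate(180, 140, noise=15))   # ~0.97: chess-like, very predictable
print(win_rate(180, 140, noise=60))   # ~0.68: Diplomacy-like, much noisier
```

Same skill gap, wildly different predictability: that's the whole point of assumption #1.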
OK, so given both of those things, how should a rating system be designed? First off, it should have two numbers, not one. The current system keeps just a rating. Instead, we need to keep track of two numbers: a rating and an uncertainty. The rating should be, quite simply, our best attempt at describing the skill with which the player has played all of their games so far. So, if a player has played just one game in his life, but soloed against the top 6 players in the world, his rating would be the highest in the world, because that's the level of play he's shown over his entire career. Note that treating the rating this way also helps with another aspect of GR: currently it discourages you from playing newbies with good records, because they're probably stronger than GR gives them credit for. That isn't something we want to be actively discouraging, right?
The uncertainty tells you how far off our measurement of their skill is likely to be. For a player who has played just one game, we are very, very uncertain. In GR terms, our player who soloed against the top 6 in the world is unlikely, in fact, to turn out to be a GR 50 player, or even a GR 100 player. But it wouldn't surprise you too much to see him settle, in the long run, at GR 200 or at GR 600, right? So there's an enormous uncertainty in that rating. On the other hand, a player who has played 1000 games has a very low uncertainty: we know almost exactly how strong he is.
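In code, a player's entire state is just those two numbers. A minimal sketch, with figures made up to match the examples above:

```python
from dataclasses import dataclass

@dataclass
class PlayerRating:
    mu: float     # best estimate of the skill shown so far
    sigma: float  # uncertainty: how far off mu could plausibly be

# The one-game wonder gets a sky-high estimate with enormous
# uncertainty; the 1000-game veteran is pinned down precisely.
one_game_wonder = PlayerRating(mu=400.0, sigma=150.0)
veteran         = PlayerRating(mu=210.0, sigma=10.0)
```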
When you tabulate a new game, both of these numbers matter. You use the current ratings as a measure of the skill you are up against, but you also use the uncertainties. The more certain we are about the skill of the players you're up against, the more weight the game carries, because it's better information. The more certain we already are about your own rating, the less weight the game carries, because you already have a lot of information, so the game adds less that's new. Finally, note that uncertainty should be time-based, too: your uncertainty should always be slowly growing over time, but reduced every game you play. This means that older games progressively fade into the background, which is what we wanted (see #2 above).
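Here's a rough sketch of that bookkeeping, done as a one-dimensional Bayesian (Kalman-style) update rather than the real Glicko/TrueSkill math. Assume `perf` is a performance number extracted from the game result and the opponents' ratings, and `perf_noise` says how reliable that number is (smaller when the opponents' ratings are well established); both are placeholders, not a worked-out Diplomacy scoring rule:

```python
import math

def decay(sigma, time_idle, c=5.0):
    # Uncertainty slowly grows while you don't play, so old
    # results gradually fade (assumption #2 above).
    return math.sqrt(sigma**2 + (c**2) * time_idle)

def update(mu, sigma, perf, perf_noise):
    # Blend the prior belief N(mu, sigma^2) with the new evidence
    # N(perf, perf_noise^2), weighting each by its precision.
    prior_p = 1.0 / sigma**2        # precision of what we already knew
    obs_p   = 1.0 / perf_noise**2   # precision of the new game
    new_mu    = (mu * prior_p + perf * obs_p) / (prior_p + obs_p)
    new_sigma = math.sqrt(1.0 / (prior_p + obs_p))
    return new_mu, new_sigma
```

Both behaviors fall out automatically: a small `sigma` (we already know you well) means the game barely moves `mu`, while a small `perf_noise` (well-measured opponents) means it moves `mu` a lot, and every update shrinks `sigma` while `decay` grows it back between games.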
Now, one last problem: the reason that GR was done this way, presumably, was that you didn't want a player who played one game and soloed against a strong table to end up atop the list. One option is to report only players whose uncertainty is low enough, meaning they've played enough games recently. But I understand wanting every player to appear on the list. Hence another idea: rather than reporting our measured rating, we report a number that is, say, 2 standard deviations *below* it, using the uncertainty we are also tracking. In other words, we are reporting not the level of skill they have shown, but the minimum level of true skill we are pretty certain they have. And this would solve the problem of how to include players who have played only a small number of games, without having to distort the entire system because of it.
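Continuing the sketch, the reported number is then one line (this is the same trick TrueSkill uses for its leaderboards, just with my invented figures):

```python
def displayed_rating(mu, sigma, k=2.0):
    # Report a skill level we're quite confident the player
    # actually exceeds: k standard deviations below the estimate.
    return mu - k * sigma

print(displayed_rating(400.0, 150.0))  # one-game wonder: 100.0, ranks modestly
print(displayed_rating(210.0, 10.0))   # veteran: 190.0, barely below his estimate
```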
Finally, orathaic, you're right that gunboat has higher variance. How would we account for that in this system? Simple: your point that gunboat provides less information per game than full press is well taken, so you treat it as exactly that: less information. It still counts, but it doesn't reduce the uncertainty as much as better information would. So, all other things being equal, it will take more gunboat games than full press games to learn how strong a player is, which is exactly the behavior you're going for.
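In the sketch above, that's nothing more than handing `update()` a larger `perf_noise` for gunboat games (again, the noise values are made up):

```python
mu, sigma = 200.0, 60.0
# The same measured performance moves the rating less, and
# shrinks the uncertainty less, when it comes from gunboat.
print(update(mu, sigma, perf=260.0, perf_noise=40.0))   # full press: ~(241.5, 33.3)
print(update(mu, sigma, perf=260.0, perf_noise=100.0))  # gunboat:    ~(215.9, 51.4)
```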