Forum
A place to discuss topics/games with other webDiplomacy players.
Page 1301 of 1419
Smokey Gem (154 D)
02 Feb 16 UTC
vdiplomacy what is it ??
Is that the same as webdip, or do you need different accounts on both?
11 replies
Open
Valis2501 (2850 D(G))
28 Dec 15 UTC
(+8)
Beginner Game!
If you'd like to play a game with fellow new players, whether new to the game or just new to the site, please read below.
53 replies
Open
Slyguy270 (527 D)
01 Feb 16 UTC
Democracy4US
Quite simply, Democracy4US is an app built to fix politics in America.
20 replies
Open
brainbomb (290 D)
02 Feb 16 UTC
Donald Trump threatens Iowans
In his speech last night Trump was cold and unforgiving to the Iowans. He threatened to buy a farm and move to Iowa.
2 replies
Open
MrcsAurelius (3051 D(B))
02 Feb 16 UTC
Hope someone knows!
Recent changes question.
1 reply
Open
CommanderByron (801 D(S))
02 Feb 16 UTC
Quality Live Game
I was considering starting a RP (Rulebook Press?) around 4pm EST (New York City Time). I was thinking 10 minute phases (assuming that the auto ready on retreat and builds will save us a ton of time) I would prefer if you have at least 10 games to your name and the bet will be 50 D. add your name in list format below. I will send out the Passwords at 3pm EST
3 replies
Open
lauridsena (910 D)
02 Feb 16 UTC
Checking adjacent territories
Is there any way to check ahead of time, in any manner, if two territories are adjacent? There are some territories that seem adjacent, but I don't know if fleets can travel between them. I just don't know how to see if they are or not and don't want to take the chance they aren't and try to move somewhere only to find out the next move is impossible
4 replies
Open
steephie22 (182 D(S))
13 Dec 15 UTC
Playtesting my boardgame online
Hello everyone,
version 1 of my boardgame is finished. It was brought to my attention that it's probably a good idea to test it online. Two things needed there: 1. players and
2. some sort of adjudicator to use, with which I can easily add and move around 6 unit types, factories, territory markers and 4 kinds of resources, while also keeping track of various variables (although that can be done fairly easily outside of the adjudicator).
Can you help with either of those?
61 replies
Open
brainbomb (290 D)
01 Feb 16 UTC
(+1)
New site feature: Voice chat press
It's simple: you get a 7 player game with a public voice chat and webcam channel. So it's like the closest thing to ftf Diplomacy since sliced bread. Can you imagine the look on Valis' face when I build a fleet in Mar and he's in an Italian Lepanto? Come on, webcam and voice chat would be hilarious.
20 replies
Open
Fluminator (1500 D)
29 Jan 16 UTC
Epic Mafia makes the news.
Hey, our live Mafia website has made the UK news:
http://www.theguardian.com/technology/2016/jan/28/death-of-a-troll
23 replies
Open
Valis2501 (2850 D(G))
31 Jan 16 UTC
SoS gunboats
I went from 60 games to 3 games waiting for this fucking ODC Finals to start and I need something to play but can't commit to press games.

Here's 14 SoS gunboats. Join if able and willing please. Thanks.
13 replies
Open
A_Tin_Can (2234 D)
29 Jan 16 UTC
(+1)
Reliability Rating (RR) discussion
Since this has come up in the other thread-
82 replies
Open
domwnec (254 D)
31 Jan 16 UTC
How to create an app?
At work we're thinking of creating an app. No one knows how to do it internally. Has anyone gone through this before? Any valuable lessons learned? Pros and cons? Cost of development and upkeep? Thank you.
6 replies
Open
brainbomb (290 D)
01 Feb 16 UTC
And the Groundhog.....
Wait for it...Fuck winter
1 reply
Open
Lando Calrissian (100 D(S))
31 Jan 16 UTC
long phase gb
I'd like to play a game but can't get online consistently. If you're willing, please consider http://webdiplomacy.net/board.php?gameID=173461
3 replies
Open
zultar (4180 DMod(P))
29 Jan 16 UTC
Site Update: Skill Ratings and integration
Important. Please read.
Page 1 of 5
zultar (4180 DMod(P))
29 Jan 16 UTC
Hello folks,

As most of you who have participated in the forum a lot probably know, we have had discussions about our rating system and site integration for a long time. The mod team and development team have written up a proposal to see how the community would feel about a new rating system and its integration into the site so that it is not updated once a month or longer anymore. ATC who is taking the lead on this will post a detailed explanation below. We wanted to ask the community if this is the path that we should take.

zultar
Co-Owner
jmo1121109 (3812 D)
29 Jan 16 UTC
ninja
jmo1121109 (3812 D)
29 Jan 16 UTC
dammit, shouldn't have replied to you on gchat before posting that.
A_Tin_Can (2234 D)
29 Jan 16 UTC
(+6)
So, we've discussed integrating ratings with the site before. This would be awesome, because it would mean we don't have to wait for the end of the month (ish) for ratings, and could do other fun things like have graphs of how much your rating changed on a per game basis, and automatic categories. Maybe graphs of your rating vs chosen rival players so you can compete? There are many possibilities.

It would also mean that we could put your rating number where points are on the profile, and potentially move towards replacing points (although, yes, points do a number of things like preventing newbies from signing up for 100 games, which we'd have to make sure are handled by other features).

The first iteration of ratings integration won't be a removal of points. But, we would like to replace GR.

I'm making this post because that's obviously a pretty major change, and without the support of the community, I'm not prepared to do it (especially after the SWS debate).

So! The summary of this post is going to be "Would you support integrated ratings if they were a different system to GR?"

Broadly, I'm proposing the following changes:

* We'd no longer produce a GR list every month
* Ratings would be integrated into the site, and shown on the user's profile/hall of fame/username
* There'd be a graph on your profile page showing your rating change over some period.
* There would be categories - probably just gunboat and FP to start with
* It is likely that we'd only include Classic games to start with

I'm going to go into some gory details of why we think this would be a good change. You can skip this bit if you're not a ratings nerd. If you are - well, strap in!
----
Let's start with some quick definitions- in the context of rating players in games, we have:

Score: The outcome from a game. This tells you how good a particular game result is under a particular scoring system. On webdip, we have three scoring systems, including past games - DSS, SWS, and SoS.

Rating: Rating tells you how good a particular game score is for a particular user. Did a newbie get eliminated vs the best on the site? Well, that's not a bad result. Did the best on the site get eliminated vs a bunch of newbies? That's a pretty terrible result.

Our scoring systems are very easy to model (ignoring points for a moment): In DSS, you score 1/(number of winners), where "winners" includes draw participants. You score 0 otherwise. SWS scores 1/(draw size) in a draw, or (number of centres)/34 in a solo. You score 0 otherwise. Modelling the score this way is what GR currently does.
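
To make those two definitions concrete, here is a minimal sketch of the DSS and SWS scores exactly as described above. The function names and the "draw"/"solo" result labels are made up for illustration and are not taken from the site's code; it also assumes that "in a solo" means the game ended in a solo and you are scored on the centres you still hold.

```python
TOTAL_CENTRES = 34  # supply centres on the Classic map, per the /34 above


def dss_score(is_winner: bool, winners: int) -> float:
    """DSS: 1/(number of winners) for winners (draw participants count), else 0."""
    return 1.0 / winners if is_winner else 0.0


def sws_score(result: str, draw_size: int = 0, centres: int = 0) -> float:
    """SWS as described in the post.

    result is a hypothetical label: "draw" if you are part of the draw,
    "solo" if the game ended in a solo, anything else scores 0.
    """
    if result == "draw":
        return 1.0 / draw_size
    if result == "solo":
        return centres / TOTAL_CENTRES
    return 0.0


# A 3-way draw under DSS, and an 18-centre solo under SWS.
print(dss_score(True, 3))
print(sws_score("solo", centres=18))
```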

It would be possible to include the point pot as part of the scoring system (higher pot games being worth more), but I haven't looked into it. Personally, I don't play with the point pot in mind, and I'm assuming most players in the top 2-300 or so don't either.

So, what's wrong with GR?

GR is (loosely) based on the Elo rating system. Elo is primarily used in the chess world, and is a 0-sum way of redistributing rating points between two players to infer a ranking. The basic idea is that if two players A (ranked highly) and B (ranked low) play a game, then we expect A to beat B.

If A beats B, then we figure that our ranking is correct, and A only gets a small boost to their rating, while B gets a small decrease. However, if the unexpected happens and B beats A, we figure both are ranked incorrectly, and A gets a large drop in rating, whereas B gets a correspondingly large increase.

The formula for each player after a game is:

R(new) = R(old) + K * (A-E)

R = rating
A = actual outcome
E = expected outcome, typically modelled as a logistic curve
K = some constant to determine the maximum rating change

This is a good system; it means that if you get beaten by someone who is (rated) much better than you, your rating doesn't really suffer. It also means that if you beat someone who is (rated) much better than you, then you get a big boost in rating. Elo has some drawbacks (ratings inflation, incentive for highly ranked players not to play), but I'm not actually going to go into them here.
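
As a rough illustration of the formula above, here is a two-player Elo update in Python. The 400-point logistic scale and K=32 are the usual chess conventions and are assumptions here, not values from the post.

```python
def expected_score(rating: float, opponent: float, scale: float = 400.0) -> float:
    """E in the formula: a logistic curve in the rating difference."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))


def elo_update(rating: float, opponent: float, actual: float, k: float = 32.0) -> float:
    """R(new) = R(old) + K * (A - E)."""
    return rating + k * (actual - expected_score(rating, opponent))


# A (1800) is expected to beat B (1400).
expected_win_gain = elo_update(1800, 1400, 1.0) - 1800  # small boost for A
upset_gain = elo_update(1400, 1800, 1.0) - 1400         # large boost for B
print(expected_win_gain, upset_gain)
```

Because the two expected scores sum to 1, whatever one player gains the other loses, and no single game can move a rating by more than K.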

This sounds great, what's wrong with GR?

GR doesn't follow some of the principles of Elo. Here's how GR works in DSS:

K=sum(R(everyone in this game))/17.5
E=R(old)/sum(R(everyone in this game))

This K factor is incredibly unusual, since it means that the rating change possible after a game depends on the players in it. It also has some strange issues, since the term in the K factor cancels out the term in the expected outcome calculation. Specifically:

1) Losses always cost the same in GR.

GR can be thought of as "betting" about 6% of your points. Everyone puts 6% of their rating in a pot, and the pot is divided amongst the winners. This means that you lose 6% of your rating in a loss, no matter whether we expected you to lose, or expected you to win. This seems wrong.

If the top player on the site gets beaten by a bunch of low ranked players, they should take a big rating hit. GR treats that loss as equivalent to being beaten by other top players.

2) A solo against equally rated players is "worth" more if all the players are top rated.

So, if 7 players rated at 100 play a game, and one of them solos, they get a rating boost of 34.2.

If 7 players rated at 200 play a game, and one of them solos, THEY get a rating boost of 68.5.

It seems wrong for the boost to be larger when you're all rated higher (but equal).

In Elo, you're supposed to get bigger ratings boost for beating players who are better than you - not for beating players who are equally rated. It's always impressive to solo over 7 players who are as good as you are - and I don't think it gets more impressive as those players get better.
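
Both points can be checked with a few lines of Python. This is a sketch, not the real GR code: it assumes K is the sum of all ratings in the game divided by 17.5, which is the reading under which the K term cancels the expected-outcome term and which gives solo boosts of about 34.3 and 68.6 (quoted above as 34.2 and 68.5). The name `gr_change` is made up.

```python
def gr_change(own: float, all_ratings: list, actual: float, divisor: float = 17.5) -> float:
    """Rating change K * (A - E) with the DSS-style GR terms.

    Assumed reading: K = sum(ratings) / 17.5 and E = own / sum(ratings).
    """
    total = sum(all_ratings)
    k = total / divisor
    expected = own / total
    return k * (actual - expected)


# 1) A loss always costs own/17.5 (about 6%), whoever the opponents are.
loss_vs_weak = gr_change(300, [300] + [100] * 6, 0.0)
loss_vs_strong = gr_change(300, [300] + [900] * 6, 0.0)
print(loss_vs_weak, loss_vs_strong)

# 2) A solo among equals is worth twice as much when everyone's rating doubles.
solo_at_100 = gr_change(100, [100] * 7, 1.0)
solo_at_200 = gr_change(200, [200] * 7, 1.0)
print(round(solo_at_100, 2), round(solo_at_200, 2))
```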

Both of these issues are violations of the behaviour of Elo. You might argue that it doesn't matter - but the problem is measurable.

You can measure the prediction error between the expected outcome and the actual outcome, and see how "good" a system is at predicting game outcomes. We can do this with RMSE - the Root Mean Square Error (smaller is better). If you just look at DSS games, then GR has a RMSE of 0.68. This is pretty high. You can drop that a bit by using a constant K-factor (which makes GR a lot more Elo-like). This produced a RMSE of around 0.6. By playing around with a system similar to vDip's rating system, I was able to get RMSE down to 0.44 - a vast improvement.

(for the pedantic: yes, I know measuring within-sample isn't the best way to evaluate a rating, and that measuring this way is measuring the "residuals" rather than "prediction error". But the result is still meaningful).
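
For anyone who wants to try this on their own numbers, RMSE is just the square root of the mean squared gap between expected and actual outcomes. A minimal sketch follows; the sample values are invented and are not the data behind the figures above.

```python
import math


def rmse(expected: list, actual: list) -> float:
    """Root Mean Square Error between predicted and observed outcomes."""
    if not expected or len(expected) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    squared = [(e - a) ** 2 for e, a in zip(expected, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Invented example: one (expected, actual) pair per player per game.
print(rmse([0.5, 0.5], [1.0, 0.0]))
```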

However, even though systems like the vDip system are clearly an improvement, Elo-type rating systems aren't good fits for Diplomacy. Elo isn't really designed to work with more than two players (which is why vDip's rating treats one Diplomacy game as a number of two player games between all the players - a simplification that has its own problems - including introducing new incentives like "kill the highest rated player first"), and also doesn't work well with a variety of outcomes - Elo prefers Win/Draw/Loss.
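
To show what "a number of two player games between all the players" can look like, here is a sketch of a pairwise Elo decomposition. It is not vDip's actual code: the function name, K=32, the 400-point scale, and deciding each pairing by comparing game scores are all assumptions made for illustration.

```python
from itertools import combinations


def pairwise_elo(ratings: dict, scores: dict, k: float = 32.0, scale: float = 400.0) -> dict:
    """Treat one multiplayer game as a two-player Elo game for every pair."""
    deltas = {player: 0.0 for player in ratings}
    for a, b in combinations(ratings, 2):
        expected_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / scale))
        if scores[a] > scores[b]:
            actual_a = 1.0   # a "beat" b
        elif scores[a] < scores[b]:
            actual_a = 0.0   # a "lost to" b
        else:
            actual_a = 0.5   # same game score counts as a draw
        change = k * (actual_a - expected_a)
        deltas[a] += change
        deltas[b] -= change  # each pairing is zero-sum
    return {player: ratings[player] + deltas[player] for player in ratings}


# Seven equally rated players; p0 solos and everyone else scores 0.
ratings = {f"p{i}": 1500.0 for i in range(7)}
scores = {f"p{i}": 0.0 for i in range(7)}
scores["p0"] = 1.0
new_ratings = pairwise_elo(ratings, scores)
print(new_ratings["p0"], new_ratings["p1"])
```

In a scheme like this, every pairing against a higher rated opponent pays out more, which is one way to see where a "kill the highest rated player first" incentive can come from.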

So, a new rating system for Diplomacy is an open research project. However! There's already a rating system that exists, is well respected, works for any number of players, is scoring system agnostic, and works for a variety of game outcomes.

The TrueSkill ( http://research.microsoft.com/en-us/projects/trueskill/ ) rating system would potentially work very well for our needs, and I believe would be measurably better than GR. It also has the advantage that it is designed to react well to changes in players' skill - something that happens all the time on webdip, as players improve very quickly over their first few games.

Using an established rating system would *also* neatly sidestep religious wars over the precise implementation details of GR 2.0 (or whatever).

So - in summary - would you, as the community, support integrated ratings if we chose a different system than GR? Do you have any strong objections to TrueSkill? It is a near perfect fit for our needs.

The intention here is to replace the GR system with a system designed to produce more accurate ratings (which would give more meaning to the ratings), and to integrate the ratings (which would take ratings out of the cupboard of players who've been around for a while, and into the light where even newbies know what their rating is).

Thoughts and comments welcome, either here or to the mod email address, [email protected].

The primary question is "Would you support site-integrated ratings if we moved away from GR?"
Yoyoyozo (65 D)
29 Jan 16 UTC
(+1)
YEEEEESSSSSSS!!!!!!!!! That would be AWESOMEEEEEEEEEE :DDDDDD :))))
Lethologica (203 D)
29 Jan 16 UTC
(+1)
yay truskil
do iiiiiiiiiiiiit
grumbledook (569 D(S))
29 Jan 16 UTC
(+1)
Yep!
Valis2501 (2850 D(G))
29 Jan 16 UTC
(+1)
yay truskil
do iiiiiiiiiiiiit
abgemacht (1076 D(G))
29 Jan 16 UTC
(+1)
My body is ready
ishirkmywork (1401 D)
29 Jan 16 UTC
(+1)
DO IT
The one thing I would question, and I do support the change, and it sounds great, is whether it really is less impressive to solo against 'better' players?

You say "In Elo, you're supposed to get bigger ratings boost for beating players who are better than you - not for beating players who are equally rated. It's always impressive to solo over 7 players who are as good as you are - and I don't think it gets more impressive as those players get better." - I do think the solo does get more impressive as those players get better... Maybe i'm in the minority, but I do think that the boost should be higher in higher level games, even if you're all rated equally.

Still great thoughts and thanks for the effort.
TrPrado (461 D)
29 Jan 16 UTC
(+1)
YESYESYESYRSFHTSWEGHBDERYHDWWWWGG!!!!!!
Yoyoyozo (65 D)
29 Jan 16 UTC
Question: Will there be a page with the rankings of every player? I think it's rather useful to see not only your rating change, but your ranking among other players.
A_Tin_Can (2234 D)
29 Jan 16 UTC
Socrates: I think Ghost probably thought that solos vs better players were better when he wrote the rating system, too.

It creates a problem where playing with the "top" players means that you're likely to increase your rating faster - since not all game results have the same opportunity to affect your rating. Elo has a cap on the amount that one game can affect your rating - GR does not.

Sidestepping the argument of whether or not it's more impressive to solo on your peers if you're better - if you use a constant K-factor, the prediction accuracy improves. This means that ratings are measurably better if you treat all solos against peers as equally impressive.
Yoyoyozo (65 D)
29 Jan 16 UTC
@SD I think you misread ATC's example. All of the players are of equal skill in both cases.
A_Tin_Can (2234 D)
29 Jan 16 UTC
Yoyo: Yes, there will be some kind of summary page.
ssorenn (0 DX)
29 Jan 16 UTC
(+1)
Love this idea.
Chaqa (3971 D(B))
29 Jan 16 UTC
It sounds fine, but I'd rather see RR decay and a separate live RR first.
A_Tin_Can (2234 D)
29 Jan 16 UTC
Chaqa - RR decay is coming. Separate live RR is interesting. I haven't thought about that much. Feel free to start another thread if you want to discuss it.
ssorenn (0 DX)
29 Jan 16 UTC
Thx ATC. Chaqa is correct, RR decay is important
@ ATC, Thank you for responding to me. I get that, and I do think there should be a cap on potential improvement, and I see that the ratings are better, but...

@ ATC and Yoyo (I didn't misread), I'm just remembering when I joined the site, and had a few solos (against fellow newbies, and thereafter against fellow 'amateurs'), and I think, even though those were against 'equal' players, I think that results like that are less impressive, and SHOULD affect ratings less, than let's say a solo amongst a bunch of the top players in the site.

It may be that this is just a small thing that has to be swallowed - the improvements do sound great, it was just a thought.
Lethologica (203 D)
29 Jan 16 UTC
@ATC: Conservative skill estimate or nah?

@Socrates: The argument can be made that soloing against equally skilled players is more impressive when everyone is higher-skill. However, that doesn't mean Trueskill should explicitly account for that.

The basic idea is that if your rating is 1500 and my rating is 1400, the expected game outcome should be the same as if your rating were 1900 and my rating were 1800 (holding everything else constant). Incorporating your comment about a higher-rated 'solo vs. equals' being more impressive, that means small differences in rating become more impressive at higher ratings--but not necessarily that the rating system should inflate those rating differences.
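
A quick way to see this property, simplified to two players: a logistic expectation (Elo-style, with an assumed 400-point scale) depends only on the rating difference, while a share-of-ratings expectation like the E term quoted earlier in the thread for GR does not. Both function names are made up for this sketch.

```python
def logistic_expected(own: float, other: float, scale: float = 400.0) -> float:
    """Expectation that depends only on the rating difference."""
    return 1.0 / (1.0 + 10 ** ((other - own) / scale))


def share_expected(own: float, other: float) -> float:
    """Expectation as a share of the total rating in the game."""
    return own / (own + other)


# Same 100-point gap at two different rating levels.
print(logistic_expected(1500, 1400), logistic_expected(1900, 1800))
print(share_expected(1500, 1400), share_expected(1900, 1800))
```

The first pair of numbers is identical; the second pair shrinks towards 0.5 as both ratings rise.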
Lethologica (203 D)
29 Jan 16 UTC
More @Socrates: After all, what you're really saying is that it's more difficult to differentiate between top players in terms of game outcomes than between average players. Why shouldn't that be reflected in the rating system?
A_Tin_Can (2234 D)
29 Jan 16 UTC
Leth: Yes, the rating displayed for the site leaderboard will be the conservative skill estimate.
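
For anyone unfamiliar with the term: TrueSkill tracks a mean skill (mu) and an uncertainty (sigma) for each player, and the conservative estimate usually shown on leaderboards is mu minus three sigma. A minimal sketch, assuming the standard TrueSkill defaults of mu=25 and sigma=25/3 for a new player; whatever values webdip ends up using may differ.

```python
def conservative_rating(mu: float, sigma: float, k: float = 3.0) -> float:
    """Skill we are fairly sure the player has at least: mu - k * sigma."""
    return mu - k * sigma


# A brand new player under the assumed defaults starts at roughly zero.
newbie = conservative_rating(25.0, 25.0 / 3.0)

# An established player: similar mean, much less uncertainty.
veteran = conservative_rating(30.0, 2.0)
print(newbie, veteran)
```

The displayed number rises as sigma shrinks, so simply playing more games moves a player up even before mu changes much.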
abgemacht (1076 D(G))
29 Jan 16 UTC
"* There would be categories - probably just gunboat and FP to start with
* It is likely that we'd only include Classic games to start with"

These two points ^ are very important. I think this will ensure the best accuracy-to-information-overload ratio. Classic FP and Classic GB best represent the two main types of games you are likely to find on this site and in F2F. Ideally, there would be a live and non-live separation, but that gets very hard to define and, while there is skill to making moves faster, I don't think you'd see too much variation in players' skill based on phase length. I think this is an excellent choice.
ssorenn (0 DX)
29 Jan 16 UTC
If this system is elected, when will we see the start of it?
A_Tin_Can (2234 D)
29 Jan 16 UTC
"Ideally, there would be a live and non-live separation, but that gets very hard to define"

Actually, for webdip, live is "anything less than 60 minute phases". This is what triggers the special processing for live games, and causes the game to appear in the live game ticker on the home page.

I'd like to do more categories, but the more categories you have, the less meaningful they become.

Certainly for GR, the top 10 for live vs non-live are quite different. I haven't actually looked into it much, but I think this is more because there are different player pools rather than radically alternative skill sets.

I'm glad you like those choices, though! I'm pretty happy with those two categories for a starting point.

Long term, I'd like to have the ability to drill down into more detailed stats on players' home pages. That requires quite a bit of thought about what makes the most sense.
A_Tin_Can (2234 D)
29 Jan 16 UTC
ss: it will be some time after I've completed it. I actually want to work on getting a sandbox up and running first (which will allow us to publish the WDC games as they happen, potentially!).

We may move to publishing TrueSkill monthly instead of GR, at least as a first step.
abgemacht (1076 D(G))
29 Jan 16 UTC
"Actually, for webdip, live is "anything less than 60 minute phases"."

Sure, but that was a pretty arbitrary decision. If a ranking system were to be split by live and nonlive, some more thought on the actual binning may be needed and that could be tricky.
Zybodia (355 D)
29 Jan 16 UTC
When you do this, can we also split up win/draw/survive/eliminated stats for gunboat and press? (Honestly, the separate ratings for the two are the most exciting things about this plan for me.)

134 replies
steephie22 (182 D(S))
28 Jan 16 UTC
Design Competition
See broadexpert.com if you want to help with the design of my start-up company. You may make some money. I'm not going to a design company for a reason though :-)

Meanwhile, I want to start a discussion: I have a debating competition coming up next week and one of the statements will be: 'High school students lack ambition'. If I'm against this statement, I thought it would be a good idea to bring up the company. I'm not sure whether that's socially acceptable though?
52 replies
Open
Jamiet99uk (865 D)
29 Jan 16 UTC
(+2)
POINTS PER SUPPLY CENTRE
January is almost at an end. Can we anticipate an early publication of the report into the Moderators' grand experiment, and their verdict on the success (or otherwise) of their trial?
30 replies
Open
KingCyrus (511 D)
22 Jan 16 UTC
(+2)
Roe v. Wade
141 replies
Open
abgemacht (1076 D(G))
29 Jan 16 UTC
(+1)
I will Survive
Interested in knowing why we can't ditch Survive stats, but don't want to clutter the other thread...
86 replies
Open
CommanderByron (801 D(S))
30 Jan 16 UTC
(+1)
Volunteer
It seems ATC and others are stressed about all the changes going on. I need a volunteer to show up at his house and give him a massage any takers?
9 replies
Open
Valis2501 (2850 D(G))
07 Oct 15 UTC
(+4)
School of War; Fall 2015
This thread is for the Fall 2015 class of the School of War. Please be courteous to those running the game and respect any reasonable requests they may make. This semester will be taught by Professors The Hanged Man and Hellenic Riot. gameID=168281
406 replies
Open
brofistme (100 D)
30 Jan 16 UTC
JOIN LIVE GAME
NOW NOW NOW
9 replies
Open
brofistme (100 D)
30 Jan 16 UTC
JOIN THE LIVE GAME
please
2 replies
Open
Jamiet99uk (865 D)
29 Jan 16 UTC
Screen shots
Is it against the rules to send screen shots to a player, in an attempt to prove that something has been said to you in private press?
31 replies
Open
TrPrado (461 D)
03 Jan 16 UTC
(+9)
Mafia XVI Game Thread
See inside for buckets of fun.
4426 replies
Open
abgemacht (1076 D(G))
28 Jan 16 UTC
(+1)
Grand Prix and Boroughs 2016
Two tournaments you guys should definitely try to make it to!
Grand Prix at TotalCon http://www.totalcon.com/
The Boroughs 2016: www.TheBoroughsDiplomacy.net
7 replies
Open
Riotleader007 (100 D)
29 Jan 16 UTC
Newish
Hey I am new to the website but have played the board game so I am not a complete newbie haha I want to play some on here and figured someone can set up a fun starter game and we all have a little fun! :) Game on!
4 replies
Open
ishirkmywork (1401 D)
26 Jan 16 UTC
Russian Opening to Silesia Spring '01
This opening has become a little personal favorite of mine, (if I am Russia, France or Italy) and am wondering if anyone has thoughts, experience, or tactics to share about it. Convincing Russia to do it if you are France or Italy can be difficult -- but well worth it for all involved. You need an imaginative Russian though.
33 replies
Open
00matthew2000 (454 D)
28 Jan 16 UTC
New Vdiplomacy game if anyone is interested.
http://www.vdiplomacy.com/board.php?gameID=25187
1 reply
Open