Two Tier Scoring


Re: Two Tier Scoring

by jay65536 » Sun Aug 16, 2020 5:28 pm

For anyone who may have been interested, the original article has been revised.

Re: Two Tier Scoring

by jay65536 » Fri Jan 17, 2020 7:05 pm

Restitution wrote:
Mon Jan 13, 2020 8:52 pm
jay65536, when you mentioned you designed this system for "tournament games", not "cash games" - do you think mine or Mercy's system would be better for cash games? Because, when it comes to implementing a system for webdip, all games are cash games.
Since this thread is very long, I guess let me start by recapping what I remember yours and Mercy's systems to be?

Restitution's system: On a 170-point scale, in any draw, first place gets 5 times their center count, second place gets 5 times their center count, all remaining (DIAS) survivors split all remaining points equally. In case of ties, a 3+ way tie for first is an equally split draw, and a tie for second means only the top player gets their center count as their score.

Mercy's system: On a 180-point scale, in any draw in which no one has more than 13 centers, the draw is equally split. In an N-way draw in which someone is topping the board with 14 or more centers, the board-topper gets 90 points and the remaining survivors each get 90/(N-1) points. In case of two powers tied at 14 or more centers in an N-way draw, the other (DIAS) survivors each get 90/(N-1) points, and the joint board-toppers each receive 45 + 45/(N-1) points.
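For anyone who wants to check numbers, here is a minimal Python sketch of Mercy's system as recapped above. The name `mercy_draw_scores` is made up for illustration, the function only covers draws (solos are not handled), and the 14-center threshold is taken from this recap rather than from Mercy's own posts.

```python
from fractions import Fraction

POT = 180          # scale used in the recap above
THRESHOLD = 14     # board top needs at least this many centers to earn the bonus


def mercy_draw_scores(centers):
    """Score a draw; `centers` lists the center counts of the survivors only.

    Hypothetical helper: a sketch of the recap above, not an official formula.
    """
    n = len(centers)
    best = max(centers)
    if best < THRESHOLD:
        # Nobody is big enough: plain equal split.
        return [Fraction(POT, n)] * n
    toppers = [i for i, c in enumerate(centers) if c == best]
    other_score = Fraction(90, n - 1)
    if len(toppers) == 1:
        top_score = Fraction(90)
    else:
        # Two powers tied at 14 or more centers.
        top_score = 45 + Fraction(45, n - 1)
    return [top_score if i in toppers else other_score for i in range(n)]
```

With this sketch, a 17/16/1 three-way comes out as 90/45/45 and a 13/10/6/5 four-way as 45 each.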

So I guess from here, the first thing I'd say is, asking which system is "better" is futile. You can't really theoretically argue about this stuff without some kind of testing to know how players react (both in terms of how they play to the system and also how well they like it).

Restitution, your system is different enough from mine that I wouldn't presume to say which is "better"--but I think you are really underestimating how much different it is from mine. Changing the incentives for second place actually changes a lot. Not only does it remove a good deal of the incentive for second place to compete against first place, but also, the "remaining" part of the pot that is split equally is now going to be a lot smaller. What that means is that I think this system is going to play much more similarly to just a straight center-based system than to my two-tier idea.

The exception is if there's a tie for first or second. In particular, ties for second are handled very weirdly. For example, in your system, a 15/8/7/4 finish is worth less for the small power than if it were 15/8/8/3 (on a 170-point scale, it would be 27.5 for the 4-center result but 31.67 for the 3-center result). That seems like a "perverse incentive", which seems like a problem.
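The tie rules are easier to see in code. This is a rough Python sketch of Restitution's system as recapped above, assuming the input lists only the survivors of a draw; `restitution_draw_scores` is a hypothetical name, not anything from the site.

```python
from fractions import Fraction

POT = 170         # 5 points per center times 34 centers
PER_CENTER = 5


def restitution_draw_scores(centers):
    """Score a draw; `centers` lists the center counts of the survivors only.

    Hypothetical helper sketching the recap above.
    """
    best = max(centers)
    firsts = [i for i, c in enumerate(centers) if c == best]
    if len(firsts) >= 3:
        top = []                      # 3+ way tie for first: equal split
    elif len(firsts) == 2:
        top = firsts                  # two tied leaders are the "top 2"
    else:
        rest = [c for i, c in enumerate(centers) if i != firsts[0]]
        second = max(rest) if rest else None
        seconds = [i for i, c in enumerate(centers)
                   if i != firsts[0] and c == second]
        # A tie for second means only the leader is paid by center count.
        top = firsts + (seconds if len(seconds) == 1 else [])
    paid = sum(PER_CENTER * centers[i] for i in top)
    others = len(centers) - len(top)
    share = Fraction(POT - paid, others) if others else Fraction(0)
    return [Fraction(PER_CENTER * c) if i in top else share
            for i, c in enumerate(centers)]
```

Feeding in 15/8/7/4 gives the small power 27.5, while 15/8/8/3 gives it 95/3 (about 31.67), which is the perverse incentive described above.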

But in general, I just think if you think your system's good, you may as well just use an even simpler system of straight points per center. I think the top two powers having only center-based incentives will basically make the games play as if they were all center-based. (Contrast with my system--or Mercy's--where the second-place power has much different scoring incentives than the top power.)

As far as Mercy's system goes, it is similar to mine in structure. Some differences are:

1) There is one huge leap, from 13 to 14 centers, and no incentive to keep gaining centers after that, as long as you're topping the board but can't solo.

2) There is no "buffer zone" at the top--a board top is a board top. So for example, in a 17/16/1 3way draw, the scores break down 90/45/45. (In my system, the 1-center power still gets 45 but the top 2 would each get 67.5.)

3) Below the 14-center threshold, draw size is the only scoring motivator.

So, I do think this system is clearly better for "long games" than it is for tournament play. But is it better for long games than the other two systems (Restitution's or mine)? I'm not sure. That's not rhetorical--I'm really not sure. Here are some ways I think the gameplay could end up being different.

For one thing, I'd reiterate my observation from earlier, that by having such a high threshold in (3), the system will almost always make draw size the primary motivator of play. As an example, let's say it's 10/9/9/3/3. In my version of two-tier, each of the top three powers can claim a good score by hurting another top power--so the 3-center powers could have interesting roles to play. In Mercy's version, since no one is close to 14, the motivation for the top powers to compete is simply not there. And it doesn't show up until someone starts getting in striking distance of 14.

But now, on the other hand, let's say it's 13/10/6/5 in an endgame. In my version of two-tier, the scores break down 66/38/38/38; if the game ended 15/11/8 (for example, if the 5-center power were whittled out), it would be 78/51/51. All three powers have gained. (The difference from regular CP is that the top power is gaining from gaining centers--if the two middle powers tried to whittle out the fourth power but freeze out the leader, the leader could step in and stop it with no scoring hit.)

But in Mercy's system, each power is currently earning 45. If the 13-center power crossed the 14-center threshold, and the draw were whittled down to three, it would be 90/45/45. In other words, once someone nears 14 centers, the motivation to keep that power under 14 is TOTALLY driving the action--the smaller powers have (seemingly) no incentive to do anything but stop the leader.

That, in turn, could lead to the exact situation Mercy and I don't want--the leader giving up centers to whittle the draw down for scoring purposes. In order to get past 14 in this scenario, the leader would have to convince one of the smaller powers that they'll be whittled out if they don't help the leader get over that threshold. In other words, I think this system would be prone to "half-solo-throwing", just as draw-based games are prone to actual solo-throwing. It would play very similarly to a draw-based system in a LOT of cases. If you like draw-based scoring, maybe this is not a drawback, but then why not just keep playing CP?

I could be wrong about my assessments, of course, and this is the point of saying I want to try play tests.

Re: Two Tier Scoring

by Restitution » Mon Jan 13, 2020 9:13 pm

I am hosting a test game for my version. If anybody wants to join, please PM me.

Re: Two Tier Scoring

by Restitution » Mon Jan 13, 2020 8:57 pm

@RoganJosh, could you critique my version of 2TS? I'm going to call it "Proportional-2TS".

The top 2 players in a game are allocated a share of the pot proportional to their share of the SCs.

All other players then receive the remaining pot split equally among them.

By "top 2 players", it is possible for there to be zero, one or two top players. If two players are tied for first, they are the top 2. If three players are tied for first, then there are zero top players. If there is one top player and 2 players tied for 2nd, then only the very top player is the top player.

Re: Two Tier Scoring

by Restitution » Mon Jan 13, 2020 8:52 pm

jay65536, when you mentioned you designed this system for "tournament games", not "cash games" - do you think mine or Mercy's system would be better for cash games? Because, when it comes to implementing a system for webdip, all games are cash games.

Re: Two Tier Scoring

by Restitution » Mon Jan 13, 2020 8:42 pm

jay65536 wrote:
Mon Jan 13, 2020 2:19 pm
I don't do Discord, so I can't join that link; but Restitution, if you get a bunch of people for a play-test, keep me apprised by PM. My original plan was to see if enough people PMed me about play-testing that I could organize a game or games myself, but maybe your way will yield better results.
I already have, I think, 5 or 6 people from Discord. I seriously recommend you join that channel as I expect discussion about the system will continue in there once the game begins.

I'll PM you the game

Re: Two Tier Scoring

by RoganJosh » Mon Jan 13, 2020 6:22 pm

Ah, let me begin with clarifying, I am only arguing against the statement

I don’t believe a scoring system should ever punish someone for trying to win

which Jay maybe only meant in a tournament context, and Mercy never actually stated. I'll leave it up to you guys to express if you agree with this statement or not. Both Mercy's and Jay's systems do make the German solo attempt risk free - and this has some unwanted side-effects for the incentives of A/I. That said, making the solo attempt risk free is essentially what solves the draw-whittling problem. I have no problem understanding that the draw-whittling problem is the more important issue to solve in an FtF tournament situation. Still, I do think there is merit in analyzing a proposed scoring system in itself.

Mercy, I was just gonna say, you changed the model to one where Germany gets a 3WD in case the solo fails. Definitely a realistic scenario, but it is not a scenario where Germany is punished for trying to solo (since the solo becomes the dominant strategy), so it is not the scenario I consider.
Mercy wrote:
Mon Jan 13, 2020 8:40 am
Let me just give you one example. Suppose one player has 17 centers. Another player has 1 or 2 centers and has successfully gotten himself in a position where he is vital for stopping the solo; a stalemate line is formed. Under DSS, the 17 center player has an incentive to retreat his units to give room to the other player to eliminate the small player, so that everyone gets more points. Do you think that is fun?
Forming the stalemate line is fun. The last part about retreating - not so much. Yes, this should be discouraged. What I am saying is that, in your system, if people play according to incentives, the game will not even get close to the stalemate line. It will end a 4WD, or maybe even a 5WD, with a board top at 14. Is that better?
Mercy wrote:
Mon Jan 13, 2020 8:40 am
But yes, under my scoring system, A/I would not attack T indeed. If they both successfully eliminate T and stop the solo, then Germany will be so big that each of them will get 1/4 of the pot anyway, the same as when they would have just played for the 4/way draw.
Exactamundo! There will be no exciting end-game along the stalemate line! Where is my adrenalin?!
Mercy wrote:
Mon Jan 13, 2020 8:40 am

My scoring system does not give the same result. A 7-way draw is worse than a 3-way draw in which Germany is a solo threat.
I realize that also here you considered the alternative model where if the solo fails then it is a 3WD, while I used a model where if the solo fails it is a 7WD. Of course, there is a range of intermediate possibilities here. So let me acknowledge that as soon as A/I have a possibility of obtaining something better than a 7WD, then they do have some incentives to cut down the draw, also in your system. Not as much incentive as in DSS, though. And in my opinion, the problem already in DSS is that A/I have too little incentive to attack T.
jay65536 wrote:
Mon Jan 13, 2020 4:34 pm
These calculations assume that T has no agency--he either sits back and waits to be eliminated, or helps stalemate G and gets a 4way. In reality, T has another option: he can actively aid G's solo bid to prevent A/I benefiting from eliminating him.
This is included in the probability p. Which G and A/I can only estimate.

As both of you point out in your last posts, and which is also the essence of the decreasing function in p we found when analyzing Jay's system: yes, Germany should play down his solo threat so that A/I let their guard down. This is the beautiful paradox of playing for a solo. If the threat is too obvious - then the defensive alliance will form too early. If you are like me, and you want the game to end with a climax around the stalemate line, then you want G to play for the solo and you want A/I to attack T. But notice the asymmetry! If you only increase G's incentives to play for a solo, then you will discourage A/I from attacking T. But if you try to encourage A/I to attack T, then that can actually increase G's incentives to go for a solo.

Re: Two Tier Scoring

by jay65536 » Mon Jan 13, 2020 4:34 pm

Mercy's further analysis reminds me of something. These calculations assume that T has no agency--he either sits back and waits to be eliminated, or helps stalemate G and gets a 4way. In reality, T has another option: he can actively aid G's solo bid to prevent A/I benefiting from eliminating him.

This is another divide between "cash game play" and tournament play. As Mercy correctly points out, a savvy G who is playing for draw size can simply weaken himself to allow T to be eliminated safely, or at least to appear to allow this before making a well-timed "comeback run". In a standalone game, Germany can make this run aggressively, so as to maximize the chances that T helps throw him the game. But in a tournament, Germany probably is smartest by erring on the side of caution, making sure that the worst result he's stuck with is the 3way even if he's also trying to win.

Re: Two Tier Scoring

by Mercy » Mon Jan 13, 2020 3:54 pm

To add to my previous post:

I was wrong to assume midway that I/A would always be successful in their effort to defeat T. Under the assumptions of RoganJosh, indeed I/A will only try to eliminate T if p < 1/9. Under the assumption that they will be successful in eliminating T even if G tries to solo, they will try to eliminate T if p < 1/4, and G will always try to solo.

I do think my alternative assumption is revealing of what can happen in a real game, though. In a real game where there is a strong Germany, an Austria/Italy alliance and a weak Turkey, my alternative assumption would benefit Germany, so it is in the German interest to make that assumption come true. He can do so by simply withdrawing his forces, making himself weaker, until the point is reached where A/I can always eliminate T even while fighting G at the same time. Interestingly, G only needs to reduce his solo chances to below 1/4 to create a situation where it is in the best interest of A/I to eliminate T, assuming that A/I will always be successful in their efforts. Maybe even more realistically, G will not immediately try to solo when A/I attack T, but instead he will wait precisely until the situation arises that it is in the best interest of A/I to continue fighting T even if G is threatening a solo - this situation does not need to arise immediately when they attack T.

This means that G can achieve a solo even while all players are behaving rationally, and T always gets screwed. That is a downside to DSS, especially the fact that T is always screwed here. I also think that situations like the one I sketched above are not uncommon to see in normal games, especially if G is a savvy player and both G and A/I want to optimize their rating.

Re: Two Tier Scoring

by jay65536 » Mon Jan 13, 2020 2:19 pm

I don't do Discord, so I can't join that link; but Restitution, if you get a bunch of people for a play-test, keep me apprised by PM. My original plan was to see if enough people PMed me about play-testing that I could organize a game or games myself, but maybe your way will yield better results.

Sorry I have not had time to write a real reply lately. I'm going to start here, and then segue into a response to RoganJosh.
foodcoats wrote:
Fri Jan 10, 2020 1:31 pm
Is any of this relevant outside of tournaments?

...

But, by the same token, it wouldn't really matter to me if such a system were implemented - I already "play DSS" in SoS games and would play DSS if I found myself in this sort of game, too. But I remember reading or hearing somewhere that one of webDip's design philosophies is simplicity and ease of entry for new players. My recommendation would be to make any tourney scoring, such as this or SoS, available to TDs but hidden from regular games.
So, first off, in case I wasn't clear, yes, I intended this as a tournament scoring system, not an alternate rating system. Since getting back into the online scene 3 years ago (I consider myself mainly a FtF player), I have noticed a shocking, zealot-level devotion to Calhamer points as a scoring system. It's totally pervasive, and I'm not interested in trying to change that.

Instead, my goals in trying to make this system include (but maybe aren't limited to) the following:

1) Trying to make a tournament system that preserves what I consider to be the good aspects of Calhamer points. If you were to play a competitive FtF tournament today, it is extremely rare that the system you'd be playing under even has draw-based scoring as a component, let alone its primary component. The reason for that, as I pointed out in my original article and Mercy seems to agree, is that draw-based endgames are just less fun.

So if draw-whittling and playing expressly to eliminate people aren't the good parts of Calhamer points, what is? Well, in my opinion, the best part of Calhamer points is the lack of emphasis on center count at the end of the game. If you make it to a 14/10/10 endgame, there's no motivation for the 14-center power to dot someone, like there is in SoS (or some other quadratic system--see Peter McNamara's comment on my article). And there's also no motivation for a 10-center power to dot the other 10-center power, like there would be in a rank-based system. In this endgame, as soon as each power is convinced they can't solo, they take the draw. (Well, or there's a 2way push, but whatever, you see what I mean, right?) As another example, if a 2-power coalition is stalemating someone on 17 centers, draw-based scoring doesn't care about the distribution of centers between the stalemating powers.
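To put rough numbers on the 14/10/10 example: the sketch below compares shares of the pot under draw-size scoring and under Sum of Squares, assuming the usual SoS definition (each survivor's share is their squared center count divided by the sum of all survivors' squared counts). The function names are illustrative only.

```python
from fractions import Fraction


def dss_shares(centers):
    """Draw-size scoring: every survivor gets an equal share of the pot."""
    n = len(centers)
    return [Fraction(1, n)] * n


def sos_shares(centers):
    """Sum of Squares, assuming share = c^2 / sum of all survivors' c^2."""
    total = sum(c * c for c in centers)
    return [Fraction(c * c, total) for c in centers]


before = [14, 10, 10]   # the endgame from the post
after = [15, 10, 9]     # the leader "dots" one of the 10-center powers

for label, board in (("before", before), ("after", after)):
    print(label, "DSS:", dss_shares(board)[0], "SoS:", sos_shares(board)[0])
```

Under SoS the leader's share rises from 196/396 (about 49.5%) to 225/406 (about 55.4%) by taking that one dot, while under draw-size scoring it stays at 1/3 either way.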

I am in the minority among FtF players, I think, but I consider those things to be positives. The negatives are that in real life, the "you take a draw as soon as everyone knows they won't solo" ideal isn't reached. Another negative is that in a tournament, you need a way to create "separation" so there can be a winner. Draw-based systems in use in FtF tournaments today do this by considering center count, which of course means that what I consider the good parts of Calhamer points are partially if not entirely diluted.

So I tried to make a scoring mechanism that preserves the lack of caring about center counts as much as possible for a tournament scenario. Hence two "tiers" of scoring instead of one. This is also the main reason why my system has the "buffer zone" at the top, that 1-center difference exception that none of the proposed alternatives are considering. But I do think, based on personal experience, that the rank-based component is important in a tournament setting also, which is why I want to have the top score for the board leader even in scenarios where that top power is too small to be a real solo threat.

(Mercy, our disagreement about how the endgame scoring will or will not drive midgame decision-making is something I feel is best settled by play-testing, not arguing. Although I will somewhat address your comment later on below.)

2) I started making my system before realizing this was a thing, but now that I know it is, it dovetails really nicely. In the above quote, the poster cites the mentality of just always playing for draw size and not caring about the scoring system in use. In my system, if you do that, but manage to achieve what you consider a good result, you aren't heavily penalized (and sometimes you are not penalized at all).

I first noticed this mentality while observing the 2019 ODC on this site. There were multiple games in which people played out 2way draws simply because they were playing for a CP result instead of a SoS result. One of the two participants in the 2way was actually costing themselves points (and potentially a berth in the semifinals) by playing the way they were used to instead of adapting to the system. I was not the only one who noticed this; Dave Maletsky (longtime FtF player and inventor of the Carnage system) played in the tournament and commented on this mentality on the forum (the post must still exist somewhere).

I guess there is an argument to be made that if people choose to be ignorant of the system then they should be penalized, but I'd counter that we primarily want people to have fun in a tournament, otherwise what's the point? So in my system, draw-based players can play the way they are used to and not risk torching their tournament prospects. As I said above, I'm a pluralist, so I don't believe every good system must have this property; but most systems in use today don't, and so I wanted to try to make one that does.

* * *

OK, now to RoganJosh, who is trying to speak my language. (I am extremely used to these kinds of calculations.) I take it I'm not the only poker player here, since I saw someone else using the term "cash games". RoganJosh's calculations, unfortunately, assume we are in a "cash game" scenario. What I mean by "cash game", if you're not familiar with the analogy, is a standalone game where, once it's done, you just move on to the next one, of which there always is one. In that scenario, a smart cash game player will play to maximize their expected utility at all times. This is the basis of the calculations RoganJosh is doing.

However, this is not true for tournaments. This is because in some scenarios where your EV is maximized by a high-risk, high-reward option, you have to pass on that option anyway because you know those spots don't come up enough for the reward to appear before the tournament is over. This is directly applicable to the 15/8/7/4 example if we remember that it is a tournament. EV considerations take a backseat to tournament considerations.

In that light, actually, the most pertinent question for Germany isn't "What is the EV of going for the solo?" Instead, the first question is "What are the tournament standings?" and the next question is "What are my chances of getting the solo if I push for it?" If we assume it's the last round and Germany can't make the top board without a solo, Germany will push for the solo no matter what his chances are. But if we assume it's the first round and everyone still has 0 points, the chances of getting stuck in a 4way matter more than the EV.

I can tell you, from personal experience playing under draw-based systems in high-level tournaments, that all top tournament players intuitively grasp this. 4way draws are absolute anathema to them. Getting stuck with one 4way draw over the course of 3 rounds will usually torpedo their chances to get a high finish in the tournament.

This gets back to what Mercy was saying about how the downsides of draw-whittling only play a role in the endgame. My experience is that this isn't true. In a midgame with 4 roughly equal powers, good players are going to be thinking ahead to make sure it'll be easy to procure a 3way, not trying to put maximum pressure on their opponents. I will grant you, though, that some of this may be due to it being a FtF environment, where real time (as opposed to game time) is a factor. Even in untimed rounds, there is always the "let's get this done so we can go [to sleep/out drinking/to dinner]" factor. But even in some recent online games I have played, people will play passively in the opening and midgame just because they know that a 3way is a good result and they don't need or want anything better. But ultimately, I'd rather try to test who's right than argue.

Re: Two Tier Scoring

by Mercy » Mon Jan 13, 2020 9:02 am

(I can't edit my previous post, but I wanted to say that I assumed (which RoganJosh did not) that A/I would always be successful in eliminating T, which changes a few things.)

Re: Two Tier Scoring

by Mercy » Mon Jan 13, 2020 8:40 am

I haven't replied in this thread for a while, but I see that I am mentioned by name and I will take the time to write some replies.
jay65536 wrote:
Tue Dec 31, 2019 4:17 pm

That's why Mercy's proposed "simplification" is actually a sizable alteration, in my view, and I see it as a shift away from one of the goals of the system. If we just say all draws are equally split until someone has 13+ centers, that means that until we reach the point where someone grows large enough, draw-whittling is still a major motivator of play.
Jay, I agree with everything from your post except the part I am quoting here; I think we are pretty much on the same line and are only disagreeing about details. That being said, this is why I don't agree with this part of your post:

If no one is yet in reach of a large number of centers, typically draw-whittling is not a major motivator of play. The major motivator of play is building up to a strong position in the endgame. Especially if you reward players for being a mere solo threat, I think in the early- and midgame, players will be far more inclined to think about how they can get in a strong position to be a solo threat, or to outright solo, than to draw-whittle. And even if they are draw-whittling at an early stage of the game, every player can still be cut from the draw.

In your blog post, you outlined some flaws of DSS and I agree with you, but I think these flaws only play a role in the endgame.

On top of this, if you can give points equally to all survivors without giving players bad incentives, I think you should do so. Otherwise, you can get some center-grabbing when the game should be over. Consider for instance a case where an EG alliance and an AI alliance are in a stalemate. Germany is the biggest power by a few centers, but arguably England is in a stronger position strategically than Germany is. Should Germany get the biggest share of the draw? Should in this case England demand that Germany throws him a center to make their center counts more equal before they hit draw, thereby prolonging the game? No, I think this is a case where the points should just be split equally, and the game should be drawn if no one wants to stab their ally.
jay65536 wrote:
Wed Jan 01, 2020 8:24 pm
teccles wrote:
Wed Jan 01, 2020 7:55 pm
Jay: Thanks for the explanation on the rationale for top scores, that makes perfect sense. I wonder whether, in practise, you don't need to worry so much about people draw-whittling. For example, your 7 centre top score is based on a rather absurd scenario, where a board leader on 7 centres has the power to ensure a 7/13/14. So it might be fine to change the top score to something simpler (with SCs/34 being a natural option), despite the theoretical issues with that.
I mean, it's not just the assumption that it could happen in the same game; it's the straight fact that I don't want a 7-center 3way to be worth more than a 7-center board top. In the set of all X-center finishes, I want board tops to be the highest possible score for all X.
I agree with teccles. I don't think board tops should necessarily receive the highest possible score; see my EG vs AI example. I agree that giving board tops the highest score when they are a solo threat gives good incentives to players, as Jay argued, and I think that's a good reason to give them the highest score; but that's the only reason I see, and it does not apply to low-center board tops.
Restitution wrote:
Tue Dec 31, 2019 5:47 pm
For the sake of my sanity, can you reformulate the system assuming that DIAS is true (which it is in webdip)?
Yeah, I think it's better to formulate the system assuming that DIAS is true. I am a proponent of DIAS only anyway, and it's the only one that's used on webDip.

For the sake of time, I am skipping over a lot of posts now and move straight to the post from RoganJosh.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
This post will only be about the solo-incentives, and not about the draw-whittling aspects. Anyway. F* me, I put way too much sambal oelek on these potatoes.

Mercy, I don't even think you contemplated about how small 1/9 is. It seems it's rather a principal question for you - that the board leader should have nothing to lose going for the solo. Or as Jay said in the article: the scoring system should not punish people for trying to win. I'm gonna try to explain why this is a bad idea.
I don't know about Jay, but for me it is not a principal question. I don't want the board leader to have nothing to lose from going for the solo just for the sake of it. It's about giving players incentives that make the game fun to play. Jay mentions a lot of reasons in his blog post why this implies giving more points to the board leader, at least when the board leader is big enough. Let me just give you one example.

Suppose one player has 17 centers. Another player has 1 or 2 centers and has successfully gotten himself in a position where he is vital for stopping the solo; a stalemate line is formed. Under DSS, the 17 center player has an incentive to retreat his units to give room to the other player to eliminate the small player, so that everyone gets more points. Do you think that is fun? If not, then this is an instance where you'd prefer the systems Jay and I are proposing; in our systems, the 17 center power has nothing to gain from the elimination of the small player. By the way, under SOS, the small player would barely get any points so I won't see that as a solution either - the small player would have too little to play for.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
First, let's just all remind ourselves:

That option A could potentially yield higher rewards, does not necessarily imply that the player has any incentives to choose option A.

It's funny because you all know this. This is Mercy's complaint about DSS: despite the solo yielding the highest reward, a player might not have incentives to play for the solo.
That's not my complaint, see above.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
But you seem to forget that this is true for all players, not only the current board leader. And you keep posting these end-of-game point distributions, with some intuitive interpretations of how they affect incentives, without actually checking what these scoring systems do to incentives. If you only care about the board leader then that is often enough but - no - you need to look at all players.

Man, these potatoes are spicy.

Let's go back to the Germany example from the article. Before math, maybe just a comment about "playing for the solo." Because, it's not only Germany that has a choice. The duo Austria/Italy also has a choice: securing the four way draw or trying for the three way draw. Now, playing for the solo is often a balancing act of seeming innocent enough so that Austria/Italy keep fighting Turkey and then - at the right moment - being aggressive.

So, I used the following model:
* If A/I decide to secure the four way draw, then the game ends G/A/I/T.
* If G settles for a three way draw, and A/I attack T, then the game ends G/A/I.
* If Germany goes for the solo, and A/I attack T, then the game ends G solo with probability p and it ends G/A/I/T with probability (1-p).
Turkey has no choice - his tactics are just a function of A/I's. Since there is no separation between 2nd and 3rd place in either system (in this scenario!), we can treat A/I as one player. Giving a nice two-by-two game. Btw, you're more than welcome to criticize the model or, even better, come up with a model of your own and analyze it.

In DSS scoring, we get a pure strategy Nash equilibrium which depends on p. Germany should play for the solo if she thinks it has at least a 1/9 probability of success.
I checked your calculations and I agree.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Likewise, Austria/Italy should try to eliminate Turkey only if they think it gives Germany at most a 1/9 shot at the solo.
That is wrong; A/I should try to eliminate T if G has at most a 1/4 shot at the solo. You can calculate this as follows. If A/I play for the 4-way draw, each gets 1/4 of the pot. Suppose p = 1/4. If they attack T, each gets 1/3 of the pot with probability 3/4, and 1/3 x 3/4 = 1/4 expected payoff. If p < 1/4, then the expected payoff is higher than the 1/4 they would get if they didn't attack Turkey.

This means that if p is between 1/9 and 1/4, we play DSS and everyone is behaving rationally, then A/I will attack T even while knowing that it gives Germany a shot at the solo. T may say 'Don't do this, you're giving Germany a shot at a solo' and A/I may reply 'We know, but it is worth it anyway'! In practice, draw whittling by weaker players is how solos sometimes happen.
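As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes, as the calculation in this reply does, that under DSS each draw member gets an equal share of the pot and that a failed German solo after A/I attack T ends in a G/A/I three-way draw. The function and variable names are made up for illustration.

```python
from fractions import Fraction

# Share of the pot each of A/I gets by securing the four-way draw under DSS.
SECURE_SHARE = Fraction(1, 4)

def ai_attack_share(p):
    """Expected DSS share for each of A/I when they attack T.

    Assumption (from the reply above): with probability p Germany solos
    and A/I get nothing; otherwise the game ends as a three-way draw.
    """
    return (1 - p) * Fraction(1, 3)

for p in (Fraction(1, 9), Fraction(1, 4), Fraction(1, 3)):
    share = ai_attack_share(p)
    print(f"p = {p}: attack gives {share}, better than securing: {share > SECURE_SHARE}")
```

At p = 1/4 the two options give the same expected share, so attacking T is strictly better only for p below 1/4.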
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
(I'm gonna pause here and just be amazed. If Germany has a 1/9 shot at the solo, then the reward of a three way draw for Austria/Italy is so small that it's not worth the risk.)
So that's wrong.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Let us now use Mercy's scoring system from his first post in this thread. That is, we remove the incentives for Germany to settle with a three way draw. That is, we remove all of Germany's innocence. The result? We get a payoff matrix that has a pure strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the four way draw.

I actually find this to be quite intuitive. Germany has the by far biggest possible reward - the solo. Now, it is a zero sum game. If you remove all the risk from Germany, then A/I get stuck with all the risk, yet their possible reward is tiny. They have no incentives whatsoever to take that risk.
Now I think I know where your mistake came from in assuming that A/I should attack T only if p < 1/9. G and A/I are not playing a zero-sum game, because the combined payoff for G/A/I is not fixed. But yes, under my scoring system, A/I would indeed not attack T. If they both successfully eliminate T and stop the solo, then Germany will be so big that each of them will get 1/4 of the pot anyway, the same as when they would have just played for the 4-way draw.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Let's do this. Let's pretend it's not only T sitting in the corner, it's T/F/R/E all sitting on one center each. So that A/I is not cutting the draw down from 4 to 3, they're cutting the draw down from 7 to 3. Well, Mercy's scoring system gives the same result. The payoff matrix has a pure strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the seven way draw. These incentives are bizarre. This is a scoring system where as soon as one player is beginning to look like a solo threat, the other players should immediately ally and force draw the game.
My scoring system does not give the same result. A 7-way draw is worse than a 3-way draw in which Germany is a solo threat. In a 3-way draw with German solo threat, A/I each get 1/4 of the pot. But anyway, even if you were right, honestly I don't think these incentives are bizarre. The incentives you mention would also happen in SOS scoring (since A/I gets the highest score from preventing Germany from getting extra centers; I am ignoring the incentive for A/I to attack each other, which also happens under SOS and which I don't like), with the difference that in my scoring system, small players get rewarded more from getting in a draw - under SOS scoring, they would get virtually nothing.
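The comparison above can be sketched in Python. This is only an illustration: it follows the description of Mercy's system given in jay65536's recap elsewhere in this thread (180-point scale, equal split unless someone tops the board with 14 or more centers), which may differ in detail from Mercy's first post. The center counts and the function name are hypothetical.

```python
from fractions import Fraction

POT = 180  # point scale used in the recap of Mercy's system

def mercy_draw_scores(centers):
    """Score each draw survivor, given their supply-center counts.

    Assumed rules (from the recap in this thread):
    - nobody above 13 centers: equal split of the pot;
    - one board-topper with 14+: topper gets 90, others 90/(N-1) each;
    - two toppers tied at 14+: each gets 45 + 45/(N-1), others 90/(N-1).
    """
    n = len(centers)
    top = max(centers)
    if top <= 13:
        return [Fraction(POT, n)] * n
    small = Fraction(90, n - 1)
    if centers.count(top) == 1:
        return [Fraction(90) if c == top else small for c in centers]
    tied = Fraction(45) + Fraction(45, n - 1)
    return [tied if c == top else small for c in centers]

# Hypothetical center counts: Germany first, then Austria and Italy.
three_way = mercy_draw_scores([15, 10, 9])
seven_way_flat = mercy_draw_scores([13, 6, 5, 4, 3, 2, 1])
seven_way_topper = mercy_draw_scores([15, 8, 7, 1, 1, 1, 1])

print("A/I share, 3-way with German topper:", three_way[1] / POT)
print("A/I share, 7-way equal split:", seven_way_flat[1] / POT)
print("A/I share, 7-way with German topper:", seven_way_topper[1] / POT)
```

Under these assumed rules, A/I each get 1/4 of the pot in the three-way draw and less than that in either version of the seven-way draw.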

I am skipping your commentary on the scoring system of Jay.
RoganJosh wrote:
Sat Jan 11, 2020 3:16 pm
Now shoot me. Game theory is not my subject, so I probably made plenty of mistakes. Let me know if you want me to post the payoff matrices and the computations, or if you want them in PM. I'll be happy to share it, just think this post is too long already. I'm hoping this will inspire you guys to do some proper analysis. After all, you now have the incentives to prove me wrong!
You've inspired me indeed. :razz:

Re: Two Tier Scoring

by RoganJosh » Sat Jan 11, 2020 3:16 pm

This post will only be about the solo-incentives, and not about the draw-whittling aspects. Anyway. F* me, I put way too much sambal oelek on these potatoes.

Mercy, I don't even think you contemplated how small 1/9 is. It seems it's rather a question of principle for you - that the board leader should have nothing to lose going for the solo. Or as Jay said in the article: the scoring system should not punish people for trying to win. I'm gonna try to explain why this is a bad idea.

First, let's just all remind ourselves:

That option A could potentially yield higher rewards, does not necessarily imply that the player has any incentives to choose option A.

It's funny because you all know this. This is Mercy's complaint about DSS: despite the solo yielding the highest reward, a player might not have incentives to play for the solo. But you seem to forget that this is true for all players, not only the current board leader. And you keep posting these end-of-game point distributions, with some intuitive interpretations of how they affect incentives, without actually checking what these scoring systems do to incentives. If you only care about the board leader then that is often enough but - no - you need to look at all players.

Man, these potatoes are spicy.

Let's go back to the Germany example from the article. Before math, maybe just a comment about "playing for the solo." Because, it's not only Germany that has a choice. The duo Austria/Italy also has a choice: securing the four way draw or try for the three way draw. Now, playing for the solo is often a balancing act of seeming innocent enough so that Austria/Italy keep fighting Turkey and then - at the right moment - be aggressive.

So, I used the following model:
* If A/I decides to secure the four way draw, then the game ends G/A/I/T.
* If G settles for a three way draw, and A/I attacks T, then the game ends G/A/I.
* If Germany goes for the solo, and A/I attack T, then the game ends G solo with probability p and it ends G/A/I/T with probability (1-p).
Turkey has no choice - his tactics are just a function of A/I's. Since there is no separation between 2nd and 3rd place in either system (in this scenario!), we can treat A/I as one player, giving a nice two-by-two game. Btw, you're more than welcome to criticize the model or, even better, come up with a model of your own and analyze it.

In DSS scoring, we get a pure strategy Nash equilibrium which depends on p. Germany should play for the solo if she thinks it has at least a 1/9 probability of success. Likewise, Austria/Italy should try to eliminate Turkey only if they think it gives Germany at most a 1/9 shot at the solo. (I'm gonna pause here and just be amazed. If Germany has a 1/9 shot at the solo, then the reward of a three way draw for Austria/Italy is so small that it's not worth the risk.)
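The 1/9 figure for Germany can be reproduced with a short Python sketch. It only covers Germany's side of the model above, in the case where A/I attack T: settling gives a G/A/I three-way draw, and going for the solo wins the whole pot with probability p or ends G/A/I/T otherwise. DSS is taken to split the pot equally among draw members; the function names are illustrative.

```python
from fractions import Fraction

# Germany's DSS share if she settles while A/I eliminate T (three-way draw).
SETTLE_SHARE = Fraction(1, 3)

def germany_solo_share(p):
    """Expected DSS share if Germany goes for the solo while A/I attack T.

    With probability p she solos (whole pot); otherwise the game ends
    as a G/A/I/T four-way draw, per the model above.
    """
    return p * 1 + (1 - p) * Fraction(1, 4)

def germany_prefers_solo(p):
    return germany_solo_share(p) > SETTLE_SHARE

for p in (Fraction(1, 10), Fraction(1, 9), Fraction(1, 8)):
    print(f"p = {p}: solo attempt gives {germany_solo_share(p)}, "
          f"prefers solo: {germany_prefers_solo(p)}")
```

At exactly p = 1/9 Germany is indifferent between the two options; above that, the solo attempt has the higher expected share.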

Let us now use Mercy's scoring system from his first post in this thread. That is, we remove the incentives for Germany to settle with a three way draw. That is, we remove all of Germany's innocence. The result? We get a payoff matrix that has a pure strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the four way draw.

I actually find this to be quite intuitive. Germany has the by far biggest possible reward - the solo. Now, it is a zero sum game. If you remove all the risk from Germany, then A/I get stuck with all the risk, yet their possible reward is tiny. They have no incentives whatsoever to take that risk.

Let's do this. Let's pretend it's not only T sitting in the corner, it's T/F/R/E all sitting on one center each. So that A/I is not cutting the draw down from 4 to 3, they're cutting the draw down from 7 to 3. Well, Mercy's scoring system gives the same result. The payoff matrix has a pure strategy Nash equilibrium, independent of p. Germany should always play for the solo. Austria/Italy should always secure the seven way draw. These incentives are bizarre. This is a scoring system where as soon as one player is beginning to look like a solo threat, the other players should immediately ally and force draw the game.

Let's continue with Jay's scoring system, from the article. (No, I'm not gonna go through all proposed modifications.) Now, in this system, the rewards depend on Germany's sc count, so we have to be a little more elaborate. Germany's alternative to playing for the solo is to simply play to optimize sc count. (I have no clue how this particular game played out, but it is easy to imagine that a Germany which doesn't even try to capture Tunis could secure Spain and/or Marseilles.) The reward of Germany's alternative play depends on how many centers she can secure. So, let's consider three scenarios.
J15: Germany could only have secured 15 centers - equalling the failed solo attempt.
J16: Germany could have secured 16 centers, if she had not tried to solo.
J17: Germany could have secured 17 centers, if she had not tried to solo.

In J15, Jay's scoring system gives the same incentives as Mercy's. That is, Germany should always go for the solo, and A/I should always secure the four way draw. This is a general thing in Jay's scoring system: if going for the solo and optimizing center count is the same for the top player, then the other players should always secure the draw.

J16 and J17 are similar to each other, with slightly different figures.
- If p is sufficiently small (in J16: p < 1/17 and in J17: p < 2/17), then we get a pure strategy Nash equilibrium for G to back off and A/I to eliminate T.
- If p is larger, then we get a mixed Nash equilibrium. Let F(p) denote the probability by which G should go for the solo. Interestingly enough, F(p) is actually decreasing in p! Yes, read that again. If the probability for G to succeed with the solo is large enough (in J16 p > 8/17 and in J17 p > 15/34), then the mixed strategy favors that G does not go for the solo! (Btw, both of these numbers are smaller than 1/2.) What happens is the following. If attacking T gives Germany a solo probability of at least 1/2, then A/I have too little incentive to attack T, diminishing the possibility of the solo, making it a better strategy for Germany to ignore the solo and aim at 16 or 17 centers.

So, to summarize. Again, p denotes the probability that G succeeds with the solo if she goes for it and A/I tries to eliminate T. Let's assume players are "smart" and understand the incentives of each scoring system. What are Germany's incentives?
DSS: G should go for the solo if .111 < p
Mercy: G should go for the solo, to no avail, as this is a settled 4WD.
Jay15: G should go for the solo, to no avail, as this is a settled 4WD.
Jay16: G should go for the solo if .058 < p < .470
Jay17: G should go for the solo if .117 < p < .441
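The decimals in this summary are the fractions quoted earlier in the post, cut off after three places. A small Python check of that conversion (the dictionary labels are just for illustration):

```python
from fractions import Fraction

# Thresholds as fractions, taken from the text above.
thresholds = {
    "DSS lower": Fraction(1, 9),
    "Jay16 lower": Fraction(1, 17),
    "Jay16 upper": Fraction(8, 17),
    "Jay17 lower": Fraction(2, 17),
    "Jay17 upper": Fraction(15, 34),
}

def truncated_thousandths(x):
    """Return x cut off (not rounded) after three decimal places, as an int."""
    return (x * 1000) // 1

for label, frac in thresholds.items():
    print(f"{label}: {frac} is about {float(frac):.4f}")
```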

I checked for comparison, and SoS (which is the by far most misunderstood scoring system) is very similar to Jay's scoring system, in this scenario.

Now shoot me. Game theory is not my subject, so I probably made plenty of mistakes. Let me know if you want me to post the payoff matrices and the computations, or if you want them in PM. I'll be happy to share it, just think this post is too long already. I'm hoping this will inspire you guys to do some proper analysis. After all, you now have the incentives to prove me wrong!

Re: Two Tier Scoring

by Restitution » Fri Jan 10, 2020 9:58 pm

tr1285 wrote:
Fri Jan 10, 2020 8:47 pm
Two-tier scoring is maybe not my first choice for an alternative, but at least it's an alternative.
The reason 2TS is so good is because it elegantly completely removes the incentive for the top player to cut the draw, replacing it entirely with an incentive to nab centers. When I figured this out it blew my mind.

Re: Two Tier Scoring

by tr1285 » Fri Jan 10, 2020 8:47 pm

I mainly just think there should be a more moderate scoring system between the two extremes we have today. On the one side DSS gives the same points to all draw participants. The number of centers doesn't matter at all. On the other extreme, SoS rewards large powers and penalizes small powers. In many cases, SoS can give over 50% of the pot to a leader who has failed to solo, and it could go as high as 80+% with certain center distributions. If you control 3 centers or fewer you likely don't have enough skin in the game to justify putting much effort in any more.
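To put rough numbers on the two extremes, here is a small Python sketch. It assumes SoS means sum-of-squares (each draw member's share is their squared center count divided by the sum of all squared counts) and that DSS splits the pot equally. The center distributions are hypothetical examples, not taken from any real game.

```python
def sos_shares(centers):
    """Sum-of-squares share of the pot for each draw member."""
    total = sum(c * c for c in centers)
    return [c * c / total for c in centers]

def dss_shares(centers):
    """Draw-size scoring: equal share for every draw member."""
    return [1 / len(centers)] * len(centers)

# Hypothetical draws in which the leader stalled at 17 centers.
three_way = [17, 9, 8]
seven_way = [17, 3, 3, 3, 3, 3, 2]

for centers in (three_way, seven_way):
    leader_sos = sos_shares(centers)[0]
    leader_dss = dss_shares(centers)[0]
    print(f"{centers}: leader gets {leader_sos:.1%} under SoS, "
          f"{leader_dss:.1%} under DSS")
```

With these made-up distributions the 17-center leader takes about two thirds of the pot in the three-way draw and over 80% in the seven-way draw under SoS.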

I don't know the whole story of why the PPSC system was discontinued on this site, but it seems the main problem was it awarded points to losers. So why not just modify it to winner-take-all and in the case of draws, award points proportionally?

Two-tier scoring is maybe not my first choice for an alternative, but at least it's an alternative.

Re: Two Tier Scoring

by Restitution » Fri Jan 10, 2020 6:04 pm

https://discord.gg/qRH5Du

I made a new discord for anybody who wants to test 2ts.

Re: Two Tier Scoring

by foodcoats » Fri Jan 10, 2020 4:02 pm

Claesar wrote:
Fri Jan 10, 2020 2:16 pm
foodcoats wrote:
Fri Jan 10, 2020 1:31 pm
:evil: Is any of this relevant outside of tournaments? Isn't GR better than points when it comes to the "eternal" ranking of cash/pick up games? :evil:
...
GR also takes the scoring system into account. If you care about your rating, you'd do well to play accordingly.
Ahh, okay, I didn't know that! I am clearly a luddite. I thought GR was a sort of "pure Elo" system (not that I really know what "pure Elo" means but... in any event...). Thank you for the clarification. :)

Re: Two Tier Scoring

by Octavious » Fri Jan 10, 2020 2:47 pm

Claesar wrote:
Fri Jan 10, 2020 2:16 pm
GR also takes the scoring system into account. If you care about your rating, you'd do well to play accordingly.
You'd do even better to not care about your rating :)

Re: Two Tier Scoring

by Claesar » Fri Jan 10, 2020 2:16 pm

foodcoats wrote:
Fri Jan 10, 2020 1:31 pm
:evil: Is any of this relevant outside of tournaments? Isn't GR better than points when it comes to the "eternal" ranking of cash/pick up games? :evil:
...
GR also takes the scoring system into account. If you care about your rating, you'd do well to play accordingly.

Re: Two Tier Scoring

by foodcoats » Fri Jan 10, 2020 1:31 pm

:evil: Is any of this relevant outside of tournaments? Isn't GR better than points when it comes to the "eternal" ranking of cash/pick up games? :evil:

I totally get why you'd want to develop scoring systems that incentivize a certain kind of play (or that avoid incentivizing certain kinds of "bad" play) in a tournament environment where you have fixed parameters (# of games, specified player pool, time limit, etc.), but I'm not sure it makes sense to implement such a scoring system for non-tournament play. :points: Diddle coins :points: are fun and all but the only true evaluation of Diplomacy play is solo, draw, survival or elimination. Tournaments necessarily must compromise their qualitative purity to establish quantitative differences between the overwhelming proportion of draws. But webDip has no such limitations and therefore no need to profane itself thusly.

But, by the same token, it wouldn't really matter to me if such a system were implemented - I already "play DSS" in SoS games and would play DSS if I found myself in this sort of game, too. But I remember reading or hearing somewhere that one of webDip's design philosophies is simplicity and ease of entry for new players. My recommendation would be to make any tourney scoring, such as this or SoS, available to TDs but hidden from regular games.
