Okay, this thread should never have been brought to my attention. I’ve now spent far more time than is worthwhile thinking about what this whole mess actually entails, and the direct result is what I would like to call:
#anchor(The Ultimate Highlander Match Points Essay)
I’d like to start by asking readers to hold onto a thought for me: nothing I’m about to discuss is an obviously correct solution. Whatever we settle on will require compromises. So I’d like to lead with a suggested course of action, then work my way around to it by establishing all the logical and mathematical groundwork before we start discussing concerns of convenience or practicality. Here’s what I think we should do:
We should hold a poll asking players to rate the importance they place on wins/losses versus rounds won, then use this as a basis to adjust our ranking system accordingly.
Alright, now hold onto that thought.
°fa-info°(Note that from here on, when capitalized, “Match Points”, “Win/Loss”, and “Rounds Won” all refer to their respective systems of ranking teams, and lowercase uses of the words just mean the things themselves) I had an epiphany that Match Points as a system has only one genuinely unique use, and that use is explicitly to strike a compromise between Win/Loss and Rounds Won. Here’s what I mean: when used to represent either pure Win/Loss or Rounds Won, Match Points is just a messier version of both. Why would you multiply a 5W-2L record by a potential 9 match points and display the team as having 45MP, when you could just use the 5 wins and 2 losses which is far more immediately parsable?
vs.
Or why would you allot those 9 match points proportionally to rounds won, whether using decimals or fractions, when simply displaying the percentage of rounds won gets the point across much more directly?
vs.
In both these instances, Match Points mainly just obfuscates the information it’s trying to convey. As the second pair of images hints, there is merit to using Match Points to convey rounds won as base information in a league format, but let’s get to that point later. I want to discuss each system on its own first, then draw small conclusions that we can use to reach better solutions. If you have no interest in esoteric mathematical speculation, skip ahead to Match Points to see a direct proposal that I think looks quite nice, or skip to Practical Considerations and take it for granted that I’ve proven all the things I assume.
Win/Loss
Win/Loss is super straightforward. We’re not gonna spend too much time on it because it’s basically impossible to misunderstand it (though some people in this community might manage anyway). In discussing why 6s uses W/L as its ranking system, Mothership said:
…as you play more games throughout the season, how much Rounds matter vs Win/Loss decreases exponentially.
A more statistically precise way of phrasing this is that Win/Loss is an unreliable measure of skill in smaller sample sizes, but overtakes Rounds Won in accuracy as the sample size increases. With 6s playing twice as many matches as Highlander in a regular season, its data set is twice as large, and W/L handily establishes itself as the preferred ranking system. This seems to be a point of general agreement among the Highlander community as well, given that W/L is consistently voted against in polls, with players often stating they feel inadequately represented by it when the number of matches is so low.
Let’s draw a couple conclusions then:
- Win/Loss is insufficient given the low sample size of Highlander matches in regular season.
- Since each match in Highlander is twice as statistically valuable as a 6s counterpart, HL players want the rewards from each match to be more accurate and representative of their performance relative to their opponent.
A quick detour before we move onto Rounds Won: this sample size problem is well understood and modeled in the statistical world. One solution offered to counteract the unreliability of low sample sizes is Bayesian averaging, which calculates a weighted average based on a predetermined set average and a confidence rating for the current data set. It essentially compares the data set against the history of data sets for the topic, and guesses what the “real” average would roughly be. But tools like Bayesian averages shouldn’t be the first thing we reach for when we already have a second available set of data to use to reach a more accurate result:…
Rounds Won
So let’s jump straight to the direct opposition to Win/Loss and explore what the limitations of this system are. On its face it seems like the most unbiased possible system to use. What could possibly be wrong with rewarding teams in perfect proportion to their in-game performance? It’s easy enough to demonstrate potential problems with a simple scenario.
Team Alice has a 6W-1L record, but only a 53.33% RW record over the regular season. Meanwhile, Team Bob has a 2W-5L record while sitting at 55.56% RW. This result is possible with this spread of matches:
Team Alice | Team Bob
4-3 w      | 4-0 w
4-3 w      | 4-0 w
0-4 l      | 3-4 l
2-1 w      | 1-2 l
2-1 w      | 1-2 l
2-1 w      | 1-2 l
2-1 w      | 1-2 l
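If you’d rather not take my arithmetic on faith, here’s a minimal sketch that recomputes both records from the table above. The names (`ALICE`, `BOB`, `summarize`) are just ones I made up for the example.

```python
# Each tuple is (rounds won, rounds lost) for one match, copied from the table above.
ALICE = [(4, 3), (4, 3), (0, 4), (2, 1), (2, 1), (2, 1), (2, 1)]
BOB = [(4, 0), (4, 0), (3, 4), (1, 2), (1, 2), (1, 2), (1, 2)]


def summarize(matches):
    """Return (match wins, match losses, percentage of rounds won)."""
    wins = sum(1 for won, lost in matches if won > lost)
    losses = len(matches) - wins
    rounds_won = sum(won for won, _ in matches)
    rounds_played = sum(won + lost for won, lost in matches)
    return wins, losses, 100 * rounds_won / rounds_played


for name, matches in (("Alice", ALICE), ("Bob", BOB)):
    w, l, pct = summarize(matches)
    print(f"{name}: {w}W-{l}L, {pct:.2f}% RW")
```

Alice comes out at 16 of 30 rounds and Bob at 15 of 27, which is where the inversion comes from.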
Intuition tells us that Team Bob being better than Team Alice is a ridiculous idea: it looks like Bob is prone to choking hard and only seems to perform well on KOTH occasionally, while Alice has very consistent wins across multiple gamemodes and only had a single poor showing. Meanwhile, Team Charlie is somehow leading over both of them with 3W-4L and 59.26% RW (trust that I plugged the numbers in right). For good measure, let’s throw in Team Dave with 4W-3L and 72.73% RW. Life is terribly unfair.
I’m obviously deliberately crafting these teams to be as objectionable as possible, and this isn’t representative of what the majority experience of Rounds Won would be like. Actually, I think if you took a peek at the current Invite 6s standings, looking only at the % Won column would give you a pretty accurate idea of their skill relative to each other, though this is obviously a statement of opinion. For the most part, I think Rounds Won would be a quite alright way of measuring performance. But it doesn’t account for the fact that in our hypothetical case, we know full well that Alice beats out almost all of their division in head-to-head matches. What I’m trying to say here is - most of the time, Rounds Won is pretty good, until it’s very, very wrong.
Rounds Won’s main failings actually lie precisely where high degrees of confidence in the rankings are most desired: when teams and matches are close. We almost universally agree that the better teams are the ones that win more, but Rounds Won has the potential to directly undermine that view.
There are also some subtle mathematical issues underneath this whole assumption, and they have to do with the fact that we’re counting rounds at all. Stopwatch maps have a possible play range of 2-3 rounds, while KOTH maps have a range from 4-7. This creates problems in both relative and absolute terms. In relative terms, you can only win a maximum of 33.33% of the rounds in a Stopwatch loss, while you can win up to 42.86% of the rounds while losing KOTH, which turns KOTH into a much more important battleground for precious ranking. In absolute terms, trying to come up with a salient mathematical comparison for the impact of winning 2/3 Stopwatch rounds versus 4/7 KOTH rounds evolved into a headache so massive that even I wasn’t willing to push through and figure it out. All that matters is that you understand that KOTH and Stopwatch aren’t the same, not even if we assign them equal values. KOTH should be understood as the difference maker in ranking, due to having greater fidelity in demonstrating relative skill levels of teams.
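To make the relative side of that concrete, here’s a rough sketch that lists every possible final score per gamemode and the share of rounds the losing team walks away with. It assumes Stopwatch is first to 2 and KOTH is first to 4, which is what the round ranges above imply; `ROUNDS_TO_WIN` and `loser_shares` are names invented for the example.

```python
from fractions import Fraction

# Assumption: rounds needed to win a match, inferred from the 2-3 and 4-7 ranges.
ROUNDS_TO_WIN = {"Stopwatch": 2, "KOTH": 4}


def loser_shares(rounds_to_win):
    """Share of rounds the losing team takes for every possible final score."""
    return {
        f"{rounds_to_win}-{lost}": Fraction(lost, rounds_to_win + lost)
        for lost in range(rounds_to_win)
    }


for mode, target in ROUNDS_TO_WIN.items():
    shares = loser_shares(target)
    best = max(shares.values())
    print(mode, {score: f"{float(s):.2%}" for score, s in shares.items()},
          f"max for loser: {float(best):.2%}")
```

Stopwatch only has two possible outcomes for the loser (nothing or a third), while KOTH has four, which is the “greater fidelity” I’m talking about.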
Some conclusions I draw from all this:
- Rounds Won tends to be a stable measurement of relative skill in a division, but breaks down in nuanced situations and can sometimes dramatically disregard the intuitive preference we have for win/loss.
- Once we start factoring individual rounds, KOTH emerges as a critical source of round wins for closer match-ups, potentially granting a greater proportion of %RW or match points. Opinion: this slight imbalance of importance towards KOTH should be noted and preserved as best as possible, due to there only being 3 KOTH maps to 4 Stopwatch ones.
It’s also worth observing that there’s already an element of win/loss in rounds won: the winning team gets more rounds, after all. If pure Win/Loss is too chaotic for Highlander’s small sample size of matches, and Rounds Won can result in questionable seeding surrounding close divisions, how then do we strike a balance between the two?
Match Points
The answer may be to quite straightforwardly balance between the two. Awarding from a pool of match points per match means we can distribute them however we see fit, in any arbitrary fashion we please.
I’ll start by demonstrating a simple 80/20 RW/WL bias. We’ll use the current 9MP per match. All it takes is to calculate the exact MP given by Rounds Won, multiply it by 80%, then add 1.8 to the winning team (20% of 9MP). This gives us the following table of potential MP results:
Stopwatch       | KOTH
2-0 = 9-0       | 4-0 = 9-0
2-1 = 6.6-2.4   | 4-1 = 7.56-1.44
                | 4-2 = 6.6-2.4
                | 4-3 = 5.91-3.09
Honestly, I quite like this already, and all I would change is to round everything to the nearest .5MP for both practical and aesthetic reasons, which conveniently results in deviations of <=.1 in absolute terms and <=1.5% in relative terms. The advantage of this is that at its core, it’s mostly proportional, and it mathematically operates on both gamemodes identically. Including a built-in W/L bias also avoids potential issues with MP rounding errors like the one addressed this season.
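For anyone who wants to poke at the numbers, this is a minimal sketch of the 80/20 split with the .5 rounding applied afterwards. The function names are mine, and I’m assuming exact ties round up (none of the six possible scores actually land on a tie, so it doesn’t matter here). Fractions are used so the arithmetic stays exact.

```python
import math
from fractions import Fraction

MATCH_POINTS = 9            # current pool per match
WL_SHARE = Fraction(1, 5)   # 20% of the pool goes to the winner outright


def blended_points(won: int, lost: int):
    """Exact (winner MP, loser MP) for an 80/20 RW/WL split."""
    total = won + lost
    rw_pool = MATCH_POINTS * (1 - WL_SHARE)
    winner = rw_pool * Fraction(won, total) + MATCH_POINTS * WL_SHARE
    loser = rw_pool * Fraction(lost, total)
    return winner, loser


def round_to_half(value: Fraction) -> Fraction:
    """Round to the nearest 0.5 (assumption: exact ties go up)."""
    return Fraction(math.floor(value * 2 + Fraction(1, 2)), 2)


SCORES = [(2, 0), (2, 1), (4, 0), (4, 1), (4, 2), (4, 3)]
for won, lost in SCORES:
    exact_w, exact_l = blended_points(won, lost)
    half_w, half_l = round_to_half(exact_w), round_to_half(exact_l)
    print(f"{won}-{lost}: exact {float(exact_w):.2f}-{float(exact_l):.2f}, "
          f"rounded {float(half_w):.1f}-{float(half_l):.1f}")
```

The rounded pairs still add up to 9 for every score, so no points appear or vanish because of the rounding.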
Small detour to clarify my earlier math on this topic where I framed this bias as “errors”: while calling them errors certainly makes it seem like I’m implying rounds won to be the One True System, it instead has everything to do with the fact that under the current system, Stopwatch and KOTH do not bias towards W/L in equal measure. Stopwatch is currently identical to RW, while KOTH scales from pure RW to a 44.44% W/L bias. As I alluded to in the last paragraph of that post, the consequences of this are much greater than they first seem, as losing KOTH maps, and only losing KOTH maps, results in a team receiving less than a proportional share of points, while losing Stopwatch awards you full points for your performance. This directly devalues the importance of KOTH in the regular season, and is the sole reason why I changed my mind against the currently implemented system. It single-handedly reverses the capacity for KOTH to be the gamemode where you recoup much-needed points against a team you’re closely matched with and turns it into a mode where every hard-fought round you win is valued less. °fa-info°(This problem came about by applying an absolute measure to a relative question. By assigning winners 6 points minimum regardless of how the game was won, RGL ignored the basic fact that Stopwatch and KOTH are not at all the same. Their win conditions are determined differently and their ranges of rounds are markedly different.)
There is, of course, no reason why such a bias can’t be formulated differently. It could be graded on a curve such that the closer the match score, the further we skew MP away from a W/L bias and approach a purer RW result. As far as Match Points is concerned, the sky’s the limit on how you could choose to award points.
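Purely to illustrate what “graded on a curve” could look like, here’s a hypothetical sketch where the W/L bonus shrinks linearly as the score gets closer. The curve itself (bonus share = 20% times the margin of victory as a fraction of rounds played) is an assumption I picked for the example, not a proposal, and any other curve could be dropped in its place.

```python
from fractions import Fraction

MATCH_POINTS = 9
MAX_WL_SHARE = Fraction(1, 5)  # hypothetical ceiling, reached only in a sweep


def curved_points(won: int, lost: int):
    """(winner MP, loser MP) with a W/L bonus that shrinks as the score gets closer.

    Illustrative assumption: bonus share = MAX_WL_SHARE * (won - lost) / (won + lost).
    """
    total = won + lost
    wl_share = MAX_WL_SHARE * Fraction(won - lost, total)
    rw_pool = MATCH_POINTS * (1 - wl_share)
    winner = rw_pool * Fraction(won, total) + MATCH_POINTS * wl_share
    loser = rw_pool * Fraction(lost, total)
    return winner, loser


for won, lost in [(2, 1), (4, 1), (4, 2), (4, 3)]:
    winner, loser = curved_points(won, lost)
    print(f"{won}-{lost}: {float(winner):.2f}-{float(loser):.2f}")
```

Under this particular curve a 4-3 lands much closer to the pure RW split than it does with a flat 20% bonus, which is the behavior described above.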
- Match Points allows for the possibility of accounting for both Win/Loss and Rounds Won to any degree so desired. But this leaves a great question unanswered: to what degree should we account for W/L or RW?
Practical Considerations
Decimals and Fractions
°fa-info°(Coders, bear with me as I’m going to butcher your terminology.) When it comes to implementing these systems in RGL, we avoid repeating decimal numbers like the plague because they inevitably result in rounding errors when stored as values inside a computer. This is the reason a fraction like 1/3 caused such an issue with playoffs in prior seasons, as rounding it to 0.33 allowed for a 0.01 error that cost some teams in their final seeding. This is also the reason we’re having trouble figuring out how to divide KOTH points, because 1/7 converts into an obscene decimal to have to work with. One easy solution I can think of is to store numbers to four decimal places, but round and display them to two. 1/3 would be stored and operated on as 0.3333, but displayed as 0.33. This alone would have solved the issue with the rounding error, as subsequently multiplying 0.3333 by 3 results in 0.9999, which we would round to 1 after the two decimal place rounding operation. Whole number achieved, panic averted. This also works with divisions by 7: 1/7 would be stored as 0.1429, which multiplies by 7 into 1.0003, which again rounds very neatly to 1. So would 3/7 and 4/7: 0.4286 + 0.5714 = 1.
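Here’s a minimal sketch of that store-to-four, show-to-two idea using Python’s `decimal` module. I have no idea how RGL actually stores these values, so treat `store`, `show`, and the half-up rounding mode as assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

STORED = Decimal("0.0001")   # precision kept in storage
SHOWN = Decimal("0.01")      # precision shown on the standings page


def store(numerator: int, denominator: int) -> Decimal:
    """Divide, then keep four decimal places."""
    return (Decimal(numerator) / Decimal(denominator)).quantize(STORED, rounding=ROUND_HALF_UP)


def show(value: Decimal) -> Decimal:
    """Round a stored value to two decimal places for display."""
    return value.quantize(SHOWN, rounding=ROUND_HALF_UP)


third = store(1, 3)
seventh = store(1, 7)
print(third, show(third * 3))                # stored third, then three of them
print(seventh, show(seventh * 7))            # stored seventh, then seven of them
print(show(store(3, 7) + store(4, 7)))       # the two sides of a 4-3 KOTH
```

All three displayed results come out as a clean 1.00, matching the arithmetic in the paragraph above.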
This is a cheap and effective solution to rounding errors which would enable any kind of fractional/decimal calculation we could desire for our match points without impacting seeding, but alas, mathematical accuracy isn’t the only consideration.
Aesthetics
I am deadly serious when I say the aesthetic experience of reading and interpreting the numbers used for team rankings is important as well. Even I don’t want to check out the division standings and try to figure out what the hell having 32.74 MP actually means. I especially don’t want to use 210 MP for each match and need to deal with match points in the hundreds for every single team. Both Win/Loss and % Rounds Won have the advantage of always being immediately comprehensible, which enables interpretation and discussion of results and encourages healthy league dialogue. Likewise, I can’t abide by a system that creates MP rankings that, while you can still tell at a glance who’s above who, make it difficult for you to determine what’s actually happening underneath, and what’s being valued.
This is the reason for the .5 rounding in my hypothetical 80/20 RW/W-L system. Let’s revisit Teams Alice, Bob, and Charlie, and recalculate where they would end up with this new system, including .5 rounding.
Applying my Match Points idea
- Alice = 38MP (6W-1L, 53.33%RW)
- Charlie = 34MP (3W-4L, 59.26%RW)
- Bob = 31MP (2W-5L, 55.56%RW)
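For transparency, this is a small sketch of how the Alice and Bob totals fall out of the 80/20 split with .5 rounding, using the match scores from the Rounds Won example. Charlie’s individual match scores were never listed, so that total can’t be recomputed here. The names and the round-ties-up choice are assumptions for the example.

```python
import math
from fractions import Fraction

MATCH_POINTS = 9
WL_SHARE = Fraction(1, 5)

# (rounds won, rounds lost) per match, from the Rounds Won example.
SEASONS = {
    "Alice": [(4, 3), (4, 3), (0, 4), (2, 1), (2, 1), (2, 1), (2, 1)],
    "Bob": [(4, 0), (4, 0), (3, 4), (1, 2), (1, 2), (1, 2), (1, 2)],
}


def team_points(won: int, lost: int) -> Fraction:
    """One team's MP for a match under 80/20 RW/WL, rounded to the nearest 0.5."""
    exact = MATCH_POINTS * (1 - WL_SHARE) * Fraction(won, won + lost)
    if won > lost:
        exact += MATCH_POINTS * WL_SHARE  # flat winner bonus
    return Fraction(math.floor(exact * 2 + Fraction(1, 2)), 2)


totals = {team: sum(team_points(w, l) for w, l in matches)
          for team, matches in SEASONS.items()}
for team, mp in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {float(mp):g} MP")
```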
I don’t know about you, but that looks like quite a respectable measure of consistency to me. Lemme calculate it for the current Invite HL standings too.
Current:
Adjusted:
Most of it’s pretty similar, but the biggest difference is that the gap between the fog and somebody help has closed by a total of 2.5, which is just enough to make that playoffs spot that much more competitive. The only inference I’m willing to draw from this is that it looks to me like the impact of win/loss is slightly inflated in the current rankings, with the adjusted division rankings tightening it up by just that little bit. This is more or less what I predicted for the current system in my analysis of it, and even after building in the 20% W/L bias, it still compensates nicely for the undervaluing of KOTH.
Conclusion and Replies
I didn’t directly state as much in any of the above sections, but it seems quite clear to me that given the difference in how Stopwatch and KOTH games are valued, the system needs to be adjusted to account for this. It’s also clear from surrounding discussion that while the community largely wants to base match points on rounds won, the precise degree to which to do that has never been discussed. Indeed, the idea that there was a degree to consider at all had never even been put forward until recently! This is a simple enough question to get an answer to.
Put out a poll that includes a question asking respondents to rate the importance of rounds won vs. win/loss on a scale from 0-10, 0 meaning pure RW and 10 meaning pure W/L.
It won’t be perfect, but it’ll get us a pretty good idea of how the community views the relative importance of these two.
I threw it together in a hurry, but I quite like my proposed 80RW/20WL system. It outputs similar results to the current system while eliminating inconsistencies and clearing edge cases well.
Here are some responses to posts made in the time that I’ve been planning and writing this essay.
There is always the option to ditch rounds in favor of halfs for koth. First to 2 wins the half., best of 3 halfs.
This strikes me as quite an arbitrary solution to the issue in question. Considering that a won round can already include a variety of performances from a clean 3:00-0:00 sweep to a close 0:00-0:00 double overtime, a single round is already a winner-takes-all game, and to further group these rounds into halves is to make the winner take all of the winner-takes-alls. It can turn a 4-2 round result into a false 2-0 sweep. It “solves” the mathematical problem, sure, but compounds the problem of trying to accurately represent performance and consistency.
@vibeisveryo
You are looking at this from the viewpoint of measuring its deviation (error) from pure proportionality, which you believe is an ideal to strive towards.
Already addressed this somewhat with my clarification on that math post, but the point isn’t whether or not I think pure proportionality is the ideal to strive towards, but rather the inconsistency in how rounds are measured and valued between gamemodes, which creates a serious imbalance, and is especially strange when taking into consideration that we’re not even sure how exactly we’re valuing win/loss vs. rounds won.
I highly value the teams in playoffs truly being the best teams in the div - I think this is more important than whatever system we use to get there, and I don’t think pure win-loss would do a good job. So I support match points, BUT: I still value winning matches, and so I support the current match point system where 2/3 of the points are given to the match winner, but the loser still has a shot at some value out of the match beyond for tiebreaks.
I believe the methods I’ve proposed provide adequate support for precisely this principle while also mitigating inconsistencies.
The reason I don’t personally believe a poll is necessary is because I’m not convinced the system you propose is a good enough alternative to the current system to be pitted against it.
As I’ve stated before, I think the fact that the current system inconsistently values Stopwatch and KOTH games to the extent that it does (44%!!) should have been immediately disqualifying on principle, and polling was just the most obvious way to clarify exactly how we should be performing the valuation.
A clear example of this, in the context of RGL, is that before the rotation in HL was established, it seemed that every season, people voted for either Cascade or Lakeside, and then at the end of that season poll, hit the “fuck go back” button.
RE: this and the general notion of polls not being great ways to decide things, honestly, I can do nothing but point the finger at RGL for this problem. We’ve managed to agree that the previous polls on this topic have not been sufficiently well worded to achieve insightful results, but I also distinctly recall a former admin (who I will not name) bragging about having rigged the methodology of a poll to achieve the specific result he desired for the season. Call it hearsay if you want, but it still speaks to the obvious truth that it’s child’s play to lie or manipulate people using statistics. Knowing that RGL can make mistakes, and knowing that admins inside RGL could even be actively subverting the process, RGL ought to respect the difficulty and seriousness of asking the right questions. Knowing what the right questions even are is not a small task. Please take it seriously and don’t make light of it.
As Inq later alluded to, the fact that people voted for Cascade or Lakeside one season, then voted to swap them out again the next, doesn’t mean that the community suddenly realized they had made a horrible mistake and tried to undo it. Interpreting survey results is just as difficult a task as planning a fair survey, and things are rarely as obvious as they seem (think the old adage correlation ≠ causation). When directly asked later, the community indicated that they’re in favor of rotating maps out even if they don’t necessarily strongly prefer the maps rotated to, just for the sake of variety and novelty. Knowing this now, would you rule out the possibility that the previous polls that resulted in switching out Cascade and Lakeside were just the community using the only mechanism they could find for this exact idea?
RGL asks the questions, and RGL controls what the possible answers are. So it seems to me that RGL needs to step up their statistical game if they intend to do things this way.
Someone suggest a publisher for this novel.
shoutout to Xenagos upvoting this post one minute after it went up, when he couldn’t possibly have managed to read a single paragraph of it