Highlander Match Point System
@vibeisveryo I have not read a word of anything that anyone has written in this thread.
@Flare hey I was just appreciating the work you put into your post. I was like damn, this is in fact an essay.
@Flare I have a few notes on some things you said.
Note: See the edit below my post, but this first paragraph is what I originally wrote.
Firstly, I know Flare shouldn’t be the one to explain what he mentioned about a former admin “rigging” a poll, so I’ll elaborate. What Flare refers to is when I wrote the Cascade vs Lakeside poll for season 7 (maybe season 6 or season 8? I don’t remember). I came out of the encounter saying that I “rigged” the poll, but this was mostly a joke because I knew what the results of it were going to be before it was even published. It wasn’t actually rigged in any sense, as there was no misconduct in counting it, there was no fraud, and the wording wasn’t misleading. The difference was that I worded the poll to explain directly the consequences of the results of the poll, rather than polling something more general. Precisely what happened was that Lakeside was favored over Cascade by some small margin like a few percent, but keeping Lakeside in the rotation would mean playing the same map pool twice in a row (this was the precursor to the later poll that introduced the seasonal rotation). I knew that, even though people preferred Lakeside to Cascade in direct comparison, most people didn’t want to play the same maps. Hence, I wrote the poll to indicate that the map pool would be identical if we kept Lakeside in. I believe a vast majority, something like 70% of people, voted to not play Lakeside. This was obvious to me before the poll went out because I knew that most people cared more about rotation than about what map we played. In this sense, I still believe that I did my job to extract the correct information, and when I told Flare that I “rigged” the poll, it was not meant to be taken literally. He, of course, never heard the details of the encounter, so I don’t blame him for citing it in his essay.
Regarding your math on percentage of rounds won, I’ll remind people of something that I know that Flare knows, but he fails to mention. My system to assign match points is not equivalent to computing rankings based on percentage of rounds won for the entire season. I argue that match points for a given match should be divided according to rounds won, which accounts for the fact that there are more rounds in a KOTH match than a stopwatch map. Under my system, where we assign 9 match points to any given match, you’d see the following results for the teams Flare uses.
Team Alice ends up with 34.3 match points (rounded) and Team Bob will have 42.75 match points. This is a much more significant difference than Flare’s purely percentage of rounds won calculation. Now, I’ll refer back to my earlier post responding to Vibe about the value of teams in playoffs. Crudely, if the Team Alice loss was to the top team, the rest of their wins are basically worth nothing because once they finally get to grand finals, they’ll just get rolled by the top team. On the other hand, if Team Bob makes it to grand finals, they’ll most likely lose, but are also significantly more likely than Team Alice to win the entire division. While upsets do happen, it is almost never the case that these upsets are complete rolls, which is why it’s safe to assume that Team Alice got fisted by the top team in their 0-4 loss.
I wanted to put some math to this, but it’s not worth it because I’d have to do more speculation than I’d like about what the rest of the match results were. It’s more important to note that Flare’s example is a monster case that shows the ugliest results of the system, not what it will usually do. It’s still the case that dividing matches by rounds won as I suggested would be a fair way to divide points, and the example I posed earlier about teams having perfect wins and losses vs imperfect wins and losses is much more reasonable than his example.
Lastly, I want to comment on your idea about polls. It would certainly be useful to see how important they rank win/loss vs pure rounds, but it’s important to note that this could cause more problems than it solves. Most people will be relatively one-sided on this issue, meaning that they’ll either vote to almost entirely value match points, or almost entirely value rounds won. This will create an average somewhere in the middle, which will satisfy nobody. This is why I wrote my proposed poll in such a way that clearly tells us whether we prefer to stick it at one end of the win/loss - rounds won spectrum, or whether it should be somewhere in the middle. In essence, your method of finding out where on the spectrum we lie is only valuable to us if we already know that a majority prefer the compromise in the first place. It should not be a replacement for my poll, but it could be a good addition.
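For anyone who wants to see the proportional split spelled out, here is a minimal sketch, assuming 9 match points per match as described above. The function name `split_match_points` and the sample scores are made up for illustration and are not taken from any league tooling.

```python
from fractions import Fraction


def split_match_points(rounds_a, rounds_b, total_points=9):
    """Divide total_points between two teams in proportion to rounds won.

    Hypothetical helper: total_points=9 follows the 9-point match described
    in the post; exact fractions are used so nothing is rounded here.
    """
    rounds_played = rounds_a + rounds_b
    if rounds_played == 0:
        raise ValueError("no rounds were played")
    share_a = Fraction(total_points * rounds_a, rounds_played)
    return share_a, total_points - share_a


# A 2-1 stopwatch result and a 4-2 KOTH result split the points identically,
# even though the KOTH match had twice as many rounds.
print(split_match_points(2, 1))  # 6 and 3
print(split_match_points(4, 2))  # 6 and 3
print(split_match_points(4, 3))  # 36/7 and 27/7
```

The point of the sketch is only that the number of rounds in a match stops mattering: each match is worth the same total, and only the proportion of rounds won decides how it is shared.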
Edit: I have been informed that Flare was not referring to me regarding the rigging of polls. The above story is still true and mildly relevant about polls, but apparently there’s something else he was referring to.
@Xenagos just teasin bro
you mention that W/L becomes a more accurate way to judge teams at a larger sample size compared to RW, but at what point does it take over? From your statement it’s somewhere between 7 and 18
The answer to this isn’t much of a mathematical one, and more of a logical one. The reason W/L tends to be more accurate is by definition: we discuss team skill as a judgment of their ability to win against other teams. Therefore, winning matches means you’re more skilled. The thing that makes this a statistical problem is that given a small sample size, the skill that’s being measured in the W/L is actually quite narrow. It’s a win against ‘this team’ on ‘this map’. You then play a ‘different team’ on a ‘different map’. If we take this to its logical conclusion, we should theoretically get a perfect skill measurement by having every team face every other team on every other map at least once, which in our Inv/Adv 8-team round robin means 49 (7x7) matches at minimum, then in increments of additional 49s if you want to be really, really, really sure.
Because this gets stupider when you factor in outside issues. “Oh this team was just less skilled on that particular day because the Demo’s dog made him sad/their Sniper had 10 higher ping than usual/the Medic was having dinner during the match.” All of these can absolutely be true, and so would just mean that we have to understand match results as not being a measurement of the team’s skill as a whole, but the team’s skill against their opponent in that particular slice of space and time.
This is veering into “I’m 14 and this is deep” territory, so I’ll cut it off and say that the point I’m making is that we understand a win to be an absolutely true measurement of skill for the match in question, but that various circumstances can prevent us from seeing this as a reasonable measure of skill for any other version of the team playing any other match, much less in relation to the set of all possible matches they could play. So it logically follows that the more of the different possible matches are played and measured, the more complete our picture.
(sorry that at no point did I actually directly answer your question)
[stuff about how Inquisition did not in fact “rig” the Lakeside vs Cascade poll]
As you later learned and edited your post to reflect, I wasn’t referring to you or to that particular poll.
I’ll remind people of something that I know that Flare knows, but he fails to mention. My system to assign match points is not equivalent to computing rankings based on percentage of rounds won for the entire season.
This is true, and mostly irrelevant as far as I’m concerned. I must’ve cut out about 1500 words of fat from the essay to try to keep it as lean and focused as possible, and explaining and analyzing the difference between a Rounds Won percentage and Match Points assigned proportionally to rounds won ended up being one of the things that, while not insignificant from a mathematical perspective, failed to render any meaningful change in principle – so I just went with % Rounds Won which is already functional and viewable in the 6s league.
[adjusted analysis for Teams Alice and Bob using proportional match points instead of % rounds won]
While you’re right that there’s a more significant difference in results, this does nothing to address my accusation that it strongly contradicts the intuitive understanding that a team that wins 6 games and loses 1 is almost certainly stronger than the team that wins 2 and loses 5. The gap between their results also widens rather than narrows, which indicates that your MP-based system actually exacerbates this disconnect between intuitive understanding and the league’s officially endorsed ranking.
Hence why I started this post by explaining the reason why W/L is, by definition, the more accurate measure of skill, because who wins or loses is the exact thing we are trying to predict/discuss when we talk about skill. This is important because implicit in your further discussion of the hypothetical division is the assumption that rounds won is the measure of skill, rather than a single data set in attempting to predict the set of all possible matches.
I need not rehash why directly measuring W/L is a poor choice despite W/L ostensibly being exactly the thing we’re looking for – yada yada sample size, it’s only perfectly true for that particular match, etc. etc. But I also explicitly admitted that the hypothetical division was crafted specifically to showcase the worst possible results of a purely rounds-based system, and my conclusion that I explicitly stated was that while rounds won tends to work in average sample sets, going purely by rounds can potentially usurp common and basic knowledge about what a “win” is. A win is the thing that happens when you’re more skilled than the other guy. We’re just using ancillary data to try and model this skill, but it should not be confused with skill itself.
tl;dr W/L is by definition perfect, but only for that one match. Rounds won tends to be pretty good, but carries the possibility of being outright wrong in certain cases.
Everything else is speculation, and that’s a given. We cannot get a perfect measure of all possible matches to be played, so we make do with the limited data we can get.
[stuff about why surveying for the community’s relative preference for either RW or W/L isn’t helpful]
It seems like an assumption that we would take the results of that poll and directly implement them is implicit in your post, but this isn’t at all how I championed (or would champion) using this data. At the end of the essay, I explained at length to @vibeisveryo why poll results must be treated with care and require a certain amount of interpretation. Given that the phrasing of the question I proposed was not “which of these listed courses of action should we take”, but rather “where do your priorities lie on this sliding scale”, of course it’s foolish to take action on those results directly. That was never how they were intended to be used. It’s clearly not a poll on taking an action a la democratic referendum, it’s a survey designed to gather more data.
By all means, feel free to concurrently directly ask what type of system they want to see implemented, but don’t confuse my poll question for that.
Besides these points, at the end of the day I’m still in agreement that the currently implemented system is clearly flawed and must go, and that we ought to ask the community in precise terms what they want to get out of the ranking system. The purpose of my essay is less about advocating for any one particular choice, and more about educating and clarifying what each choice actually means, so that people who read can make a more informed decision.
@Flare I didn’t mean to imply that you were in favor of using your poll to create direct action. I just meant to comment on what the correct usage of such a poll should be.
I also forgot earlier to address another part of your previous post. The aesthetics of the match point are in no way relevant and should never be considered part of the discussion. Whatever we determine to be the best system should not be compromised in favor of a nice representation of it on the league table. You did suggest that we could store data with greater precision than what we display, which is an acceptable option. You did mention later when discussing the 80/20 method that we could round to get nice figures. We should never be rounding in any case that alters the system we decide to be correct, regardless of whatever that system may be.
Thirdly, the point I was making about your example was that it exists only in theory. We don’t need to account for it in the match point system because it’s not going to happen. Even if it did, I’ll still bite the bullet and say that your intuition about which team is superior is wrong. The regular season is not about finding which teams beat which other teams, it’s about finding the correct seeding for playoffs. Playoffs is the only time where we really care about who beats who. If a team is consistently getting close games, they’ll be more relevant in playoffs than a team that we know won’t stand a chance against top teams.
@Flare I am aware that it’s a never-ending struggle for more and more accuracy. In context it seemed like I wanted a hard answer, but in reality I’m fine with a mere feeling. How many matches would it be before a straight W/L would be a better way to rank teams compared to RW or a convoluted MP system, regardless of how it’s defined?
@Inquisition Flare’s point about number aesthetics is quite valid. To say it’s irrelevant is incorrect. This season MP went to 9 to help with the issues of .3333 showing up and the problems around it, why wouldn’t you adjust things to a similar standard with whatever new system arises?
I also find your intuition about the regular season complete bollocks. How can you say teams who make close games are better than ones who can win their games? If all you care about is playoffs then why bother with a season? Just play an elimination bracket at that rate. If your goal is to find teams who can compete and outperform their opponents, then that’s not going to come from teams who can only come close.
Flare’s point about number aesthetics is quite valid. To say it’s irrelevant is incorrect. This season MP went to 9 to help with the issues of .3333 showing up and the problems around it, why wouldn’t you adjust things to a similar standard with whatever new system arises?
It was to solve the rounding errors. The aesthetics are a bonus.
@vibeisveryo yes… so why wouldn’t you also want a new MP system to not have rounding errors, and make it whole or .5 numbers as was suggested? a .1 error is way less drastic and imo would be acceptable. it’s the same concept
@Pain-Seer Note that rounding errors of 0.01 in the 3 point system had implications both for teams making playoffs and with playoff seeding.
@Pain-Seer I can’t speak for vibe, but my point is that the way to solve rounding issues is to just write things down in a way that doesn’t cause rounding issues. This can be accomplished by just using fractions, or by using the method suggested by Flare of storing more decimals than displayed. We shouldn’t change the match point system itself to round just because it looks better. Rounding and representation should be an afterthought and should never corrupt the purity of whatever match point system we decide on.
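As a rough illustration of the “use fractions and only round for display” idea, here is a small sketch. The two teams and their per-match results are invented, and a 3-point match is assumed only because that is where the 0.01 problems were reported; none of this is real league data.

```python
from fractions import Fraction

# Hypothetical per-match points in a 3-point system (illustration only).
team_x = [Fraction(2, 3), Fraction(2, 3), Fraction(2, 3)]  # three partial results
team_y = [Fraction(2, 1)]                                  # one 2-point result


def rounded_then_summed(results):
    # Rounds every match to 2 decimals before adding, so the error piles up.
    return round(sum(round(float(r), 2) for r in results), 2)


def summed_then_displayed(results):
    # Keeps exact fractions internally and rounds only the displayed string.
    total = sum(results, Fraction(0))
    return total, f"{float(total):.2f}"


# Rounding each match first invents a 0.01 lead for team X.
print(rounded_then_summed(team_x), rounded_then_summed(team_y))

# Storing exact values shows the two teams are actually tied.
print(summed_then_displayed(team_x), summed_then_displayed(team_y))
```

Ranking on the exact totals and rounding only what gets printed on the table leaves the system itself untouched, which is the distinction being argued for here.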
Hey, I’ve been busy but I do intend to take a look at this thread later
Yea me too
The aesthetics of the match point are in no way relevant and should never be considered part of the discussion. (…) We shouldn’t change the match point system itself to round just because it looks better.
Flare’s point about number aesthetics is quite valid. (…)
It was to solve the rounding errors. The aesthetics are a bonus.
To be absolutely clear, rounding errors and aesthetics are separate – though related – issues. It’s true that avoiding rounding errors creates more aesthetically pleasing results, but this is not at all the crux of my argument surrounding aesthetics – hence why I provided a potential rounding error solution first, then tackled aesthetics afterwards.
My point regarding aesthetics is not about achieving results that look good, but about achieving results that are parsable to the average player. It isn’t about achieving some visual perfection, but about serving the audience that the system is built for and not partaking in some creator’s indulgent fantasy of “pure” and “perfect” math. We can formulate these two considerations as premises:
- The underlying math should not be unacceptably compromised to create incorrect or misleading results, and
- The presentation should be readable and understandable at a glance to those not versed in statistics
Effectively communicating your results to a non-mathematical audience is not an irrelevant consideration – it is the entire purpose of condensing wide and varied match results into a single number. Otherwise if we were just trying to calculate the most mathematically correct result, we would be publishing division results in the form of statistical papers with p-values. If the use of the word aesthetics has clouded the point I’m attempting to make then that’s on me, but this should be a crystal clear clarification of what the real consideration I’m pointing out is.
I revisited S5 of Main HL to fully explore the consequences of rounding errors in real situations, in which 5 of the 6 playoff teams had their standings under scrutiny due to 0.01 differences in MP.
Here’s the reconstructed standings of the division going into playoffs:
Of the 6 teams, only the 4th place team perc30 is unquestionably correctly seeded. I would venture to say Klowwd9 is placed correctly due to having a stronger W-L record despite being tied with Farmer$ in MP, and similarly I’d say We Overslept deserved first seed while whalemart ringme and Utopia Turned Dystopian ought to have played a tiebreaker match. But let’s see how this shakes out when we apply my suggested MP methods with and without rounding to .5.
I’ve recalculated the standings without rounding using the teams’ match records here:
(click to see these standings using the anti-rounding error method I proposed here)
Well, this is not the result I expected. While We Overslept is now definitively at the top as I supposed, whalemart ringme and UTD have been adjusted such that one is clearly placed above another. The bigger surprise is that Farmer$ was moved above Klowwd9 despite having a worse W-L by one, which was entirely contrary to my prediction. In fact, all the teams now have clear rankings. What the hell happened?
Digging into this reveals the reason: the close seeding actually stemmed from the faulty valuation of KOTH rounds, the exact reason I changed my mind about how we rank teams. whalemart’s mixed results include 2-1 Swiftwater, 2-4 Cascade, and 1-2 Vigil (2 Stopwatch and 1 KOTH), which if properly calculated by proportion should have granted match points exactly by 2-1 to the teams, but instead robbed them of 0.33MP. UTD on the other hand had a 4-1 Ashville and a 1-2 Swiftwater. Their Ashville game granted them 2.67 MP, when the correct proportion would have been 2.4MP, overvaluing their game by 0.2666…MP. This created a false equivalence between their games when in fact whalemart had won a higher proportion of their rounds played. (Of UTD’s lost matches, they only earned MP on one of the two, while whalemart earned MP on both of their lost matches.)
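The Ashville figures above can be checked with exact fractions. This is only a sketch: it assumes 3 MP per match under the old system and that the quoted 2.67 MP is 8/3 shown to two decimals, neither of which is spelled out in the post.

```python
from fractions import Fraction

MATCH_POINTS = 3  # assumption: 3 MP per match under the old system

# UTD's 4-1 Ashville result.
awarded = Fraction(8, 3)                      # assumed exact value behind "2.67 MP"
proportional = Fraction(MATCH_POINTS * 4, 5)  # 4 of 5 rounds won
overvalued = awarded - proportional

print(float(proportional))  # 2.4
print(overvalued)           # 4/15, which is 0.2666...
```

Under those assumptions the difference comes out to exactly 4/15, matching the 0.2666…MP quoted.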
Understanding this, it’s also easy to see now why Klowwd9 and Farmer$ switched places. Despite Klowwd9’s original 0.01MP lead and their slightly stronger W-L record, they dropped MP on their wins, failed to claim MP on either of their losses, and had their KOTH wins overvalued by a total of 0.6257MP. Farmer$ on the other hand dropped fewer rounds on wins, and had both their KOTH losses undervalued by 0.5557MP.
Here are those standings again using .5 rounded match results:
No surprise, once the overvaluing of KOTH wins is corrected, it blows any potential rounding errors out of the water. It also led me to realize that, in fact, rounding errors that impact seeding are not possible if we were to use any system that begins by properly apportioning match points in proportion to rounds played. There’s just no way to approach a small enough difference after rounding when we only play 3 KOTH maps, but we could have a potential MP denominator anywhere from 5 to 7, so we would have to play at least 5 KOTH maps for there to be any possibility of a significant rounding error.
There is no problem here. The entire rounding error problem was in fact brought about by the faulty apportioning of match points in KOTH to begin with. We fix the inconsistent MP rewards, we fix rounding errors impacting seeding.
Okay so… now let’s run the poll. Seems pretty clear that we should be doing it now.
i miss perc30…