Be careful with using online Tournament Results. Magic-League masters (which this is one of) should be fine to use, as they are swiss tournaments, and get large numbers of people. Don't use Trials though, as they are knock out tournaments, which distorts results completely.
Yeah, I reviewed it beforehand.
That which nourishes me, destroys me
10th at SCG: Syracuse (2014), GP:NJ Last-Chance Grinder Winner (2014):: Former Legacy Mod
I mean, hell, we're all on a forum for something that most people would describe as a "children's card game"...do what makes you happy. You are never too old to enjoy yourself.
Complete_Jank, it looks awfully ambitious. I have to question a few of the additions you are putting in, although I can see where you are getting ideas from.
First, I do not think that "difficulty" (via decay or an added bonus for 'luck') does anything for a rating. In all honesty, there is no way to judge how one deck will perform vs another. Luck would say that a deck, in essence, just did better than another one on a particular day. Placement, aka finish, describes this; imo, there is no need for an additional abstract value to add to the total value of a deck for 'luck'. It also calls into question the notion of "regular" and "lucky" status -- what I mean is that your added bit for "luck" would say that a deck could have been "luckier" one day than another -- and if this is true, there must be some standard rate for luck. I disagree with this concept as a whole because I do not think you can value luck. Things are purely relative...kinda like baseball. If somebody has a higher average than a few other people, is there a "standard average" to which all batting averages are compared? Luck is always involved in every tourney outcome, but I disagree on a need to break it apart.
Additionally, really assessing luck would be difficult. You'd want to calculate misplays for every scenario, probably want to track what deck faced what, who had a solid topdeck, etc. -- to really get an idea of "luck".
Tourney Length I agree with. Finding that information is incredibly hard, however -- but I would agree with its conclusion "in a perfect world" lol. The only issue is the limited info organizers post. I like the R vs "standardized amounts of rounds" idea -- however, I'd have it adjustable to some kind of scale (small, medium, large, GP-sized) -- not just vs 5 rounds.
Top8/Top4 idea
I'm open to seeing this implemented, however, I do not like it in the current form -- this must be reworked somehow. At the moment I think sheer placement works fine, but if this could be improved to provide some real differential (i see it adds some sort of "weight" to things), I'd consider adding it.
Cutoff
Not every tourney runs multiple days. In fact, only a few do. Even a large-scale Legacy tourney (100+ people) will be worked out in its entirety in 1 day. A tourney with 30 people doesn't need a cutoff. Same goes for many other sizes -- I'm a bit confused by it (unless it's meant only for a GP or something). Legacy simply doesn't get 2-day events as often as other formats.
Decay
This is by far the hardest thing to do without having things automated, plus I disagree with the concept. Have values and scores for X time and cut them out after a certain time (like an expiration date). No decay or anything is needed. Plus just keeping up with a simple formula + having a life is tough enough lol. Decay = OMG. Having scores simply expire is a more rational decision. This also does not hurt downtimes in Legacy. I'm talking about "dead months" -- summer will be more active than fall/winter. Stuff like that.
-I'd love to hear feedback. I like some of the ideas, I didn't intend to come off nasty in any of my responses.
Complete_Jank, it looks awfully ambitious. I have to question a few of the additions you are putting in, although I can see where you are getting ideas from.
The formula takes many things into account. Math is like English in many ways: sometimes you have to write a very long sentence to get what you want, and sometimes it is very short. However, the more detailed you are, the more exactly the sentence will explain things to others.
First, I do not think that "difficulty" (via decay or an added bonus for 'luck') does anything for a rating. In all honesty, there is no way to judge how one deck will perform vs another. Luck would say that a deck, in essence, just did better than another one on a particular day. Placement, aka finish, describes this; imo, there is no need for an additional abstract value to add to the total value of a deck for 'luck'. It also calls into question the notion of "regular" and "lucky" status -- what I mean is that your added bit for "luck" would say that a deck could have been "luckier" one day than another -- and if this is true, there must be some standard rate for luck. I disagree with this concept as a whole because I do not think you can value luck. Things are purely relative...kinda like baseball. If somebody has a higher average than a few other people, is there a "standard average" to which all batting averages are compared? Luck is always involved in every tourney outcome, but I disagree on a need to break it apart.
Additionally, really assessing luck would be difficult. You'd want to calculate misplays for every scenario, probably want to track what deck faced what, who had a solid topdeck, etc. -- to really get an idea of "luck".
You can remove luck from what I said above; that whole formula is not about calculating out the luck factor.
Tourney Length I agree with. Finding that information is incredibly hard, however -- but I would agree with its conclusion "in a perfect world" lol. The only issue is the limited info organizers post. I like the R vs "standardized amounts of rounds" idea -- however, I'd have it adjustable to some kind of scale (small, medium, large, GP-sized) -- not just vs 5 rounds.
Calculating (R/5) means that any tournament with fewer than 5 rounds will have less value, and longer tournaments will gain more value. (R/5) allows the formula to adjust itself for tournament size.
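For what it's worth, the round-scaling idea is simple enough to sketch. The 5-round baseline is the one named in the post; the helper name and the example round counts are purely illustrative:

```python
# Sketch of the (R/5) round-scaling idea: a tournament's round count R
# is divided by a 5-round baseline, so shorter events count for less
# and longer events count for more.

BASELINE_ROUNDS = 5  # the "standard" tournament length assumed in the post

def round_weight(rounds: int) -> float:
    """Scale factor for a tournament with the given number of swiss rounds."""
    return rounds / BASELINE_ROUNDS

# A 4-round local event counts for less than a 9-round GP-sized event.
print(round_weight(4))  # 0.8
print(round_weight(9))  # 1.8
```

This would be the place to swap in the adjustable small/medium/large/GP scale suggested above, instead of a single fixed baseline.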
Top8/Top4 idea
I'm open to seeing this implemented, however, I do not like it in the current form -- this must be reworked somehow. At the moment I think sheer placement works fine, but if this could be improved to provide some real differential (i see it adds some sort of "weight" to things), I'd consider adding it.
By doing this you value the finishes of decks that had to play out the top 8, versus tournaments that just play an extra round of swiss to decide the top 8.
Cutoff
Not every tourney runs multiple days. In fact, only a few do. Even a large-scale Legacy tourney (100+ people) will be worked out in its entirety in 1 day. A tourney with 30 people doesn't need a cutoff. Same goes for many other sizes -- I'm a bit confused by it (unless it's meant only for a GP or something). Legacy simply doesn't get 2-day events as often as other formats.
I know that there is about 1 tournament every 12-18 months that goes two days. The point of this is to determine how easy it was to make top 8, but it can be used for cuts to day two as well.
If you are in an event with 33 people, it is much easier to make top 8 than if you play an event with 64 people, even though you would play the same number of rounds before cutting to top 8.
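That difficulty difference is easy to put rough numbers on. A minimal sketch, assuming (purely for illustration) that every entrant is equally likely to land in any placement:

```python
# Rough illustration of why attendance matters for the top-8 cut: under
# the simplifying assumption that every placement is equally likely for
# a random entrant, the chance of making top 8 shrinks as the field grows.

def top8_odds(players: int) -> float:
    """Probability a random entrant makes top 8 in a field of `players`."""
    return min(8, players) / players

print(round(top8_odds(33), 3))  # 0.242  (about 1 in 4)
print(top8_odds(64))            # 0.125  (exactly 1 in 8)
```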
Hmm, haven't read the thread, and I have to jet to class, but:
The formula looks really poor/arbitrary. I'm sure a tiny bit of effort went into making it, but it doesn't show very well. For instance: Why 1.5 * the difference? That's just a random number. Why not at least a cool number like 2.7182818? That value doesn't seem to make sense.
But the real thing you're trying to measure is the probability of a given deck winning. It would be straightforward to sculpt an algorithm that corresponds to this, given the similarities of swiss round pairings and then transmuting that into an ELO rating.
Anyway, since I have some statistics/programming experience, I can do that sometime, hopefully done by this weekend.
Hmm, haven't read the thread, and I have to jet to class, but:
The formula looks really poor/arbitrary. I'm sure a tiny bit of effort went into making it, but it doesn't show very well. For instance: Why 1.5 * the difference? That's just a random number. Why not at least a cool number like 2.7182818? That value doesn't seem to make sense.
But the real thing you're trying to measure is the probability of a given deck winning. It would be straightforward to sculpt an algorithm that corresponds to this, given the similarities of swiss round pairings and then transmuting that into an ELO rating.
Anyway, since I have some statistics/programming experience, I can do that sometime, hopefully done by this weekend.
I'd recommend reading the thread
And no, the goal is not looking to assess winning probability, it's assessing results -- at least that's my intention. Every deck has the same probability of winning. I was looking at computing some sort of score to a finish and then averaging all finishes of a deck.
*Haven't worked on this in a while because I am incredibly busy with real-world stuff at the moment (missed all of March).
I read through the rest of the thread. Nobody says anything relevant, why did you make me read that? My criticisms are still valid on their own.
I do realize from your posting that you're very defensive of your formula. I hope you don't take this too personally, I'm not intending to insult you, but the formula is quite ridiculous and I have no idea where you got it from. It appears to have no statistical knowledge or anything behind it. It looks like you just pulled it out of your ass and sort of thought the numbers sort of worked.
Please do the following: 1) Read the rest of my post. 2) Look up what a Pareto Distribution is (I assume that you might not know, due to the fact that you presented this formula). If you would still like to defend your formula, please provide a list of the tests you ran against your formula and a justification for why you think taking the mean of a sample out of a Pareto distribution is adequate for measuring anything.
(I could pretty easily have an ELO rating system for each deck that would measure how often the deck plays as well as how often it wins, similar to real life players. If you're pretty good and don't play much, you'll have a lower rating than someone who's just as good who plays a ton (assuming you don't play enough for your score to reach equilibrium). It's not perfect, but it pretty accurately would measure how likely you are to see a deck in a high slot in a tournament. Decks that are played the most and win the most have very good odds, but decks that are extremely rare or lose a lot have low odds.
The formula sorta falls apart for sub-500 decks, because decks that aren't played at all would be rated higher than bad decks that are played a lot. I could add some provisional clause, or just hope nobody is worried about a sub-500 deck; even if a lot of people play it, it's simply not good enough to be Proven Competitive.
I think that's pretty much what people are looking for in decks. They want to see which decks win a lot as well as which decks are popular in the same reading (perhaps reported separately and together).)
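For reference, a minimal sketch of what an Elo system over decks could look like. The K-factor of 32 and the 1500 starting rating are conventional chess-style assumptions, not anything agreed on in this thread:

```python
# Minimal Elo sketch for rating archetypes instead of players. K = 32
# and a 1500 starting rating are conventional defaults, assumed here
# for illustration only.

def expected(r_a: float, r_b: float) -> float:
    """Expected score of deck A against deck B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, a_won: bool, k: float = 32):
    """Return the two new ratings after one match."""
    e = expected(r_a, r_b)
    s = 1.0 if a_won else 0.0
    return r_a + k * (s - e), r_b + k * (e - s)

# Two fresh decks at 1500; the winner gains exactly what the loser drops.
ra, rb = update(1500, 1500, a_won=True)
print(ra, rb)  # 1516.0 1484.0
```

The "plays a lot" effect described above falls out naturally: a deck that rarely plays simply takes longer to move away from its provisional rating.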
On the other hand, your rating system suffers from, "Oh look, one guy played landstill one time. Landstill now has a massive rating." Or even just: "Oh look, one random deck-that-always-loses won a medium-sized tournament one time. Now it's a Proven Competitive Deck."
The bottom line: You designed a Pareto distribution. Almost all the points are given out to the winners. In order to be "proven competitive" you can't simply perform well and consistently. You have to win a tournament. This is exactly what a formula is supposed to prevent, but a pareto distribution welcomes this heavy-handedness with open arms.
Here are a few examples from the vault of 2 minutes of toying around with Excel:
Let's say a deck places completely RANDOMLY and plays in an infinite number of 100 person tournaments. After a while, its rating would be 2.866. You'd win some, but most people would be really unhappy to play such a crappy deck as you go 0-X as often as you win.
Let's say you have another deck that places 15th every single tournament. For a 100 person tournament, that's pretty solid. That would be ~top 8 in a 50 person tournament. Most people would be very happy with that finish. Your rating is only 2.777.
Let's say you always place in the top half (50th or better), but never win or place second. So every 48 tournaments, you place third, etc. Your rating is even less: 2.714.
I think everyone would consider a deck that always gets in the top half but doesn't grab first or second to be superior to a deck that half the time places in the bottom half but wins 1/100 times. It's winning considerably more than half its games, probably on the order of 60 or 70% of its matches, but just hasn't won the big cheese. The other deck wins about 50% of its matches, and goes 0-7 just as often as it goes 7-0. I wouldn't say that a deck that wins 50% of the time is proven competitive.
The following example is even more clear:
A deck that wins 1/100 tournaments (again in the 100 person tournament), but goes 0-X EVERY other time to grab the nut low slot 99 tournaments in 100 would still get equivalent point payout of a deck that placed in the top 30% every single time (1.39 points).
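To make the boom-or-bust problem concrete, here is a toy payout in the same spirit. The 1.5 step mirrors the "1.5 * the difference" remark above, but this is a stand-in, not the actual front-page formula:

```python
# Stand-in for a top-heavy, Pareto-style payout (NOT the thread's real
# formula): each step up within the top 8 multiplies points by 1.5, and
# finishes outside the top 8 score nothing.

def payout(place: int) -> float:
    return 1.5 ** (8 - place) if place <= 8 else 0.0

# Boom-or-bust deck: one win and 99 dead-last finishes in 100 events.
boom = (payout(1) + 99 * payout(100)) / 100

# Consistent deck: 15th place every time, always just missing the cut.
steady = sum(payout(15) for _ in range(100)) / 100

print(round(boom, 3), steady)  # 0.171 0.0
```

Under a payout like this, the deck that loses 99 tournaments out of 100 still out-scores the deck that reliably finishes 15th, which is exactly the distortion being criticized.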
It doesn't take my silly counterexamples of pareto distribution follies to show that the formula is crappy. Do you really want to grade on a pareto distribution, especially doing something as ridiculous as "take the mean of a small sample on the pareto distribution?"
That's like if someone asks you what the average income is in the US and you report the mean. It's a completely meaningless number. Might as well report the mode income: 0 (or whatever the welfare pays out these days). Actually, it's not even a simile. That's exactly what you're doing... and not even in a metaphorical sense... you're doing EXACTLY that... literally!
The number that your equation produces is virtually meaningless. The mean of those semi-meaningless numbers is totally meaningless.
I don't understand how you could possibly defend your formula. The other guy's was pretty bad, but at least the value it reported had a tangible meaning. Again, you shouldn't take this personally. As a rational human, you should be able to throw your formula out. But even if you do, I'm genuinely curious: What were you thinking? It really looks like you just pulled that out of your ass and went, "Hmm, this doesn't work... ok, 1.5* the difference... Hmm, this would be really inflated for decks that are played a lot... let's just take the mean." At that point, I'd go, "Oh, wait, I'm just taking the mean of a pareto distribution. Time to remember back to elementary school mathematics. Oh yeah, that's really silly because the distribution is skewed more than Fox News. I need to do something other than scramble my data irretrievably"
Actually, before then, I'd go "Wait, I'm just making stuff up now with guess and check. What am I really trying to measure? etc. etc."
And to refute your other argument: What your formula lacks in quality, it doesn't make up for in simplicity. Garbage in, garbage out. Except it's more like: Useful data in, garbage out. The numbers are meaningless. You might as well use this formula: =100*Rand(). It'd be much easier because you don't even have to look at the data and you end up with the same garbage. THAT formula would make up for quality with simplicity.
There's a million ways to present an argument, but starting one with "I hope you don't take this too personally, I'm not intending to insult you" is a sure sign of what's to follow, and you did follow it with a pretty much insulting post. If you want to put your argument forward, feel free, but do it in such a way that doesn't intimidate other users.
I read through the rest of the thread. Nobody says anything relevant, why did you make me read that? My criticisms are still valid on their own.
Horrible way to get people to read your post.
Forbiddian,
I agree that the formula currently being used is complete garbage. I tried suggesting a formula that actually referenced itself off of predetermined facts/rules/guidelines/results, yet was still completely dumbed down to the point that a non high school graduate might understand it. The fact remains that many people who post just don't know much about mathematics, and to go back to your opening statement...I'm insulted that you didn't find anything relevant said in my posts.
While what you mentioned is correct, the fact remains that often the only information that is provided about a tourney is the Top 8. We don't know each deck that entered, nor the player's skill ratings of the decks that were piloted. Tourney information is limited to what is reported unfortunately, so that guy that went 0-2 drop isn't going to make the formula 95% of the time.
It is possible to make a rating system for each deck, while allowing player rating to factor in as well, but again we as players do not have access to all the information, and even if we did, there will always be determining factors that can't be calculated in hard values. An example would be: did the player have a headache or a cold that day; was he hung over or still drunk; was he high or depressed about life? Those affect the outcome of games and matches, but are not reported. In a perfect world, we could gather all the information, but this is not a perfect world.
On the other hand, your rating system suffers from, "Oh look, one guy played landstill one time. Landstill now has a massive rating." Or even just: "Oh look, one random deck-that-always-loses won a medium-sized tournament one time. Now it's a Proven Competitive Deck."
This is a problem because not every deck played is known, as I stated above.
I don't understand how you could possibly defend your formula. The other guy's was pretty bad, but at least the value it reported had a tangible meaning.
You have to realize that is the most we can hope for right now. A formula that actually calculates and compares true facts, and that was what I was trying to get them to accept.
However, in the end this is a formula that has two purposes:
It is for people who aren't smart enough to look at a format and determine which decks are performing well.
It is a formula people will twist so that their pet deck is "Proven."
So, the formula as it is now reflects that.
There's a million ways to present an argument, but starting one with "I hope you don't take this too personally, I'm not intending to insult you" is a sure sign of what's to follow, and you did follow it with a pretty much insulting post. If you want to put your argument forward, feel free, but do it in such a way that doesn't intimidate other users.
Higher Math will always intimidate people so I don't think that is an option here.
In Vintage (Type 1) > Budget Deck Discussion forum:
I agree with Forbiddian's comments. When I looked at the formula on the front page it did seem a bit meaningless to me, in that I read it and thought to myself "what does this number mean?" and couldn't come to any kind of reasonable conclusion.
Though the first thing that came to my mind was using a binomial rather than a Pareto, though they are essentially the same thing here.
The question to answer, in my opinion, is "does the placing in the top 8 matter?" Clearly the original formula is at least attempting to say that it does, and from the looks of the formula it makes quite a big difference whether you came 8th or 1st. Pareto would only work if you assume it only matters whether or not a deck comes in the top 8. I personally agree with that view, but there needs to be an answer before you can really do any analysis of the actual formula, so that everyone starts from the same base.
Also, determining how important various factors are needs to be done first. There's no point designing a formula when your basic assumptions are not solid.
1) In a field of all the same deck, that deck should always score the same for the tournament, i.e. the output for a tournament of 64 Landstill decks should be the same as for 65 or 32. I believe this is required for it to be internally consistent.
2) Does anyone have anything else to add as a requirement?
This first problem is largely unimportant and will be eliminated by adding more samples to the pool. It's an outlier, so what? What are the odds of 64 Landstill decks showing up to a tourney? If all decks in all tournaments suddenly become Landstill, then there's probably a reason for it; it should be measured and Landstill should be called the best deck until they invariably ban whatever is giving Landstill the nuts against every other deck. It's like Flash-based decks. They *were* the best decks for a brief period of time before Flash got the axe.
For the second, quite honestly, the percentage of all top 8's that each represented deck takes combined with a weighting system based on tournament attendance is probably all this formula really needs to work to determine some combination of what is best/what is most popular. It'll be skewed, admittedly, because of some metas being just plain underdeveloped, but it should even out over enough samples.
My own personal take would be: take the types of decks in the top 8, rank said decks (1.0 for #1, 0.9 for #2, etc.), compile the # of people in all top 8s and the % that are running those deck types. Essentially, weight each deck for placement, and then measure what % of all top 8s the deck comprised. Over a period of time, this should work.
Edit: This is still an impressive and ambitious piece of work any critiques aside, and I would make the comment that judging what % of decks in a top 8 is also a decent method of determining quality, as the decks that climb to the top 8 of any tournament should probably have competent pilots and strong, intelligent construction while determining presence. Kudos to Warden and everyone else who has worked on this project.
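The placement-weighting suggestion above is straightforward to sketch. The weights follow the 1.0-for-1st, 0.9-for-2nd scheme named in the post; the deck names and results are made up for illustration:

```python
# Sketch of the suggested top-8 weighting: 1.0 for 1st down to 0.3 for
# 8th, summed per archetype across reported top 8s. The sample results
# below are invented purely for illustration.

WEIGHTS = {p: (11 - p) / 10 for p in range(1, 9)}  # 1 -> 1.0 ... 8 -> 0.3

def score_top8(results):
    """results: list of (place, archetype) pairs from one or more top 8s."""
    totals = {}
    for place, deck in results:
        totals[deck] = totals.get(deck, 0.0) + WEIGHTS[place]
    return totals

sample = [(1, "Goblins"), (2, "Landstill"), (5, "Goblins"), (8, "Belcher")]
print({d: round(v, 2) for d, v in score_top8(sample).items()})
# {'Goblins': 1.6, 'Landstill': 0.9, 'Belcher': 0.3}
```

Dividing each total by the number of reported top 8s would give the "% of all top 8s" presence figure the post asks for.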
Before starting... dammit, there's a mosquito in my dorm room. I hate those things.
Anyway, I'll just quickly explain what a pareto distribution is, because a lot more people than I suspected don't know what a Pareto Distribution is (including the moderator who gave me a warning, lmao).
80-20 Pareto distribution is the standard wealth/economics/everything distribution. I'd say it's almost as important as the normal distribution, only it's applicable to non-random events. It was named when some guy (apparently surname "Pareto") realized that 80% of the land was owned by 20% of the people (apparently Italians). Now it seems obvious, because we've studied this family of distributions.
Anyway, it's a very widely studied set of distributions, and you can actually get a lot of data beyond just the 80-20 (for instance, 1% of people use 44% of the bandwidth).
Interestingly enough, the formula given by the OP fits Pareto quite closely (not just at the 80-20 mark, but it does hit that critical mark on the head).
Each tournament gives out a Pareto distribution of points. The winner gets 18.5x the points of the 8th place finisher? The 5th place finisher gets 1.7x the points of the 8th even though they effectively did the same? These things fundamentally don't make sense. Instead of 0-1, the winner went 3-0, but the point giveout invalidates the hard work it takes to get to the top 8 by making play within the top 8 so important.
By comparison, the 8th place finisher gets 18.5x more points than the... 138th place finisher. I think the difference between T8 and 138th place is a lot more than the difference between 8th and 1st, but ok.
Anyway, we can use lessons learned about Pareto distributions to analyze the formula, since the equation fits the standard Pareto so precisely. It's basically a massively top-heavy, massively right-skew distribution. The mean is found around the 85-90th percentile.
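The 80-20 property is easy to verify analytically rather than by sampling: for a Pareto with shape alpha, the top fraction p of the population holds p^(1 - 1/alpha) of the total, and alpha = log(5)/log(4) is the shape that gives exactly the 80/20 split. A small check:

```python
import math

# Analytic 80-20 check for the Pareto family: the top fraction p of a
# Pareto(alpha) population holds p**(1 - 1/alpha) of the total.
# alpha = log(5)/log(4) (~1.161) is the shape giving exactly 80/20.

alpha = math.log(5) / math.log(4)

def share_of_top(p: float) -> float:
    """Fraction of the total held by the top p of a Pareto(alpha) population."""
    return p ** (1 - 1 / alpha)

print(round(share_of_top(0.20), 2))  # 0.8  -- top 20% holds 80%
print(round(share_of_top(0.01), 2))  # 0.53 -- even the top 1% holds over half
```

This is the sense in which such a distribution is "massively top-heavy": almost all of the mass sits in the extreme upper tail, which is why the mean lands so far above the median.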
Forbiddian,
I agree that the formula currently being used is complete garbage. I tried suggesting a formula that actually referenced itself off of predetermined facts/rules/guidelines/results, yet was still completely dumbed down to the point that a non high school graduate might understand it. The fact remains that many people who post just don't know much about mathematics, and to go back to your opening statement...I'm insulted that you didn't find anything relevant said in my posts.
He sent me back to read the whole thread ostensibly to change my opinion.
I thought your posts were worth reading, but I felt like he wasted my time by forcing me to sift through a lot of the thread when I'd obviously come to the same conclusion. It's not like anybody was going to refute modern understanding of mathematics between pages 2 and 3. It was a futile effort.
Even though you had the best posts of the thread, your posts were in a weird way the biggest waste of time because you have your head on straight and I agreed with what you said. Reading through the thread was supposed to change my opinion, but it just made the OP seem about 10x more stubborn (especially given some of the interactions that transpired).
While what you mentioned is correct, the fact remains that often the only information that is provided about a tourney is the Top 8. We don't know each deck that entered, nor the player's skill ratings of the decks that were piloted. Tourney information is limited to what is reported unfortunately, so that guy that went 0-2 drop isn't going to make the formula 95% of the time.
I understand that part of the problem. Recording only census tournaments or only Top 8 play is one solution, but you sacrifice a lot of data. I think a general ELO solution would be good. The formula should reflect not only deck popularity but also deck effectiveness. If a deck is half the metagame and makes up 3 slots of the Top 8, even though it did below expectations, it's still likely a "proven competitive" deck based on its popularity alone. The formula chosen should reward top 8 appearances, but also take into account as much known data as possible. Probably the best way is to report known census records and then report sample data as such.
It is possible to make a rating system for each deck, while allowing player rating to factor in as well, but again we as players do not have access to all the information, and even if we did, there will always be determining factors that can't be calculated in hard values. An example would be: did the player have a headache or a cold that day; was he hung over or still drunk; was he high or depressed about life? Those affect the outcome of games and matches, but are not reported. In a perfect world, we could gather all the information, but this is not a perfect world.
Huh? It is true that some decks systematically attract stronger players (like Nassif's NLU deck is getting a lot of play among good players for its flexibility, even if it might not be the "best" deck), which causes them to perform better than the deck might with random or bad pilots like e.g. Goblins. I don't think that's really important, though; we're not trying to figure out what the best deck is, we're trying to look at which decks are played a lot *and* are good.
However, in the end this is a formula that has two purposes:
It is for people who aren't smart enough to look at a format and determine which decks are performing well.
It is a formula people will twist so that their pet deck is "Proven."
So, the formula as it is now reflects that.
I reject your premise and your conclusion. The formula does not have to be one that idiots have to understand. It should be a tool that everyone can use once created.
I have no idea how my computer works. I'm sure it's ****ing confusing. I'm also sure that the designers didn't think, "Hey, what if some ignoramus named Forbiddian without a computer science degree and only rudimentary knowledge of quantum mechanics and electromagnetism wants to use this computer? Let's make it much ****tier so that he can enjoy it as well!"
No! I want my computer to be incredibly powerful and esoteric. I want it to be a 21st century obelisk to mankind's ingenuity that I can't understand but can use for purposes practical to me. Plus, you can't get porno as fast on vacuum tubes.
At Dracover: The input is a binomial distribution. That's what you want to have for the output. The OP formula output is incredibly top-heavy, and especially taking the mean of that is useless.
You've been warned before, so it's an infraction now. For someone who displays high aptitude for math, I'm disappointed that you can't argue in a rational, logical manner without resorting to flaming.
Even though you had the best posts of the thread, your posts were in a weird way the biggest waste of time because you have your head on straight and I agreed with what you said. Reading through the thread was supposed to change my opinion, but it just made the OP seem about 10x more stubborn (especially given some of the interactions that transpired).
LOL, I guess that's a compliment, but your observation of the OP seems to be a little harsh. You have to forgive those that know not what they do.
Huh? It is true that some decks systematically attract stronger players...that cause it to perform better that the deck might if it got random pilots or bad pilots like e.g. Goblins.
Absolutely, I've always weighed the option of playing a deck that is adaptive vs overpowering. Rock<Paper<Scissors <<< Swiss Army Knife
I don't think that's really important, though, we're not trying to figure out what the best deck is, we're trying to look at which decks are played a lot *and* are good.
See, that is why we have to decide what we want to calculate before we just throw a formula together. Do we want to value decks that are played a lot, or do we want to value finishes by any decks?
We have to come to terms that our information will be limited to Top 8 reports 90% of the time, and because that is the case, we need to create an equation that calculates as such.
I think we need to agree first on what we want to factor into our formula.
I think the three I originally suggested should be used, but we can also calculate in appearances over last 50 Tourney reports. We can also allow for additional information to be factored in if we have it, and if not to zero out that part of the equation.
Disclaimer:
I am fully capable of working on this with anyone, but if we have people who refuse to use it, then there is no point in wasting our time. I am smart enough to look at the format and make well-informed choices about what I want to play, and how to make choices that reflect expected metas, and I would expect the same from almost anyone else able to create a formula of this caliber.
The Pareto distribution in simple terms wants to work out
P(X>x)
This can be done in several ways. The two that come to mind immediately are: 1. Have a fixed x and ask what the probability is of this deck coming lower than the value x (say x=8), so your data only needs to say either above 8 or below 8.
2. Have a varying x i.e P(X>Y) where Y is the "standard" you want to have. e.g. If I were to do this myself I would have said I want the probability that a deck will perform above being random i.e. the probability that you would do better than if you assume winners are determined by the flip of a coin. However to do this you would need basically all ranking and decks used.
The problem with it in general is that you need data on the whole tourney not just top 8. I dont know about the practicalities of getting it (from some posts its seems that it may be impossible)
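A minimal Python sketch of option 1, estimating P(finish ≤ 8) empirically. The finishing positions here are invented purely for illustration:

```python
# Hypothetical sketch: estimate the probability of a top 8 finish from a
# list of recorded finishing positions. The data below is made up.

def top8_rate(finishes):
    """Empirical probability that a deck finishes 8th or better."""
    if not finishes:
        return 0.0
    return sum(1 for f in finishes if f <= 8) / len(finishes)

# Example: one deck's finishes across ten tournaments
finishes = [1, 12, 5, 30, 8, 3, 22, 7, 15, 2]
print(top8_rate(finishes))  # 0.6
```

As the post notes, option 2 would instead need every ranking from every event, which is exactly the data-availability problem.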
The issue with the OP's formula, as Forbiddian said, is that it is top-biased. E.g. if a deck comes first in a tourney of 64 twice and then misses the top 8 across eight other tourneys, its score is 5.12. If instead you came 8th in all 10 tourneys, you would score 3.46. Now the question is which deck should get the higher score: the one that consistently breaks into the top 8, or the one that wins when it does break in but otherwise has a hard time getting there?
Quote from Complete_Jank »
See, that is why we have to decide what we want to calculate before we just throw a formula together. Do we want to value decks that are played a lot, or do we want to value finishes by any deck?
We have to come to terms that our information will be limited to Top 8 reports 90% of the time, and because that is the case, we need to create an equation that calculates as such.
I think we need to agree first on what we want to factor into our formula.
I think the three I originally suggested should be used, but we can also calculate in appearances over the last 50 tourney reports. We can also allow for additional information to be factored in if we have it, and if not, zero out that part of the equation.
Disclaimer:
I am fully capable of working on this with anyone, but if we have people who refuse to use it, then there is no point in wasting our time. I am smart enough to look at the format and make well-informed choices about what I want to play, and how to make choices that reflect expected metas, and I would expect the same from almost anyone else able to create a formula of this caliber.
Wow, a buncha' stuff happened while I did things on Earth (aka the real-life world)
Yes, the formula is definitely not what it should be, but I think getting the ball rolling was my bigger agenda more than anything. Complete_Jank, sorry I wasn't seeing eye to eye beforehand...everything you just said that I quoted is like spot-on golden.
----
Imo, I wholeheartedly believe MTG tourneys are random in terms of overall output. You can argue that certain players have better odds and deck vs deck has better % of wins/loss...in the end either player A or player B wins the round. And in the end, if the entire alphabet played in a tourney, you cannot put odds on a winner. Too much is random.
I consider the following essential if we are to build a cohesive, multi-party-planned formula:
Deck Result/Place/Finish = overall performance
# of competitors = gives weight to *above*
From there, this is where ideologies clash. I understand luck/randomness is a factor; however, I cannot pinpoint where it occurs (IMO, it occurs so often you cannot track it). A component of an overall deck score should not include randomness/luck because it cannot fundamentally be attributed to a precise location (i.e., a misplay, a quality topdeck, an opponent concedes, the noob beats the pro, etc.). I am interested in knowing more about Pareto -- if somebody can further explain it, that'd be appreciated. Reading the 80-20 thing I sort of understood, but I'd like to make a full assessment after I understand how it all works out.
Another component is what to favor. Dracover, your above post elegantly describes a fundamental issue...what do you decide is better? Winning it all + complete meltdowns or consistency at a "somewhat decent" output?
Dracover, I'm glad you also see the statistical "handicap" you have to work with....unless you personally run tournaments and have every player's decklist, you cannot know what happened overall. Top 8's are given ~99% of the time (large Chicago-GP style things can dip into T16 or T32 if you're lucky).
If the Pareto wants to establish (as its goal): P(X>x), I would agree with this. The concept of a "standard" to reference scores off of (rather than sheer relativity) is a great idea.
I propose we try and compute the commonly played decks that perform well, and we can use the following as means to calculate our formula:
Length of Tourney/Number of Rounds in reference to 5 rounds (Used to set accepted size of tourney)
Number of Players in reference to Players required for cutoff for rounds played (Used to devalue finishes in tourneys that are easier to make top 8, and boost value of decks that finish in top 8 in more difficult tourneys)
Place finished in reference to 8th place (Used to establish an accepted cutoff for better decks.)
Number of times the deck has made Top 8 finishes in the last 25 tourneys (Used to calculate the popularity & consistent performance of a deck.)
Difficulty of the majority of the field in reference to known MU%* (Used to further expand on a deck's finish)
Difficulty of match ups in the Top 8 in reference to known MU%** (Used to further expand on a deck's finish)
* - A value would be put into the formula as either 1, 0, or -1, depending on whether there was a report on what the majority of the field was like, and only used if MU% is known for the deck being calculated. New decks would receive a value of "0", decks with an advantage would receive "-1", and decks with bad match ups would receive a "1".
** - A value would be put into the formula as either 1, 0, or -1, depending on whether MU% in the top 8 is known for the deck being calculated. New decks would receive a value of "0", decks with an advantage would receive "-1", and decks with bad match ups would receive a "1".
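To make the proposal concrete, here is one hedged sketch of how those six inputs could combine into a single score. Every weight and the multiplicative shape are invented for illustration only; the thread has not agreed on an actual formula.

```python
# Hypothetical combination of the six proposed factors. All weights and
# the overall shape are made up -- this is a sketch, not the formula.

def deck_score(rounds, players, place, top8_count,
               field_mu=0, top8_mu=0):
    """rounds: swiss rounds; players: attendance; place: finish (1 = win);
    top8_count: top 8 finishes in the last 25 tourneys;
    field_mu / top8_mu: -1 (favourable), 0 (unknown), 1 (unfavourable)."""
    size_weight = rounds / 5                 # reference length: 5 rounds
    field_weight = players / (2 ** rounds)   # fuller field = harder top 8
    placement = max(0, 9 - place)            # 8th place is the cutoff
    consistency = top8_count / 25
    # A deck that finished well despite bad matchups (+1) gets a small boost
    difficulty = 1 + 0.1 * (field_mu + top8_mu)
    return size_weight * field_weight * placement * consistency * difficulty

# 64 players, 6 rounds, 2nd place, 10 top 8s in the last 25 events
print(round(deck_score(6, 64, 2, 10), 3))  # 3.36
```

Note how the "0" for unknown matchup data leaves the difficulty multiplier at 1, which is the "zero out that part of the equation" behaviour asked for above.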
Quote from Complete_Jank »
I propose we try and compute the commonly played decks that perform well, and we can use the following as means to calculate our formula:
Length of Tourney/Number of Rounds in reference to 5 rounds (Used to set accepted size of tourney)
Number of Players in reference to Players required for cutoff for rounds played (Used to devalue finishes in tourneys that are easier to make top 8, and boost value of decks that finish in top 8 in more difficult tourneys)
Place finished in reference to 8th place (Used to establish an accepted cutoff for better decks.)
Number of times the deck has made Top 8 finishes in the last 25 tourneys (Used to calculate the popularity & consistent performance of a deck.)
Difficulty of the majority of the field in reference to known MU%* (Used to further expand on a deck's finish)
Difficulty of match ups in the Top 8 in reference to known MU%** (Used to further expand on a deck's finish)
@ 5 and 6 --> are you saying difficulty as in the "win %" or "odds" of DECK vs its matchup? If so, this brings about several key issues...above all, the belief that there actually is a MU%. I feel any deck can beat any other deck 100% of the time (in other words, every matchup is either a win or a loss with no other strings attached). To attribute a value or favoring to a deck vs what it faced seems backwards to me, even if there is an accepted/generalized "underdog" or not. In the end, the match can be won by either side...there is no such thing as "60% of the time, deck A beats deck B."
Correct me if I misinterpreted 5 and 6.
I have to say points 5 and 6 are impossible to measure without severely complex formulae and lots of data; I wouldn't go there.
I'm a bit of an arcade person, so I play a bit of Street Fighter, Tekken, etc., and one method of measuring the characters being used is to look at the % that they win against any other character: an X by X matrix with win percentages, rating each character based on the total summed across.
Of course this is a bit more difficult to do in Magic because of the variety of decks, but you can limit the number of decks by, e.g., only including a deck if it has made X appearances in certain tourneys in the last year or month.
Benefits I see:
1. this will show the decks actual abilities and matchups
2. eliminates good performances from luck as well as small tourneys etc
3. eliminates bias due to popularity
4. can be used to work out the meta, i.e. if your meta is heavy control, what decks will give you an edge even if they're not the best.
5. overall I think it'll show on paper what deck should win more games
problems i see:
1. can you get data on the matches played?
2. ignores the meta. This only shows that if you play all the decks the same number of times, you'll win more overall; it doesn't account for an abundance of particular decks, though it can be used to derive an answer (ref. to above).
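The matchup-matrix idea above can be sketched in a few lines of Python; the deck names and win percentages below are made up for illustration:

```python
# Fighting-game-style rating: an N x N matrix of win percentages, one
# row/column per deck, rated by summing each deck's row. All numbers
# here are invented.

decks = ["Thresh", "Landstill", "Goblins"]
# win_pct[i][j] = how often decks[i] beats decks[j] (mirror = 0.50)
win_pct = [
    [0.50, 0.55, 0.40],
    [0.45, 0.50, 0.60],
    [0.60, 0.40, 0.50],
]

ratings = {deck: sum(row) for deck, row in zip(decks, win_pct)}
for deck, r in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(deck, round(r, 2))
```

This also shows the point made in benefit 4: even though Landstill rates highest overall here, a Landstill-heavy meta would point you at Goblins, which beats it 60% of the time in this made-up matrix.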
Quote from dracover »
I have to say points 5 and 6 are impossible to measure without severely complex formulae and lots of data; I wouldn't go there.
You do realize you have a few people offering their services to assist in creating a complex formula.
Warden, it is to be determined that the MU% would be known from proven testing. If unknown, then the value entered is "0" and it eliminates that part of the formula.
Quote from dracover »
I'm a bit of an arcade person, so I play a bit of Street Fighter, Tekken, etc., and one method of measuring the characters being used is to look at the % that they win against any other character: an X by X matrix with win percentages, rating each character based on the total summed across.
I understand the general logic, aka creating a comparison -- but at the end of the day, both decks have the same % of winning.
Quote from Complete_Jank »
Warden, it is to be determined that the MU% would be known from proven testing. If unknown, then the value entered is "0" and it eliminates that part of the formula.
In regards to adding another calculation/clause/section of a larger formula, I am in favor of always putting "0" --- or removing this chunk.
----
Unlike a video game, there realistically is no deck % vs deck % to compute. I understand certain decks can be "favored" (however you want to look at it), but in the real world, a player can pull a win out of nowhere. Even a strong player can be defeated by a rookie simply due to luck and/or misplays, etc.
I see the overall development of wanting to establish "deck odds" and I completely understand that side of the argument/desire for wanting to build it. However, I think that odds in-and-of themselves hold no bearing on an overall outcome. I think that's the best way of expressing things...MU% and odds have no correlation to an overall outcome.
A deck can be extremely favored (think of a heavy lop-sided effect) and lose. There is no statistical reason why it lost...it just happened. In this light, real-world effects and the "human factor" come into play (aka luck).
Quote from Korsakow »
I think it no longer serves you, but you serve it. (If you know what I mean; I can't express it more precisely in English)...Imo the biggest problem will be making a difference between the decks. What is Baseruption, what is NLU, what is Trash; should there be different slots for the splashes in Trash, Landstill, and so on?
Korsakow, I had no problem getting the message, I think you bring in nice ideas. However, this all comes back to a "how do you get points" -- kind of a thing; which brings into play "what formula do you use" debate + discussion.
Decks earn merit based on something and there is a scale to calculate the difference between 1st (undoubtedly the best) and 8th/16th (depending on how far results are posted...this tends to be T8).
Regarding archetype breakdown, I believe the Thresh decks are divided by CB/Top and non-CB/Top; Landstill has no precise breakdown (I tried to give some run-down of color splits); Baseruption is interesting because it's a grey-area IMO and once some factor of calculations is agreed upon, division of archetype specifics can come out again.
The general consensus (agreed upon by members other than myself in discussion) is that there should be a separation of large decktypes to a point. For example, Thresh can be broken into a trillion color/gameplan/style splits, but this would be cumbersome. CB/Top vs non-CB/Top was a compromise because it created a significant separation without splintering the lists too much.
This is big, and even applies to fighting games: when a character is popular (god tier) and nearly everyone plays him, a character that is strong against that character is automatically high tier, even when he has bad matchups otherwise. Meta decks are such an important part of tournament Magic.
I agree with the point that it's a factor, but I don't think we should try to measure it anyway. Taking a fighting game analogy again, every area will have a different character at the top because of a variety of things, including the meta but also player skill etc. How do you measure something that's so different from one area to another? Every shop will have different decks being used, so how do you measure the impact of meta from a tourney with one deck different? In my opinion you should measure the quality of the deck on paper, and then it's a matter for any individual to work out, depending on their meta, what deck to choose. E.g. in my above example, using a generic rating you would say that deck 2 can only be beaten by deck 1, and that if your area is deck 2 heavy you need to run deck 1. However, with comparisons, deck 3 can also be used against deck 2, and someone can then say he/she is a lot more comfortable with deck 3, so they'll take that into a deck 2 heavy meta instead.
Tourney Length I agree with. I think finding that information is incredibly hard, however -- but I would agree with its conclusion "in a perfect world" lol. The only issue is the limited info organizers post. I like the R vs "standardized amount of rounds" idea -- however, I'd have it adjustable to some kind of scale (small, medium, large, GP-sized), not just vs 5 rounds.
Top8/Top4 idea
I'm open to seeing this implemented, however, I do not like it in the current form -- this must be reworked somehow. At the moment I think sheer placement works fine, but if this could be improved to provide some real differential (i see it adds some sort of "weight" to things), I'd consider adding it.
Cutoff
Not every tourney runs multiple days. In fact, only a few do. Even a large-scale Legacy tourney (100+ people) will be worked out in its entirety in 1 day. A tourney with 30 people doesn't need a cutoff. Same goes for many other sizes -- I'm a bit confused by it (unless it's meant for only a GP or something). Legacy simply doesn't get 2-day events as often as other formats.
Decay
This is by far the hardest thing to do without having things automated + I disagree with the concept. Have values and scores for X time and cut them out after a certain time (like an expiration date). No decay or anything is needed. Plus, just keeping up with a simple formula + having a life is tough enough lol. Decay = OMG. Having scores simply expire is the more rational decision. This also does not hurt downtimes in Legacy. I'm talking about "dead months" -- summer will be more active than fall/winter. Stuff like that.
-I'd love to hear feedback. I like some of the ideas, I didn't intend to come off nasty in any of my responses.
The formula is taking into account many things. Math is like English in many ways, sometimes you have to write a very long sentence to get what you want, sometimes it is very short. However, the more detailed you are the more exact the sentence will explain to others.
You can remove luck from what I said above; that whole formula is not about calculating out the luck factor.
Calculating (R/5) means that any tournament with fewer than 5 rounds will have less value, and longer tournaments will gain more value. (R/5) allows the formula to adjust itself for the size.
By doing this you value finishes of decks that had to play out the top 8 versus when the tournament just plays an extra round of swiss to decide top 8.
I know that there is about 1 tournament every 12-18 months that goes two days. The point of this is for the determining how easy it was to make top 8, but it can be used for cuts to day two as well.
If you are in an event with 33 people, it is much easier to make top 8 than if you play an event with 64 people, even though you would play the same number of rounds before cutting to top 8.
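Both points above are easy to see with a couple of one-liners (the attendance figures are just the examples from this post):

```python
# Sketch of the (R/5) size adjustment and the "easier top 8" point.

def size_weight(rounds):
    """Events shorter than 5 rounds count for less, longer for more."""
    return rounds / 5

def top8_ease(players):
    """Fraction of the field that reaches top 8 (higher = easier)."""
    return 8 / players

print(size_weight(4))   # 0.8 -- a 4-round event is scaled down
print(round(top8_ease(33), 3), round(top8_ease(64), 3))  # 0.242 0.125
```

At 33 players, roughly a quarter of the field makes top 8; at 64 players, only an eighth does, even though both events play the same number of swiss rounds.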
This can be decided on after a formula is worked out.
The formula looks really poor/arbitrary. I'm sure a tiny bit of effort went into making it, but it doesn't show very well. For instance: Why 1.5 * the difference? That's just a random number. Why not at least a cool number like 2.7182818? That value doesn't seem to make sense.
But the real thing you're trying to measure is the probability of a given deck winning. It would be straightforward to sculpt an algorithm that corresponds to this, given the similarities of swiss round pairings and then transmuting that into an ELO rating.
Anyway, since I have some statistics/programming experience, I can do that sometime, hopefully done by this weekend.
I'd recommend reading the thread
And no, the goal is not looking to assess winning probability, it's assessing results -- at least that's my intention. Every deck has the same probability of winning. I was looking at computing some sort of score to a finish and then averaging all finishes of a deck.
*haven't worked on this in a while because I am incredibly busy with real-world stuff at the moment (missed all of march).
I do realize from your posting that you're very defensive of your formula. I hope you don't take this too personally, I'm not intending to insult you, but the formula is quite ridiculous and I have no idea where you got it from. It appears to have no statistical knowledge or anything behind it. It looks like you just pulled it out of your ass and you sortof thought the numbers sortof worked.
Please do the following: 1) Read the rest of my post. 2) Look up what a Pareto Distribution is (I assume that you might not know, due to the fact that you presented this formula). If you would still like to defend your formula, please provide a list of the tests you ran against your formula and a justification for why you think taking the mean of a sample out of a Pareto distribution is adequate for measuring anything.
(I could pretty easily have an ELO rating system for each deck that would measure how often the deck plays as well as how often it wins, similar to real life players. If you're pretty good and don't play much, you'll have a lower rating than someone who's just as good who plays a ton (assuming you don't play enough for your score to reach equilibrium). It's not perfect, but it pretty accurately would measure how likely you are to see a deck in a high slot in a tournament. Decks that are played the most and win the most have very good odds, but decks that are extremely rare or lose a lot have low odds.
The formula sorta falls apart for sub-500 decks, because decks that aren't played at all would be rated higher than bad decks that are played a lot. I could add some provisional clause, or just hope nobody is worried about a sub-500 deck, even if a lot of people play it, it's simply not good enough to be Proven Competitive.
I think that's pretty much what people are looking for in decks. They want to see which decks win a lot as well as which decks are popular in the same reading (perhaps reported separately and together).)
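For reference, a minimal sketch of the standard Elo update such a deck-rating system could borrow. The K-factor of 32, the 400-point logistic scale, and the 1500 starting rating are the usual chess conventions, not anything agreed in this thread:

```python
# Minimal deck-vs-deck Elo update, using conventional chess parameters.

def expected(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(r_a, r_b, a_won, k=32):
    """Return the new (r_a, r_b) after one match; zero-sum transfer."""
    score = 1.0 if a_won else 0.0
    delta = k * (score - expected(r_a, r_b))
    return r_a + delta, r_b - delta

# Two unplayed archetypes start at 1500; archetype A takes the match
r_a, r_b = update(1500, 1500, a_won=True)
print(round(r_a), round(r_b))  # 1516 1484
```

A rarely played deck would simply log few updates and hover near its provisional value, which is exactly the "sub-500 decks" wrinkle mentioned above.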
On the other hand, your rating system suffers from, "Oh look, one guy played landstill one time. Landstill now has a massive rating." Or even just: "Oh look, one random deck-that-always-loses won a medium-sized tournament one time. Now it's a Proven Competitive Deck."
The bottom line: You designed a Pareto distribution. Almost all the points are given out to the winners. In order to be "proven competitive" you can't simply perform well and consistently. You have to win a tournament. This is exactly what a formula is supposed to prevent, but a pareto distribution welcomes this heavy-handedness with open arms.
Here are a few examples from the vault of 2 minutes of toying around with Excel:
Let's say a deck places completely RANDOMLY and plays in an infinite number of 100 person tournaments. After a while, its rating would be 2.866. You'd win some, but most people would be really unhappy to play such a crappy deck as you go 0-X as often as you win.
Let's say you have another deck that places 15th every single tournament. For a 100 person tournament, that's pretty solid. That would be ~top 8 in a 50 person tournament. Most people would be very happy with that finish. Your rating is only 2.777.
Let's say you always place in the top half (50th or better), but never win or place second. So every 48 tournaments, you place third, etc. Your rating is even less: 2.714.
I think everyone would consider a deck that always gets in the top half but doesn't grab first or second to be superior to a deck that half the time places in the bottom half but wins 1/100 times. It's winning considerably more than half its games, probably on the order of 60 or 70% of its matches, but just hasn't won the big cheese. The other deck wins about 50% of its matches, and goes 0-7 just as often as it goes 7-0. I wouldn't say that a deck that wins 50% of the time is proven competitive.
The following example is even more clear:
A deck that wins 1/100 tournaments (again in the 100 person tournament), but goes 0-X EVERY other time to grab the nut low slot 99 tournaments in 100 would still get equivalent point payout of a deck that placed in the top 30% every single time (1.39 points).
It doesn't take my silly counterexamples of pareto distribution follies to show that the formula is crappy. Do you really want to grade on a pareto distribution, especially doing something as ridiculous as "take the mean of a small sample on the pareto distribution?"
That's like if someone asks you what the average income is in the US and you report the mean. It's a completely meaningless number. Might as well report the mode income: 0 (or whatever the welfare pays out these days). Actually, it's not even a simile. That's exactly what you're doing... and not even in a metaphorical sense... you're doing EXACTLY that... literally!
The number that your equation gets is virtually meaningless. The meaned average of the semi-meaningless numbers is totally meaningless.
I don't understand how you could possibly defend your formula. The other guy's was pretty bad, but at least the value it reported had a tangible meaning. Again, you shouldn't take this personally. As a rational human, you should be able to throw your formula out. But even if you do, I'm genuinely curious: What were you thinking? It really looks like you just pulled that out of your ass and went, "Hmm, this doesn't work... ok, 1.5* the difference... Hmm, this would be really inflated for decks that are played a lot... let's just take the mean." At that point, I'd go, "Oh, wait, I'm just taking the mean of a pareto distribution. Time to remember back to elementary school mathematics. Oh yeah, that's really silly because the distribution is skewed more than Fox News. I need to do something other than scramble my data irretrievably"
Actually, before then, I'd go "Wait, I'm just making stuff up now with guess and check. What am I really trying to measure? etc. etc."
And to refute your other argument: What your formula lacks in quality, it doesn't make up for in simplicity. Garbage in, garbage out. Except it's more like: Useful data in, garbage out. The numbers are meaningless. You might as well use this formula: =100*Rand(). It'd be much easier because you don't even have to look at the data and you end up with the same garbage. THAT formula would make up for quality with simplicity.
There's a million ways to present an argument, but starting one with "I hope you don't take this too personally, I'm not intending to insult you" is a sure sign of what's to follow, and you did follow it with a pretty much insulting post. If you want to put your argument forward, feel free, but do it in such a way that doesn't intimidate other users.
Horrible way to get people to read your post.
Forbiddian,
I agree that the formula currently being used is complete garbage. I tried suggesting a formula that actually referenced itself off of predetermined facts/rules/guidelines/results, yet was still completely dumbed down to the point that a non-high-school graduate might understand it. The fact remains that many people who post just don't know much about mathematics, and to go back to your opening statement...I'm insulted that you didn't find anything relevant said in my posts.
While what you mentioned is correct, the fact remains that often the only information that is provided about a tourney is the Top 8. We don't know each deck that entered, nor the player's skill ratings of the decks that were piloted. Tourney information is limited to what is reported unfortunately, so that guy that went 0-2 drop isn't going to make the formula 95% of the time.
It is possible to make a rating system for each deck while allowing player rating to factor in as well, but again, we as players do not have access to all the information, and even if we did, there will always be determining factors that can't be calculated in hard values. An example would be: did the player have a headache or a cold that day; was he hung over or still drunk; was he high or depressed about life? Those affect the outcome of games and matches, but are not reported. In a perfect world we could gather all the information, but this is not a perfect world.
This is a problem because not every deck played is known, as I stated above.
You have to realize that is the most we can hope for right now. A formula that actually calculates and compares true facts, and that was what I was trying to get them to accept.
However, in the end this is a formula that has two purposes:
Higher Math will always intimidate people so I don't think that is an option here.
The first thing that came to my mind was using a binomial rather than a Pareto, though it's essentially the same thing.
The question to answer, in my opinion, is "does the placing within the top 8 matter?" Clearly the original formula is at least attempting to say that it does, and from the looks of the formula, it makes quite a big difference whether you came 8th or 1st. Pareto would only work if you assume the opposite, i.e. you believe it only matters whether or not a deck comes in the top 8. I personally agree with that view, but there needs to be an answer before you can really do any analysis of the actual formula, so that you start off from the same base.
Also, determining how important the various factors are needs to be done first. There's no point designing a formula when your basic assumptions are not solid.
This first problem is largely unimportant and will be eliminated by adding more samples to the pool. It's an outlier, so what? What are the odds of 64 Landstill decks showing up to a tourney? If all decks in all tournaments suddenly become Landstill, then there's probably a reason for it, and it should be measured, and Landstill should be called the best deck until they invariably ban whatever is giving Landstill the nuts against every other deck. It's like Flash-based decks. They *were* the best decks for a brief period of time before Flash got the axe.
For the second, quite honestly, the percentage of all top 8's that each represented deck takes combined with a weighting system based on tournament attendance is probably all this formula really needs to work to determine some combination of what is best/what is most popular. It'll be skewed, admittedly, because of some metas being just plain underdeveloped, but it should even out over enough samples.
My own personal take would be: type of decks in top 8, rankings of said decks (1.0 for #1, 0.9 for #2, etc.), compile the # of people in all top 8's and the % that are running those deck types. Essentially, weight each deck for placement, and then measure what % of each top 8 the deck comprised. Over a period of time, this should work.
Edit: This is still an impressive and ambitious piece of work any critiques aside, and I would make the comment that judging what % of decks in a top 8 is also a decent method of determining quality, as the decks that climb to the top 8 of any tournament should probably have competent pilots and strong, intelligent construction while determining presence. Kudos to Warden and everyone else who has worked on this project.
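That placement weighting can be sketched directly; the 1.0-down-to-0.3 weights follow the "1.0 for #1, 0.9 for #2" pattern described above, and the archetype names are invented:

```python
# Placement-weighted top 8 share: weight each slot, sum per archetype,
# divide by the total weight across all reported top 8s. Names invented.

WEIGHTS = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3]  # places 1..8

def weighted_share(top8s):
    """top8s: list of 8-element lists of archetype names, best first."""
    totals = {}
    for top8 in top8s:
        for deck, w in zip(top8, WEIGHTS):
            totals[deck] = totals.get(deck, 0.0) + w
    grand = sum(totals.values())
    return {d: t / grand for d, t in totals.items()}

sample = [["Thresh", "Goblins", "Landstill", "Thresh",
           "Iggy Pop", "Goblins", "Thresh", "Landstill"]]
shares = weighted_share(sample)
print(round(shares["Thresh"], 3))  # 0.404
```

A deck that shows up often but always in the bottom half of the top 8 accumulates share slowly, which captures the presence-plus-quality blend the post is after.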
Anyway, I'll just quickly explain what a pareto distribution is, because a lot more people than I suspected don't know what a Pareto Distribution is (including the moderator who gave me a warning, lmao).
The 80-20 Pareto distribution is the standard wealth/economics/everything distribution. I'd say it's almost as important as the normal distribution, only it's applicable to non-random events. It was named after Vilfredo Pareto, who observed that 80% of the land in Italy was owned by 20% of the people. Now it seems obvious, because we've studied this family of distributions.
Anyway, it's a very widely studied set of distributions, and you can actually get a lot of data beyond just the 80-20 (for instance, 1% of people use 44% of the bandwidth).
Interestingly enough, the formula given by the OP fits Pareto quite closely (not just at the 80-20 mark, but it does hit that critical mark on the head).
Each tournament gives out a Pareto distribution of points. The winner gets 18.5x the points of the 8th-place finisher? The 5th-place finisher gets 1.7x the points of the 8th even though they effectively did the same? These things fundamentally don't make sense. Instead of 0-1, the winner went 3-0, but the point payout invalidates the hard work it takes to get to the Top 8 by making play within the Top 8 so important.
By comparison, the 8th place finisher gets 18.5x more points than the... 138th place finisher. I think the difference between T8 and 138th place is a lot more than the difference between 8th and 1st, but ok.
Anyway, we can use lessons learned about Pareto distributions to analyze the formula, since the equation fits the standard Pareto so precisely. It's basically a massively top-heavy, massively right-skew distribution. The mean is found around the 85-90th percentile.
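For anyone wanting to poke at these claims themselves, here's a quick sketch that samples a standard Pareto (shape α = log 5 / log 4 ≈ 1.16, the value that produces the classic 80-20 split) and checks both the top-20% share and where the mean lands; the sample size and seed are arbitrary:

```python
import random

random.seed(0)
alpha = 1.161  # ≈ log(5) / log(4), the shape giving the classic 80-20 split
n = 200_000
samples = sorted((random.paretovariate(alpha) for _ in range(n)), reverse=True)

# Claim 1: the top 20% of samples hold about 80% of the total
top_share = sum(samples[: n // 5]) / sum(samples)

# Claim 2: the mean sits far up the distribution, near the 90th percentile
mean = sum(samples) / n
pct_below_mean = sum(1 for s in samples if s < mean) / n

print(f"top-20% share: {top_share:.2f}, mean sits at the {pct_below_mean:.0%}ile")
```

Because the distribution has infinite variance at this shape, the empirical top-20% share bounces around 0.8 from seed to seed, but the mean reliably lands way up in the high-80s/low-90s percentile range, which is exactly why taking the mean of such a skewed score distribution is misleading.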
He sent me back to read the whole thread ostensibly to change my opinion.
I thought your posts were worth reading, but I felt like he wasted my time by forcing me to sift through a lot of the thread when I'd obviously come to the same conclusion. It's not like anybody was going to refute modern understanding of mathematics between pages 2 and 3. It was a futile effort.
Even though you had the best posts of the thread, your posts were in a weird way the biggest waste of time because you have your head on straight and I agreed with what you said. Reading through the thread was supposed to change my opinion, but it just made the OP seem about 10x more stubborn (especially given some of the interactions that transpired).
I understand that part of the problem. Recording only census tournaments or only Top 8 play is one solution, but you sacrifice a lot of data. I think a general Elo-style solution would be good. The formula should reflect not only deck popularity but also deck effectiveness. If a deck is half the metagame and makes up 3 slots of the Top 8, even though it performed below expectations, it's still likely a "proven competitive" deck based on its popularity alone. The formula chosen should reward Top 8 appearances, but also take into account as much known data as possible. Probably the best way is to report known census records and then report sample data as such.
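As a sketch of what an Elo-style approach could look like if applied to archetypes instead of players (the deck names, starting ratings, and K-factor here are illustrative, not anything proposed in the thread):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One standard Elo update. score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    return r_a + k * (score_a - expected_a)

# Treat archetypes like players: every reported match between two decks
# nudges the winner's rating up and the loser's down by the same amount.
thresh, goblins = 1500.0, 1500.0
thresh, goblins = elo_update(thresh, goblins, 1), elo_update(goblins, thresh, 0)
print(thresh, goblins)  # 1516.0 1484.0
```

The nice property for this discussion is that an upset (a low-rated deck beating a high-rated one) moves ratings more than an expected result, so effectiveness gets measured somewhat independently of raw popularity.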
Huh? It is true that some decks systematically attract stronger players (Nassif's NLU deck, for example, gets a lot of play among good players because of its flexibility, even if it might not be the "best" deck), which causes them to perform better than they might with random or bad pilots, like e.g. Goblins gets. I don't think that's really important, though; we're not trying to figure out what the best deck is, we're trying to look at which decks are played a lot *and* are good.
I reject your premise and your conclusion. The formula does not have to be one that idiots have to understand. It should be a tool that everyone can use once created.
I have no idea how my computer works. I'm sure it's ****ing confusing. I'm also sure that the designers didn't think, "Hey, what if some ignoramus named Forbiddian without a computer science degree and only rudimentary knowledge of quantum mechanics and electromagnetism wants to use this computer? Let's make it much ****tier so that he can enjoy it as well!"
No! I want my computer to be incredibly powerful and esoteric. I want it to be a 21st century obelisk to mankind's ingenuity that I can't understand but can use for purposes practical to me. Plus, you can't get porno as fast on vacuum tubes.
At Dracover: The input is a binomial distribution. That's what you want to have for the output. The OP formula output is incredibly top-heavy, and especially taking the mean of that is useless.
You've been warned before, so it's an infraction now. For someone who displays high aptitude for math, I'm disappointed that you can't argue in a rational, logical manner without resorting to flaming.
-C_c
LOL, I guess that's a compliment, but your observation of the OP seems to be a little harsh. You have to forgive those that know not what they do.
Absolutely, I've always weighed the option of playing a deck that is adaptive vs overpowering. Rock<Paper<Scissors <<< Swiss Army Knife
See, that is why we have to decide what we want to calculate before we just throw a formula together. Do we want to value decks that are played a lot, or do we want to value finishes by any deck?
We have to come to terms that our information will be limited to Top 8 reports 90% of the time, and because that is the case, we need to create an equation that calculates as such.
I think we need to agree first on what we want to factor into our formula.
I think the three I originally suggested should be used, but we can also calculate in appearances over last 50 Tourney reports. We can also allow for additional information to be factored in if we have it, and if not to zero out that part of the equation.
Disclaimer:
I am fully capable of working on this with anyone, but if we have people who refuse to use it, then there is no point in wasting our time. I am smart enough to look at the format and make well-informed choices about what I want to play and how to make choices that reflect expected metas, and I would expect the same from almost anyone else able to create a formula of this caliber.
Nothing says budget help like receiving $5000 in recommendations.
I guess leaving out Time Walk, Timetwister, and Ancestral Recall is budget.
P(X>x)
This can be done in several ways. The two that come to mind immediately are: 1. Use a fixed x and ask what the probability is of this deck finishing below the value x (say x = 8), so your data only needs to say above 8 or below 8.
2. Use a varying x, i.e. P(X > Y), where Y is the "standard" you want to have. If I were to do this myself, I would want the probability that a deck performs above random, i.e. the probability that it does better than if winners were determined by the flip of a coin. However, to do this you would need basically all rankings and decks used.
The problem with this in general is that you need data on the whole tourney, not just the Top 8. I don't know about the practicalities of getting it (from some posts it seems that may be impossible).
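A minimal sketch of option 1 above, with a purely hypothetical set of finishes. The coin-flip baseline assumes every deck in a 64-player field is equally likely to land at any final rank, so a random deck makes the Top 8 with probability 8/64:

```python
# Hypothetical finishes for one archetype across ten 64-player tournaments
finishes = [1, 12, 5, 40, 8, 3, 22, 7, 15, 2]

x = 8  # fixed threshold: "finished in the Top 8"
p_top8 = sum(1 for f in finishes if f <= x) / len(finishes)

# Coin-flip baseline: if winners were random, any deck would be equally
# likely to land in any of the 64 final ranks, so P(Top 8) = 8/64
baseline = 8 / 64

print(p_top8, baseline)  # 0.6 0.125
```

A deck clearing the baseline by a wide margin over many events is doing better than chance; the catch, as noted, is that you need every finish, not just the Top 8 lists.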
The issue with the OP's formula, as Forbiddian said, is that it's top-biased. E.g. if a deck comes first twice in 64-player tourneys and then misses the Top 8 across eight other tourneys, its score is 5.12. If it came 8th in all 10 tourneys, it would score 3.46. Now the question is which deck should get the higher score: the one that consistently breaks into the Top 8, or the one that wins when it does break into the Top 8 but otherwise has a hard time getting there?
Wow buncha' stuff happened while I did things on Earth (aka real life world)
Yes, the formula is definitely not what it should be, but I think getting the ball rolling was my bigger agenda more than anything. Complete_Jank, sorry I wasn't seeing eye to eye beforehand...everything you just said that I quoted is like spot-on golden.
----
Imo, I wholeheartedly believe MTG tourneys are random in terms of overall output. You can argue that certain players have better odds and that deck vs. deck has a better % of wins/losses, but in the end either player A or player B wins the round. And if the entire alphabet played in a tourney, you could not put odds on a winner. Too much is random.
I consider the following essential if we are to build a cohesive, multi-party-planned formula:
Deck Result/Place/Finish = overall performance
# of competitors = gives weight to *above*
From there, this is where ideologies clash. I understand luck/randomness is a factor; however, I cannot pinpoint where it occurs (IMO, it occurs so often you cannot track it). A component of an overall deck score should not include randomness/luck, because it cannot fundamentally be attributed to a precise location (i.e. a misplay, a quality topdeck, an opponent conceding, the noob beating the pro, etc.). I am interested in knowing more about Pareto; if somebody can further explain it, that'd be appreciated. Reading the 80-20 thing I sort of understood, but I'd like to make a full assessment after I understand how it all works out.
Another component is what to favor. Dracover, your above post elegantly describes a fundamental issue...what do you decide is better? Winning it all + complete meltdowns or consistency at a "somewhat decent" output?
Dracover, I'm glad you also see the statistical "handicap" you have to work with....unless you personally run tournaments and have every player's decklist, you cannot know what happened overall. Top 8's are given ~99% of the time (large Chicago-GP style things can dip into T16 or T32 if you're lucky).
If the Pareto wants to establish (as its goal): P(X>x), I would agree with this. The concept of a "standard" to reference scores off of (rather than sheer relativity) is a great idea.
10th at SCG: Syracuse (2014), GP:NJ Last-Chance Grinder Winner (2014):: Former Legacy Mod
** - A value would be put into the formula as either 1, 0, or -1, depending on whether the Top 8 MU% are known for the deck being calculated. New decks would receive a value of "0", decks with an advantage would receive "-1", and decks with bad matchups would receive a "1".
Correct me if I misinterpreted 5 and 6.
I'm a bit of an arcade person, so I play a bit of Street Fighter, Tekken, etc., and one method of measuring the characters being used is to look at the % that they win against every other character: an X-by-X matrix of win percentages, rating each character by the total summed across its row.
Of course this is a bit more difficult to do in Magic because of the variety of decks, but you can limit the number of decks by, e.g., only including a deck if it has made X appearances in certain tourneys in the last year or month.
Benefits I see:
1. this will show the decks actual abilities and matchups
2. eliminates good performances from luck as well as small tourneys etc
3. eliminates bias due to popularity
4. can be used to work out meta. i.e. if your meta is heavy control what decks will give you an edge even if it's not the best.
5. overall I think it'll show on paper what deck should win more games
Problems I see:
1. Can you get data on the matches played?
2. It ignores the meta. This only shows that if you played every deck an equal number of times, you'd win more overall. When there is an abundance of particular decks, the system doesn't take that into account directly, though it can be used to derive an answer (ref. point 4 above).
You do realize you have a few people offering their services to assist in creating a complex formula.
Warden, the MU% would be determined from proven testing. If unknown, the value entered is "0", which eliminates that part of the formula.
I understand the general logic, aka creating a comparison -- but at the end of the day, both decks have the same % of winning.
In regards to adding another calculation/clause/section of a larger formula, I am in favor of always putting "0" --- or removing this chunk.
----
Unlike a video game, there realistically is no deck % vs deck % to compute. I understand certain decks can be "favored" (however you want to look at it), but in the real world, a player can pull a win out of nowhere. Even a strong player can be defeated by a rookie simply due to luck and/or misplays, etc.
I see the overall development of wanting to establish "deck odds" and I completely understand that side of the argument/desire for wanting to build it. However, I think that odds in-and-of themselves hold no bearing on an overall outcome. I think that's the best way of expressing things...MU% and odds have no correlation to an overall outcome.
A deck can be extremely favored (think of a heavy lop-sided effect) and lose. There is no statistical reason why it lost...it just happened. In this light, real-world effects and the "human factor" come into play (aka luck).
Korsakow, I had no problem getting the message, I think you bring in nice ideas. However, this all comes back to a "how do you get points" -- kind of a thing; which brings into play "what formula do you use" debate + discussion.
Decks earn merit based on something and there is a scale to calculate the difference between 1st (undoubtedly the best) and 8th/16th (depending on how far results are posted...this tends to be T8).
Regarding archetype breakdown, I believe the Thresh decks are divided by CB/Top and non-CB/Top; Landstill has no precise breakdown (I tried to give some run-down of color splits); Baseruption is interesting because it's a grey-area IMO and once some factor of calculations is agreed upon, division of archetype specifics can come out again.
The general consensus (agreed upon by members other than myself in discussion) is that there should be a separation of large decktypes, to a point. For example, Thresh can be broken into a trillion color/gameplan/style variants, but this would be cumbersome. CB/Top vs non-CB/Top was a compromise because it created a significant separation without splintering the lists too much.
They won't have the same %.
Take a simple example where only 3 decks exists
deck 1 wins against deck 2 60% of the time and deck 3 80% of the time
deck 2 therefore wins against deck 1 40% of the time and say against deck 3 45% of the time
deck 3 wins against deck 1 20% of the time and deck 2 55% of the time
deck 1 = 1.4
deck 2 = 0.85
deck 3 = 0.75
and so overall you'd say deck 1 is better than deck 2, which in turn is better than deck 3.
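The three-deck example above, written out as the X-by-X matchup matrix described earlier, with the rating as the row sum over non-mirror matchups (note deck 1's row works out to 0.6 + 0.8 = 1.4):

```python
decks = ["deck 1", "deck 2", "deck 3"]
# win[i][j] = probability that decks[i] beats decks[j]; mirrors fixed at 0.5
win = [
    [0.50, 0.60, 0.80],
    [0.40, 0.50, 0.45],
    [0.20, 0.55, 0.50],
]

# Rating = summed win % against every *other* deck (row sum minus the mirror)
ratings = {
    name: sum(p for j, p in enumerate(win[i]) if j != i)
    for i, name in enumerate(decks)
}
for name, r in ratings.items():
    print(name, round(r, 2))  # 1.4, 0.85, 0.75
```

Each off-diagonal pair sums to 1 (if deck 1 beats deck 2 60% of the time, deck 2 beats deck 1 40%), so the full set of ratings always sums to the number of matchup pairs.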
I agree that it's a factor, but I don't think we should try to measure it anyway. Taking the fighting-game analogy again: every area will have a different character at the top because of a variety of things, including the meta but also player skill, etc. How do you measure something that's so different from one area to another? Every shop will have different decks being used, so how do you measure the impact of meta from a tourney where one deck is different? In my opinion you should measure the quality of the deck on paper, and then it's a matter for any individual to work out, depending on their meta, which deck to choose. E.g., in my example above, using a generic rating you would say that deck 2 can only be beaten by deck 1, and that if your area is deck 2 heavy you need to run deck 1. With pairwise comparisons, however, deck 3 can also be used against deck 2, and someone can then say he/she is a lot more comfortable with deck 3, so they'll take that into a deck 2 heavy meta instead.
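To make the meta point concrete, one could weight each matchup by how common the opponent is. The metagame shares below are purely hypothetical (a "deck 2 heavy" room); the matrix is the same three-deck example from above:

```python
decks = ["deck 1", "deck 2", "deck 3"]
win = [                      # win[i][j] = P(decks[i] beats decks[j])
    [0.50, 0.60, 0.80],
    [0.40, 0.50, 0.45],
    [0.20, 0.55, 0.50],
]

# Hypothetical "deck 2 heavy" metagame: 70% deck 2, 15% each of the others
meta = [0.15, 0.70, 0.15]

# Expected win rate of each deck against an opponent drawn from the meta
ev = {
    name: sum(meta[j] * win[i][j] for j in range(3))
    for i, name in enumerate(decks)
}
for name in decks:
    print(name, round(ev[name], 3))
```

In this meta, deck 3 actually edges out deck 2 despite having the lower flat rating, which is exactly the comfort/meta tradeoff described above: the generic rating and the meta-weighted answer can disagree.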