As many of you know, the Xmage Cube Group (come cube online with us at https://discord.gg/4eHADtb) has a 3-0 deck archive (Imgur Archive: https://imgur.com/a/KxQwT / Spreadsheet Breakdown: https://tinyurl.com/yc5rkozt). At the time of writing, the archive holds 329 decks! Our latest project has been to track individual card data in 3-0 decks. We're currently about 80% done, and I wanted to share the data we've uncovered so far.
LINKS
Top Cards (this is the data I had at the time of this post, more updated link below if you want to follow): https://tinyurl.com/yc6bgmtr
- Our meta mainly plays unpowered cubes, so powered cards will be near the bottom of the list (which is fine because we all know they’re great anyways).
- Thanks to Tjornan for writing a program to break down all our data, THUNDERWANG for helping me transcribe the decks, and everyone in the Xmage Cube group who has helped contribute to the archives!
GENERAL NOTES:
- Esper and Orzhov are the two most successful color combinations with 31 and 26 wins respectively, so it’s no surprise that those colors have the most cards.
- Keep in mind that the cards with the highest numbers aren’t necessarily the most powerful cards, but the most versatile cards that can appear in multiple decks.
BLACK
- No surprises in the top black cards. Mind Twist is the one exception to the rule in that it's both the most represented and the most powerful card.
- Recurring Nightmare is one of the few build-arounds to appear in its color's top 8.
- For some reason Grave Titan only appeared 7 times? I might have to go in and manually count this in the future.
RED
- I was expecting Lightning Bolt to be at the top, but the top 4 are all 1 card away from each other. I honestly wasn't expecting Fiery Confluence to be at the top, though.
- Flametongue Kavu being so high both does and doesn’t surprise me. It doesn’t surprise me because it’s one of red’s better splashable cards, but it does surprise me because I know a good amount of people who underrate it.
- Inferno Titan is one of only two top-8 spells at 6 CMC (the highest represented CMC outside of X spells).
BLUE
- Vendilion Clique is great, but I didn’t expect it to be this high. I always assumed Fact or Fiction would take top honors because it’s blue’s Mind Twist / Lightning Bolt / Swords to Plowshares that can be splashed into pretty much everything. I also expected True-Name Nemesis to be the higher 1UU creature.
- I think Treachery is criminally low at 17, especially compared to Control Magic / Sower of Temptation.
- Forbid at 21 was higher than I expected, but not too surprising because the card is great. Just barely missed the top 8!
GREEN
- I expected Sylvan Library to be at the top, but it's close enough. Garruk Relentless / Reclamation Sage / Sylvan Library are the more splashable green cards, so it's no surprise they're the top 3.
- Everything else is pretty much mana dorks. Green is a very specific color, but I guess mana dorks are to green what generic removal is to other colors (especially since green really doesn't have removal).
- Survival of the Fittest is also one of the few build arounds to crack a top 8.
WHITE
- Swords to Plowshares being the highest represented colored card isn’t too big of a surprise since it’s the strongest removal spell at that cost. What does surprise me though is how much higher it is than Path to Exile (37 to 20).
- While I expected Elspeth, Knight-Errant to be higher than Gideon, Ally of Zendikar, I didn't think it'd dwarf it by 10. Elspeth over Gideon was a surprise to some, but not to me. All the other top planeswalkers are in the 21-24 range (with Garruk Relentless being the second highest). Some pointed the finger at me for directly influencing Elspeth's numbers over Gideon, but I don't think that's a plausible accusation since I value both of them almost equally. If I rate Elspeth as a 10/10 in my book, then Gideon is a 9.5-9.75. I don't think my slight preference for Elspeth could cause a 10-card differential.
- Council’s Judgment is a lot higher than I thought it’d be, especially since Oblivion Ring effects are more splashable.
- If you haven’t heard, Palace Jailer is the white JTMS! I’ve always said that Palace Jailer is the best Nekrataal effect.
- If I did have a direct influence on inflating a single card, it’d be Land Tax. Regardless, that card is dumb!
- White being in both of the highly represented Orzhov / Boros aggro decks makes it not much of a surprise that Selfless Spirit is up there.
MULTICOLOR
- The top 5 cards being in Esper colors is no surprise because Esper and Orzhov are the two most winning color combinations.
- Fractured Identity and Lingering Souls, as multicolored cards, are higher than mono-red's and mono-green's top cards.
- While I think Dack Fayden is the strongest multicolored card overall, Orzhov / Esper is stronger than Izzet / Grixis. It's no surprise that Dack Fayden is the highest multicolored card that's not in Esper colors.
- There's no green in the multicolor top 8.
LAND
- For some reason, Wasteland is above Strip Mine by 6 decks.
COLORLESS
- No big surprises here, except I didn't expect Phyrexian Revoker and Walking Ballista to be so high. Ballista is the more surprising of the two to me, since I play a lot of Phyrexian Revoker myself and not so much Walking Ballista. I was expecting something more generic over these cards, like Mind Stone / Coldsteel Heart / Coalition Relic.
- Wurmcoil Engine is one of the two cards at 6-cmc.
MISC
- Some of the newer cards are quite the sleeping giants, and I have no doubt they'd be in their respective top 8s had they been released a year earlier: 10 Karn, Scion of Urza / 8 Teferi, Hero of Dominaria / 6 Saheeli, the Gifted. Karn being the only cheap colorless planeswalker means it'd probably be a top card overall numbers-wise, and probably the only planeswalker that could dwarf Elspeth.
While there is a correlation between a card's true power level and how many 3-0 decks it shows up in, that correlation is not 100%... also, the sample required to get a very accurate picture is obscenely high.
You can see that by comparing Strip Mine and Wasteland: one card is strictly superior to the other, and yet it appears in 15% fewer top 3-0 decks.
City of Brass and Mana Confluence are functionally identical, yet are almost as far apart as Elspeth and Gideon.
Sorry to be negative, just worried that people will misinterpret these results. Not trying to discourage the exercise as it's a fun project, and as the samples grow, the more interesting it becomes... Only suggest being tepid in drawing any conclusions that Card A is better than Card B because of what this data shows.
I’m the person that’s currently running the analysis on these decklists. If you’re interested in contributing (from a coding/statistics perspective or in contributing your own decklists), let me know! I'm a fiend for data, so if you like keeping track of how decks in your cube do, I'd love to talk. We’re hoping to turn this into a larger project (and even grow our dataset by using submissions from verified cubers) and do some cool analyses, like using the data to write a decent drafting AI.
LucidVisions makes some good points. There are important questions of sample sizes, biases, and statistical strength in any data collection effort. I want to address them now as we move forward with this project. At some point, I’ll do a write up of the setup of the project (including discussions of sample size and bias), how we’re analyzing the data, and what we hope to do moving forward.
While there is a correlation between a card's true power level and how many 3-0 decks it shows up in, that correlation is not 100%
While it's true that this correlation is not 100%, I'd argue it's much higher than you think. The only reason to draft cards and put them in your decks is that they win games of Magic. Cards do this in different ways (Karn Liberated outright wins you the game, while Volcanic Island improves deck consistency), but the output is the same: a higher winrate. In fact, I would argue that the ONLY reliable metric for a card's power level is how it contributes to win rates. While assaying this contribution by looking at 3-0 decklists isn't perfect, good cards lead to higher chances of 3-0 decklists. It would be nice to have all 8 decklists and their records from every draft, but this isn't currently feasible.
The sample required to get a very accurate picture is obscenely high.
Undoubtedly true. If you gave this dataset to anyone who does data science or machine learning, you'd be laughed out of the room - the dimensionality of the data relative to our sample size is laughable.
But we’re not even looking for a very accurate picture - we’re looking for a general one. As of right now, there aren’t many conclusions that you can draw from this data that experienced cubers don’t already know (Fractured Identity is good, lands are important, blue decks are good), but having a falsifiable basis to make these claims is something the cube community has lacked for some time.
As someone who's worked on low-level data projects before, I can say there are a surprising number of conclusions you can get from this dataset. I personally believe that Land Tax isn't very good, but seeing it so high on the White list makes me question my beliefs. As long as the questions you ask aren't very complex, this dataset might do better than you think.
You can see that by comparing Strip Mine and Wasteland: one card is strictly superior to the other, and yet it appears in 15% fewer top 3-0 decks. City of Brass and Mana Confluence are functionally identical, yet are almost as far apart as Elspeth and Gideon.
This gets at an important point, one that I'm sure will come up often as we work on this project: intrinsic biases in drafting (and therefore, in the dataset). One explanation for why Wasteland does better than Strip Mine is that drafters underrate Wasteland and overrate Strip Mine. Experienced drafters (those more likely to 3-0) end up with Wasteland a higher proportion of the time because of this.
One could also validly argue, as you claim, that it’s simply due to noise. This is especially true for your comparison of Mana Confluence and City of Brass. These high variance points are undoubtedly present in the data, and this will be true of almost any dataset. This leads nicely into the next two points.
I’m just worried that people will misinterpret these results
We're just presenting the data and noting things that are potentially interesting - how you interpret them is up to you. But I would argue that having some basis to argue for a card's inclusion is better than no basis at all. All too often the cube community relies on the testing results of high-profile members to see if a card is good or not, but this doesn't always work and isn't verifiable. And many times, people don't cube enough to test for themselves. Some experienced cubers swear by Reveillark; other equally experienced members hate it. This project hopes to begin to provide some verifiable, statistical basis for evaluating a cube card (and the characteristics of winning decks in general).
I suggest being tepid in drawing any conclusions that Card A is better than Card B
Couldn't agree more. I'm hoping to construct a statistical framework for this (if you're familiar with statistics: defining null distributions, p-values, etc.). But even without doing this, I can tell you that the ability to directly compare the performance of two cards will almost always be well outside the reach of this dataset, and that's important to acknowledge. I would caution anyone who is seeking to compare two cards (that includes Steveman and his claims above).
In summary/ TLDR
This dataset, as it currently stands, is question-generating, not question-answering. When the data tells us something that doesn't agree with our experience, it might be time to start questioning the reliability of those experiences.
How you choose to interpret it is up to you, but I see this as a developing tool for a community that historically has based card choices entirely on personal experience.
My Cubes - The Busted Cube. A fully functional, almost 100% custom cube. The project started out by asking "What if other colors got cards on the power level of Mana Drain, Ancestral Recall, and Time Walk?" Draft and enjoy!
I agree with most of what you are saying, and glad you are putting in the work! Overall fully encourage the work!
The biggest problem with using this list as an indicator of a card's power level is that differences in main-deck % between cards play a much larger role in their relative 3-0 frequency than the relative power between the cards when they do make main decks.
Unless the card is Black Lotus, Time Walk, etc., most individual cards don't affect a deck's overall win % by more than 0.1-2%. Main-deck %'s, by contrast, vary dramatically between cards of similar power level.
If a card misses the main deck, it's no different from it not being in the cube at all, and that has an enormous effect on the likelihood it shows up in a 3-0 deck.
Cards like Tinker, Natural Order, etc. will have far fewer 3-0's simply due to being opened later in the draft, or being abandoned to the sideboard if the drafter switches game plans. That's why it's not surprising to see cards like Mana Confluence, versatile artifacts, and Swords to Plowshares at the top.
It's probably too ambitious to track every card's main-deck %, but doing so would greatly increase the accuracy of the project (assuming the goal is to better identify a card's power level). This also gets around the issue of combining data from cubes that don't have the same card list.
In the original post I stated this: "Keep in mind that the cards with the highest numbers aren’t necessarily the most powerful cards, but the most versatile cards that can appear in multiple decks."
If you're looking to get a sense of whether your data sample is becoming more accurate, the mana elves are probably the best anchor... since there are 4 that are basically functionally equivalent.
Llanowar Elves, Elvish Mystic, Fyndhorn Elves, and Arbor Elf*.
If you are comparing two cards whose spread in 3-0 % is less than the spread between the elves, the comparison is probably not that useful.
Though, I know you answered that that's not what this is for - question-generating, not question-answering, etc.
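The elf benchmark described above can be sketched as a quick sanity check. The 3-0 counts here are hypothetical placeholders, not numbers from the actual spreadsheet:

```python
# Hypothetical 3-0 counts for the four functionally equivalent mana elves.
# Since the cards are (nearly) interchangeable, their spread approximates
# pure drafting noise.
elf_counts = {
    "Llanowar Elves": 14,
    "Elvish Mystic": 11,
    "Fyndhorn Elves": 9,
    "Arbor Elf": 12,
}

noise_floor = max(elf_counts.values()) - min(elf_counts.values())

def difference_is_meaningful(count_a, count_b):
    """Treat any gap smaller than the elf spread as probable noise."""
    return abs(count_a - count_b) > noise_floor

print(noise_floor)                         # spread among the elves
print(difference_is_meaningful(37, 20))    # a Swords-vs-Path-sized gap
print(difference_is_meaningful(21, 18))    # a small gap: likely noise
```

With these made-up numbers, the elf spread is 5 decks, so a 37-vs-20 gap clears the noise floor while a 21-vs-18 gap does not.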
Cards like Tinker, Natural order etc will have many less 3-0's simply due to them being opened later in the draft, or be abandoned to the sideboard if the drafter switches game plans. That's why it's not surprising to see cards like Mana confluence, versatile artifacts and swords to plowshares at the top.
It's probably too ambitious to track all these cards main deck %, but doing so would greatly increase the accuracy of the project (assuming the goal is to better identify a cards power level). This also gets around the issue of combining data results from cubes that don't have the same card list.
In the original post I stated this: "Keep in mind that the cards with the highest numbers aren’t necessarily the most powerful cards, but the most versatile cards that can appear in multiple decks."
Yeah, I read into your initial post a bit, thinking you were talking more about relative power level than you were.
My latest post was a response to Tjornan's reply to me ("While it's true that this correlation is not 100%, I'd argue it's much higher than you think"), which I disagreed with.
This dataset compares the total number of 3-0's for cards that have effectively participated in different numbers of drafts... Thankfully, the total number of drafts a card participates in IS correlated with how good it is and how versatile it is (main-deck %). So this is very useful data for getting a general sense of how good a card is... in an entangled mess of perceived value, versatility, and actual effect on win %.
If a card shows up at the top of this list, there's a very good chance that it's a very good card... which is useful information. An enormous difference between the data and expectation is good grounds to ask why.
Example: I'm surprised Clique is as high as it is on this list, given that it costs UU and is thus harder to cast than some other blue cards. If it maintains its rank when the dataset doubles or triples, I would take a serious look at what's making Clique do so well.
In summary/ TLDR
This dataset, as it currently stands, is question-generating, not question-answering. When the data tells us something that doesn't agree with our experience, it might be time to start questioning the reliability of those experiences.
How you choose to interpret it is up to you, but I see this as a developing tool for a community that historically has based card choices entirely on personal experience.
I could not agree more with Tjornan's post. This aptly sums up why I'm interested in the data. Obviously it's not enough to conclude card A is better than card B, but it's useful as a catalyst to question our beliefs. If cards that I normally let wheel or often ignore are showing up in relatively high numbers of 3-0 decks, then perhaps I need to reevaluate a few of my drafting choices--or not, because someone may be able to explain to me why the high frequency is misleading. The point is to generate questions and discussions, and I think this type of data is generally more useful than anecdotal evidence, which may be biased or easily distorted. I encourage everyone to sort or sift through the master spreadsheet to see if you find anything intriguing.
This dataset, as it currently stands, is question-generating, not question-answering.
Accurate.
When the data tells us something that doesn’t agree with our experience, it might be time to start questioning the reliability of those experiences.
How you choose to interpret it is up to you, but I see this as a developing tool for a community that historically has based card choices entirely on personal experience.
Less accurate.
The data is always useful. But when the data presents information that is clearly unusable toward a specific goal, we have to reconsider what the data might be useful for. As an example, if I want to use the data to tell me what the best cards are, and Wasteland finishes significantly higher than Strip Mine, then the data has issues. We can use the data to identify some interesting trends, but its significance gets called into question.
The data provides correlation, not causation.
Importantly, with the sample size, card pool, and playgroup considerations, card/color/theater/archetype preferences of 3-4 strong players in a weaker field can skew the data in a significant way.
When I first tried tracking information for my cube, we tried to collect main-deck % stats and win % stats. We did this for a while, but found inconsistencies that kept the data from being much more than one additional tool in a vast toolbox. Ultimately, the information provided by the players about their experiences was a more important part of card evaluation than the data we were mining. So we stopped.
The card that provides amazing value in a losing effort gets ignored. The card that provides poor (or no) value in a winning deck wielded by a skilled player gets credit for winning a draft. Without feedback from players regarding card performance, the data in and of itself doesn't carry much weight at all.
I would pick up the winning final 40 and pan through it, and ask the winner for specific feedback on cards. Like, "wow, I didn't know you were running New Card X in here, and you won the draft with it! How did it perform?!" ...And the answer was "I don't know--I never drew it". That was the end of us mining win % data. Because it simply didn't provide information about specific cards that we could rely on to provide meaningful information.
There is information that can be mined from this data (popularity trends, drafting tendencies, player performance, etc.), but I don't think card quality determinations can be made from it. Correlational support is great info to have, but to say that "when the data tells us something that doesn't agree with our experience, it might be time to start questioning the reliability of those experiences" is blatantly false. Because if a player tells me card X was great in a losing effort, and another player tells me card Y was mediocre in a ridiculously good deck, that's data I can use.
Thank you guys so much for taking the time to extract all this data. It's fascinating stuff, and there will be good info to extract from it.
Thank you guys so much for taking the time to extract all this data. It's fascinating stuff, and there will be good info to extract from it.
Thanks! It's really interesting to see the data from multiple angles. One of the more useful views I've found is sorting by color / CMC, where there's a lot of discussion to be had.
Example: White 3-5 CMC Cards (DISCLAIMER: For the sake of brevity, I pulled out some cards with very low showings if they were only briefly in my cube).
- Council's Judgment being that much higher than Oblivion Ring / Banishing Light is a bit of a surprise to me since the latter are much more splashable.
- Gideon of the Trials being that high threw me off a bit, but then again he's a very solid 3-cmc walker that can be played in any white deck.
- Monastery Mentor is very powerful; I just didn't expect it to be this popular with others.
____________________________
4-CMC
Elspeth, Knight-Errant 44
Palace Jailer 38
Gideon, Ally of Zendikar 35
Hero of Bladehold 24
Parallax Wave 23
Ravages of War 21
Restoration Angel 20
Armageddon 17
Wrath of God 12
Day of Judgment 11
Moat 10
____________________________
- The one big standout to me is Parallax Wave being really high, especially compared to Wrath of God / Day of Judgment. Now that I think about it, Parallax Wave can go into multiple archetypes, where WoG / DoJ lean heavily towards control.
- Palace Jailer just isn't a top white card, but it's currently the #4 most represented creature in the 3-0 archives.
____________________________
5-CMC
Reveillark 23
Gideon Jura 22
Angel of Invention 18
Archangel Avacyn // Avacyn, the Purifier 12
Cloudgoat Ranger 7
____________________________
Not much to say here; there's no surprise that Reveillark / Gideon Jura / Angel of Invention are the cream of the crop for white 5-drops. Your move, Reveillark haters!
If you are comparing two cards whose spread in 3-0 % is less than the spread between the elves, it's probably not that useful.
There's a more principled way of doing this in statistics. You define what's called a "null distribution" and ask whether differences between cards (or absolute %'s) could be explained by that null distribution. This is usually translated into a p-value, which indicates how significant any given result is. This is something I plan to develop over the course of this project, but you're right - differences between functionally identical cards serve as a good benchmark.
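As a minimal sketch of that null-distribution idea (the counts below for City of Brass and Mana Confluence are hypothetical, and this is a toy simulation, not the project's actual analysis): if two cards are truly interchangeable, each 3-0 appearance is effectively a coin flip between them, and we can simulate that world to see how often a gap at least as large as the observed one arises by chance.

```python
import random

# Hypothetical 3-0 counts for two functionally identical lands.
city_of_brass, mana_confluence = 30, 19
total = city_of_brass + mana_confluence            # 49 total appearances
observed_gap = abs(city_of_brass - mana_confluence)

# Null distribution: each appearance is a fair coin flip between the cards.
random.seed(0)
trials = 100_000
at_least_as_extreme = 0
for _ in range(trials):
    a = sum(random.random() < 0.5 for _ in range(total))
    if abs(a - (total - a)) >= observed_gap:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / trials
print(f"p = {p_value:.3f}")
```

With these made-up numbers the p-value comes out around 0.15, i.e., a 30/19 split between interchangeable cards is entirely unremarkable at this sample size, which is exactly the point about being tepid with card-vs-card conclusions.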
The card that provides amazing value in a losing effort gets ignored. The card that provides poor (or no) value in a winning deck wielded by a skilled player gets credit for winning a draft. Without feedback from players regarding card performance, the data in and of itself doesn't carry much weight at all.
The nice thing about random variation like this is that it theoretically goes away with sample size. The card that provides amazing value in one losing draft should eventually (absent bias) increase in frequency in winning decklists because it is a good card; the reverse is true for bad cards that sometimes end up in good decks. This is true regardless of player skill. If our dataset had 1 million decklists in it, we could easily ask card-vs-card comparison questions (as long as the cards occupied similar niches in the cube).
Now, will our dataset ever approach the requisite sample size to really answer specific card comparison questions? Probably not, but that doesn't mean it theoretically can't. In any case, what I'm more excited to investigate are deck qualities. How many reanimation spells does the average winning reanimator deck have? How many Tinker targets/fodder does the average Tinker deck have? The nice thing about these questions is that they dodge some of the build-around bias that LucidVisions mentioned (Tinker and Natural Order having lower win rates because they're build-arounds, for example), and they're less susceptible to noise.
If your best player likes Orzhov, and your worst one prefers playing other colours, then your data will always be skewed towards white and black.
This is 100% true. Steveman is the one who contributes winning decklists the most (partly because he’s a good player and partly because he hosts the most drafts), and he has a tendency to draft 3-4 color midrange piles. I, on the other hand, draft aggressive decks almost entirely.
This is a source of systematic bias that will not go away as we gather more decklists. One way I’m hoping to fix this (and get more data) is to have decklist submissions from other users in the community. It’s also fairly easy to keep these decklists separate during analysis, so I can return stats specific to anyone’s cube (as well as merge their decklists into the general pool). These lists don’t even have to be just 3-0 lists.
If anyone is interested in contributing decklists to the project, let me know!
If your best player likes Orzhov, and your worst one prefers playing other colours, then your data will always be skewed towards white and black.
Funny you say that, because I used to play a lot of Orzhov when we first started playing online together. When I compiled all our old decklists and created the 3-0 archives, Orzhov had 10 wins while every other 2-color combination had at most 2.
How close would you say the players in your group are? Do you always have the same people playing?
Like all playgroups, there's definitely a big variance in skill / experience. We're a pretty large group of rotating players, plus randoms online. Would be nice to have a cube group full of pro tour hall of famers, but that's not a reality for most people.
LINKS
Top Cards (UPDATED INFO): https://tinyurl.com/yav44vqh
Master Spreadsheet: https://tinyurl.com/yb6258m5
DISCLAIMER / SHOUTOUTS
- Roughly 90% of the data comes from my cube (http://www.cubetutor.com/viewcube/89350)
My High Octane Unpowered Cube on CubeCobra
You can see by comparing Strip Mine and Wasteland that one card is strictly superior to the other, yet it appears in 15% fewer 3-0 decks.
City of Brass and Mana Confluence are functionally identical, yet they're almost as far apart as Elspeth and Gideon.
Sorry to be negative; I'm just worried that people will misinterpret these results. I'm not trying to discourage the exercise, as it's a fun project, and the more the samples grow, the more interesting it becomes. I only suggest being cautious about drawing any conclusion that Card A is better than Card B based on what this data shows.
Last Updated 02/06/23
Streaming Standard/Cube on Twitch https://www.twitch.tv/heisenb3rg96
Strategy Twitter https://www.twitter.com/heisenb3rg
I’m the person that’s currently running the analysis on these decklists. If you’re interested in contributing (from a coding/statistics perspective, or by submitting your own decklists), let me know! I'm a fiend for data, so if you like keeping track of how decks in your cube do, I'd love to talk. We’re hoping to turn this into a larger project (and even grow our dataset by using submissions from verified cubers) and do some cool analyses, like using the data to write a decent drafting AI.
LucidVisions makes some good points. There are important questions of sample size, bias, and statistical strength in any data collection effort, and I want to address them now as we move forward with this project. At some point, I’ll do a write-up of the setup of the project (including discussions of sample size and bias), how we’re analyzing the data, and what we hope to do moving forward.
While it’s true that this correlation is not 100%, I’d argue it’s much higher than you think. The only reason to draft cards and put them in your decks is because they win games of Magic. Cards do this in different ways (Karn Liberated outright wins you the game, while Volcanic Island improves deck consistency), but the output is the same: a higher winrate. In fact, I would argue that the ONLY reliable metric for a card's power level is how it contributes to win rates. While assaying this contribution by looking at 3-0 decklists isn’t perfect, good cards lead to higher chances of 3-0 decklists. It would be nice to have all 8 decklists and their records from every draft, but this isn’t currently feasible.
Undoubtedly true. If you gave this dataset to any person that does data science or machine learning, you’d be laughed out of the room - the dimensionality of the data relative to our sample size is laughable.
But we’re not even looking for a very accurate picture - we’re looking for a general one. As of right now, there aren’t many conclusions that you can draw from this data that experienced cubers don’t already know (Fractured Identity is good, lands are important, blue decks are good), but having a falsifiable basis to make these claims is something the cube community has lacked for some time.
As someone who’s worked on small-scale data projects before, there are a surprising number of conclusions you can draw from this dataset. I personally believe that Land Tax isn’t very good, but seeing it so high on the White list makes me question my beliefs. As long as the questions you ask aren’t very complex, this dataset might do better than you think.
This gets at an important point, and it’s one that I’m sure will come up often as we work on this project. One point is intrinsic biases in drafting (and therefore, in the dataset). One explanation for why Wasteland does better than Strip Mine is that drafters underrate Wasteland and overrate Strip Mine. Experienced drafters (those more likely to 3-0) end up with Wasteland a higher proportion of the time because of this.
One could also validly argue, as you claim, that it’s simply due to noise. This is especially true for your comparison of Mana Confluence and City of Brass. These high variance points are undoubtedly present in the data, and this will be true of almost any dataset. This leads nicely into the next two points.
We’re just presenting the data and noting things that are potentially interesting - how you interpret them is up to you. But I would argue that having some basis to argue for a card's inclusion is better than no basis at all. All too often the cube community relies on the testing results of high-profile members to see if a card is good or not, but this doesn’t always work and isn’t verifiable. And many times, people don't cube enough to test for themselves. Some experienced cubers swear by Reveillark; other equally experienced members hate it. This project hopes to begin to provide some verifiable, statistical basis for examining a cube card (and the characteristics of winning decks in general).
Couldn’t agree more. I’m hoping to be able to construct a statistical framework for this (if you’re familiar with statistics - defining null distributions, p-values, etc.). But without even doing this, I can tell you that the ability to directly compare the performance of two cards will almost always be well outside the reach of this dataset, and that’s important to acknowledge. I would caution anyone that is seeking to compare two cards (that includes Steveman and his claims above).
In summary / TL;DR:
This dataset, as it currently stands, is question-generating, not question-answering. When the data tells us something that doesn’t agree with our experience, it might be time to start questioning the reliability of those experiences.
How you choose to interpret it is up to you, but I see this as a developing tool for a community that historically has based card choices entirely on personal experience.
Regular 450 unpowered cube (with some custom cards) - 450 Unpowered
The biggest problem with using this list as an indicator of a card's power level is that differences in maindeck % between cards play a much larger role in their relative 3-0 frequency than the relative power between the cards when they do make main decks.
Unless the card is Black Lotus, Time Walk, etc., most individual cards don't affect an overall deck's win % by more than 0.1-2%. Maindeck %'s differ by dramatically larger margins between cards of similar power level.
If a card misses a main deck, it's no different from it not being in the cube at all, and that has an enormous effect on the likelihood it shows up in a 3-0 deck.
That's why it's not surprising to see cards like Mana Confluence, artifacts, and Swords to Plowshares at the top.
It's probably too ambitious to track all these cards' maindeck %, but doing so would enormously increase the accuracy of the project (assuming the goal is to better identify a card's power level). This would also get around the issue of combining results from cubes that don't share the same card list.
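To put rough numbers on this argument, here's a sketch with invented rates (none of these figures come from the archive) showing how a maindeck % gap can swamp a power gap:

```python
# Invented illustration: maindeck % dominates 3-0 frequency.
# Two cards of nearly identical power that differ mainly in how often
# they make a main deck once drafted.

DRAFTS = 300  # hypothetical number of drafts in the pool

def expected_3_0_appearances(maindeck_rate, win_rate_boost):
    """Expected number of 3-0 decks containing the card.

    maindeck_rate:  P(card ends up in a main deck in a given draft)
    win_rate_boost: the card's bump to the deck's 3-0 chance, added to
                    a 1-in-8 baseline (one deck per 8-player pod 3-0s)
    """
    baseline = 1 / 8
    return DRAFTS * maindeck_rate * (baseline + win_rate_boost)

# "Splashable" card: slightly weaker, but almost always makes the deck.
splashable = expected_3_0_appearances(maindeck_rate=0.90, win_rate_boost=0.01)
# "Narrow" card: stronger when it plays, but sidelined half the time.
narrow = expected_3_0_appearances(maindeck_rate=0.50, win_rate_boost=0.02)

print(f"splashable: {splashable:.1f} expected 3-0 decks")
print(f"narrow:     {narrow:.1f} expected 3-0 decks")
# The 40-point maindeck-rate gap swamps the 1-point power gap.
```

With these made-up rates, the weaker-but-splashable card shows up in roughly half again as many 3-0 decks as the stronger narrow card.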
In the original post I stated this: "Keep in mind that the cards with the highest numbers aren’t necessarily the most powerful cards, but the most versatile cards that can appear in multiple decks."
If you are looking to get a sense of whether your data sample is becoming more accurate, the mana elves are probably the best anchor, since there are four that are basically functionally equivalent:
Llanowar Elves, Elvish Mystic, Fyndhorn Elves, and Arbor Elf*.
If you are comparing two cards whose spread in 3-0 % is less than the spread between the elves, it's probably not that useful.
Though I know you answered that that's not what this is for: question-generating, not question-answering, etc.
Yeah, I read into your initial post a bit, thinking you were talking more about relative power level than you were.
My latest response was in response to Tjornan's response to me "While it’s true that this correlation is not 100%, I’d argue it’s much higher than you think."
Which I disagreed with.
This data set compares the total number of 3-0's of cards that have effectively participated in different numbers of drafts... Thankfully, the total number of drafts a card participates in IS correlated with how good it is and how versatile it is (maindeck %). So this is very useful data for getting a general sense of how good a card is... in an entangled mess of perceived value, versatility, and actual effect on win %.
If a card shows up at the top of this list, there's a very good chance that it's a very good card, which is useful information. Enormous differences between the data and expectation are good grounds to ask why.
Example: I am surprised Clique is as high as it is on this list, given that it costs UU and is thus harder to play than some other blue cards. If it maintains its rank when the data set doubles or triples, I would take a serious look at what's making Clique do so well.
I could not agree more with Tjornan's post. This aptly sums up why I'm interested in the data. Obviously it's not enough to conclude card A is better than card B, but it's useful as a catalyst to question our beliefs. If cards that I normally let wheel or often ignore are showing up in relatively high numbers of 3-0 decks, then perhaps I need to reevaluate a few of my drafting choices--or not, because someone may be able to explain to me why the high frequency is misleading. The point is to generate questions and discussions, and I think this type of data is generally more useful than anecdotal evidence, which may be biased or easily distorted. I encourage everyone to sort or sift through the master spreadsheet to see if you find anything intriguing.
450 Unpowered Cube Cobra
The data is always useful. But when the data presents information that is clearly unusable towards a specific goal, we have to reconsider what the data might be useful for. As an example, if I want to use the data to tell me what the best cards are, and Wasteland finishes significantly higher than Strip Mine, then the data has issues. We can use the data to identify some interesting trends, but its significance gets called into question.
The data provides correlation, not causation.
Importantly, with the sample size, card pool, and playgroup considerations, card/color/theater/archetype preferences of 3-4 strong players in a weaker field can skew the data in a significant way.
When I first tried tracking information for my cube, we were trying to find maindeck % stats and win % stats. We did this for a while, but found inconsistencies that prevented the data from being much more than one additional tool in a vast toolbox. And ultimately, the information provided by the players regarding their experiences was a more important part of card evaluation than the data we were mining. So we stopped.
The card that provides amazing value in a losing effort gets ignored. The card that provides poor (or no) value in a winning deck wielded by a skilled player gets credit for winning a draft. Without feedback from players regarding card performance, the data in and of itself doesn't carry much weight at all.
I would pick up the winning final 40 and pan through it, and ask the winner for specific feedback on cards. Like, "wow, I didn't know you were running New Card X in here, and you won the draft with it! How did it perform?!" ...And the answer was "I don't know--I never drew it". That was the end of us mining win % data. Because it simply didn't provide information about specific cards that we could rely on to provide meaningful information.
There is information that can be mined from this data: popularity trends, drafting tendencies, player performance, etc... but I don't think that card quality determinations can be made from it. Correlational support is great info to have, but to say that "when the data tells us something that doesn’t agree with our experience, it might be time to start questioning the reliability of those experiences" is blatantly false. Because if a player tells me card X was great in a losing effort, and another player tells me card Y was mediocre in a ridiculously good deck... that's data I can use.
Thank you guys so much for taking the time to extract all this data. It's fascinating stuff, and there will be good info to extract from it.
My 540 Card Powered Cube
My Article - "Cube Design Philosophy"
My Article - "Mana Short: A study in limited resource management."
My 45th Set (P)review - Discusses my top 20 Cube cards from ONE!
Thanks! It's really interesting to see the data from multiple angles. One of the more useful ways I've found is sorting it by color / CMC, where there's a lot of discussion to be had.
Example: White 3-5 CMC Cards (DISCLAIMER: For the sake of brevity, I pulled out some cards with very low showings if they were only briefly in my cube).
3-CMC
Council's Judgment 38
Banishing Light 28
Gideon of the Trials 26
Recruiter of the Guard 25
Porcelain Legionnaire 24
Monastery Mentor 23
Blade Splicer 23
Oblivion Ring 22
Brimaz, King of Oreskos 19
Flickerwisp 16
Thalia, Heretic Cathar 14
Spear of Heliod 9
Hallowed Spiritkeeper 5
____________________________
- Council's Judgment being that much higher than Oblivion Ring / Banishing Light is a bit of a surprise to me since the latter are much more splashable.
- Gideon of the Trials being that high threw me off a bit, but then again he's a very solid 3-cmc walker that can be played in any white deck.
- Monastery Mentor is very powerful, I just didn't think it was as popular with others as I thought it'd be.
____________________________
4-CMC
Elspeth, Knight-Errant 44
Palace Jailer 38
Gideon, Ally of Zendikar 35
Hero of Bladehold 24
Parallax Wave 23
Ravages of War 21
Restoration Angel 20
Armageddon 17
Wrath of God 12
Day of Judgment 11
Moat 10
____________________________
- The one big standout to me is Parallax Wave being really high, especially compared to Wrath of God / Day of Judgment. Now that I think about it, Parallax Wave can go into multiple archetypes, where WoG / DoJ lean heavily towards control.
- Palace Jailer just isn't considered a top white card, but it's currently the #4 most represented creature in the 3-0 archives.
____________________________
5-CMC
Reveillark 23
Gideon Jura 22
Angel of Invention 18
Archangel Avacyn // Avacyn, the Purifier 12
Cloudgoat Ranger 7
____________________________
Not much to say here; there's no surprise that Reveillark / Gideon Jura / Angel of Invention are the cream of the crop for white 5-drops. Your move, Reveillark haters.
There’s a more principled way of doing this in statistics. You define what’s called a “null distribution” and ask whether differences between cards (or absolute %s) could be explained by that null distribution. This is usually translated into something called a p-value, which indicates how significant a given result is. This is something I plan to develop over the course of this project, but you’re right - differences between functionally identical cards serve as a good benchmark.
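As a sketch of what that null-distribution test could look like (the deck counts below are hypothetical, not from the archive):

```python
import random

# Monte Carlo "null distribution" test: if two functionally identical
# cards were truly equally likely to land in a 3-0 deck, how often
# would chance alone produce a gap in their 3-0 counts at least as
# large as the one we observed?
# (Counts are hypothetical, not taken from the archive.)

random.seed(0)  # reproducible runs

def gap_p_value(count_a, count_b, trials=20_000):
    """Monte Carlo p-value for the observed count gap under a 50/50 null."""
    total = count_a + count_b
    observed_gap = abs(count_a - count_b)
    extreme = 0
    for _ in range(trials):
        # Under the null, each of the `total` appearances is a coin flip
        # between the two cards.
        a = sum(random.random() < 0.5 for _ in range(total))
        if abs(a - (total - a)) >= observed_gap:
            extreme += 1
    return extreme / trials

# e.g. Wasteland in 30 decks vs. Strip Mine in 24 (hypothetical counts)
p = gap_p_value(30, 24)
print(f"p-value: {p:.3f}")  # a large p means noise explains the gap fine
```

For counts in this range, the p-value comes out around 0.5: a 6-deck gap between two equivalent lands is exactly the kind of spread a coin would produce, which is the Strip Mine / Wasteland point made quantitative. (An exact binomial test would replace the simulation, but the Monte Carlo version is easier to see through.)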
The nice thing about random variation like this is that it theoretically goes away with sample size. The card that provides amazing value in one losing draft should eventually (absent bias) increase in frequency in winning decklists because it is a good card - the reverse is true for bad cards that sometimes go in good decks. This is true regardless of player skill. If our dataset had 1 million decklists in it, we could easily make card vs. card comparisons (as long as the cards occupied similar niches in the cube).
Now, will our data set ever approach the requisite sample size to really answer specific card comparison questions? Probably not, but that doesn't mean it theoretically can't. In any case, what I’m more excited to investigate are the deck qualities. How many reanimation spells does the average winning reanimator deck have? How many tinker targets/fodder does the average tinker deck have? The nice thing about these questions is that they dodge some of the build-around bias that LucidVisions mentioned (Tinker and NO having lower win rates because they’re build-arounds, for example), and they’re less susceptible to noise.
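Once the decklists are machine-readable, those deck-quality averages are a few lines of code. A minimal sketch, with invented decklists and an invented spell list (none of these names or counts come from the archive):

```python
# Sketch: average count of a card category across winning decks.
# The decklists and the reanimation-spell set are invented examples.

REANIMATION_SPELLS = {"Reanimate", "Animate Dead", "Exhume", "Necromancy"}

winning_decks = [
    ["Reanimate", "Exhume", "Griselbrand", "Entomb", "Swamp"],
    ["Animate Dead", "Necromancy", "Reanimate", "Inkwell Leviathan"],
    ["Exhume", "Sheoldred, Whispering One"],
]

def average_category_count(decks, category):
    """Mean number of cards from `category` per deck."""
    counts = [sum(card in category for card in deck) for deck in decks]
    return sum(counts) / len(counts)

avg = average_category_count(winning_decks, REANIMATION_SPELLS)
print(f"avg reanimation spells per winning deck: {avg:.2f}")  # 2.00
```

The same function works for any category (tinker targets, sweepers, mana rock counts) once the card lists are tagged.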
This is 100% true. Steveman is the one who contributes winning decklists the most (partly because he’s a good player and partly because he hosts the most drafts), and he has a tendency to draft 3-4 color midrange piles. I, on the other hand, draft aggressive decks almost entirely.
This is a source of systematic bias that will not go away as we gather more decklists. One way I’m hoping to fix this (and get more data) is to have decklist submissions from other users in the community. It’s also fairly easy to keep these decklists separate during analysis, so I can return stats specific to anyone’s cube (as well as merge their decklists into the general pool). These lists don’t even have to be just 3-0 lists.
If anyone is interested in contributing decklists to the project, let me know!
Funny you say that because I used to play a lot of Orzhov when we first started playing online together. When I compiled all our old decklists and created the 3-0 archives, Orzhov had 10 wins when every other 2-color combination only had 2 wins at most.
Like all playgroups, there's definitely a big variance in skill / experience. We're a pretty large group of rotating players, plus randoms online. Would be nice to have a cube group full of pro tour hall of famers, but that's not a reality for most people.
It's fun to try to answer some of the questions raised by it. One I have: why is Impulse, which is so generic, so low on the list?
Impulse was out of my cube for a while.