So, I'm a bit of a burn player, but I didn't feel that this really went in the Burn thread, because some of my previous attempts to do this have gotten a bit spammy. Basically my attempt is to build a simulator that can be given a decklist or a range of decks from a predefined cardpool, and then goldfish them many thousands of times in a few hours providing a whole bunch of data to examine such as what configurations mulligan the best, which decks are the fastest, and so on.
This isn't my first attempt at this. Previously I made a thread on it here but not many were interested. So I revamped things a bit and went to the thread for the deck I was trying to simulate here but that's too spammy.
Since then, I rebuilt my process to go about doing this, and switched my focus to Legacy Burn. I have a thread with these efforts on the source but also don't like spamming discussion threads and would prefer to just have something discussing this process/result instead. So rather than post them in the Burn thread here, I'll post in this topic.
For starters I should probably explain what I'm doing. I'm attempting to play large numbers of games and look at various metrics to answer questions like: How much damage does a Goblin Guide really do vs a Monastery Swiftspear? Am I insane for playing SDT? Is Magma Jet defensible? How much do opposing DRS's slow us down? What is the optimal manabase? And many others.
So to answer those questions I went about rewriting my simulator. For the programmers (don't worry if you don't understand this sentence) I changed it from procedural to object oriented and then I changed the resulting output from writing to columns to be imported into a spreadsheet for each test, to storing all information in a database so that all previous relevant games are available as part of whatever data set is being used. It is written in Python using MySQL and I am more than happy to share both the program and the results with anyone interested who knows how to use those things.
However, for those who don't know how to use those things, I think I should explain how it works. Basically this program builds a deck, then plays x games with that deck, and it records the game state on every turn.
We have some small screenshots there, you can almost think of this as a log. That first image shows the two decks available in this small sample as well as information on the card list. A close eye will show that both of those lists are identical, but more importantly, you can see that they're running 6 fetchlands, 6 lands, 12 bolt spells, and 4 magma jets among other cards.
So we'll go a level deeper from the decks to the games. Here you can see the game numbers of all of the games played with deck 2. There's also some handy mulligan information.
This would be enough to answer the mana question, but we want to go a level beyond and look at the turn breakdown of each game. That would be the third screenshot. Here you can see everything was filtered to results from game 8, which, if you remember from the GamesTable, had zero mulligans. Looking at that information, turn 5 appears to be the most interesting since it did 11 damage and set up for a win.
From there we can go to the final two screenshots, hand/board and look at that specific turn, which shows 7 burn, and an eidolon, with the last 2 coming from an eidolon trigger.
So that's the brief non technical explanation of what this simulator does. It builds datasets like that for analysis to see what does and doesn't work over a large number of games and using some wizardry you can look at many games at once.
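For the programmers, here is a minimal sketch of that build/play/record loop. It is not the actual simulator: the two-card deck, the toy play rule (one land per turn, then as many 1 mana, 3 damage bolts as the lands allow), and the `turns` table layout are all assumptions made up for illustration.

```python
import random
import sqlite3

def build_deck(counts):
    """Expand {"card name": copies} into a flat list of cards."""
    return [card for card, n in counts.items() for _ in range(n)]

def play_game(deck, rng, max_turns=12):
    """Goldfish one game on the play and return one log row per turn."""
    library = deck[:]
    rng.shuffle(library)
    hand = [library.pop() for _ in range(7)]
    lands = damage = 0
    log = []
    for turn in range(1, max_turns + 1):
        if turn > 1 and library:      # on the play: no draw on turn 1
            hand.append(library.pop())
        if "Land" in hand:            # one land drop per turn
            hand.remove("Land")
            lands += 1
        mana = lands
        while mana > 0 and "Bolt" in hand:
            hand.remove("Bolt")
            mana -= 1
            damage += 3
        log.append((turn, lands, len(hand), damage))
        if damage >= 20:
            break
    return log

def simulate(counts, games, seed=0):
    """Play `games` games and store every turn's state in SQLite."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE turns (game_id INTEGER, turn INTEGER, "
        "lands INTEGER, hand_size INTEGER, damage INTEGER)"
    )
    deck = build_deck(counts)
    for game_id in range(games):
        rows = [(game_id, *state) for state in play_game(deck, rng)]
        db.executemany("INSERT INTO turns VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
    return db

db = simulate({"Land": 20, "Bolt": 40}, games=1000)
avg = db.execute(
    "SELECT AVG(last) FROM (SELECT MAX(turn) AS last FROM turns GROUP BY game_id)"
).fetchone()[0]
print(f"average final turn: {avg:.2f}")
```

Because every turn is a row, the same table can answer questions about whole games (last turn per game) or about single turns (damage on turn 5 of game 8) with different queries.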
Now, it was only recently finished, and some of my dataset indicates some bugs still exist (particularly with divining top) but the dataset is usable as is for certain things and some questions can be answered. The answers to those so far:
In one set of the data (all that I had at the time this was done), Monastery Swiftspear dealt 18,316 damage over 9,981 attacks, or about 1.835 per attack. That puts its average damage per turn slightly below Goblin Guide's.
Following the Swiftspear data I ran some games removing it and adding in Hellspark Elemental and Keldon Marauder instead of the Swiftspears in a creature heavy list and it resulted in faster games.
Another piece of useful data so far is in mulligan rates. With 19 lands it would mulligan about 1/3 of hands, so 1/3 go to 6, 1/3 of that to 5, and so on. With 22 lands it mulligans about 1/4 of hands and has an average kill turn of about .5 turns faster.
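To see what those mulligan rates mean for hand sizes, here is a small sketch of the "1/3 go to 6, 1/3 of that to 5, and so on" arithmetic. It assumes the mulligan rate is the same at every hand size and that the deck always keeps at 4 cards; the floor of 4 is my assumption, not something the simulator is known to do.

```python
def hand_size_distribution(p_mull, start=7, floor=4):
    """Probability of keeping at each hand size, given a constant mulligan rate."""
    dist = {}
    reach = 1.0                      # chance of ever seeing a hand of this size
    for size in range(start, floor, -1):
        dist[size] = reach * (1 - p_mull)
        reach *= p_mull
    dist[floor] = reach              # always keep at the floor
    return dist

def expected_hand_size(dist):
    return sum(size * p for size, p in dist.items())

for lands, rate in ((19, 1 / 3), (22, 1 / 4)):
    dist = hand_size_distribution(rate)
    print(lands, "lands:", {k: round(v, 3) for k, v in dist.items()},
          "average kept hand:", round(expected_hand_size(dist), 2))
```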
Last, it appears that both Magma Jet and SDT slow the deck down so far.
I've run a few more simulations since these. Last night I finished implementing the first of several optimizations I would like to do which enable me to run far more games. The one I used last night boosted the program from 3000 games per hour to 60,000 per hour.
What I've found is some general trends.
The first is that contrary to some current theories, SDT does not speed up the deck. Neither does Magma Jet.
Swiftspear also seems to slow things down by about .1 turns when compared to some other options.
22 land decks mulligan about 25% of hands while 19 land decks mulligan about 33%.
With prowess Swiftspear evaluated to 1.835 damage/turn, that number probably goes up with faster games, but I don't think it ever beats the 2.0 of Goblin Guide
Burn typically needs between 17 and 24 damage to win a game depending on fetches and lifegain. That 7 life swing is worth 1 turn on average.
Ran a few more tests today. Had a really big one with manabases all set up, had it completed, then realized I introduced a bug into the mulligan decision. It didn't really impact any previous data but it had a huge impact on manabase information, fortunately it was there for just the one night.
So, after fixing it I did a couple more tests today. I tried decks with and without Probe. In all cases the decks with Probe underperformed compared to the decks without it but I didn't record the numbers because the pre probe points of comparison were with the bugged mulligan information. Regardless, it was a very big difference for probe (talking, losing half a turn at any land count) while the mulligan problem really only affected very low land counts.
After that I ran a couple thousand games at each land count using a ~55% fetch/45% fetchable manabase at all land counts from 2 to 30. (the percents come from earlier tests that showed it to be optimal). Then I thought the info would be a bit more presentable as a chart than a bunch of spammed numbers, so here it is.
With the list used which was heavy on 1's, few 2's, and no 3's, you can see that 19 lands was the sweet spot. What was interesting to me though was how quickly the win turn dropped as lands were added until it hit that minimum, and how slowly the average win turn rose past that point even though better cards were getting cut (the final 10 cuts were all bolt variants).
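The land count sweep described above can be set up in a few lines. This is only a sketch of how the configurations might be generated: the function name and the rounding rule are assumptions, and the real program may split odd land counts differently.

```python
def manabase(total_lands, fetch_share=0.55):
    """Split a land count into fetches and fetchables at roughly 55/45."""
    fetches = round(total_lands * fetch_share)
    return {"fetch": fetches, "fetchable": total_lands - fetches}

# One configuration per land count from 2 to 30, as in the test above.
configs = {lands: manabase(lands) for lands in range(2, 31)}

for lands in (19, 20, 21, 22):
    print(lands, configs[lands])
```

At 20 lands this gives 11 fetches and 9 fetchables.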
I'm running the same test again tonight but with Firecrafts over Magma Jets. One, it will give some idea on the value of each of those cards MB. And 2 it will give some data on a frequent game 2 configuration once we bring in 3 drops with a plan of casting them on time.
After these sets of tests I'll probably stop for a few days and change a few things in my program. There are certain tests I would like to run but can't because the right data isn't being recorded. Additionally, there's the issue of slow performance which needs to be corrected. I've done a little bit of that so far, increasing the speed from 3,000 to 60,000 games per hour, but Magic is a game of intense variance and the program should realistically be able to run 1 million games per hour or more, since 86% of the program's runtime is currently tied up in the overhead for disk writes rather than actually executing code. Some restructuring of things should fix that.
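The post doesn't say which restructuring is planned. One common way to cut per-write overhead with SQLite is to buffer rows in memory and insert them in large batches, with one commit per batch instead of one per row. A minimal sketch, with a made-up `turns` table and batch size:

```python
import sqlite3

def write_batched(db, rows, batch_size=10_000):
    """Insert rows in large batches, committing once per batch."""
    for start in range(0, len(rows), batch_size):
        with db:  # the connection context manager commits when the block exits
            db.executemany(
                "INSERT INTO turns VALUES (?, ?, ?)",
                rows[start:start + batch_size],
            )

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE turns (game_id INTEGER, turn INTEGER, damage INTEGER)")

# Hypothetical data: 1000 games of 6 turns each.
rows = [(game, turn, 3 * turn) for game in range(1000) for turn in range(1, 7)]
write_batched(db, rows)
print(db.execute("SELECT COUNT(*) FROM turns").fetchone()[0], "rows written")
```

With a database file on disk rather than `:memory:`, the saving comes from paying the commit cost once per batch rather than once per row.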
This is interesting and I would be interested in looking at the code base.
I tend to use Perl over Python strictly because I started with Perl earlier.
I'll refrain from my opinion on using MySQL.
I'm using Python because it's a language I'm comfortable with. It's not really a good language to do this in because of speed, which is a major issue for me right now (been reading up on SQL optimization to fix that), but I didn't really want to use Java or C++... I get enough of them in school, and while I know both I'm not very good with either. Honestly, I'm not very good with Python either, but I can at least get things done and I find it enjoyable to work with.
The database being used is SQLite, typo up above when I said MySQL, but the two are pretty similar.
The code is being updated occasionally and has a couple known bugs (and certainly a couple unknown) what I'll do is PM you a link to a pastebin with it tomorrow.
You'll have to deal with my bad comment habits though, some of it is commented but for the most part I'm just using the text from logging information that can be turned on/off as a type of comment so it should still be descriptive enough.
Here's the chart with Exquisite Firecraft from 2 to 30 lands, same decklist as the previous chart except Magma Jets were turned into Firecraft http://imgur.com/jkmJGX1
You'll notice that the fastest turn happens at 21 lands, but the curve back up from that minimum point happens much slower. The gap between 18 and 21 land is bigger than the gap between 21 and 28 land.
So again, this leads towards the idea that if you're not sure, go a little higher on lands.
Also worth noting is that the lowest time with Magma Jet was 4.94 turns while with Firecraft it was 4.89 turns, so Firecraft beats out Jet.
That's the current code, the pastebin is good for a week, I set a time limit on it because I didn't want an old version of the code to be available in the future when it's updated. So if you want it, grab it now.
And people say burn doesn't require that much thinking...........
Red is the thinking man's color.
I don't think anyone has made the claim that SDT is going to speed up the deck. It decreases explosiveness but increases consistency. Also, "1.8xxxxx damage/turn" should not lead anyone to believe that cutting swiftspears is the right thing to do. GG is not "strictly better", they are strictly different cards.
Griefing aside, your brute force program is definitely interesting, although, I think the results should be taken with a few large grains of salt.
I don't think anyone has made the claim that SDT is going to speed up the deck.
No, I made that very argument a year or two ago, I was one of the earlier adopters of Top. I normally don't post in the Legacy section here though and stick to the Source. The theory that makes SDT worthwhile is that you can always cash it in for a card, and you get selection out of it. It sucks up excess mana but always cashes in for damage. I have a pretty elaborate spreadsheet that shows SDT being a benefit to speed. Obviously the theory behind that is wrong though based on these simulations.
It decreases explosiveness but increases consistency. Also, "1.8xxxxx damage/turn" should not lead anyone to believe that cutting swiftspears is the right thing to do. GG is not "strictly better", they are strictly different cards.
No, that alone isn't reason to say Swiftspear is a bad card. It's lower damage than Goblin Guide but that doesn't necessarily mean it's not our next best option. I'm basing the conclusion that Swiftspear is a poor choice on the fact that every Swiftspear deck is underperforming compared to a non Swiftspear deck using Keldon Marauders in its place. It's a result that I don't fully understand the why of (especially when my aforementioned spreadsheet calculates Swiftspear as being more damage than Guide), but I have over 1 million games of Magic at this point that are telling me Swiftspear is not a good choice.
Griefing aside, your brute force program is definitely interesting, although, I think the results should be taken with a few large grains of salt.
I welcome criticism of it, it provides a basis for improvement. That said, I'm well aware of the limitations of such an approach, as I've said in the opening thread, I've tried this approach a couple other times and the resulting optimal decks weren't all that spectacular. That said, a more focused approach can reveal some valuable information.
----------------------------
Edit: Ran another simulation on a Probe vs non Probe deck.
The base list was
11 fetch
9 land
4 eidolon
4 swiftspear
4 goblin guide
4 price
4 fireblast
4 atarka's command
8 bolts
4 exquisite firecraft
Then it ran either 4 additional bolts or 4 probes
Each ran for 5,000 games and the probe deck came out ahead by about 0.03 turns. I then repeated it and it came out at about the same.
Next I decided to use the data to compare Swiftspear again. With the Probe deck, Swiftspear dealt 23,192 damage over the 11,262 turns it was on the board, or 2.059 damage per turn. With the Bolt deck, Swiftspear dealt 21,593 damage over 11,108 turns, or 1.944 damage per turn.
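For anyone who wants to check the per-turn figures, they are just total damage divided by turns on the board. A quick sketch using the totals quoted in this thread (the function name is mine):

```python
def damage_per_turn(total_damage, turns_on_board):
    """Average damage a creature dealt per turn it spent on the battlefield."""
    return total_damage / turns_on_board

probe_deck = damage_per_turn(23_192, 11_262)   # Swiftspear in the Probe list
bolt_deck = damage_per_turn(21_593, 11_108)    # Swiftspear in the Bolt list
earlier = damage_per_turn(18_316, 9_981)       # the earlier Swiftspear dataset

print(f"Probe deck: {probe_deck:.3f}")
print(f"Bolt deck:  {bolt_deck:.3f}")
print(f"Earlier:    {earlier:.3f}")
print(f"Probe minus Bolt: {probe_deck - bolt_deck:.3f}")
```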
That's probably the last simulation I'll run short of an interesting suggestion until I make a few changes to the program. Originally I didn't build in turn tracking because I figured I could just take the rowID in a database query, but I've found a few places where it's useful so I'll be adding in some turn tracking. For the non technical explanation, this will let me link data such as hands on turn 1 to games that ended on turn 3 or 4 and that's something I think could be pretty useful to evaluate opening hands.
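For the technical explanation, linking turn 1 hands to fast games is a join between a per-game table and a per-turn hand table. The schema below (`games`, `hands`, and their columns) and the three sample games are hypothetical, since the real table layout isn't shown here.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE games (game_id INTEGER PRIMARY KEY, win_turn INTEGER);
CREATE TABLE hands (game_id INTEGER, turn INTEGER, card TEXT, copies INTEGER);
""")
# Made-up sample data: two turn 4 wins and one turn 6 win.
db.executemany("INSERT INTO games VALUES (?, ?)", [(1, 4), (2, 6), (3, 4)])
db.executemany("INSERT INTO hands VALUES (?, ?, ?, ?)", [
    (1, 1, "Lightning Bolt", 2), (1, 1, "Fetch", 3),
    (2, 1, "Lightning Bolt", 1), (2, 1, "Fetch", 1),
    (3, 1, "Lightning Bolt", 3), (3, 1, "Fetch", 2),
])

def average_opening_hand(db, max_win_turn):
    """Average copies of each card in the turn 1 hand of games won by max_win_turn."""
    query = """
        SELECT h.card,
               SUM(h.copies) * 1.0 /
               (SELECT COUNT(*) FROM games WHERE win_turn <= ?)
        FROM hands AS h
        JOIN games AS g ON g.game_id = h.game_id
        WHERE h.turn = 1 AND g.win_turn <= ?
        GROUP BY h.card
    """
    return dict(db.execute(query, (max_win_turn, max_win_turn)))

print(average_opening_hand(db, 4))
```

Dividing by the number of qualifying games (rather than using AVG) keeps the average correct when a card is simply absent from some hands.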
Are there any sideboard/MB cards people would like to see added that I'm not currently using? Given that I've been doing more focused testing I'm not as concerned with limiting the number of card types that a more broad approach testing all combinations requires because of math. Because of my approach not every card can be implemented (this program isn't DotP or MTGO) but if the card sounds useful enough or easy enough I'm willing to include it.
Right now Sulfuric Vortex sounds good. I would like to include Grim Lavamancer but because I'm not tracking the GY it's a bit bigger of an undertaking, though not completely impossible to implement, I would just have to add a GY. If I add a GY I could try out Barbarian Ring as well. If I do add Barbarian Ring I'll also add a pain index tracking how much damage we deal to ourselves per game, though the Price of Progress/Flame Rift category won't accurately add damage.
So to sum that up, here's the current to do list, roughly in order of priority:
1. Add turn tracking
2. Add Sulfuric Vortex
3. Add Barbarian Ring
4. Add Grim Lavamancer
5. Add opposing interaction
6. Add self damage
7. Separate Flame Rift/Price of Progress (relevant due to self damage) category
Most of these are simple (in fact, turn tracking is already done as of this edit)
Would there be any more cards/features people are interested in?
Items 1, 2, and 4 on the list are implemented. Depending on how the rest of my night goes #3 may be implemented too, if not it's almost certainly going in tomorrow.
I think I need to redo my priority on casting certain cards though, Lavamancer, Probe, and Vortex have made things a little murky.
Most likely I'll be setting a single large test up tonight with one of the better performing lists, and then tomorrow making some links between opening hands/faster wins and opening hands/slower wins.
Knocked a few points off of my to do list. Turn tracking, Gitaxian Probe, Sulfuric Vortex, Grim Lavamancer, and Barbarian Ring were all added and a bug with Fireblast was fixed.
My approach has some flaws though, since I'm starting to get some more slowdown in the program. Run time has doubled since I added the extra logic for those cards (down to about 500 games/minute), so I probably can't add any more until I find some ways to optimize things (which I've been tinkering with).
The decks are also winning a little slowly, not quite sure what's up with that (it's probably just my card choices) but it's fine to just say one deck is faster than the other, so it's not really all that important.
I recently ran a few thousand games with some varying decklists. They were all some variation on this. Everything from 1-4 GLM, then some 2 GLM's with different land counts filling in more Burn as stuff came out.
11 Fetch
9 Fetchable land
2 Barbarian Ring
4 Eidolon of the Great Revel
4 Goblin Guide
3 Vexing Devil
2 Grim Lavamancer
4 Price of Progress
4 Fireblast
9 Lightning Bolt (or Lava Spike if you would prefer)
4 Gitaxian Probe
4 Exquisite Firecraft
Not the best list, I admit but I was looking at some specific things that those cards would show rather than going for raw speed.
As I said, it's slowing down so over the 7 decks I played 1000 with each deck for 7000 games total.
With the above deck the fastest wins were on turn 4, 13/1000 games had a turn 4 win.
Oddly enough, there wasn't a single 7 card hand that won that early, they were all mulligans. The average hand size for a turn 4 win was 5.92 and the average hand was
1.42 Fetch
0.50 Land
0.00 Barbarian Ring
0.50 Eidolon of the Great Revel
0.42 Vexing Devil
0.75 Price of Progress
0.08 Fireblast
1.42 Lightning Bolt
0.58 Exquisite Firecraft
0.25 Grim Lavamancer
So roughly 1.9 lands, 1.2 creatures (counting Lavamancer), 2.8 burn spells, and anything else. So without the decimals that would imply the ideal opening hand on 6 cards is 2 land, 1 creature, 3 burn, and 1 of anything.
Opening the dataset up to all 7000 games there were 80 games which had a turn 4 win, and again there were no 7 card keeps and almost no 5 card keeps. It's interesting enough that I'll have to look into the why of this further as there were plenty of slower wins that kept 7 cards. The hands were all on the play so I don't think it has anything to do with going 8 cards deep with a mulligan scry on the draw. Because of the differing decklists I don't think average cards are all that useful for the whole dataset but there are a few trends that I see as peculiar. There were no Goblin Guides in any of the fastest hands and no Swiftspears (a couple lists had them). It was almost always looking for 2+ bolts, an additional 3 damage burn spell, and a 4 damage. Exquisite Firecraft featured rather prominently in the fastest hands and had a significantly higher presence than Fireblast. Lavamancer showed up from time to time, and when he was around Barbarian Ring never fired but I find it very odd that Lavamancer, who should in this situation be strictly worse than Goblin Guide would show up when the Guide made 0 appearances in thousands of games, my only explanation is that it's an example of variance with small sample sizes.
Some data on Lavamancer: He appeared in 11,295 turns and activated 2571 times for a grand total of 13,866 damage or 1.23 per turn.
Lots of info this time, only tested 2 decks but there's lots to say because of the turn tracking implementation. I used a burn deck and a sligh deck. The Burn deck was almost straight burn though I did use Eidolons and Vexing Devils.
Spoilered are my decklists, copied out of the code. It should be obvious what is what.
Now what I really want to talk about here is the average opening hand, starting with the Burn deck. I ran 10,000 games with each deck. The simulator had no games that ended on turn 3 but several that ended on turn 4. Out of the turn 4 games the frequency of each card showing up in the opening hand was
2.35 Fetch
0.57 Land
0.10 Barbarian Ring
0.24 Vexing Devil
0.91 Flame Rift
0.20 Fireblast
2.06 Lightning Bolt
0.36 Exquisite Firecraft
So the best performing hand was 3 Land, 2 Bolts, and a 4 mana burn, and then a 1 of 3 or 4 mana burn. Or to put that another way, 3 lands 13 points of Burn.
With the Sligh deck there were fewer games. The hand information for them was
2.36 Fetch
0.57 Land
1.14 Flame Rift
0.07 Fireblast
0.29 Atarka's Command
1.64 Lightning Bolt
0.07 Gitaxian Probe
0.78 Exquisite Firecraft
0.07 Grim Lavamancer
So what you're looking for here is again 3 Land, and some 3 and 4 mana bolt spells. I find it extremely telling that the best performing hands in the creature deck involved no creatures at all and that even though my card choices were clearly pushing Eidolons, there were no Eidolons involved in the fastest decks.
Over on the Source one of the posters suggested that my report of average win turn isn't really telling the whole story and started asking for things like the standard deviation of turns. Instead I'll do one better especially because it's easier for me, and I'll provide a graph of the turns.
Do note, I cap the games at turn 12 so there will always be a spike on that turn.
First, here's the burn deck, the average was 6.02 http://imgur.com/lGbRbWf
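For anyone who still wants the standard deviation alongside the graph, both can be computed from the list of win turns. This sketch uses a handful of made-up win turns, not simulator output, and applies the same turn 12 cap:

```python
from collections import Counter
from statistics import mean, pstdev

TURN_CAP = 12  # games are cut off at turn 12, so everything later piles up there

def summarize(win_turns):
    """Return (mean, standard deviation, games per turn 1..TURN_CAP)."""
    capped = [min(turn, TURN_CAP) for turn in win_turns]
    counts = Counter(capped)
    histogram = [counts.get(turn, 0) for turn in range(1, TURN_CAP + 1)]
    return mean(capped), pstdev(capped), histogram

# Hypothetical win turns for ten games; the 15 is capped to 12.
sample = [4, 5, 5, 6, 6, 6, 7, 8, 12, 15]
avg, spread, histogram = summarize(sample)

print(f"mean {avg:.2f}, std dev {spread:.2f}")
for turn, games in enumerate(histogram, start=1):
    print(f"turn {turn:2d}: {'#' * games}")
```

Because of the cap, both the mean and the standard deviation understate how slow the slowest games really are.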
Based on this information I'm going to try putting a list together that's just the best performing cards.
One last thought, I find it very interesting that the fetch/land distribution was near identical in the fastest hands though it could just be coincidence.
Edit: Bug found, creatures other than Lavamancer weren't dealing damage after I added it. That explains a lot. Will post amended results later, leaving the rest because it's still relevant for straight burn lists.
With that combat damage bug fixed (which only applied to post Lavamancer data) I reran the previous two tests. Same decklists, same number of turns.
In the case of the burn deck it came out to 14 points of burn in hand being the average in the winning decklists (vs the 13.5 last time). The best average hand was
So an optimal speed hand is 3 lands, 1 creature, 1 3 damage burn, 1 4 damage burn, 1 misc burn
The average kill turn was 5.42 turns. There were still no turn 3 kills, though it should be theoretically possible. In chart form the turn curve looked like this: http://imgur.com/8XcqvvP
With the Sligh deck there were a handful of turn 3 kills, but they were very rare. The average hand for turn 3's was
1.50 Fetch
1.75 Land
0.25 Swiftspear
1.25 Guide
0.25 Fireblast
0.75 Atarka
1.00 Lightning Bolt
0.25 Gitaxian Probe
So again, 3 lands, 2 1 drop creatures, 1 4 damage spell, 1 3 damage spell
The average hand for a 4 turn game was
1.34 Fetch
1.69 Land
0.73 Eidolon
0.12 Swiftspear
0.40 Guide
0.53 Flame Rift
0.28 Fireblast
0.28 Atarka's Command
0.85 Lightning Bolt
0.06 Probe
0.42 Firecraft
0.11 Lavamancer
More data... this time from another poster who asked me to run his deck through it. It was essentially 21 land, 4 guide/lavamancer, and the rest burn. So I took this and tested 11 decklists using everything from the base list -5 lands to the base list +5 lands. When removing land I first added a Firecraft, then I added 4 Swiftspears. When adding land I removed Lavamancers followed by a Lightning Bolt.
The data indicated that in this case, more lands were not necessarily better http://imgur.com/66dNeNO
That's the graph at various land counts. The lower the land count got, the faster the deck got. I suspect part of this is due to Swiftspear being more powerful than a land
Across all 11 decks the average opening hand that won on turn 4 was
1.38 Fetch
1.47 Land
0.02 Swiftspear
0.34 Goblin Guide
1.02 Flame Rift/Price
0.34 Fireblast
1.95 Lightning Bolt
0.04 Exquisite Firecraft
0.26 Grim Lavamancer
0.17 Sulfuric Vortex
So rounding stuff off that's 3 Land, 1 Creature, 11 Burn as your ideal opening hand at 7.
The fastest hands on a mulligan to 6 were
0.96 Fetch
1.50 Land
0.01 Eidolon
0.03 Swiftspear
0.36 Guide
0.85 Rift
0.21 Fireblast
1.63 Bolt
0.05 Firecraft
0.25 Lavamancer
0.14 Vortex
So that's 2-3 Land, 1 Recurring threat (creature/vortex), and 10 points of Burn.
Basically the same thing as the 7 card hand.
Going down to 5 the data starts getting a little thin but the average optimal 5 was
0.58 Fetch
1.31 Land
0.02 Swiftspear
0.42 Guide
0.69 Rift
0.14 Fireblast
1.51 Bolt
0.05 Firecraft
0.22 Lavamancer
0.06 Vortex
Here you want 2 land, 1 creature, and 8 points of burn with no 3's.
So if you want to apply this information to mulligan decisions, a 6 with less than that would be better off as a 5. Interesting to note, 5's that won on the earliest turn possible were about half as common as 6's that won on the earliest turn possible, so that would suggest you want to mulligan pretty aggressively on 6.
I'll probably start trying to incorporate some of this mulligan information into my mulligan decision process both IRL and in this program once I have a bit more data.
Been taking part in a few discussions on fetch counts lately so I decided to throw them in my program. I tested 20 to 24 lands builds with 16, 12, and 8 fetchlands (and the remainder being fetchables) over 1000 games each on a list that I thought was optimal with 22 lands, and then with an additional 5000 games (for 6000 total per configuration) because I thought the results were just variance.
The optimal build was 21 lands with 8 fetches. The next best build however was 22 lands with 16 fetches, and then at 22 lands as the fetch count decreased the deck speed decreased as well. I don't really have any rule of thumb to suggest here, but it looks pretty clear to me that the fetch count can be tweaked to make an optimal land count deck. If for example the optimal point is above 21 lands but below 22 lands, 22 lands with lots of fetches gets you closer to that point. The tradeoff is in mulligans, I've started tracking this number quite a bit and as the mulligan count increases the ratio of fetch:fetchable that you want to see in your hand changes. Obviously this only applies to Burn because we have little to no color fixing to worry about, but lands are much better than fetches when you go to 6 and 5.
If I could think of some good representative lists this is actually a project I would like to do. It's becoming very clear to me that the value of each card changes as the hand size changes. Creatures hold less value at 7 than they do at 5 (I think most had already figured this out), but the value of Burn spells also change and it appears the same is true of lands.
The most practical application of this is that the ideal list contains a mix of everything, so that you always have a chance of drawing to the higher value cards. I also suspect that there's some tuning that can be done here on the play/draw when sideboarding, such as taking out one creature on the draw and bringing in an extra spell, or taking out a spell for a creature on the play.
Ran a test with some cantrips last night, the card choice is more Modern relevant than Legacy just because Gitaxian Probe and Serum Visions are a lot easier to code than Ponder and Brainstorm. The results were consistent though, the deck with 8 cantrips fared worse than the deck with 4, and the deck with 4 was worse than the deck with none. Even in the deck with 8, the best performing hands were the hands with no cantrips.
Built a new AI, it has a slower run time but offers much improved performance over the previous system (average turn time to win a game dropped by .6 turns/game with the same deck). I'll be posting quite a few new tests with it over the next few days as I get around to things.
Going back to my old to do list
1. Add turn tracking
2. Add Sulfuric Vortex
3. Add Barbarian Ring
4. Add Grim Lavamancer
5. Add opposing interaction
6. Add self damage
7. Separate Flame Rift/Price of Progress (relevant due to self damage) category
Items 1, 2, 3, and 4 are done. 6 and 7 will be finished tonight (Barbarian Ring's self damage might be a little tricky though). Opposing interaction requires a few more systems to be in place but the foundation is being laid. I don't know when it's going to happen but it will happen.
My current system can handle most cards, but not all. Recently I experimented with cantrips and they didn't perform very well. Are there any other cards people would like me to add keeping in mind that alternate costs and activated abilities aren't really doable right now? Though activated abilities are on my to do list so an interesting enough card could bump that up in priority.
Edit: New data, found a pretty substantial bug in the old data. Let that be a lesson to triple check things when results go against expectations, rather than thinking it's something new and meaningful.
Ok, so I figured out something else I can do with this dataset that's perhaps more useful than measuring games. I figured out that I can actually use it to measure cards, this is something of an offshoot from my previous few posts where I mentioned being able to track the average opening hand in the best performing lists. Using a little bit of math (and a spreadsheet to make life easy), it's possible to create a value that scores a card based on how much it over/under performs. In my system these results range from -1 (over performs the most) to 1 (big under performer).
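The exact formula isn't spelled out in this post. One simple score with the properties described (negative means over performing, positive means under performing, totals sum to zero) is the expected number of copies in an opening hand minus the average number actually seen in the fastest hands. The sketch below is that guess, not the confirmed method; the decklist and observed averages are made up. Under this guess, a 4-of in a 60 card deck that never appears in a winning 7 card hand scores 4 × 7 / 60 ≈ 0.47, which is at least consistent with the 0.47 listed for Eidolon and Exquisite Firecraft in the T3 numbers.

```python
def card_scores(decklist, observed_avg, hand_size=7):
    """Expected copies in hand minus observed copies in the fastest hands.

    Negative: the card shows up more than chance in fast wins (over performs).
    Positive: it shows up less than chance (under performs).
    """
    deck_size = sum(decklist.values())
    return {
        card: copies * hand_size / deck_size - observed_avg.get(card, 0.0)
        for card, copies in decklist.items()
    }

# Hypothetical 60 card list and hypothetical averages from fast 7 card hands.
decklist = {"Land": 20, "Lightning Bolt": 12, "Eidolon": 4, "Other": 24}
observed = {"Land": 2.5, "Lightning Bolt": 2.0, "Eidolon": 0.0, "Other": 2.5}

scores = card_scores(decklist, observed)
for card, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{score:+.2f} {card}")
```

The scores sum to zero automatically whenever the observed averages add up to the hand size, because the expected copies always do.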
I'll be breaking a rule I've tried to stick to with posting these simulator results, and I'll be giving more numbers than usual this time, mainly because I'm not quite sure of a good way to graph them. I started running various tests looking for more optimal lists. Currently I'm on the following list:
7 Fetch
16 Land
1 Barbarian Ring
This list has an average turn of 5.19 which is down a bit from previous decks. But one thing you'll notice is that it's a bit creature heavy. While the new AI does limit this somewhat (it stops over extension) it's still something of a symptom of goldfishing rather than having real opponents.
Using the T3 information (about 1% of games) I'm able to generate the following score for each card in the deck. A lower number means the card is over performing relative to other cards in the deck, and a higher number means it under performs. It's a relative rather than absolute ranking, so the scores should always sum to 0 (apart from rounding).
-0.75 Lightning Bolt
-0.47 Price of Progress
-0.43 Land
-0.23 Goblin Guide
-0.18 Atarka's Command
0.02 Grim Lavamancer
0.02 Sulfuric Vortex
0.07 Vexing Devil
0.12 Barbarian Ring
0.23 Fireblast
0.32 Fetch
0.37 Monastery Swiftspear
0.47 Eidolon of the Great Revel
0.47 Exquisite Firecraft
The T4 information changes a bit, which should be no surprise. As games go longer the value of each card is going to change.
-0.17 Land
-0.14 Eidolon
-0.11 Price of Progress
-0.05 Atarka's Command
-0.03 Bolt
-0.02 Guide
-0.01 Exquisite Firecraft
0.00 Fetch
0.02 Swiftspear
0.03 Barbarian Ring
0.04 Sulfuric Vortex
0.08 Fireblast
0.09 Grim Lavamancer
0.29 Vexing Devil
Finally, the T5 information
-0.36 Land
-0.19 Eidolon
-0.04 Fireblast
-0.03 Firecraft
-0.02 Fetch
-0.02 Ring
-0.01 Atarka's Command
0.00 Rift
0.00 Sulfuric Vortex
0.05 Swiftspear
0.06 Lightning Bolt
0.09 Grim Lavamancer
0.18 Guide
0.29 Vexing
Now, to turn all of those numbers into the useful information I'm sure you're waiting for I'm weighing each turn win by how common it is.
-0.05 Land
-0.04 Eidolon
-0.04 Price
-0.02 Atarka's
-0.01 Swiftspear
0.00 Bolt
0.00 Fetch
0.00 Firecraft
0.01 Barbarian Ring
0.01 Vortex
0.03 Fireblast
0.03 Lavamancer
0.09 Guide
0.09 Vexing
So we can see here that most cards are actually performing pretty close to the desired average. In a perfect world every card here would be near zero and that would indicate that the numbers are right on every card in the deck. We're not that far off though. I find this to be a pretty damn good case against Atarka's Command, which is a card I've supported. It's just barely overperforming as a 1 of, but the results with 2 say that's too many copies. Considering what this suggests the optimal fetch count is, I see no way that the green splash is viable. Vexing Devil also looks to be slightly overrated at 4 copies. Removing one copy should increase the value of Guide and Lavamancer since they won't be competing as a T1 play. In exchange for that cut I'll add an additional Price of Progress, which is over performing and suggests it needs more copies. I'm hesitant to go up another land here and I think increasing the mana curve slightly will accomplish the same goal of balancing out the lands.
There's also a possibility that a 5th Eidolon in the form of a Pyrostatic Pillar could be a worthwhile addition over the Lavamancer. But, Pillar isn't in my simulator (yet) so that remains an unknown.
The biggest surprise to me here is that Exquisite Firecraft outperforms Fireblast so consistently. With them both being 4 damage, I really thought Firecraft would be the weaker of the two.
What I'll probably do is run this simulation again on the draw (this was on the play) and figure out what should be done from there. And then make the suggested change and try again.
-------------------------------
Edit: On the draw results are in. They also suggest -1 Vexing Devil, +1 Land. -1 Lavamancer, +1 Bolt is also looking quite good. Both changes will be implemented for the next run... but for now it's time to add new card types, self damage, and lay the groundwork to have an opponent that interacts.
Have you ever tested a no fetch, no B.Ring build? Stifle is a huge card locally, and I'm really interested in what a no fetch vs fetch build looks like as far as average kill turn.
Keep doing the good work!
Weird, I swore I wrote a response to this earlier. I did run a fetchless build. 16 fetches had the thinning ability of cutting about one land. So if you cut too many fetches you'll want to add a land. The average kill turn doesn't really change though because the optimal builds are using only a few fetches. This isn't the only build that works, but here's how it tunes what I've been working with.
7 Fetch
16 Mountain
2 Barbarian Ring
4 Eidolon
4 Swiftspear
4 Guide
1 Lavamancer
4 Price
2 Fireblast
11 Bolt
4 Firecraft
1 Vortex
Next up is adding an opponent. I've been writing my components to allow for an opponent without too much trouble but it's still a significant amount of work so that will keep me busy for some time.
All has been quiet here for a bit. Real life caught me a bit unaware, with me being the victim of a break-in and my testing computer being stolen. On top of that, teaching a couple of AIs to play Magic, particularly attacking/blocking, is really freaking complicated. It took me 6 attempts (each attempt being a day of theory followed by a day of coding) to come up with an attacking/blocking system that I liked. It kept getting stuck on odd corner cases. For example:
P1 at 20 life, owns a tapped goblin guide
P2 at 18 life, owns a swiftspear
P2 would never attack in this scenario because the Guide represented a faster clock and it was losing the race, but at the same time it wouldn't block with the Swiftspear and throw it away so it should very clearly be attacking. Little things like that which make common sense to humans are difficult for computers to grasp. Even if the life totals were 8 and 6 this would remain true, so I didn't want to go the cheap route of simply always attacking if over X life. Eventually I figured something out though.
In the end, my AI is pretty simple but I think it has a solid strategy behind it. Where most AIs look for reasons to include cards in an attack/block, which leads to millions of permutations that have to be compared, I took the opposite approach. Everything that can attack, attacks. Everything that can block, blocks. Once a card has been identified as a potential attacker/blocker I look for reasons to exclude it, and if I don't find one, it's in combat. At a minimum I think it's good enough to beat novice players, and the beauty of my approach is that it only requires a quantity of games, not quality. 10,000 games worth of data generated by Finkel playing Budde is as good as 1,000,000 games of two brand new players, so simply generating more data makes up for any deficiencies in the AI's skill.
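A rough sketch of that exclusion idea, for anyone who wants to see it in code. The class, the two rules, and the power/toughness check are assumptions made up for illustration; the real simulator's rules are not shown in this thread.

```python
from dataclasses import dataclass

@dataclass
class Creature:
    name: str
    power: int
    toughness: int
    tapped: bool = False

def is_tapped(creature, opposing):
    """Hypothetical rule: a tapped creature cannot attack."""
    return creature.tapped

def dies_to_free_block(creature, opposing):
    """Hypothetical rule: stay home if an untapped blocker kills it and survives."""
    return any(
        not blocker.tapped
        and blocker.power >= creature.toughness
        and blocker.toughness > creature.power
        for blocker in opposing
    )

EXCLUSION_RULES = [is_tapped, dies_to_free_block]

def choose_attackers(mine, opposing, rules=EXCLUSION_RULES):
    """Start with every creature attacking, then drop any a rule excludes."""
    return [c for c in mine if not any(rule(c, opposing) for rule in rules)]

# The corner case from the post: P1's Goblin Guide is tapped, P2 has a Swiftspear.
guide = Creature("Goblin Guide", 2, 2, tapped=True)
swiftspear = Creature("Monastery Swiftspear", 1, 2)
attackers = choose_attackers([swiftspear], [guide])
print([c.name for c in attackers])
```

Because each rule only looks at one creature at a time, the work grows with the number of creatures and rules rather than with the number of possible attack combinations.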
I was able to speed things up a bit too, it takes about 2 hours to run 100,000 games where previously that would take 5 hours.
The opposing player used is a generic Delver deck. For reasons I won't explain (PM me if you're really curious) certain cards are more difficult to program than others, so that limits my card pool a bit. Basically, on-action triggers are still beyond my program's capabilities, which excludes Young Pyromancer; in-depth card evaluation is too time consuming (in particular shuffling away bad cards), which excludes Ponder and Brainstorm; and alternate costs are close to impossible to do in an elegant fashion, which excludes Force of Will. This causes the test deck to be some weird hybrid that falls between Legacy and Modern legality. It also doesn't really impact testing because counters, removal, and blockers are still counters, removal, and blockers even if they exist at a lower quality, and really it's the interactions as they come up that are being measured. The "final" UR Delver list being used is
8 fetches
10 fetchable land
4 Monastery Swiftspear
4 Stormchaser Mage
4 Delver of Secrets
After looking at the data, and filtering the results to only the games Burn won (it won ~67% of games btw), using the previous list I gave which was the optimal performing goldfish list we get the following results, looking only at T3/T4/T5 wins (9339 games over the initial dataset):
0.37 Fetch
-0.62 Land
-0.01 Ring
-0.13 Eidolon
0.15 Swiftspear
0.15 Goblin Guide
-0.06 Rift
0.01 Fireblast
0.05 Bolt
0.01 Firecraft
0.07 Lavamancer
-0.01 Vortex
So from this we can say that the big under performer was fetchlands, the card the deck most wanted was lands (note that it's already running 25, and this says that even more are still optimal), another Eidolon would be fantastic (already running 4, but maybe a Pyrostatic Pillar?), and the 1 drop hasted creatures are a little over represented.
So I think that for the next test I'm going to try -1 Fetch, -1 Swiftspear, -1 Lavamancer, +2 Land, +1 Eidolon (for now it will just be 5 Eidolons, I'll downgrade to a more accurate Pyrostatic Pillar in the future, the next time I add new cards to the pool, which will probably happen at the point that I really, really want Searing Blaze).
As an aside, for a bit of trivia, I expect Eidolon to consistently overperform in the future. In fact, my original test deck for the opposition was 40 lands, 20 Eidolons and it won 80% of games despite the mulligan code almost always forcing it to 4 card hands.
Some more updates to my program. Going to keep this one as brief as I can. I finally got around to adding Searing Blaze, but due to some issues that I found extremely interesting I couldn't just take the quick and dirty approach to it. I actually had to do things the right way and create a targeting system. This comes with a few benefits. Most notably, Lightning Bolt is actually Lightning Bolt now rather than mimicking Lava Spike. I was even able to do it in such a way that the logic is generalized, which means one function of targeting logic can equally let Burn, Delver, and Miracles all find targets for their spells. The downside is it made card creation a little more unwieldy. Right now I can make cards that target creatures, players, or creatures and players, or that don't target at all, but I cannot make cards that target two creatures. This means no Electrolyze. In theory I can do Searing Blood, but I don't really have an event trigger system so it would require custom logic (which I don't want to do) rather than just parsing targets and numbers on the card. Searing Blaze however was doable. Based on the results, I'm glad I worked Blaze in but I don't think I'm going to need to work Blood in.
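To make the "one function of targeting logic" idea concrete, here is a small sketch. Everything in it (the `Spell` class, the `targets` field, the `legal_targets` function) is a made-up illustration of a generalized target finder, not the simulator's real code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Spell:
    name: str
    damage: int
    targets: frozenset  # any of "creature" and "player"; empty means untargeted

def legal_targets(spell, opposing_creatures, opponent):
    """Return everything the spell may target, driven only by the card's data."""
    found = []
    if "creature" in spell.targets:
        found.extend(opposing_creatures)
    if "player" in spell.targets:
        found.append(opponent)
    return found

bolt = Spell("Lightning Bolt", 3, frozenset({"creature", "player"}))
spike = Spell("Lava Spike", 3, frozenset({"player"}))
price = Spell("Price of Progress", 0, frozenset())  # untargeted; damage computed elsewhere

board = ["Delver of Secrets"]
for spell in (bolt, spike, price):
    print(spell.name, legal_targets(spell, board, "opponent"))
```

Since the function reads only the card's declared target kinds, the same code serves any deck. A spell needing two creature targets does not fit this single-list shape, which matches the Electrolyze limitation described above.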
The list I have right now is not the optimized list against the opposing AI, because I think optimizing for a bad deck is a poor idea. Instead I'm just looking at various card values. With my current list (posted below) over 10,000 games Searing Blaze looked optimal at 4 cards (score of 0, so perfectly balanced).
Firecraft is looking pretty damn good as a mainboard 4 of as well. It does work out poorly in turn 3 games where it's over represented as a 4 of, but in the turn 4 games it's only slightly high and in turn 5 games it's under represented. Depending on the expected meta I could see an argument here to only play 3 MB if you expect things to be fast.
Guide and Swiftspear are starting to climb the charts of over representation now that there's an opponent with blockers and the ability to kill them. According to the numbers Guide is a little better than Swiftspear, but Prowess gets exponentially more powerful (just think about it, it's way better when you trigger on two creatures at once) as you put more of it in your deck so the real weaker card might be Guide.
Lands show up as under represented still.
Finally, here's the list and it's worth noting that this is the first list to go through my program (ever) that had more turn 4 wins than turn 5 wins, despite the fact that the opponent has blockers. Searing Blaze is just that good.
22 Mountain
2 Barbarian Ring
4 Eidolon of the Great Revel
4 Monastery Swiftspear
4 Goblin Guide
One last comment, over the past few days I've been thinking about how to implement Bedlam Reveler because he's new, topical, and I bet a lot of people would like some extra data before Louisville in 5 months. Bad news. I'm not sure I can implement him. I'm going to try, but given my limited knowledge of AI programming I'm not sure how to write a variable cost. It's a really difficult concept to explain to a computer. I have some ideas, one is a really ugly way to implement it but would work. The other would probably require expanding my trigger system but would be a little more sustainable. If I implement it that's the direction I'm going to take.
If I'm being honest with myself though, the card I really want to know about is Collective Defiance. I like this card more in Modern but I think it has some SB potential here against Miracles and combo. Sadly, modal spells are way beyond me. Modal spells are 500x harder than variable mana costs. Spells with both of those combined are probably impossible given my current approach.
I ran a test earlier today with 1000 games/deck looking at the average win turn of the deck using every configuration from 17 lands to 34 lands. The results were very surprising. The fastest deck was at 22 lands, the slowest deck was at 23 lands, and all 17 decks rated within 0.15 turns of each other. This brings up some interesting questions about just how much land counts actually matter. The goldfish results certainly suggest that more lands are better, but the common wisdom is that fewer lands are better.
For now I'm a bit confused by this and am going to rerun the test with a much larger sample size and see if it gives similar results.
As you can see it's almost completely flat, which suggests no functional difference due to land counts.
This could be because it just doesn't matter, or it could be a result of the card choices I made in each deck (My process started with the 17 land deck, which would be the lowest curve deck, and simply cut cards for lands rather than readjusting the entire mana curve as lands were added), or it could just be an example of the huge variance in the game.
Ran some more simulations. The flatness was explained by a bug in my program. Where it should have been going through a bunch of decklists, it kept replaying the same list over and over even though my database recorded the deck as changing. It wasn't a bug that was easy to catch because the reporting was correct; it was only in looking at detailed results that I saw it (I noticed it because Barbarian Rings were showing up in decks that shouldn't have had them). So I'm going to leave the previous graph up, because I think it's a good representation of how much variance there is in Magic.
Anyways with that bug caught and fixed I repeated some tests that I ran earlier before there was a second deck
I ended up testing 10,000 games per deck with every land count from 17 to 33. After testing the decks I went into each deck using my card evaluation spreadsheet, and tuned each list for something somewhat appropriate to its land count. This isn't something I intend to do often because it's slow and error prone. After that I ran the 10,000 games again and looked at the results.
There are a few weird things about this. I was expecting the slowest deck to be on the edge, either 17 or 33 lands, and instead it was around 30 lands, with the 31+ land decks speeding up. It's really interesting (to me at least) that 17 lands and 33 lands performed nearly equally. The other odd thing was the spike at 26, which happened to coincide with the 26 land deck losing nearly twice as many games as the other lists.
The spike at 26 is still there, and so is the dropoff at 31+. I don't have explanations for those right now, but I'm pretty confident in them occurring. The other thing I'm pretty confident in is that while the optimal land count for a goldfish was at 26, it appears to be 23 against this deck. 21 and 19 lands also performed well. Realistically anything between 19 and 23 is probably ok. What seems to make 23 perform so well is Exquisite Firecraft.
I'll probably be focusing on the 23 land decks for a while because the 1 drop haste beaters keep performing poorly, and I think something a little higher curve using some Marauders might do better. Marauders performed very well in the decklists I included them in, which were typically higher curve (a large portion of their win speed was attributable to Marauder, I think).
Abbot of Keral Keep is something I would like to test out with 23 lands opposed to Swiftspear. It's going to take some recoding to make it work though because I currently play a land before any turn logic takes place. Alternatively, Hellspark Elemental might make it back in for testing.
I really like manabases, so I haven't given up on the 25+ land counts yet. With some good lands that can act as useful mana sinks they can be back in the competition. I'm not quite sure what they do yet because 4 CMC cards burn is interested in are few and far between but if the mana is viable there might be a reason to take a look at them. So, I added Forgotten Cave to my simulator, and after teaching the software all about cycling it started to use it. Basically it treats it as a moderate priority 1 drop, whenever it has enough lands available.
I ran it through a battery of games, around 26,000 games. It performed ok but not exceptionally. Basically, I just took my test deck lists and stuck it in. 27 lands with 4 caves was about equal in speed to 22 lands without, 25 lands with 4 caves was just slower than 23 lands without, and 23 without was faster than 23 with.
It's possible the card just isn't good enough (which is certainly the opinion prior to me running these games), or it's possible I wasn't including the right cards with it (more Rings/Lavamancers). Whichever the case may be, I'll probably spend another day or two tinkering with it before moving on to another card I would like to implement.
This isn't my first attempt at this. Previously I made a thread on it here but not many were interested. So I revamped things a bit and went to the thread for the deck I was trying to simulate here but that's too spammy.
Since then, I rebuilt my process to go about doing this, and switched my focus to Legacy Burn. I have a thread with these efforts on the source but also don't like spamming discussion threads and would prefer to just have something discussing this process/result instead. So rather than post them in the Burn thread here, I'll post in this topic.
For starters I should probably explain what I'm doing. I'm attempting to gather large datasets and play them looking at various metrics to answer questions like how much damage does a Goblin Guide really do vs a Monastery Swiftspear, am I insane for playing SDT? Is Magma Jet defensible? How much do opposing DRS's slow us down? What is the optimal manabase? And many others.
So to answer those questions I went about rewriting my simulator. For the programmers (don't worry if you don't understand this sentence) I changed it from procedural to object oriented and then I changed the resulting output from writing to columns to be imported into a spreadsheet for each test, to storing all information in a database so that all previous relevant games are available as part of whatever data set is being used. It is written in Python using MySQL and I am more than happy to share both the program and the results with anyone interested who knows how to use those things.
However, for those who don't know how to use those things, I think I should explain how it works. Basically this program builds a deck, then plays x games with that deck, and it records the game state on every turn.
So to use a handy screenshot I put together on imgur
http://imgur.com/a/tKA8D
We have some small screenshots there, you can almost think of this as a log. That first image shows the two decks available in this small sample as well as information on the card list. A close eye will show that both of those lists are identical, but more importantly, you can see that they're running 6 fetchlands, 6 lands, 12 bolt spells, and 4 magma jets among other cards.
So we'll go a level deeper from the decks to the games. Here you can see the game numbers of all of the games played with deck 2. There's also some handy mulligan information.
This would be enough to answer the mana question, but we want to go a level beyond and look at the turn breakdown of each game. That would be the third screenshot. Here you can see everything was filtered to results from game 8, which, if you remember from the GamesTable, had zero mulligans. Looking at that information, turn 5 appears to be the most interesting since it did 11 damage and set up for a win.
From there we can go to the final two screenshots, hand/board and look at that specific turn, which shows 7 burn, and an eidolon, with the last 2 coming from an eidolon trigger.
So that's the brief non technical explanation of what this simulator does. It builds datasets like that for analysis to see what does and doesn't work over a large number of games and using some wizardry you can look at many games at once.
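For the technically inclined, the deck, game, and turn levels described above could be laid out roughly like this. This is only a sketch using Python's built-in `sqlite3` module: the table and column names are guesses, and apart from deck 2's card counts, game 8's zero mulligans, and the 11 damage on turn 5, the rows are placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decks (deck_id INTEGER PRIMARY KEY, fetch INTEGER, land INTEGER, bolt INTEGER);
CREATE TABLE games (game_id INTEGER PRIMARY KEY,
                    deck_id INTEGER REFERENCES decks(deck_id),
                    mulligans INTEGER);
CREATE TABLE turns (game_id INTEGER REFERENCES games(game_id),
                    turn INTEGER, damage INTEGER);
""")

# Deck 2 from the screenshots: 6 fetchlands, 6 lands, 12 bolt spells.
conn.execute("INSERT INTO decks VALUES (2, 6, 6, 12)")
# Game 8 was played with deck 2 and had zero mulligans.
conn.execute("INSERT INTO games VALUES (8, 2, 0)")
# Per-turn damage: only the 11 on turn 5 comes from the post, the rest is made up.
conn.executemany("INSERT INTO turns VALUES (8, ?, ?)",
                 [(1, 0), (2, 3), (3, 3), (4, 3), (5, 11)])

# Drill down from the game to its most damaging turn.
best_turn = conn.execute(
    "SELECT turn, damage FROM turns WHERE game_id = ? ORDER BY damage DESC LIMIT 1",
    (8,),
).fetchone()
print(best_turn)
```

Keeping each level in its own table is what lets a query filter on one level (a deck, a mulligan count) and report on another (damage per turn).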
Now, it was only recently finished, and some of my dataset indicates some bugs still exist (particularly with divining top) but the dataset is usable as is for certain things and some questions can be answered. The answers to those so far:
In one set of the data (all that I had at the time this was done), Monastery Swiftspear dealt 18,316 damage over 9981 attacks, which means its average damage/turn is slightly below that of Goblin Guide.
Following the Swiftspear data I ran some games removing it and adding in Hellspark Elemental and Keldon Marauder instead of the Swiftspears in a creature heavy list and it resulted in faster games.
Another piece of useful data so far is in mulligan rates. With 19 lands it would mulligan about 1/3 of hands, so 1/3 go to 6, 1/3 of that to 5, and so on. With 22 lands it mulligans about 1/4 of hands and has an average kill turn of about .5 turns faster.
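The "1/3 go to 6, 1/3 of that to 5" reasoning is a geometric series, which can be sketched as below. The function names and the 4 card floor are assumptions for illustration; the only inputs from the post are the roughly 1/3 and 1/4 mulligan rates, applied equally at every hand size.

```python
def hand_size_distribution(mulligan_rate, start=7, floor=4):
    """Probability of keeping at each hand size, with the same rate at every size.

    `floor` is an assumed smallest hand, which is always kept.
    """
    dist = {}
    reach = 1.0  # chance of getting down to the current hand size at all
    for size in range(start, floor, -1):
        dist[size] = reach * (1 - mulligan_rate)
        reach *= mulligan_rate
    dist[floor] = reach
    return dist

def expected_hand_size(mulligan_rate, **kwargs):
    dist = hand_size_distribution(mulligan_rate, **kwargs)
    return sum(size * prob for size, prob in dist.items())

dist_19 = hand_size_distribution(1 / 3)  # about 19 lands
dist_22 = hand_size_distribution(1 / 4)  # about 22 lands
for label, rate in (("19 lands", 1 / 3), ("22 lands", 1 / 4)):
    print(label, round(expected_hand_size(rate), 2))
```

The gap in expected hand size is one way to see why the 22 land build can be faster on average despite running fewer spells.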
Last, it appears that both Magma Jet and SDT slow the deck down so far.
What I've found is some general trends.
The first is that contrary to some current theories, SDT does not speed up the deck. Neither does Magma Jet.
Swiftspear also seems to slow things down by about .1 turns when compared to some other options.
22 land decks mulligan about 25% of hands while 19 land decks mulligan about 33%.
With prowess Swiftspear evaluated to 1.835 damage/turn, that number probably goes up with faster games, but I don't think it ever beats the 2.0 of Goblin Guide
Burn typically needs between 17 and 24 damage to win a game depending on fetches and lifegain. That 7 life swing is worth 1 turn on average.
Red is the thinking mans color.
I agree, with my Burn deck I think more than I do with my Spiral Tide deck.
I tend to use Perl over Python strictly because I started with Perl earlier.
I'll refrain from my opinion on using MySQL.
So, after fixing it I did a couple more tests today. I tried decks with and without Probe. In all cases the decks with Probe underperformed compared to the decks without it but I didn't record the numbers because the pre probe points of comparison were with the bugged mulligan information. Regardless, it was a very big difference for probe (talking, losing half a turn at any land count) while the mulligan problem really only affected very low land counts.
After that I ran a couple thousand games at each land count using a ~55% fetch/45% fetchable manabase at all land counts from 2 to 30. (the percents come from earlier tests that showed it to be optimal). Then I thought the info would be a bit more presentable as a chart than a bunch of spammed numbers, so here it is.
http://imgur.com/MLvMQUF
With the list used which was heavy on 1's, few 2's, and no 3's, you can see that 19 lands was the sweet spot. What was interesting to me though was how quickly the win turn dropped as lands were added until it hit that minimum, and how slowly the average win turn rose past that point even though better cards were getting cut (the final 10 cuts were all bolt variants).
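For reference, the ~55%/45% fetch split mentioned above can be turned into card counts with a couple of lines. The function name and the use of Python's `round` are my own choices here, so treat the exact counts at awkward land totals as an assumption.

```python
def split_manabase(land_count, fetch_share=0.55):
    """Split a land count into (fetchlands, fetchable lands) at roughly 55/45."""
    fetches = round(land_count * fetch_share)
    return fetches, land_count - fetches

for lands in (17, 19, 20, 22):
    print(lands, split_manabase(lands))
```

At 20 lands this gives 11 fetches and 9 fetchable lands.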
I'm running the same test again tonight but with Firecrafts over Magma Jets. One, it will give some idea on the value of each of those cards MB. And 2 it will give some data on a frequent game 2 configuration once we bring in 3 drops with a plan of casting them on time.
After these sets of tests I'll probably stop for a few days and change a few things in my program. There are certain tests I would like to run but can't because the right data isn't being recorded. Additionally, there's the issue of slow performance which needs to be corrected. I've done a little bit of that so far, increasing the speed from 3,000 to 60,000 games per hour, but Magic is a game of intense variance and it should realistically be able to run 1 million games per hour, or more since 86% of the programs runtime is currently tied up in the overhead for disk writes rather than actually executing code. Some restructuring of things should fix that.
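One common way to cut disk write overhead in SQLite is to group many inserts into a single transaction instead of committing after every row. The sketch below shows the idea with Python's `sqlite3` module; the table layout and function name are made up, and it is not a claim about how the simulator is actually restructured.

```python
import sqlite3

def save_turns(conn, rows, batched=True):
    """Insert (game_id, turn, damage) rows, in one transaction when batched."""
    sql = "INSERT INTO turns (game_id, turn, damage) VALUES (?, ?, ?)"
    if batched:
        with conn:  # commits once when the block exits
            conn.executemany(sql, rows)
    else:
        for row in rows:
            conn.execute(sql, row)
            conn.commit()  # one commit per row, which is the slow pattern on disk

# An in-memory database keeps the example self-contained; the cost of
# per-row commits only really shows up with a database file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE turns (game_id INTEGER, turn INTEGER, damage INTEGER)")

rows = [(game, turn, 3) for game in range(100) for turn in range(1, 6)]
save_turns(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM turns").fetchone()[0]
print(row_count)
```

Buffering a whole game (or a few hundred games) in memory and writing it in one go follows the same pattern.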
I'm using Python because it's a language I'm comfortable with. It's not really a good language to do this in because of speed, which is a major issue for me right now (I've been reading up on SQL optimization to fix that), but I didn't really want to use Java or C++. I get enough of them in school, and I know both but am not very good with either. Honestly, I'm not very good with Python either, but I can at least get things done and I find it enjoyable to work with.
The database being used is SQLite, typo up above when I said MySQL, but the two are pretty similar.
The code is being updated occasionally and has a couple of known bugs (and certainly a couple unknown). What I'll do is PM you a link to a pastebin with it tomorrow.
You'll have to deal with my bad comment habits though, some of it is commented but for the most part I'm just using the text from logging information that can be turned on/off as a type of comment so it should still be descriptive enough.
http://imgur.com/jkmJGX1
You'll notice that the fastest turn happens at 21 lands, but the curve back up from that minimum point happens much slower. The gap between 18 and 21 land is bigger than the gap between 21 and 28 land.
So again, this leads towards the idea that if you're not sure, go a little higher on lands.
Also worth noting is that the lowest time with Magma Jet was 4.94 turns while with Firecraft it was 4.89 turns, so Firecraft beats out Jet.
Edit: http://pastebin.com/ZzMMvsmC
That's the current code, the pastebin is good for a week, I set a time limit on it because I didn't want an old version of the code to be available in the future when it's updated. So if you want it, grab it now.
I don't think anyone has made the claim that SDT is going to speed up the deck. It decreases explosiveness but increases consistency. Also, "1.8xxxxx damage/turn" should not lead anyone to believe that cutting swiftspears is the right thing to do. GG is not "strictly better", they are strictly different cards.
Griefing aside, your brute force program is definitely interesting, although, I think the results should be taken with a few large grains of salt.
Happy coding!
And apparently I've changed my name: Ugh
No, I made that very argument a year or two ago, I was one of the earlier adopters of Top. I normally don't post in the Legacy section here though and stick to the Source. The theory that makes SDT worthwhile is that you can always cash it in for a card, and you get selection out of it. It sucks up excess mana but always cashes in for damage. I have a pretty elaborate spreadsheet that shows SDT being a benefit to speed. Obviously the theory behind that is wrong though based on these simulations.
No, that alone isn't reason to say Swiftspear is a bad card. It's lower damage than Goblin Guide but that doesn't necessarily mean it's not our next best option. I'm basing the conclusion that Swiftspear is a poor choice on the fact that every Swiftspear deck is underperforming compared to a non Swiftspear deck using Keldon Marauders in its place. It's a result that I don't fully understand the why of (especially when my aforementioned spreadsheet calculates Swiftspear as being more damage than Guide), but I have over 1 million games of Magic at this point that are telling me Swiftspear is not a good choice.
I welcome criticism of it, it provides a basis for improvement. That said, I'm well aware of the limitations of such an approach, as I've said in the opening thread, I've tried this approach a couple other times and the resulting optimal decks weren't all that spectacular. That said, a more focused approach can reveal some valuable information.
----------------------------
Edit: Ran another simulation on a Probe vs non Probe deck.
The base list was
11 fetch
9 land
4 eidolon
4 swiftspear
4 goblin guide
4 price
4 fireblast
4 atarka's command
8 bolts
4 exquisite firecraft
Then it ran either 4 additional bolts or 4 probes
Each ran for 5,000 games and the probe deck came out ahead by about 0.03 turns. I then repeated it and it came out at about the same.
Next I decided to use the data to compare Swiftspear again. With the Probe deck Swiftspear dealt 23,192 damage over the 11,262 turns it was on the board, making it 2.059 damage per turn. With the Bolt deck Swiftspear dealt 21,593 damage over 11,108 turns, making it 1.944 damage per turn.
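The per-turn figures are just total damage divided by turns on the board. A quick check of the numbers quoted in this thread (the dictionary labels are mine):

```python
def damage_per_turn(total_damage, turns_on_board):
    """Average damage a creature dealt per turn it was on the battlefield."""
    return total_damage / turns_on_board

samples = {
    "Swiftspear, first dataset": (18_316, 9_981),
    "Swiftspear, Probe deck": (23_192, 11_262),
    "Swiftspear, Bolt deck": (21_593, 11_108),
}

rates = {label: damage_per_turn(*pair) for label, pair in samples.items()}
for label, rate in rates.items():
    print(f"{label}: {rate:.3f}")
```

The Probe deck is the only one of the three where Swiftspear clears the 2.0 per turn used for Goblin Guide earlier in the thread.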
That's probably the last simulation I'll run short of an interesting suggestion until I make a few changes to the program. Originally I didn't build in turn tracking because I figured I could just take the rowID in a database query, but I've found a few places where it's useful so I'll be adding in some turn tracking. For the non technical explanation, this will let me link data such as hands on turn 1 to games that ended on turn 3 or 4 and that's something I think could be pretty useful to evaluate opening hands.
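With turn tracking in place, linking opening hands to fast wins becomes a join between the game results and the turn 1 hands. The sketch below assumes a hypothetical two table layout (`games` with a `win_turn`, `hands` with one row per card per turn) and uses invented rows purely to show the query shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (game_id INTEGER PRIMARY KEY, win_turn INTEGER);
CREATE TABLE hands (game_id INTEGER, turn INTEGER, card TEXT);
""")
conn.executemany("INSERT INTO games VALUES (?, ?)", [(1, 4), (2, 6)])
conn.executemany("INSERT INTO hands VALUES (?, ?, ?)", [
    (1, 1, "Lightning Bolt"), (1, 1, "Lightning Bolt"), (1, 1, "Mountain"),
    (1, 3, "Fireblast"),  # a later turn, ignored by the query below
    (2, 1, "Mountain"), (2, 1, "Mountain"),
])

# Count the cards seen in turn 1 hands of games that ended on turn 4 or earlier.
fast_hand_cards = conn.execute("""
    SELECT h.card, COUNT(*)
    FROM hands AS h
    JOIN games AS g ON g.game_id = h.game_id
    WHERE h.turn = 1 AND g.win_turn <= 4
    GROUP BY h.card
    ORDER BY h.card
""").fetchall()
print(fast_hand_cards)
```

Changing the `win_turn` condition gives the matching picture for slower games, so the two can be compared side by side.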
Are there any sideboard/MB cards people would like to see added that I'm not currently using? Given that I've been doing more focused testing I'm not as concerned with limiting the number of card types that a more broad approach testing all combinations requires because of math. Because of my approach not every card can be implemented (this program isn't DotP or MTGO) but if the card sounds useful enough or easy enough I'm willing to include it.
Right now Sulfuric Vortex sounds good. I would like to include Grim Lavamancer but because I'm not tracking the GY it's a bit bigger of an undertaking, though not completely impossible to implement, I would just have to add a GY. If I add a GY I could try out Barbarian Ring as well. If I do add Barbarian Ring I'll also add a pain index tracking how much damage we deal to ourselves per game, though the Price of Progress/Flame Rift category won't accurately add damage.
So to sum that up, here's the current to do list, roughly in order of priority:
1. Add turn tracking
2. Add Sulfuric Vortex
3. Add Barbarian Ring
4. Add Grim Lavamancer
5. Add opposing interaction
6. Add self damage
7. Separate Flame Rift/Price of Progress (relevant due to self damage) category
Most of these are simple (in fact, turn tracking is already done as of this edit)
Would there be any more cards/features people are interested in?
I think I need to redo my priority on casting certain cards though, Lavamancer, Probe, and Vortex have made things a little murky.
Most likely I'll be setting a single large test up tonight with one of the better performing lists, and then tomorrow making some links between opening hands/faster wins and opening hands/slower wins.
My approach has some flaws though since I'm starting to get some more slowdown in the program. Run time has doubled since I added the extra logic for those cards (down to about 500 games/minute) so I probably can't add anymore until I find some ways to optimize things (which I've been tinkering with).
The decks are also winning a little slowly, not quite sure what's up with that (it's probably just my card choices) but it's fine to just say one deck is faster than the other, so it's not really all that important.
I recently ran a few thousand games with some varying decklists. They were all some variation on this. Everything from 1-4 GLM, then some 2 GLM's with different land counts filling in more Burn as stuff came out.
11 Fetch
9 Fetchable land
2 Barbarian Ring
4 Eidolon of the Great Revel
4 Goblin Guide
3 Vexing Devil
2 Grim Lavamancer
4 Price of Progress
4 Fireblast
9 Lightning Bolt (or Lava Spike if you would prefer)
4 Gitaxian Probe
4 Exquisite Firecraft
Not the best list, I admit but I was looking at some specific things that those cards would show rather than going for raw speed.
As I said, it's slowing down, so over the 7 decks I played 1000 games with each deck for 7000 games total.
With the above deck the fastest wins were on turn 4, 13/1000 games had a turn 4 win.
Oddly enough, there wasn't a single 7 card hand that won that early, they were all mulligans. The average hand size for a turn 4 win was 5.92 and the average hand was
1.42 Fetch
0.50 Land
0.00 Barbarian Ring
0.50 Eidolon of the Great Revel
0.42 Vexing Devil
0.75 Price of Progress
0.08 Fireblast
1.42 Lightning Bolt
0.58 Exquisite Firecraft
0.25 Grim Lavamancer
So roughly 1.9 lands, .8 creatures, 2.9 burn spells, and anything else. So without the decimals that would imply the opening ideal hand on 6 cards is 2 land, 1 creature, 3 burn, and 1 of anything.
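An "average hand" like the one above is each card's total count across the winning hands divided by the number of hands. A minimal sketch, with two invented hands standing in for the real turn 4 winners:

```python
from collections import Counter

def average_hand(hands):
    """Average number of copies of each card across a list of opening hands."""
    totals = Counter()
    for hand in hands:
        totals.update(hand)
    return {card: count / len(hands) for card, count in totals.items()}

# Placeholder hands, not data from the simulator.
hands = [
    ["Fetch", "Fetch", "Lightning Bolt"],
    ["Fetch", "Land", "Lightning Bolt", "Lightning Bolt"],
]

averages = average_hand(hands)
average_size = sum(len(hand) for hand in hands) / len(hands)
for card, value in sorted(averages.items()):
    print(f"{value:.2f} {card}")
```

The per-card averages always add up to the average hand size, which is why the column above sums to the 5.92 quoted for turn 4 wins.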
Opening the dataset up to all 7000 games, there were 80 games which had a turn 4 win, and again there were no 7 card keeps and almost no 5 card keeps. It's interesting enough that I'll have to look into the why of this further, as there were plenty of slower wins that kept 7 cards. The hands were all on the play, so I don't think it has anything to do with going 8 cards deep with a mulligan scry on the draw. Because of the differing decklists I don't think average cards are all that useful for the whole dataset, but there are a few trends that I see as peculiar. There were no Goblin Guides in any of the fastest hands and no Swiftspears (a couple of lists had them). It was almost always looking for 2+ bolts, an additional 3 damage burn spell, and a 4 damage burn spell. Exquisite Firecraft featured rather prominently in the fastest hands and had a significantly higher presence than Fireblast. Lavamancer showed up from time to time, and when he was around Barbarian Ring never fired. I find it very odd that Lavamancer, who should in this situation be strictly worse than Goblin Guide, would show up when the Guide made 0 appearances in thousands of games. My only explanation is that it's an example of variance with small sample sizes.
Some data on Lavamancer: He appeared in 11,295 turns and activated 2571 times for a grand total of 13,866 damage or 1.23 per turn.
Spoilered are my decklists straight out of the code; it should be obvious what is what.
#Burn
self.fetchCount = 11
self.landCount = 9
self.ringCount = 2
self.eidolonCount = 4
self.hellsparkCount = 0
self.swiftspearCount = 0
self.guideCount = 0
self.marauderCount = 0
self.vexingCount = 4
self.riftCount = 6
self.fireblastCount = 4
self.atarkaCount = 0
self.incinerateCount = 0
self.boltCount = 16
self.jetCount = 0
self.topCount = 0
self.probeCount = 0
self.firecraftCount = 4
self.vortexCount = 0
self.lavamancerCount = 0
self.playGames()
#Sligh
self.fetchCount = 11
self.landCount = 11
self.ringCount = 0
self.eidolonCount = 4
self.hellsparkCount = 0
self.swiftspearCount = 4
self.guideCount = 4
self.marauderCount = 0
self.vexingCount = 0
self.riftCount = 4
self.fireblastCount = 3
self.atarkaCount = 2
self.incinerateCount = 0
self.boltCount = 8
self.jetCount = 0
self.topCount = 0
self.probeCount = 4
self.firecraftCount = 4
self.vortexCount = 0
self.lavamancerCount = 1
self.playGames()
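For anyone who wants to poke at these two lists outside the simulator, here is a minimal sketch that holds the same counts in plain dictionaries and sanity checks them. The dictionary layout and the `deck_size` and `land_count` helpers are assumptions made up for illustration, not how the simulator stores its decks; only the counts come from the lists above.

```python
# Counts copied from the #Burn and #Sligh lists above (zero-count cards omitted).
# The dict layout and both helpers are illustrative assumptions, not simulator code.
BURN = {
    "fetch": 11, "land": 9, "ring": 2, "eidolon": 4, "vexing": 4,
    "rift": 6, "fireblast": 4, "bolt": 16, "firecraft": 4,
}
SLIGH = {
    "fetch": 11, "land": 11, "eidolon": 4, "swiftspear": 4, "guide": 4,
    "rift": 4, "fireblast": 3, "atarka": 2, "bolt": 8, "probe": 4,
    "firecraft": 4, "lavamancer": 1,
}

def deck_size(deck):
    """Total number of cards in the list."""
    return sum(deck.values())

def land_count(deck):
    """Fetches, basic lands and Barbarian Rings all count as mana sources."""
    return sum(deck.get(key, 0) for key in ("fetch", "land", "ring"))

for name, deck in (("Burn", BURN), ("Sligh", SLIGH)):
    print(name, deck_size(deck), "cards,", land_count(deck), "lands")
```

A check like this is cheap to run before every batch and catches a miscounted list before thousands of games get played with it.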
Now what I really want to talk about here is the average opening hand, starting with the Burn deck. I ran 10,000 games with each deck. The simulator had no games that ended on turn 3 but several that ended on turn 4. Out of the turn 4 games, the average number of copies of each card in the opening hand was:
2.35 Fetch
0.57 Land
0.10 Barbarian Ring
0.24 Vexing Devil
0.91 Flame Rift
0.20 Fireblast
2.06 Lightning Bolt
0.36 Exquisite Firecraft
So the best performing hand was 3 Land, 2 Bolts, a 4 damage burn spell, and then one more 3 or 4 damage burn spell. Or to put that another way, 3 lands and 13 to 14 points of Burn.
With the Sligh deck there were fewer turn 4 games. The hand information for them was:
2.36 Fetch
0.57 Land
1.14 Flame Rift
0.07 Fireblast
0.29 Atarka's Command
1.64 Lightning Bolt
0.07 Gitaxian Probe
0.78 Exquisite Firecraft
0.07 Grim Lavamancer
So what you're looking for here is again 3 Land, and some 3 and 4 damage burn spells. I find it extremely telling that the best performing hands in the creature deck involved almost no creatures at all (just the odd Lavamancer), and that even though my card choices were clearly pushing Eidolons, there were no Eidolons involved in the fastest hands.
Over on the Source one of the posters suggested that my report of average win turn isn't really telling the whole story and started asking for things like the standard deviation of turns. Instead I'll do one better especially because it's easier for me, and I'll provide a graph of the turns.
Do note, I cap the games at turn 12 so there will always be a spike on that turn.
First, here's the burn deck, the average was 6.02
http://imgur.com/lGbRbWf
Second, here's the sligh deck, the average was 7.57
http://imgur.com/vd97MKx
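For those who asked for the standard deviation as well, here is a minimal sketch of how both the spread and the per-turn counts behind a graph like these could be pulled from a list of win turns. The `summarize` function and the sample list are made up for illustration (the sample is NOT simulator output); the only detail taken from above is the cap at turn 12.

```python
from collections import Counter
from statistics import mean, pstdev

TURN_CAP = 12  # games are cut off at turn 12, which creates the spike on that turn

def summarize(win_turns):
    """Return the mean, population standard deviation and per-turn counts."""
    capped = [min(turn, TURN_CAP) for turn in win_turns]
    counts = Counter(capped)
    return {
        "mean": mean(capped),
        "stdev": pstdev(capped),
        "histogram": {turn: counts.get(turn, 0) for turn in range(1, TURN_CAP + 1)},
    }

# Hypothetical sample of win turns, invented for this example only.
sample = [4, 5, 5, 6, 6, 6, 7, 8, 12, 15]
summary = summarize(sample)
for turn, games in summary["histogram"].items():
    print(f"T{turn:>2} {'#' * games}")
print("mean", round(summary["mean"], 2), "stdev", round(summary["stdev"], 2))
```

Because of the cap, any game that would have run past turn 12 lands in the turn 12 bucket, so both the mean and the deviation are slightly understated for slow decks.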
Based on this information I'm going to try putting a list together that's just the best performing cards.
One last thought, I find it very interesting that the fetch/land distribution was near identical in the fastest hands though it could just be coincidence.
Edit: Bug found, creatures other than Lavamancer weren't dealing damage after I added it. That explains a lot. Will post amended results later, leaving the rest because it's still relevant for straight burn lists.
In the case of the burn deck it came out to 14 points of burn in hand being the average in the winning decklists (vs the 13.5 last time). The best average hand was
1.30 Fetch
1.51 Land
0.15 Barbarian Ring
0.90 Eidolon
0.15 Vexing Devil
0.68 Flame Rift
0.32 Fireblast
1.49 Lightning Bolt
0.36 Firecraft
So an optimal speed hand is 3 lands, 1 creature, one 3 damage burn spell, one 4 damage burn spell, and 1 misc burn spell.
The average kill turn was 5.42 turns. There were still no turn 3 kills, though it should be theoretically possible. In chart form the turn curve looked like this:
http://imgur.com/8XcqvvP
With the Sligh deck there were a handful of turn 3 kills, but they were very rare. The average hand for turn 3's was
1.50 Fetch
1.75 Land
0.25 Swiftspear
1.25 Guide
0.25 Fireblast
0.75 Atarka
1.00 Lightning Bolt
0.25 Gitaxian Probe
So again, 3 lands, two 1 drop creatures, one 4 damage spell, and one 3 damage spell.
The average hand for a 4 turn game was
1.34 Fetch
1.69 Land
0.73 Eidolon
0.12 Swiftspear
0.40 Guide
0.53 Flame Rift
0.28 Fireblast
0.28 Atarka's Command
0.85 Lightning Bolt
0.06 Probe
0.42 Firecraft
0.11 Lavamancer
So 3 land, 1.5 creatures, 1 4 damage burn, 1 3 damage burn, 1 misc
The average kill turn for this deck was 4.85 turns. The curve looks like
http://imgur.com/VIjfN2f
The data indicated that in this case, more lands were not necessarily better
http://imgur.com/66dNeNO
That's the graph at various land counts. The lower the land count got, the faster the deck got. I suspect part of this is due to Swiftspear being more powerful than a land.
The base list had the following turn graph
http://imgur.com/Qgj5Neo
The fastest list looked like this
http://imgur.com/nacN4Wp
Across all 11 decks the average opening hand that won on turn 4 was
1.38 Fetch
1.47 Land
0.02 Swiftspear
0.34 Goblin Guide
1.02 Flame Rift/Price
0.34 Fireblast
1.95 Lightning Bolt
0.04 Exquisite Firecraft
0.26 Grim Lavamancer
0.17 Sulfuric Vortex
So rounding stuff off, that's 3 Land, 1 Creature, and 11 points of Burn as your ideal opening hand at 7.
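The rounding above can be reproduced from the table with a few lines. This is a minimal sketch: the damage values are assumptions (Lightning Bolt at 3, everything else at 4, with the combined Flame Rift/Price row treated as a flat 4 even though Price of Progress really depends on the opponent's lands), and Sulfuric Vortex is left out of the burn total.

```python
# Average turn 4 opening hand across all 11 decks, copied from the table above.
avg_hand = {
    "Fetch": 1.38, "Land": 1.47, "Swiftspear": 0.02, "Goblin Guide": 0.34,
    "Flame Rift/Price": 1.02, "Fireblast": 0.34, "Lightning Bolt": 1.95,
    "Exquisite Firecraft": 0.04, "Grim Lavamancer": 0.26, "Sulfuric Vortex": 0.17,
}

# Assumed damage per spell (Price of Progress is treated as a flat 4 here).
DAMAGE = {
    "Flame Rift/Price": 4, "Fireblast": 4,
    "Lightning Bolt": 3, "Exquisite Firecraft": 4,
}
CREATURES = ("Swiftspear", "Goblin Guide", "Grim Lavamancer")

lands = avg_hand["Fetch"] + avg_hand["Land"]
creatures = sum(avg_hand[card] for card in CREATURES)
burn_points = sum(avg_hand[card] * dmg for card, dmg in DAMAGE.items())

print(round(lands, 2), round(creatures, 2), round(burn_points, 2))
```

With these assumptions the totals come out to about 2.85 lands, 0.62 creatures and 11.45 points of burn.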
The fastest hands on a mulligan to 6 were
0.96 Fetch
1.50 Land
0.01 Eidolon
0.03 Swiftspear
0.36 Guide
0.85 Rift
0.21 Fireblast
1.63 Bolt
0.05 Firecraft
0.25 Lavamancer
0.14 Vortex
So that's 2-3 Land, 1 Recurring threat (creature/vortex), and 10 points of Burn.
Basically the same thing as the 7 card hand.
Going down to 5 the data starts getting a little thin but the average optimal 5 was
0.58 Fetch
1.31 Land
0.02 Swiftspear
0.42 Guide
0.69 Rift
0.14 Fireblast
1.51 Bolt
0.05 Firecraft
0.22 Lavamancer
0.06 Vortex
Here you want 2 land, 1 creature, and 8 points of burn with no 3's.
So if you want to apply this information to mulligan decisions, a 6 with less than that would be better off as a 5. Interesting to note, 5's that won on the earliest turn possible were about half as common as 6's that won on the earliest turn possible, so that would suggest you want to mulligan pretty aggressively on 6.
I'll probably start trying to incorporate some of this mulligan information into my mulligan decision process both IRL and in this program once I have a bit more data.
The optimal build was 21 lands with 8 fetches. The next best build, however, was 22 lands with 16 fetches, and then at 22 lands as the fetch count decreased the deck speed decreased as well. I don't really have any rule of thumb to suggest here, but it looks pretty clear to me that the fetch count can be tweaked to hit an optimal effective land count. If, for example, the optimal point is above 21 lands but below 22 lands, 22 lands with lots of fetches gets you closer to that point. The tradeoff is in mulligans. I've started tracking this number quite a bit, and as the mulligan count increases the ratio of fetch:fetchable that you want to see in your hand changes. Obviously this only applies to Burn because we have little to no color fixing to worry about, but lands are much better than fetches when you go to 6 and 5.
If I could think of some good representative lists this is actually a project I would like to do. It's becoming very clear to me that the value of each card changes as the hand size changes. Creatures hold less value at 7 than they do at 5 (I think most had already figured this out), but the value of Burn spells also changes and it appears the same is true of lands.
The most practical application of this is that the ideal list contains a mix of everything, so that you always have a chance of drawing to the higher value cards. I also suspect that there's some tuning that can be done here on the play/draw when sideboarding, such as taking out one creature on the draw and bringing in an extra spell, or taking out a spell for a creature on the play.
Going back to my old to do list
1. Add turn tracking
2. Add Sulfuric Vortex
3. Add Barbarian Ring
4. Add Grim Lavamancer
5. Add opposing interaction
6. Add self damage
7. Separate Flame Rift/Price of Progress (relevant due to self damage) category
Items 1, 2, 3, and 4 are done. 6 and 7 will be finished tonight (Barbarian Ring's self damage might be a little tricky though). Opposing interaction requires a few more systems to be in place but the foundation is being laid. I don't know when it's going to happen, but it will happen.
My current system can handle most cards, but not all. Recently I experimented with cantrips and they didn't perform very well. Are there any other cards people would like me to add keeping in mind that alternate costs and activated abilities aren't really doable right now? Though activated abilities are on my to do list so an interesting enough card could bump that up in priority.
Ok, so I figured out something else I can do with this dataset that's perhaps more useful than measuring games. I figured out that I can actually use it to measure cards. This is something of an offshoot from my previous few posts where I mentioned being able to track the average opening hand in the best performing lists. Using a little bit of math (and a spreadsheet to make life easy), it's possible to create a value that scores a card based on how much it over/under performs. In my system these results range from -1 (over performs the most) to 1 (big under performer).
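The exact spreadsheet math isn't given here, so the following is only a sketch of one formula that behaves the way the score is described: take the number of copies you'd expect in an opening hand from the decklist alone, and subtract the average number actually seen in the fastest hands. The `card_scores` function, the toy decklist and the observed averages are all assumptions invented for this example.

```python
DECK_SIZE = 60

def card_scores(decklist, observed_avg, hand_size=7):
    """Expected copies in hand minus observed copies in the fastest hands.

    Negative: the card shows up more often than its share (over performing).
    Positive: the card shows up less often than its share (under performing).
    This formula is an assumption, not the confirmed spreadsheet calculation.
    """
    scores = {}
    for card, copies in decklist.items():
        expected = hand_size * copies / DECK_SIZE
        scores[card] = expected - observed_avg.get(card, 0.0)
    return scores

# Toy 60 card deck and made-up observed averages (they add up to 7 cards).
toy_deck = {"Land": 24, "Bolt": 20, "Guide": 16}
toy_observed = {"Land": 3.0, "Bolt": 3.0, "Guide": 1.0}

scores = card_scores(toy_deck, toy_observed)
for card, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{score:+.2f} {card}")
```

Since both the expected and the observed counts add up to the hand size, scores built this way always sum to zero, so one card can only look good at the expense of another.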
I'll be breaking a rule I've tried to stick to with posting these simulator results, and I'll be giving more numbers than usual this time, mainly because I'm not quite sure of a good way to graph them. I started running various tests looking for more optimal lists. Currently I'm on the following list:
7 Fetch
16 Land
1 Barbarian Ring
4 Eidolon
4 Swiftspear
4 Goblin Guide
4 Vexing Devil
1 Grim Lavamancer
2 Price of Progress
2 Fireblast
1 Atarka's Command
9 Lightning Bolt
4 Exquisite Firecraft
1 Sulfuric Vortex
This list has an average turn of 5.19 which is down a bit from previous decks. But one thing you'll notice is that it's a bit creature heavy. While the new AI does limit this somewhat (it stops over extension) it's still something of a symptom of goldfishing rather than having real opponents.
The chart for the turns is
http://imgur.com/yuizCUc
A bit over 70% of games end on turn 5 or sooner.
Using the T3 information (about 1% of games) I'm able to generate the following score for each card in the deck. A lower number means the card is over performing relative to other cards in the deck, and a higher number means it under performs. It's a relative rather than absolute ranking, and the scores should always sum to 0 (give or take rounding).
-0.75 Lightning Bolt
-0.47 Price of Progress
-0.43 Land
-0.23 Goblin Guide
-0.18 Atarka's Command
0.02 Grim Lavamancer
0.02 Sulfuric Vortex
0.07 Vexing Devil
0.12 Barbarian Ring
0.23 Fireblast
0.32 Fetch
0.37 Monastery Swiftspear
0.47 Eidolon of the Great Revel
0.47 Exquisite Firecraft
The T4 information changes a bit, which should be no surprise. As games go longer the value of each card is going to change.
-0.17 Land
-0.14 Eidolon
-0.11 Price of Progress
-0.05 Atarka's Command
-0.03 Bolt
-0.02 Guide
-0.01 Exquisite Firecraft
0.00 Fetch
0.02 Swiftspear
0.03 Barbarian Ring
0.04 Sulfuric Vortex
0.08 Fireblast
0.09 Grim Lavamancer
0.29 Vexing Devil
Finally, the T5 information
-0.36 Land
-0.19 Eidolon
-0.04 Fireblast
-0.03 Firecraft
-0.02 Fetch
-0.02 Ring
-0.01 Atarka's Command
0.00 Rift
0.00 Sulfuric Vortex
0.05 Swiftspear
0.06 Lightning Bolt
0.09 Grim Lavamancer
0.18 Guide
0.29 Vexing
Now, to turn all of those numbers into the useful information I'm sure you're waiting for, I'm weighting each turn win by how common it is.
-0.05 Land
-0.04 Eidolon
-0.04 Price
-0.02 Atarka's
-0.01 Swiftspear
0.00 Bolt
0.00 Fetch
0.00 Firecraft
0.01 Barbarian Ring
0.01 Vortex
0.03 Fireblast
0.03 Lavamancer
0.09 Guide
0.09 Vexing
So we can see here that most cards are actually performing pretty close to the desired average. In a perfect world every card here would be near zero, which would indicate that the numbers are right on every card in the deck. We're not that far off though. I find this to be a pretty damn good case against Atarka's Command, which is a card I've supported. It's just barely overperforming as a 1 of, but at 2 the results say it's too many copies. Considering what this suggests the optimal fetch count is, I see no way that the green splash is viable. Vexing Devil also looks to be slightly overrated at 4 copies. Removing one copy should increase the value of Guide and Lavamancer since they won't be competing as a T1 play. In exchange for that cut I'll add an additional Price, which is over performing and suggests it needs more copies. I'm hesitant to go up another land here, and I think increasing the mana curve slightly will accomplish the same goal of balancing out the lands.
There's also a possibility that a 5th Eidolon in the form of a Pyrostatic Pillar could be a worthwhile addition over the Lavamancer. But, Pillar isn't in my simulator (yet) so that remains an unknown.
The biggest surprise to me here is that Exquisite Firecraft out performs Fireblast so consistently. With them both being 4 damage, I really thought Firecraft would be the weaker of the two.
What I'll probably do is run this simulation again on the draw (this was on the play) and figure out what should be done from there. And then make the suggested change and try again.
-------------------------------
Edit: On the draw results are in. They also suggest -1 Vexing Devil, +1 Land. -1 Lavamancer, +1 Bolt is also looking quite good. Both changes will be implemented for the next run... but for now it's time to add new card types, self damage, and lay the groundwork to have an opponent that interacts.
Keep doing the good work!
Weird, I swore I wrote a response to this earlier. I did run a fetchless build. 16 fetches had the thinning ability of cutting about one land. So if you cut too many fetches you'll want to add a land. The average kill turn doesn't really change though because the optimal builds are using only a few fetches. This isn't the only build that works, but here's how it tunes what I've been working with.
7 Fetch
16 Mountain
2 Barbarian Ring
4 Eidolon
4 Swiftspear
4 Guide
1 Lavamancer
4 Price
2 Fireblast
11 Bolt
4 Firecraft
1 Vortex
Next up is adding an opponent. I've been writing my components to allow for an opponent without too much trouble but it's still a significant amount of work so that will keep me busy for some time.
P1 at 20 life, owns a tapped goblin guide
P2 at 18 life, owns a swiftspear
P2 would never attack in this scenario because the Guide represented a faster clock and it was losing the race, but at the same time it wouldn't block with the Swiftspear and throw it away so it should very clearly be attacking. Little things like that which make common sense to humans are difficult for computers to grasp. Even if the life totals were 8 and 6 this would remain true, so I didn't want to go the cheap route of simply always attacking if over X life. Eventually I figured something out though.
In the end, my AI is pretty simple but I think it has a solid strategy behind it. Where most AIs look for reasons to include cards in an attack/block, which leads to millions of permutations that have to be compared, I took the opposite approach. Everything that can attack, attacks. Everything that can block, blocks. Once a card has been identified as a potential attacker/blocker I look for reasons to exclude it, and if I don't find one, it's in combat. At a minimum I think it's good enough to beat novice players, and the beauty of my approach is that it only requires a quantity of games, not quality. 10,000 games worth of data generated by Finkel playing Budde is as good as 1,000,000 games of two brand new players, so simply generating more data makes up for any deficiencies in the AI's skill.
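To make the "exclude rather than include" idea concrete, here is a minimal sketch of attacker selection done that way. The `Creature` class, its fields and the three exclusion rules are hypothetical stand-ins invented for this example; the real simulator's rules are not listed in this post.

```python
from dataclasses import dataclass

@dataclass
class Creature:
    name: str
    power: int
    tapped: bool = False
    haste: bool = False
    entered_this_turn: bool = False

# Each rule returns True when the creature should be held back.
# These particular rules are illustrative assumptions only.
def is_tapped(creature):
    return creature.tapped

def is_summoning_sick(creature):
    return creature.entered_this_turn and not creature.haste

def deals_no_damage(creature):
    return creature.power <= 0

EXCLUSION_RULES = [is_tapped, is_summoning_sick, deals_no_damage]

def choose_attackers(creatures, rules=EXCLUSION_RULES):
    """Start from 'everything attacks' and drop anything a rule objects to."""
    return [c for c in creatures if not any(rule(c) for rule in rules)]

board = [
    Creature("Goblin Guide", power=2, haste=True, entered_this_turn=True),
    Creature("Monastery Swiftspear", power=1, tapped=True),
    Creature("Eidolon of the Great Revel", power=2, entered_this_turn=True),
]
print([c.name for c in choose_attackers(board)])
```

The cost of this design is one pass over the creatures per rule instead of a comparison of every possible attack combination, and a new reason to hold something back is just one more function in the list.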
I was able to speed things up a bit too: it takes about 2 hours to run 100,000 games, where previously that would take 5 hours.
The opposing player used is a generic Delver deck. For reasons I won't explain (PM me if you're really curious) certain cards are more difficult to program than others, so that limits my card pool a bit. Basically, on-action triggers are still beyond my program's capabilities, which excludes Young Pyromancer; in depth card evaluation is too time consuming (in particular shuffling away bad cards), which excludes Ponder and Brainstorm; and alternate costs are close to impossible to do in an elegant fashion, which excludes Force of Will. This causes the test deck to be some weird hybrid that falls between Legacy and Modern legality. It also doesn't really impact testing because counters, removal, and blockers are still counters, removal, and blockers even if they exist at a lower quality, and really it's the interactions as they come up that are being measured. The "final" UR Delver list being used is
8 fetches
10 fetchable land
4 Monastery Swiftspear
4 Stormchaser Mage
4 Delver of Secrets
4 Flame Rift
2 Fireblast
6 Lightning Bolt
4 Gitaxian Probe
4 Preordain
4 Serum Visions
1 Exquisite Firecraft
1 Cryptic Command
3 Counterspell
2 Force Spike
After looking at the data, and filtering the results to only the games Burn won (it won ~67% of games btw), using the previous list I gave which was the optimal performing goldfish list we get the following results, looking only at T3/T4/T5 wins (9339 games over the initial dataset):
0.37 Fetch
-0.62 Land
-0.01 Ring
-0.13 Eidolon
0.15 Swiftspear
0.15 Goblin Guide
-0.06 Rift
0.01 Fireblast
0.05 Bolt
0.01 Firecraft
0.07 Lavamancer
-0.01 Vortex
So from this we can say that the big under performer was fetchlands, the card the deck most wanted was lands (note that it's already running 25, and this says that even more are still optimal), another Eidolon would be fantastic (already running 4, but maybe a Pyrostatic Pillar?), and the 1 drop hasted creatures are a little over represented.
So I think that for the next test I'm going to try -1 Fetch, -1 Swiftspear, -1 Lavamancer, +2 Land, +1 Eidolon (for now it will just be 5 Eidolons, I'll downgrade to a more accurate Pyrostatic Pillar in the future, the next time I add new cards to the pool, which will probably happen at the point that I really, really want Searing Blaze).
As an aside, for a bit of trivia, I expect Eidolon to consistently overperform in the future. In fact, my original test deck for the opposition was 40 lands, 20 Eidolons and it won 80% of games despite the mulligan code almost always forcing it to 4 card hands.
The list I have right now is not the optimized list against the opposing AI, because I think optimizing for a bad deck is a poor idea. Instead I'm just looking at various card values. With my current list (posted below) over 10,000 games Searing Blaze looked optimal at 4 cards (score of 0, so perfectly balanced).
Firecraft is looking pretty damn good as a mainboard 4 of as well. It does work out poorly in turn 3 games where it's over represented as a 4 of, but in the turn 4 games it's only slightly high and in turn 5 games it's under represented. Depending on the expected meta I could see an argument here to only play 3 MB if you expect things to be fast.
Guide and Swiftspear are starting to climb the charts of over representation now that there's an opponent with blockers and the ability to kill them. According to the numbers Guide is a little better than Swiftspear, but Prowess gets exponentially more powerful (just think about it, it's way better when you trigger on two creatures at once) as you put more of it in your deck so the real weaker card might be Guide.
Lands show up as under represented still.
Finally, here's the list and it's worth noting that this is the first list to go through my program (ever) that had more turn 4 wins than turn 5 wins, despite the fact that the opponent has blockers. Searing Blaze is just that good.
22 Mountain
2 Barbarian Ring
4 Eidolon of the Great Revel
4 Monastery Swiftspear
4 Goblin Guide
4 Price of Progress
2 Fireblast
4 Lightning Bolt
1 Rift Bolt (program actually just uses 5 Lightning Bolts right now)
4 Exquisite Firecraft
1 Sulfuric Vortex
4 Searing Blaze
4 Chain Lightning
One last comment, over the past few days I've been thinking about how to implement Bedlam Reveler because he's new, topical, and I bet a lot of people would like some extra data before Louisville in 5 months. Bad news. I'm not sure I can implement him. I'm going to try, but given my limited knowledge of AI programming I'm not sure how to write a variable cost. It's a really difficult concept to explain to a computer. I have some ideas, one is a really ugly way to implement it but would work. The other would probably require expanding my trigger system but would be a little more sustainable. If I implement it that's the direction I'm going to take.
If I'm being honest with myself though, the card I really want to know about is Collective Defiance. I like this card more in Modern but I think it has some SB potential here against Miracles and combo. Sadly, modal spells are way beyond me. Modal spells are 500x harder than variable mana costs. Spells with both of those combined are probably impossible given my current approach.
I ran a test earlier today with 1000 games/deck looking at the average win turn of the deck using every configuration from 17 lands to 34 lands. The results were very surprising. The fastest deck was at 22 lands, the slowest deck was at 23 lands, and all 17 decks rated within 0.15 turns of each other. This brings up some interesting questions about just how much land counts actually matter. The goldfish results certainly suggest that more lands are better, but the common wisdom is that fewer lands are better.
For now I'm a bit confused by this and am going to rerun the test with a much larger sample size and see if it gives similar results.
Here's the chart for the average win turn
http://imgur.com/a/ICk4P
As you can see it's almost completely flat, which suggests no functional difference due to land counts.
This could be because it just doesn't matter, or it could be a result of the card choices I made in each deck (My process started with the 17 land deck, which would be the lowest curve deck, and simply cut cards for lands rather than readjusting the entire mana curve as lands were added), or it could just be an example of the huge variance in the game.
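For readers who want to try a sweep like this themselves, here is a deliberately tiny toy version. It is NOT the simulator: the `goldfish` model below assumes every nonland card is a 1 mana, 3 damage burn spell, there are no mulligans, no creatures and no opponent, and the player is always on the play. The function names and the seed are made up; the 20 life, the turn 12 cap and the 17 to 34 land range are the only things taken from the posts.

```python
import random

def goldfish(lands, rng, deck_size=60, turn_cap=12):
    """Play one toy game and return the winning turn (turn_cap if none)."""
    deck = ["land"] * lands + ["burn"] * (deck_size - lands)
    rng.shuffle(deck)
    hand, library = deck[:7], deck[7:]
    lands_in_play, life = 0, 20
    for turn in range(1, turn_cap + 1):
        if turn > 1 and library:           # on the play: no draw on turn 1
            hand.append(library.pop())
        if "land" in hand:                 # one land drop per turn
            hand.remove("land")
            lands_in_play += 1
        casts = min(lands_in_play, hand.count("burn"))
        for _ in range(casts):             # each spell costs 1 and deals 3
            hand.remove("burn")
        life -= 3 * casts
        if life <= 0:
            return turn
    return turn_cap

def sweep(land_counts, games=1000, seed=0):
    """Average win turn for each land count, using one seeded generator."""
    rng = random.Random(seed)
    return {n: sum(goldfish(n, rng) for _ in range(games)) / games
            for n in land_counts}

results = sweep(range(17, 35), games=200, seed=1)
for land_total, avg_turn in results.items():
    print(land_total, round(avg_turn, 2))
```

Even in a toy like this, rerunning with a different seed or more games per deck is the quickest way to see how much of the shape of the curve is just variance.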
Anyways, with that bug caught and fixed I repeated some tests that I ran earlier, before there was a second deck.
I ended up testing 10,000 games per deck with every land range from 17 to 33. After testing the decks I went into each deck using my card evaluation spreadsheet, and tuned each list for something somewhat appropriate to its land count. This isn't something I intend to do often because it's slow and error prone. After that I ran the 10,000 games again and looked at the results.
http://imgur.com/Hoec9ku
There are a few weird things about this. I was expecting the slowest deck to be on the edge, either 17 or 33 lands, and instead it was around 30 lands, with the 31+ land decks speeding up. It's really interesting (to me at least) that 17 lands and 33 lands performed nearly equally. The other odd thing was the spike at 26, which happened to coincide with the 26 land deck losing nearly twice as many games as the other lists.
So with this in mind, I ran the tests again.
http://imgur.com/qnlr71I
The spike at 26 is still there, and so is the dropoff at 31+. I don't have explanations for those right now, but I'm pretty confident in them occurring. The other thing I'm pretty confident in is that while the optimal land count for a goldfish was at 26, it appears to be 23 against this deck. 21 and 19 lands also performed well. Realistically anything between 19 and 23 is probably ok. What seems to make 23 perform so well is Exquisite Firecraft.
I'll probably be focusing on the 23 land decks for a while because the 1 drop haste beaters keep performing poorly, and I think something a little higher curve using some Marauders might do better. Marauders performed very well in the decklists I included it in, which were typically higher curve (a large portion of their win speed was attributable to Marauder, I think).
Abbot of Keral Keep is something I would like to test out with 23 lands opposed to Swiftspear. It's going to take some recoding to make it work though because I currently play a land before any turn logic takes place. Alternatively, Hellspark Elemental might make it back in for testing.
I ran it through a battery of games, around 26,000 games. It performed ok but not exceptionally. Basically, I just took my test deck lists and stuck it in. 27 lands with 4 caves was about equal in speed to 22 lands without, 25 lands with 4 caves was just slower than 23 lands without, and 23 without was faster than 23 with.
It's possible the card just isn't good enough (which is certainly the opinion prior to me running these games), or it's possible I wasn't including the right cards with it (more Rings/Lavamancers). Whichever the case may be, I'll probably spend another day or two tinkering with it before moving on to another card I would like to implement.