The web-based card generator gives some strange results... is it normal for cards to say "Destroy target opponent"? Also, I'm getting pictures of magic cards within magic cards, as well as photos of Vietnam War-era protests for enchantments. It's a bit amusing, but I have the feeling that the neural network needs to learn a bit more.
That's not normal for the networks that I've been sampling from lately, not at reasonable temperatures anyway. I would need to see the training parameters of the network that you're sampling from as well as know what your sampling parameters are (e.g. temperature) in order to better answer your question. As for the art, I'm sure Croxis might be able to offer an explanation.
I have been using the default settings for the website and I set it to the highest epoch available right now. If you try it, you will probably see the interesting results that I am talking about.
Speaking of art, how's the image generation stuff going? I've been thinking about using Google's DeepDream trained on all existing magic cards, but I'm not sure that 15k images would be nearly sufficient, and getting those images is tricky without spamming Gatherer with requests. I'm reasonably sure I could write a python script to crop cards to the art only (but that would mess up something fierce on any non-standard cards). I'd tag the images with their supertypes and subtypes, since they usually give a pretty succinct description and act really well as tags, I think. But given that Google did this and their results for getting the network to generate pictures of common objects ended up like this, and they had millions of images, I don't even know it's worth trying with our limited MtG image selection, especially given the variance displayed between, say, all depictions of dragons.
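A crop script like that mostly comes down to box arithmetic. As a sketch, assuming the art occupies a fixed fraction of a modern card frame (the fractions below are rough guesses, not measured values):

```python
def art_crop_box(width, height):
    """Return a (left, upper, right, lower) pixel box for the art region
    of a card image. The fractional bounds are guesses at modern-frame
    proportions, not measured values, and any non-standard frame
    (planeswalkers, flip cards, full-art lands) will crop badly."""
    left = int(width * 0.08)    # art starts ~8% in from the left edge
    right = int(width * 0.92)   # ...and ends ~8% from the right edge
    upper = int(height * 0.11)  # just below the title bar
    lower = int(height * 0.56)  # just above the type line
    return (left, upper, right, lower)

# With Pillow installed, the box could be applied like so:
#   from PIL import Image
#   img = Image.open("card.png")
#   art = img.crop(art_crop_box(*img.size))
box = art_crop_box(480, 680)
```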
edit: I found a relatively recent paper by Google where they test their DRAW neural network. On the 8th page they have examples of wildlife photos generated by DRAW, after being trained on 50,000 such photos. They look... convincing from a distance, but too blurry for our purposes. I think that seals my thought that doing this with the limits of MtG's 15,000 images wouldn't end that well. The annoying part is, I know there are hundreds of millions of fantasy art images around the web... if only I could collect, tag and size them all for a decent "fantasy image" training database. But that seems like a heinous task.
Lowest loss (the number after epoch on the checkpoints?) is 0.46
Let me know if there are better parameter options, or if you have any cpu checkpoints you want to share I can post them as well.
Here is the google search code I am using. It was written by Nafnlaus; it has nothing to do with neural nets other than using the card texts and color. If you copy the image location and paste it here/PM it/file a GitHub issue, I'll be able to do a little bit more troubleshooting on it. There is some random magic happening, so I might not be able to reproduce the particular image. Won't be today; it's my husband's birthday and I have my first gig tonight.
Maybe I should run all the images through deep dream
Proud to be saving the world since 1984 -- I also have an open source website to make AI generated magic cards. Source code
Right now the code I want to test (Vivanov's DRAW implementation) tells me "Could not connect to localhost:8172: closed" and then starts training, but I get NaN results for training loss. It sounds like the code calls something that wants to establish a connection to download the MNIST dataset but can't secure the port it wants (maybe). They say in the readme file that their code "works with the 28x28 MNIST dataset. You can adjust it to other datasets by changing A, N, and replacing the number '28' everywhere in the script. I haven't done it but it is possible." I'd like to make sure it works well enough on the MNIST dataset before I try rewriting the code to handle other, larger images. I'm sure I'll figure it out soon. Once I do, I can easily change things over, and I already have all the card images I want to work with on the hard drive.
EDIT: One cool thing is that they provide dot files that detail the structure of the encoder and decoder networks that are generated by the program, but they're so massive that I can't render them very effectively. The picture I've attached is a diagram detailing the decoder network, but the cells are so numerous that everything is very tiny and hard to see, haha.
EDIT(2): Croxis, my last parameters were a slightly lower dropout and a slightly larger network. I had to adjust the batch size because I was putting too much memory pressure on my GPU (evidently), so your training parameters will probably differ from mine according to your architecture. Of course, now that we can convert between GPU and CPU checkpoints, I should be able to convert my latest checkpoint to a CPU version for you to have. I'll look into that this evening. I trained for 22 epochs and had a loss of around 0.40.
EDIT(3): Maplesmall, DRAW can do much better work than those blurry images because the algorithm can be scaled up. The real question is how memory efficient it is (we'll need to see about that). If it doesn't work well enough for our purposes, we can find another approach, or just wait a year because I'm 100% certain they'll come up with something a thousand times better by that point, haha.
Interestingly enough, the results from JetBrains with Deep Dream are more promising. They used their simple, flat logo, and it resulted in something much more interesting than running photos through Deep Dream.
Edit: "better_croxis" should start showing up in about a half hour using Talcos's parameters. I'm not using a randomized source, so that could be an additional factor too.
Followed your instructions and everything works fine until I try to generate a sample from a checkpoint. Using the same command you mentioned for the samples (tried different parameters too, same problem) I get the following error:
[0mcreating an lstm...[0m
[0mmissing seed text, using uniform probability over first character[0m
[0m--------------------------[0m
/home/erico/torch/install/bin/luajit: /home/erico/torch/install/share/lua/5.1/nn/Linear.lua:46: invalid arguments: DoubleTensor number DoubleTensor number FloatTensor DoubleTensor
expected arguments: *DoubleTensor~2D* [DoubleTensor~2D] [double] DoubleTensor~2D DoubleTensor~2D | *DoubleTensor~2D* double [DoubleTensor~2D] double DoubleTensor~2D DoubleTensor~2D
stack traceback:
[C]: in function 'addmm'
/home/erico/torch/install/share/lua/5.1/nn/Linear.lua:46: in function 'func'
/home/erico/torch/install/share/lua/5.1/nngraph/gmodule.lua:253: in function 'neteval'
/home/erico/torch/install/share/lua/5.1/nngraph/gmodule.lua:288: in function 'forward'
sample.lua:151: in main chunk
[C]: in function 'dofile'
...rico/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670
Tried with the Shakespeare input and with Onderzeeboot's MTG input that's mentioned in Talcos's post. Any idea what's going on?
That's very bizarre. The error you are getting, if I'm understanding what I'm reading correctly, is that you are attempting to use a function meant to multiply two matrices and it looks like you are passing vectors instead, and it hates you for doing that. Unless someone else comes up with the answer first, I promise that I'll look into your problem this evening and try to figure it out.
EDIT: Quick question, what does it look like when you are training? Can you start up the training script and copy and paste what the output looks like?
@Talcos, what are the memory (therefore hardware) requirements to have DRAW draw MtG-sized images? I tried to find the DRAW code so I could try to run it, but it's either not available publicly or I suck at finding things.
Yeah, it's interesting to think about what we'll be using for neural networks in a year's time. It'll make today's stuff look like a child's toy probably.
Weird, I had a dream that the web UI was made much better. Then it got weird, cos my dreams always do.
The default temperature on the web UI is 70, and changing it did not appear to alter the results. Setting it to less than 1 was treated as a missing field.
And now I want to make cards with Quarter-Life: Halfway to Distruction references, such as 'It Will Explod' and 'Distruaction is Imminent'.
The version I saw was this one. As for the hardware requirements, I'm still not quite sure. I need to investigate that. What I meant was they were doing their work on very small images, and I wonder at what point the images become too large to be manageable using their technique, if such a point exists.
Ran the training script and this is what it looks like (no problems apparently): Parameters / Checkpoint
I used an rnn_size of 1 because I just wanted a quick run to see if it would at least work, regardless of the quality of the results. If I manage to figure out the problems, I'll do a full run (didn't want to wait 7h just to have this problem show up). Also, I did run it with the default size using the parameters "th train.lua -data_dir data/Formatted -gpuid -1 -eval_val_every 3600", as suggested by Kinje, and when I tried to generate a sample from one of the early checkpoints, I had the same problem.
Thanks for looking into this, the whole idea is really awesome and I really like your post. My motivation for getting this to work is trying to expand this to other games (Generating World of Warcraft skills comes to mind), though that's something I'll only consider once I get the MTG version actually working.
Oh, clever idea!
Send me a copy of the checkpoint (rmmilewi at gmail dot com), and I can try running it after dinner. Whether or not the problem shows up for me will help us narrow down the source of the problem.
EDIT: To the rest of you, I'll do some experiments with Meka (the multi-label version of Weka) to see about how well color filtering works on real cards. I'll let you know how that goes.
I had a quick look at Meka; it looks fascinating. How would one use it with a mtg dataset of all cards? I can't really tell from the documentation if that data has to be encoded in any special way or if I could use the json file with all cards, or our input.txt encoded version...
Oh if you like that, you'll love Weka all the more, it's one of my favorites. Meka is a spinoff that lets you do multi-label classification. I use Weka 99% of the time though. As for the input format, it'll be in the arff format. What I'll pass it are the content vectors followed by the labels, and then I can use the tool to train a classifier to map the input vector to the output labels.
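For a sense of what that looks like on disk, a multi-label ARFF file for this task might resemble the following (the attribute names and the vector values are made up for illustration; Meka's convention is to put the label attributes first and record the label count with `-C` in the relation name):

```text
@relation 'mtg-colors: -C 5'

@attribute white {0,1}
@attribute blue {0,1}
@attribute black {0,1}
@attribute red {0,1}
@attribute green {0,1}
@attribute v1 numeric
@attribute v2 numeric
@attribute v3 numeric

@data
0,0,0,1,0,0.42,-0.17,0.93
1,0,0,0,0,-0.08,0.61,0.25
```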
It's a collection of creature stats that I gathered to do some tests on how Magic cards work (statistically speaking). I've attached a picture of me using Weka to use linear regression to calculate a CMC formula, and another one shows me using Weka to construct a decision tree to determine whether or not a creature ought to have lifelink based on all its other relevant stats. It's ever-so-much fun. And I can use Weka with virtually any kind of dataset, be it survey data, astronomical data, you name it.
EDIT: If you zoom in on the linear regression picture, you'll notice that flying is worth roughly 0.3414 CMC, and that blue and black creatures, on average, cost about 0.1399 and 0.1286 CMC more than creatures of other colors (all other things being equal). It's fun.
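As a worked example of reading those coefficients: only the flying, blue, and black weights below come from the regression quoted above; the intercept and the power/toughness weights are invented stand-ins to make the arithmetic complete.

```python
# Hypothetical linear CMC model. Only the flying/blue/black weights come
# from the regression in the post; the rest are made up for illustration.
COEFFS = {
    "intercept": 0.5,      # made up
    "power": 0.45,         # made up
    "toughness": 0.25,     # made up
    "flying": 0.3414,      # from the regression
    "blue": 0.1399,        # from the regression
    "black": 0.1286,       # from the regression
}

def predicted_cmc(power, toughness, flying=False, blue=False, black=False):
    cmc = COEFFS["intercept"]
    cmc += COEFFS["power"] * power + COEFFS["toughness"] * toughness
    cmc += COEFFS["flying"] * flying
    cmc += COEFFS["blue"] * blue + COEFFS["black"] * black
    return round(cmc, 4)

# A hypothetical 2/2 blue flier:
cmc = predicted_cmc(2, 2, flying=True, blue=True)
```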
EDIT(2): By the way, I just made a personal discovery using that decision tree. Did you know that there has never been a creature with both hexproof and lifelink? Natively, anyway, ignoring cards like Soulflayer, who evidently is the only card in the game to mention both hexproof and lifelink by name in the same card. For shroud and lifelink, only Cairn Wanderer does that.
By the way, I'm getting some weird issues with the latest version of char-rnn. I couldn't train a network on the cpu due to NaN issues (that's new), and when I tried to convert a working GPU checkpoint to a CPU one, I got the same bug that ericomoura is experiencing. Like the CPU checkpoint is just broken and won't work. Has anyone else noticed anything similar?
EDIT(3): Erico confirmed for me that the latest version of char-rnn was what was causing his problems. Rolling back to a previous version eliminated the issues. That's something to look into.
EDIT(4): Almost got the Meka stuff working! I'm very curious about what I'll find. See attached picture. Much data. Very correlation. Wow. I'll let you know how things go.
PSA: after talking to Talcos via e-mail, he kindly offered to help me, and we eventually reached the conclusion that the latest version of char-rnn broke CPU-generated checkpoints. The problem I was having disappeared after I tried using an earlier version. The broken version is the Aug 5th commit called "quick patch for converting GPU checkpoints to CPU checkpoints". I downloaded the one before that (Aug 1st) and it seems to be ok now.
Am I the only one imagining Wizards buying this off us the way they did with EDH/Commander and then using like IBM's Watson to run the code?
Because that would be awesome.
Wouldn't it though? Haha.
It's weird. Unlike many others (sidenote: to every corporate representative reading this now, I am truly flattered and honored by your attention), Wizards has made no attempt to contact me. No idea why. Not that I'm offended. Maybe it's just that they have a preference for flesh and blood designers. After all, I'm not the one designing the cards, the machine is. I just feed it and keep it happy.
---
So, I did some messing around with Meka. I computed the content vectors on every card (leaving out the mana cost part, as that would just be cheating), and attempted to train classifiers to guess what colors a card really was based on its content. The question was whether this was possible. The answer is yes, it looks like it.
Here are my findings thus far. I picked a classification strategy at random because I have no idea which ones work best for multi-label problems. I went with Bayesian classifier chains built on top of decision trees (that's BCC using the J48 algorithm, if you want the particulars). I tried to predict the colors of a card based on the content vector. Here are the stats, which I'll try to help make sense of.
Okay, so the takeaway message is that I was able to produce a classifier that could correctly identify whether a card was white, blue, black, red, or green about 80% of the time. However, the exact match accuracy was only 36.6%, that is, I only ever got all the colors right about that often.
Since only roughly 10% of cards are multicolored, that would suggest that we're overestimating the number of colors of cards, like we say a card is green but then also tack on white or red when we really shouldn't. Still, that our solution includes the right color 80% of the time is a great starting point.
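The gap between those two numbers is easy to reproduce: per-label (Hamming) accuracy scores each color decision separately, while exact match requires all five at once. A toy illustration with made-up data:

```python
def hamming_accuracy(true_labels, predicted):
    """Fraction of individual label decisions that were correct."""
    total = correct = 0
    for t, p in zip(true_labels, predicted):
        for tl, pl in zip(t, p):
            total += 1
            correct += (tl == pl)
    return correct / total

def exact_match(true_labels, predicted):
    """Fraction of cards where every color was predicted correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted))
    return hits / len(true_labels)

# Toy data: WUBRG membership for four cards; the predictor tacks an
# extra color onto three of them (the "over-eager multicolor" failure
# mode described above).
truth = [(1,0,0,0,0), (0,1,0,0,0), (0,0,0,1,0), (0,0,0,0,1)]
preds = [(1,0,0,0,0), (0,1,0,1,0), (1,0,0,1,0), (0,0,1,0,1)]

per_label = hamming_accuracy(truth, preds)  # 17 of 20 decisions correct
all_right = exact_match(truth, preds)       # only 1 of 4 cards fully right
```

High per-label accuracy with low exact-match accuracy is exactly the 80% vs. 36.6% pattern in the stats above.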
I'll have to do a more thorough investigation, but I think we've established that color filtering is a viable strategy.
EDIT: For viewers at home, what all that means is that we are much closer to having the network churn out consistently good sets for drafting purposes. This kind of stuff is necessary for us to ensure that the cards we get out of the network are always of high quality.
Well there's no end product yet. Once we have a fully trained network that can generate valid cards with on-color ability sets, appropriate costs, etc. and minimal noise, it might be worth something to them (especially if that version of the network isn't available on a public website.)
Ah, well, I'm an academic, and "profit motive" is not part of my day-to-day vocabulary. Among my kind, we practice "une économie du don" (a gift economy), that is, I do research and make it available, and you build on my research to do your own, and we both benefit reciprocally.
That's why all the code is open source, and I've made an effort to ensure that anyone and everyone can replicate the work. :-D
I imagine Wizards would pay even if they didn't strictly have to just to generate goodwill from the player base, but I'd still keep the final iteration proprietary pending an attempt at cutting a deal.
Trusting that others are going to be as decent as you are is how I got canned from my job for reporting that my female co-workers were sexually harassing me.
OK, I finally finished adding MSE2 exporting to hardcast_sixdrop's decode.py script! I've sent him the code and I assume it'll be in his git repo before too long (by tomorrow, I think).
All you have to do is use the -mse flag when exporting, for example:
In this case, it'll create a file called 'temp08.mse-set' because that's the output file name we gave it.
To open this, you obviously need Magic Set Editor 2. You'll also need the m15 templates which don't come as default (for some daft reason). Click that link after installing MSE2 and it'll automagically download and install the new templates. Then you should be able to open up the .mse-set file and have all sorts of fun.
Did you know that MSE2 has a statistics page? I didn't before now, but with 65k generated cards to analyse, it's all sorts of fun. It'll also auto-expand keywords (a bit overzealously, if a keyword is part of the name of the card).
As of now, the script assumes all cards are normal ones, so this doesn't work with planeswalkers or flip cards out of the box. HOWEVER, you can easily change the style of the card to a flip card or walker card simply by using the 'Style' tab in MSE2.
So go generate a few hundred cards from a checkpoint, export to a set file, and enjoy opening it in an actual card editor rather than inconvenient-to-read text. Hopefully this means we get more awesome card images here.
I need to upload a bunch of fun ones from the web ui to imgur with comment. One complicating factor is that power/toughness isn't rendering on artifact creatures, so I'll have to add them manually.
Also, saving the images is a little tricky since I have to give them sane filenames (some are invalid so they can't be saved from a mobile browser, and most don't have the .png extension.) But it's doable.
So, last night with the color filtering I made the very mistake I had warned against and included colorless cards in the dataset. Most artifacts are colorless but do things associated with colors (like Tower of Calamities), and that means that artifacts are trick questions in the game of guessing colors. I also managed to include colorless Eldrazi cards, which was also a similar mistake. No rational intellect, be it man or machine, can hope to comprehend the Eldrazi. To have included them in the dataset was an exercise in futility.
I used the RAkEL1 classifier strategy with J48 trees underneath and got decent results. According to Tsoumakas and Vlahavas 2007, RAkEL is an ensemble method where we have lots of little classifiers that work together to try and predict the colors of the card. In this case, I made each little classifier a decision tree, so you have a monowhite expert, a monoblue expert, an Orzhov expert, a Jeskai expert, and so on. The different experts come together and vote on the colors of the card. The technique was originally designed to help automatically recognize different kinds of proteins in a sample, but I think it works well enough for us.
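The voting step of an ensemble like that can be sketched in a few lines. This is a loose simplification of RAkEL's actual scheme (which thresholds votes per label over only the classifiers whose label subset includes it), and the "experts" here are stand-in functions rather than trained trees:

```python
from collections import Counter

COLORS = ("W", "U", "B", "R", "G")

def ensemble_vote(card_vector, experts, threshold=0.5):
    """Each expert predicts a subset of colors for the card; a color is
    kept when at least `threshold` of the experts vote for it. The
    experts are placeholders for trained per-subset classifiers."""
    votes = Counter()
    for expert in experts:
        for color in expert(card_vector):
            votes[color] += 1
    return {c for c in COLORS if votes[c] / len(experts) >= threshold}

# Three toy experts: two think the card is mono-white, one thinks
# white-blue. White wins the vote; blue falls below the threshold.
experts = [lambda v: {"W"}, lambda v: {"W"}, lambda v: {"W", "U"}]
colors = ensemble_vote(None, experts)
```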
Since removing the colorless cards and using a better classification strategy, accuracy has gone up to 50.2% (from 44.3%), and per-label accuracy is still holding at around 80%. But I figure that we could do much, much better. The problem I'm seeing is that we're too eager to mark cards as multicolored.
So instead of asking "could this card qualify as white?", perhaps instead we should ask "how white is this card?" Because there are plenty of cards on the periphery of white that could possibly pass for green or blue. If you try to say that all white cards are equally white, you end up struggling with that noise. I'll look into that at some point.
In any case, what we have should work for now. I think it's powerful enough that we could identify a green Doom Blade as inappropriate for its colors, and that would help us a lot.
So the question is how do we use this, and the answer is quite simple. We can save the model produced by Meka for use in our code (at least Java code, I'm not sure about interoperability with other languages). We feed a text file containing a corpus of cards to a program that loads our color classifier, and we make a pass over the cards in the following way:
Load the vector model, card set, and classifier.
For each card in the card set:
Get the stated color(s) of the card from the mana cost.
Compute the vector representation of the card's text excluding the mana cost.
Give the vector to the classifier. The classifier gives back a set of colors.
If the colors match the card:
Keep the card as is!
Otherwise:
Either recolor the card or throw it away.
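The pass above is only a handful of lines in practice. In this sketch, `stated_colors` and `classify_colors` are stand-ins for the mana-cost parser and the saved Meka model, neither of which exists under those names:

```python
def stated_colors(card):
    """Stand-in for parsing the stated colors out of the mana cost."""
    return card["stated"]

def classify_colors(card):
    """Stand-in for the trained Meka color classifier, which would be
    fed the card's content vector (mana cost excluded)."""
    return card["predicted"]

def filter_cards(cards):
    """Keep cards whose stated colors match the classifier's verdict;
    everything else goes to a reject pile for recoloring or removal."""
    kept, rejected = [], []
    for card in cards:
        if stated_colors(card) == classify_colors(card):
            kept.append(card)
        else:
            rejected.append(card)  # recolor or throw away
    return kept, rejected

cards = [
    {"name": "ok card",         "stated": {"G"}, "predicted": {"G"}},
    {"name": "green doomblade", "stated": {"G"}, "predicted": {"B"}},
]
kept, rejected = filter_cards(cards)
```

This is exactly the check that would flag a green Doom Blade as off-color.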
The next step is to do syntactic garbage filtering. Once we have that and the color filtering in place, I can make a pass over the card sets used to build our experimental set to clean up or remove any problem cards, and then we can rerun the set generator. The resulting set should be more coherent than that last one, which is exciting.
Lastly, it's been awhile since I shared any cards. Here are a few fresh off of the printing press:
Mystic Treefolk 4G
Creature - Treefolk (Uncommon)
Whenever an opponent casts a spirit or arcane spell, you may draw a card.
2/5
#It's spirit/arcane hate with a wording that only shows up on one real card, Ishi-Ishi, Akki Crackshot, with a Heartwood Storyteller sort of vibe. I know it's nothing fancy, but it's interesting to me all the same.
Decree of Touchs 3RR
Enchantment (Rare)
Whenever a creature enters the battlefield under your control, that creature gets +X/+X until end of turn, where X is its converted mana cost.
#I like this card. There's combo potential, but the most straightforward way to use it is to play creatures with haste or have a way to give creatures haste.
Sandstorm Dog 2RR
Creature - Phoenix (Special)
Flying
Whenever Sandstorm Dog deals combat damage to a creature, that damage is dealt to its controller instead.
When Sandstorm Dog leaves the battlefield, sacrifice a red creature.
3/3
#Whoops. Looks like I didn't get rid of the "special" rarity, which is reserved for cards like Nalathni Dragon.
Wild Enlightermancer 1WW
Creature - Human Soldier (Rare)
First strike
Miracle W (You may cast this card for its miracle cost when you draw it if it's the first card you drew this turn.)
2/2
#It's a permanent card with miracle! I included the reminder text in case you've forgotten about it.
Umbra Spray G
Instant (Common)
Destroy target artifact with converted mana cost 2.
#I love that the network thinks it's being clever when really it just made a strictly worse Oxidize, not a green Spell Snare.
EDIT: I believe the bug Erico encountered last night was patched. Just an FYI.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
Edit: "better_croxis" should start showing up in about a half hour using Talco's parameters. I'm not using a randomized source so that could be an additional factor too.
Tried with the Shakespear input and with Onderzeeboot's MTG input that's mentioned on Talco's post. Any idea of what's going on?
That's very bizarre. The error you are getting, if I'm understanding what I'm reading correctly, is that you are attempting to use a function meant to multiply two matrices and it looks like you are passing vectors instead, and it hates you for doing that. Unless someone else comes up with the answer first, I promise that I'll look into your problem this evening and try to figure it out.
EDIT: Quick question, what does it look like when you are training? Can you start up the training script and copy and paste what the output looks like?
Yeah, it's interesting to think about what we'll be using for neural networks in a year's time. It'll make today's stuff look like a child's toy probably.
The default temperature on the web UI is 70, and changing it did not appear to alter the results. Setting it to less than 1 was treated as a missing field.
And now I want to make cards with Quarter-Life: Halfway to Destruction references, such as 'It Will Explod' and 'Distruaction is Imminent'.
The version I saw was this one. As for the hardware requirements, I'm still not quite sure. I need to investigate that. What I meant was they were doing their work on very small images, and I wonder at what point the images become too large to be manageable using their technique, if such a point exists.
I used rnn_size as 1 because I just wanted a quick run to see if it would at least work, regardless of the quality of the results. If I manage to figure out the problems, I'll do a full run (didn't want to wait 7h just to have this problem show up). Also, I did run it with the default size using the parameters "th train.lua -data_dir data/Formatted -gpuid -1 -eval_val_every 3600", as suggested by Kinje and when I tried to generate a sample from one of the early checkpoints, I had the same problem.
Thanks for looking into this, the whole idea is really awesome and I really like your post. My motivation for getting this to work is trying to expand this to other games (Generating World of Warcraft skills comes to mind), though that's something I'll only consider once I get the MTG version actually working.
Oh, clever idea!
Send me a copy of the checkpoint (rmmilewi at gmail dot com), and I can try running it after dinner. Whether or not the problem shows up for me will help us narrow down the source of the problem.
EDIT: To the rest of you, I'll do some experiments with Meka (the multi-label version of Weka) to see about how well color filtering works on real cards. I'll let you know how that goes.
Oh if you like that, you'll love Weka all the more, it's one of my favorites. Meka is a spinoff that lets you do multi-label classification. I use Weka 99% of the time though. As for the input format, it'll be in the arff format. What I'll pass it are the content vectors followed by the labels, and then I can use the tool to train a classifier to map the input vector to the output labels.
Here's an example of a Weka data file (Meka is virtually identical except it allows you to have a field with multiple values in it, or however it is that Meka allows multi-labeled classification, I forget): https://drive.google.com/file/d/0BxF7G2b8kigCcWtUaHBsY2c5RmM/view?usp=sharing
It's a collection of creature stats that I gathered to do some tests on how Magic cards work (statistically speaking). I've attached a picture of me using Weka to use linear regression to calculate a CMC formula, and another one shows me using Weka to construct a decision tree to determine whether or not a creature ought to have lifelink based on all its other relevant stats. It's ever-so-much fun. And I can use Weka with virtually any kind of dataset, be it survey data, astronomical data, you name it.
EDIT: If you zoom in on the linear regression picture, you'll notice that flying is worth roughly 0.3414 CMC, and that blue and black creatures, on average, cost about 0.1399 and 0.1286 CMC more than creatures of other colors (all other things being equal). It's fun.
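For readers who want to try this kind of analysis without Weka, here's a stdlib-only Python sketch of the underlying linear-regression fit. The creature data below is synthetic (generated from a made-up formula), so the recovered coefficients are illustrative only, not the real numbers above.

```python
# A toy version of Weka's linear regression on creature stats: fit
# cmc ~ b0 + b1*power + b2*toughness + b3*flying by ordinary least
# squares. All data below is synthetic, for illustration only.

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with
    partial pivoting (plenty for a 4x4 system)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X^T X b = X^T y."""
    n = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    return solve(XtX, Xty)

# Each row is [1 (intercept), power, toughness, has_flying].
creatures = [
    [1, 1, 1, 0], [1, 2, 2, 0], [1, 2, 2, 1],
    [1, 3, 3, 1], [1, 4, 4, 0], [1, 1, 3, 1],
]
# CMCs generated from the made-up formula 0.5 + 0.5*p + 0.5*t + 0.5*flying.
cmcs = [1.5, 2.5, 3.0, 4.0, 4.5, 3.0]

coeffs = fit_ols(creatures, cmcs)  # recovers [0.5, 0.5, 0.5, 0.5]
```

On a real dataset the fit wouldn't be exact like this; Weka also reports error statistics so you can judge how much to trust coefficients like that 0.3414 for flying.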
EDIT(2): By the way, I just made a personal discovery using that decision tree. Did you know that there has never been a creature with both hexproof and lifelink? Natively, anyway, ignoring cards like Soulflayer, which is evidently the only card in the game to mention both hexproof and lifelink by name on the same card. For shroud and lifelink, only Cairn Wanderer does that.
By the way, I'm getting some weird issues with the latest version of char-rnn. I couldn't train a network on the cpu due to NaN issues (that's new), and when I tried to convert a working GPU checkpoint to a CPU one, I got the same bug that ericomoura is experiencing. Like the CPU checkpoint is just broken and won't work. Has anyone else noticed anything similar?
EDIT(3): Erico confirmed for me that the latest version of char-rnn was what was causing his problems. Rolling back to a previous version eliminated the issues. That's something to look into.
EDIT(4): Almost got the Meka stuff working! I'm very curious about what I'll find. See attached picture. Much data. Very correlation. Wow. I'll let you know how things go.
Because that would be awesome.
Wouldn't it though? Haha.
It's weird. Unlike many others (sidenote: to every corporate representative reading this now, I am truly flattered and honored by your attention), Wizards has made no attempt to contact me. No idea why. Not that I'm offended. Maybe it's just that they have a preference for flesh and blood designers. After all, I'm not the one designing the cards, the machine is. I just feed it and keep it happy.
---
So, I did some messing around with Meka. I computed the content vectors on every card (leaving out the mana cost part, as that would just be cheating), and attempted to train classifiers to guess what colors a card really was based on its content. The question was whether this was possible. The answer is yes, it looks like it.
Here are my findings thus far. I picked a classification strategy at random because I have no idea which ones work best for multi-label problems. I went with Bayesian classifier chains built on top of decision trees (that's BCC using the J48 algorithm, if you want the particulars). I tried to predict the colors of a card based on the content vector. Here are the stats, which I'll try to help make sense of.
Bayesian Classifier Chains using Decision Trees
Classifier_name : meka.classifiers.multilabel.BCC
Classifier_ops : [-W, weka.classifiers.trees.J48, --, -C, 0.25, -M, 2, -X, Ibf]
Classifier_info :
Dataset_name : cardvecdata
Type : CV
Threshold : 1.0E-5
Verbosity : 3
N(test) : 1527.2 +/- 0.422
L : 5 +/- 0
Accuracy : 0.443 +/- 0.011
Hamming score : 0.799 +/- 0.005
Exact match : 0.366 +/- 0.012
Jaccard dist : 0.557 +/- 0.011
Hamming loss : 0.201 +/- 0.005
ZeroOne loss : 0.634 +/- 0.012
Harmonic score : NaN +/- NaN
One error : 0.588 +/- 0.015
Rank loss : 0.278 +/- 0.01
Avg precision : 0.556 +/- 0.004
Log Loss (max L) : 0.323 +/- 0.008
Log Loss (max D) : 1.472 +/- 0.037
F1 micro avg : 0.468 +/- 0.012
F1 macro avg, by ex. : 0.374 +/- 0.012
F1 macro avg, by lbl : 0.468 +/- 0.012
Percent no-labels : 0.277 +/- 0.009
Accuracy[0](white) : 0.796 +/- 0.01
Accuracy[1](blue) : 0.809 +/- 0.008
Accuracy[2](black) : 0.8 +/- 0.009
Accuracy[3](red) : 0.802 +/- 0.007
Accuracy[4](green) : 0.789 +/- 0.01
LCard_pred : 0.926 +/- 0.022
N_train : 13744.8 +/- 0.422
N_test : 1527.2 +/- 0.422
LCard_train : 0.963 +/- 0.002
LCard_test : 0.963 +/- 0.014
Build_time : 69.557 +/- 2.71
Test_time : 0.022 +/- 0.009
Total_time : 69.579 +/- 2.71
Okay, so the takeaway message is that I was able to produce a classifier that could correctly identify whether a card was white, blue, black, red, or green about 80% of the time. However, the exact match accuracy was only 36.6%, that is, I only ever got all the colors right about that often.
Since only roughly 10% of cards are multicolored, that would suggest that we're overestimating the number of colors of cards, like we say a card is green but then also tack on white or red when we really shouldn't. Still, that our solution includes the right color 80% of the time is a great starting point.
I'll have to do a more thorough investigation, but I think we've established that color filtering is a viable strategy.
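For intuition about what Meka is doing under the hood, here's a stdlib-only Python sketch of the classifier-chain idea, with a tiny naive Bayes standing in for the J48 trees (the features and cards below are made up, and this is a toy, not the actual Meka pipeline):

```python
import math

COLORS = ["white", "blue", "black", "red", "green"]

class BernoulliNB:
    """Minimal Bernoulli naive Bayes over binary features, Laplace-smoothed.
    Stands in for the J48 decision trees used in the real Meka run."""
    def fit(self, X, y):
        n_feats = len(X[0])
        self.counts = {0: [0] * n_feats, 1: [0] * n_feats}
        self.totals = {0: 0, 1: 0}
        for row, label in zip(X, y):
            self.totals[label] += 1
            for j, v in enumerate(row):
                self.counts[label][j] += v
        return self

    def predict(self, row):
        n = self.totals[0] + self.totals[1]
        scores = {}
        for label in (0, 1):
            score = math.log((self.totals[label] + 1) / (n + 2))
            for j, v in enumerate(row):
                p = (self.counts[label][j] + 1) / (self.totals[label] + 2)
                score += math.log(p if v else 1.0 - p)
            scores[label] = score
        return max(scores, key=scores.get)

class ClassifierChain:
    """One binary classifier per color; each later link in the chain also
    sees the predictions for the earlier colors, which is what lets it
    model correlations between labels (e.g. multicolor combinations)."""
    def fit(self, X, Y):
        self.models = []
        for i in range(len(COLORS)):
            extended = [row + [labels[j] for j in range(i)]
                        for row, labels in zip(X, Y)]
            self.models.append(BernoulliNB().fit(extended, [l[i] for l in Y]))
        return self

    def predict(self, row):
        preds = []
        for model in self.models:
            preds.append(model.predict(row + preds))
        return preds

# Made-up binary content features: [mentions_flying, mentions_destroy,
# mentions_draw]; three mono-color archetypes, three copies each.
X_train = [[1, 0, 0]] * 3 + [[0, 0, 1]] * 3 + [[0, 1, 0]] * 3
Y_train = [[1, 0, 0, 0, 0]] * 3 + [[0, 1, 0, 0, 0]] * 3 + [[0, 0, 1, 0, 0]] * 3

chain = ClassifierChain().fit(X_train, Y_train)
# A card whose text mentions "destroy" comes back as mono-black.
```

The Bayesian classifier chains used in the real run additionally order the chain using a learned dependency structure between the labels, but the chaining idea is the same.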
EDIT: For viewers at home, what all that means is that we are much closer to having the network churn out consistently good sets for drafting purposes. This kind of stuff is necessary for us to ensure that the cards we get out of the network are always of high quality.
Ah, well, I'm an academic, and "profit motive" is not part of my day-to-day vocabulary. Among my kind, we practice "une économie du don" (a gift economy), that is, I do research and make it available, and you build on my research to do your own, and we both benefit reciprocally.
That's why all the code is open source, and I've made an effort to ensure that anyone and everyone can replicate the work. :-D
Trusting that others are going to be as decent as you are is how I got canned from my job for reporting that my female co-workers were sexually harassing me.
All you have to do is use the -mse flag when exporting, for example:
py decode.py lm_lstm_epoch20.00_0.3001.t7.output.temp08.txt temp08 -v -mse --norarity
In this case, it'll create a file called 'temp08.mse-set' because that's the output file name we gave it.
To open this, you obviously need Magic Set Editor 2. You'll also need the m15 templates which don't come as default (for some daft reason). Click that link after installing MSE2 and it'll automagically download and install the new templates. Then you should be able to open up the .mse-set file and have all sorts of fun.
Did you know that MSE2 has a statistics page? I didn't before now, but with 65k generated cards to analyse, it's all sorts of fun. It'll also auto-expand keywords (a bit overzealously, if a keyword is part of the name of the card).
As of now, the script assumes all cards are normal ones, so this doesn't work with planeswalkers or flip cards out of the box. HOWEVER, you can easily change the style of the card to a flip card or walker card simply by using the 'Style' tab in MSE2.
So go generate a few hundred cards from a checkpoint, export to a set file, and enjoy opening it in an actual card editor rather than inconvenient-to-read text. Hopefully this means we'll get more awesome card images here.
Also, saving the images is a little tricky since I have to give them sane filenames (some are invalid so they can't be saved from a mobile browser, and most don't have the .png extension.) But it's doable.
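The renaming part can be scripted; here's a small sketch, independent of decode.py, where the naming scheme is just my guess at something sensible:

```python
import re

def safe_image_name(card_name, index=0):
    """Turn a generated card name into a filesystem-safe .png filename:
    collapse anything outside [A-Za-z0-9_-] into underscores, lowercase
    the result, and fall back to a numbered name if nothing survives."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", card_name).strip("_").lower()
    if not cleaned:
        cleaned = "card_%d" % index
    return cleaned + ".png"

# safe_image_name("Decree of Touchs") -> "decree_of_touchs.png"
```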
Looks like MSE2 is Windows-only.
I used the RAkEL1 classifier strategy with J48 trees underneath and got decent results. According to Tsoumakas and Vlahavas 2007, RAkEL is an ensemble method where we have lots of little classifiers that work together to try and predict the colors of the card. In this case, I made each little classifier a decision tree, so you have a monowhite expert, a monoblue expert, an Orzhov expert, a Jeskai expert, and so on. The different experts come together and vote on the colors of the card. The technique was originally designed to help automatically recognize different kinds of proteins in a sample, but I think it works well enough for us.
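To make the voting scheme concrete, here's a stdlib-only Python sketch of the ensemble structure. Note the hedges: the base learner here is a trivial 1-nearest-neighbour stand-in for the J48 trees, and where real RAkEL draws its label subsets at random, this toy enumerates all of them so the example stays deterministic.

```python
from collections import Counter
from itertools import combinations

COLORS = ["white", "blue", "black", "red", "green"]

class OneNN:
    """Placeholder base learner: 1-nearest-neighbour on Hamming distance,
    standing in for the J48 decision trees used in the real Meka run."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, row):
        best = min(range(len(self.X)),
                   key=lambda i: sum(a != b for a, b in zip(self.X[i], row)))
        return self.y[best]

class RakelLike:
    """RAkEL-style ensemble: each member is trained on a k-subset of the
    labels, treating each label combination within its subset as a single
    class (the 'label powerset' trick); the members then vote on each
    label. Real RAkEL samples the subsets randomly; this toy enumerates
    every k-subset for determinism."""
    def __init__(self, k=2):
        self.k = k
    def fit(self, X, Y):
        self.members = []
        for subset in combinations(range(len(Y[0])), self.k):
            targets = [tuple(labels[j] for j in subset) for labels in Y]
            self.members.append((subset, OneNN().fit(X, targets)))
        return self
    def predict(self, row):
        votes, seen = Counter(), Counter()
        for subset, model in self.members:
            for j, v in zip(subset, model.predict(row)):
                seen[j] += 1
                votes[j] += v
        # a color is on iff a majority of the members that saw it voted yes
        return [1 if 2 * votes[j] > seen[j] else 0 for j in range(len(COLORS))]

# Four made-up content vectors, one per mono-color archetype.
X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
Y = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0]]
model = RakelLike(k=2).fit(X, Y)
```

With k=2 over five color labels, each pair-of-colors "expert" is exactly the monocolor/guild-expert picture described above.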
Classifier_name : meka.classifiers.multilabel.MULAN
Classifier_ops : [-S, RAkEL1, -W, weka.classifiers.trees.J48, --, -C, 0.25, -M, 2]
Classifier_info :
Dataset_name : cardvecdata
Type : CV
Threshold : 0.5
Verbosity : 4
N(test) : 1299.9 +/- 0.316
L : 5 +/- 0
Accuracy : 0.502 +/- 0.009
Hamming score : 0.798 +/- 0.005
Exact match : 0.381 +/- 0.008
Jaccard dist : 0.498 +/- 0.009
Hamming loss : 0.202 +/- 0.005
ZeroOne loss : 0.619 +/- 0.008
Harmonic score : NaN +/- NaN
One error : 0.404 +/- 0.011
Rank loss : 0.226 +/- 0.004
Avg precision : 0.462 +/- 0.006
Log Loss (max L) : 0.346 +/- 0.004
Log Loss (max D) : 0.607 +/- 0.012
F1 micro avg : 0.566 +/- 0.01
F1 macro avg, by ex. : 0.544 +/- 0.01
F1 macro avg, by lbl : 0.566 +/- 0.01
Percent no-labels : 0.118 +/- 0.005
Accuracy[0]white : 0.793 +/- 0.008
Harmonic[0]white : 0.683 +/- 0.022
Precision[0]white : 0.542 +/- 0.029
Recall[0]white : 0.568 +/- 0.033
Accuracy[1]blue : 0.808 +/- 0.01
Harmonic[1]blue : 0.724 +/- 0.012
Precision[1]blue : 0.569 +/- 0.023
Recall[1]blue : 0.624 +/- 0.016
Accuracy[2]black : 0.799 +/- 0.01
Harmonic[2]black : 0.691 +/- 0.026
Precision[2]black : 0.557 +/- 0.026
Recall[2]black : 0.576 +/- 0.036
Accuracy[3]red : 0.805 +/- 0.012
Harmonic[3]red : 0.698 +/- 0.019
Precision[3]red : 0.563 +/- 0.03
Recall[3]red : 0.583 +/- 0.026
Accuracy[4]green : 0.786 +/- 0.009
Harmonic[4]green : 0.673 +/- 0.019
Precision[4]green : 0.526 +/- 0.025
Recall[4]green : 0.557 +/- 0.029
LCard_pred : 1.194 +/- 0.015
LCard_diff : -0.063 +/- 0.016
LCard_diff[0]white : -0.011 +/- 0.017
LCard_diff[1]blue : -0.022 +/- 0.01
LCard_diff[2]black : -0.008 +/- 0.012
LCard_diff[3]red : -0.008 +/- 0.012
LCard_diff[4]green : -0.014 +/- 0.017
N_train : 11699.1 +/- 0.316
N_test : 1299.9 +/- 0.316
LCard_train : 1.131 +/- 0.001
LCard_test : 1.131 +/- 0.011
Build_time : 144.338 +/- 2.031
Test_time : 0.058 +/- 0.018
Total_time : 144.396 +/- 2.024
Since removing the colorless cards and using a better classification strategy, overall accuracy has gone up to 50.2% (from 44.3%), and per-label accuracy is still holding at around 80%. But I figure we could do much, much better. The problem I'm seeing is that we're too eager to mark cards as multicolored.
So instead of asking "could this card qualify as white?", perhaps instead we should ask "how white is this card?" Because there are plenty of cards on the periphery of white that could possibly pass for green or blue. If you try to say that all white cards are equally white, you end up struggling with that noise. I'll look into that at some point.
In any case, what we have should work for now. I think it's powerful enough that we could identify a green Doom Blade as inappropriate for its colors, and that would help us a lot.
So the question is how do we use this, and the answer is quite simple. We can save the model produced by Meka for use in our code (at least Java code; I'm not sure about interoperability with other languages). We feed a text file containing a corpus of cards to a program that loads our color classifier, and we make a pass over the cards in the following way:
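In Python, for illustration (the helper `content_vector`, the `classifier` callable, and the card dictionaries are hypothetical stand-ins for whatever format the Meka-backed program would actually use), the pass might look like:

```python
def filter_miscolored(cards, classifier, content_vector):
    """Keep cards whose declared colors match the classifier's prediction.
    `classifier(vec)` returns a set of color names; `content_vector(card)`
    builds the same features used at training time (mana cost excluded,
    since including it would be cheating). Both are hypothetical hooks."""
    kept, flagged = [], []
    for card in cards:
        predicted = classifier(content_vector(card))
        if predicted == card["colors"]:
            kept.append(card)
        else:
            flagged.append((card, predicted))  # e.g. a green Doom Blade
    return kept, flagged

# Toy demonstration with a one-rule stand-in classifier.
def toy_vec(card):
    return card["text"]

def toy_clf(text):
    return {"black"} if "destroy" in text else {"green"}

cards = [
    {"name": "Doom Blade-alike", "text": "destroy target creature",
     "colors": {"green"}},
    {"name": "Grizzly", "text": "vanilla beats", "colors": {"green"}},
]
kept, flagged = filter_miscolored(cards, toy_clf, toy_vec)
# The green "destroy target creature" card lands in `flagged`.
```

The flagged pile could then be either dropped or re-costed before rerunning the set generator.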
The next step is to do syntactic garbage filtering. Once we have that and the color filtering in place, I can make a pass over the card sets used to build our experimental set to clean up or remove any problem cards, and then we can rerun the set generator. The resulting set should be more coherent than that last one, which is exciting.
Lastly, it's been a while since I shared any cards. Here are a few fresh off of the printing press:
Mystic Treefolk
4G
Creature - Treefolk (Uncommon)
Whenever an opponent casts a spirit or arcane spell, you may draw a card.
2/5
#It's spirit/arcane hate with a wording that only shows up on one real card, Ishi-Ishi, Akki Crackshot, with a Heartwood Storyteller sort of vibe. I know it's nothing fancy, but it's interesting to me all the same.
Decree of Touchs
3RR
Enchantment (Rare)
Whenever a creature enters the battlefield under your control, that creature gets +X/+X until end of turn, where X is its converted mana cost.
#I like this card. There's combo potential, but the most straightforward way to use it is to play creatures with haste or have a way to give creatures haste.
Sandstorm Dog
2RR
Creature - Phoenix (Special)
Flying
Whenever Sandstorm Dog deals combat damage to a creature, that damage is dealt to its controller instead.
When Sandstorm Dog leaves the battlefield, sacrifice a red creature.
3/3
#Whoops. Looks like I didn't get rid of the "special" rarity, which is reserved for cards like Nalathni Dragon.
Wild Enlightermancer
1WW
Creature - Human Soldier (Rare)
First strike
Miracle W (You may cast this card for its miracle cost when you draw it if it's the first card you drew this turn.)
2/2
#It's a permanent card with miracle! I included the reminder text in case you've forgotten about it.
Umbra Spray
G
Instant (Common)
Destroy target artifact with converted mana cost 2.
#I love that the network thinks it's being clever when really it just made a strictly worse Oxidize, not a green Spell Snare.
EDIT: I believe the bug Erico encountered last night was patched. Just an FYI.