Came across this thread a few days ago, after a couple of weeks following the RoboRosewater Twitter! Loving everything I've seen so far, and I've had moderate success using the RNN myself.
I'm currently 33 epochs into a size-512, 2-layer network (running on a VM, so that's about as much as it'll handle), and the network is spitting out an alarming number of cards with names that already exist, and in a few cases carbon copies of existing cards right down to the rarity. The training loss and validation loss are both between .15 and .2, but hardcast's tutorial (which I followed to get started) says training_loss generally starts to hover around .5. I assume this is related to some over- (under-?) fitting issue. How can I fix it? Does it have something to do with dropout? I haven't experimented with that parameter yet. What's the difference between, say, 0, .5, and 1.0 dropout?
Sorry if these questions have been answered earlier in the thread; I'm not much for reading dozens of forum pages.
Thanks in advance!
Came across this thread a few days ago, after a couple of weeks following the RoboRosewater Twitter! Loving everything I've seen so far, and I've had moderate success using the RNN myself.
Welcome aboard! Always good to see someone new enjoying this technology.
I'm currently 33 epochs into a size-512, 2-layer network (running on a VM, so that's about as much as it'll handle), and the network is spitting out an alarming number of cards with names that already exist, and in a few cases carbon copies of existing cards right down to the rarity. The training loss and validation loss are both between .15 and .2, but hardcast's tutorial (which I followed to get started) says training_loss generally starts to hover around .5. I assume this is related to some over- (under-?) fitting issue. How can I fix it? Does it have something to do with dropout? I haven't experimented with that parameter yet. What's the difference between, say, 0, .5, and 1.0 dropout?
So, as far as choosing parameters goes, I have very little experience with 2-layer networks, but I'm just finishing up a bunch of work trying to optimize the training parameters for 3-layer networks. As in, I wrote an 8-page research paper for a class, and I'll post a link to the final version when I hand it in in a few hours.
You're massively overfitting if the network is spitting out copies of real cards. In general, the way to prevent this is to increase the dropout. Essentially, dropout turns off a randomly selected fraction of neurons for each training batch, so the network can't become too reliant on particular connections. Dropout 0.25 turns off a quarter of the neurons, dropout 0.5 turns off half of them, and so on. Dropout 0 disables it entirely, while dropout 1.0 would turn off everything, leaving the network nothing to learn with. So, dropout 1.0 would be a bad idea.
I'm actually pretty curious to see what the best parameters for 2-layer networks are, as I haven't used them much at all. With 3 layers, I've had the best success with size 768, dropout 0.5, and seq_length 200. Training loss actually gets worse if you increase the dropout above 0, but other metrics indicate that raw training loss is very, very misleading as a measure of output quality. This is all using the latest encoding format (explicit labels, name field last) as currently on GitHub.
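For concreteness, that training run looks something like this (the flags are the standard char-rnn/mtg-rnn ones, and the checkpoint directory name is just my own naming convention; adjust for your setup):

th train.lua -gpuid 0 -rnn_size 768 -num_layers 3 -seq_length 200 -dropout 0.5 -batch_size 50 -checkpoint_dir cv/gpu0-768-3-200-0.5-50 -eval_val_every 1000

If you're seeing verbatim copies of real cards, nudge -dropout up from there.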
Sorry if these questions have been answered earlier in the thread; I'm not much for reading dozens of forum pages.
Thanks in advance!
No worries! It's a beast of a thread. I have some Chrome windows I have to remember not to close because they have tabs open to important pages, lol.
I think I'm the first person to do a hyperparameter optimization sweep, so if you want to know what the best training parameters are, you've come at exactly the right time. And if you just want to sample from existing checkpoints, I've trained something like 100 different networks in the past month. I'll try to organize and post as much of my work as possible.
EDIT:
Link to paper and to the poster I made a week ago on Google Drive.
Thanks for the reply and answers! I'm doing a size-256, 2-layer one with dropout 0.5 now as a test run, since it's a little faster to generate. Hoping this one works out a little better. I did get some good results from early checkpoints in the size-512 net though.
Golgari Ritual 2UU
Enchantment ~ Aura (rare)
Enchant land
Enchanted land has "T: add B or R to your mana pool."
WW, T: Exile target artifact or enchantment.
2U: Put a % counter on Golgari Ritual.
At the beginning of your upkeep, you may return target creature to its owner's hand.
#This doesn't seem very Golgari, but maybe they've changed their ways. Gotta love that third ability too, really enables Thief of Blood.
Action of Shaid 1BB
Enchantment (rare)
Whenever an opponent discards a card, that player loses the game.
#Brutal. I'll take 4.
Firestorm 1R
Sorcery (common)
Firestorm deals 31 damage to target creature or player.
#Here we have the issue of repeating card names, but this one ups the ante a little bit. At least it's only sorcery speed, otherwise it'd be broken.
Pretty much everything you need to get started is here: https://github.com/billzorn/mtgencode. The decode.py script includes options to generate MSE sets from card dumps (which was really interesting to code up, let me tell you); that's my favourite feature.
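If memory serves, the invocation is along these lines (treat it as a sketch and check decode.py -h for the exact flags):

python decode.py generated_cards.txt generated_cards -mse

where generated_cards.txt is a raw dump sampled from the network and -mse asks for a Magic Set Editor set file instead of plain text.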
I trained lots of checkpoints to do my hyperparameter sweep; I have about 80 of various sizes that all use the most recent version of the encoding. I'll try to get them organized and post at least the good ones to Google Drive so others can use them. They're all GPU checkpoints, which are significantly smaller than CPU checkpoints (apparently the CUDA libraries do some compression, or just use float32 or something), so I think the best solution is to have people use the gpu-to-cpu conversion feature that now comes with char-rnn. I'll post some documentation about how to use it in the mtgencode readme.
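From memory, the conversion is a one-liner along these lines (the checkpoint name here is a placeholder, and double-check the script name against the char-rnn repo):

th convert_gpu_cpu_checkpoint.lua cv/lm_lstm_epoch50.00_0.2000.t7

which writes out a CPU-loadable copy of the checkpoint next to the original.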
I also have a bunch of disorganized scripts, including an IPython notebook for plotting data, and a huuuuuuuuuuuge amount of data that I couldn't fit into the paper I wrote. I'd like to make that available as well, as some of it is certain to be interesting. For example, I just produced a bunch of dumps to compare what happens if we put the name in the first field as opposed to the last field of the encoding. I just have to run my analysis on it and then fiddle with the graphs until they're readable.
So yeah, whether you want to see more cards or know more about the best hyperparameters to use for training the networks, stay tuned over the next few days. I'll try to provide as much as I can, and document / automate my techniques so that others can reproduce my work and expand on it.
Well, it's been a while, and I've still been churning out art for cards. Last time I did a batch, I know some people were disappointed that I hadn't done any white creatures yet. Here's a batch of white creatures generated from Magicbox (my RNN).
Also worth noting: Wall of Denium is the first legendary wall! Woo!
I had to do some rewiring of the image code to get it to behave as I wanted, and in the process I ran into a crippling bug in the underlying machine learning library. It was keeping me from making further progress with the image generation stuff for several days. Fortunately, a fix came out for the bug just the other day (just in time!). But things still aren't working the way they should.
The good news is that other people have been putting out implementations of the algorithms using other libraries (the one I was trying to use was written in Theano, but Torch and Chainer implementations just came out), so if this doesn't work, something else will. I might try messing with the Torch implementation tonight, because in that one the cuDNN library is optional (in the Theano one it was mandatory, and getting around that requirement was annoying).
I really, really, really want to get this working. Know why? Take a look at the images I've attached. Know what they have in common? They're all completely original characters. That's why I'm so interested.
EDIT: YES! Training! No idea if it will work, but it's not breaking apart! If and when I get anything out of all this, I'll be sure to share it with y'all.
EDIT(2): Okay, first attempt was a failure. But on the bright side, I know why. I fed all the card art to it and treated everything as if they belonged to the same category of image. Too much diversity. I need to split them into folders. Perhaps I'll just do creature art, and split on subtypes (elf, goblin, etc.). That way there are consistencies that it can latch onto.
I also experimented with some of the pre-trained models. As you can see, I can churn out novel bedrooms. Note that the network is trained on small versions of images, and if I ask for an image that's very large, reality starts to break down (see attached, upscaled slightly using waifu2x). Although bizarre, the results are very beautiful.
Talcos, in your madness, have you actually created a robot that is capable of creating anime characters?
Combined with the proper feeding/training of manga scans and episode synopses, you could have an entirely RNN-written plot/characters for a manga now. That is pretty ridiculous.
Derailing the topic for just a random thought: personally, I was kind of intrigued about using this to teach AI for various videogames. Instead of having AI cheat to compensate for its lack of skill, we can give it much more organic characteristics by recording the movements and gameplay of a human behind the controls. For example, let's talk about racing game AI and "rubber-banding". You're cutting corners and taking shortcuts, the opponents are falling behind... the computer doesn't know how to compensate for your clearly superior skill, so it increases the opponents' driving speed/handling to speed up your competition when you're not looking.
Instead, let's say we record a player on a level, and you take checkpoints and record the movement and behavior of the player for 100 races. Then you take the times for various checkpoints.
Slower checkpoints = put into training network for "Easy AI"
Faster checkpoints = put into training network for "Hard AI"
Is there really too much of a risk of overtraining? You would just end up with a ghost-race at that point; the end result should be instructions on how to be a good, organic racer instead of a clunky robot trying to Race By Math.
But back to the topic of Magic cards: couldn't we use something like this to make better AI for Magic: The Gathering PC games? You record however many thousands of games and the actions taken; it's got to be able to figure out some sort of connection of "how to play a deck" and how to recognize/identify and respond to a threat. I know I may be giving the RNN too much credit, but there is a very weird logic hidden in there behind all of this, and I keep getting surprised by these results you come up with.
Talcos, in your madness, have you actually created a robot that is capable of creating anime characters?
Well, not me specifically. But if you're referring to the scientific community as a whole, then yes. Yes we have.
Combined with the proper feeding/training of manga scans and episode synopses, you could have an entirely RNN-written plot/characters for a manga now. That is pretty ridiculous.
Well, we're not quite there yet, but we're getting there. If you ask for more detail in the images, they start becoming distorted. If you ask for a long plot, it loses track of where it's going. It's all very dream-like: hazy, insubstantial, and unstable. But as I've said before, I think that maintaining lucidity is an engineering problem that we'll overcome in the future.
Derailing the topic for just a random thought: personally, I was kind of intrigued about using this to teach AI for various videogames. Instead of having AI cheat to compensate for its lack of skill, we can give it much more organic characteristics by recording the movements and gameplay of a human behind the controls.
Well, Deepmind showed it was possible for DNNs to learn to play Atari games, and people are continuing to put out papers on that very subject, both on human-assisted learning as well as independent learning. What you're suggesting is definitely a thing.
Is there really too much of a risk of overtraining? You would just end up with a ghost-race at that point; the end result should be instructions on how to be a good, organic racer instead of a clunky robot trying to Race By Math.
Possibly, but I think there's a happy medium. It also depends on the amount of non-determinism involved in the game. It's not really a ghost race if the choices can't be known in advance.
But back to the topic of Magic cards: couldn't we use something like this to make better AI for Magic: The Gathering PC games? You record however many thousands of games and the actions taken; it's got to be able to figure out some sort of connection of "how to play a deck" and how to recognize/identify and respond to a threat. I know I may be giving the RNN too much credit, but there is a very weird logic hidden in there behind all of this, and I keep getting surprised by these results you come up with.
It's possible. My concern wouldn't be with whether or not you could teach a system to play Magic in that way (you definitely can), but there's a need to have it develop knowledge that's highly transferable because the game is constantly evolving. That part, I think, is the more interesting and important challenge.
---
Btw, I've been on the road for quite a long time today and am very tired. I might settle in for the evening. I'll see about restarting the training again tomorrow after I divide up the images into categories. I feel that I'm very close to having novel Magic art, so that's exciting.
EDIT: On the subject of derailment, I have a totally unrelated plug. The chances that this will be relevant to any of you are slim, but I promised a very nice Italian colleague of mine that I'd shamelessly advertise a conference at every opportunity I got: the paper submission system is open for the WETICE'16 conference, which will be held in Paris in July! Specifically, I'm on the program committee for the verification track, so if you have any ideas about how to prevent smart technologies from breaking, failing, or trying to kill us all, I welcome the contributions.
I was just being a bit silly and facetious in my post, but in all seriousness I'm just excited by all the attention these networks are getting and how they're being developed/researched further. Keep us posted on the Magic art, I'm really interested to see how that goes. I've been really enjoying some of the re-imaginings with the influence of other artists' styles.
Hmm. I'm still working out the details of the training (using the Torch version). I tried training on just images of Islands, and right now I'm getting results that are a bluish mess. I'll have to fine-tune the parameters.
The setup is an adversarial one in which we have a generator that makes images and a discriminator that tries to distinguish artificial/genuine images. The idea is that you pit them against each other and it leads to an arms race (a very elaborate and drawn-out game of knifey spoony), and over time the generator gets better and better at making novel, interesting images.
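Formally, it's the standard minimax objective from the original GAN paper (Goodfellow et al., 2014), nothing specific to my setup:

min_G max_D  E_{x ~ p_data}[ log D(x) ] + E_{z ~ p_z}[ log(1 - D(G(z))) ]

where D(x) is the discriminator's probability that x is genuine and G(z) is the generator's image for noise vector z.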
Right now they're reaching a standstill, a situation where neither knows how to outsmart the other, and the losses are still too high at that point to produce reasonable images. I'll see about fixing that.
Depending on the data you use and the parameters you choose, the results can vary greatly. As an example, I've attached results from a low-end pretrained network that generates faces (it studied celebrity photographs). They're very unfortunate looking people, but they are recognizable as people. Moreover, they aren't just permutations of celebrity faces, they're novel images. However, the network's definition of what constitutes a person may need some improvement.
You can do lots of fun things with the generator. For example, you can morph between different generated images (see attached).
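The morphing is nothing exotic, just linear interpolation between latent vectors. A minimal sketch, assuming a trained generator function and a 100-dimensional noise input (both are stand-ins here, not anything from a particular library):

import numpy as np

z_a = np.random.randn(100)  # latent vector behind image A
z_b = np.random.randn(100)  # latent vector behind image B

# blend the latent vectors and render each intermediate point
for t in np.linspace(0.0, 1.0, num=8):
    z = (1.0 - t) * z_a + t * z_b
    # frame = generate(z)  # 'generate' is whatever your trained generator is

Because the generator is a smooth map from vectors to images, the in-between frames come out as plausible images rather than crossfades.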
EDIT: I would be failing you if I didn't share this awesomeness. CloudCV put out a demo for their visual question answering system, and it's too much fun. Give it an image and ask any question you'd like. I believe it was trained on photographs, but it works well on illustrations. I've attached some examples. It can be hit or miss sometimes, but the ability to move between understanding natural language and understanding images is an impressive feat. I really want this technology on a phone app.
And yes, the correct answers were "blue", "orange", and "bananas". But hey, the answers it gave weren't bad, right? (I kid)
I've been trying to use mtg-rnn to train larger sized RNNs, and I keep running into a problem.
System:
AMD R9 390X 8GB
Ubuntu 15.10 with the proprietary fglrx driver
Obviously this system is using OpenCL. I pretty much followed the directions for installing Torch and the other programs on the mtg-rnn repository. I installed clBLAS 2.8.0 from the Git repository as opposed to OpenBLAS, mostly because OpenBLAS didn't recognize my CPU (an AMD A10-7870) and stopped compiling. I could have manually forced a CPU target with OpenBLAS, but feeling adventurous, I installed clBLAS instead.
Anyway, I have started trying to train three-layer RNNs with a size of 1024, and both times so far, the training has aborted because the loss began exploding. Both times the training failed one batch after writing a checkpoint. Smaller RNNs are not a problem; I have been able to train half a dozen or so that have run to completion.
With the command "th train.lua -gpuid 0 -opencl 1 -rnn_size 1024 -num_layers 3 -seq_length 200 -dropout 0.4 -batch_size 50 -checkpoint_dir cv/gpu0-1024-3-200-0.4-50 -eval_val_every 1000 -seed 7444767"
I got this output:
saving checkpoint to cv/gpu0-1024-3-200-0.4-50/lm_lstm_epoch45.83_0.2370.t7
11000/12000 (epoch 45.833), train_loss = 0.26385448, grad/param norm = 1.4507e-02, time/batch = 1.6307s
11001/12000 (epoch 45.837), train_loss = inf, grad/param norm = 3.6744e-02, time/batch = 1.5413s
loss is exploding, aborting.
The batches leading up to the previous checkpoint file for the RNN were:
9997/12000 (epoch 41.654), train_loss = 0.28550015, grad/param norm = 1.5829e-02, time/batch = 1.7759s
9998/12000 (epoch 41.658), train_loss = 0.27752975, grad/param norm = 1.5546e-02, time/batch = 1.5405s
9999/12000 (epoch 41.663), train_loss = 0.28964216, grad/param norm = 1.4669e-02, time/batch = 1.5531s
saving checkpoint to cv/gpu0-1024-3-200-0.4-50/lm_lstm_epoch41.67_0.2435.t7
10000/12000 (epoch 41.667), train_loss = 0.28061297, grad/param norm = 1.7637e-02, time/batch = 1.6315s
10001/12000 (epoch 41.671), train_loss = 0.67649105, grad/param norm = 3.7121e-02, time/batch = 1.5392s
10002/12000 (epoch 41.675), train_loss = 0.26575626, grad/param norm = 1.4743e-02, time/batch = 1.5426s
A smaller increase in training loss occurs in the batch right after saving a checkpoint, both in this run and in the other run that aborted. I assume this is related to saving the checkpoint.
A smaller increase in training loss occurs in the batch right after saving a checkpoint, both in this run and in the other run that aborted. I assume this is related to saving the checkpoint.
Interesting.
I've noticed a similar problem in a few of my networks, where training will experience a sudden spike in the loss. Sometimes it dies down immediately; other times it ends up crippling the training, though it recovers somewhat after a few epochs. I had not noticed a link between when this happened and when it saved the checkpoints; I'll have to pay more attention when I train a new network.
1024 is a pretty ambitious size. I was able to do it by scaling back the batch size a bit (otherwise it wouldn't fit on my 6GB Titan), but I kept getting memory errors in the Torch framework that would break my training after a few thousand batches for anything above size 768. Your problem seems unrelated, but it's entirely possible that some tiny bug or hardware error could cause a massive failure and throw everything off.
What happens if you try to resume training on the checkpoint right before the explosion, say with a different seed?
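In char-rnn/mtg-rnn terms, that would be something like this, reusing your flags from above and pointing -init_from (the stock resume flag) at the last good checkpoint, with a new seed:

th train.lua -gpuid 0 -opencl 1 -rnn_size 1024 -num_layers 3 -seq_length 200 -dropout 0.4 -batch_size 50 -checkpoint_dir cv/gpu0-1024-3-200-0.4-50 -eval_val_every 1000 -init_from cv/gpu0-1024-3-200-0.4-50/lm_lstm_epoch45.83_0.2370.t7 -seed 1234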
I had to do some rewiring of the image code to get it to behave as I wanted, and in the process I ran into a crippling bug in the underlying machine learning library. It was keeping me from making further progress with the image generation stuff for several days. Fortunately, a fix came out for the bug just the other day (just in time!). But things still aren't working the way they should.
The good news is that other people have been putting out implementations of the algorithms using other libraries (the one I was trying to use was written in Theano...
What was the bug and what was the fix? As an avid Theano user, knowing this would be a huge help!
I had to do some rewiring of the image code to get it to behave as I wanted, and in the process I ran into a crippling bug in the underlying machine learning library. It was keeping me from making further progress with the image generation stuff for several days. Fortunately, a fix came out for the bug just the other day (just in time!). But things still aren't working the way they should.
The good news is that other people have been putting out implementations of the algorithms using other libraries (the one I was trying to use was written in Theano...
What was the bug and what was the fix? As an avid Theano user, knowing this would be a huge help!
In retrospect, part of the problem may have been my fault. But the bug I was getting changed after I updated Theano, which makes me think it was related to this issue. I was having a weird problem when trying to take the gradient of a convolution.
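For the curious, the general pattern that was giving me trouble is just differentiating through a convolution, something like this (illustrative, not my exact code):

import theano
import theano.tensor as T
from theano.tensor.nnet import conv2d

x = T.tensor4('x')            # a batch of images
w = T.tensor4('w')            # convolution filters
y = conv2d(x, w)              # convolve the batch with the filters
g = T.grad(y.sum(), wrt=w)    # gradient of the conv output w.r.t. the filters
grad_fn = theano.function([x, w], g)

Nothing exotic, which is why a bug there was so crippling.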
By the way, no luck yet with getting the image generator training to work for the small datasets; losses just... stagnate very early on, and nothing I'm doing seems to change that fact. Like when I try to train the generator on island art, I end up with it just producing variations of a blue crosshatch pattern combined with some sort of mask. It's not noise, it looks to be a purposeful construction, because evidently this is sufficient to fool the discriminator into thinking it's looking at an island artwork. I'll have to go back in later and do some more tweaking.
I've noticed a similar problem in a few of my networks, where training will experience a sudden spike in the loss. Sometimes it dies down immediately, other times it ends up crippling the training, though it recovers somewhat after a few epochs. I had not noticed a link between when this happened and when it saved the checkpoints, I'll have to pay more attention when I train a new network.
I've seen that before as well, but this seems a little different and more regular.
I was able to get a size-1024 RNN to complete 50 epochs using a dropout of 0.4; the training loss was higher, and the jump in training loss after saving a checkpoint never exceeded three times the previous batch's. Lower dropout values are where it has failed. The attached graph shows the training losses from that run. The graph has a logarithmic scale, and the spikes occur with regularity, once every 1000 batches (which lines up with the -eval_val_every 1000, i.e. the checkpoint interval, in my training command).
1024 is a pretty ambitious size. I was able to do it by scaling back the batch size a bit (otherwise it wouldn't fit on my 6GB Titan), but I kept getting memory errors in the Torch framework that would break my training after a few thousand batches for anything above size 768. Your problem seems unrelated, but it's entirely possible that some tiny bug or hardware error could cause a massive failure and throw everything off.
With my hardware, if I try something too ambitious, mtg-rnn bails at the "cloning criterion" step, and never runs any training batches. I suppose this saves some time.
What happens if you try to resume training on the checkpoint right before the explosion, say with a different seed?
I tried this with the checkpoint immediately preceding the explosion, with both the same seed and a different seed, and training seems to continue, but the training losses are unusually high initially. I also tried resuming the same RNN from a checkpoint 1000 batches earlier with the same seed; the training losses end up being different from the corresponding batches in the original run.
By the way, I do have a neural style transfer artwork to share. You may not know this, but the posts I make on this thread are actually only a fraction of the communications I have regarding this project. I get a steady stream of e-mails and messages from all sorts of people asking questions. I gave an example of neural style transfer on pencil sketches and ended up with a result that was so beautiful I felt I had to share it. I took a pencil sketch by the artist Zindy Nielsen and remixed it with Air Elemental. I'm especially fond of the second result because it gave her blonde hair unexpectedly.
@Melted_Rabbit: Still not sure about what's causing those spikes. I may have to look into that later. How strange.
I feel like the AI here would be better applied to the building of decks across the formats. This way, you would be able to have humans test the output in a meaningful way and have metrics to compare against. With the daily influx of tournaments, you would have endless results to further refine deck building.
I don't know about anyone else, but that would excite me more.
I haven't done anything with this yet but I may want to. I'll have to read through the instructions again when I'm more awake. I just wanted to comment on something I found very strange looking at the cards here: it's odd that LASture's RNN is so different from Talcos's. LASture's seems to be able to match flavor better than MTG wording. All the cards make sense, but a few are misworded; the card name, color, effect, types, etc. always match well. Whereas with Talcos's RNN, the card could be worded perfectly but the flavor totally off. Are you using different scripts somehow, or is the AI just trained differently?
Though I thought LASture was editing the wording...? Maybe the remaining errors are human error. But in that case the cards are otherwise perfect.
I feel like the AI here would be better applied to the building of decks across the formats. This way, you would be able to have humans test the output in a meaningful way and have metrics to compare against. With the daily influx of tournaments, you would have endless results to further refine deck building.
I don't know about anyone else, but that would excite me more.
The focus of the thread thus far over the past six-and-a-half months (wow, how time flies!) has been computational creativity, but it's also very interesting to discuss the other applications of the myriad algorithms that we've been tossing around.
There are many ways that you could go about developing a deck-building AI, be it purely procedural, a stochastic black-box model, or some mix of the two. I'd bet my money on a hybrid model. While I'm all for making use of deep learning systems, a symbolic approach has a lot to offer in spite of its drawbacks, and I'm not one to throw out the baby with the bathwater.
For example, you can automate goldfish testing by turning it into a state space search problem, and from that you can get out lower and upper bounds on the performance of a deck in a vacuum. If you reverse engineer decklists, you can come up with a model that tells you that Heritage Druid and Elvish Archdruid go well together, both because they show up frequently in decklists, and (as a generalization) lords go well with other creatures like themselves. However, formal modeling of the rules of the game allows us to make more precise comparisons between decklists, and to take into account curves, the extent of synergies, average number of turns needed to win under ideal conditions, et cetera.
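To make the goldfish-testing idea concrete, here's a deliberately tiny sketch (my own toy framing, nowhere near a real rules engine): cards are reduced to (cost, damage) pairs, you get one land drop per turn, and we search the space of plays for the fewest turns needed to deal 20 damage.

from itertools import combinations

def fastest_kill(hand, target=20, max_turns=10):
    # Exhaustive search over lines of play: on turn t we have t+1 mana
    # and may cast any affordable subset of the spells left in hand.
    best = [None]

    def play(turn, hand, dealt):
        if dealt >= target:
            best[0] = turn if best[0] is None else min(best[0], turn)
            return
        if turn >= max_turns or (best[0] is not None and turn >= best[0]):
            return  # can't beat the best line found so far
        mana = turn + 1  # one land per turn, all lands untap
        for r in range(len(hand), -1, -1):  # try the biggest plays first
            for combo in combinations(range(len(hand)), r):
                if sum(hand[i][0] for i in combo) <= mana:
                    rest = tuple(c for i, c in enumerate(hand) if i not in combo)
                    play(turn + 1, rest, dealt + sum(hand[i][1] for i in combo))

    play(0, tuple(hand), 0)
    return best[0]

# A burn-ish hand of (cost, damage) spells; prints 4 (turns to deal 20).
print(fastest_kill([(1, 3), (1, 3), (2, 4), (3, 6), (1, 3), (2, 4)]))

The lower bound on a deck's goldfish clock falls out of exactly this kind of search, just with an actual rules engine behind it.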
As an aside, for those of you who are knowledgeable about what I'm suggesting: yes, something something combinatorial explosion of the state space, something something the dark heart of intractability. I know. It's the subject of my dissertation research, lol.
Anyway, all the same, I agree with you that you'd still need data from human players to develop a model of what cards contribute the most to victory against this or that deck; we need to model the metagame. However, the challenge is not just to do that (and it can be done), but also to do it well, and I foresee several interesting obstacles. First and foremost, whatever knowledge the system cultivates, that knowledge has to be highly transferable, insofar as the game is constantly evolving. You'd want to be able to anticipate potential metagames as well as recognize them. With that in mind, I think the greatest problem is not the main deck but the prediction and fine-tuning of the sideboard. It's a Keynesian beauty contest.
Is there some way to automatically render the cards? Even without the art, and without it being perfect. I was searching for an MTG card maker that can accept terminal input so a program can drive it, but I haven't found anything...
The mtgencode repository has code that allows you to take text produced by the network and turn it into a Magic Set Editor file. Is that what you're looking for?
I haven't done anything with this yet but I may want to. I'll have to read through the instructions again when I'm more awake. I just wanted to comment on something I found very strange looking at the cards here: it's odd that LASture's RNN is so different from Talcos's. LASture's seems to be able to match flavor better than MTG wording. All the cards make sense, but a few are misworded; the card name, color, effect, types, etc. always match well. Whereas with Talcos's RNN, the card could be worded perfectly but the flavor totally off. Are you using different scripts somehow, or is the AI just trained differently?
Though I thought LASture was editing the wording...? Maybe the remaining errors are human error. But in that case the cards are otherwise perfect.
First, if by my cards you mean the ones on Twitter, most of what you're seeing are the hilarious results of earlier iterations of the network.
If you mean the ones that I've posted here over time, I'd say that our results are more alike than they are different. Now, it may be the case that LASture's networks and mine were trained under different parameters, and this can affect the results, but keep in mind that the cards you are seeing were deliberately selected. Sometimes the cards have great flavor and poor wording, or great wording and poor flavor. The network can churn out thousands of cards very rapidly (more than we could ever hope to share here), so some filtering and selection is done when we present results. That, and the machine tries to imitate Magic cards, and most Magic cards are ho-hum limited fodder; a great deal of the results are technically perfect but not worth mentioning.
Hello again all! Happy holidays! I just wanted to let you know that I figured something out regarding the image generation process after having stepped away from it for a few days.
Recall that we have two parts to this system. The first is the generator, which tries to make fake artwork. The second is the discriminator, which tries to distinguish real artwork from fake artwork. They go back and forth and test each other, each getting better and better. Over time, the generator starts to make artwork that is reasonable.
Here's what I didn't think about: the generator doesn't actually get to see what it's supposed to imitate (otherwise it'd just make clones of the original artworks). Instead, it gets feedback on how convinced the discriminator was that its work was actual Magic art. I didn't realize how long that process would take, so I've been cutting the training short and getting garbage results. So I ran the script for a little longer this time on just the island data set, and after about 80000 iterations I started to get reasonable results. You can start to see everything coming together.
I didn't run it for as long as I should have; I had to cut things short. But I'll look into running the process for even longer tonight and see how things go.
Meanwhile, I've also been looking into how neural style transfer can improve upon the works. I've chosen an anime girl that I generated to work with. As you can see, the results are a little questionable when the small images are upscaled, but there are ways we can work around that. The first is to clean up the image by using a service like waifu2x, a deep neural network that does smart upsampling.
From there we can use actual artwork that matches the style we want and use neural style transfer to borrow the fine details. In the example I've attached, you can see how the style transfer network adds in texturing to the hair. I ran things with default parameters; the final result will look better with more fine-tuning. I'll have to tweak the process a little bit to get the kind of results we'll need for Magic, but we can figure that out.
----
@Patofet: Sorry that it took me so long to get back to you, but I'm still not quite sure what you're referring to. Do you want the original art for Magic cards? I have a dataset of low-resolution versions of those.
So, I ran some tests last night on the set of all elf artworks in Magic. Interestingly enough, when I run the Torch code put out by the authors, I can't get the networks to converge on anything more than noise. However, when using someone else's implementation written in Chainer, I get the desired results. So weird. I'll have to look into that. Anyway, I sampled the results periodically during training so I could see what was going on. I noted seven distinct phases in the evolution of the artwork (see attached examples):
#0: Initially, the generator outputs purely random noise.
#1: Distinct regions of color appear, though the color choices are random.
#2: The color palette begins to resemble a forest setting.
#3: Patterns of leaf-like striations begin to emerge.
#4: Smooth patches of green for leaves and grass, occasional columns of brown appear. Trees?
#5: Indistinct forms can be seen moving through the mist.
#6: The forms are starting to look humanoid. Sometimes we see Picasso-esque faces peering at us from a distance. We're not alone.
Now, they do look like elves now and then. Most of the time, though, they look like horrific abominations. Here are two examples that I retouched by darkening the bodies in Photoshop and upscaling so you can see the details better.
Many of their faces look like misshapen bronze that opens up into some kind of sharp, gaping mouth. Their bodies are less distinct, made of what looks like moss and wood. I swear, half the time they come out looking like something you'd see shambling down a dark hallway in Silent Hill.
Once again, this is to be expected because the generator isn't actually shown what elves are supposed to look like; it's reverse engineering the physiology of elves by studying the responses of the discriminator. Now, in theory, the discriminator should wise up to the deception, making the pseudo-elves subject to natural selection. In the tests I ran last night, however, everything seemed to stagnate around this point. I think it's because there weren't enough feature maps in the network (the networks I ran were very small). With the Torch version I can beef up the network at the push of a button, but everything is hardcoded in the Chainer version - I'll have to do some tinkering later.
In the meantime, I asked RoboRosewater for a black and green elf horror and got...
Glowbrack Muse 1BG
Creature - Elf Horror (Rare)
At the beginning of the end step, you may draw a card for each non-Wall creature put into your graveyard this turn.
2/2
Add a dash of generated flavor and we get the first card we have ever generated where the card, art, and flavor are all totally original. So that's progress! Hopefully soon I'll be able to generate something other than monstrosities.
EDIT: Oh, and for those of you who are interested to see the style transfers, there's one I did recently on reddit that might interest you. Someone said their dog looked like a Van Gogh painting, so I did a restyle of the dog as Van Gogh's Starry Night and the results came out looking amazing.
I was wondering, could you try fusing different terms into slivers? Like different names or creature types that generally mean the creature in question has X ability or an ability with Y format, such as Eldrazi, or Phyrexian, or different words/names which generally have an effect associated with them when they are used in a card name. I'm asking because I've become rather interested in Slivers as of late, due to taking part in a quest that's basically just about a sliver hive in Warhammer Fantasy.
Edit: Also, that completely generated card is pretty cool, but how useful it is depends on how we interpret it. "This turn" could mean the turn the card was played, meaning if you play it and then deliberately get lots of creatures killed, you would be drawing lots of cards each turn; or it could mean only the current turn, so you would draw cards equal to the deaths on the current turn instead.
I was wondering, could you try fusing different terms into slivers? Like different names or creature types that generally mean the creature in question has X ability or an ability with Y format, such as Eldrazi, or Phyrexian, or different words/names which generally have an effect associated with them when they are used in a card name. I'm asking because I've become rather interested in Slivers as of late, due to taking part in a quest that's basically just about a sliver hive in Warhammer Fantasy.
Well, the result isn't quite what you're interested in, but it can be done.
Elf slivers are all green, Eldrazi slivers tend to be big and colorless, etc. The problem is that the combination of creature types usually results in body text relevant to one type or the other being present, but not both. So if I ask for a sliver spellshaper, I get results like the following (these are raw encoded cards: fields are pipe-separated, @ is the card referring to itself, and numbers are counted in unary runs of ^, so &^/&^ is a 1/1):
|mysterious sliver||creature||sliver spellshaper|O|&^/&^|{^GG}|sliver creatures you control get +&^/+&^.\whenever you cast a spirit or arcane spell, you may pay {^^^}. if you do, draw a card.|
|mysterious sliver||creature||sliver spellshaper|O|&^/&^|{^WW}|{WW}, T: target creature gains first strike until end of turn.|
|mysterious sliver||creature||sliver spellshaper|A|&^/&^|{^UU}|{UU}, T, discard a card: draw a card.|
Or if I want dragon slivers, I get...
|mysterious sliver||creature||dragon sliver|N|&^/&^|{^RR}|all sliver creatures have "{^^}: this creature gets +&^/+& until end of turn."|
|mysterious sliver||creature||dragon sliver|N|&^^/&^^|{^RR^^}|flying\{RR^}: @ gains flying until end of turn.|
So some have body text like one subtype and some have body text like the other. This has to do with the fact that the network is sensitive to the order in which text appears in a card; if text from each subtype is supposed to appear in the same place, one usually wins out over the other.
EDIT: I'm getting closer. The elves are looking less ugly and more elf-like. Skin tones are coming in, and I'm even starting to see some hints of clothes and weapons. Of course, I'm on guard for overfitting as we get closer to the target. I'll let things run overnight and see how it turns out. If it works, then we can see about expanding the input to incorporate more diverse art.
EDIT(2): Not quite what I wanted. The discriminator stopped getting smarter, so the generator started overfitting. It changed gears and ended up producing clones of a single elf figure, standing dead center of the frame and looking straight at the viewer, set against different backgrounds. I'll get it figured out though.
I will say this though: in the long run, an adversarial approach may not be the best solution (there are competing alternatives). Or, at the very least, an adversarial approach might just be only a piece of the solution. Training this way is very inefficient; it takes many hundreds of thousands of iterations to start seeing good results because the generator can only learn about what it's supposed to produce indirectly. On the other hand, the results can be very robust. What you get is a mapping from vectors to images, and you can do arithmetic on the vectors to get different images. For example, v(man with glasses) - v(man) + v(woman) == v(woman with glasses). That and the images are unique and not just clones of the training set.
By the way, I did see someone put out something fun. They designed an image generation process using TensorFlow that produces vector graphics rather than raster graphics. That is, it's thinking in terms of lines and curves rather than pixels. That makes it easier to draw complex sketches because it's scale-independent.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
I'm currently 33 epochs into a size-512, 2-layer network(running on a VM so that's about as much as it'll handle) and the network is spitting out an alarming amount of cards with names that already exist, and in a few cases carbon copies of existing cards down to rarity. The training loss and validation loss are both between .15 and .2, but hardcast's tutorial(which I followed to get started) says training_loss generally starts to hover around .5. I assume this is related to this over(under?)fitting issue. How can I fix it? Does it have something to do with dropout? I haven't experimented with that parameter yet. What's the difference between, say, 0, .5, and 1.0 dropout?
Sorry if these questions have been answered earlier in the thread, I'm not much for reading dozens of forum pages
Thanks in advance!
You're massively overfitting if the network is spitting out copies of real cards. In general the way to prevent this is to increase the dropout. Essentially, dropout turns off a randomly selected fraction of neurons for each training batch, so the network can't become too reliant on particular connections. Dropout 0.25 turns of a quarter of connections, dropout 0.5 turns off half of them, and so on. So, dropout 1.0 would be a bad idea.
I'm actually pretty curious to see what the best parameters for 2-layer networks are, as I haven't used them much at all. With 3 layers, I've had the best success with size 768, dropout 0.5, and seq_length 200. Training loss is actually worse if you increase the dropout above 0, but other metrics indicate that this is very, very misleading about the quality of the output. This is all using the latest encoding format (explicit labels, name field last) as currently on GitHub. No worries! It's a beast of a thread. I have some Chrome windows I have to remember not to close because they have tabs open to important pages, lol.
I think I'm the first person to do a hyperparameter optimization sweep, so if you want to know what the best training parameters all, you've come at exactly the right time. And if you just want to sample from existing checkpoints, I've trained something like 100 different networks in the past month. I'll try to organize and post as much of my work as possible.
EDIT:
Link to paper and to the poster I made a week ago on google drive.
Golgari Ritual 2UU
Enchantment ~ Aura (rare)
Enchant land
Enchanted land has "T: add B or R to your mana pool.
WW, T: Exile target artifact or enchantment.
2U: Put a % counter on Golgari Ritual.
At the beginning of your upkeep, you may return target creature to its owner's hand.
#This doesn't seem very Golgari, but maybe they've changed their ways. Gotta love that third ability too, really enables Thief of Blood.
Action of Shaid 1BB
Enchantment (rare)
Whenever an opponent discards a card, that player loses the game.
#Brutal. I'll take 4.
Firestorm 1R
Sorcery (common)
Firestorm deals 31 damage to target creature or player.
#Here we have the issue of repeating card names, but this one ups the ante a little bit. At least it's only sorcery speed, otherwise it'd be broken.
I also have a bunch of disorganized scripts, including an Ipython notebook for plotting data, and a huuuuuuuuuuuge amount of data that I couldn't fit into the paper I wrote. I'd like to make that available as well, as some of it is certain to be interesting. For example I just produced a bunch of dumps to compare what happens if we put the name in the first field as opposed to the last field of the encoding. I just have to run my analysis on it and then fiddle with the graphs until they're readable.
So yeah, whether you want to see more cards or know more about the best hyperparameters to use for training the networks, stay tuned over the next few days. I'll try to provide as much as I can, and document / automate my techniques so that others can reproduce my work and expand on it.
Also worth note: Wall of Denium is the first legendary wall! Woo!
The good news is that other people have been putting out implementations of the algorithms using other libraries (the one I was trying to use was written in Theano, but Torch and Chainer implementations just came out) , so if this doesn't work, something else will. I might try messing with the Torch implementation tonight, because in that one the CuDNN library is optional (in the last it was mandatory, and getting around that requirement was annoying).
I really, really, really want to get this working. Know why? Take a look at the image I've attached. Know what they have in common? They're all completely original characters. That's why I'm so interested.
EDIT: YES! Training! No idea if it will work, but it's not breaking apart! If and when I get anything out of all this, I'll be sure to share it with y'all.
EDIT(2): Okay, first attempt was a failure. But on the bright side, I know why. I fed all the card art to it and treated everything as if they belonged to the same category of image. Too much diversity. I need to split them into folders. Perhaps I'll just do creature art, and split on subtypes (elf, goblin, etc.). That way there are consistencies that it can latch onto.
I also experimented with some of the pre-trained models. As you can see, I can churn out novel bedrooms. Note that the network is trained on small versions of images, and if I ask for an image that's very large, reality starts to break down (see attached, upscaled slightly using waifu2x). Although bizarre, the results are very beautiful.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
Combined with the proper feed/training of manga scans and episode synopsis you could have entirely RNN written plot/characters for a manga now. That is pretty ridiculous.
Derailing topic for just random thought: Personally, I was kind of intrigued about using this to teach AI for various videogames. Instead of having AI cheat to compensate for its lack of skill, we can give it much more organic characteristics by recording the movements and gameplay of a human behind the controls. For example, let's talk about Racing game AI and "rubber-banding". You're cutting corners and taking shortcuts, the opponents are falling behind... the computer doesn't know how to compensate for your clearly superior skill, so it increases the driving speed/handling to speed up your competition when you're not looking.
Instead, let's say we record a player on a level, and you take checkpoints and record the movement and behavior of the player for 100 races. Then you take the times for various checkpoints.
Slower checkpoints = put into training network for "Easy AI"
faster checkpoints = put into training network for "Hard AI"
Is there really too much of a risk of overtraining? You would just end up with a ghost-race at that point; the end result should be instructions on how to be a good, organic racer instead of a clunky robot trying to Race By Math.
But back to the topic of magic-cards, couldn't we use something like this to make better AI for magic the gathering pc games? You record however many thousands of games and the actions taken, it's got to be able to figure out some sort of connection of "how to play a deck" and how to recognize/identify and respond to a threat. I know I may be giving the RNN too much credit but there is a very weird logic hidden in there behind all of this, and I keep getting surprised by these results you come up with.
Well, not me specifically. But if you're referring to the scientific community as a whole, then yes. Yes we have.
Well, we're not quite there yet, but we're getting there. If you ask for more detail in the images, they start becoming distorted. If you ask for a long plot, it loses track of where it's going. It's all very dream-like: hazy, insubstantial, and unstable. But as I've said before, I think that maintaining lucidity is an engineering problem that we'll overcome in the future.
Well, Deepmind showed it was possible for DNNs to learn to play Atari games, and people are continuing to put out papers on that very subject, both on human-assisted learning as well as independent learning. What you're suggesting is definitely a thing.
Possibly, but I think there's a happy medium. It also depends on the amount of non-determinism involved in the game. It's not really a ghost race if the choices can't be known in advance.
It's possible. My concern wouldn't be with whether or not you could teach a system to play Magic in that way (you definitely can), but there's a need to have it develop knowledge that's highly transferable because the game is constantly evolving. That part, I think, is the more interesting and important challenge.
---
Btw, I've been on the road for quite a long time today and am very tired. I might settle in for the evening. I'll see about restarting the training again tomorrow after I divide up the images into categories. I feel that I'm very close to having novel Magic art, so that's exciting.
EDIT: On the subject of derailment, I have totally unrelated plug. The chances that this will be relevant to any of you are slim, but I promised a very nice Italian colleague of mine that I'd shamelessly advertise a conference at every opportunity that I got: the paper submission system is open for the WETICE'16 conference, which will be held in Paris in July! Specifically, I'm on the programming committee for the verification track, so if you have any ideas about how to prevent smart technologies from breaking, failing, or trying to kill us all, I welcome the contributions.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
The setup is an adversarial one in which we have a generator that makes images and a discriminator that tries to distinguish artificial/genuine images. The idea is that you pit them against each other and it leads to an arms race (a very elaborate and drawn-out game of knifey spoony), and over time the generator gets better and better at making novel, interesting images.
Right now they're reaching a point where they're at a standstill - a situation neither knows how to outsmart the other - and the losses are still too high at that point to have reasonable images. I'll see about fixing that.
Depending on the data you use and the parameters you choose, the results can vary greatly. As an example, I've attached results from a low-end pretrained network that generates faces (it studied celebrity photographs). They're very unfortunate looking people, but they are recognizable as people. Moreover, they aren't just permutations of celebrity faces, they're novel images. However, the network's definition of what constitutes a person may need some improvement.
You can do lots of fun things with the generator. For example, you can morph between different generated images (see attached).
EDIT: I would be failing you if I didn't share this awesomeness. CloudCV put out a demo for their visual question answering system, and it's too much fun. Give it an image and ask any question you'd like. I believe it was trained on photographs, but it works well on illustrations. I've attached some examples. It can be hit or miss sometimes, but the ability to move between understanding natural language and understanding images is an impressive feat. I really want this technology on a phone app.
And yes, the correct answers were "blue", "orange", and "bananas". But hey, the answers it gave weren't bad, right? (I kid)
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
System:
AMD R9 390X 8GB
Ubuntu 15.10 with the proprietary fglrx driver
Obviously this system is using OpenCL. I pretty much followed the directions for installing Torch and other programs on the mtg-rnn repository. I installed clBLAS 2.8.0 from the Git repository as opposed to OpenBLAS. I installed clBLAS mostly because OpenBLAS didn't recognize my CPU, an AMD A10-7870, and stopped compiling. I could have manually forced a CPU with OpenBLAS, but feeling adventurous, I installed clBLAS instead.
Anyway, I have started trying to train three layer RNNs with a size of 1024, and both times so far, the training has aborted because the loss began exploding. Both times the training failed one batch after writing a checkpoint. Smaller RNNs are not a problem, I have been able to train a half a dozen or so that have run to completion.
With the command "th train.lua -gpuid 0 -opencl 1 -rnn_size 1024 -num_layers 3 -seq_length 200 -dropout 0.4 -batch_size 50 -checkpoint_dir cv/gpu0-1024-3-200-0.4-50 -eval_val_every 1000 -seed 7444767"
I got this output:
saving checkpoint to cv/gpu0-1024-3-200-0.4-50/lm_lstm_epoch45.83_0.2370.t7
11000/12000 (epoch 45.833), train_loss = 0.26385448, grad/param norm = 1.4507e-02, time/batch = 1.6307s
11001/12000 (epoch 45.837), train_loss = inf, grad/param norm = 3.6744e-02, time/batch = 1.5413s
loss is exploding, aborting.
The batches leading up to the previous checkpoint file for the RNN were:
9997/12000 (epoch 41.654), train_loss = 0.28550015, grad/param norm = 1.5829e-02, time/batch = 1.7759s
9998/12000 (epoch 41.658), train_loss = 0.27752975, grad/param norm = 1.5546e-02, time/batch = 1.5405s
9999/12000 (epoch 41.663), train_loss = 0.28964216, grad/param norm = 1.4669e-02, time/batch = 1.5531s
saving checkpoint to cv/gpu0-1024-3-200-0.4-50/lm_lstm_epoch41.67_0.2435.t7
10000/12000 (epoch 41.667), train_loss = 0.28061297, grad/param norm = 1.7637e-02, time/batch = 1.6315s
10001/12000 (epoch 41.671), train_loss = 0.67649105, grad/param norm = 3.7121e-02, time/batch = 1.5392s
10002/12000 (epoch 41.675), train_loss = 0.26575626, grad/param norm = 1.4743e-02, time/batch = 1.5426s
The increase in training loss occurs to a smaller degree the batch after saving a checkpoint in the run and during the other run that aborted. I assume this is related to saving the checkpoint.
I've noticed a similar problem in a few of my networks, where training will experience a sudden spike in the loss. Sometimes it dies down immediately, other times it ends up crippling the training, though it recovers somewhat after a few epochs. I had not noticed a link between when this happened and when it saved the checkpoints, I'll have to pay more attention when I train a new network.
1024 is a pretty ambitions size. I was able to do it by scaling back the batch size a bit (otherwise it wouldn't fit on my 6GB Titan) but I kept getting memory errors in the Torch framework that would break my training after a few thousand batches for anything above size 768. Your problem seems unrelated, but it's entirely possible that some tiny bug or hardware error could cause a massive failure and throw everything off.
What happens if you try to resume training on the checkpoint right before the explosion, say with a different seed?
In retrospect, part of the problem may have been my fault. But the bug I was getting changed after I updated Theano, which makes me think it was related to this issue. I was having a weird problem when trying to take the gradient of a convolution.
By the way, no luck yet with getting the image generator training to work for the small datasets; losses just.. stagnate very early on, and nothing I'm doing seems to change that fact. Like when I try to train the generator on island art, I end up with it just producing variations of a blue crosshatch pattern combined with some sort of mask. It's not noise, it looks to be a purposeful construction, because evidently this is sufficient to fool the discriminator into thinking its looking at an island artwork. I'll have to go back in later and do some more tweaking.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
I've seen that before as well, but this seems a little different and more regular.
I was able to get a 1024 size RNN to complete 50 epochs, I used a dropout of 0.4 so the training loss was higher and the jump in training loss after saving a checkpoint does not exceed three times the previous batch. The run that managed to avoid aborting used a dropout of 0.4. Lower dropout values are where it has failed. The attached graph is the training losses from that run. The graph has a logarithmic scale and the spikes occur with regularity, once every 1000 batches.
With my hardware, if I try something too ambitious, mtg-rnn bails at the "cloning criterion" step, and never runs any training batches. I suppose this saves some time.
I tried this with the checkpoint immediately preceding the explosion, with both the same seed and a different seed; training continues either way, but the training losses are unusually high at first. I also tried resuming the same RNN from a checkpoint 1000 batches earlier with the same seed, and the training losses come out different from the corresponding batches in the original run.
@Melted_Rabbit: Still not sure about what's causing those spikes. I may have to look into that later. How strange.
I don't know about anyone else, but that would excite me more.
Though I thought LASture was editing the wording...? Maybe the remaining errors are human error. But in that case the cards are otherwise perfect.
The focus of the thread thus far, over the past six-and-a-half months (wow, how time flies!), has been computational creativity, but it's also very interesting to discuss the other applications of the myriad algorithms that we've been tossing around.
There are many ways that you could go about developing a deck-building AI, be it purely procedural, a stochastic black-box model, or some mix of the two. I'd bet my money on a hybrid model. While I'm all for making use of deep learning systems, a symbolic approach has a lot to offer in spite of its drawbacks, and I'm not one to throw out the baby with the bathwater.
For example, you can automate goldfish testing by turning it into a state space search problem, and from that you can get out lower and upper bounds on the performance of a deck in a vacuum. If you reverse engineer decklists, you can come up with a model that tells you that Heritage Druid and Elvish Archdruid go well together, both because they show up frequently in decklists, and (as a generalization) lords go well with other creatures like themselves. However, formal modeling of the rules of the game allows us to make more precise comparisons between decklists, and to take into account curves, the extent of synergies, average number of turns needed to win under ideal conditions, et cetera.
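To make that concrete, here's a toy sketch of the search in Python (all the simplifying assumptions are mine: vanilla creatures only, one land drop per turn, a fixed draw order, and no opponent). It does a breadth-first search over the casting choices each turn and returns the earliest turn the deck can deal 20 damage:

from collections import deque
from itertools import combinations

LAND = ('land', 0, 0)

def goldfish_min_turns(deck, hand_size=7, life=20, max_turns=10):
    # deck: LAND or ('creature', cost, power) tuples, in draw order.
    # State: (turns played, lands in play, hand, board, damage, cards drawn).
    start = (0, 0, tuple(sorted(deck[:hand_size])), (), 0, hand_size)
    seen = {start}
    frontier = deque([start])
    while frontier:                           # BFS: states pop in turn order
        turn, lands, hand, board, dmg, drawn = frontier.popleft()
        if turn == max_turns:
            continue
        hand = list(hand)
        if drawn < len(deck):                 # draw step
            hand.append(deck[drawn])
            drawn += 1
        if LAND in hand:                      # land drop (always safe here)
            hand.remove(LAND)
            lands += 1
        dmg += sum(p for _, _, p in board)    # attack with the whole board
        if dmg >= life:
            return turn + 1
        creatures = [c for c in hand if c[0] == 'creature']
        for k in range(len(creatures) + 1):   # branch: every affordable cast set
            for cast in combinations(creatures, k):
                if sum(c[1] for c in cast) > lands:
                    continue
                rest = list(hand)
                for c in cast:
                    rest.remove(c)
                nxt = (turn + 1, lands, tuple(sorted(rest)),
                       tuple(sorted(board + cast)), dmg, drawn)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return None                               # can't win within max_turns

bear = ('creature', 2, 2)                     # a vanilla 2/2 for 2
print(goldfish_min_turns([LAND, bear] * 10))  # -> 6 under these assumptions

Even this toy version hints at the trouble: branching over every affordable set of creatures to cast is exponential in hand size.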
As an aside, for those of you who are knowledgeable about what I'm suggesting, yes, something something combinatorial explosion of the state space something something the dark heart of intractability. I know. It's the subject of my dissertation research, lol.
Anyway, all the same, I agree with you that you'd still need data from human players to develop a model of what cards contribute the most to victory against this or that deck; we need to model the metagame. However, the challenge is not just to do that (and it can be done), but also to do it well, and I foresee several interesting obstacles. First and foremost, whatever knowledge the system cultivates, that knowledge has to be highly transferable, insofar as the game is constantly evolving. You'd want to be able to anticipate potential metagames as well as recognize them. With that in mind, I think the greatest problem is not the main deck but the prediction and fine-tuning of the sideboard. It's a Keynesian beauty contest.
The mtgencode repository has code that allows you to take text produced by the network and turn it into a Magic Set Editor file. Is that what you're looking for?
First, if by my cards you mean the ones on Twitter, most of what you're seeing are the hilarious results of earlier iterations of the network.
If you mean the ones that I've posted here over time, I'd say that our results are more alike than they are different. Now, it may be the case that LASture's networks and mine were trained under different parameters, and this can affect the result, but keep in mind that the cards you're seeing were deliberately selected. Sometimes the cards have great flavor and poor wording, or great wording and poor flavor. The network can churn out thousands of cards very rapidly (more than we could ever hope to share here), so some filtering and selection is done when we present results. That, and the machine tries to imitate Magic cards, and most Magic cards are ho-hum limited fodder; a great deal of the results are technically perfect but not worth mentioning.
Recall that we have two parts to this system. The first is the generator, which tries to make fake artwork. The second is the discriminator, which tries to distinguish real artwork from fake artwork. They go back and forth and test each other, each getting better and better. Over time, the generator starts to make artwork that is reasonable.
Here's what I didn't think about: the generator doesn't actually get to see what it's supposed to imitate (otherwise it'd just make clones of the original artworks). Instead, it gets feedback on how convinced the discriminator was that its work was actual Magic art. I didn't realize how long that process would take, so I've been cutting the training short and getting garbage results. So I ran the script for a little longer this time on just the island data set, and after about 80000 iterations I started to get reasonable results. You can start to see everything coming together.
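For anyone who wants to see the shape of that feedback loop, here's a minimal sketch (PyTorch purely for brevity, not the Torch/Chainer scripts I'm actually running; the toy "artwork" is just samples from a 1-D Gaussian):

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(5000):
    real = torch.randn(64, 1) * 1.25 + 4.0    # stand-in "real artwork"
    fake = G(torch.randn(64, 8))              # the generator's forgeries

    # the discriminator learns to tell real from fake
    opt_d.zero_grad()
    loss_d = (bce(D(real), torch.ones(64, 1))
              + bce(D(fake.detach()), torch.zeros(64, 1)))
    loss_d.backward()
    opt_d.step()

    # the generator never sees `real` -- its only feedback is
    # how thoroughly its fakes fooled the discriminator
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

The key line is the generator update: its loss involves only the discriminator's verdict on the fakes, never the real samples, which is why the training is so indirect (and so slow).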
I didn't run it for as long as I should have; I had to cut things short. But I'll look into running the process even longer tonight and see how things go.
Meanwhile, I've also been looking into how neural style transfer can improve on these works. I've chosen an anime girl that I generated to work with. As you can see, the results are a little questionable when the small images are upscaled, but there are ways to work around that. The first is to clean up the image with a service like waifu2x, a deep neural network that does smart upsampling.
From there we can use actual artwork that matches the style we want and use neural style transfer to borrow the fine details. In the example I've attached, you can see how the style transfer network adds in texturing to the hair. I ran things with default parameters; the final result will look better with more fine-tuning. I'll have to tweak the process a little bit to get the kind of results we'll need for Magic, but we can figure that out.
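If you're curious what "borrowing the fine details" means mechanically, the heart of the style transfer method (Gatys et al.) is a loss that matches Gram matrices of feature activations. A toy sketch with random stand-in tensors (in the real thing these would be activations from a pretrained convnet like VGG):

import torch

def gram(feat):
    # feat: (channels, height, width) activation map
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)   # channel-correlation "style" summary

# random stand-ins for convnet activations of the three images
content_feat = torch.randn(64, 32, 32)
style_feat = torch.randn(64, 32, 32)
result_feat = torch.randn(64, 32, 32, requires_grad=True)

content_loss = ((result_feat - content_feat) ** 2).mean()
style_loss = ((gram(result_feat) - gram(style_feat)) ** 2).sum()
loss = content_loss + 1e3 * style_loss   # the style weight is the big knob
loss.backward()                          # in practice the gradient updates the image itself

Cranking that style weight up or down is most of the fine-tuning I was talking about.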
----
@Patofet: Sorry it took me so long to get back to you, but I'm still not quite sure what you're referring to. Do you want the original art for Magic cards? I have a dataset of low-resolution versions of those.
#0: Initially, the generator outputs purely random noise.
#1: Distinct regions of color appear, though the color choices are random.
#2: The color palette begins to resemble a forest setting.
#3: Patterns of leaf-like striations begin to emerge.
#4: Smooth patches of green for leaves and grass, occasional columns of brown appear. Trees?
#5: Indistinct forms can be seen moving through the mist.
#6: The forms are starting to look humanoid. Sometimes we see Picasso-esque faces peering at us from a distance. We're not alone.
Now, they do look like elves now and then; most of the time, though, they look like horrific abominations. Here are two examples that I retouched by darkening the bodies in Photoshop and upscaling so you can see the details better.
Many of their faces look like misshapen bronze that opens up into some kind of sharp, gaping mouth. Their bodies are less distinct, made of what looks like moss and wood. I swear, half the time they come out looking like something you'd see shambling down a dark hallway in Silent Hill.
Once again, this is to be expected because the generator isn't actually shown what elves are supposed to look like; it's reverse engineering the physiology of elves by studying the responses of the discriminator. Now, in theory, the discriminator should wise up to the deception, making the pseudo-elves subject to natural selection. In the tests I ran last night, however, everything seemed to stagnate around this point. I think it's because there weren't enough feature maps in the network (the networks I ran were very small). With the Torch version I can beef up the network at the push of a button, but everything is hardcoded in the Chainer version - I'll have to do some tinkering later.
In the meantime, I asked RoboRosewater for a black and green elf horror and got...
Glowbrack Muse
1BG
Creature - Elf Horror (Rare)
At the beginning of the end step, you may draw a card for each non-Wall creature put into your graveyard this turn.
2/2
Add a dash of generated flavor and we get the first card we've ever generated where the card, art, and flavor are all totally original. So that's progress! Hopefully soon I'll be able to generate something other than monstrosities.
EDIT: Oh, and for those of you who are interested to see the style transfers, there's one I did recently on reddit that might interest you. Someone said their dog looked like a Van Gogh painting, so I did a restyle of the dog as Van Gogh's Starry Night and the results came out looking amazing.
Edit: Also, that completely generated card is pretty cool, but how useful it is depends on how we interpret it. "This turn" could mean the turn the card was played, in which case playing it and then deliberately getting lots of creatures killed that turn would let you draw lots of cards every turn afterward; or it could mean whatever turn it currently is, in which case you would instead draw cards each turn equal to that turn's creature deaths.
Well, the result isn't quite what you're interested in, but it can be done.
Elf slivers are all green, Eldrazi slivers tend to be big and colorless, etc. The problem is that combining creature types usually results in body text relevant to one type or the other, but not both. So if I ask for a sliver spellshaper, I get results like
|mysterious sliver||creature||sliver spellshaper|O|&^/&^|{^GG}|sliver creatures you control get +&^/+&^.\whenever you cast a spirit or arcane spell, you may pay {^^^}. if you do, draw a card.|
|mysterious sliver||creature||sliver spellshaper|O|&^/&^|{^WW}|{WW}, T: target creature gains first strike until end of turn.|
|mysterious sliver||creature||sliver spellshaper|A|&^/&^|{^UU}|{UU}, T, discard a card: draw a card.|
Or if I want dragon slivers, I get...
|mysterious sliver||creature||dragon sliver|N|&^/&^|{^RR}|all sliver creatures have "{^^}: this creature gets +&^/+& until end of turn."|
|mysterious sliver||creature||dragon sliver|N|&^^/&^^|{^RR^^}|flying\{RR^}: @ gains flying until end of turn.|
|mysterious sliver||creature||dragon sliver|N|&^^/&^^|{^^RR}|flash\flying\when @ enters the battlefield, destroy target noncreature permanent.|
So some have body text like one subtype and some have body text like the other. This comes down to the fact that the network is sensitive to the order in which text appears in a card; when text from each subtype wants to occupy the same place, one usually wins out over the other.
EDIT: I'm getting closer. The elves are looking less ugly and more elf-like. Skin tones are coming in, and I'm even starting to see some hints of clothes and weapons. Of course, I'm on guard for overfitting as we get closer to the target. I'll let things run overnight and see how it turns out. If it works, then we can see about expanding the input to incorporate more diverse art.
EDIT(2): Not quite what I wanted. The discriminator stopped getting smarter, so the generator started overfitting. It changed gears and ended up producing clones of a single elf figure, standing dead center of the frame and looking straight at the viewer, set against different backgrounds. I'll get it figured out though.
I will say this, though: in the long run, an adversarial approach may not be the best solution (there are competing alternatives), or at the very least it might be only a piece of the solution. Training this way is very inefficient; it takes many hundreds of thousands of iterations to start seeing good results, because the generator can only learn about what it's supposed to produce indirectly. On the other hand, the results can be very robust. What you get is a mapping from vectors to images, and you can do arithmetic on the vectors to get different images. For example, v(man with glasses) - v(man) + v(woman) ≈ v(woman with glasses). That, and the images are unique, not just clones of the training set.
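Here's that arithmetic in code form, with placeholder pieces (the Linear "generator" and random latents stand in for a trained network and hand-labeled samples; averaging several latents per concept, as in the DCGAN paper, is what makes the trick work in practice):

import torch
import torch.nn as nn

# placeholder generator: a trained GAN's generator would go here
G = nn.Sequential(nn.Linear(100, 3 * 64 * 64), nn.Tanh())

# stand-ins for latent vectors whose outputs were labeled by hand;
# averaging a handful per concept smooths out the arithmetic
z_man_glasses = torch.randn(5, 100).mean(0)
z_man = torch.randn(5, 100).mean(0)
z_woman = torch.randn(5, 100).mean(0)

z = z_man_glasses - z_man + z_woman
image = G(z).reshape(3, 64, 64)   # ~ v(woman with glasses), with a real G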
By the way, I did see someone put out something fun: they designed an image generation process using TensorFlow that produces vector graphics rather than raster graphics. That is, it thinks in terms of lines and curves rather than pixels, which makes it easier to draw complex sketches because the output is scale-independent.