EDIT(2): Rerunning things. But I have a suspicion that something is wrong. I think it may have something to do with the fact that I try doing some weight sharing and those weights get unshared when I do a cast. But we'll see.
Which weights are you sharing?
Well, that's the thing: I realized that was a mistake on my part. I was sharing the weights between the gates and their respective peepholes. I've changed that. Things were running okay, but then I got a catastrophic, unspecified error. At first I was concerned, but then I realized we had taken the machine offline temporarily, which threw a wrench into the training. So I don't have any results to share as of yet, and it's possible that I still have a mistake in there somewhere, but I haven't had the chance to figure anything out.
I'd start things running again, but I suspect I may have made a mistake somewhere else, and I have to head into the office tomorrow to do some experiments and some writing; I wouldn't want to have to terminate things prematurely. I'm getting 92% accuracy on a particular problem-solving task using run-of-the-mill LSTM networks, but I think a good NTM implementation could do even better. Lots of fun things to experiment with.
Intuitively speaking, it doesn't make sense to go above 90 neurons, because the alphabet I'm using simply doesn't have a high enough information density to justify it. The trick to finding low-error networks appears to be really low feedback (as I've said before) and a few thousand generations. Don't quote me on this, but I suspect the sweet spot for feedback is for the first-order factor to be within an order of magnitude of 0.5 divided by the number of input neurons, and the second-order factor to be any number significantly smaller than the first-order factor.
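For concreteness, that speculative heuristic can be written down. This is purely illustrative: the function name and the 0.01 second-order ratio are my own placeholders, not values from the experiments.

```python
def suggest_feedback(n_inputs, second_order_ratio=0.01):
    """Sketch of the hunch above: first-order feedback within an order of
    magnitude of 0.5 / n_inputs, second-order well below the first-order."""
    first_order = 0.5 / n_inputs
    second_order = first_order * second_order_ratio  # "significantly smaller"
    return first_order, second_order

f1, f2 = suggest_feedback(90)  # e.g. 90 neurons, as above
```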
ETA: "I can't see how to optimize the inner loop any more," I said. "Installing NumPyPy will be quick and painless," I said. "Let's see what kind of performance I get," I said. Ugh. 6x slowdown. It must not be running long enough for PyPy to warm up. Either that, or NumPyPy isn't a good fit for this code. But it was a nice orthogonal experiment.
Really? Why? A 6x slowdown sounds very odd to me.
No fair, now that the internet's back I see lots of things have happened here lol
So it turns out we were using rather bad values for learning_rate: it has to be higher initially for faster initial convergence, but it also has to come down fast enough, because too high a learning rate makes for bad learning; at the end of training, using much lower values than karpathy's defaults gains us something like 0.01-2 loss. The only problem is, I have no idea what the ideal learning strategy is; but getting to ADAM would be the fastest route anyway? Is that correct, Tiir?
It's not that ADAM absolutely guarantees fast convergence, but it has given some sexy results for a variety of standard benchmarks. For example, take a look at the graph I've attached. See the green? That's what we use normally. See the purple? That's ADAM. I'm still studying the paper, looking into the details, but it does look promising, and an implementation already exists in Torch, so that's convenient.
I only have a very dim understanding of the Adam paper and torch implementation of it. Some notes:
- it needs to be passed a fourth parameter, an initially empty table "state" that it uses to store things that it will update at each subsequent call (Talcos, I hope you noticed this since your last message about that)
Yes! The RMSprop algorithm we use does the same thing, and our existing code already takes care of all that for us, no need for any substantial modifications. In train.lua, you take the line
local _, loss = optim.rmsprop(feval, params, optim_state)
and replace it with the line
local _, loss = optim.adam(feval, params, optim_state)
That's all. Now, for RMSprop, the only thing we were putting in the state was the learning rate. The ADAM implementation has extra parameters that constitute the state, and if you don't put any values there at the start, it'll pick default values. We could modify the code to tinker with those values, of course.
- it blends the current gradient with running averages of previous gradients / squares of gradients to let the current minibatch's contribution be greater or smaller. It looks like it does not really address the issue of having to lower the learning rate (the alpha parameter in the paper) as learning progresses.
You're correct. The decision to adjust the learning rate is above its pay grade. Ultimately, from everything that Tiir and I have seen, you end up converging to roughly the same point regardless of whether you use ADAM or RMSprop; the difference is that ADAM often gets you there much faster.
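For intuition, here's a minimal sketch of the update rule from the ADAM paper in plain NumPy, following the paper's notation. This is an illustration of the algorithm, not the Torch optim.adam code.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update; m and v are running moment estimates, t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction (moments start at zero)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

The per-parameter division by sqrt(v_hat) is what makes it RMSprop-like; the m term adds momentum on top, which is where the faster early convergence comes from.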
And I'll say this again: you're all using nets that are too big. Smaller nets with 1 hour of total training time should be enough to get initial assessments. Which may be inconclusive, I'll grant that.
I agree, good point. Smaller nets would give us faster turnaround times for the sake of making initial assessments.
---
Here are a few choice results from the network. I used a version of the network that generated names, though I gave it names of places and the like, and I fed it new ability words and keywords by priming. I rephrased keywords as ability words so I could feed the reminder text to the network to act on.
Kazandu Crier G
Creature - Human Warrior Ally (Common) Rally - Whenever Kazandu Crier or another Ally enters the battlefield under your control, you may gain 1 life.
1/1
Hada Heart-Wing 1BB
Creature - Human Wizard Ally (Rare) Rally - Whenever Hada Heart-Wing or another Ally enters the battlefield under your control, target player loses 2 life and puts a 2/2 black Zombie creature token onto the battlefield.
2/2
#I'm not sure how I feel about this card, but I'll share it with you anyway because it's interesting.
Guardian of Zendikar 3R
Creature - Human Soldier Ally (Rare) Rally - Whenever Guardian of Zendikar or another Ally enters the battlefield under your control, you may have target opponent reveal the top card of your library. Guardian of Zendikar deals damage equal to that card's converted mana cost to that creature or player.
2/1
#What the card does is perfectly clear, but the phrasing is very unusual in that it has your opponent reveal the top card of your library. It also makes a small mistake and says "that creature or player" when the only option is a player.
Mul Daya Regent 3G
Creature - Human Druid Ally (Rare) Rally - Whenever Mul Daya Regent or another Ally enters the battlefield under your control, you may flip a coin. If you win the flip, assemble two cards. 1: Return Mul Daya Regent to its owner's hand.
0/2
#I kid you not, assemble!
Tormented Mage 2BB
Creature - Eldrazi Drone (Rare)
Devoid (This card has no color.)
Ingest (Whenever this creature deals combat damage to a player, that player exiles the top card of his or her library.)
When Tormented Mage dies, you may choose a card exiled with it. If you do, put that card into your hand.
3/3
#I gave it the "you may choose a card exiled with it." to play with, and this is one of the results I got. It's unusual in terms of what it does, but it makes sense. Other versions of this card let you cast that spell until the end of turn, or to cast it without paying its mana cost.
Aether Tormentor 3UU
Creature - Eldrazi Drone (Uncommon)
Devoid (This card has no color.) 1: Aether Tormentor becomes the color of your choice until end of turn.
4/4
#Colorless? Well, we can't let that happen, now can we?
You know how I said that while image generation for our cards wasn't quite where it needed to be yet, given the exponential pace of current research we'd have a good solution if we just waited? Well, an interesting paper came out three days ago that intrigued me, and I just had to share it: http://arxiv.org/pdf/1508.06576v1.pdf
The idea is that you give the network an image I and another image J and you want the network to produce an image K, where K is I done in the style of J. See the attached image for examples.
The network breaks down the original image into its component parts, like houses, and then it looks at the style guide image and asks itself "Well, how should I draw a house in this style?" It then transforms the image to match the style. It apparently works at arbitrary resolutions, so you get quality images out of the process.
At the very least, this means that you can take any image and make it look like Magic art by borrowing the style of existing cards. For example, you can take any landscape photograph and ask the network to recompose it in the style of John Avon. It also lets us answer questions like "What if Jace, the Mind Sculptor was drawn by Rebecca Guay instead of Jason Chan?"
But that's not what interests me most about this. Think of it this way: the network takes an image, breaks it down into a content representation, and then regenerates it as a new image. Take out the original-image part, and that leaves us with a mapping from content representation to image. Find a mapping from, say, a text description to a content representation (which from what I've read and seen is very doable), and then you're in business. Effectively you'd be taking text and getting a rough sketch whose details are then filled in with colors and textures from the style image. I'm not saying that I'll have the time and energy to investigate that, but it's well within the realm of possibility, and I'm sure more steps will be taken in the coming months that will make that sort of thing possible (all the pieces already exist; they just need to be assembled).
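For anyone curious how the paper pulls this off: it optimizes the output image against two losses, matching deep convnet activations for content and matching Gram matrices of activations for style. Here's a toy sketch; the feature maps are random stand-ins (real ones come from a pretrained convnet like VGG), and the 1000.0 style weight is just a placeholder knob.

```python
import numpy as np

def gram(features):
    """Gram matrix of a (channels, positions) feature map: channel co-occurrence
    statistics, which summarize 'style' independently of spatial layout."""
    return features @ features.T / features.shape[1]

# Random stand-ins for convnet activations of the generated, content, and style images
gen_feats = np.random.randn(64, 100)
content_feats = np.random.randn(64, 100)
style_feats = np.random.randn(64, 100)

content_loss = np.mean((gen_feats - content_feats) ** 2)        # keep the houses
style_loss = np.mean((gram(gen_feats) - gram(style_feats)) ** 2)  # paint them like J
total_loss = content_loss + 1000.0 * style_loss
```

The generated image itself is what gets gradient-descended on to lower total_loss, which is why the process is slow but resolution-flexible.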
I haven't seen any source code come out for that paper as of yet but the moment that it's available I'm going to have so much fun with it.
EDIT: But what to do first? Maybe Vermeer's "Girl with a Pearl Earring" merged with Goblin Piledriver... Hm.. I have a number of things I want to try. lol.
EDIT(2): I just noticed something about the image I attached. Look at the Starry Night reinterpretation. The network adds lights coming out of the windows even though the original photograph was shot during the day. Why? Because it's a night-time scene. Obviously there would be lights coming out of the windows. How clever.
EDIT(3): Oh, wait, yes, I did find some source code. No trained models though. Not yet anyway. I'll keep looking later.
Also, I promise I'll look into getting that peephole implementation working when I get the chance, so I can see where it leads us with regards to card generation.
Shard Orrent 3BB
Sorcery
As an additional cost to cast Shard Orrent, discard X cards, then draw X cards.
# So now it's doing programming-language tricks: it pretends to do nothing while the actual effect is hidden in the cost of the spell. Clever!
It's an elaborate way of making it uncounterable. You can counter it, but it's already done everything it's going to do by that point!
That's pretty cool. If only we knew what X was, though...
I'm now trying to understand how the net is wired. One LSTM is easy to understand from Wikipedia, for instance, but I can't make heads or tails of the code. Where do the outputs go, where does the input character go, where's the output character vector read from? The Oxford pdf mentioned on karpathy's page doesn't help much. I'll keep looking, but if someone's got pointers I'd be grateful.
I've attached an image of an explanation that I attempted awhile back (along with a simple diagram). To my knowledge, it's accurate. If it isn't, or something is unclear, let me know.
EDIT: The version I describe differs from the version you're seeing below because I mention things like biases which aren't shown in that code. Sorry about that.
I've realized I was stupid to expect verbatim memory from a network that has so few neurons. Adding edges is not the point. Is the following correct?
- each gate of each layer receives as input a weighted sum of all outputs of the previous layer at this step and of all outputs of the same layer at the previous step. That's why even a few neurons make for lots of parameters
- "layer 0" just makes a one-hot encoding of the input character and feeds it to all neurons of the next layer
- the output vector is a logsoftmax over all outputs of the last layer
That's almost correct, yes. Take a look at this code:
-- evaluate the input sums at once for efficiency
local i2h = nn.Linear(input_size_L, 4 * rnn_size)(x)
local h2h = nn.Linear(rnn_size, 4 * rnn_size)(prev_h)
local all_input_sums = nn.CAddTable()({i2h, h2h})
local reshaped = nn.Reshape(4, rnn_size)(all_input_sums)
local n1, n2, n3, n4 = nn.SplitTable(2)(reshaped):split(4)
-- decode the gates
local in_gate = nn.Sigmoid()(n1)
local forget_gate = nn.Sigmoid()(n2)
local out_gate = nn.Sigmoid()(n3)
-- decode the write inputs
local in_transform = nn.Tanh()(n4)
I know it's a little bit confusing because, for the sake of efficiency, the inputs to all the gates are computed at once and then split up, but bear with me. So let's say this is a hidden layer cell, and 200 inputs come in from connections from the previous layer. They each get weighted by the linear layer and fed to the in_gate, which applies the sigmoid function element-wise to the 200 weighted inputs. So the difference between what you originally understood and how it actually works is that we're not crushing all the inputs to make a single number. The internal state of the LSTM block, in this case, is a 200-wide vector. Assuming we're working with 64-bit numbers, that would mean that each individual cell/block of the network can hold up to 12800 bits of information.
EDIT: Actually, I made a mistake here in my explanation. I meant that each layer can hold that much information, not each individual block.
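To make the wiring concrete, here's the same single-timestep computation as the Lua snippet above, written out in NumPy with random placeholder weights. Sizes follow the running example, and (as noted) biases are again left out.

```python
import numpy as np

rnn_size, input_size = 200, 65           # example widths, not real hyperparameters
x = np.random.randn(input_size)           # input from the layer below at this step
prev_h = np.random.randn(rnn_size)        # this layer's output at t-1
prev_c = np.random.randn(rnn_size)        # this layer's internal cell state at t-1

# i2h / h2h: one big linear map each, producing all four gate pre-activations at once
W_i2h = np.random.randn(4 * rnn_size, input_size)
W_h2h = np.random.randn(4 * rnn_size, rnn_size)
all_input_sums = W_i2h @ x + W_h2h @ prev_h        # nn.CAddTable
n1, n2, n3, n4 = np.split(all_input_sums, 4)       # nn.Reshape + nn.SplitTable

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
in_gate, forget_gate, out_gate = sigmoid(n1), sigmoid(n2), sigmoid(n3)
in_transform = np.tanh(n4)

# The state and output are full rnn_size-wide vectors, not single numbers
next_c = forget_gate * prev_c + in_gate * in_transform
next_h = out_gate * np.tanh(next_c)
```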
---
EDIT: Ooh, people are already reproducing that style transfer paper I talked about. Take a look:
Spinning up my own net again for funsies, to have the latest available from mtg-rnn and MTGJSON. Running only at size 128, and taking a quick pause halfway through training, I'm already getting working Xes and such:
Sudden Canophation (uncommon) 4U
Enchantment
Whenever a player casts a white spell, target creature gets -X/-X until end of turn, where X is the number of cards in your hand, then draw that many cards.
Thanks Talcos. Yes, I was thinking that, for each character, each gate of every LSTM took a linear weighted sum of all the numbers coming from the inputs corresponding to that very same character, resulting in just 1 number which was then "normalized" with a sigmoid function. But looking at "slices" for each character, we'd have an independent net for each character, with no character cross-over whatsoever, which wouldn't work. Actually, each LSTM doesn't output a vector but just one number, right? That's the unclear part. In your eq. (5), C_t is unused, which confuses me.
To my knowledge, as per our example, it should output a vector of 200 different numbers, providing one for each of the presumably 200 cells occupying the next hidden layer. And... wow, you're right. I made a typo, my bad. Sorry, I wrote that down in a sketch for my dissertation a while back and never went back to check it. There should be an extra V_0 * C_t added onto equation 5. I must have overlooked it.
Let's recap. I'll call V the vocabulary size and W the rnn_size (it's a width actually). If I have a net that does the following things, I can compute the number of parameters.
Layer 0: 1 character -> 1 vector v_0 of size V, all 0s except for one 1 at the index of the character.
Layer 1: W LSTMs will each store and output 1 number, which makes a vector of size W for the whole layer. There's 4W gates, entries are V values of layer 0 and W values of layer 1 at t-1. Each entry is weighted for each gate thus I have 4W*(V+W) parameters.
Layer i>1: Same thing except there were W values on the previous layers, which makes 4W*2W parameters.
Output: the last vector of size W is transformed into a vector of size V (that will be interpreted as holding, for each char index, some kind of intensity for a force that would try to output that character). VW parameters.
Total for 3 layers: 4W*(V+W) + 2*4W*2W + VW = W * (5V + 20W)
I have V = 45, W = 170, this makes 616250, while karpathy's code says 620375 so I'm missing 4125, which, er, does not ring any bell. But if I want each LSTM to store some vector I'll end up with far too many parameters?
Hrmm... does the final layer add anything to that number? In any case, I can tinker with the script to get a precise breakdown for you if you'd like.
Interesting. Maybe it's a parameterization issue, not sure. I'll definitely look into it.
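For what it's worth, a quick check suggests the missing 4125 is exactly the bias terms, which the hand count leaves out: nn.Linear includes a bias by default, so each layer carries 4W biases for i2h plus 4W for h2h, and the output decoder carries V more.

```python
V, W, layers = 45, 170, 3

# jml34's weight-only count: layer 1, then layers 2..3, then the decoder
weights = 4 * W * (V + W) + (layers - 1) * 4 * W * (2 * W) + V * W

# Biases omitted above: 4W each for i2h and h2h per layer, plus V for the decoder
biases = layers * 2 * 4 * W + V

print(weights, biases, weights + biases)  # 616250 4125 620375
```

620375 matches karpathy's reported total, so the formula was right modulo biases.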
EDIT: Finally. I think I got the peephole strategy working correctly. I found a tiny mistake and after fixing it I'm seeing convergence. But we'll have to wait and see how low I can get the error rates. I'm using ADAM right now and no biases.
Quick question: when running the sample script, I get a warning about no seed text even if I supply a "-seed" parameter. Is that normal? (Totally unrelated?)
EDIT: Interesting thing while perusing a dump from the same net as Sudden Canophation above. Does the final clause there work?
Thraben Mist (uncommon) 1
Artifact 2,T: Put a charge counter on Thraben Mist. T, remove a charge counter from Thraben Mist: put a 2/2 green Wolf creature token onto the battlefield tapped and attacking or blocking.
EDIT 2: Bwahaha. I have a flying Goblin Warrior named "Pony Battlemage".
This is normal, it's complaining that you haven't given it a -primetext option, so it doesn't know what sequence of characters to start the dump with. In this case it picks a character at random.
One of the things on my todo list when I start developing again is to get rid of that option and just have it always prime with two newlines and a field separator, so it usually starts on a card boundary rather than in the middle of something, but that's a very small optimization. You could try doing this manually, but it's surprisingly hard to pass real newlines in a command line argument.
@Elseleth
$ NL=$'\n'
$ th sample.lua cp.p2.ep20.t7 -seed 2345 -primetext "${NL}${NL}r:" -temperature 0.7 -length 400
This will work if you change "r:" to whatever the hardcoded initial format is after the two newlines. "-seed" changes the random seed, which you have to change manually each time you want a new dump, unless you always want the same one.
Similar cards to Thraben Mist have mechanics that let you add the token (or a card taken from somewhere) only at a time when it makes sense. There's no safeguard here, so whether it'll "work" or not depends on how you want to use it. If the criterion is "fool a panel of humans into thinking this card was designed by a human", it won't. If it's "when you give this card to a panel of MtG experts, and each of them independently corrects the text to make it look like it was designed by WotC, all the cards produced behave exactly the same way whatever the situation in an otherwise normal MtG game", I'm guessing it works?
Maybe we need names for these kinds of things: it's not Turing, but it's expert-clear? While some of mine were interesting designs but expert-ambiguous?
We're still in sub-Turing territory, I think. We can definitely get there when it comes to mechanics, and soon. Flavor of course is another matter entirely since it's informed by all kinds of tropes and cultural understandings. But even that barrier can be overcome. I suppose it also matters whether the machine gets to choose what it presents, to play to its strengths, or whether this is an iterative game where the interrogator gets to describe the kind of card they want in some way.
---
So I halted the experiment I was doing with the peepholes. I did not get the results I wanted (this time).
What happened was that the training loss and validation loss dropped down into the very low 0.30s. I was hopeful that we'd break through that barrier. But by epoch 15 (5 epochs into the decay of the learning rate), it started skyrocketing back up again. I don't have a graph to show you at the moment, but if I did, it would be shaped like a V. Absolutely bizarre. It's also interesting to me that we overshot the goal after we began to decay the learning rate.
Jml34, you noted that you got subpar results with ADAM with default parameters? I can try again with RMSprop to get a better sense for how the peepholes should behave with all else being kept the same. I think with some fine-tuning we can get better results like Tiir described, I just need to figure out what's causing the problem.
For reference, here's sample output from the checkpoint at epoch 13:
|||instant|||O||{^^^^UU}|uncast target spell. its controller loses life equal to the number of creatures you control.|
|||creature||human shaman|N|&^^/&^^^|{^GGGG}|{^}, T: tap target creature.\when @ leaves the battlefield, you may put a &^/&^ green elemental creature token onto the battlefield.|
and here's output from the checkpoint at epoch 15:
||{W{^^^^^^BW^^^}|^^G^^^^^^^^^}|cnes in chote th beganhireoture tn cantoton thi catt toreat card tore tarard. rett creature ent prmr toreaturee entur l wheand cnd onds pet of thant cour socn then thentour tau geaf the baats y ant on iwe tpor tore t||
||c t \wate ca treaan thee themant u oltanare tire. s ses &^ehenn wat pat ant then a d tf nte battlafmou tn them chant.|
I believe there's no issue in the rules with a card being put into play in a state such as attacking. I think it is supposed to limit when you can use such an effect to only the attack phase, though. Ninjutsu works in a similar way, as do a handful of other cards. A good comparison with a token would be Alesha, Who Smiles at Death.
For the record, Pcaoren (who is amazing) just located an implementation of that style transfer paper I talked about. I'll definitely be taking a look at it ASAP to see if it can meet our needs.
If I get any good results, I'll be sure to share them.
EDIT: The code definitely works, but I'm running into some memory issues. I can get a few iterations out of it, and the Swiss National Park starts to turn into a work by John Avon, but then I get an out-of-memory error. This code isn't as optimized as it could be, I bet. But that's okay! Even if I can't optimize it for my current machine, I just reacquired my desktop from back home, an Alienware machine built for gaming with two Nvidia GeForce cards. It has a few years on it, but it's more than equipped to do image processing work...
Ah, but wait, adding in a call to the garbage collector may have fixed it. We may be back on track. I'll let you know.
Thraben Mist (uncommon) 1
Artifact 2,T: Put a charge counter on Thraben Mist. T, remove a charge counter from Thraben Mist: put a 2/2 green Wolf creature token onto the battlefield tapped and attacking or blocking.
Well, it should still say that it's only usable during the attack phase. And it should only let you generate a token that's attacking during your combat phase, or blocking during an opponent's combat phase. And it should probably only let you generate a blocker if there's something that the token will be able to block. Or at least it should generate the token, then tell you to assign it to block something if possible (notwithstanding the token being tapped!).
Or maybe it doesn't actually have to be blocking anything? If a blocked creature gets destroyed, are the blockers still considered “blocking”?
EDIT: Actually, there's no issue at all about entering the battlefield blocking. Should've just checked the Comprehensive Rules to begin with.
509.7 sez you do indeed get to choose what creature's getting blocked if nothing chooses for you. And 506.3d sez it's never “blocking” if it doesn't manage to block anything. Oh, and 506.3b sez you can use Thraben Mist anytime and the thing will never actually be “attacking” if not applicable. So actually the card's meaning isn't ambiguous at all. They don't call them Comprehensive for nothing ...
So there might be some limitations with this implementation. The underlying system being used is Google's Deep Dream network, which won't deliver quite the same results as the one the authors used, but I understand that Sammim wanted to share this ASAP, so that's fine for now. It also runs into memory issues if I try to up the resolution beyond the hard-coded limits, but that shouldn't be a problem for me on my new machine (which I'll be setting up tonight). I expect implementations to come out that are more memory efficient and that emulate the authors' implementation more closely (they will probably release it themselves eventually, given all the attention they're getting).
In the meantime, I can try to get Sammim's implementation to produce higher quality results (I suspect I just need to turn all the parameters up on a better machine, which I now have). I could see us integrating this into our card creation process, at the very end.
In the meantime, I've attached a low-res, low-quality version of Jace Beleren as drawn by Rebecca Guay.
More and better stuff to come in the future.
EDIT: FYI, I used Elvish Piper as a reference. Notice that while it stole some of the background details from the art, it did not change the central focus of the image except to restyle it. The Deep Dream approach, by comparison, would have given us Jace Beleren with a flute sticking out of his head.
EDIT(2): But it's not really borrowing from the style image either. For instance, it just recolored the soil and the background trees are actually arranged differently in the style image.
EDIT(3): Landscape photography seems to do well. And I suspect that, with some tinkering on my part, I can get this thing to turn photographs of cosplayers into Magic art.
EDIT(4): Aha! I see what it did. It turned the clouds in the original Jace image into trees.
EDIT(5): Yes, I definitely can get this thing to run a lot better. Sammim intentionally downgrades everything so he can run his code on his Macbook. I can fix that when I get home. In the meantime, I've attached a Starry Night Jace. This one is problematic because his spell starts to eat his whole arm and mesh with the night sky. I think at a higher resolution we won't see this issue, though.
Tweaking the code to remove matrices and casts, um, I didn't time it, but it increased the disparity between NumPy and NumPyPy substantially. This is bizarre. I wonder what happens if I turn everything into function calls.
Not really any difference.
Um. NumPyPy. Would not recommend. At least, not for any little stuff like this.
I've grabbed the latest JSON dump, and I'm thinking about how to process it to turn it into input. Among other things, using the dump would get me a better-quality data set for my race encoding stuff.
However, my first pass at figuring this stuff out was to pop open the dump in my text editor. It reacted poorly to a file that's just several megs of text on a single line.
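If it helps, the dump is just one big JSON object keyed by card name, so a few lines of Python can flatten it into one-card-per-line input without ever opening it in an editor. A toy sketch: the pipe-separated layout here is made up, though field names like manaCost, type, and text are real MTGJSON fields.

```python
import json

# Tiny inline stand-in for the MTGJSON dump; for the real thing you'd
# json.load() the multi-megabyte file instead of parsing this string.
raw = '{"Llanowar Elves": {"type": "Creature - Elf Druid", "manaCost": "{G}", "text": "T: Add {G}."}}'
cards = json.loads(raw)

lines = []
for name, card in cards.items():
    # Flatten each card into one pipe-separated training line (made-up layout)
    lines.append("|".join([name, card.get("manaCost", ""), card.get("type", ""), card.get("text", "")]))

print(lines[0])
```

Race/subtype info would come from the "subtypes" field in the same way, which should make the race-encoding data set cleaner than scraping.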
Oh my sainted aunt, the Starry Night Jace is amazing. I'd want a 1080p version for a desktop background.
I would really like to try this on my machine but I doubt my paltry 1gb video card, CUDA or no CUDA, will be able to produce anything of decent resolution; would I be correct?
edit: Is there any way to get rid of the weird 'rainbow bump' effect? It's almost like crinkled cellophane. It also looks too 'griddy', if you know what I mean (you can see an underlying grid pattern).
Oh my sainted aunt, the Starry Night Jace is amazing. I'd want a 1080p version for a desktop background.
I would really like to try this on my machine but I doubt my paltry 1gb video card, CUDA or no CUDA, will be able to produce anything of decent resolution; would I be correct?
Sammim said that "On a 2014 MacBook Pro with an NVIDIA GeForce GT 750M, it takes a little over 4 minutes to perform 500 iterations of gradient descent." That's the hardware he's using. Mine's slightly more powerful and I can get results in about 3 minutes. I'm not sure how your machine would compare to his. You can scale it down if needed, but that also means you'll get lower resolution results.
And yeah, I totally plan on making a Starry Night Jace at wallpaper resolution and making that my background, just as soon as I can get my desktop configured properly to run Torch. I'm very excited, haha. I'll be sure to give y'all a copy.
And if anyone can think of any other mashups they'd want me to do, I can queue up a bunch of them with a script. Once I have the better hardware ready to go, that is.
edit: Is there any way to get rid of the weird 'rainbow bump' effect? It's almost like crinkled cellophane. It also looks too 'griddy', if you know what I mean (you can see an underlying grid pattern).
Oh, I know. That's because of the low parameters I'm currently using. At higher resolutions, things should be more detailed and less noisy, and you shouldn't have that grid effect.
Also, as a side note, we're limited at this point by the quality of our image recognition. I think that if the network doesn't understand what it's looking at in the style/source image, then it can misapply textures. For example, take a look at Jace + The Great Wave off Kanagawa that I've attached. The better our feature recognition capabilities, the better we can apply elements from the style image to the content image. As is, I think the network does best if you use photographs for content, as that's what the convnet is trained on. But it does pretty well overall when using works of art for content images, like Jace.
EDIT: BTW, I think I can tweak some parameters to make it so that Jace doesn't get washed away by the waves. Just have to compensate for the mistake.
Seeing a couple cards with Devoid made me go back and check for BOZ spoilers (I hadn't done that in a while).
GODDAMNIT, THEY STOLE MY MECHANIC AGAIN! (Though to be fair, I eventually just un-keyworded Stormtouched and made it a supertype, like Arcane, and will probably use Devoid for cards with colored casting costs, but still!)
Also, me gusta the promise of text-to-image generation in the near future. Godspeed to you, pioneers!
EDIT: THEY STOLE CONVERGE, TOO! FROM THE SAME SET! THOSE BASTARDS!
Baked a net for 15 hours yesterday into today, and I'm really liking what it's given me. At temperature 0.7 or so, most cards need only a couple of grammar tweaks or excess text removed, and they're interesting and playable.
I got a couple of mysterious keywords in the mix, and would like to query the net as to what they mean. However, when I put anything into the "primetext" option of sample.lua, I get this error (command included for context):
sabrecat@sabrecat-VirtualBox:~/mtg-rnn$ th sample.lua cv/lm_lstm_epoch50.00_0.247.t7 -gpuid -1 -temperature 0.71 -length 100 -primetext "Herlizot~" | tee herlizot.txt
creating an lstm...
seeding with Herlizot~
--------------------------
/home/sabrecat/torch/install/bin/luajit: bad argument #1 to '?' (empty tensor at /home/sabrecat/torch/pkg/torch/generic/Tensor.c:851)
stack traceback:
[C]: at 0x7fab543d4470
[C]: in function '__index'
sample.lua:119: in main chunk
[C]: in function 'dofile'
...ecat/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670
If I remove the -primetext option, the sample operation proceeds without issue.
I'm probably guilty of something for helping you dump raw code blocks onto an MTG forum, but...
Try putting "diff -c" output into a [ code ] tag?
Well, that's the thing: I realized that was a mistake on my part; I was sharing the weights from the gates and their respective peepholes. I changed that. Things were running okay, but then I got a catastrophic, unspecified error. At first I was concerned, but then I realized we had taken the machine offline temporarily, which threw a wrench into the training. So I don't have any results to share as of yet. It's possible that I still have a mistake in there somewhere, but I haven't had the chance to figure anything out.
I'd start things running again, but I suspect I may have made a mistake somewhere else, and I have to head into the office tomorrow to do some experiments and some writing; I wouldn't want to have to terminate things prematurely. I'm getting 92% accuracy on a particular problem-solving task using run-of-the-mill LSTM networks, but I think a good NTM implementation could do even better. Lots of fun things to experiment with.
Makes sense.
Really? Why? A 6x slowdown sounds very odd to me.
It's not that ADAM absolutely guarantees fast convergence, but it has given some sexy results for a variety of standard benchmarks. For example, take a look at the graph I've attached. See the green? That's what we use normally. See the purple? That's ADAM. I'm still studying the paper, looking into the details, but it does look promising, and an implementation already exists in Torch, so that's convenient.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
Ah, right. Good point.
Yes! The RMSprop algorithm we use does the same thing, and our existing code already takes care of all that for us; no need for any substantial modifications. In train.lua, you take the line
local _, loss = optim.rmsprop(feval, params, optim_state)
and replace it with the line
local _, loss = optim.adam(feval, params, optim_state)
That's all. Now, for RMSprop, the only thing we were putting in the state was the learning rate. The ADAM implementation has extra parameters that constitute the state, and if you don't put any values there at the start, it'll pick default values. We could modify the code to tinker with those values, of course.
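A minimal sketch of what that tinkering might look like (the field names follow the Torch optim package's ADAM implementation, and the values shown are just its usual defaults, so treat both as things to verify against optim's source):

```lua
-- sketch: explicitly filling in ADAM's state before training starts,
-- instead of letting optim.adam fall back to its defaults
local optim_state = {
    learningRate = 2e-3,   -- same field RMSprop was already using
    beta1 = 0.9,           -- decay rate for the first-moment estimate
    beta2 = 0.999,         -- decay rate for the second-moment estimate
    epsilon = 1e-8,        -- small constant for numerical stability
}
local _, loss = optim.adam(feval, params, optim_state)
```

Since learningRate is the same field the existing code already sets, the rest of train.lua shouldn't need to change.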
You're correct. The decision to adjust the learning rate is above its pay grade. Ultimately, from everything that Tiir and I have seen, you end up converging to roughly the same point regardless of whether you use ADAM or RMSprop; the difference is that ADAM often gets you there much faster.
I agree, good point. Smaller nets would give us faster turnaround times for the sake of making initial assessments.
---
Here are a few choice results from the network. I used a version of the network that generated names, though I gave it names of places and the like, and I fed it new ability words and keywords by priming. I rephrased keywords as ability words so I could feed the reminder text to the network to act on.
Kazandu Crier
G
Creature - Human Warrior Ally (Common)
Rally - Whenever Kazandu Crier or another Ally enters the battlefield under your control, you may gain 1 life.
1/1
Hada Heart-Wing
1BB
Creature - Human Wizard Ally (Rare)
Rally - Whenever Hada Heart-Wing or another Ally enters the battlefield under your control, target player loses 2 life and puts a 2/2 black Zombie creature token onto the battlefield.
2/2
#I'm not sure how I feel about this card, but I'll share it with you anyway because it's interesting.
Guardian of Zendikar
3R
Creature - Human Soldier Ally (Rare)
Rally - Whenever Guardian of Zendikar or another Ally enters the battlefield under your control, you may have target opponent reveal the top card of your library. Guardian of Zendikar deals damage equal to that card's converted mana cost to that creature or player.
2/1
#What the card does is perfectly clear, but the phrasing is very unusual in that it has your opponent reveal the top card of your library. It also makes a small mistake and says "that creature or player" when the only option is a player.
Mul Daya Regent
3G
Creature - Human Druid Ally (Rare)
Rally - Whenever Mul Daya Regent or another Ally enters the battlefield under your control, you may flip a coin. If you win the flip, assemble two cards.
1: Return Mul Daya Regent to its owner's hand.
0/2
#I kid you not, assemble!
Tormented Mage
2BB
Creature - Eldrazi Drone (Rare)
Devoid (This card has no color.)
Ingest (Whenever this creature deals combat damage to a player, that player exiles the top card of his or her library.)
When Tormented Mage dies, you may choose a card exiled with it. If you do, put that card into your hand.
3/3
#I gave it "you may choose a card exiled with it" to play with, and this is one of the results I got. It's unusual in terms of what it does, but it makes sense. Other versions of this card let you cast that spell until the end of turn, or cast it without paying its mana cost.
Aether Tormentor
3UU
Creature - Eldrazi Drone (Uncommon)
Devoid (This card has no color.)
1: Aether Tormentor becomes the color of your choice until end of turn.
4/4
#Colorless? Well, we can't let that happen, now can we?
You know how I said that while image generation for our cards wasn't quite where it needed to be yet, given the exponential pace of current research we'd have a good solution if we just waited? Well, there was an interesting paper that came out three days ago that intrigued me, and I just had to share it: http://arxiv.org/pdf/1508.06576v1.pdf
The idea is that you give the network an image I and another image J and you want the network to produce an image K, where K is I done in the style of J. See the attached image for examples.
The network breaks down the original image into its component parts, like houses, and then it looks at the style-guide image and asks itself, "Well, how should I draw a house in this style?" It then transforms the image to match the style. It apparently works at arbitrary resolutions, so you get quality images out of the process.
At the very least, this means that you can take any image and make it look like Magic art by borrowing the style of existing cards. For example, you can take any landscape photograph and ask the network to recompose it in the style of John Avon. It also lets us answer questions like "What if Jace, the Mind Sculptor was drawn by Rebecca Guay instead of Jason Chan?"
But that's not what interests me most about this. Think of it this way: the network takes an image, breaks it down into a content representation, and then regenerates it as a new image. Take out the original-image part, and that leaves us with a mapping from content representation to image. Find a mapping from, say, a text description to a content representation (which, from what I've read and seen, is very doable), and then you're in business. Effectively you'd be taking text and getting a rough sketch whose details are then filled in with colors and textures from the style image. I'm not saying that I'll have the time and energy to investigate that, but it's well within the realm of possibility, and I'm sure more steps will be taken in the coming months that will make that sort of thing possible (all the pieces already exist; they just need to be assembled).
I haven't seen any source code come out for that paper as of yet but the moment that it's available I'm going to have so much fun with it.
EDIT: But what to do first? Maybe Vermeer's "Girl with a Pearl Earring" merged with Goblin Piledriver... Hm.. I have a number of things I want to try. lol.
EDIT(2): I just noticed something about the image I attached. Look at the Starry Night reinterpretation. The network adds lights coming out of the windows even though the original photograph was shot during the day. Why? Because it's a night-time scene. Obviously there would be lights coming out of the windows. How clever.
EDIT(3): Oh, wait, yes, I did find some source code. No trained models though. Not yet anyway. I'll keep looking later.
Also, I promise I'll look into getting that peephole implementation working when I get the chance, so I can see where it leads us with regards to card generation.
It's an elaborate way of making it uncounterable. You can counter it, but it's already done everything it's going to do by that point!
That's pretty cool. If only we knew what X was, though...
I've attached an image of an explanation that I attempted a while back (along with a simple diagram). To my knowledge, it's accurate. If it isn't, or something is unclear, let me know.
EDIT: The version I describe differs from the version you're seeing below because I mention things like biases which aren't shown in that code. Sorry about that.
That's almost correct, yes. Take a look at this code:
-- evaluate the input sums at once for efficiency
local i2h = nn.Linear(input_size_L, 4 * rnn_size)(x)
local h2h = nn.Linear(rnn_size, 4 * rnn_size)(prev_h)
local all_input_sums = nn.CAddTable()({i2h, h2h})
local reshaped = nn.Reshape(4, rnn_size)(all_input_sums)
local n1, n2, n3, n4 = nn.SplitTable(2)(reshaped):split(4)
-- decode the gates
local in_gate = nn.Sigmoid()(n1)
local forget_gate = nn.Sigmoid()(n2)
local out_gate = nn.Sigmoid()(n3)
-- decode the write inputs
local in_transform = nn.Tanh()(n4)
I know it's a little bit confusing because, for the sake of efficiency, the inputs to all the gates are computed at once and then split up, but bear with me. So let's say this is a hidden layer cell, and 200 inputs come in from connections from the previous layer. They each get weighted by the linear layer and fed to the in_gate, which applies the sigmoid function element-wise to the 200 weighted inputs. So the difference between what you originally understood and how it actually works is that we're not crushing all the inputs to make a single number. The internal state of the LSTM block, in this case, is a 200-wide vector. Assuming we're working with 64-bit numbers, that would mean that each individual cell/block of the network can hold up to 12800 bits of information.
EDIT: Actually, I made a mistake here in my explanation. I meant that each layer can hold that much information, not each individual block.
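For reference, in the char-rnn code that excerpt comes from, the lines immediately after it combine those gates into the new cell state and hidden state. Reproduced from memory, so check it against the actual model file:

```lua
-- the forget gate scales down the old cell state; the in gate
-- scales the candidate values; their sum is the new cell state
local next_c = nn.CAddTable()({
    nn.CMulTable()({forget_gate, prev_c}),
    nn.CMulTable()({in_gate, in_transform})
})
-- the out gate decides how much of the squashed cell state is
-- exposed as the hidden state for the next layer/timestep
local next_h = nn.CMulTable()({out_gate, nn.Tanh()(next_c)})
```

The 200-wide vector I described lives in next_c, which is what persists from timestep to timestep.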
---
EDIT: Ooh, people are already reproducing that style transfer paper I talked about. Take a look:
http://imgur.com/a/jeJB6
Picasso Gandalf!
EDIT(2): I restarted training with the peephole connections, just to see whether things are working correctly or not. I'll let you know how that goes.
Sudden Canophation (uncommon)
4U
Enchantment
Whenever a player casts a white spell, target creature gets -X/-X until end of turn, where X is the number of cards in your hand, then draw that many cards.
Ridiculous, but cool!
To my knowledge, as per our example, it should output a vector of 200 different numbers, providing one for each of the presumably 200 cells occupying the next hidden layer. And... wow, you're right. I made a typo, my bad. Sorry, I wrote that down in a sketch for my dissertation a while back and I never went back to check it. There should be an extra V_0 * C_t added onto equation 5. I must have overlooked it.
Hrmm.. does the final layer add anything to that number? In any case, I can tinker with the script layer to get a precise breakdown for you if you'd like.
Interesting. Maybe it's a parameterization issue; not sure. I'll definitely look into it.
EDIT: Finally. I think I got the peephole strategy working correctly. I found a tiny mistake and after fixing it I'm seeing convergence. But we'll have to wait and see how low I can get the error rates. I'm using ADAM right now and no biases.
EDIT: Interesting thing while perusing a dump from the same net as Sudden Canophation above. Does the final clause there work?
Thraben Mist (uncommon)
1
Artifact
2,T: Put a charge counter on Thraben Mist.
T, remove a charge counter from Thraben Mist: put a 2/2 green Wolf creature token onto the battlefield tapped and attacking or blocking.
EDIT 2: Bwahaha. I have a flying Goblin Warrior named "Pony Battlemage".
This is normal, it's complaining that you haven't given it a -primetext option, so it doesn't know what sequence of characters to start the dump with. In this case it picks a character at random.
One of the things on my todo list when I start developing again is to get rid of that option and just have it always prime with two newlines and a field separator, so it usually starts on a card boundary rather than in the middle of something, but that's a very small optimization. You could try doing this manually, but it's surprisingly hard to pass real newlines in a command line argument.
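For what it's worth, bash does have a way to smuggle real newlines into a single argument: ANSI-C quoting. A sketch (an assumption on my part; I haven't tried it against sample.lua itself, and the '|' separator is inferred from the encoded dumps above):

```shell
# $'...' quoting expands \n escapes into literal newline characters,
# so a multi-line primetext can be passed as one argument.
primetext=$'\n\n|'
# sanity check: the variable really does hold two newlines and a '|'
printf '%s' "$primetext" | wc -l
```

If that checks out, passing -primetext "$primetext" should make sampling start at what is usually a card boundary.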
Is it just me or is croxis's site still down? I haven't been able to access it in days.
We're still in sub-Turing territory, I think. We can definitely get there when it comes to mechanics, and soon. Flavor of course is another matter entirely since it's informed by all kinds of tropes and cultural understandings. But even that barrier can be overcome. I suppose it also matters whether the machine gets to choose what it presents, to play to its strengths, or whether this is an iterative game where the interrogator gets to describe the kind of card they want in some way.
---
So I halted the experiment I was doing with the peepholes. I did not get the results I wanted (this time).
What happened was that the training loss and validation loss dropped down into the very low 0.30s. I was hopeful that we'd break through that barrier. But by epoch 15 (5 epochs into the decay of the learning rate), it started skyrocketing back up again. I don't have a graph to show you at the moment, but if I did, it would be shaped like a V. Absolutely bizarre. It's also interesting to me that we overshot the goal after we began to decay the learning rate.
Jml34, you noted that you got subpar results with ADAM with default parameters? I can try again with RMSprop to get a better sense for how the peepholes should behave with all else being kept the same. I think with some fine-tuning we can get better results like Tiir described, I just need to figure out what's causing the problem.
For reference, here's sample output from the checkpoint at epoch 13:
|||instant|||O||{^^^^UU}|uncast target spell. its controller loses life equal to the number of creatures you control.|
|||creature||human shaman|N|&^^/&^^^|{^GGGG}|{^}, T: tap target creature.\when @ leaves the battlefield, you may put a &^/&^ green elemental creature token onto the battlefield.|
and here's output from the checkpoint at epoch 15:
||{W{^^^^^^BW^^^}|^^G^^^^^^^^^}|cnes in chote th beganhireoture tn cantoton thi catt toreat card tore tarard. rett creature ent prmr toreaturee entur l wheand cnd onds pet of thant cour socn then thentour tau geaf the baats y ant on iwe tpor tore t||
||c t \wate ca treaan thee themant u oltanare tire. s ses &^ehenn wat pat ant then a d tf nte battlafmou tn them chant.|
I believe there's no issue in the rules with a card being put into play in a state such as attacking. I think it is supposed to limit when you can use such an effect to only the attack phase, though. Ninjutsu works in a similar way, as do a handful of other cards. A good comparison with a token would be Alesha, Who Smiles at Death.
If I get any good results, I'll be sure to share them.
EDIT: The code definitely works, but I'm running into some memory issues. I can get a few iterations out of it, and the Swiss National Park starts to turn into a work by John Avon, but then I get an out-of-memory error. This code probably isn't as optimized as it could be. But that's okay! Even if I can't optimize it for my machine, I just reacquired my desktop machine from back home, an Alienware machine built for gaming with two Nvidia GeForce cards. It has a few years on it, but it's more than equipped to do image processing work...
Ah, but wait, adding in a call to the garbage collector may have fixed it. We may be back on track. I'll let you know.
Well, it should still say that it's only usable during the attack phase. And it should only let you generate a token that's attacking during your combat phase, or blocking during an opponent's combat phase. And it should probably only let you generate a blocker if there's something that the token will be able to block. Or at least it should generate the token, then tell you to assign it to block something if possible (notwithstanding the token being tapped!).
Or maybe it doesn't actually have to be blocking anything? If a blocked creature gets destroyed, are the blockers still considered “blocking”?
EDIT: Actually, there's no issue at all about entering the battlefield blocking. Should've just checked the Comprehensive Rules to begin with.
509.7 sez you do indeed get to choose what creature's getting blocked if nothing chooses for you. And 506.3d sez it's never “blocking” if it doesn't manage to block anything. Oh, and 506.3b sez you can use Thraben Mist anytime and the thing will never actually be “attacking” if not applicable. So actually the card's meaning isn't ambiguous at all. They don't call them Comprehensive for nothing ...
In the meantime, I can try to get Sammim's implementation to produce higher quality results (I suspect I just need to turn all the parameters up on a better machine, which I now have). I could see us integrating this into our card creation process, at the very end.
In the meantime, I've attached a low-res, low-quality version of Jace Beleren as drawn by Rebecca Guay.
More and better stuff to come in the future.
EDIT: FYI, I used Elvish Piper as a reference. Notice that while it stole some of the background details from the art, it did not change the central focus of the image except to restyle it. The Deep Dream approach, by comparison, would have given us Jace Beleren with a flute sticking out of his head.
EDIT(2): But it's not really borrowing from the style image either. For instance, it just recolored the soil and the background trees are actually arranged differently in the style image.
EDIT(3): Landscape photography seems to do well. And I suspect that, with some tinkering on my part, I can get this thing to turn photographs of cosplayers into Magic art.
EDIT(4): Aha! I see what it did. It turned the clouds in the original Jace image into trees.
EDIT(5): Yes, I definitely can get this thing to run a lot better. Sammim intentionally downgrades everything so he can run his code on his MacBook. I can fix that when I get home. In the meantime, I've attached a Starry Night Jace. This one is problematic because his spell starts to eat his whole arm and mesh with the night sky. I think at a higher resolution we won't see this issue, though.
Not really any difference.
Um. Numpypy. Would not recommend. At least, not for any little stuff like this.
I've grabbed the latest JSON dump, and I'm thinking about how to process it to turn it into input. Among other things, using the dump would get me a better-quality data set for my race encoding stuff.
However, my first pass at figuring this stuff out was to pop open the dump in my text editor. It reacted poorly to a file that's just several megs of text on a single line.
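A quick workaround, sketched in Python (mtg-encode territory anyway): re-serialize the dump with indentation so the editor gets one field per line. The filenames in the comment are placeholders, not necessarily what the dump is actually called:

```python
import json

def prettify(raw_json_text):
    # The dump is one giant JSON object on a single line; re-serializing
    # it with indentation yields an editor-friendly, line-per-field view.
    return json.dumps(json.loads(raw_json_text), indent=2, sort_keys=True)

# e.g. (placeholder filenames):
# with open("AllSets.json", encoding="utf-8") as f:
#     pretty = prettify(f.read())
# with open("AllSets-pretty.json", "w", encoding="utf-8") as f:
#     f.write(pretty)
```

Loading the whole thing into memory is fine here; the dump is only a few megabytes.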
I would really like to try this on my machine but I doubt my paltry 1gb video card, CUDA or no CUDA, will be able to produce anything of decent resolution; would I be correct?
edit: Is there any way to get rid of the weird 'rainbow bump' effect? It's almost like crinkled cellophane. It also looks too 'griddy', if you know what I mean (you can see an underlying grid pattern).
Sammim said that "On a 2014 MacBook Pro with an NVIDIA GeForce GT 750M, it takes a little over 4 minutes to perform 500 iterations of gradient descent." That's the hardware he's using. Mine's slightly more powerful and I can get results in about 3 minutes. I'm not sure how your machine would compare to his. You can scale it down if needed, but that also means you'll get lower resolution results.
And yeah, I totally plan on making a Starry Night Jace at wallpaper resolution and making that my background, just as soon as I can get my desktop configured properly to run Torch. I'm very excited, haha. I'll be sure to give y'all a copy.
And if anyone can think of any other mashups they'd want me to do, I can queue up a bunch of them with a script. Once I have the better hardware ready to go, that is.
Oh, I know. That's because of the low parameters I'm currently using. At higher resolutions, things should be more detailed and less noisy, and you shouldn't have that grid effect.
Also, as a side note, we're limited at this point by the quality of our image recognition. I think that if the network doesn't understand what it's looking at in the style/source image, then it can misapply textures. For example, take a look at Jace + The Great Wave off Kanagawa that I've attached. The better our feature recognition capabilities, the better we can apply elements from the style image to the content image. As is, I think the network does best if you use photographs for content, as that's what the convnet is trained on. But it does pretty well overall when using works of art for content images, like Jace.
EDIT: BTW, I think I can tweak some parameters to make it so that Jace doesn't get washed away by the waves. Just have to compensate for the mistake.