From memory, they used Monte Carlo Tree Search to crack M:tG combat between vanilla creatures only. They had some results on different variants of MCTS, including one with negative rewards for losses which, if I remember right, didn't actually add much to their results.
Oh, yeah, I read that one before. Not deeply, but I remember it.
MCTS seems nice because it's a technique that's domain independent, and evidently in 2011 there was a dissertation showing that you could parallelize tree search over GPUs, which means that you can throw lots of computational resources at it. I see a lot of potential value in it.
In general, state space search is a very useful approach. It's powerful, and we know its strengths and weaknesses very well (well, I do because it's central to my dissertation, but I mean we as a community). That won't be going anywhere any time soon.
But I think that deep representation learning could provide a very good understanding of what the state of the game looks like, and you can easily build state space searches on top of that.
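To make the domain-independence point concrete, here's a toy sketch (my own code, not theirs) of the UCB1 selection rule that MCTS applies at every tree node. Notice that nothing in it knows anything about Magic; all of the game-specific knowledge lives in the simulator that produces the rewards.

    import math

    def uct_select(children, parent_visits, c=1.41):
        # children: list of (mean_reward, visits) pairs, one per legal move.
        # UCB1 balances exploitation (mean reward) against exploration
        # (visit counts); returns the index of the move to descend into.
        def score(mean, visits):
            if visits == 0:
                return float("inf")  # always try unvisited moves first
            return mean + c * math.sqrt(math.log(parent_visits) / visits)
        return max(range(len(children)), key=lambda i: score(*children[i]))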
Just some food for thought and earlier results. Cracking M:tG is going to be hard. "Semantics is child's play compared to that" kind of hard.
But it can be done. I know it can.
I think the half-way solution is a hybrid of our tried-and-true approaches combined with the new, powerful forms of sorcery that we have been investigating. I think we can cheat our way to some "free lunches" with special knowledge derived from a more sophisticated understanding of the semantics.
That's within the realm of possibility now. It's not a trivial problem to solve, but it's definitely do-able.
But I'd say we've only truly won when we have a generalizable solution, or at least one that we can mostly carry over to new applications (like transfer learning). That'd be a sign of robustness. You see a little bit of that in that paper I linked you to about the text-based games.
What I'm suggesting is that you will be able to train a system to play Magic. Then you will be able to retrain that same system to play Yu-Gi-Oh, and it'll learn to play that game faster and better compared to a new system trained from scratch just to play Yu-Gi-Oh. The snowball just keeps getting bigger and bigger as it rolls downhill, and the effort it takes to get quality results gets cheaper and cheaper.
As for that, we're not there yet, but we will get there too, haha.
Playing Magic optimally - and knowing it's optimal - is hard enough for humans! The AI has no need to rare draft a foil Goyf though, so that's something.
Perfectly legal, acceptable, and unadventurous. Even the names are boring and trite! At the same time, raising the temperature also means there's more room for mistakes to creep in. Many of the cards are just fine, but some leave you with more questions than answers. I've identified three rough categories of mistakes and oddities that you're likely to see, as exemplified by these cards:
1. Cards that are legal but onerous in their specificity.
Master of Dawn G
Creature - Human (Rare)
2G: Regenerate target creature. Activate this ability only during your upkeep.
1/1
#Hurray for debilitating restrictions! This looks like an attempt to make an ability on an underpriced creature "fair".
Actually, this is not that bad, simply because a Regeneration shield lasts until the end of turn; it's not like you have to activate Regeneration in response to the destruction, even though that's usually the best way to do it.
So, on this- I read the paper and I realised there's an important difference with our case, namely that M:tG is not a computer game so there's no "native" representation of the game state that can be learned.
If I can gloss over the details a bit for brevity here, what I mean is that M:tG being a physical game and not a computer game, it doesn't have a text- or graphics-based interface between its internal state and the outside world, or rather, a computer.
So what that means is that for an AI agent to interact with the game, someone needs to provide a representation of the game in a computer-readable format, be it text or graphics or whatever. And we know that such representations are a classic AI PITA.
Does it help that there are fan-made implementations of Magic? Some of them can handle 99% of all cards.
Actually, this is not that bad, simply because a Regeneration shield lasts until the end of turn; it's not like you have to activate Regeneration in response to the destruction, even though that's usually the best way to do it.
Sorry, I should have been clearer. That was my point.
So, on this- I read the paper and I realised there's an important difference with our case, namely that M:tG is not a computer game so there's no "native" representation of the game state that can be learned.
If I can gloss over the details a bit for brevity here, what I mean is that M:tG being a physical game and not a computer game, it doesn't have a text- or graphics-based interface between its internal state and the outside world, or rather, a computer.
So what that means is that for an AI agent to interact with the game, someone needs to provide a representation of the game in a computer-readable format, be it text or graphics or whatever. And we know that such representations are a classic AI PITA.
True, but we're better off than one might think. For the purposes of the game, a card's text is whatever its oracle text is. And if you count Magic Online, then the game does have a computer readable format, or is at least amenable to having one.
And, as you pointed out, there are fan-made implementations out there as well.
So this is the first step towards learning a control policy for an AI player: we need to learn this natural representation of the game that is also the game itself. Once we have that, we have a training environment for the AI player. I think the advantages of this natural representation versus any other representation are clear, btw.
I agree with your assessment. I'm not sure what resources I have to spare in terms of moving forward with that idea. However, doing work in that direction could ultimately benefit the generation process.
Assuming I can get everything working with the stackRNN implementation, and I will do so one way or another (I'm currently looking into improvements that can be made), we could probably improve upon the quality of the cards generated by a substantial amount when it comes to coherence, use of abilities like kicker, and so on. I am very confident that we'll overcome a lot of the limitations we've seen in the past.
But there will still be a need for cleanup and regularization, and I can see a lot of the ideas that we've discussed being useful in that regard.
For example, a grammar like the one you're developing can already be used to filter out syntactic garbage or, depending on how far you want to take the idea, can be used to suggest corrections that would make cards as syntactically healthy as possible. We could roll out some version of that idea quite quickly.
For example, this card in the last dump wouldn't be allowed to happen:
Primal Charm 2G
Instant (Uncommon)
Target creature you control becomes instant until end of turn.
Draw a card.
Admittedly, when I turn up the temperature, I make results like this more likely, but I should be able to compensate for that with better filtering/correcting. That way I can encourage creativity without creating more manual labor for myself.
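Just to make the filtering idea concrete, here's a toy sketch of what the first, crude version could look like. The patterns are mine and nowhere near the coverage of a real grammar like THELEMA; the point is only the shape of the check.

    import re

    # Whitelist a few trusted line shapes; anything that matches none of
    # them gets flagged for filtering or hand correction.
    LINE_RULES = [
        re.compile(r"^draw (a|two|three) cards?\.$"),
        re.compile(r"^target creature( you control)? gets [+-]\d+/[+-]\d+ until end of turn\.$"),
        re.compile(r"^[0-9wubrgx]+: .+$"),  # activated ability: "cost: effect"
    ]

    def suspicious_lines(card_text):
        lines = [l.strip().lower() for l in card_text.splitlines() if l.strip()]
        return [l for l in lines if not any(r.match(l) for r in LINE_RULES)]

    # "target creature you control becomes instant until end of turn."
    # matches no rule, so Primal Charm's first line gets flagged, while
    # "draw a card." sails through.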
But there's still the semantics barrier that we have to cope with. The RNN can imitate whatever you throw at it, whether it's literature, Magic cards, music, etc. And it goes far beyond mere imitation of syntax, because as we've seen, it can derive semantic relationships as well, such as the fact that dragons are often big, red, and have flying. In general, these sorts of predictive models are capable of extracting very nuanced and complex information.
But because the network doesn't know the game itself and how its creations fit into that game, there will always be certain shortcomings when you use a naive model. I think that a system that has even a rudimentary knowledge of how the game actually works, even if it's not very good at playing it, could come up with creations that are more grounded.
For example, take a look at this card:
Overgrowtreach 2G
Enchantment (Uncommon)
Creatures you control get +1/+1 and have basic landwalk.
The network does not understand how obscenely high the price-performance ratio of this card is. If you were to sit down and try it out, you could quantify the value that it adds to a board state (relative to some arbitrary set of cards that defines a power-level "baseline"). Even if you just get a very rough estimate, you get an additional measure that you can use to help filter and tweak cards. You could even work backwards from the measure to train a power-level predictor for cards that it hasn't tested.
I'm not saying that that should be a high priority at this point, but it's definitely a direction that we can head in in the future. I think that's one way that you can make generative models of game content more robust: some kind of feedback mechanism using simulation.
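For the record, the loop I'm picturing is something like the sketch below, where simulate_game is hypothetical: it stands in for whatever rules engine playout we'd have (AI vs. AI, reporting the winner).

    def estimate_power(card, baseline_deck, simulate_game, n_games=1000):
        # Splash the card into a copy of the baseline deck and measure how
        # much the win rate moves against the unmodified baseline.
        test_deck = baseline_deck[:-1] + [card]
        wins = sum(simulate_game(test_deck, baseline_deck) for _ in range(n_games))
        return wins / n_games - 0.5  # > 0 means above the baseline power level

Scores like these could then label training data for the power-level predictor, which would generalize to cards the engine never actually played.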
EDIT: Oh, and on a related note, I just read that Google's Deepmind team have improved upon their earlier work with teaching a deep net to play Atari games (news story here, article here.) Previously they were able to beat human-level performance for 23 Atari games, now they're up to 31 games. I'll have to look into that paper later. I'm sure we'll see even more dramatic improvements in the coming months.
EDIT(2): Oh, by the way, I did something for fun in between runs of my experiments today. I took the art for Deathmist Raptor and restyled it as a medieval black and white woodcut (here's the reference image I pulled from Wikipedia). I then recolored the resulting image using the original. In traditional woodcut art, they try to deemphasize background scenery as it otherwise adds to the noise, but the network doesn't know that, so it's a bit noisy. All in all though I'd say the result looks pretty convincing.
EDIT(3): I should say that the art looks convincing except for the fact that raptor attacks were vanishingly rare in medieval Europe, so they were unlikely to be the subjects of woodcut art. Dragons, on the other hand, are well attested in various bestiaries.
EDIT(4): I'm investigating an issue that I've been having with that data-structure augmented network. I'm getting identical losses on each pass, which shouldn't happen. That might actually be an issue... and if so, I might have a way of resolving it.
I've concluded that there has to be a bug in my modified version of that data structure augmented network code. I let a test run overnight, just to prove it, and the losses stay the same over the course of dozens of epochs. I'm getting a Test/validation accuracy of zero percent, meaning that I perpetually get the wrong output. That indicates a problem with the way the data is being handled (i.e. garbage in, garbage out). As far as I know, the data is being read in correctly, so there may be some screw-up in the code where I'm not scoring the results properly. As in, I might be telling the network "Oh, the output you gave me was 'c r e a t u r e', and I was expecting '0klm41$!!#(*'. Please review your answers and try again." That's looking very likely at this point.
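If that theory is right, it should be easy to catch with a quick check on a single batch: in character-level prediction the targets have to be the inputs shifted by one step. Something like this, where batch_inputs, batch_targets, and idx_to_char are stand-ins for whatever the training loop actually holds:

    def check_alignment(batch_inputs, batch_targets, idx_to_char):
        # Decode one row of the input and target streams and compare.
        x = "".join(idx_to_char[i] for i in batch_inputs[0])
        y = "".join(idx_to_char[i] for i in batch_targets[0])
        print("input :", x[:60])
        print("target:", y[:60])
        assert x[1:] == y[:-1], "targets are not inputs shifted by one step"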
On the bright side, me diagnosing the issue is a step towards getting everything working.
EDIT: I might have fixed it. At the very least, I'm much closer to having it working if it's not working now. And, incidentally, it runs twice as fast now. So that's also good.
EDIT(2): I'm running a small test on the weaker machine. A single layer LSTM, 256 wide, with a single stack that's 200 deep (equal to the sequence length, because that's the deepest we could possibly push values). I'd run a multi-stack architecture on the big machine, but I've got some experiments running at the moment, and the results are looking too good for me to interrupt anything now. If the small network converges, I'll look into modifying a sampling script to see what kind of output it produces, and then we can go from there.
EDIT(3): As an aside, I'm sure most of you already know about it, but I am very excited about the Mars discovery. The presence of flowing water greatly increases the number of suitable sites for colonization. The mere prospect of off-world colonization could translate into more funding for research into adaptive, autonomous systems. There's a lot of areas where the sort of stuff we've been talking about can translate into useful technologies, like deep learning for motor control policies (e.g. a robot loses function of one or more limbs due to malfunction/falling rocks, and has to invent a new way of moving in order to limp back to base). If mankind is to ever become a multi-planet species, we'll need to rely on sophisticated and independent automation to shape alien environments. When I tell people that the primary research objective of my team is the long-term survival of our species, I'm being completely serious.
Wouldn't it be better to somehow test a single "cell"/neuron for its functionality instead of using a slice? If there is a problem with your code, a single cell should be able to show the problem as well while being faster to debug. At least you can retrace the behaviour from the inputs to the expected outputs, even if it's just an off-by-one error in the stacks or something.
You might need to write some test tool though, and construct a good test suite with boundary cases and such.
I've studied the code and the neural network architecture looks sturdy enough. It's an LSTM network just like we've been using this whole time, and everything appears to be wired correctly. There are extra connections that allow the network to send messages to and receive messages from the stack. Now, if the stack were not working properly and producing noise, the network would compensate by weakening that communication channel (so as to dampen the noise). Then we'd just be left with an ordinary LSTM network, and we already know those work.
Rather I think the problem has been with the framework built around the network. You have to get the cards in, reduce everything to a format that the network can handle, do the training regime, etc. It looks like there were some problems with it. I think I fixed those issues. We'll see, of course.
So like I say- it depends on what you want to do. If you want an AI player you don't need all this fuss I describe. I'm just a bit hung up on software correctness and theoretical rigour and so on, that's all
Then again, lots of the time real breakthroughs come from the blind side of theories. Such is life.
And that's a good thing! It's good to demand theoretical rigour.
Right now we're just in a state of uncertainty when it comes to things like deep neural networks. They work well, but we don't yet have the theoretical infrastructure needed to reason about them in a sophisticated way. But we will get there.
There's no reason why it wouldn't work like you say. Actually I like that idea of suggesting corrections. THELEMA is currently deterministic (and for that, quite fast) but I could add an option for it to calculate probabilities of derivations as it goes along. Then, when it is given a string that it rejects as ungrammatical - or rather, assigns a very low probability to - it could respond with the most likely strings following the last token before the ungrammaticality in its input, if that makes sense.
What you're describing is exactly what I had in mind, though simple filtering would be useful too.
Agreed- you don't need an expert; even a rudimentary understanding of the game could go a long way towards weeding out stuff that just doesn't make sense. Hopefully that won't stifle its creativity though, 'cause I think with humans, the more rules we learn the more our imagination shrinks. If I may be permitted a highly speculative aside...
Studying code is nice, but you get what we in German would call "Arbeitsblindheit" (work blindness): because it "looks" right in the code, it "has" to be right, but more often than not you get crap. Normally in software testing we do black-box tests, that is, without knowing how a piece of code works. What we do know is the behaviour according to the design documents, and on that behaviour we build our tests.
What I suggest is doing tests without looking into the code, but going by the predicted behaviour in your design notes/documents. Say a neuron has to fire (just as an oversimplified example) on an input of between 0 and 128; I would test those boundaries (Does it fire at 0? Does it fire at 128? Does it not fire at 129?). This way you check whether the actual behaviour conforms to the predicted behaviour without knowing the code. This can give you an overview of whether you have bugs and, depending on how you use it, where they are.
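In Python that could be as simple as the sketch below. cell_fires is a stand-in for however you'd poke the real unit under test; the assertions come straight from the documented behaviour, never from the code.

    import unittest

    def cell_fires(x):
        # Stand-in for the real unit under test; swap in a call into the
        # actual network component.
        return 0 <= x <= 128

    class FiringBoundaryTest(unittest.TestCase):
        # Black-box boundary tests for "fires for inputs in [0, 128]".
        def test_lower_boundary(self):
            self.assertTrue(cell_fires(0))

        def test_upper_boundary(self):
            self.assertTrue(cell_fires(128))

        def test_just_outside(self):
            self.assertFalse(cell_fires(129))
            self.assertFalse(cell_fires(-1))

    if __name__ == "__main__":
        unittest.main()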
You're absolutely right. Arbeitsblindheit is definitely something to watch out for. Part of the problem is that the network is adaptive and changes over time. The code and the behavior of the code are largely independent. I can only judge the network based on the outputs it produces.
Also, I like the article! Thanks for sharing.
---
Good news is that I know that I fixed the slowness issue. I can train these networks fairly quickly now. The original code expected inputs like "7 + 5 = 12", where you have a left-hand side that you get as input and a right-hand side that you have to predict. The problem I had was that I fed it the whole card as input and it had to predict the whole card as output, which is not what I wanted. That doubled the number of timesteps, which slowed everything down and put a strain on memory. That's why I thought the implementation was inefficient. Fortunately that's resolved now.
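For anyone curious, the fix roughly amounts to framing the whole card as one stream of next-character predictions instead of a separate left-hand side and right-hand side (toy sketch, toy card string):

    def make_lm_pair(card_text):
        # input[t] predicts target[t]: one pass over the text instead of
        # two, so half the timesteps and half the memory.
        return card_text[:-1], card_text[1:]

    x, y = make_lm_pair("fire dragon 5RR | creature - dragon | flying | 4/4")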
Anyway, for my test run of the stack augmented network, loss went down and then it started going back up... but that it went down at first is a good sign. I'm going to try doing a slightly larger test on the big machine this morning to get a better feel for what's going on. I need to try tweaking some parameters.
As a sidenote, I tried applying that woodcut art to Liliana of the Veil. Results were decent, though I think the effect works best when you have off-camera lighting (especially for the recoloring). The algorithm left very soft parts of the image untouched like the energy from her hands.
One good art result though. Living Tsunami combined with The Great Wave off Kanagawa by 19th century Japanese artist Hokusai is a beautiful combination. It helps that the two artworks have the same subject, haha. The only problem I had was that the Living Tsunami artwork that I found online was rather small, so I decided to upscale the image and try again (letting that run now).
EDIT: I uploaded a larger version. The image was randomly primed, so you do see some noise like the seaspray in the upper right corner (and it missed a spot in the background), but I love seeing all the rich colors.
Holy moly that Living Tsunami combo art is stunning. A perfect combination if there ever was one. Any chance of a 1080p version of that? (And Starry Jace?)
Looking forward to the results from the larger stack test... is there any way of sampling the output yet though?
Double the speed is a good improvement. But talking about predicting cards, I assume it goes in card order, but have you tried/considered doing it the other way around? That is, it starts at the end of a card and learns everything in reverse order. I imagine this is also a way to make it check whether cards are "proper", by creating a card either left- or right-handed, so to say, and then checking it the other way around to see if it finds it realistic that way too.
As for the artwork, I imagine that the best result will indeed mostly be to similar arts to begin with, from colour palette to what's depicted, unless you train it on a larger amount of art I guess.
Actually yes, we've considered that exact idea. There's a lot of research out there now about bi-directional recurrent networks, where they read the same input in two directions. That way you can reason about the likelihood that dragon implies a firebreathing ability and the likelihood that a firebreathing ability implies that the creature is a dragon.
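You could even approximate that without a true bi-directional architecture: train one model on cards as-is and a second on reversed cards, then require a card to score well from both ends. A rough sketch, where log_prob is a hypothetical "total log-likelihood of this string" call:

    def two_way_score(card_text, forward_model, backward_model):
        # A card has to look plausible read in either direction to score well.
        return (forward_model.log_prob(card_text)
                + backward_model.log_prob(card_text[::-1]))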
Holy moly that Living Tsunami combo art is stunning. A perfect combination if there ever was one. Any chance of a 1080p version of that? (And Starry Jace?)
Sure, I can rerun Living Tsunami for you later. But I can't make the big version of the image much larger than I already have without straining the system, due to the way the algorithm is implemented (my machine could probably do arbitrarily large images if the algorithm were implemented in a different way). If the size I make it isn't quite to your liking, you can use waifu2x or some other smart upscaler to get a higher res image. And is there anything you want tweaked? More Hokusai, less Matt Cavotta? Anything like that?
And yeah, I can look into getting a better Starry Night Jace for you. I just need to play with the parameters some to get the desired effect.
Looking forward to the results from the larger stack test... is there any way of sampling the output yet though?
Well, not yet, but I'd just have to modify the sampling script to accomplish that. That shouldn't be too hard to do. The I/O handling for the network will be mostly the same.
---
So I ran a single stack, depth 200, with a single layer LSTM with 512 cells. I'm looking at the numbers now. The training loss went from 4.6, to 3.0.. 1.3.. 0.8.. 0.4.. 0.3 (about where we normally stop).. 0.2.. 0.1.. 0.05.. 0.01..
Then losses actually went ever-so-slightly negative, which I think is due to the fact that the losses are so low that the measurement I'm using can't accurately reflect them. That's the point at which everything stabilizes. I'm seeing rapid convergence after just one epoch, that is, one pass over all the input.
I don't want to get everyone's hopes up just yet, because who knows, there might be an issue with how I'm reporting the numbers, or we might actually be seeing overfitting (which we can compensate for). But what I can say is that it's definitely learning something from the input, and it learns it incredibly fast. But we'll have to sample the network in order to get a clearer picture of what's really going on.
EDIT: I will need to look into some stuff for the sampling script. I ran into an issue because the data structure augmentation part uses the batch size value directly. If I train a network with a batch size of 50, I have to sample it by passing it 50 inputs (49 of which I don't care about, so that's not very efficient, and that complicates the sampling script). I'll get it figured out. There's a workaround, I'm sure of it.
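The workaround I have in mind is dumb but non-invasive: tile the one character I actually care about across the whole batch, run the step, and keep row 0. Sketch (model.step is a stand-in for one forward pass of my network):

    import numpy as np

    def sample_step(model, char_idx, batch_size=50):
        # 49 of these rows are throwaway work, but the network code stays
        # untouched.
        batch = np.full((batch_size,), char_idx, dtype=np.int64)
        probs = model.step(batch)  # shape: (batch_size, vocab_size)
        return probs[0]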
EDIT(2): Getting closer. I'll have it figured out. It's just that the structure of the network doesn't quite match up with how I do the sampling script, but it can be made to work.
By the way, earlier I had some fun making a mockup of what a synthesis of all the stuff we've been working with will look like. I think it's fairly close to the quality that I'm expecting, since we'd be turning over artistic direction to the machine, and it might not always pick the best flavor text or art combo, lol.
1) Pass 49 zero inputs?
2) Whatever it needs those 50 for, tell it to skip all but the first?
Hack fixes like that will often address the issue in the short term, but make the code messy and prone to break in unexpected ways as soon as something changes to render the hack irrelevant. Better to address why it's requiring that 49x extra work than to paper over the behavior.
So I think I might have found a problem with the code, after doing some deep testing like Mel_vixen suggested (thanks!). There was a missing component, a layer that needed to be added to the architecture; it's present in our mtg-rnn code but not in the neural stack code. It was causing all kinds of problems with how we judged the output of the network. It wasn't a problem for the toy problems that came with the code, but it was definitely an issue for doing the kind of text prediction we want to do.
I also changed the network code up a little bit. Originally, they used a compressed encoding of inputs that saves space, but I really wanted it to be a plain one-hot encoding... so I replaced it. I also moved all the encoding stuff to the inside of the network code rather than sitting on the outside (like we do with the mtg-rnn code), so I don't have to worry about all that when I'm writing the sampling script.
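One-hot is bulkier than the compressed encoding, but there's nothing to decode when writing the sampling script. For reference, it's just:

    import numpy as np

    def one_hot(indices, vocab_size):
        # One row per timestep, a single 1 per row.
        out = np.zeros((len(indices), vocab_size), dtype=np.float32)
        out[np.arange(len(indices)), indices] = 1.0
        return out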
I'm rerunning everything and I'm seeing convergence at a healthy and predictable rate. Not sure how low we'll be able to go, but I have high hopes.
I'm waiting to see how this test run goes before revisiting the sampling script issue, which should now be easier to take care of. I'll let you know what happens.
By the way, in my recent experiments for my dissertation research, I've noticed that for certain classes of problems I get much better performance with a GRU architecture than an LSTM architecture (see example graph). Now, that's without the added bias terms for the LSTM network or anything like that, so it's possible that they'd be much closer in performance with some minor modifications, but I just found that to be interesting. I'm looking into that.
Of course, if I can get the stack-augmented network working, I should be able to get results much better than any of those three you see in the graph.
EDIT: By the way, as a shameless plug (it's related, I swear), some of my fellow researchers recently put out a paper entitled "All Your Voices Are Belong to Us: Stealing Voices to Fool Humans and Machines" (see here), and I'm both happy for their success and frightened by the disturbing possibilities.
Just as we can imitate the style of artwork or the text of Magic cards, we can also use these sorts of powers for more nefarious purposes. In this case, imitating the voice of a person to bypass voice biometrics systems and to fool other people. Seriously, test subjects struggled to distinguish a recording of Morgan Freeman and someone else whose voice was morphed to sound like Morgan Freeman.
The good news is that Morgan-Freeman-Bot will narrate our lives for us in the near future. The bad news is that anyone with a grudge could hypothetically steal your voice and your handwriting and ruin your life in any number of ways.
EDIT(2): The good news is that I can sample the network. But the first one I trained ended up with a loss that was a bit too high for my tastes. Example outputs:
Spirit of the Sentinel 6
Creature - Spirit (Common)
Flying
When Spirit of the Sentinel enters the battlefield, you may sacrifice two lands that share a color with the top five cards of your library.
3/3
Whictonotig Stal 1B
Instant (Common)
Gain control of target creature. It's still a colorless heart of U.
But that's okay, I didn't expect this first one to turn out perfectly, given that it's actually smaller than the ones I've been working with, and I have plenty of parameters to tune. I'm going to try making some changes and see what I get. The good news is that I did some diagnostics on the stack and it's definitely using it to store values, so that's something.
Actually, I would love to have bots doing narration work, and I have to admit I would love to have a different telephone voice; I get called "Sir" far too often. This also could be awesome for anime: you could get interesting voices that are instructed(?) by the best voice actors. Also, we could get the BBC/nature documentary voice to narrate your home videos! Heck, Carl Sagan or Neil deGrasse Tyson voicing over your science projects.
Actually, I could see it as part of an electronic voicebox, given that you have enough recordings of your own voice or a mixture of different voices (to get a distinct one). This could help a ton of people with injured voiceboxes/windpipes, giving them a feeling of normality. With 3D-printed windpipes and rather small electronics nowadays, you could even make this "invisible" enough to "look" normal.
Not a bad idea, I agree!
Oh, on a related note (for the future of medicine): researchers at USC are moving forward with human trials for that neural repeater I mentioned a few months ago (not sure if you were around when I mentioned that). It does for the human brain what data structure augmentation does for my networks. It keeps thoughts from escaping you, good for people with illnesses like Alzheimer's disease.
Of course, I could honestly see this as a stepping stone towards horcruxes for the wealthy and powerful, and that's a bioethical conundrum in and of itself. The future is shaping up to be very weird...
---
I started a 2-layer, 2-stack network, just to have another benchmark architecture. I'll let that run for a bit and check back on it later.
Well, I guess this means making audiobooks can be much cheaper if you only need to record a sample of each voice, not the whole book, in the voice of $celebrity.
So the training loss went negative at some point, which might be indicative of some kind of glitch (I'll look into it). Here are some results...
Mind's Eater 1
Artifact - Equipment (Common)
Equip 1
Living Weapon (When this Equipment enters the battlefield, put a 0/0 black Germ creature token onto the battlefield, then attach this to it.)
Equipped creature gets +3/+3 and has "At the beginning of your upkeep, sacrifice this creature unless you pay its mana cost."
#That's actually a really clever design. But the drawback isn't really one, because the Germ doesn't have a mana cost associated with it.
Fool of Calamunsia 3B
Enchantment (Rare)
At the beginning of each player's draw step, that player loses 1 life for each card in his or her graveyard.
Elgaud Relic 4
Artifact (Uncommon)
XX, T: Prevent the next X damage that would be dealt to target creature or player this turn.
#Not bad, not bad. I mean, bad card, but good use of X.
Fire Dragon 5RR
Creature - Dragon (Uncommon)
Flying
3R: Fire Dragon deals 1 damage to each creature and each player.
4/4
#Sometimes the cards are perfectly ordinary...
Dictate of the Hunt 4GG
Creature - Hound (Uncommon)
While you're searching for your last turn, transform Dictate of the Hunt.
5/5
// //
Stabeso Pulse
Sorcery (Rare)
Stabeso Pulse deals 11 damage to each creature and each player.
#... but other times I get some really bizarre cards.
Tame the Earth 3U
Instant (Common)
Detain target player's hands.
#ROFL!!!! See what I mean? Every so often I get weird stuff that I've never seen before.
Felidate 5B
Sorcery (Rare)
Target player gains control of Felidate unless he or she pays 4 life.
#And again!
...
Okay, something's not quite right. If I zero out the contents of the stacks at every time step, I get exactly the same cards for the same random seed. And yet the contents of the stack are changing. It's as if the network writes to the stack, but it doesn't bother to read from it. So yes, something definitely went wrong with that run.
"You can write?"
"Oh yes, writing is easy! One day we hope to learn to read too!"
So it writes everything down, but the writings are just unintelligible scribbles. That suggests a problem with the way that I trained it, or a problem with the wiring. Well, at least that gets me closer to figuring out how to make it work.
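Here's the test that exposed it, written up as a reusable check. model.sample and its zero_stacks flag are hypothetical names for my sampling entry point:

    def network_reads_stack(model, prime_text, seed=42):
        # Sample twice with the same seed, once normally and once with the
        # stack zeroed at every step. Identical output means the read path
        # is effectively dead.
        normal = model.sample(prime_text, seed=seed, zero_stacks=False)
        ablated = model.sample(prime_text, seed=seed, zero_stacks=True)
        return normal != ablated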
EDIT: I think I know part of the reason. I made the short-term memory strong enough that it didn't need to rely on the permanent storage for most things, so it learned to live without it, even though it would be better off with using the storage. So I need to tweak some parameters more. Ugh, I am so close to having this working, I know it. It's a very careful balancing act.
Mind's Eater 1
Artifact - Equipment (Common)
Equip 1
Living Weapon (When this Equipment enters the battlefield, put a 0/0 black Germ creature token onto the battlefield, then attach this to it.)
Equipped creature gets +3/+3 and has "At the beginning of your upkeep, sacrifice this creature unless you pay its mana cost."
#That's actually a really clever design. But the drawback isn't really one, because the Germ doesn't have a mana cost associated with it.
The drawback comes into play once you equip the artifact onto something other than the Germ it came with. You still get a 3/3 black critter for 1, though, which is a bit much!
Actually, yeah, that's a really good point! Nonexistent costs can't be paid, so the Germ dies without having ever gotten to attack. Still a pretty strong card, I think, but more reasonable than it first appears.
TL;DR: I'm working on the data structure augmentation thing. Hopefully, I'll have better results to share soon.
I didn't get the chance to improve upon the stack-augmented network last night (my bf and I were celebrating our 5th anniversary; wow, 5 years already, crazy). But it'll definitely be on my mind today.
In some ways, I should have foreseen the issue, if only because neural nets are the embodiment of lazy inaction, and asking them to come up with a policy to use a data structure is a mentally taxing thing that they will avoid if at all possible.
From the literature, I know it can be done, but let's face it: a lot of these papers only really considered simple cases. For example, let's say you have strings of characters of the form A^nB^n, where n is some integer (i.e. you have a series of A's followed by a series of B's, and the number of A's and the number of B's have to match). The network is asked to decide whether the string is a valid one or an invalid one (a string with 3 A's and 5 B's would be invalid, for example). A small LSTM network can manage this, because it can learn to count (just as the network computes the CMC of our Magic cards). The problem is that as the inputs get longer and longer, it starts to second guess itself (i.e. "Did I see 12 A's or did I see 11 A's? Because I have 11 B's right here.").
If you give the LSTM a stack, it probably avoids using it until it starts getting into situations where it has to use it to get the right answer, but once it picks up on the usefulness of the device, it rapidly comes up with an effective policy for using it. The authors of the stack-RNN paper show that the network learns to push A's onto the stack and then remove them when it sees B's. If the stack is empty and you still have B's, or if the stack has A's and you've run out of B's, then you know that the string is invalid. The network can come up with this solution all on its own.
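That learned policy, written out by hand, looks like this (a plain counter would do for a^n b^n, but the stack version is what generalizes to harder matching problems):

    def valid_anbn(s):
        # Push a marker for every 'a', pop one for every 'b'; reject on
        # underflow, leftovers, or an 'a' appearing after the first 'b'.
        stack, seen_b = [], False
        for ch in s:
            if ch == "a":
                if seen_b:
                    return False  # 'a' after 'b' breaks the a^n b^n shape
                stack.append(ch)
            elif ch == "b":
                seen_b = True
                if not stack:
                    return False  # more b's than a's
                stack.pop()
            else:
                return False
        return not stack  # leftover a's mean the counts don't match

    assert valid_anbn("aaabbb") and not valid_anbn("aaabbbbb")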
So, how can we get the network to make use of these data structures for more complex problems? It's abundantly clear that it would be helpful. For example, if the network pushed the symbol K onto the stack when it sees the kicker ability, then it can keep it there as a reminder that it needs to add an "if kicked" clause; it wouldn't forget to add it, because at every time step it sees the K and thinks "oh, that's right, I have to fit a kicker clause in here somewhere".
The problem is that the network is good enough at generating Magic cards without the aid of permanent storage that it's going to be stubborn about using it. What we saw last night was that the network shut off the connection that allowed it to read from the stack because unless you learn to use it properly, the output is just noise. It kept writing nonsense to the stack because that can't harm you unless you also try to read your own writing. That's why when I traded out the data structure for a blank one, the network didn't even notice the change (it generated the exact same cards without skipping a beat).
So now we just need to figure out how we can train this network properly. On one hand, I need to give it plenty of neurons because those enable the system to do the kind of complex computations and reasoning needed to go from red to creature to dragon to big, flying, and firebreathing. On the other hand, those same neurons can also be used as short-term storage units that compete with the long-term permanent storage. When I showed you that graph of the neuron activations a while back, you saw that half of the cells glowed with a dim light and hummed continuously; all of those cells are dedicated primarily to storing and filtering data. But that's part of the beauty of these RNN units: they can do both computation and storage; trying to change that would be like trying to make water not wet.
I can see several solutions:
* Identify the precise number of neurons needed to make Magic cards, not including storage needs. That forces the network to turn to the stack(s) for help. That's easier said than done though.
* Using dropout on the network could make the reliability of permanent storage more attractive. For those of you that don't know, dropout is where we cause a certain number of neurons to go silent at each time step during training. This is a powerful regularization technique (there's a quick sketch after this list). It might also help us to get the network hooked on using a data structure early on.
* Force the network to read from the stack at every timestep by preventing it from closing off the connection. That's a bit drastic and it might harm the performance of the network. I am certain that some mission-critical neurons are deliberately insulated from the stack so that the noise can't interfere with their computations; some neurons will want to read from the stack and others will want to ignore it.
* That Neural Turing Machine paper that I mentioned a while back said that, for the problems they tested on, a Turing Machine controlled by a plain, non-recurrent neural network did better than an RNN-controlled machine. That is, a network with no capacity for memory learned to use the data structure more quickly and effectively. Now, at the time I said "¿Por qué no los dos?" (why not both?) with regards to short-term and long-term memory. After all, human brains are able to work out a compromise just fine. I'm still of that opinion, but I now understand the problems they were having.
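On the dropout option, the mechanics are simple enough to sketch. The hope is that if individual neurons are unreliable, the always-available external stack becomes a more attractive home for long-lived state:

    import numpy as np

    def apply_dropout(activations, p_drop, rng, training=True):
        # Inverted dropout: silence a random fraction of units during
        # training and rescale the survivors, so nothing changes at test time.
        if not training or p_drop == 0.0:
            return activations
        keep = (rng.random(activations.shape) >= p_drop).astype(activations.dtype)
        return activations * keep / (1.0 - p_drop)

    # usage: h = apply_dropout(h, 0.5, np.random.default_rng(0))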
I won't have much time to experiment with that today, however. I have a submission to a research journal that I need to finish peer reviewing. However, it's definitely on the agenda. Getting some answers to these questions would benefit my dissertation research as well as the Magic project, lol.
Also, I'm still working on getting a good, high resolution Starry Night Jace image. When I generated a small image with the DeepDream network, the brush strokes were much larger, warmer, and more intimate. Right now the VGG network is giving me problems because it latches onto the cold of the night, and that produces a very dark and austere image. Not quite the effect I'm going for. But I'll get it figured out.
---
EDIT: For the record, I ran an experiment this morning. 1-layer LSTM, 256 wide, with 4 stacks on the side, each 100 deep. If I reset the stacks during sampling, I get different outputs, so that means the network is in fact reading from the stacks this time. But that network is 1/4th the size that I usually train with. The results aren't as good as a bigger network, even with whatever benefit the stacks offer. But my hypothesis was correct: a less capable network will lean on the data structures to make up for the deficit. Now we just need to figure out how to take a bigger network and get it to actually use the stacks so that it can improve upon its performance even further.
Tame the Earth
3U
Instant (Common)
Detain target player's hands.
#ROFL!!!! See what I mean? Every so often I get weird stuff that I've never seen before.
This actually has value. Any ability on cards that you activate while they are in your hand would be detained. So things like Cycling, arguably Morph, Forecast, and similar abilities can't be used.
Tame the Earth 3U
Instant (Common)
Detain target player's hands.
#ROFL!!!! See what I mean? Every so often I get weird stuff that I've never seen before.
That almost works in SWLCG. If it said capture... but you'd also need to know what objective to capture under.
Hmm, I wonder how well this would work on some of FFG's various LCGs. Probably pretty badly on Netrunner, but it might do okay on the ones with less abstraction.
This actually has value. Any ability on cards that you activate while they are in your hand would be detained. So things like Cycling, arguably Morph, Forecast, and similar abilities can't be used.
The rules would have to be updated to allow you to detain non-permanents. Cards in hand aren't permanents, are they? Even if they are, it's the hand being targeted, so you would need to either change the card to say "permanents in that player's hand," or update Detain to spell out what it does when detaining a player's hand.
* Force the network to read from the stack at every timestep by preventing it from closing off the connection. That's a bit drastic and it might harm the performance of the network. I am certain that some mission-critical neurons are deliberately insulated from the stack so that the noise can't interfere with their computations; some neurons will want to read from the stack and others will want to ignore it.
It seems to me that reading the stack every time step is unnecessary if the aim is to get the RNN "addicted" to using the stack. You could have an iterative timing function that forced the network to read very frequently from the stack early on in the run and progressively less so as time goes on. As the network keeps pulling data off the stack to use in card creation, that data is going to end up in the short-term memory anyway, so the short-term/long-term issues should be relieved as the run progresses. If my theory is correct (and it may not be, this isn't my field), then that would provide the benefits of selectively blanking neurons without the associated data loss.
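If I follow, that amounts to keeping a decaying floor under the stack-read gate, something like this (my reading of the suggestion, untested):

    def effective_read_gate(learned_gate, step, warmup_steps=10000):
        # Early in training the floor pins the gate near 1 (reads forced);
        # it decays linearly so that after warmup_steps the learned gate
        # is fully in charge.
        floor = max(0.0, 1.0 - step / warmup_steps)
        return max(learned_gate, floor)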
Yes, I thought about that. You had mentioned her before.
You're absolutely right. Arbeitsblindheit is definitely something to watch out for. Part of the problem is that the network is adaptive and changes over time. The code and the behavior of the code are largely independent. I can only judge the network based on the outputs it produces.
Also, I like the article! Thanks for sharing.
---
Good news is that I know that I fixed the slowness issue. I can train these networks fairly quickly now. The original code expected inputs like "7 + 5 = 13", where you have a left-hand side that you get as input and a right-hand side that you have to predict. The problem I had was that I fed it the whole card as input and it had to predict the whole card as output, which is not what I wanted. That doubled the number of timesteps, which slowed everything down and put a strain on memory. That's why I thought the implementation was inefficient. Fortunately that's resolved now.
Anyway, for my test run of the stack augmented network, loss went down and then it started going back up... but that it went down at first is a good sign. I'm going to try doing a slightly larger test on the big machine this morning to get a better feel for what's going on. I need to try tweaking some parameters.
As a sidenote, I tried applying that woodcut art to Liliana of the Veil. Results were decent, though I think the effect works best when you have off-camera lighting (especially for the recoloring). The algorithm left very soft parts of the image untouched like the energy from her hands.
One good art result though. Living Tsunami combined with The Great Wave off Kanagawa by 19th century Japanese artist Hokusai is a beautiful combination. It helps that the two artworks have the same subject, haha. The only problem I had was that the Living Tsunami artwork that I found online was rather small, so I decided to upscale the image and try again (letting that run now).
EDIT: I uploaded a larger version. The image was randomly primed, so you do see some noise like the seaspray in the upper right corner (and it missed a spot in the background), but I love seeing all the rich colors.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
Looking forward to the results from the larger stack test... is there any way of sampling the output yet though?
Actually yes, we've considered that exact idea. There's a lot of research out there now about bi-directional recurrent networks, where they read the same input in two directions. That way you can reason about the likelihood that dragon implies a firebreathing ability and the likelihood that a firebreathing ability implies that the creature is a dragon.
Sure, I can rerun Living Tsunami for you later. But I can't make the big version of the image much larger than I already have without straining the system due to the way the algorithm is implemented (my machine could probably do arbitrarily large images if the algorithm was implemented in a different way. If the size I make it isn't quite to your liking, you can use waifu2x or some other smart upscaler to get a higher res image. And is there anything you want tweaked? More Hokusai, less Matt Cavotta? Anything like that?
And yeah, I can look into getting a better Starry Night Jace for you. I just need to play with the parameters some to get the desired effect.
Well, not yet, but I'd just have to modify the sampling script to accomplish that. That shouldn't be too hard to do. The I/O handling for the network will be mostly the same.
---
So I ran a single stack, depth 200, with a single layer LSTM with 512 cells. I'm looking at the numbers now. The training loss went from 4.6, to 3.0.. 1.3.. 0.8.. 0.4.. 0.3 (about where we normally stop).. 0.2.. 0.1.. 0.05.. 0.01..
Then losses actually went ever-so-slightly negative, which I think is due to the fact its losses are so low that the measurement I'm using can't accurately reflect it. That's the point at which everything stabilizes. I'm seeing rapid convergence after just one epoch, that is, one pass over all the input.
I don't want to get everyone's hopes up just yet, because who knows, there might be an issue with how I'm reporting the numbers, or we might actually be seeing overfitting (which we can compensate for). But what I can say is that it's definitely learning something from the input, and it learns it incredibly fast. But we'll have to sample the network in order to get a clearer picture of what's really going on.
EDIT: I will need to look into some stuff for the sampling script. I ran into an issue because the data structure augmentation part uses the batch size value directly. If I train a network with a batch size of 50, I have to sample it by passing it 50 inputs (49 of which I don't care about, so that's not very efficient, and that complicates the sampling script). I'll get it figured out. There's a workaround, I'm sure of it.
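The workaround I have in mind is dumb but simple (the helper and method names below are invented, not the real API): tile the one input I actually care about across the whole batch and throw the other 49 rows away.

    import numpy as np

    BATCH = 50                          # the batch size baked into the stacks
    x = one_hot(next_char)              # assumed encoder helper, shape (vocab,)
    x_batch = np.tile(x, (BATCH, 1))    # copy the same input 50 times
    probs = model.step(x_batch)[0]      # assumed API; keep row 0, discard 49

Wasteful, but it would unblock sampling until I restructure things properly.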
EDIT(2): Getting closer. I'll have it figured out. It's just that the structure of the network doesn't quite match up with how I do the sampling script, but it can be made to work.
By the way, earlier I had some fun making a mockup of what a synthesis of all the stuff we've been working with will look like. I think it's fairly close to the quality that I'm expecting, since we'd be turning over artistic direction to the machine, and it might not always pick the best flavor text or art combo, lol.
---
I also changed the network code up a little bit. Originally, they used a compressed encoding of inputs to save space, but I really wanted a plain one-hot encoding... so I replaced it. I also moved all the encoding machinery inside the network code rather than leaving it on the outside (like we do with the mtg-rnn code), so I don't have to worry about it when I'm writing the sampling script.
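The replacement is nothing fancy; something like this (variable names illustrative, with corpus standing in for all the encoded card text):

    import numpy as np

    vocab = sorted(set(corpus))                       # every character we ever see
    char_to_ix = {c: i for i, c in enumerate(vocab)}

    def one_hot(c):
        v = np.zeros(len(vocab), dtype=np.float32)
        v[char_to_ix[c]] = 1.0                        # exactly one active entry
        return v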
I'm rerunning everything and I'm seeing convergence at a healthy and predictable rate. Not sure how low we'll be able to go, but I have high hopes.
I'm waiting to see how this test run goes before revisiting the sampling script issue, which should now be easier to take care of. I'll let you know what happens.
By the way, in my recent experiments for my dissertation research, I've noticed that for certain classes of problems I get much better performance with a GRU architecture than an LSTM architecture (see example graph). Now, that's without the added bias terms for the LSTM network or anything like that, so it's possible that they'd be much closer in performance with some minor modifications, but I just found that to be interesting. I'm looking into that.
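For anyone curious, the comparison itself is cheap to set up, which is part of why I noticed this at all (PyTorch-style sketch; the sizes are arbitrary):

    import torch.nn as nn

    lstm = nn.LSTM(input_size=128, hidden_size=512)
    gru  = nn.GRU(input_size=128, hidden_size=512)

    # A GRU has 3 gate blocks to the LSTM's 4, so at the same width it carries
    # roughly 25% fewer recurrent parameters per layer.
    print(sum(p.numel() for p in lstm.parameters()))
    print(sum(p.numel() for p in gru.parameters()))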
Of course, if I can get the stack-augmented network working, I should be able to get results much better than any of those three you see in the graph.
EDIT: By the way, as a shameless plug (it's related, I swear), some of my fellow researchers recently put out a paper entitled "All Your Voices Are Belong to Us: Stealing Voices to Fool Humans and Machines" (see here), and I'm both happy for their success and frightened by the disturbing possibilities.
Just as we can imitate the style of artwork or the text of Magic cards, we can also put these sorts of powers to more nefarious purposes: in this case, imitating a person's voice to bypass voice biometrics systems and to fool other people. Seriously, test subjects struggled to distinguish a recording of Morgan Freeman from someone else whose voice had been morphed to sound like him.
The good news is that Morgan-Freeman-Bot will narrate our lives for us in the near future. The bad news is that anyone with a grudge could hypothetically steal your voice and your handwriting and ruin your life in any number of ways.
EDIT(2): The good news is that I can sample the network. But the first one I trained ended up with a loss that was a bit too high for my tastes. Example outputs:
Spirit of the Sentinel
6
Creature - Spirit (Common)
Flying
When Spirit of the Sentinel enters the battlefield, you may sacrifice two lands that share a color with the top five cards of your library.
3/3
Whictonotig Stal
1B
Instant (Common)
Gain control of target creature. It's still a colorless heart of U.
But that's okay, I didn't expect this first one to turn out perfectly, given that it's actually smaller than the ones I've been working with, and I have plenty of parameters to tune. I'm going to try making some changes and see what I get. The good news is that I did some diagnostics on the stack and it's definitely using it to store values, so that's something.
---
I know. There's a lot of potential there.
One to three hours at most. It would have taken at least 36 hours on the old machine. It's fantastic.
---
Not a bad idea, I agree!
Oh, on a related note (for the future of medicine): researchers at USC are moving forward with human trials for that neural repeater I mentioned a few months ago (not sure if you were around for that). It does for the human brain what data structure augmentation does for my networks: it keeps thoughts from escaping you, which is good for people with illnesses like Alzheimer's disease.
Of course, I could honestly see this as a stepping stone towards horcruxes for the wealthy and powerful, and that's a bioethical conundrum in and of itself. The future is shaping up to be very weird...
---
I started a 2-layer, 2-stack network, just to have another benchmark architecture. I'll let that run for a bit and check back on it later.
---
That is the best rules text I have ever seen.
Mind's Eater
1
Artifact - Equipment (Common)
Equip 1
Living Weapon (When this Equipment enters the battlefield, put a 0/0 black Germ creature token onto the battlefield, then attach this to it.)
Equipped creature gets +3/+3 and has "At the beginning of your upkeep, sacrifice this creature unless you pay its mana cost."
#That's actually a really clever design. But the drawback isn't really one, because the Germ doesn't have a mana cost associated with it.
Fool of Calamunsia
3B
Enchantment (Rare)
At the beginning of each player's draw step, that player loses 1 life for each card in his or her graveyard.
Elgaud Relic
4
Artifact (Uncommon)
XX, T: Prevent the next X damage that would be dealt to target creature or player this turn.
#Not bad, not bad. I mean, bad card, but good use of X.
Fire Dragon
5RR
Creature - Dragon (Uncommon)
Flying
3R: Fire Dragon deals 1 damage to each creature and each player.
4/4
#Sometimes the cards are perfectly ordinary...
Dictate of the Hunt
4GG
Creature - Hound (Uncommon
While you're searching for your last turn, transform Dictate of the Hunt.
5/5
// //
Stabeso Pulse
Sorcery (Rare)
Stabeso Pulse deals 11 damage to each creature and each player.
#... but other times I get some really bizarre cards.
Tame the Earth
3U
Instant (Common)
Detain target player's hands.
#ROFL!!!! See what I mean? Every so often I get weird stuff that I've never seen before.
Felidate
5B
Sorcery (Rare)
Target player gains control of Felidate unless he or she pays 4 life.
#And again!
...
Okay, something's not quite right. If I zero out the contents of the stacks at every time step, I get exactly the same cards for the same random seed. And yet the contents of the stack are changing. It's as if the network writes to the stack but never bothers to read from it. So yes, something definitely went wrong with that run.
"You can write?"
"Oh yes, writing is easy! One day we hope to learn to read too!"
So it writes everything down, but the writings are just unintelligible scribbles. That suggests a problem with the way that I trained it, or a problem with the wiring. Well, at least that gets me closer to figuring out how to make it work.
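For reference, the check amounts to this (the model and stack API names below are made up; the real code is messier):

    import numpy as np

    def sample_cards(model, seed, steps, blank_stacks=False):
        rng = np.random.RandomState(seed)    # same seed -> same sampling noise
        state = model.init_state()           # assumed API
        chars = []
        for _ in range(steps):
            if blank_stacks:
                state.stacks[:] = 0.0        # wipe the permanent storage
            ch, state = model.step(state, rng)
            chars.append(ch)
        return "".join(chars)

    # sample_cards(net, 42, 2000) == sample_cards(net, 42, 2000, blank_stacks=True)
    # -> came out character-for-character identical, so the read path is dead.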
EDIT: I think I know part of the reason. I made the short-term memory strong enough that it didn't need to rely on the permanent storage for most things, so it learned to live without it, even though it would be better off using the storage. So I need to tweak some parameters more. Ugh, I am so close to having this working, I know it. It's a very careful balancing act.
---
I didn't get a chance to improve on the stack-augmented network last night (my bf and I were celebrating our 5th anniversary; wow, 5 years already, crazy). But it'll definitely be on my mind today.
In some ways, I should have foreseen the issue, if only because neural nets are the embodiment of lazy inaction, and asking them to come up with a policy to use a data structure is a mentally taxing thing that they will avoid if at all possible.
From the literature, I know it can be done, but let's face it: a lot of these papers only really considered simple cases. For example, let's say you have strings of characters of the form A^nB^n, where n is some integer (i.e. you have a series of A's followed by a series of B's, and the number of A's and the number of B's have to match). The network is asked to decide whether the string is a valid one or an invalid one (a string with 3 A's and 5 B's would be invalid, for example). A small LSTM network can manage this, because it can learn to count (just as the network computes the CMC of our Magic cards). The problem is that as the inputs get longer and longer, it starts to second guess itself (i.e. "Did I see 12 A's or did I see 11 A's? Because I have 11 B's right here.").
If you give the LSTM a stack, it probably avoids using it until it starts getting into situations where it has to use it to get the right answer, but once it picks up on the usefulness of the device, it rapidly comes up with an effective policy to use it. The authors of the stack-RNN paper show that the network learns to push A's onto the stack and then removes them when it sees B's. If the stack is empty and you still have B's or if the stack has A's and you've run out of B's, then you know that the string is invalid. The network can come up with this solution all on its own.
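Written out by hand, the policy the network discovers is just this:

    def valid_anbn(s):
        stack, seen_b = [], False
        for ch in s:
            if ch == 'a':
                if seen_b:            # an 'a' after the b's started: invalid
                    return False
                stack.append('a')
            else:                     # ch == 'b'
                seen_b = True
                if not stack:         # a 'b' with no 'a' left to match
                    return False
                stack.pop()
        return not stack              # leftover a's mean too few b's

    assert valid_anbn("aaabbb")
    assert not valid_anbn("aaabbbbb")

The stack never miscounts, no matter how long the string gets, and that's exactly the property the LSTM's short-term memory lacks.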
So, how can we get the network to make use of these data structures for more complex problems? It's abundantly clear that it would be helpful. For example, if the network pushed the symbol K onto the stack when it sees the kicker ability, then it could keep it there as a reminder that it needs to add an "if kicked" clause; it wouldn't forget to add it, because at every time step it sees the K and thinks "oh, that's right, I have to fit a kicker clause in here somewhere".
The problem is that the network is good enough at generating Magic cards without the aid of permanent storage that it's going to be stubborn about using it. What we saw last night was that the network shut off the connection that allowed it to read from the stack because unless you learn to use it properly, the output is just noise. It kept writing nonsense to the stack because that can't harm you unless you also try to read your own writing. That's why when I traded out the data structure for a blank one, the network didn't even notice the change (it generated the exact same cards without skipping a beat).
So now we just need to figure out how we can train this network properly. On one hand, I need to give it plenty of neurons because those enable the system to do the kind of complex computations and reasoning needed to go from red to creature to dragon to big, flying, and firebreathing. On the other hand, those same neurons can also be used as short-term storage units that compete with the long-term permanent storage. When I showed you that graph of the neuron activations awhile back, you saw that half of the cells glowed with a dim light and hummed continuously; all of those cells are dedicated primarily to storing and filtering data. But that's part of the beauty of these RNN units: they can do both computation and storage; trying to change that would like trying to make water not wet.
I can see several solutions:
* Identify the precise number of neurons needed to make Magic cards, not including storage needs. That forces the network to turn to the stack(s) for help. That's easier said than done though.
* Use dropout to make the reliability of permanent storage more attractive (see the sketch after this list). For those of you who don't know, dropout is where we silence a certain number of neurons at each time step during training. It's a powerful regularization technique, and it might also help get the network hooked on using a data structure early on.
* Force the network to read from the stack at every timestep by preventing it from closing off the connection. That's a bit drastic and it might harm the performance of the network. I am certain that some mission-critical neurons are deliberately insulated from the stack so that the noise can't interfere with their computations; some neurons will want to read from the stack and others will want to ignore it.
* That Neural Turing Machine paper that I mentioned a while back said that, for the problems they tested on, a Turing Machine controlled by a plain, non-recurrent neural network did better than an RNN-controlled machine. That is, a network with no capacity for memory of its own learned to use the data structure more quickly and effectively. Now, at the time I said "¿Por qué no los dos?" ("why not both?") with regards to short-term and long-term memory. After all, human brains are able to work out a compromise just fine. I'm still of that opinion, but now I understand the problems they were having.
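To illustrate the dropout idea from the second bullet (PyTorch-style; the parameters are arbitrary): at training time every hidden unit risks being silenced, while nothing pushed onto the stack is ever lost, so the stack becomes the only trustworthy place to park long-lived information.

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)   # each unit zeroed with probability 0.5
    h = torch.randn(1, 512)    # some hidden state during training
    h_noisy = drop(h)          # ~half the units silenced (survivors rescaled)

    drop.eval()                # at sampling time, dropout is a no-op
    h_clean = drop(h)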
I won't have much time to experiment with that today, however. I have a submission to a research journal that I need to finish peer reviewing. However, it's definitely on the agenda. Getting some answers to these questions would benefit my dissertation research as well as the Magic project, lol.
Also, I'm still working on getting a good, high resolution Starry Night Jace image. When I generated a small image with the DeepDream network, the brush strokes were much larger, warmer, and more intimate. Right now the VGG network is giving me problems because it latches onto the cold of the night, and that produces a very dark and austere image. Not quite the effect I'm going for. But I'll get it figured out.
---
EDIT: For the record, I ran an experiment this morning. 1-layer LSTM, 256 wide, with 4 stacks on the side, each 100 deep. If I reset the stacks during sampling, I get different outputs, so that means the network is in fact reading from the stacks this time. But that network is 1/4th the size that I usually train with. The results aren't as good as a bigger network, even with whatever benefit the stacks offer. But my hypothesis was correct: a less capable network will lean on the data structures to make up for the deficit. Now we just need to figure out how to take a bigger network and get it to actually use the stacks so that it can improve upon its performance even further.
---
This actually has value. Any ability on cards that you activate while they are in your hand would be detained. So things like Cycling, arguably Morph, Forecast, and similar abilities can't be used.
That almost works in SWLCG. If it said capture... but you'd also need to know what objective to capture under.
Hmm, I wonder how well this would work on some of FFG's various LCGs. Probably pretty badly on Netrunner, but it might do okay on the ones with less abstraction.
The rules would have to be updated to allow you to detain non-permanents. Cards in hand aren't permanents, are they? Even if they are, it's the hand being targeted, so you'd need to either change the card to say "permanents in that player's hand," or update Detain to spell out what it does when detaining a player's hand.
I think everyone understands what that card is meant to do, even if the rules aren't quite right on it.
It seems to me that reading the stack at every time step is unnecessary if the aim is to get the RNN "addicted" to using the stack. You could have an iterative timing function that forces the network to read very frequently from the stack early in the run and progressively less often as time goes on (rough sketch below). As the network keeps pulling data off the stack to use in card creation, that data is going to end up in the short-term memory anyway, so the short-term/long-term tension should ease as the run progresses. If my theory is correct (and it may not be; this isn't my field), that would provide the benefits of selectively blanking neurons without the associated data loss.
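Something like this, maybe (pure sketch; the gate name and decay constants are guesses on my part):

    import math

    def forced_read_prob(step, halflife=5000):
        # Probability of overriding the read gate, decaying as training goes on.
        return math.exp(-math.log(2.0) * step / halflife)

    # At each training timestep, hypothetically:
    #     if random.random() < forced_read_prob(step):
    #         read_gate = 1.0   # force the network to consume the stack top
    #     # ...otherwise leave the learned gate value alone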
Just a thought.
Wouldn't increasing the rate of weight decay make the storage cells less reliable and, in turn, the data structures more desirable in comparison?
BTW, congrats on the 5th anniversary.