Now, teaching a machine to play the game (or at least to understand it better) is possible, but a lot more work needs to be done before we can consider taking on Magic in a raw, unengineered way as we are doing with the text. For example, Google's DeepMind team has been fabulously successful at teaching a neural network to play a variety of '80s arcade games from scratch. All it sees are the buttons to press, how those buttons change the pixels on the screen, and what the current score is. That's all the network needs to know to achieve human-level competence in games like Space Invaders.
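For anyone curious about the mechanics, the rule underneath DeepMind's deep Q-network is the standard Q-learning update. This is a toy tabular sketch of that update (made-up states and actions, not DeepMind's actual code, which replaces the table with a neural network fed by pixels):

```python
# Toy tabular Q-learning update -- the rule underlying DQN, minus the
# neural network and pixel input. The states and actions are invented.
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    best_next = max(q[next_state].values())
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

q = {"start": {"left": 0.0, "right": 0.0}, "goal": {"left": 0.0, "right": 0.0}}
q_update(q, "start", "right", 1.0, "goal")  # reward 1 for reaching the goal
# q["start"]["right"] moves from 0.0 toward the discounted return: 0.5
```

All the score signal does is supply the `reward` term; everything else the agent figures out for itself.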
Cool, I imagined doing something like that several years ago, but didn't know nearly enough to try. I'm glad to learn it's been done. I knew a bit about genetic algorithms then, but I've since learned those struggle when you add too many variables. In a former job we were doing multi-objective optimisation using NSGA-II, but had to keep the number of decision variables fairly low.
From the article:
As a society, we’re comfortable with the idea of machines telling us where to eat or who to date
We are? I must have missed the memo.
I don't think I need a machine to tell me to keep eating at home
Could it be that X abilities are something learned rather late in training (because they're relatively widely separated), so adding a remember bias makes it more difficult for the network to learn it? In that case, we might want the forget gate's bias to scale up during training, so that it can learn the X abilities better (earlier) and then fixate a little more by the end of training so that we get the long, interesting abilities. This is, of course, assuming that a bias toward forgetting wouldn't totally screw up the process - something that might be good to test.
I think that what the network learned was "If there's an X in the mana cost, there's one in the body text." This is totally consistent both with it being lazy and with the results observed.
Wait, clarification question. In your experiment, did you set the bias to a constant 1? Just like the weights, the bias terms should be learned through gradient descent. The paper is suggesting to just initialize them at 1, not to keep them at 1. If you just made them 1 and didn't have them learn, that could likely explain the weirdness.
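To make the distinction concrete, here's a minimal scalar sketch (invented numbers, not the real training code) of what "initialize at 1, don't clamp at 1" means for the forget gate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical scalar forget gate: f = sigmoid(w*x + u*h + b).
# The paper's trick only concerns b's STARTING value; b remains a
# learned parameter and moves during training like any weight.
b = 1.0                # initialized to 1, not held constant at 1
pre_activation = 0.0   # pretend w*x + u*h == 0 for illustration
f = sigmoid(pre_activation + b)
# f ~= 0.73: the cell starts out biased toward remembering.

grad_b = -0.1          # whatever backprop produces for this step
b -= 0.01 * grad_b     # b is then updated by SGD, not frozen
```

If the bias were clamped to 1 forever, the gate could never learn when forgetting is actually useful, which could plausibly produce the weirdness described.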
Wait, really? I thought that the added bias was a constant deformation of that sigmoid function, since I didn't see where the bias was being initialized in the original code. *headdesk*, haha. But, no, you're absolutely right. I'll go back and look at that. You see, this is what lack of sleep can get you.
EDIT: I'll see about making that adjustment. Then we can rerun things. I'll probably still hold onto a copy of the latest network because the results are so... creative.
That interview mentions you've not been contacted by any game companies; I really do wonder why. Maybe they don't want to encourage machines that could put designers out of business in five years.
No, rather, I've been contacted by lots of companies that aren't Wizards of the Coast. lol
Cool, I imagined doing something like that several years ago, but didn't know nearly enough to try. I'm glad to learn it's been done. I knew a bit about genetic algorithms then, but I've since learned those struggle when you add too many variables. In a former job we were doing multi-objective optimisation using NSGA-II, but had to keep the number of decision variables fairly low.
Well, several years ago you would have been right, it was very difficult to achieve. Even now I'm still astounded. But you can find the source code and links to the paper and such here. All you have to do is call
./run_gpu mygame.rom
And let it get to work, haha. They tested their algorithm on 49 different Atari games.
I enjoyed the paper. It had lots of very pretty charts and graphs, like the one I've attached. Very useful visualizations.
Wrong colour and very undercosted, but nice idea. I mean, if it was noncombat damage. As it is, it does nothing. Also I love the name and art.
Forgot about putting the card on the battlefield. Also forgot to shuffle, which is uncharacteristic.
I love the concept of devotion to untapped.
BTW temperature is working but the default is 70 and the minimum that works is 1 (it complains about decimal points) so you have to change it to 1 for meaningful output and can't currently go lower.
Edit: have two more amusing cards and an awesome one.
BTW temperature is working but the default is 70 and the minimum that works is 1 (it complains about decimal points) so you have to change it to 1 for meaningful output and can't currently go lower.
Cool, haha. By the way, I think what Croxis did was let you put in the temperature multiplied by 100. So 0.70 is 70, and 0.01 is 1.
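If that's right, the conversion is just a divide-by-100 before the usual temperature-scaled softmax. A sketch of what that scaling does (the integer UI convention is my assumption from the post above):

```python
import math

# Temperature-scaled softmax, assuming the web UI stores temperature
# multiplied by 100 as an integer, so 70 means 0.70 and 1 means 0.01.
def softmax_with_temperature(logits, ui_value):
    t = ui_value / 100.0                 # convert the integer UI value back
    scaled = [l / t for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_hot = softmax_with_temperature([2.0, 1.0, 0.1], 100)  # t = 1.0
probs_cold = softmax_with_temperature([2.0, 1.0, 0.1], 20)  # t = 0.2
# Lower temperature sharpens the distribution toward the top choice,
# which is why 70 (i.e. 0.70) behaves so differently from a raw 70.
```

A raw temperature of 70 would flatten the distribution to near-uniform, which matches the garbage output people were seeing at the default.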
A few more examples from the latest network, since they're so fascinating:
Blood Slayer WW
Creature - Human Soldier (Common)
Shadow
Whenever a creature deals damage to a creature, Blood Slayer deals that much damage to each creature's controller.
1/1
Snake Griffin 2W
Creature - Griffin (Uncommon)
Flying
Champion a creature W: Snake Griffin gets +1/+1 until end of turn.
1/3
Spawn of Protection: The Warking 5R
Enchantment (Uncommon)
At the beginning of your upkeep, you may return target creature card from your graveyard to your hand.
#Another long name.
Lorestorm 1W
Enchantment (Rare)
Nonbasic lands are colorless.
Assistant of Chaos 4
Artifact (Rare)
If an opponent cast two or more spells last turn, transform Assistant of Chaos
////
Alless Market
Enchantment
At the beginning of each combat, if you control a swamp, put 2/2 green Chim creature token onto the battlefield.
Malakir Cageblode 1B
Creature - Zombie (Common)
Whenever Malakir Cageblode blocks, each player draws three cards.
3/1
Grinder Pilgrim 3W
Creature - Human Cleric (Uncommon) 1: Grinder Pilgrim becomes a 3/4 white and blue and white and blue and red Elemental creature with first strike until end of turn.
2/3
#America! Or France!
Holovor of the Dead 1W
Creature - Human Cleric
White creatures you control have haste and conspell.
1/3
#This version of the network has a propensity to create new keywords. The earlier networks did the same, but also had some problems with basic grammar that this one experiences slightly less often.
Rootwater Devil 3RRR
Creature - Dragon (Rare)
Flying
When Rootwater Devil dies, each player loses the draft.
7/7
#Yep, looks like this draft is over. Remember to turn your basic lands back in before you leave.
Yeah, I'll try to correct that issue that Tiir719 so astutely pointed out. I think we'll get better results after I make the necessary changes.
EDIT: Actually, upon inspection of the code, I did in fact add an adjustable bias and not a constant bias, but I made a mistake in the way that I initialized it. Since the training loss wasn't exploding like before, I figured it was working as intended, but I was mistaken.
Blood Slayer is awesome. It's the ultimate combat disincentive. Dude can't swing in with his 7/7 dragon because if it gets chump blocked he'll lose 7 life and he only has 6. That's just fascinating card design; it also totally makes sense in a white weenie deck where you have tons of 1/1 chump blockers. Maybe it's just chance, but this is a fantastic card. Also undercosted though.
Also, hold the phone; did it actually create a correct double-sided transform card with Assistant of Chaos??
Speaking of new keywords, how would we go and define them? Can the network even get itself to agree what a new keyword would do?
Yeah, I'll try to correct that issue that Tiir719 so astutely pointed out. I think we'll get better results after I make the necessary changes.
EDIT: Actually, upon inspection of the code, I did in fact add an adjustable bias and not a constant bias, but I made a mistake in the way that I initialized it. Since the training loss wasn't exploding like before, I figured it was working as intended, but I was mistaken.
So what exactly was going 'wrong' with this code (how were you initializing it wrong?)? Presumably you'll have to re-run training on the fixed code for another 24 hours?
Also, hold the phone; did it actually create a correct double-sided transform card with Assistant of Chaos??
Oh, yeah. The previous version does that too, but they're infrequent. Oh, and I tried priming it to get fuse cards and got a 16-sided one. For the previous network, I sometimes got quadruple fuse cards due to a problem with the way that I do the priming, but I never thought I'd see something so dense that it couldn't possibly be rendered on paper.
EDIT: And part of the reason why you didn't see any transform cards in the sets we've been producing is that we're only grabbing half of the card at a time, which is a limitation of the current approach that we are using (it could be fixed, but it's not a high priority at this point).
Speaking of new keywords, how would we go and define them? Can the network even get itself to agree what a new keyword would do?
We'd probably have to phrase them as ability words and prod the network, considering we knocked out reminder text from the input corpus. But if there's text in there describing the ability, it's possible for the network to take advantage of that. For example, you have an ability that adds charge counters to a card, and then the network is usually tempted to add another ability that can do something with those charge counters. Of course, given the network's memory span, you have to remind it what a novel ability does if you make it use it on a new card.
So what exactly was going 'wrong' with this code (how were you initializing it wrong?)? Presumably you'll have to re-run training on the fixed code for another 24 hours?
Well, there were several problems. As far as I can tell, the original implementation does not incorporate bias terms, and I was only adding one to the forget gate and not to anything else, which means that the implementation that I was using was incongruent with the one described in the paper. The second problem was that I thought in the constructor there was a field for the starting value but in reality that was a control parameter and the bias term was being randomly initialized (so they weren't all set to 1 like I wanted, but some random negative or positive value).
I'm going to double check all of the math in great detail before I recommence training, lol.
EDIT: I added in the biases to the four places that needed them. The bias on the forget gate is set to one by default. It adds about a tenth of a second per batch of extra computation. That doesn't sound like much, but I predict that will cause the training process to take an additional two hours. To compensate, I upped the batch sizes slightly so I can push more through the GPU per unit time. I started it running and I'll check back later this afternoon to make sure it doesn't crash and burn because of excess pressure on the GPU (if that happens I'll just reduce the batch size again). Projected completion time is about 20 hours from now, give or take a little.
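A quick back-of-the-envelope check on those numbers (using the figures stated above, which are themselves estimates):

```python
# +0.1 s per batch adding up to ~2 extra hours implies the run covers
# roughly 72,000 batches in total.
extra_seconds_per_batch = 0.1
extra_hours = 2
extra_seconds = extra_hours * 3600
batches = extra_seconds / extra_seconds_per_batch  # roughly 72,000
```

Upping the batch size trades that per-batch overhead against GPU memory pressure, which is why the fallback plan is simply to shrink the batches again if the run crashes.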
Let's hope I did it correctly this time. Like I mentioned earlier, upping the bias on the forget gate at the start is supposedly very important. In the paper, they say "If the bias of the forget gate is not properly initialized, we may erroneously conclude that the LSTM is incapable of learning to solve problems with long-range dependencies, which is not the case." We'll see what happens.
Hi all, in my Google Drive, in today's folder (named "150813"), there should be a file game.zip that contains a first version of the game I mentioned the other week (a simulation of a deck filled with nn-generated cards). Each player is supposed to run this bash script. Basically I'm seizing the opportunity of an internet window and selfishly asking whether other people can run it on their GNU/Linux systems or whether there are portability issues; I'll munch on any comments next week once I've caught up with the thread.
I also found the time to tweak the input.txt a bit. I now understand that even when something pushes the net in a particular direction (e.g. a signal that an "if kicked" condition should follow), if that condition isn't due to appear very soon (because it doesn't always begin with "if @ was kicked", and sometimes the net chooses not to begin with it), the net gets carried away by the momentum of the choices it's making. It's not good at steering the words toward one of its current goals (stating ", if @ was kicked," or ", unless @ was kicked,"), and as a result it frequently misses the pending postcondition. After about half of seq_length characters, it's almost certain the net won't meet the goal unless it's something easy to come by. Things that come quickly work a lot better. That's also why my repeater-string idea improved some aspects of the generated cards.
But whatever observation I make about things being correlated or not, I'm always thinking "Maybe it's because of splicing"; maybe the net would behave a lot better if it only ever saw complete coherent cards. I'm hoping someone will come up with a solution to this (or has come, since I'm late in reading the thread).
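The splicing worry is easy to see in miniature. char-rnn-style training cuts the corpus into fixed seq_length windows, so most windows start and end mid-card (the card texts below are stand-ins, not the real corpus):

```python
# Fixed-length chunking of a card corpus: almost no chunk begins at a
# real card boundary, so the net mostly trains on spliced fragments.
corpus = "|card one text|\n|card two text|\n|card three text|\n"
seq_length = 10
chunks = [corpus[i:i + seq_length] for i in range(0, len(corpus), seq_length)]

# Count chunks that begin right at a card boundary (just after "|\n"):
aligned = sum(1 for i in range(0, len(corpus), seq_length)
              if i == 0 or corpus[i - 2:i] == "|\n")
# Only the very first chunk is aligned; every other one starts mid-card.
```

A fix would be to pad or segment the input so every training window starts at a card boundary, at the cost of some wasted characters per window.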
(Sometimes an unusual pattern comes back several times across a few cards. The net missed something in one card's text, and then it shows up two cards later in the name! These nets are really fun beasts.)
Hello once again, and thank you for sharing! I agree with you on your points about the network's "steering" problem. I'm currently experimenting with some modifications to the neural architecture that might help it work better with long-term dependencies, which could help alleviate some of those issues.
And I'll be sure to give you feedback on the first version of your game. I'll probably be busy today and tomorrow but I'm sure I'll find some time this weekend.
I'm currently experimenting with some modifications to the neural architecture that might help it work better with long-term dependencies, which could help alleviate some of those issues.
I presume the modifications you mean are the biases in the 4 places you mentioned? 20 hours from now (or I guess 16, due to forum post time) I hope we'll be getting cards that can recognise X costs within rules text, at least...
Yes, the modifications are to add biases to the input transform, input gate, forget gate, and output gate, and these biases can be fine-tuned during the training process. For reference, I included a diagram that I found online of a single LSTM cell. And it's more like 18-19 hours; the edit was over an hour ago.
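In lieu of the diagram, here is a minimal scalar sketch of one LSTM cell step showing where those four biases sit (the weights are placeholders; only the bias placement is the point):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One scalar LSTM cell step with the four bias terms named above:
# input transform (b_g), input gate (b_i), forget gate (b_f), output
# gate (b_o). A single placeholder weight w stands in for all matrices.
def lstm_step(x, h, c, w, b_g, b_i, b_f, b_o):
    g = math.tanh(w * x + w * h + b_g)   # input transform (candidate values)
    i = sigmoid(w * x + w * h + b_i)     # input gate
    f = sigmoid(w * x + w * h + b_f)     # forget gate, b_f initialized to 1
    o = sigmoid(w * x + w * h + b_o)     # output gate
    c_new = f * c + i * g                # gated update of the cell state
    h_new = o * math.tanh(c_new)         # gated output
    return h_new, c_new

h, c = lstm_step(x=0.5, h=0.0, c=0.0, w=0.1,
                 b_g=0.0, b_i=0.0, b_f=1.0, b_o=0.0)
```

The forget gate is the only one whose bias gets the special starting value; the others start at zero, per the paper's recipe.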
And I have no idea what I can promise you and what I can't, haha. My hope is that I at least implemented everything correctly, so that way we can give an accurate estimation of the effects of the modifications.
What I can say is that if the cells are less likely to forget things by default, that means they're exposed to information for longer periods of time, and this can help them determine what is and isn't important to remember when it comes to long-term dependencies. Hopefully that'll include things like X costs. Even if it doesn't get the X cost right every time, if it at least gets it right more often than it currently does, I'd say that's an improvement.
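A crude way to see why the default gate value matters so much: with a roughly constant forget gate f, the fraction of cell state surviving t steps is f**t (illustrative numbers only):

```python
# State retention after t steps for a roughly constant forget gate f.
steps = 20
kept_low = 0.5 ** steps    # gate hovering near sigmoid(0) = 0.5: ~1e-6 survives
kept_high = 0.73 ** steps  # gate near sigmoid(1) ~= 0.73: ~0.2% survives
# The biased cell retains information orders of magnitude longer,
# giving gradient descent a signal to work with on long dependencies.
```
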
EDIT: I can't know for certain what's going on at the moment, but I can say that the previous network had a training loss of around 3.342 as far into the training process as we are now, and the current network is showing a training loss of 0.53, so it's definitely converging faster. Now, whether or not we actually break through the barriers we're facing with stuff like X costs remains to be seen (it's possible that we will have accelerating improvements and then we hit a roadblock and stagnate, who knows). But it is nice to know that my modifications probably didn't cripple the network this time.
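One rough way to read those loss numbers, assuming the reported training loss is average cross-entropy per character in nats (which is char-rnn's convention, but not confirmed in the thread): exp(loss) is the model's per-character perplexity, i.e. its effective number of plausible next characters.

```python
import math

# Per-character perplexity, assuming the loss is cross-entropy in nats.
perplexity_old = math.exp(3.342)  # ~28 plausible next characters
perplexity_new = math.exp(0.53)   # ~1.7 plausible next characters
```

On that reading, the new network is already close to the "knows the card format cold" regime at a point where the old one was still nearly guessing.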
BTW temperature is working but the default is 70 and the minimum that works is 1 (it complains about decimal points) so you have to change it to 1 for meaningful output and can't currently go lower.
Cool, haha. By the way, I think what Croxis did was let you put in the temperature multiplied by 100. So 0.70 is 70, and 0.01 is 1.
I originally thought this, but changing the temperature from 70 to 20 (back before it changed to be like it is now) had no noticeable effect. That's probably how it's meant to be, though. (I mean, adding a divide-by-100 would make it usable, though decimal points and a sane default would be good also.)
This network DESPERATELY needs to learn how to balance costs. That 0/0 with 6 +1/+1 counters for 3 made me LOL, even more so given its ability to turn them all into 3 1/1s for free.
So, new results came in! For this latest version of the network, I added in 4 bias units to each LSTM cell to help regulate the learning process and to hopefully make it easier for the network to pick up on long-term dependencies in cards. The last network had just one bias unit on the forget gate and the bias was randomly initialized. The current network has bias units on all the gates and the bias for the forget gate is set to one.
So, let's look at the training results by the numbers. I don't have the numbers for the last stable network, unfortunately, but we can compare the most recent two.
First is training loss over time. In the following graph, you'll see how training loss per batch (a measure of how good each network is at predicting the text of cards) changed over the epochs (passes over the input):
If you want a good laugh, I tried including the training loss of the crazy network, which you can see if you click on "madness" in the legend, but it was almost impossible to render properly, for reasons that will become clear.
As you can see, the forgetbias and allbias networks converge at roughly the same rate, but the allbias network has between 0.05 and 0.10 lower training loss on average (that's why you see so much orange over the green in the graph). I'll have to study the data more closely when I get the chance to give you a more accurate answer.
Now let's revisit our previous experiments. First, the Burning Blaze experiment. To reiterate, we use the sampling script to test the old and new versions of the network to generate a spell titled "Burning Blaze", an instant that costs RRR and whose text starts with "Burning Blaze deals X damage". We use the same sampling parameters in both cases (temperature at 0.6), the only thing we change is the network that we use.
The network without bias completed the card by adding a clause defining X in 31/66 observed instances (46.9% of the time)
The network with a randomly initialized bias on only the forget gate completed the card by adding a clause defining X in 0/63 observed instances (0.00% of the time).
The network with biases on all gates and the input transform, with the forget gate bias initialized to 1, completed the card by adding a clause defining X in 1/49 observed instances (2.04% of the time).
So that's not so good. But there's something interesting about it. All of the versions of the card produced by the network without bias look mostly the same and don't have any extra clauses attached to them. But the allbias network adds all sorts of qualifiers about the damage, like
"The damage can't be prevented"
"divided as you choose among any number of target creatures or players"
"if that creature would die this turn, exile it instead"
"that creature can't block this turn"
"that creature doesn't untap during its controller's next untap step"
And all that text correctly refers back to the targets or to the damage. The old network without bias doesn't do anything of the sort. It almost never adds those clever flourishes. So there's something going on there. It's not strictly bad. Also, if you require that there be a comma after the damage dealing clause, the allbias network defines the X 99% of the time. So it's not that it's forgetting about the X, but it's latching onto even longer dependencies that would normally follow the definition of X. That is very, very interesting.
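The tallies above come from eyeballing sampled completions; the same count can be sketched with a script along these lines (the sample texts and the defining pattern here are assumptions, not the real experiment code):

```python
import re

# Hypothetical tally in the spirit of the Burning Blaze experiment:
# count how many sampled completions go on to define X.
samples = [
    "burning blaze deals X damage to target creature, "
    "where X is the number of mountains you control.",
    "burning blaze deals X damage to target creature or player.",
    "burning blaze deals X damage divided as you choose "
    "among any number of targets.",
]
defines_x = re.compile(r"where x is", re.IGNORECASE)
hits = sum(1 for s in samples if defines_x.search(s))
rate = hits / len(samples)
# Here 1 of the 3 toy samples defines the X.
```

Conditioning the count on extra structure (like requiring a comma after the damage clause, as above) is just a matter of tightening the pattern.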
Also, there's something really interesting going on with the version of Burning Blaze with an X in the mana cost:
The network without bias completed the card by adding a clause that uses X in 34/50 observed instances (68% of the time)
The network with a randomly initialized bias on only the forget gate completed the card by adding a clause that uses X in 28/52 observed instances (53.8% of the time)
The network with biases on all gates and the input transform, with the forget gate bias initialized to 1, completed the card by adding a clause that uses X in 29/45 observed instances (64.4% of the time)
The allbias network does just about as well as the network with no bias. But there's an interesting phenomenon going on where the allbias network redefines the X with a "where X is equal" clause on 13.33% of the cards. That is so weird. When I want it to put in that definition, it rarely does so without prodding, but when I don't ask for it, it happens six times as frequently. It probably has something to do with the limitations of the priming scheme that I'm using. I'd wager that 13.33% is closer to the actual frequency with which an X gets a definition.
As for cards generated by the allbias network, they have a lot of the creative flourishes of the forgetbias network, but they're more often grammatically correct and sensible in terms of what they do. Here are some samples:
Death Charm 1B
Instant (Uncommon)
Put a 1/1 black Shade creature token with flying onto the battlefield.
Cycling 2
#I swear the networks up until this point only ever created black zombie and demon tokens. That this new network is using other creature types is very new.
Ephara, Sea Screaming Vengeance 3R
Legendary Creature - Human Warrior (Rare) 3R, sacrifice Ephara: Ephara deals 2 damage to target creature or player.
Whenever Ephara deals damage to a player, you may return target creature card from your graveyard to your hand.
3/3
#I'm detecting that the allbias network is more comfortable with longer names.
Heavy the Moon xRR
Sorcery (Uncommon)
Heavy the Moon deals X damage to each creature with flying and each player.
Staff of the Dominator 2W
Creature - Human Cleric (Rare) WW, T: Search your library for a card named Festering Mine and put it into your hand.
2/2
#That's interesting
Death Mastery BB
Enchantment (Rare) T, exile a card from your hand: Target creature gets +4/+0 until end of turn. Activate this ability only once each turn.
Foriysian Liege 3RR
Creature - Elemental (Mythic Rare)
When Foriysian Liege enters the battlefield, return a creature you control to its owner's hand.
When Foriysian Liege dies, put a 4/4 red Demon creature token with flying onto the battlefield. 5: Reveal the top four cards of your library. Put one of them into your hand and the rest into your graveyard, then draw a card. Activate this ability only any time you could cast a sorcery.
3/4
This latest network isn't perfect, but honestly I think it's better in many ways compared to our previous versions. I'll do some more in depth analysis later when I get the time, and post some more interesting cards.
EDIT: So yeah, the only reason why the old networks do so well at defining the X without prompting is because of rote memorization. This new network has better predictive power all around and is much more creative. Today we achieved a pretty solid victory.
The newest network's output looks awesome. In how many real cards do we have divided X clauses (X in two different sentences) compared to compounded (comma-separated) clauses? If the divided X clauses are considerably different from the compounded clauses, the culprit could be the training data rather than the network.
Deathmastery is awesome (even more if it would put that card in your graveyard) 10/10 would play. Staff of the dominator is interesting because there are so few cards that name specific OTHER cards in the MTG-corpus. edit: just checked Festering Mine is not a real card so you should generate it
"Kids, if you sit at the pc too long playing minecraft you'll start to fester!"
Terrific card name. Terrific results, frankly. Any chance you could upload the trained checkpoints for this network and the other ones? That way I can run some other tests (kicker, choose a colour, place a % counter, etc.) when I get home.
Also, I'm extremely curious to rerun this latest network with a corpus of Modern-only cards. Maybe then, finally, colour identity will be a bit better (e.g. Foriysian Liege, although awesome, has 'return to hand', which is very blue and not very red, and Ephara, Sea Screaming Vengeance has return-from-graveyard while being pure red, which also seems wrong).
On a scale of 1 to awesome, 11/10, would play. This is incredible.
How significant are the changes? I'm planning on working on releasing a fork of karpathy's char-rnn repo this weekend, that has the special batcher in it (and hopefully the ability to do cool things like train on exactly one card at a time). I'm thinking it might make sense to pool these improvements together and release them from a common repo. It might make more sense if you owned that repo, I don't know. The batcher modifications are fairly large in terms of diff size because of the need for horrid lua string processing, though, so if it's just a few tweaks to set the biases then I could easily add that in and provide some options to control it. Delusional madness here we come!
On another note, it would be really cool to have set generation automatically check for "card named <CARDNAME>" and then send out a query to generate <CARDNAME>. So some cards that requested it would be allowed to have a +1.
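A rough sketch of that check, in Python rather than the Lua the repo uses (the regex and helper name are mine, not anything that exists in mtg-rnn):

```python
import re

# Find "card named <CARDNAME>" references in generated card text so a
# set generator could queue a follow-up sampling query for each name.
# Hypothetical helper; the capitalization heuristic is an assumption.
NAMED_CARD = re.compile(r"card named ([A-Z][\w']*(?: [A-Z][\w']*)*)")

def referenced_names(card_text):
    return NAMED_CARD.findall(card_text)
```

Running it over Staff of the Dominator's text would pull out "Festering Mine", which could then be fed back in as a name prime.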
|shift||instant|||N||{RR^^}|@ deals &^^^^ damage to target creature or player. the damage can't be prevented.\fuse|
|chance||instant|||O||{UU^}|uncast target spell unless its controller pays {^^}.|
Did it seriously, seriously just make a legitimate fuse card by itself?? If yes, that must be some sort of breakthrough. I mean, wow.
Not overly significant, actually. I'll e-mail you a copy of the changes so you can see. As for the repo, I'm not sure that I should be in charge of it, lol, but we'll see.
The old network does that too, just infrequently. If fuse shows up at the end, it extends the card with another card, at least some of the time. Same with cards that mention the word transform.
So I guess when the paper (the one that had the bias analysis in it) said, "Importantly, adding a bias of size 1 significantly improved the performance of the LSTM on tasks where it fell behind", it wasn't kidding eh? Awesome that such a small change had a good end effect.
Is it completing fuse/transform cards more frequently now, at least? And would you be able to do a training run on the Modern-only corpus too?
Yes, "significant" did in fact mean significant. And thank you so much for finding that paper. Everyone here is in your debt.
Yes, I think it is completing the fuse/transform more frequently, but we'll need to do an analysis of a large dump of cards to confirm that. And yes, I could try the Modern-only corpus. I'm afraid though that by limiting the size of the input corpus we might run into overfitting issues. But who knows? It might be worthwhile to try.
EDIT: Here's a link to a compressed file containing the network and a modified LSTM.lua file. So if you want to sample from this network, you need to do three things. First, go into your model subdirectory and rename the current LSTM.lua file to LSTM_old.lua. Then copy the new one into that directory. Lastly, open up the sampling script you're going to use and at the end of the "require" directives, add "require model.LSTM", because I had to make a subclass of the Add (bias) module to allow me to specify a starting bias and the sampling script won't recognize the new subclass unless you do that. I'm sure there's a cleaner, less hackish way of going about this, but that's what I did for now. Let me know if you run into any issues.
EDIT(2): Oh, and yes, the results probably aren't perfect yet. I haven't tested it yet but I'm sure there are shortcomings. But as we identify the issues we can come up with more modifications to the architecture and the input corpus to fix those issues.
I had another idea for making X abilities work: on every card that uses X in its mana cost or in the cost of an ability, add "where X is the variable mana cost paid." Then every X is defined, so the network should pick up on it.
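In code, that preprocessing might look like this (Python sketch; the field positions follow the pipe-delimited dumps in this thread, with cost and body text as the last two non-empty fields, but treat the exact indices as assumptions about mtg-encode's real format):

```python
# Sketch of the proposed corpus preprocessing: if a card's cost field
# contains X, append an explicit definition of X to its rules text so
# every X in the training data is defined.
X_DEFINITION = "where X is the variable mana cost paid."

def add_x_definition(encoded_card):
    fields = encoded_card.split("|")
    cost, text = fields[-3], fields[-2]   # assumed field positions
    if "x" in cost.lower() and "where x is" not in text.lower():
        fields[-2] = text.rstrip(".") + ", " + X_DEFINITION
    return "|".join(fields)
```

Cards whose cost has no X (or that already define it) pass through unchanged.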
I'm seeing good things so far, but I'm also seeing a lot of room for improvement.
I like that we get lots of very interesting commons and uncommons, like these:
Blood Reckoning B
Instant
As an additional cost to cast Blood Reckoning, sacrifice a creature.
Blood Reckoning deals 4 damage to target creature or player.
Demon's Torment 1B
Enchantment - Aura (Uncommon)
Enchant creature
Enchanted creature gets +2/+0 and has first strike.
Transmute 1BB
They're varied and interesting.
At the same time, I'm noticing that when this new network messes up, it can do so in very weird, subtle ways:
Energy Form 1BB
Enchantment (Rare)
Whenever a player casts a spell, that player shuffles the top card of his or her library into his or her graveyard.
Jungle Ward G
Enchantment - Aura (Common)
Enchant creature
Enchanted creature has "T: Put a 1/1 red Goblin creature token named Sentinel onto the battlefield."
Wind Spawn 3GG
Creature - Beast (Rare)
Whenever Wind Spawn attacks, you may put a creature card from a graveyard onto the battlefield under your control. That creature is an island.
4/3
It also produces some downright strange flip/transform/fuse cards:
|mardu scarecrow||artifact creature||scarecrow|N|&^^^/&^^^|{^^^^^}|{^^}, sacrifice @: look at target opponent's hand.\fuse|
|talas sanctuary||enchantment||aura|O||{^^WW}|enchant creature\enchanted creature gets +&^^/+& and has "{BB}: regenerate this creature."|
|ith's charm|legendary|creature||eldrazi|Y|&^^^^^^^^^^^^^^^^/&^^^^^^^^^^^^^^^^|{^^^^^^^^^^^^^^}|at the beginning of each end step, if you control no untapped lands, flip @.|
|ithomancer|legendary|creature||eldrazi|Y|&^^^^^/&^^^^^^^^^|{^^^^^^^^^}|when @ enters the battlefield, put two &^/&^ white kithkin soldier creature tokens onto the battlefield.|
Now, there are ways that we can further improve on these results:
First, I think the best course of action is to eliminate names. Names are great for flavor, but remember that the network is being rewarded and penalized equally for all parts of the card. If you look at this graph of an older network's output activations reading the card Kemba, Kha Regent, there's a lot of activity going on in the first seventeen time steps when the network is trying to predict the name. I actually think that way too much of the network's reasoning power is being exhausted on trying to come up with believable names. Yes, it's nice when the network comes up with intelligent names like "Blightwater Baron" or "Razorstone Dragon", but we really shouldn't give that activity the same weight as, say, getting X costs correct. It's also a problem because network units dedicated to name generation don't do anything else useful for the card.
Instead I think we should train a name/flavor-generating network separately, one that takes information about an unnamed card and then comes up with an appropriate name for it.
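Dropping names from the corpus could be a one-line transformation on the encoded cards (Python sketch; it assumes the name is the first pipe-delimited field, as in the dumps above, and that "@" already stands in for self-references in body text):

```python
# Sketch of the "eliminate names" idea: blank the name field so the
# network spends no capacity predicting names. Field position is an
# assumption based on the encoded samples in this thread.
def strip_name(encoded_card):
    fields = encoded_card.split("|")
    if len(fields) > 1:
        fields[1] = ""   # name is the first field after the leading "|"
    return "|".join(fields)
```

A separate name/flavor network could then be trained on (card features, name) pairs instead.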
I'm also looking into some techniques that maplesmall uncovered including batch normalization, which might help improve the consistency of our results. I'll keep y'all posted.
P.S. I tried priming the name with "fuc", and I really only get variations on one word which I won't reproduce here, but they are so very hilarious. Like, that word isn't in the corpus at all, and yet the network gives me such... incredibly colorful results. (X Witch, Xmaster's Wisdom, Xer of Bant, X trap, Xing Archon, etc.) EDIT: And the best, "Xing Life", a black enchantment that harms all players whenever anyone tries to cast a spell.
Wait, really? I thought that the added bias was a constant deformation of that sigmoid function, since I didn't see where the bias was being initialized in the original code. *headdesk*, haha. But, no, you're absolutely right. I'll go back and look at that. You see, this is what lack of sleep can get you.
EDIT: I'll see about making that adjustment. Then we can rerun things. I'll probably still hold onto a copy of the latest network because the results are so... creative.
No, rather, I've been contacted by lots of companies that aren't Wizards of the Coast. lol
Well, several years ago you would have been right; it was very difficult to achieve. Even now I'm still astounded. But you can find the source code and links to the paper and such here. All you have to do is call it and let it get to work, haha. They tested their algorithm on 49 different Atari games.
I enjoyed the paper. It had lots of very pretty charts and graphs, like the one I've attached. Very useful visualizations.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
Wrong colour and very undercosted, but nice idea. I mean, if it was noncombat damage. As it is, it does nothing. Also I love the name and art.
Forgot about putting the card on the battlefield. Also forgot to shuffle, which is uncharacteristic.
I love the concept of devotion to untapped.
BTW, temperature is working, but the default is 70 and the minimum that works is 1 (it complains about decimal points), so you have to change it to 1 for meaningful output and can't currently go lower.
Edit: have two more amusing cards and an awesome one.
Cool, haha. By the way, I think what Croxis did was let you put in the temperature multiplied by 100. So 0.70 is 70, and 0.01 is 1.
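So a UI that only accepts integers can just divide by 100 before sampling. A minimal Python sketch of that mapping plus standard temperature sampling (the function names are mine, not Croxis's actual code):

```python
import math
import random

def ui_to_temperature(ui_value):
    """Map the integer the UI accepts (e.g. 70) to the sampler's
    temperature (e.g. 0.70). Hypothetical helper matching the
    workaround described above."""
    return ui_value / 100.0

def sample_with_temperature(logits, ui_temp=70):
    """Sample an index from `logits` after temperature scaling:
    lower temperature sharpens the distribution toward the argmax."""
    t = ui_to_temperature(ui_temp)
    scaled = [l / t for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    r = random.random()
    acc = 0.0
    for idx, e in enumerate(exps):
        acc += e / total
        if r < acc:
            return idx
    return len(exps) - 1
```

With ui_temp=1 (temperature 0.01) sampling is effectively greedy, which matches the "meaningful output at 1" observation.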
A few more examples from the latest network, since they're so fascinating:
Blood Slayer
WW
Creature - Human Soldier (Common)
Shadow
Whenever a creature deals damage to a creature, Blood Slayer deals that much damage to each creature's controller.
1/1
Snake Griffin
2W
Creature - Griffin (Uncommon)
Flying
Champion a creature
W: Snake Griffin gets +1/+1 until end of turn.
1/3
Spawn of Protection: The Warking
5R
Enchantment (Uncommon)
At the beginning of your upkeep, you may return target creature card from your graveyard to your hand.
#Another long name.
Lorestorm
1W
Enchantment (Rare)
Nonbasic lands are colorless.
Assistant of Chaos
4
Artifact (Rare)
If an opponent cast two or more spells last turn, transform Assistant of Chaos
////
Alless Market
Enchantment
At the beginning of each combat, if you control a swamp, put 2/2 green Chim creature token onto the battlefield.
Malakir Cageblode
1B
Creature - Zombie (Common)
Whenever Malakir Cageblode blocks, each player draws three cards.
3/1
Grinder Pilgrim
3W
Creature - Human Cleric (Uncommon)
1: Grinder Pilgrim becomes a 3/4 white and blue and white and blue and red Elemental creature with first strike until end of turn.
2/3
#America! Or France!
Holovor of the Dead
1W
Creature - Human Cleric
White creatures you control have haste and conspell.
1/3
#This version of the network has a propensity to create new keywords. The earlier networks did the same, but also had some problems with basic grammar that this one experiences slightly less often.
Rootwater Devil
3RRR
Creature - Dragon (Rare)
Flying
When Rootwater Devil dies, each player loses the draft.
7/7
#Yep, looks like this draft is over. Remember to turn your basic lands back in before you leave.
Yeah, I'll try to correct that issue that Tiir719 so astutely pointed out. I think we'll get better results after I make the necessary changes.
EDIT: Actually, upon inspection of the code, I did in fact add an adjustable bias and not a constant bias, but I made a mistake in the way that I initialized it. Since the training loss wasn't exploding like before, I figured it was working as intended, but I was mistaken.
Blood Slayer is awesome. It's the ultimate combat disincentive. Dude can't swing in with his 7/7 dragon because if it gets chump blocked he'll lose 7 life and he only has 6. That's just fascinating card design; it also totally makes sense in a white weenie deck where you have tons of 1/1 chump blockers. Maybe it's just chance, but this is a fantastic card. Also undercosted though.
Also, hold the phone; did it actually create a correct double-sided transform card with Assistant of Chaos??
Speaking of new keywords, how would we go about defining them? Can the network even get itself to agree on what a new keyword does?
So what exactly was going 'wrong' with this code (how were you initializing it wrong?)? Presumably you'll have to re-run training on the fixed code for another 24 hours?
Oh, yeah. The previous version does that too, but they're infrequent. Oh, and I tried priming it to get fuse cards and got a 16-sided one. For the previous network, I sometimes got quadruple fuse cards due to a problem with the way that I do the priming, but I never thought I'd see something so dense that it couldn't possibly be rendered on paper.
EDIT: And part of the reason why you didn't see any transform cards in the sets we've been producing is that we're only grabbing half of the card at a time, which is a limitation of the current approach that we are using (it could be fixed, but it's not a high priority at this point).
We'd probably have to phrase them as ability words and prod the network, considering we knocked reminder text out of the input corpus. But if there's text in there describing the ability, it's possible for the network to take advantage of it. For example, if you have an ability that adds charge counters to a card, the network is usually tempted to add another ability that does something with those charge counters. Of course, given the network's memory span, you have to remind it what a novel ability does if you make it use it on a new card.
Well, there were several problems. As far as I can tell, the original implementation does not incorporate bias terms, and I was only adding one to the forget gate and not to anything else, which means that the implementation I was using was incongruent with the one described in the paper. The second problem was that I thought the constructor had a field for the starting value, but in reality that was a control parameter, and the bias term was being randomly initialized (so the biases weren't all set to 1 like I wanted, but to some random negative or positive value).
I'm going to double check all of the math in great detail before I recommence training, lol.
EDIT: I added in the biases to the four places that needed them. The bias on the forget gate is set to one by default. It adds about a tenth of a second per batch of extra computation. That doesn't sound like much, but I predict that will cause the training process to take an additional two hours. To compensate, I upped the batch sizes slightly so I can push more through the GPU per unit time. I started it running and I'll check back later this afternoon to make sure it doesn't crash and burn because of excess pressure on the GPU (if that happens I'll just reduce the batch size again). Projected completion time is about 20 hours from now, give or take a little.
Let's hope I did it correctly this time. Like I mentioned earlier, upping the bias on the forget gate at the start is supposedly very important. In the paper, they say "If the bias of the forget gate is not properly initialized, we may erroneously conclude that the LSTM is incapable of learning to solve problems with long-range dependencies, which is not the case." We'll see what happens.
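For anyone following along, here's roughly what the fixed initialization amounts to, as a standalone Python/NumPy sketch rather than the actual char-rnn Lua (the weight scale and parameter names are invented; the point is the four per-gate biases with the forget-gate bias starting at 1):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm_params(input_size, hidden_size, forget_bias=1.0, seed=0):
    """One LSTM cell with a learnable bias on the input transform and
    on the input, forget, and output gates; the forget-gate bias is
    initialized to `forget_bias` (1 by default) instead of randomly."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("i", "f", "o", "g"):   # gates + input transform (g)
        p["W_" + gate] = rng.normal(0.0, 0.08, size=(hidden_size, input_size))
        p["U_" + gate] = rng.normal(0.0, 0.08, size=(hidden_size, hidden_size))
        p["b_" + gate] = np.zeros(hidden_size)
    p["b_f"] += forget_bias             # the one-line change under discussion
    return p

def lstm_step(p, x, h_prev, c_prev):
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_prev + p["b_i"])  # input gate
    f = sigmoid(p["W_f"] @ x + p["U_f"] @ h_prev + p["b_f"])  # forget gate
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ h_prev + p["b_o"])  # output gate
    g = np.tanh(p["W_g"] @ x + p["U_g"] @ h_prev + p["b_g"])  # input transform
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c
```

All four biases remain trainable parameters after this; only their starting values differ.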
Hello once again, and thank you for sharing! I agree with you on your points about the network's "steering" problem. I'm currently experimenting with some modifications to the neural architecture that might help it work better with long-term dependencies, which could help alleviate some of those issues.
And I'll be sure to give you feedback on the first version of your game. I'll probably be busy today and tomorrow but I'm sure I'll find some time this weekend.
Yes, the modifications are to add biases to the input transform, input gate, forget gate, and output gate, and these biases can be fine-tuned during the training process. For the sake of reference, I included a diagram that I found online of a single LSTM cell. And it's more like 18-19 hours now; the edit was over an hour ago.
And I have no idea what I can promise you and what I can't, haha. My hope is that I at least implemented everything correctly, so that way we can give an accurate estimation of the effects of the modifications.
What I can say is that if the cells are less likely to forget things by default, that means they're exposed to information for longer periods of time, and this can help them determine what is and isn't important to remember when it comes to long-term dependencies. Hopefully that'll include things like X costs. Even if it doesn't get the X cost right every time, if it at least gets it right more often than it currently does, I'd say that's an improvement.
EDIT: I can't know for certain what's going on at the moment, but I can say that the previous network had a training loss of around 3.342 as far into the training process as we are now, and the current network is showing a training loss of 0.53, so it's definitely converging faster. Now, whether or not we actually break through the barriers we're facing with stuff like X costs remains to be seen (it's possible that we will have accelerating improvements and then we hit a roadblock and stagnate, who knows). But it is nice to know that my modifications probably didn't cripple the network this time.
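A back-of-the-envelope way to see why the forget bias matters: with zero input, a forget gate with bias 0 sits at sigmoid(0) = 0.5 and wipes out a memory within a handful of steps, while bias 1 gives sigmoid(1) ≈ 0.73. A toy Python calculation (an idealized illustration, not a measurement of the trained network):

```python
import math

def retention_after(steps, forget_bias):
    """Fraction of a memory-cell value surviving `steps` timesteps,
    assuming zero input so the forget gate sits at sigmoid(bias)."""
    f = 1.0 / (1.0 + math.exp(-forget_bias))
    return f ** steps
```

After 20 steps, bias 0 retains about one millionth of the original value, while bias 1 retains roughly 0.2%, a difference of three orders of magnitude in how long information stays visible.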
Yes, a swarm of 18 1/1s for 3 mana sounds legit.
So, let's look at the training results by the numbers. I don't have the numbers for the last stable network, unfortunately, but we can compare the most recent two.
First is training loss over time. In the following graph, you'll see how training loss per batch (a measure of how good each network is at predicting the text of cards) changed over the epochs (passes over the input):
https://plot.ly/~rmmilewi/1089/training-loss-over-time/
If you want a good laugh, I tried including the training loss of the crazy network, which you can see if you click on "madness" in the legend, but it was almost impossible to render properly, for reasons that will become clear.
As you can see, the forgetbias and allbias networks converge at roughly the same rate, but the allbias network has between 0.05 and 0.10 lower training loss on average (that's why you see so much orange over the green in the graph). I'll have to study the data more closely when I get the chance to give you a more accurate answer.
Now let's revisit our previous experiments. First, the Burning Blaze experiment. To reiterate, we use the sampling script to test the old and new versions of the network to generate a spell titled "Burning Blaze", an instant that costs RRR and whose text starts with "Burning Blaze deals X damage". We use the same sampling parameters in both cases (temperature at 0.6), the only thing we change is the network that we use.
The network without bias completed the card by adding a clause defining X in 31/66 observed instances (46.9% of the time).
The network with a randomly initialized bias on only the forget gate completed the card by adding a clause defining X in 0/63 observed instances (0.00% of the time).
The network with biases on all gates and the input transform, with the forget gate bias initialized to 1, completed the card by adding a clause defining X in 1/49 observed instances (2.04% of the time).
So that's not so good. But there's something interesting about it. All of the versions of the card produced by the network without bias look mostly the same and don't have any extra clauses attached to them. But the allbias network adds all sorts of qualifiers about the damage, like
"The damage can't be prevented"
"divided as you choose among any number of target creatures or players"
"if that creature would die this turn, exile it instead"
"that creature can't block this turn"
"that creature doesn't untap during its controller's next untap step"
And all that text correctly refers back to the targets or to the damage. The old network without bias doesn't do anything of the sort. It almost never adds those clever flourishes. So there's something going on there. It's not strictly bad. Also, if you require that there be a comma after the damage-dealing clause, the allbias network defines the X 99% of the time. So it's not that it's forgetting about the X; it's latching onto even longer dependencies that would normally follow the definition of X. That is very, very interesting.
Also, there's something really interesting going on with the version of Burning Blaze with an X in the mana cost:
The network without bias completed the card by adding a clause that uses X in 34/50 observed instances (68% of the time)
The network with a randomly initialized bias on only the forget gate completed the card by adding a clause that uses X in 28/52 observed instances (53.8% of the time)
The network with biases on all gates and the input transform, with the forget gate bias initialized to 1, completed the card by adding a clause defining X in 29/45 observed instances (64.44% of the time).
The allbias network does just about as well as the network with no bias. But there's an interesting phenomenon where the allbias network redefines the X with a "where X is equal" clause on 13.33% of the cards. That is so weird. When I want it to put in that definition, it rarely does so without prodding, but when I don't ask for it, it happens 6 times as frequently. It probably has something to do with the limitations of the priming scheme that I'm using. I'd wager that 13.33% is closer to the actual frequency with which an X gets a definition.
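For reproducibility, the tally itself is simple to script. A Python sketch of how the definitions could be counted over a dump of samples (the regex and the sample strings are illustrative, not actual network output):

```python
import re

def x_defined(card_text):
    """True if the text defines its X, either inline ("where X is ...")
    or with an "X is equal to" clause. Patterns are assumptions about
    how definitions are phrased, not an exhaustive list."""
    return bool(re.search(r"where x is|x is equal to", card_text.lower()))

def definition_rate(samples):
    """Fraction of sampled completions that define their X."""
    n = len(samples)
    return sum(x_defined(s) for s in samples) / n if n else 0.0
```

Running this over each network's Burning Blaze completions would reproduce the percentages above without hand-counting.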
As for cards generated by the allbias network, they have a lot of the creative flourishes of the forgetbias network, but they're more often grammatically correct and sensible in terms of what they do. Here are some samples:
Death Charm
1B
Instant (Uncommon)
Put a 1/1 black Shade creature token with flying onto the battlefield.
Cycling 2
#I swear the networks up until this point only ever created black zombie and demon tokens. This new network using other creature types is genuinely new.
Ephara, Sea Screaming Vengeance
3R
Legendary Creature - Human Warrior (Rare)
3R, sacrifice Ephara: Ephara deals 2 damage to target creature or player.
Whenever Ephara deals damage to a player, you may return target creature card from your graveyard to your hand.
3/3
#I'm detecting that the allbias network is more comfortable with longer names.
Heavy the Moon
xRR
Sorcery (Uncommon)
Heavy the Moon deals X damage to each creature with flying and each player.
Staff of the Dominator
2W
Creature - Human Cleric (Rare)
WW, T: Search your library for a card named Festering Mine and put it into your hand.
2/2
#That's interesting
Death Mastery
BB
Enchantment (Rare)
T, exile a card from your hand: Target creature gets +4/+0 until end of turn. Activate this ability only once each turn.
Foriysian Liege
3RR
Creature - Elemental (Mythic Rare)
When Foriysian Liege enters the battlefield, return a creature you control to its owner's hand.
When Foriysian Liege dies, put a 4/4 red Demon creature token with flying onto the battlefield.
5: Reveal the top four cards of your library. Put one of them into your hand and the rest into your graveyard, then draw a card. Activate this ability only any time you could cast a sorcery.
3/4
This latest network isn't perfect, but honestly I think it's better in many ways compared to our previous versions. I'll do some more in depth analysis later when I get the time, and post some more interesting cards.
EDIT: So yeah, the only reason why the old networks do so well at defining the X without prompting is because of rote memorization. This new network has better predictive power all around and is much more creative. Today we achieved a pretty solid victory.
Amazing batch of cards.
Oh god, this is too good not to share. I asked the network for cards named Festering Mine; the results varied because, depending on the card type, it sometimes adds text to the name to make it fit:
Festering Minecrafter
B
Creature - Human Wizard (Uncommon)
B, T: Target player discards a card
1/1
#Hahahaha!
Festering Mine
Land (Rare)
T: Add 1 to your mana pool.
T: Add W or B to your mana pool.
Festering Miner
5B
Creature - Zombie Giant (Common)
5/4
And yes, I'll have to look into the input issue, good point.
EDIT: Oh wow, they just keep coming. Like, it used to be that one in five cards was complete garbage, but now I want to copy and paste all of them. Here are a few more (unedited because I don't have time to clean them up and make them more readable):
|harvester boggart||creature||goblin artificer|O|&^/&^|{RR^}|when @ enters the battlefield, put two &^/&^ red goblin creature tokens onto the battlefield.|
|essence of the dissrapper||artifact|||A||{^^^^^}|at the beginning of your upkeep, you may gain &^ life.\{^^^}, T: put a % counter on @.\{^^}, T, remove a % counter from @: put a &^^^^/&^^^^ red dragon creature token with flying onto the battlefield. it has "whenever this creature deals damage to an opponent, draw a card.|
|erasura, the damned hound|legendary|creature||human warrior|A|&^^^/&^^^|{^RR^^}|{RR}: @ gets +&^/+& until end of turn.\sacrifice @: @ deals &^ damage to target creature or player. prevent all damage that would be dealt to you this turn.|
|foundry machine||artifact creature||construct|O|&^^^/&^^^|{^^^^^}|when @ enters the battlefield, destroy target artifact.|
|night stone||artifact|||N||{^^^^}|at the beginning of each player's upkeep, that player sacrifices a permanent.|
|arashin surge||sorcery|||A||{WW^WW^^}|put two &^/&^ green saproling creature tokens onto the battlefield. \flashback {GG^^}|
|thundercloud rig||artifact||equipment|N||{^^}|equip {^^}\equipped creature has "T: this creature deals &^ damage to each attacking creature.|
|dark realist||creature||human wizard|N|&^^/&^^|{^UU^}|whenever a creature enters the battlefield under your control, you may pay {BB}. if you do, draw a card.|
|order of the guildpact||creature||human knight|N|&^^/&^^|{WWWW}|protection from black\{^WW}: @ gets +&^/+& until end of turn.|
|shift||instant|||N||{RR^^}|@ deals &^^^^ damage to target creature or player. the damage can't be prevented.\fuse|
|chance||instant|||O||{UU^}|uncast target spell unless its controller pays {^^}.|
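For anyone puzzling over the dump above, here's a rough sketch of how the encoding seems to work, inferred from the samples: `^` looks like a unary digit, `&` prefixes a unary number in rules text and power/toughness, doubled letters like `RR` inside braces stand for one colored mana symbol, `@` is the card's own name, and `\` separates lines of body text. All of that is my reading of the samples, not an official spec (check the mtg-encode repo for the real one):

```python
import re

# Hypothetical decoder for the encoded-card samples above.
# Assumptions (mine, not from the repo): '^' is a unary digit,
# '&^^^' means 3, doubled letters like 'RR' in a mana cost are one
# colored symbol, and '@' stands for the card's own name.

def decode_number(s):
    """'&^^^' -> '3' (count the run of ^ after the &)."""
    return str(s.count('^'))

def decode_mana(cost):
    """'{^RR^^}' -> '3R' (generic count plus colored symbols)."""
    generic = cost.count('^')
    colors = re.findall(r'([WUBRG])\1', cost)  # doubled letter = 1 symbol
    return (str(generic) if generic else '') + ''.join(colors)

def decode_text(text, name):
    """Expand '@' and unary numbers in a body-text field."""
    text = text.replace('@', name)
    text = re.sub(r'&(\^+)', lambda m: str(len(m.group(1))), text)
    return text.replace('\\', '\n')  # '\' separates lines of rules text

print(decode_mana('{^RR^^}'))  # -> 3R
print(decode_text('@ deals &^^ damage to target creature.', 'Shift'))
# -> Shift deals 2 damage to target creature.
```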
Oh, and the network does things that it didn't do before. I made it start a card with the word human and I got a card with "humancycling", like slivercycling but for humans, rofl:
|humble harpy||creature||harpy|O|&^^/&^^|{^^BB}|humancycling {^^}, islandcycling {^^}|
That is so beautiful. To be fair, I had to say "humanc" for it to get it, but still, fascinating.
Terrific card name. Terrific results, frankly. Any chance you could upload the trained checkpoints for this network and the other ones? That way I can run some other tests (kicker, choose a colour, place a % counter, etc.) when I get home.
Also, I'm extremely curious to rerun this latest network with a corpus of Modern-only cards. Maybe then, finally, colour identity will be a bit better (e.g. Foriysian Liege, although awesome, has 'return to hand', which is very blue and not very red; Ephara, Sea Screaming Vengeance has return-from-graveyard, and she's pure red too, which also seems wrong).
Sure. I can do that in a few minutes.
On a scale of 1 to awesome, 11/10, would play. This is incredible.
How significant are the changes? I'm planning to release a fork of karpathy's char-rnn repo this weekend that has the special batcher in it (and hopefully the ability to do cool things like train on exactly one card at a time). I'm thinking it might make sense to pool these improvements together and release them from a common repo; it might make more sense if you owned that repo, I don't know. The batcher modifications are fairly large in terms of diff size because of the need for horrid Lua string processing, though, so if it's just a few tweaks to set the biases then I could easily add that in and provide some options to control it. Delusional madness, here we come!
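For the curious, the core idea of the special batcher (as I understand it) is just to cut the corpus at card boundaries instead of at fixed-length windows, so no card gets split across batches. Here's a toy sketch in Python rather than the actual Lua, with a made-up `card_batches` helper and the assumption that cards are newline-separated:

```python
# Toy sketch of a card-aligned batcher (not the real Lua code).
# Assumption: the corpus has one encoded card per line. Instead of
# slicing fixed-length windows that can cut a card in half, we group
# whole cards and pad each batch to its longest card.
def card_batches(corpus, batch_size, pad='\0'):
    cards = [c for c in corpus.split('\n') if c]
    for i in range(0, len(cards), batch_size):
        batch = cards[i:i + batch_size]
        width = max(len(c) for c in batch)
        yield [c.ljust(width, pad) for c in batch]

# Two toy "cards": the shorter one gets padded to the batch width.
print(list(card_batches('ab\ncdef\n', 2)))
```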
On another note, it would be really cool to have set generation automatically check for "card named <CARDNAME>" and then send out a query to generate <CARDNAME>. So some cards that requested it would be allowed to have a +1.
Did it seriously, seriously just make a legitimate fuse card by itself?? If yes, that must be some sort of breakthrough. I mean, wow.
Not overly significant, actually. I'll e-mail you a copy of the changes so you can see. As for the repo, I'm not sure that I should be in charge of it, lol, but we'll see.
The old network does that too, just infrequently. If fuse shows up at the end, it extends the card with another card, at least some of the time. Same with cards that mention the word transform.
Is it completing fuse/transform cards more frequently now, at least? And would you be able to do a training run on the Modern-only corpus too?
Yes, "significant" did in fact mean significant. And thank you so much for finding that paper. Everyone here is in your debt.
Yes, I think it is completing the fuse/transform cards more frequently, but we'll need to do an analysis of a large dump of cards to confirm that. And yes, I could try the Modern-only corpus. I'm afraid, though, that by limiting the size of the input corpus we might run into overfitting issues. But who knows? It might be worthwhile to try.
EDIT: Here's a link to a compressed file containing the network and a modified LSTM.lua file. So if you want to sample from this network, you need to do three things. First, go into your model subdirectory and rename the current LSTM.lua file to LSTM_old.lua. Then copy the new one into that directory. Lastly, open up the sampling script you're going to use and at the end of the "require" directives, add "require model.LSTM", because I had to make a subclass of the Add (bias) module to allow me to specify a starting bias and the sampling script won't recognize the new subclass unless you do that. I'm sure there's a cleaner, less hackish way of going about this, but that's what I did for now. Let me know if you run into any issues.
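To spell out what the modified LSTM.lua is actually doing (per the paper mentioned earlier): it only *initializes* the forget-gate bias to 1; the bias remains a learned parameter afterwards. Framework-free sketch of the idea (the [input, forget, cell, output] chunk ordering is an assumption of this sketch; different frameworks order the gates differently):

```python
# Sketch of "initialize the forget-gate bias to 1, then let it train".
# The stacked LSTM bias is a vector of length 4*H; assuming the gate
# chunks are ordered [input, forget, cell, output], we fill only the
# forget chunk before training starts. Everything stays learnable.
H = 16                      # toy hidden size
bias = [0.0] * (4 * H)      # all gate biases start at zero...
bias[H:2 * H] = [1.0] * H   # ...except the forget-gate chunk, set to 1

print(bias[H - 1], bias[H], bias[2 * H - 1], bias[2 * H])  # -> 0.0 1.0 1.0 0.0
```

The head post's clarification question is exactly this distinction: keeping the bias pinned at 1 throughout training would be a different (and likely harmful) experiment.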
EDIT(2): Oh, and yes, the results probably aren't perfect yet. I haven't tested it yet but I'm sure there are shortcomings. But as we identify the issues we can come up with more modifications to the architecture and the input corpus to fix those issues.
I like that we get lots of very interesting commons and uncommons, like these:
Blood Reckoning
B
Instant
As an additional cost to cast Blood Reckoning, sacrifice a creature.
Blood Reckoning deals 4 damage to target creature or player.
Demon's Torment
1B
Enchantment - Aura (Uncommon)
Enchant creature
Enchanted creature gets +2/+0 and has first strike.
Transmute 1BB
They're varied and interesting.
At the same time, I'm noticing that when this new network messes up, it can do so in very weird, subtle ways:
Energy Form
1BB
Enchantment (Rare)
Whenever a player casts a spell, that player shuffles the top card of his or her library into his or her graveyard.
Jungle Ward
G
Enchantment - Aura (Common)
Enchant creature
Enchanted creature has "T: Put a 1/1 red Goblin creature token named Sentinel onto the battlefield."
Wind Spawn
3GG
Creature - Beast (Rare)
Whenever Wind Spawn attacks, you may put a creature card from a graveyard onto the battlefield under your control. That creature is an island.
4/3
It also produces some downright strange flip/transform/fuse cards:
|mardu scarecrow||artifact creature||scarecrow|N|&^^^/&^^^|{^^^^^}|{^^}, sacrifice @: look at target opponent's hand.\fuse|
|talas sanctuary||enchantment||aura|O||{^^WW}|enchant creature\enchanted creature gets +&^^/+& and has "{BB}: regenerate this creature."|
|ith's charm|legendary|creature||eldrazi|Y|&^^^^^^^^^^^^^^^^/&^^^^^^^^^^^^^^^^|{^^^^^^^^^^^^^^}|at the beginning of each end step, if you control no untapped lands, flip @.|
|ithomancer|legendary|creature||eldrazi|Y|&^^^^^/&^^^^^^^^^|{^^^^^^^^^}|when @ enters the battlefield, put two &^/&^ white kithkin soldier creature tokens onto the battlefield.|
Now, there are ways that we can further improve on these results:
First, I think the best course of action is to eliminate names. Names are great for flavor, but remember that the network is being rewarded and penalized equally for all parts of the card. If you look at this graph of an older network's output activations reading the card Kemba, Kha Regent, there's a lot of activity going on in the first seventeen time steps while the network is trying to predict the name. I actually think that way too much of the network's reasoning power is being spent on trying to come up with believable names. Yes, it's nice when the network comes up with intelligent names like "Blightwater Baron" or "Razorstone Dragon", but we really shouldn't be giving that activity the same weight as, say, getting X costs correct. It's also a problem because network units dedicated to name generation don't do anything else useful for the card.
Instead I think we should train a name/flavor-generating network separately, one that takes information about an unnamed card and then comes up with an appropriate name for it.
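The preprocessing side of that proposal is cheap. Here's a minimal sketch, assuming the pipe-delimited encoding shown earlier, with the name as the first field after the leading `|` (the body text already refers to the card as `@`, so nothing else needs to change):

```python
def strip_name(card):
    """Blank out the name field of a pipe-delimited encoded card.
    Assumption: the name is the first field after the leading '|';
    body text uses '@' for self-reference, so it is unaffected."""
    fields = card.split('|')
    fields[1] = ''
    return '|'.join(fields)

card = "|night stone||artifact|||N||{^^^^}|at the beginning of each player's upkeep, that player sacrifices a permanent.|"
print(strip_name(card))
# -> |||artifact|||N||{^^^^}|at the beginning of each player's upkeep, that player sacrifices a permanent.|
```

A separate name-generating network would then train on (card features, name) pairs extracted from the same corpus before the names are stripped.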
I'm also looking into some techniques that maplesmall uncovered, including batch normalization, which might help improve the consistency of our results. I'll keep y'all posted.
P.S. I tried priming the name with "fuc", and I really only get variations on one word which I won't reproduce here, but they are so very hilarious. Like, that word isn't in the corpus at all, and yet the network gives me such... incredibly colorful results. (X Witch, Xmaster's Wisdom, Xer of Bant, X trap, Xing Archon, etc.) EDIT: And the best, "Xing Life", a black enchantment that harms all players whenever anyone tries to cast a spell.