Well, here's a big, unedited dump. Unfortunately, the Shakespeare makes it annoying (not too difficult, but annoying) to extract cards from it automatically.
It seems to just be pretty much the same as before, but a bit stupider, because the network has devoted some neurons to Shakespeare, though the names do seem a bit more creative maybe.
It was certainly a fun experiment! I did find this powerful card among them!
I'm at a satisfactory spot with it, pending any new developments that would let me train something bigger and better on the architecture I have access to. I can whip up new cards of a quality similar to the ones from my last couple posts on demand, which is good enough to keep my cube supplied for as infrequently as I get to play it!
Planeswalkers, well. Working yes, resembling anything in print, no. I get about one planeswalker or leveler per 300 cards I generate, and they always have only one loyalty ability or level up line. Four examples:
Snip
I guess this is all one can expect from common rarity planeswalkers, though...
Talcos was having similar issues with the planeswalkers their neural network was generating. A couple of workarounds for one-loyalty-ability walkers were to either re-feed them into the network and prompt it to continue the card, or to randomize the fields: "Randomizing the fields as hardcast_sixdrop has done has caused the network to relax its restrictions on card length, which means we can get planeswalkers with 3 abilities (with no special training)"
I found some nice art for this card from the batch you posted:
Probably a bit too powerful to keep in the cube, but if I toss it I will definitely recycle the art.
Currently one of the only two planeswalkers in the cube.
I wanted to use some non-bipedal art for this one, but I couldn't find any that really fit with what this guy (gal?) does.
Too much of a classic to exclude.
Probably a little underpowered, but I like it!
This is the thing you see in nightmares where you're sinking into a black ocean void.
Ah! I've been following this thread since nearly the beginning, but with so many stops and starts I've lost track of some things. I think the field randomization option is in the parameters you can set for mtg-rnn... I may give that a try when I've got a few more $$ to throw at AWS. The planeswalkers you included in this post are amazing, so if I can manage something similar to those, I'll be super happy!
I found some nice art for this card from the batch you posted:
I also saw a couple that were decent cards, except they were incredibly off-color.
That Cup actually has decent flavor.
I may give that a try when I've got a few more $$ to throw at AWS.
Could you describe your experiences with training on AWS? I've been feeling impatient with what I can manage on CPU, but I don't really have the budget for buying a powerful GPU. I assumed that running through AWS or Amazon Lambda would be cheaper than building my own because of economies of scale, but I haven't gotten a chance to look into it yet.
I highly recommend it! I fumbled about for a little while because their default Linux images don't install CUDA easily, but once I found a public AMI with CUDA preinstalled (I couldn't find the exact one referenced earlier in this thread, but "torch-ubuntu-14.04-cuda-7.0-28 (ami-c79b7eac)" worked), everything went very smoothly. Ultimately I can train a fresh network and sample from it for about $5. Then I can turn off the instance and incur no further charges until I bring it back online to sample some more.
If you're unfamiliar with EC2, I'd recommend grabbing a free tier instance to play with at first so you don't throw away money just getting the hang of how they work. I'd been using AWS for a few months for work purposes before I made my own AWS account, and even then I wasted some cash by not grabbing that AMI first thing.
Based on my current research on generating programs with neural networks, I've had a new idea about evaluating networks that generate cards. Instead of looking at the loss (which is kind of meaningless) or bringing in outside metrics (which are hard to write), we could use a sequence to sequence learning task where we prime the network with part of a card, then see how accurate it is at generating the rest of it.
For example, if we had a format where the mana cost is last (which is easy enough to arrange), we could give the neural network everything about some real card up to the mana cost, then see what mana cost it spits out. You still have to figure out how to define an accuracy metric, but it's a whole lot easier and more meaningful than trying to use the validation loss or something like that.
It doesn't have to be cost: this sort of evaluation would work for any part of the card that is determined by other parts. For example, it makes less sense from an evaluation standpoint to prime with the cost R and then compare accuracy against Lightning Bolt, because there are a lot of cards that cost R. But if we were to give it everything except the type, there are many cards for which only a few types make sense - if the card has a trigger on entering the battlefield, then clearly it can't be an instant. I think these sorts of metrics will be particularly useful because they can identify networks that know things and generate plausible cards independently of their ability to memorize lots of cards exactly. And we can sort out the overfitters separately by looking at word2vec distances.
Anyone have any ideas about tasks like this that would make sense? I think predicting costs is going to be the big one. There are more abilities that resemble costs besides the mana cost, though: I can think of kicker, suspend, echo, evoke; any others? How about prowl and ninjutsu? The thing is, some of them, like kicker, are strongly determined by the text: if the rest of the text mentions being kicked, then there had better be a kicker ability. Ideally I'd like to separate all of these out so I can put them at the end as special 'cost-like' abilities, to make the dependencies as clear as possible, and then evaluate how good the networks are at learning those dependencies.
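To make the prime-and-predict idea concrete, here's a rough harness. The '|'-separated encoding with the mana cost as the final field, and the `sample_fn` hook, are stand-ins for whatever format and sampling script you actually use, not the real mtgencode format:

```python
# Sketch of the prime-and-predict evaluation idea, assuming a hypothetical card
# encoding where fields are '|'-separated and the mana cost is the final field.
# `sample_fn` stands in for whatever samples a completion from a checkpoint.

def split_at_cost(encoded_card):
    """Split an encoded card into (prefix up to the cost field, true cost)."""
    prefix, _, true_cost = encoded_card.rpartition('|')
    return prefix + '|', true_cost

def evaluate_cost_prediction(cards, sample_fn):
    """Fraction of cards whose predicted mana cost exactly matches the real one."""
    hits = 0
    for card in cards:
        prefix, true_cost = split_at_cost(card)
        predicted = sample_fn(prefix)  # network continues from the primed prefix
        hits += (predicted.strip() == true_cost.strip())
    return hits / len(cards)
```

Exact-match accuracy is the crudest possible scorer here; anything fuzzier just means swapping out the comparison at the end.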
How fuzzy can the evaluation logic be? If you're looking only for exact hits, then mana cost will work great, but it's going to be pretty limited beyond that. If on the other hand there's a score on how close the NN got, you could do things like:
Prime a planeswalker up to the subtype line; do you get three loyalty abilities?
Similar for levelers, do you get more than one level tier?
If you prime a subtype with clear themes, do you get appropriate keywords? E.g. a Bird should fly, a Wurm should not, an Ally scores better if it has a Rally or Cohort effect...
If the NN can score well for making appropriate decisions in examples like that, even if they're not the exact ones in the test data, that'd be cool.
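One way the subtype checks above could be scored; the keyword table is purely illustrative, and a real one would presumably be derived from the card database rather than written by hand:

```python
# Hypothetical expected-keyword table: positive weight for keywords a subtype
# should have, negative for ones it shouldn't. Purely illustrative entries.
EXPECTED = {
    'bird': {'flying': +1},
    'wurm': {'flying': -1},           # a flying Wurm loses points
    'ally': {'rally': +1, 'cohort': +1},
}

def subtype_score(subtype, rules_text):
    """Score a generated card's rules text against its subtype's expectations."""
    text = rules_text.lower()
    return sum(weight for keyword, weight in EXPECTED.get(subtype, {}).items()
               if keyword in text)
```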
My only concern with this method is that if we're encouraging it to generate cards that resemble real cards, we could end up stifling its 'natural creativity' to a certain extent. Or would the temperature parameter take care of that for us?
That's the beauty of this method of evaluation. The evaluation logic can be as fuzzy as you want; all you have to do with the actual neural network is give it the input and let it output whatever it thinks should come next, and then you can evaluate that output however you want. Even for mana costs it would probably make sense to give it some kind of accuracy based on how many symbols from the real cost it got right, with some penalty if the cost is not correctly formatted or not a cost at all.
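Something like this is what I have in mind for the symbol-level scoring, assuming costs written as brace-wrapped symbols like {2}{G}; the well-formedness check and the penalty weight are arbitrary choices, and hybrid symbols like {W/U} aren't handled:

```python
import re
from collections import Counter

# Simple well-formedness check: zero or more basic symbols, nothing else.
# Deliberately ignores hybrid/phyrexian symbols; this is just a sketch.
COST_RE = re.compile(r'^(\{[WUBRGXC0-9]+\})*$')

def cost_score(predicted, actual, malformed_penalty=0.5):
    """Fraction of mana symbols the prediction got right, as multisets
    (symbol order in a cost is mostly cosmetic), with a penalty when the
    prediction isn't even a well-formed cost."""
    symbols = lambda cost: Counter(re.findall(r'\{[^}]*\}', cost))
    pred, real = symbols(predicted), symbols(actual)
    overlap = sum((pred & real).values())       # multiset intersection
    total = max(sum(real.values()), sum(pred.values()), 1)
    score = overlap / total
    if not COST_RE.match(predicted.strip()):
        score *= malformed_penalty
    return score
```

Dividing by the larger of the two symbol counts means the network also gets dinged for tacking extra symbols onto an otherwise-correct cost.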
What we do for evaluation would have absolutely no impact on the neural network's training, it would just tell us extra things about how successful that training had been. We could still sample from the network as usual and get the full natural creativity if we wanted to, and we probably would for tasks like generating a set. We could also do target sampling exactly like Talcos's sample_hs script, but with more flexibility.
I was wondering if I could make a couple requests for those of you with working networks to prime:
What comes out when you prime it for 4 color legendary creatures? I have this guy in the cube I'm collecting and was wondering what the neural networks would come up with for other color combinations.
What does the network make when you prime it with one of the following two phrases? There are 6 cards in Magic that contain "Draft @ face up" and 6 cards that contain "Reveal @ as you draft it".
That's a terrific card. The blue obviously gives scry, and green gives trample, and +1/+1 counters are generic enough that white or black both go with that. The black probably influences the creature types more than anything.
Those links you pasted link back to this thread, unfortunately. Were you looking for this and this?
Pretty much, they're supposed to link to a magiccards.info search but for some reason the form linking is messing up. Your searches are missing Æther Searcher for some reason.
Ol' Debreic was generated when Talcos was asked to generate white zombies.
Ah, whoops, I have a discrepancy between rules text and name text having the Æ and Ae characters respectively (which causes my rules-text anonymizer to not behave properly). I should be able to fix that tonight. I suspect the links bugger up whenever you have [] in the link, which is why I used tinyurl for converting my links (which also have []s) to ones that this forum will understand.
I haven't tried priming yet. My understanding of it is that it's a pretty invasive interruption of the NN's thought process, so I tend to prefer sampling large numbers of cards and filtering down if I'm looking for a specific thing.
That said, there's always a first time! I can grab a few primed samples when next I bring up my instance, before my project of retraining with random field order.
Very small update, I updated mtgencode with the latest cards and fixed the new C symbol.
Currently working on training and evaluating with my shiny new torch-rnn framework. I'll update the tutorial once everything is working.
torch-rnn tends to use a lot less memory than char-rnn, so it will probably be easier for people with mid-range graphics cards to train big networks. I don't have precise numbers, but I wouldn't be surprised if the traditional 3-layer, 512-cell networks that have produced most of the cards on here would fit comfortably on even a 1 GB GPU.
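A quick back-of-envelope count supports that: a 3-layer, 512-cell character LSTM only has a few million parameters, so the weights themselves are tiny, and the real memory cost is optimizer state and activations. The ~100-character vocabulary below is an assumption; the exact number barely moves the total.

```python
# Rough parameter count for the usual 3-layer, 512-cell character LSTM,
# assuming a ~100-character vocabulary (the exact vocab size barely matters).
def lstm_params(vocab, hidden, layers):
    total = 0
    for layer in range(layers):
        inp = vocab if layer == 0 else hidden
        # 4 gates, each with an (inp + hidden) x hidden weight matrix plus bias
        total += 4 * ((inp + hidden) * hidden + hidden)
    total += hidden * vocab + vocab  # output projection back to characters
    return total

n = lstm_params(vocab=100, hidden=512, layers=3)
print(n, n * 4 / 2**20)  # roughly 5.5M params, about 21 MB of float32 weights
```

Even tripling that for gradients and momentum leaves plenty of headroom on a 1 GB card; it's the per-timestep activations during backprop-through-time that eat most of the rest.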
Would that scale up as well? So if I could do a 3-layer, 512-cell network on my 6 GB GPU previously, would I be able to do a 3-layer 1024- or 2048-cell one now?
What comes out when you prime it for 4 color legendary creatures?
Alas, priming using specific fields doesn't seem to work for me. The inserted text goes in the wrong places. I'm guessing that sample_hs_v3.lua (which has all the priming options) isn't up to date for the format the network was trained on.
Yes, absolutely. Last time I tried to scale up, I ran into a lot of issues with CUDA crashing and halting training. Part of my current toolchain is a script that basically watches the training process, and can both restart it when it fails and change the training parameters periodically based on outside input, like some custom accuracy measure. It's really ugly, but the version I'm using for my research has managed to administrate the training curriculum from this paper for over a day now, and actually produces better accuracy than what the authors published.
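The babysitter script is basically a restart loop around the trainer. This is a minimal sketch of the idea, not the actual toolchain; the `.t7` checkpoint glob and the `-init_from` flag are assumptions about a char-rnn-style trainer:

```python
import glob
import os
import subprocess
import time

def latest_checkpoint(ckpt_dir):
    """Newest .t7 checkpoint in the directory, or None if there isn't one yet."""
    ckpts = glob.glob(os.path.join(ckpt_dir, '*.t7'))
    return max(ckpts, key=os.path.getmtime) if ckpts else None

def babysit(base_cmd, ckpt_dir, max_restarts=100):
    """Relaunch the trainer whenever it crashes, resuming from the newest
    checkpoint each time. Returns True once the trainer exits cleanly."""
    for _ in range(max_restarts):
        cmd = list(base_cmd)
        ckpt = latest_checkpoint(ckpt_dir)
        if ckpt:
            cmd += ['-init_from', ckpt]  # resume rather than start over
        if subprocess.call(cmd) == 0:
            return True  # trainer finished cleanly
        time.sleep(10)   # give the GPU a moment before retrying
    return False
```

Changing training parameters on the fly is just a matter of editing `base_cmd` between restarts based on whatever outside signal you're watching.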
That's correct, sample_hs_v3.lua expects a format that isn't used any more. You can force the encoder to produce it (I think, but I'm not sure which one it is specifically), but you'd have to retrain or find the right legacy checkpoint to sample. Going forward I think all formats will have standardized field labels just to remove that headache. Part of the new toolchain will also include a massively reworked version of the targeted sampling script, with a full python api and probably a good command line tool as well that will at least reproduce the original functionality.
Hacking on this starts now, and will continue through the weekend. I hope to have new checkpoints / card dumps soon, and a new tutorial soon after.
There's an --encoding option called "old"; I have a suspicion that'd be it! But why regress to the past when we can look to the future, eh?
Anyway, I just paid the month's AWS bill a couple days ago, so I'm trying a 512 x 3 network using field and mana randomization. I don't remember if random mana did any notable good, but it sounded like random fields at least were a worthy way to go.
Randomizing mana is usually a good idea. Talcos did some work showing that if you don't, the network tends to be lazy and refuses to look at much of the mana cost field, which means it has a harder time with color.
Randomizing fields is fun. I think there's a higher ceiling on how much we can teach the network that way, since there are more possible randomized cards we could give it (almost infinitely many, really), but it also makes the learning problem harder, so with the finite amount of training we can do, it tends to give less consistent results.
What I want to do with the new training process is implement some form of curriculum learning, where we give it a mixture of ordered cards and randomized ones. That way, it has to learn the meanings of the field labels independently of order, but when we generate cards we can tell it to give us nicely ordered ones and it will be more consistent. This is all part of a grander hypothesis I have about adding ambiguity to your encoding scheme to increase the effective amount of training you can do - basically, you want to make the data just ambiguous enough that when you present it in all of the different ways you can, you have a big enough dataset to train the largest kind of model you can handle with your hardware.
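The mixing itself is simple; something like this at encoding time, where the `label:value` fields and '|' separator are illustrative rather than the real format:

```python
import random

# Sketch of the ordered/shuffled curriculum: each card is emitted in canonical
# field order with probability p_ordered, otherwise with its fields shuffled.
# Field names and separators here are illustrative, not mtgencode's real format.
def encode(fields, p_ordered=0.5, rng=random):
    items = list(fields.items())
    if rng.random() >= p_ordered:
        rng.shuffle(items)
    return '|'.join('%s:%s' % (label, value) for label, value in items)

card = {'name': 'storm crow', 'cost': '{1}{U}', 'type': 'creature', 'text': 'flying'}
# p_ordered=1.0 always yields canonical order; 0.0 always shuffles.
```

Annealing `p_ordered` over the course of training (mostly ordered early, mostly shuffled later, or vice versa) would be the actual curriculum part.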
Darn, that's a little sad to hear I'll have to wait. Oh well, still looking forward to seeing what the networks produce next.
Are there any speed improvements to the new framework?
Indeed, yesterday's training was rather a bust. The cards were not of appreciably better quality; planeswalkers and levelers didn't get any longer, and there were more malformed cards and word salad than I'd had with the previous network. Methinks a 512 x 3 can't handle the extra puzzle of randomized order very well.
I did get some amusing stuff, of course. Like a creature that lets you sacrifice planeswalkers to get +2/+2. Planeswalkatog?
I still have enough fun budget to do one more round of training, so I'll take advantage of the improved colorless mana handling and do a round with randomized mana only.