Weren't we talking a while back about generating cards word-by-word rather than letter-by-letter? I found someone who's already implemented this in a modified version of Karpathy's code and used it to generate clickbait titles (21 MTG Cards Generated By Machines That May Shock You!). Any chance we could use this for our purposes?
We could, actually. In some ways it's advantageous, because we only ever use the same small set of words over and over, and there's no sense in making eight separate passes to generate the word "creature". Unless, of course, the network is using that time to let further predictions percolate, but I haven't seen any evidence either way.
On the other hand, we do lose the benefit of character-level priming. For example, I can prime the network with the word "Flood" and I get back text like...
* "Flood ~ B: Regenerate @. Activate this ability only if you control an elf."
* "Flood 2"
* "Flood a white creature you control"
It's a nice feature to have. But I'd be willing to sacrifice that if we get a leaner, more efficient model. That option remains on the table.
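To make the mechanics of priming concrete, here's a minimal sketch in Python (the real logic lives in char-rnn's sample.lua; "step" here is an assumed wrapper around one forward pass of the trained network):

import numpy as np

def sample_primed(step, state, vocab, seed, length=200):
    # step(ch, state) -> (probs over vocab, new state); assumed to wrap one
    # forward pass of the trained character-level network.
    probs = None
    for ch in seed:                  # feed the seed to warm up the hidden state
        probs, state = step(ch, state)
    out = seed
    for _ in range(length):          # then sample onward from where it left off
        ch = vocab[np.random.choice(len(vocab), p=probs)]
        out += ch
        probs, state = step(ch, state)
    return out

The seed itself is never altered; it just steers the hidden state before free sampling begins, which is why everything generated afterward tends to stay consistent with it.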
Thing is, if I just download this modified RNN and use it with our MTG corpus, which has been optimized for character-level encoding (especially when it comes to numbers and mana costs), I'm not sure we'll get good results. I'm still going to try it tonight, but my prediction is gibberish.
I think we may need to add in some spaces to sanitize the input, like how the neural-storyteller is trained on text that looks like this: "john 's book does n't address the underlying causes of the civil war . I would n't recommend it ! ". Otherwise "flying\flanking" is treated as a single word rather than "flying", a line break, and "flanking".
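As a sketch of the kind of sanitization I mean (the separator set here is a guess; the corpus encoding would dictate the real one):

import re

def sanitize(text):
    # Pad punctuation and the encoded line-break character with spaces so
    # each becomes its own token when we split on whitespace.
    return re.sub(r'([\\.,:;!?()])', r' \1 ', text).split()

print(sanitize('flying\\flanking'))  # ['flying', '\\', 'flanking']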
Also, the word-rnn takes vectorized versions of words as input, in the form of GloVe vectors. GloVe is conceptually similar to word2vec, the same idea. You first train a vector model for the text; then, during training, the text gets broken up into words, the words get mapped to vectors, and the network takes each vector and tries to predict which vector comes next. Then we use the vector dictionary to convert the output back to English so we can read it. That's my understanding, anyway.
EDIT: For a vector representation that is just a one-hot encoding (a simple encoding), it'd be like replacing every word in the text with a unique character, like "creatures you control" -> "A Q C". The GloVe vector is different because the numbers don't just encode the position in the dictionary but also semantic information, so the vector for black is similar to the vector for white. Doing this kind of vectorization first is like a mother bird pre-digesting food before feeding it to its offspring.
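To make that analogy concrete, here's a toy version of the two representations (the numbers are invented; a real GloVe model would be trained on the corpus):

import numpy as np

vocab = ['creatures', 'you', 'control']

# One-hot: each word is just a unique index; no two words are "similar".
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

# GloVe-style: dense vectors where related words (like the color words)
# end up near each other in the space.
dense = {'creatures': np.array([ 0.2, -1.1,  0.7]),
         'you':       np.array([ 0.9,  0.3,  0.1]),
         'control':   np.array([-0.4,  0.8,  0.5])}

# The network trains on the resulting sequence of vectors, learning to
# predict each next vector from the ones before it.
sequence = [dense[w] for w in 'creatures you control'.split()]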
Couldn't we still prime the word-level network with words?
Yes. You could, so long as the words were included in the training/test data. It's possible that the author of the code has a default, blank vector for unknown words, but I haven't looked into it.
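If such a fallback exists, I'd guess it looks something like this (entirely an assumption on my part):

import numpy as np

dense = {'flood': np.array([0.1, 0.5, -0.3])}  # toy one-word vocabulary
unk = np.zeros(3)                              # hypothetical blank vector

def prime_vector(word):
    # Primes outside the training vocabulary fall back to the blank vector.
    return dense.get(word.lower(), unk)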
Weren't we talking a while back about generating cards word-by-word rather than letter-by-letter? I found someone who's already implemented this in a modified version of Karpathy's code and used it to generate clickbait titles (21 MTG Cards Generated By Machines That May Shock You!). Any chance we could use this for our purposes?
There are pros and cons to doing it word-by-word compared to character-by-character models. Opening up to word-by-word would allow for some natural language processing tricks which may be useful. However, like Talcos said, it would mean losing the ability for the net to create and use novel words (no more tromple). I definitely think it's worth a try to see how it differs.
We could still prime with words, but only words that exist in the corpus.
EDIT: I see Talcos beat me to the punch on answering your questions!
EDIT 2: In order to use that specific library, we would have to check and make sure that every word in the mtg corpus has a GloVe vector, though it shouldn't be a difficult thing to work around regardless. A cool thing about this library is that the word vectors are also being updated during training, so that they are fine-tuned for our task.
What do you mean? I figured we'd train a fresh GloVe model based on the mtg corpus rather than using a pre-trained model, just like we did with word2vec.
@Talcos, I would like (if I may) to address the larger lesson contained within this thread, and to ask you (honestly, it's an open question that I would love to hear from everybody on) how to let it guide my life choices.
That lesson is The Future. More specifically, that it is arriving ever more rapidly than before. What I don't quite hear you say with each new batch of papers you have read is: "PLEASE, throw out your notions of the impossible: every month some of them become reality, and every six months some look like old, tired, well-trod paths." This field isn't the only one going through major revolutions and surpassing Thermopylaean constraints.
I am out of undergrad, but not through it. I want to go back, but what stopped me the first time was my lack of a goal, a purpose to strive for. I knew that a degree just isn't what it once was, so I didn't have the drive to finish. I think I know what I want to do, sort of; I think I know what degree I want to get, and I feel like the job I want just doesn't exist yet, but this thread reignited my passion for learning, technology, and the future. I am fairly good at math (it has been a while, but I know I can get back into it), I enjoy coding, and I love space and thinking about the future. I think I'll go into aerospace design. I've read some articles that really resonated with me about jobs of the future (it's amazing how a Google search for "jobs that don't exist yet" can hit so spot-on). I would like to hear your suggestions on how best to spend my time waiting for my job to exist. Do I figure out roughly what I want it to be and keep pushing toward it until it exists? Or do I get into a related field that I can carry into that new area? I feel like both is the best approach: I should live the future, waiting and watching for the sparkling start of what may be a career path 15 years off, and blaze a trail to it. But does that lend itself more toward the academic side, where reading the current papers is part of the job, or something else?
I guess some part of this is for any younger people who are having trouble "picking what they want to do for the rest of their lives". As if that is really how it works: I dropped out of college and am on track to keep making more than those who finished for at least the next 10 years. But it's not doing what I want to be doing. I see myself living in a time where, if I don't live long into my second century, I will see people born who will. I want to live in a world where interplanetary travel happens; where, if Starfleet isn't started before I die, we are on the way to getting there.
Where do you see availability for people who want to work on tomorrow's problems today? Maybe it is here with machine learning, maybe with sub-planetary craft design, or philosophy of AI design, or something else.
-------------------------------------
Sorry for taking up space here, but I see so many glimmering shreds of the future shining through this thread that I wanted to hear from those who have laid hands on it.
What do you mean? I figured we'd train a fresh GloVe model based on the mtg corpus rather than using a pre-trained model, just like we did with word2vec.
A common practice in NLP is to initialize with GloVe or word2vec vectors, since they're already trained on a huge corpus. From there we could continue training our vectors solely on the mtg corpus to fine-tune, if we would like. Since the vectors can continue to train during the training of our net, it may be superfluous to have the middle fine-tuning step. However, due to the nature of the mtg corpus and how it differs from regular English, initializing to pre-trained GloVe/word2vec might not benefit us as much.
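To illustrate the initialize-then-fine-tune idea (sketched in PyTorch-style Python for brevity; the actual word-rnn code is Torch/Lua, so this shows the concept, not its implementation):

import torch
import torch.nn as nn

vocab_size, dim = 5000, 300
pretrained = torch.randn(vocab_size, dim)  # stand-in for loaded GloVe rows

# freeze=False leaves the vectors trainable, so they drift toward whatever
# the mtg corpus needs while the rest of the network trains.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)

Words with no GloVe entry could start from small random vectors instead, which is one way around the coverage problem.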
I am fairly good at math (it has been a while, but I know I can get back into it), I enjoy coding, and I love space and thinking about the future. I think I'll go into aerospace design.
You might want to talk to maplesmall. His background is in astrophysics, so he would have more to say about potential career paths in that domain than I do.
I've read some articles that really resonated with me about jobs of the future (it's amazing how a Google search for "jobs that don't exist yet" can hit so spot-on). I would like to hear your suggestions on how best to spend my time waiting for my job to exist. Do I figure out roughly what I want it to be and keep pushing toward it until it exists? Or do I get into a related field that I can carry into that new area?
I would go with whatever you are passionate about. Do some research, and talk to people who are in the fields you are interested in. If you want more from my perspective in particular, feel free to send me a private message.
I will say this: getting a degree in any field can open doors for you, insofar as it demonstrates your aptitude and willingness to learn and adapt.
But also keep this in mind: in an era of intense technological disruption across many different industries, it's very difficult to forecast the future of the job market with any certainty.
But does that lend itself more toward the academic side, where reading the current papers is part of the job, or something else?
That depends; it can go either way. But I'll warn you that academia is not for everyone. The wages are not great (e.g., I technically qualify for food stamps at this point; if I work really hard, I'll get a job where I technically don't qualify for food stamps). The sheer number of dead ends and setbacks you face on a daily basis is daunting. You constantly have to fight and struggle to get funding.
In short, I would only go into academia if you feel that you absolutely have to, if every fiber of your being calls you to do such a thing. Just like being a musician or a minister.
Where do you see availability for people who want to work on tomorrow's problems today? Maybe it is here with machine learning, maybe with sub-planetary craft design, or philosophy of AI design, or something else.
There are plenty of options. Most of them are available to those who pursue graduate degrees, get into networks of like-minded people, etc. Now, like I said, that can lead to research positions in academia or in industry, and your choices can vary depending on what kind of discipline you're wanting to go into.
A common practice in NLP is to initialize with GloVe or word2vec vectors, since they're already trained on a huge corpus. From there we could continue training our vectors solely on the mtg corpus to fine-tune, if we would like. Since the vectors can continue to train during the training of our net, it may be superfluous to have the middle fine-tuning step. However, due to the nature of the mtg corpus and how it differs from regular English, initializing to pre-trained GloVe/word2vec might not benefit us as much.
Ah, gotcha. Yeah. I was thinking along the same lines.
You might want to talk to maplesmall. His background is in astrophysics, so he would have more to say about potential career paths in that domain than I do.
Well, I can tell you that without a PhD, the pickings are slim. I was lucky to get a software dev job with a Masters (in astrophysics, not software), and I was coding in my spare time along with the coding I learned in my degree. I'd not recommend stopping at a Masters; if you're passionate about something, go for the PhD. Much better future career prospects (it's what I should've done).
As usual, there's another crop of papers that have come out over the last month that might be worth investigating. Among them I saw a few that were relevant to us:
One that you didn't mention, but that I feel should be, is this paper (and its corresponding github repo), for its potential as a means of generating completely novel card art. Their results are shockingly good -- better than anything else I've seen in the field to date.
Ah, yes. I had mentioned that a few days ago, but it's definitely worth following. There might be some scalability problems with the approach as is (not sure, looking into that), but I'm checking it out.
On a related note, I've been playing with the demo of illustration2vec, an algorithm that maps illustrations to semantic vector representations using convolutional neural networks. That's specifically useful for our art generation, as we're working with illustrations in many different styles rather than, say, photographs.
They have an online demo, and it's entertaining. You give it an image, it computes the vector representation, and then it works backwards from that representation to figure out which tags are most appropriate for the image.
I've attached an image below showcasing it (on the off chance that their servers get taken offline by the hug of death). I gave it the art for Basandra, Battle Seraph and it tells me that there is one woman standing alone. She is wearing armor, has long red hair and wings. She is wearing a cape and wielding a sword (whip, actually, but close enough), and has an exposed navel and huge.. tracts of land. Oh, and the system thinks she has blue eyes; they're red, but whatever.
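Conceptually, the working-backwards step is a nearest-tag lookup (this is not the real illustration2vec API, just the idea):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def plausible_tags(image_vec, tag_vecs, top_n=5):
    # tag_vecs: {'armor': vector, 'wings': vector, ...} living in the same
    # space as the image embedding produced by the convolutional network.
    ranked = sorted(tag_vecs, key=lambda t: cosine(image_vec, tag_vecs[t]),
                    reverse=True)
    return ranked[:top_n]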
----
EDIT: Also, hardcast and I were talking earlier about a paper entitled "The Mechanism of Additive Composition", which gives a formal treatment of how word vectors can be added together to make a vector that is the sum of their semantic meanings (e.g. v("capital") + v("france") ≈ v("paris")). It's the sorcery that allows us to use word2vec to measure the similarity of novel cards to existing ones. Moreover, in that paper, the authors talk about how to extend the approach to preserve information about word order. That's really important because right now this card...
Deadly Griffon Rider 1BW
Creature - Human Knight
Flying
T: Target creature gains deathtouch until end of turn.
2/2
has the same vector representation as this card...
Deadly Griffon Trainer 1BW
Creature - Human Knight
Deathtouch
T: Target creature gains flying until end of turn.
2/2
even though functionally they are somewhat different. This latest work shows us how to fix that problem, so that'll be helpful. It's a 40-page paper; I've printed out a copy and am in the process of making sense of it.
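You can see why the collision happens in a few lines of code: summing word vectors is commutative, so any reordering of the same words produces exactly the same card vector (random vectors stand in for trained ones here):

import numpy as np

rng = np.random.RandomState(0)
words = ['flying', 'deathtouch', 'target', 'creature', 'gains']
vec = {w: rng.randn(50) for w in words}

def card_vector(text):
    # Additive composition: the card is the sum of its word vectors.
    return sum(vec[w] for w in text.split())

rider   = card_vector('flying target creature gains deathtouch')
trainer = card_vector('deathtouch target creature gains flying')
print(np.allclose(rider, trainer))  # True: addition throws away word order

The word-order extension in the paper is precisely about breaking that symmetry.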
Ah, yes. I had mentioned that a few days ago, but it's definitely worth following. There might be some scalability problems with the approach as is (not sure, looking into that), but I'm checking it out.
Oh, so you did. I should have remembered that.
Actually, I'm pretty curious if you could use that sort of generative model to create text-- change it to a 1d convolution of raw characters and use a generative-adversarial technique with their pyramidal deconvolution structure to create text. Unfortunately, you couldn't prime that at all.
Another odd idea comes from the neural storyteller and the idea of skip-thought vectors (or just thought vectors, 'cause I don't totally understand the difference). Any idea if it would be possible to have a thought vector encoder which passes its output to several different decoders? For example, you give it a piece of art, it creates a semantic representation (as in the neural storyteller), and from that it goes to a neural network that converts thought vectors to names, thought vectors to card abilities, and thought vectors to flavor text. That way, they all have the same semantic meaning, so you have a sort of coherency between the card, and hopefully (as seen in the neural storyteller) get somewhat coherent flavor text.
Both are departures from char-rnn, but might be able to be made to work on a character level.
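Here's roughly the shape of that encoder-to-several-decoders idea, as a hypothetical sketch (PyTorch-style Python; Linear heads stand in for what would really be RNN decoders emitting text):

import torch
import torch.nn as nn

class SharedThoughtModel(nn.Module):
    # One encoder maps art features to a shared "thought vector"; separate
    # heads decode the same vector into the different parts of a card.
    def __init__(self, art_dim=2048, thought_dim=512, out_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(art_dim, thought_dim), nn.Tanh())
        self.to_name = nn.Linear(thought_dim, out_dim)
        self.to_rules = nn.Linear(thought_dim, out_dim)
        self.to_flavor = nn.Linear(thought_dim, out_dim)

    def forward(self, art_features):
        thought = self.encoder(art_features)
        return (self.to_name(thought), self.to_rules(thought),
                self.to_flavor(thought))

Every head conditions on the same vector, which is where the hoped-for coherency between name, abilities, and flavor text would come from.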
Actually, I'm pretty curious if you could use that sort of generative model to create text-- change it to a 1d convolution of raw characters and use a generative-adversarial technique with their pyramidal deconvolution structure to create text. Unfortunately, you couldn't prime that at all.
I think there's been some work along those lines, but I'd have to go back and check. And yeah, it would limit our ability to do priming in the direct way that we can now, but I see nothing wrong with trying different approaches, each with its own attendant advantages and disadvantages.
Another odd idea comes from the neural storyteller and the idea of skip-thought vectors (or just thought vectors, 'cause I don't totally understand the difference). Any idea if it would be possible to have a thought vector encoder which passes its output to several different decoders? For example, you give it a piece of art, it creates a semantic representation (as in the neural storyteller), and from that it goes to a neural network that converts thought vectors to names, thought vectors to card abilities, and thought vectors to flavor text. That way, they all have the same semantic meaning, so you have a sort of coherency between the card, and hopefully (as seen in the neural storyteller) get somewhat coherent flavor text.
It's something that has crossed my mind (and I may have mentioned something to that effect some 20 or 30 pages ago), but I'm not sure if we're there yet. It would be nice if we had some kind of learned representation that acted as a bridge between all these different modalities of a card.
Not that I have any idea about how to achieve that (yet), but it's fun to think about.
I was up early this morning, and I did some coding, and I may or may not almost have that whole stabilization-norm thing working with the training script. I'll need to finish up the code and do some tests later. Torch provides a way for us to judge our networks according to multiple criteria, which is helpful. The first criterion is that the network gets the answer right. The second is that it does so while acting in a controlled, calm manner (they're very hyperactive by nature).
The question is whether we can teach it self-control and restraint while keeping its creative spirit intact. The literature seems to suggest that it is possible, but there may be some fine-tuning of parameters involved. The process more closely resembles psychiatry than mathematics. We'll see.
I'm poking at Rust code still. I've got more of a handle on how various things fit together, and the main thing I'm noticing is that, design-wise, Rust traits have very different priorities from Haskell typeclasses. The top-level ASTs my code works with (I've given up on tensors for now) form a commutative ring, which can be represented fairly simply by Num in Haskell (16 patterns to match). The corresponding set of definitions in Rust looks like it'll require something like at least 22 impl blocks, some of which aren't even useful, and which are mostly boilerplate that I don't feel confident writing macros to abstract away.
I just finished a massive parameter sweep across 60 different parameter settings for training models with default char-rnn and my modified mtg-rnn. The full data from the sweep is available on my Google Drive.
Here are some fun metrics I generated with some of my data analysis scripts:
Validation and distance metrics for baseline_XXX_1 component of mtg-rnn sweep.
Each column corresponds to a single checkpoint. 's' is the size, 'd' is the
dropout, and 'v' is the validation error. All of these checkpoints are from
epoch 50 (the end of training). So, 's128, d0, v0.3941' is a checkpoint
from a size 128 network with dropout 0, that had a validation loss of 0.3941.
If you look at the full sweep, it corresponds to mtg-rnn-sweep1/baseline_128_1.
All of the validation metrics like 'types' and 'pt' are simple string processing
tests to determine if a card has a property, and roughly check if it is used
in the correct way. For the specific definitions, you'll have to see the source
code in scripts/validate.py.
The 'names' and 'cards' distances are the average of the name text edit distance
and the word2vec semantic distance from each card in the dump to the nearest
real card. Calling them distances is a little misleading; they're really
similarity measures, with 1.0 being identical.
This data isn't very scientific, but it is interesting to look at. There seems
to be a fair amount of variance between different 1MB dumps.
real cards s128, d0, v0.3941 s256, d0, v0.2736 s384, d0, v0.2117 s512, d0, v0.1952 s640, d0, v0.1798
-- overall -- -- overall -- -- overall -- -- overall -- -- overall -- -- overall --
total: 15065 total: 5666 total: 5820 total: 5960 total: 5777 total: 5800
good : 15061 (99.97%) good : 4457 (78.66%) good : 4979 (85.54%) good : 5277 (88.54%) good : 5153 (89.19%) good : 5130 (88.44%)
bad : 4 (0.026%) bad : 1209 (21.33%) bad : 841 (14.45%) bad : 683 (11.45%) bad : 624 (10.80%) bad : 670 (11.55%)
---- ---- ---- ---- ---- ----
types: types: types: types: types: types:
total: 15065 (100.0%) total: 5666 (100.0%) total: 5820 (100.0%) total: 5960 (100.0%) total: 5777 (100.0%) total: 5800 (100.0%)
good : 15065 (100.0%) good : 5648 (99.68%) good : 5818 (99.96%) good : 5956 (99.93%) good : 5774 (99.94%) good : 5796 (99.93%)
bad : 0 (0.0%) bad : 18 (0.317%) bad : 2 (0.034%) bad : 4 (0.067%) bad : 3 (0.051%) bad : 4 (0.068%)
pt: pt: pt: pt: pt: pt:
total: 8007 (53.14%) total: 2688 (47.44%) total: 3094 (53.16%) total: 2956 (49.59%) total: 3527 (61.05%) total: 2641 (45.53%)
good : 8007 (53.14%) good : 2648 (46.73%) good : 3078 (52.88%) good : 2943 (49.37%) good : 3519 (60.91%) good : 2618 (45.13%)
bad : 0 (0.0%) bad : 40 (0.705%) bad : 16 (0.274%) bad : 13 (0.218%) bad : 8 (0.138%) bad : 23 (0.396%)
lands: lands: lands: lands: lands: lands:
total: 533 (3.538%) total: 231 (4.076%) total: 225 (3.865%) total: 228 (3.825%) total: 110 (1.904%) total: 177 (3.051%)
good : 533 (3.538%) good : 184 (3.247%) good : 86 (1.477%) good : 147 (2.466%) good : 68 (1.177%) good : 126 (2.172%)
bad : 0 (0.0%) bad : 47 (0.829%) bad : 139 (2.388%) bad : 81 (1.359%) bad : 42 (0.727%) bad : 51 (0.879%)
X: X: X: X: X: X:
total: 757 (5.024%) total: 568 (10.02%) total: 407 (6.993%) total: 461 (7.734%) total: 484 (8.378%) total: 564 (9.724%)
good : 756 (5.018%) good : 74 (1.306%) good : 90 (1.546%) good : 144 (2.416%) good : 168 (2.908%) good : 201 (3.465%)
bad : 1 (0.006%) bad : 494 (8.718%) bad : 317 (5.446%) bad : 317 (5.318%) bad : 316 (5.469%) bad : 363 (6.258%)
kicker: kicker: kicker: kicker: kicker: kicker:
total: 114 (0.756%) total: 92 (1.623%) total: 51 (0.876%) total: 46 (0.771%) total: 93 (1.609%) total: 70 (1.206%)
good : 112 (0.743%) good : 5 (0.088%) good : 14 (0.240%) good : 15 (0.251%) good : 36 (0.623%) good : 19 (0.327%)
bad : 2 (0.013%) bad : 87 (1.535%) bad : 37 (0.635%) bad : 31 (0.520%) bad : 57 (0.986%) bad : 51 (0.879%)
counters: counters: counters: counters: counters: counters:
total: 401 (2.661%) total: 338 (5.965%) total: 475 (8.161%) total: 236 (3.959%) total: 237 (4.102%) total: 192 (3.310%)
good : 401 (2.661%) good : 38 (0.670%) good : 156 (2.680%) good : 82 (1.375%) good : 91 (1.575%) good : 68 (1.172%)
bad : 0 (0.0%) bad : 300 (5.294%) bad : 319 (5.481%) bad : 154 (2.583%) bad : 146 (2.527%) bad : 124 (2.137%)
choices: choices: choices: choices: choices: choices:
total: 175 (1.161%) total: 161 (2.841%) total: 144 (2.474%) total: 174 (2.919%) total: 114 (1.973%) total: 92 (1.586%)
good : 174 (1.154%) good : 1 (0.017%) good : 38 (0.652%) good : 78 (1.308%) good : 45 (0.778%) good : 30 (0.517%)
bad : 1 (0.006%) bad : 160 (2.823%) bad : 106 (1.821%) bad : 96 (1.610%) bad : 69 (1.194%) bad : 62 (1.068%)
auras: auras: auras: auras: auras: auras:
total: 2318 (15.38%) total: 1036 (18.28%) total: 1092 (18.76%) total: 1061 (17.80%) total: 852 (14.74%) total: 1074 (18.51%)
good : 2318 (15.38%) good : 1036 (18.28%) good : 1092 (18.76%) good : 1061 (17.80%) good : 852 (14.74%) good : 1074 (18.51%)
bad : 0 (0.0%) bad : 0 (0.0%) bad : 0 (0.0%) bad : 0 (0.0%) bad : 0 (0.0%) bad : 0 (0.0%)
equipment: equipment: equipment: equipment: equipment: equipment:
total: 200 (1.327%) total: 43 (0.758%) total: 82 (1.408%) total: 112 (1.879%) total: 44 (0.761%) total: 114 (1.965%)
good : 200 (1.327%) good : 43 (0.758%) good : 81 (1.391%) good : 112 (1.879%) good : 43 (0.744%) good : 114 (1.965%)
bad : 0 (0.0%) bad : 0 (0.0%) bad : 1 (0.017%) bad : 0 (0.0%) bad : 1 (0.017%) bad : 0 (0.0%)
planeswalkers: planeswalkers: planeswalkers: planeswalkers: planeswalkers: planeswalkers:
total: 61 (0.404%) total: 30 (0.529%) total: 20 (0.343%) total: 25 (0.419%) total: 15 (0.259%) total: 37 (0.637%)
good : 61 (0.404%) good : 0 (0.0%) good : 2 (0.034%) good : 4 (0.067%) good : 2 (0.034%) good : 6 (0.103%)
bad : 0 (0.0%) bad : 30 (0.529%) bad : 18 (0.309%) bad : 21 (0.352%) bad : 13 (0.225%) bad : 31 (0.534%)
levelup: levelup: levelup: levelup: levelup: levelup:
total: 27 (0.179%) total: 11 (0.194%) total: 25 (0.429%) total: 6 (0.100%) total: 17 (0.294%) total: 5 (0.086%)
good : 27 (0.179%) good : 4 (0.070%) good : 13 (0.223%) good : 2 (0.033%) good : 6 (0.103%) good : 4 (0.068%)
bad : 0 (0.0%) bad : 7 (0.123%) bad : 12 (0.206%) bad : 4 (0.067%) bad : 11 (0.190%) bad : 1 (0.017%)
activated: activated: activated: activated: activated: activated:
total: 4307 (28.58%) total: 1692 (29.86%) total: 1688 (29.00%) total: 1587 (26.62%) total: 1618 (28.00%) total: 1639 (28.25%)
good : 4307 (28.58%) good : 1555 (27.44%) good : 1634 (28.07%) good : 1556 (26.10%) good : 1591 (27.54%) good : 1621 (27.94%)
bad : 0 (0.0%) bad : 137 (2.417%) bad : 54 (0.927%) bad : 31 (0.520%) bad : 27 (0.467%) bad : 18 (0.310%)
triggered: triggered: triggered: triggered: triggered: triggered:
total: 4340 (28.80%) total: 1589 (28.04%) total: 1526 (26.21%) total: 1661 (27.86%) total: 1848 (31.98%) total: 1622 (27.96%)
good : 4340 (28.80%) good : 1496 (26.40%) good : 1509 (25.92%) good : 1635 (27.43%) good : 1818 (31.46%) good : 1601 (27.60%)
bad : 0 (0.0%) bad : 93 (1.641%) bad : 17 (0.292%) bad : 26 (0.436%) bad : 30 (0.519%) bad : 21 (0.362%)
names: names: names: names: names:
dist : 0.691 dist : 0.732 dist : 0.776 dist : 0.784 dist : 0.802
dupes: 17 dupes: 253 dupes: 905 dupes: 961 dupes: 1266
cards (word2vec): cards (word2vec): cards (word2vec): cards (word2vec): cards (word2vec):
dist : 0.887 dist : 0.905 dist : 0.918 dist : 0.917 dist : 0.921
dupes: 14 dupes: 51 dupes: 142 dupes: 209 dupes: 218
Oh, and as a little note, there are a few invalid cards in the real cards because it's really hard to write tests that account for all of the corner cases.
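For a feel of what the 'names' similarity is doing, here's a rough stand-in (the real definition is in scripts/validate.py, and difflib's ratio isn't the same normalization as an edit distance, it just behaves similarly):

import difflib

def name_similarity(name, real_names):
    # Best string-similarity ratio against any real card name;
    # 1.0 would mean an exact duplicate of an existing name.
    return max(difflib.SequenceMatcher(None, name, r).ratio()
               for r in real_names)

print(name_similarity('Storm Sprite', ['Storm Crow', 'Faerie Squadron']))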
I'm working on rendering all of this data (and the data for the rest of the sweep; the table above covers just 5 of the 60 checkpoints I generated) in a more easily consumable form. I'll try to post things as I come up with them. Expect lots of pretty graphs.
Preliminary analysis suggests that large networks trained with mtg-rnn's training set card order randomization and *no dropout at all* are really good at producing things that look like real cards. Of course, that comes with the tradeoff of producing more cards that are too similar to existing cards to be interesting. One of my main goals is to examine that tradeoff in more detail.
That is fantastic! Interesting how all the checkpoints you've tested have no dropout; any test results from networks with the default dropout value? It seems based on this that a network size of 512 is optimal; 640 seems to have slightly worse results.
I'm a huge visualization fan so I look forward to your graphs!
I believe 0 is the default dropout value? Anyway, yes, I do have the same data (all in duplicate) from models trained with dropouts of 0.25 and 0.50; I just didn't produce the hideous text tables to show off the statistics. I'm working on scripts to do that more automatically.
If you can run mtgencode, you can compute the same statistics from any of the dumps on my Google Drive; they're labeled with the dropout.
./scripts/validate.py name_of_dump.txt
EDIT: OK, here are some numbers with dropout.
real cards s512, d0, v0.1952 s512, d0.25, v0.2663 s512, d0.50, v0.3269
-- overall -- -- overall -- -- overall -- -- overall --
total: 15065 total: 5777 total: 6032 total: 5913
good : 15061 (99.97%) good : 5153 (89.19%) good : 5367 (88.97%) good : 5304 (89.70%)
bad : 4 (0.026%) bad : 624 (10.80%) bad : 665 (11.02%) bad : 609 (10.29%)
---- ---- ---- ----
types: types: types: types:
total: 15065 (100.0%) total: 5777 (100.0%) total: 6032 (100.0%) total: 5913 (100.0%)
good : 15065 (100.0%) good : 5774 (99.94%) good : 6028 (99.93%) good : 5910 (99.94%)
bad : 0 (0.0%) bad : 3 (0.051%) bad : 4 (0.066%) bad : 3 (0.050%)
pt: pt: pt: pt:
total: 8007 (53.14%) total: 3527 (61.05%) total: 3419 (56.68%) total: 3330 (56.31%)
good : 8007 (53.14%) good : 3519 (60.91%) good : 3413 (56.58%) good : 3329 (56.29%)
bad : 0 (0.0%) bad : 8 (0.138%) bad : 6 (0.099%) bad : 1 (0.016%)
lands: lands: lands: lands:
total: 533 (3.538%) total: 110 (1.904%) total: 138 (2.287%) total: 127 (2.147%)
good : 533 (3.538%) good : 68 (1.177%) good : 133 (2.204%) good : 114 (1.927%)
bad : 0 (0.0%) bad : 42 (0.727%) bad : 5 (0.082%) bad : 13 (0.219%)
X: X: X: X:
total: 757 (5.024%) total: 484 (8.378%) total: 501 (8.305%) total: 500 (8.455%)
good : 756 (5.018%) good : 168 (2.908%) good : 174 (2.884%) good : 128 (2.164%)
bad : 1 (0.006%) bad : 316 (5.469%) bad : 327 (5.421%) bad : 372 (6.291%)
kicker: kicker: kicker: kicker:
total: 114 (0.756%) total: 93 (1.609%) total: 98 (1.624%) total: 112 (1.894%)
good : 112 (0.743%) good : 36 (0.623%) good : 41 (0.679%) good : 28 (0.473%)
bad : 2 (0.013%) bad : 57 (0.986%) bad : 57 (0.944%) bad : 84 (1.420%)
counters: counters: counters: counters:
total: 401 (2.661%) total: 237 (4.102%) total: 308 (5.106%) total: 167 (2.824%)
good : 401 (2.661%) good : 91 (1.575%) good : 100 (1.657%) good : 89 (1.505%)
bad : 0 (0.0%) bad : 146 (2.527%) bad : 208 (3.448%) bad : 78 (1.319%)
choices: choices: choices: choices:
total: 175 (1.161%) total: 114 (1.973%) total: 99 (1.641%) total: 103 (1.741%)
good : 174 (1.154%) good : 45 (0.778%) good : 34 (0.563%) good : 45 (0.761%)
bad : 1 (0.006%) bad : 69 (1.194%) bad : 65 (1.077%) bad : 58 (0.980%)
auras: auras: auras: auras:
total: 2318 (15.38%) total: 852 (14.74%) total: 928 (15.38%) total: 969 (16.38%)
good : 2318 (15.38%) good : 852 (14.74%) good : 928 (15.38%) good : 969 (16.38%)
bad : 0 (0.0%) bad : 0 (0.0%) bad : 0 (0.0%) bad : 0 (0.0%)
equipment: equipment: equipment: equipment:
total: 200 (1.327%) total: 44 (0.761%) total: 59 (0.978%) total: 67 (1.133%)
good : 200 (1.327%) good : 43 (0.744%) good : 59 (0.978%) good : 67 (1.133%)
bad : 0 (0.0%) bad : 1 (0.017%) bad : 0 (0.0%) bad : 0 (0.0%)
planeswalkers: planeswalkers: planeswalkers: planeswalkers:
total: 61 (0.404%) total: 15 (0.259%) total: 10 (0.165%) total: 19 (0.321%)
good : 61 (0.404%) good : 2 (0.034%) good : 0 (0.0%) good : 4 (0.067%)
bad : 0 (0.0%) bad : 13 (0.225%) bad : 10 (0.165%) bad : 15 (0.253%)
levelup: levelup: levelup: levelup:
total: 27 (0.179%) total: 17 (0.294%) total: 8 (0.132%) total: 14 (0.236%)
good : 27 (0.179%) good : 6 (0.103%) good : 4 (0.066%) good : 11 (0.186%)
bad : 0 (0.0%) bad : 11 (0.190%) bad : 4 (0.066%) bad : 3 (0.050%)
activated: activated: activated: activated:
total: 4307 (28.58%) total: 1618 (28.00%) total: 1741 (28.86%) total: 1719 (29.07%)
good : 4307 (28.58%) good : 1591 (27.54%) good : 1709 (28.33%) good : 1677 (28.36%)
bad : 0 (0.0%) bad : 27 (0.467%) bad : 32 (0.530%) bad : 42 (0.710%)
triggered: triggered: triggered: triggered:
total: 4340 (28.80%) total: 1848 (31.98%) total: 1944 (32.22%) total: 1985 (33.57%)
good : 4340 (28.80%) good : 1818 (31.46%) good : 1914 (31.73%) good : 1934 (32.70%)
bad : 0 (0.0%) bad : 30 (0.519%) bad : 30 (0.497%) bad : 51 (0.862%)
names: names: names:
dist : 0.784 dist : 0.749 dist : 0.715
dupes: 961 dupes: 177 dupes: 52
cards (word2vec): cards (word2vec): cards (word2vec):
dist : 0.917 dist : 0.914 dist : 0.906
dupes: 209 dupes: 48 dupes: 24
Ugh. While I'm stepping away from Rust for a bit, I tried to get DCGAN working. Their code appears to need a very particular version of Theano. Either that, or I messed something else up, and Theano is just the messenger. Has anyone else messed with DCGAN? How do you deal with
NameError: global name 'HostFromGpu' is not defined
EDIT: For the record, the problem isn't importing HostFromGpu, it's the import statement on the previous line, which is trying to import a function that I'm not sure has ever, you know, existed. I would love to be proven wrong on this, because then I could see my way to fixing it.
EDIT: Looking over the unitary paper now. If I can figure out how to translate this stuff into my AST stuff, I'll try implementing it against my old Python base. Key quote so far: "A major advantage of composing unitary matrices of the form listed above, is that the number of parameters, memory and computational cost increase almost linearly in the size of the hidden layer." I plug those numbers into my calculator and it makes a happy face.
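Back-of-the-envelope, assuming I'm counting the factorization right (three diagonal matrices plus two reflections over complex vectors, so roughly 7n parameters, against n^2 for a dense recurrent matrix):

n = 512                      # hidden layer size
dense_params = n * n         # full recurrent weight matrix: 262,144
unitary_params = 7 * n       # the composed unitary form: 3,584
print(dense_params / unitary_params)  # ~73x fewer recurrent parameters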
I believe I more or less implemented the stability-normalization correctly... I think. I'll have to go back and check. The one thing I'm not entirely sure about is how they integrated their cost function (link to the paper). It might just be a linear combination, like
loss = A * correctness + B * stability
because that's what I implemented it as, where A and B are (positive) constants of our choice. If you pick A=1 and B=0, then you get the same results as we always have. Picking B > 0 makes stability a priority. I tested (A=1,B=0),(A=1,B=1), and (A=1,B=10) just to get a feel for what's going on.
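Concretely, what I implemented looks like this (translated into PyTorch-style Python for readability; the real code is in the Torch training script, and whether to penalize hidden activations or memory cells is exactly the open question below):

import torch
import torch.nn.functional as F

def stabilized_loss(logits, targets, hidden_states, A=1.0, B=1.0):
    # Correctness: the usual cross-entropy over the predicted characters.
    correctness = F.cross_entropy(logits, targets)
    # Stability: penalize changes in the hidden state's norm between
    # consecutive time steps (the norm-stabilizer penalty).
    norms = torch.stack([h.norm(dim=-1) for h in hidden_states])  # (T, batch)
    stability = ((norms[1:] - norms[:-1]) ** 2).mean()
    return A * correctness + B * stability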
The network converges in all cases, but training definitely takes longer, due to the competing objectives; this matches what the authors claim about their approach. The network gets partial credit if it says the wrong thing in an elegant way, or if it says the right thing in a sloppy way. It takes us longer to find the happy medium. For example, after 25 epochs, I got a training loss of around 0.39-0.40 with no stabilization, and 0.48-0.50 with stabilization. I may need to run things for longer to see whether stabilization can give us an improvement, and what value of B works best.
It's also not entirely clear whether we need to be stabilizing the outputs of cells, the contents of each cell's memory, or both. The authors try both separately in their experiments, but it's not exactly clear what would be best for us. Right now I'm just doing it on the activations of the hidden layers at each time step.
Oh, and while typing this I noted something that the authors did that I didn't do:
For LSTMs, we either apply the norm-stabilizer penalty only to the memory cells, or only to the hidden
state (in which case we remove the output tanh, as in (Gers & Schmidhuber, 2000)).
Whoops. I missed that part. I'll have to go back and try that later.
Here's one entertaining/unusual result that I saw from sampling one of the networks:
Hell-hast 2R
Sorcery (Rare)
Hell-hast deals 2 damage to target creature or player. If you cast Hell-hast from your library, that player loses the game.
@hardcast_sixdrop: The data you are gathering is very intriguing. Mapping out the results we get with different parameters would definitely be helpful.
@mwchase: I'm still not sure about your 'HostFromGpu' error. And yes, the unitary paper is very interesting.
Just saying, but Hell-hast is a perfect name for flavour text seeding. What the RNN really needs to beat is 'Hell-hast no fury like a woman scorned'.
Haha, good idea. RoboRosewater suggests...
Hell hath better than any victor.
Hell hath no shame to recold a child.
Hell hath none that she salvatery attains, and sinners are like a fireful.
Hell hath seen the natural cycle of fire and rubble and his lifeage.
Hell hath her honor.
Hell hath devils for every path.
Hell hath better.
Hell hath strange battlefields. Hell hath no fury like me!
Hell hath no fury like many horses.
Hell has far more deader things.
Hell has to be a soldier. It's a stubborn thing.
Hell has failed to crack the Tongming memory or Gish's waters.
Hell has lost, burning and broken, roaming into the infinite night.
Hell has a way for an unpleasant prophecy: I found them follow the cursed bones.
EDIT: Reran the training script, this time for 60 epochs. Validation loss was still going down, so I have a feeling I could continue running it for longer, but I needed the machine for other purposes.
Some of them are more.. verbose than I'm used to seeing. I didn't use any dropout this time, so there may be some overfitting, but most of the cards I'm seeing are novel or are distinct enough from their sources of inspiration that you can't really consider them clones. Examples:
Partial Friend 1GG
Instant (Rare)
Exile the top five cards of your library. Until end of turn, you may play cards exiled with Partial Friend.
Put a card exiled with Partial Friend onto the battlefield, then return a creature you control to its owners hand, then put the rest of those cards on top of your library in any order.
#See what I mean?
Inkmagumal Visions 7
Enchantment - Aura (Uncommon)
Enchant creature
Flash
Enchanted creature has "T: Put a charge counter on Inkmagumal Visions" and "T, remove a charge counter Inkmagumal Visions: this creature deals 1 damage to target creature or player."
#Again.. very elaborate.
Hunt-Tribe Master 1G
Creature - Elf Artificer (Rare)
T: Target land becomes an X/X Elemental creature until end of turn, where X is the number of red creatures on the battlefield.
1/2
Blood Cursemage 2BB
Creature - Human Wizard Mutant (Rare)
8: Tap all other creatures you control. Each Wolf tapped this way deals 3 damage to each creature and each player. Activate this ability only if you control five or more Vampires.
3/3
Joven's Trove 3
Artifact (Common)
2, T, sacrifice Joven's Trove: Search your library for an Island card and a Swamp card and put them onto the battlefield. Then shuffle your library.
Gaerid Messenger 5WW
Creature - Human Soldier (Mythic Rare)
Gaerid Messenger's power and toughness are each equal to the number of white mana symbols in the mana costs of lands you control.
*/*
#Yes! Wait.. no. But not a bad idea.
Archaeol Witch 5W
Creature - Wurm
Trample
Morbid - When Archaeol Witch enters the battlefield, if a creature died this turn, return that card to the battlefield under your control.
5/5
#An interesting application of the morbid ability word.
Glade Infantry G
Creature - Elf (Common)
At the beginning of your upkeep, you may remove a charge counter from target creature. If you do, that creature deals damage equal to its power to Glade Infantry.
2/1
Strandwalker's Mists 3U
Enchantment (Uncommon)
Whenever a blue creature you control becomes the target of an instant or sorcery spell, put a 2/2 black Zombie creature onto the battlefield tapped.
Malaqua 0
Artifact (Rare)
T: Look at the top card of target player's library. If it's a nonland card, you may pay 2 life. If you do, put it into that player's graveyard.
#An example of a near-clone due to overfitting. This one is based on Wand of Denial.
#Malaqua -> mala aqua -> bad water. Clever name.
EDIT(2): Okay, a few more:
Kokuk, Tyrant's Familiar 3BBB
Legendary Creature - Beast (Rare)
Morph 3R
At the beginning of your upkeep, return the top creature card of your graveyard to the battlefield.
5/4
Korozda Guildmage BG
Creature - Elf Druid
When Korozda Guildmage enters the battlefield, pay any amount of life.
T: Add G to your mana pool.
G, T: Put an X/X black Treefolk artifact creature token named Keeper onto the battlefield, where X is equal to that card's converted mana cost.
1/1
#You can see where it got the name and the rough idea of the card from, but it's nothing like Korozda Guildmage. This is an example of what I call a pseudo-clone.
Seriously, though, it's because there are a few missing steps in the guides I was looking at. So far, I've found out that I need to make sure nvcc is on my $PATH, and then I found out that the installer has been downloading the wrong version, so I'm trying to install the not-wrong versions of stuff now.
EDIT: The installer chokes on itself. Tempted to crack open the app and see if it's something obvious.
EDIT: I was using the wrong network installer. These things have no version branding on the inside... ("Well geez, you dope, why were you using an old installer?" I downloaded this junk last week.)
EDIT: 'Unable to get the number of gpus available: CUDA driver version is insufficient for CUDA runtime version'. Well, I'm done for tonight.
EDIT: ... Sigh ... I can't do GPU training on this computer. Perhaps on some other computer, but not this one. This isn't a "Oh, it's really hard" thing, this is "I skipped checking that it actually satisfied the system requirements". CUDA can't run on this. Maybe OpenCL, but I don't feel like climbing out of one rabbit hole only to immediately plunge down another. (Okay, fine, it's definitely compatible with OpenCL 1.2. But I go no further for now.)
We could, actually. In some ways it's advantageous because we only ever use the same small set of words over and over, and there's no sense of making 8 separate passes to generate the word "creature". Unless of course the network is using that time to allow further predictions to percolate, but I haven't seen any evidence of that either way.
On the other hand, we do lose the benefit of character-level priming. For example, I can prime the network with the word "Flood" and I get back text like...
* "Flood ~ B: Regenerate @. Activate this ability only if you control an elf."
* "Flood 2"
* "Flood a white creature you control"
It's a nice feature to have. But I'd be willing to sacrifice that if we get a leaner, more efficient model. That option remains on the table.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
Couldn't we still prime the word-level network with words?
I think we may need to add in some spaces, to sanitize the input, like how the neural-storyteller is trained on text that looks like this: "john 's book does n't address the underlying causes of the civil war . I would n't recommend it ! ". Otherwise "flying\flanking" is treated as a single word rather than "flying (line break) flanking".
Also, the word-rnn takes vectorized versions of words as input, in the form of GloVe vectors. GloVe is conceptually similar to word2vec, same idea. You have to first train a vector model for the text, then what happens is you pass in the text for training, it gets broken up into words, the words get mapped to vectors, and the network takes a vector and tries to predict which vector will come next after it. Then we use the vector dictionary to convert the output back to English so we can read it. That's my understanding, anyway.
EDIT: For a vector representation that is just a one-hot encoding (a simple encoding), it'd be like replacing every word in the text with a unique character, like "creatures you control" -> "A Q C". The GloVe vector is different because the numbers don't just encode the position in the dictionary but also semantic information, so the vector for black is similar to the vector for white. Doing this kind of vectorization first is like a mother bird pre-digesting food before feeding it to its offspring.
Yes. You could. So long as the words were words included in the training/test data. It's possible that the author of the code has a default, blank vector for unknown words, but I haven't looked into it.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
We could still prime with words, but only words that exist in the corpus
EDIT: I see Talcos beat me to the punch on answering your questions!
EDIT 2: In order to use that specific library, we would have to check and make sure that every word in the mtg corpus has a GloVe vector, though it shouldn't be a difficult thing to work around regardless. A cool thing about this library is that the word vectors are also being updated during training so that they are fine-tuned for our task
What do you mean? I figured we'd train a fresh GloVe model based on the mtg corpus rather than using a pre-trained model, just like we did with word2vec.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
That lesson is The Future. More Specifically that it is coming ever more rapidly than before. What I don't quite hear you say with each new batch of papers you have read is "PLEASE, Throw out your ideas and impossible notions: every month some of those become reality and every six months some of those look like old, tired, well-trod paths." This field isn't the only one that has is going through major revolutions and surpassing thermopylaen constraints.
I am out of undergrad, but not through it, I want to go back but what stopped me the first time was my lack of a goal, a purpose to strive for. I had known that a degree just isn't what it once was, so I didn't have the drive to finish. I think I know what I want to do, sort of, I think I know what degree I want to get and I feel like the job I want just doesn't exist yet but this thread reignited my passion for leaning and technology and the future. I am fairly good at math, it has been a while and I know I will be able to get back into it, I enjoy coding, and I love Space and thinking about the future. I think I'll go into Aerospace design. I've read some articles that really resonated with me about jobs of the future (its amazing how a google search for "jobs that don't exist yet" can hit so spot on). I would like to hear your suggestions on how to best spend my time waiting for my job to exist? Do I figure out kind of what I want it to be and keep pushing toward it till it exists? Or do I get into a related field that I can take into that new area? I feel like Both is the best approach: I should live the future, waiting and watching for the sparkling start of what may be a career path 15 years off and blaze a trail to it. But does that lend itself more toward the academic side where Reading the current paper is part of the job, or something else?
I guess some part of this is for any younger people who are having trouble "picking what they want to do for the rest of their lives". Like that is really how that works; I drop out of college and am on track to stay making more than those who finished for at least the next 10 years. But it's not doing what I want to be doing, I see myself living in a time where if I don't live long into my second century I will see people born that will. I want to live in a world where inter planetary travel happens, where if Starfleet isn't started before I die, we are on the way to getting there.
Where do you see availability for people who want to work on tomorrow's problems today?Maybe it is here with machine learning, maybe with sub-planetary craft design, or phillosophy of AI design, or something else.
-------------------------------------
Sorry for taking up space here but I see so many glimmering shreds of the future shining through here I wanted to hear from those who have laid hands on it.
You might want to talk to maplesmall. His background is in astrophysics, so he would have more to say about potential career paths in that domain that I do.
I would go with whatever you are passionate about. Do some research, and talk to people who are in the fields you are interested in. If you want more from my perspective in particular, feel free to send me a private message.
I will say this: getting a degree in any field can open doors for you in so far as it demonstrates your aptitude and willingness for learning and adapting.
But also keep this in mind: in an era of intense technological disruption across many different industries, it's very difficult to forecast the forecast the future of the job market with any certainty.
That depends; it can go either way. But I'll warn you that academia is not for everyone. The wages are not great (e.g. I technically qualify for food stamps at this point. If I work really hard, I'll get a job where I technically don't qualify for food stamps.). The sheer number of dead ends and setbacks you face on a daily basis are daunting. You constantly have to fight and struggle to get funding.
In short, I would only go into academia if you feel that you absolutely have to, if every fiber of your being calls you to do such a thing. Just like being a musician or a minister.
There are plenty of options. Most of them are available to those who pursue graduate degrees, get into networks of like-minded people, etc. Now, like I said, that can lead to research positions in academia or in industry, and your choices can vary depending on what kind of discipline you're wanting to go into.
Ah, gotcha. Yeah. I was thinking along the same lines.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
One that you didn't mention that I feel should be is this paper (and its corresponding github repo) for its potential as a means of generating completely novel card art. Their results are shockingly good-- better than anything else I've seen in the field to date.
Ah, yes. I had mentioned that a few days ago, but yes, it's definitely worth following. There might be some scalability problems with the approach as is (not sure, looking into that), but I'm definitely checking that out.
On a related note, I've been playing with the demo of illustration2vec, an algorithm that maps illustrations to semantic vector representations using convolutional neural networks. That's specifically useful for our art generation, as we're working with illustrations in many different styles rather than, say, photographs.
They have an online demo and it's entertaining. You give it an image, it computes the vector representation and then we work backwards from the vector representation to figure out what tags are most appropriate for the image.
I've attached an image below showcasing it (on the off chance that their servers get taken offline by the hug of death). I gave it the art for Basandra, Battle Seraph and it tells me that there is one woman standing alone. She is wearing armor, has long red hair and wings. She is wearing a cape and wielding a sword (whip, actually, but close enough), and has an exposed navel and huge.. tracts of land. Oh, and the system thinks she has blue eyes; they're red, but whatever.
----
EDIT: Also, hardcast and I were talking earlier about a paper entitled "The Mechanism of Additive Composition", which gives a formal treatment of how word vectors can be added together to make a vector that is the sum of their semantic meanings (e.g. v("capital") + v("france") == v("paris")). It's the sorcery that allows us to use word2vec to measure the similarity of novel cards to existing ones. Moreover, in that paper, the authors talk about how to extend the approach to preserve information about word order. That's really important because right now this card...
Deadly Griffon Rider
1BW
Creature - Human Knight
Flying
T: Target creature gains deathtouch until end of turn.
2/2
has the same vector representation as this card...
Deadly Griffon Trainer
1BW
Creature - Human Knight
Deathtouch
T: Target creature gains flying until end of turn.
2/2
even though functionally they are somewhat different. This latest work shows us how to fix that problem, so that'll be helpful. It's a 40 page paper. I've printed out a copy and am in the process of making sense of it.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
Oh, so you did. I should have remembered that.
Actually, I'm pretty curious if you could use that sort of generative model to create text-- change it to a 1d convolution of raw characters and use a generative-adversarial technique with their pyramidal deconvolution structure to create text. Unfortunately, you couldn't prime that at all.
Another odd idea comes from the neural storyteller and the idea of skip-thought vectors (or just thought vectors, 'cause I don't totally understand the difference). Any idea if it would be possible to have a thought vector encoder which passes its output to several different decoders? For example, you give it a piece of art, it creates a semantic representation (as in the neural storyteller), and from that it goes to a neural network that converts thought vectors to names, thought vectors to card abilities, and thought vectors to flavor text. That way, they all have the same semantic meaning, so you have a sort of coherency between the card, and hopefully (as seen in the neural storyteller) get somewhat coherent flavor text.
Both are departures from char-rnn, but might be able to be made to work on a character level.
I think there's been some work along those lines, but I'd have to go back and check. And yeah, it would limit our ability to do priming in the direct way that we can now, but I see nothing wrong with trying with different approaches, each with their own attendant advantages and disadvantages.
It's something that has crossed my mind (and I may have mentioned something to that effect some 20 or 30 pages ago), but I'm not sure if we're there yet. It would be nice if we had some kind of learned representation that acted as a bridge between all these different modalities of a card.
Not that I have any idea about how to achieve that (yet), but it's fun to think about.
My LinkedIn profile... thing (I have one of those now!).
My research team's webpage.
The mtg-rnn repo and the mtg-encode repo.
I was up early this morning, and I did some coding, and I may or may not almost have that whole stabilization-norm thing working with the training script. I'll need to finish up the code and do some tests later. Torch provides a way for us to judge our networks according to multiple criteria, which is helpful. The first criteria is that the network gets the answer right. The second criteria is that the network does so while acting in a controlled, calm manner (they're very hyperactive by nature).
The question is whether we can teach it self-control and restraint while keeping its creative spirit intact. The literature seems to suggest that it is possible, but there may be some fine-tuning of parameters involved. The process more closely resembles psychiatry than mathematics. We'll see.
Here are some fun metrics I generated with some of my data analysis scripts:
Preliminary analysis suggests that large networks trained with mtg-rnn's training set card order randomization and *no dropout at all* are really good at producing things that look like real cards. Of course, that comes with the tradeoff of producing more cards that are too similar to existing cards to be interesting. One of my main goals is to examine that tradeoff in more detail.
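One crude way I could imagine quantifying that tradeoff (a sketch only, not what my scripts actually do; how you load the real corpus and the dumps is left as a placeholder) is to score each generated card by its similarity to its nearest neighbor among real cards:

# Sketch: nearest-neighbor similarity of a generated card to the real corpus.
# Scores near 1.0 flag clones; the distribution of scores across a whole dump
# would show how a given dropout setting trades novelty against realism.
import difflib

def nearest_real(generated, corpus):
    """Return (similarity, closest real card text) for one generated card."""
    best = max(corpus, key=lambda real: difflib.SequenceMatcher(
        None, generated, real).ratio())
    return difflib.SequenceMatcher(None, generated, best).ratio(), best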
I'm a huge visualization fan so I look forward to your graphs!
I believe 0 is the default dropout value? Anyway, yes, I do have the same data (all in duplicate) from models trained with dropouts of 0.25 and 0.50; I just didn't produce the hideous text tables to show off the statistics. I'm working on scripts to do that more automatically.
If you can run mtgencode, you can compute the same statistics from any of the dumps on my Google Drive - they're labeled with the dropout.
EDIT: OK, here are some numbers with dropout.
EDIT: For the record, the problem isn't importing HostFromGpu, it's the import statement on the previous line, which is trying to import a function that I'm not sure has ever, you know, existed. I would love to be proven wrong on this, because then I could see my way to fixing it.
EDIT: Looking over the unitary paper now. If I can figure out how to translate it into my AST stuff, I'll try implementing it against my old Python base. Key quote so far: "A major advantage of composing unitary matrices of the form listed above, is that the number of parameters, memory and computational cost increase almost linearly in the size of the hidden layer." I plug those numbers into my calculator and it makes a happy face.
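If I'm reading the paper right, the recurrence matrix is built as a product of parameter-cheap unitary factors, W = D3 R2 F^-1 D2 Pi R1 F D1. Here's a numpy sketch of my understanding (not their code; factor ordering and details are my reconstruction):

import numpy as np

n = 8  # hidden size
rng = np.random.default_rng(1)

def diag_phase(theta):
    # Diagonal of unit-modulus complex numbers: n real parameters.
    return np.diag(np.exp(1j * theta))

def householder(v):
    # Complex Householder reflection: 2n real parameters.
    v = v / np.linalg.norm(v)
    return np.eye(n) - 2.0 * np.outer(v, v.conj())

theta1, theta2, theta3 = (rng.uniform(-np.pi, np.pi, n) for _ in range(3))
v1, v2 = (rng.normal(size=n) + 1j * rng.normal(size=n) for _ in range(2))
perm = np.eye(n)[rng.permutation(n)]    # fixed permutation: no parameters
F = np.fft.fft(np.eye(n)) / np.sqrt(n)  # unitary DFT matrix: no parameters

# W = D3 R2 F^-1 D2 Pi R1 F D1: about 7n real parameters in total
W = (diag_phase(theta3) @ householder(v2) @ F.conj().T @ diag_phase(theta2)
     @ perm @ householder(v1) @ F @ diag_phase(theta1))

print(np.allclose(W.conj().T @ W, np.eye(n)))  # True: W is unitary

The FFT and the permutation contribute no parameters at all, which is where the almost-linear scaling from the quote comes from.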
loss = A * correctness + B * stability
because that's what I implemented it as, where A and B are (positive) constants of our choice. If you pick A=1 and B=0, then you get the same results as we always have. Picking B > 0 makes stability a priority. I tested (A=1,B=0),(A=1,B=1), and (A=1,B=10) just to get a feel for what's going on.
The network converges in all cases, but training time definitely takes longer, due to the issue of having competing objectives; this matches what the authors claim about their approach. The network gets partial credit if it says the wrong thing in an elegant way, or if it says the right thing but in a sloppy way. It takes us longer to find the happy medium. For example, after 25 epochs, I got a training loss of around 0.39-0.40 with no stabilization, and 0.48-0.50 with stabilization. I may need to run things for longer to see whether stabilization can give us an improvement, and what value of B works best.
It's also not entirely clear whether we need to be stabilizing the outputs of cells, the contents of each cell's memory, or both. The authors try both separately in their experiments, but it's not exactly clear what would be best for us. Right now I'm just doing it on the activations of the hidden layers at each time step.
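Concretely, the penalty I'm computing looks like this (a minimal sketch of my understanding of the norm-stabilization idea, not the actual training-script code; hidden stands for the matrix of per-time-step activations):

import numpy as np

def stability_penalty(hidden):
    """Mean squared change in the norm of the hidden activations between
    consecutive time steps (hidden has shape [T, hidden_size])."""
    norms = np.linalg.norm(hidden, axis=1)  # ||h_t|| for each step t
    return np.mean((norms[1:] - norms[:-1]) ** 2)

def total_loss(correctness, hidden, A=1.0, B=1.0):
    # loss = A * correctness + B * stability, as above
    return A * correctness + B * stability_penalty(hidden)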
Oh, and while typing this I noted something that the authors did that I didn't do:
Whoops. I missed that part. I'll have to go back and try that later.
Here's one entertaining/unusual result that I saw from sampling one of the networks:
Hell-hast
2R
Sorcery (Rare)
Hell-hast deals 2 damage to target creature or player. If you cast Hell-hast from your library, that player loses the game.
@hardcast_sixdrop: The data you are gathering is very intriguing. Mapping out the results we get with different parameters would definitely be helpful.
@mwchase: I'm still not sure about your 'HostFromGpu' error. And yes, the unitary paper is very interesting.
Haha, good idea. RoboRosewater suggests...
Hell hath better than any victor.
Hell hath no shame to recold a child.
Hell hath none that she salvatery attains, and sinners are like a fireful.
Hell hath seen the natural cycle of fire and rubble and his lifeage.
Hell hath her honor.
Hell hath devils for every path.
Hell hath better.
Hell hath strange battlefields.
Hell hath no fury like me!
Hell hath no fury like many horses.
Hell has far more deader things.
Hell has to be a soldier. It's a stubborn thing.
Hell has failed to crack the Tongming memory or Gish's waters.
Hell has lost, burning and broken, roaming into the infinite night.
Hell has a way for an unpleasant prophecy: I found them follow the cursed bones.
EDIT: Reran the training script, this time for 60 epochs. Validation loss was still going down, so I have a feeling I could continue running it for longer, but I needed the machine for other purposes.
Some of them are more... verbose than I'm used to seeing. I didn't use any dropout this time, so there may be some overfitting, but most of the cards I'm seeing are novel, or are distinct enough from their sources of inspiration that you can't really consider them clones. Examples:
Partial Friend
1GG
Instant (Rare)
Exile the top five cards of your library. Until end of turn, you may play cards exiled with Partial Friend.
Put a card exiled with Partial Friend onto the battlefield, then return a creature you control to its owners hand, then put the rest of those cards on top of your library in any order.
#See what I mean?
Inkmagumal Visions
7
Enchantment - Aura (Uncommon)
Enchant creature
Flash
Enchanted creature has "T: Put a charge counter on Inkmagumal Visions" and "T, remove a charge counter Inkmagumal Visions: this creature deals 1 damage to target creature or player."
#Again... very elaborate.
Hunt-Tribe Master
1G
Creature - Elf Artificer (Rare)
T: Target land becomes an X/X Elemental creature until end of turn, where X is the number of red creatures on the battlefield.
1/2
Blood Cursemage
2BB
Creature - Human Wizard Mutant (Rare)
8: Tap all other creatures you control. Each Wolf tapped this way deals 3 damage to each creature and each player. Activate this ability only if you control five or more Vampires.
3/3
Joven's Trove
3
Artifact (Common)
2, T, sacrifice Joven's Trove: Search your library for an Island card and a Swamp card and put them onto the battlefield. Then shuffle your library.
Gaerid Messenger
5WW
Creature - Human Soldier (Mythic Rare)
Gaerid Messenger's power and toughness are each equal to the number of white mana symbols in the mana costs of lands you control.
*/*
#Yes! Wait... no. But not a bad idea.
Archaeol Witch
5W
Creature - Wurm
Trample
Morbid - When Archaeol Witch enters the battlefield, if a creature died this turn, return that card to the battlefield under your control.
5/5
#An interesting application of the morbid ability word.
Glade Infantry
G
Creature - Elf(Common)
At the beginning of your upkeep, you may remove a charge counter from target creature. If you do, that creature deals damage equal to its power to Glade Infantry.
2/1
Strandwalker's Mists
3U
Enchantment (Uncommon)
Whenever a blue creature you control becomes the target of an instant or sorcery spell, put a 2/2 black Zombie creature onto the battlefield tapped.
Malaqua
0
Artifact (Rare)
T: Look at the top card of target player's library. If it' a nonland card, you may pay 2 life. If you do, put it into that player's graveyard.
#An example of a near-clone due to overfitting. This one is based on Wand of Denial.
#Malaqua -> mala aqua -> bad water. Clever name.
EDIT(2): Okay, a few more:
Kokuk, Tyrant's Familiar
3BBB
Legendary Creature - Beast (Rare)
Morph 3R
At the beginning of your upkeep, return the top creature card of your graveyard to the battlefield.
5/4
Korozda Guildmage
BG
Creature - Elf Druid
When Korozda Guildmage enters the battlefield, pay any amount of life.
T: Add G to your mana pool.
G,T: Put an X/X black Treefolk artifact creature token named Keeper onto the battlefield, where X is equal to that card's converted mana cost.
1/1
#You can see where it got the name and the rough idea of the card from, but it plays nothing like the real Korozda Guildmage. This is an example of what I call a pseudo-clone.
...
Seriously, though, it's because there are a few missing steps in the guides I was looking at. So far, I've found out that I need to make sure nvcc is on my $PATH, and then I found out that the installer has been downloading the wrong version, so I'm trying to install the not-wrong versions of stuff now.
EDIT: The installer chokes on itself. Tempted to crack open the app and see if it's something obvious.
EDIT: I was using the wrong network installer. These things have no version branding on the inside... ("Well geez, you dope, why were you using an old installer?" I downloaded this junk last week.)
EDIT: 'Unable to get the number of gpus available: CUDA driver version is insufficient for CUDA runtime version'. Well, I'm done for tonight.
EDIT: ... Sigh ... I can't do GPU training on this computer. Perhaps on some other computer, but not this one. This isn't an "Oh, it's really hard" thing, this is an "I skipped checking that it actually satisfied the system requirements" thing. CUDA can't run on this. Maybe OpenCL, but I don't feel like climbing out of one rabbit hole only to immediately plunge down another. (Okay, fine, it's definitely compatible with OpenCL 1.2. But I go no further for now.)