I don't have a twitter account; would someone mind requesting a mono-green angel for me? Thanks.
You can't specify colors as such, it turns out; you can only dictate complete mana costs. I discovered this when the first two cards I primed with a cost of "G" came out with costs of, exactly, G. The third I primed with the Serra Angel cost, colorswapped to green. It was... pretty uninspiring! But not unexpected. The 1 CMC mistake cards were more interesting.

https://twitter.com/DroidRosewater/status/754749960655175680
https://twitter.com/DroidRosewater/status/754753311497216000
https://twitter.com/DroidRosewater/status/754755584180838400
Does it still count as "mono green" for you if it makes white and red tokens and has a black mana activated ability?
Um, kinda? The first ability is actually pretty spot on, but it's the wrong color. I do like the name, though.
Oh, the color filter worked fine. The mana cost was mono-green, no problem. It just proceeded to show off how the NN tends not to associate colors in the mana cost with the colors of activated abilities and created tokens in the card text!
Just getting on board with this, though I haven't had time to trudge through all 116 pages of this thread yet. I just started training my own neural network on exclusively common and uncommon cards, and I thought I would be silly and make a twitter account associated with it. I just posted a card that stuck out to me in my second-ish run. The text on it isn't perfect, but there's something about it that stands out.

https://twitter.com/HoboRosewater/status/756793720914055169
Looking forward to playing around with this more. One of my other projects related to this involves loading the mtgjson set into a CouchDB, so I may attempt to integrate this with that to enable easier filtering (not that it's particularly hard now).
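If anyone's curious, the shape of that loader is roughly the following (just a sketch; it assumes the python couchdb package and an AllSets.json dump from mtgjson):

# Sketch: load every card from an mtgjson AllSets.json dump into CouchDB.
import json
import couchdb

server = couchdb.Server('http://localhost:5984/')
db = server.create('mtg_cards')  # one document per printed card

with open('AllSets.json', encoding='utf-8') as f:
    allsets = json.load(f)

# AllSets.json is keyed by set code; each set carries a list of cards.
for set_code, set_data in allsets.items():
    for card in set_data['cards']:
        card['set'] = set_code  # keep the set code around for filtering
        db.save(card)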
Hopefully Talcos or hardcast come by and are able to help you (or you could message them); I'm not really experienced enough with Ubuntu systems to try to figure out the problem (and I don't have a gtx1080).

Speaking of graphics cards... http://arstechnica.com/gadgets/2016/07/gtx-titan-x-pascal-specs-price-release-date/. The new holy grail for neural networks. It has 480GB/s memory bandwidth, which is usually the limiting factor for our purposes; that's 50% more than the old Titan X. If only it didn't cost an arm and a leg...
This is what happens when you get your greater-than and less-than signs mixed up: you select all pre-Modern (rather than Modern) commons/uncommons. Here are some of the more legible cards that stood out.
fog basteds {U}
creature ~ human monk (common)
when @ attacks or blocks, sacrifice it at end of combat.
(2/2)
# This name! Damn basteds!
mind tip {3}{B}
creature ~ sliver (common)
all slivers have "{2}, sacrifice this permanent: target player loses 3 life.
(2/2)
# Probably undercosted, but I dig this for a sliver.
stern of kan {2}{W}
creature ~ wall (uncommon)
defender
{T}: target untapped creature gets +1/+1 until end of turn.
(0/3)
# Kind of a cool wall. Would stick in a peasant cube anytime.
viashino sands {2}{R}
enchantment (common)
sacrifice a land: destroy target land.
# Wow! Too powerful for a common, but flavor potential!
Hey guys, just here to make sure this thread doesn't stagnate. Anyway, I've got a question about the viability of using Luinx Bash to run a neural network; specifically, whether it's possible to use the Luinx Bash shell on Windows 10, you know, that newish one (http://www.howtogeek.com/265900/everything-you-can-do-with-windows-10s-new-bash-shell/). It would be great if it's possible; it would mean I could generate cards and images and stuff. I know I could just get Luinx itself, but I've already got Windows 10 and I don't want to go through the pain of reconfiguring my computer.
Luinx? I think you mean Linux. I looked into this myself, because dual-booting into Ubuntu is annoying. Unfortunately, the Bash shell currently doesn't support CUDA. So while you could train/sample networks using it, it'd only have access to the CPU and therefore would be no better than using a virtual Ubuntu install. The devs working on this have said 'maybe' to adding CUDA support; if they do that, then yeah, we can properly train neural nets on Windows.
Yeah, Linux; I was tired and on my phone when the thought came. Well, it's annoying that it doesn't already have CUDA support, but I'll stay hopeful that adding it pans out.

Is it just me, or is the bot a bit overtrained? It produces a lot of cards that are copies of existing Magic ones.
I still exist, though unfortunately I've been busy lately working on other things.
CUDA 8 just came out, along with a compatible version of cuDNN. This is particularly exciting to me because I upgraded one of my machines a little while ago. Here's what nvidia-smi has to say about it:
Sun Oct 9 13:29:44 2016
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 0000:01:00.0 On | N/A |
| 30% 52C P0 56W / 250W | 1095MiB / 12180MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I've been hoping to have some fun with it, but just haven't had a chance yet. I did start the process of updating mtgencode and my fork of torch-rnn to work with the new platform, but it's going to take some time, especially to rewrite the tutorial. I'll try to post developments as they happen.
Sweet. I spent some time putting together an RNN a couple days ago, I really should build an actual Ubuntu machine instead of a VM to train with - CPU training is just slow.
Hey folks! It's been a while, but I (with a huge amount of help from Talcos) have been working on some pretty interesting stuff involving cards and neural nets. No results yet, but one thing I can say is that image recognition networks have a hell of a hard time distinguishing between Angels and Birds in card art.
I remember a while back there was a lot of discussion about making the char-rnn stuff run on Windows. For ages, the 'best' option was simply to run Ubuntu in a VM, but that prevents GPU usage. I myself set up a dual-boot to be able to use GPUs in Ubuntu natively. Now, however, Google has blessed us with TensorFlow for Windows, complete with GPU support! My installation experience was pretty trivial: just download the official Python 3.5 installer, run it, and do the steps in the TF docs. It's one step, really; just make sure you install the GPU version.
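If you want a quick sanity check that the GPU is actually being used, something like this works (a minimal sketch; log_device_placement just makes TF print which device each op lands on):

import tensorflow as tf

# Ask TF to report op placement; with the GPU build working, the matmul
# below should land on /gpu:0 rather than the CPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    print(sess.run(tf.matmul(a, b)))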
As a test, beyond the basic validation they have you run, I was lucky enough to find a char-rnn implementation of Karpathy's algorithm in TF. I just cloned the repository and ran the train.py script on the default Shakespeare training corpus. It runs 23100 batches in 17 minutes on my GTX 980ti, which isn't bad (I haven't tried the equivalent on my Ubuntu install, but I think I was getting about the same 0.04s per batch there). After the training finished, I ran the sample.py script and it generated Shakespearean text, so it all seems to work. It'll probably need a few tweaks for making Magic cards, but the best part is that it's all in Python, so the output should work with hardcast_sixdrop's existing scripts and you can still generate html/MSE2 sets out of this.
One thing about Windows: it obviously has a lot more stuff running in the background. That isn't really an issue for speed, since this is a GPU-intensive task, but it does mean a lot more GPU memory is in use under Windows than under Ubuntu. I suspect that turning down some graphics settings would help free up some of that memory (for context, TF detected I had 4.9 of 6GB free to use, while on Ubuntu, Chainer (another ML framework) has access to 5.6GB).
Anyway, hopefully this is useful to those who wanna run this stuff on Windows. I'll try to get actual Magic card generation working to see what modifications might have to be made, hopefully they won't be too extensive. But I'm just really glad that proper, GPU-accelerated machine learning is possible on Windows now.
edit: I'm doing a test run with the standard text input now; I suspect sampling it will return something just about acceptable. It's running 51300 batches instead of 23100, so it'll take about 40 minutes to train. hardcast_sixdrop made quite a few modifications to the original char-rnn code, mostly around loading the data and randomizing it better, which are going to be challenging to replicate in TensorFlow, since, well, it's a separate library from Torch, which is what we've used so far. But if I can't do it, I'm sure someone else can.
I still dream of reducing Magicese to a small vocabulary of instructions, translating the corpus, and training a net on that. I may pick that up as a project, now that I know I can do training myself.

There's actually already something kinda like that. The only part of the corpus that needs more encoding is the rules text (since you can't compress colour, cost, type, or p/t any more than they already are), and I'm working on figuring out a 'programmatic' representation of it, which would be useful for training but also for other search-related things.
Welcome back!

I finished training the default corpus in the TF char-rnn, and I got some results!
|5enchantment|4|6|7|8|9whenever a creature, and put it on top of his or her library into his or her graveyard.|3{^^}|0N|1wand hood|
|5sorcery|4|6|7|8|9prevent all colorless {UU} costs {^RR} gain life and you no island into his or her library this creature. its controller reveals his or her hand, attacking creatures.\-&^^^^^^: you get an equipment by name is red.|3{^^^UUUU}|0A|1boros gatekeeper|
OK, so they're not great results, but it's vaguely coherent... Boros Gatekeeper is a really, really good name, considering Boros is on Ravnica, which has gates, and... yeah, I'm surprised at that. And it proves the theory, at least. Now, the upgrading can begin.
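For anyone who wants to eyeball these raw samples, the format is just digit-tagged fields between pipes. Here's a throwaway parser (my own sketch, not part of mtgencode; I'm only labeling the tags that are obvious from the samples above):

# Throwaway parser for the pipe-delimited sample format above. Each field
# starts with a single digit tag; from the samples, tag 5 looks like the
# type line, 9 the rules text, 3 the mana cost, and 1 the name.
TAGS = {'5': 'type', '9': 'text', '3': 'cost', '1': 'name'}

def parse_card(line):
    fields = {}
    for field in line.strip('|').split('|'):
        if len(field) > 1:  # fields like '4' or '6' are empty placeholders
            tag, value = field[0], field[1:]
            fields[TAGS.get(tag, 'field_' + tag)] = value
    return fields

print(parse_card('|5enchantment|4|6|7|8|9whenever a creature, and put it on top of his or her library into his or her graveyard.|3{^^}|0N|1wand hood|'))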
I'm glad there's a viable solution for windows now! I'm also really glad to see that people are still working on this. It would be great to have some kind of plausible art generation, so I can finally write a bash script that prints up a booster box of never before seen cards to draft.
I haven't worked on my code in a while, but if it would be helpful to people I can do a quick revision to mtgencode to put the latest sets in the default corpus on the github repo. It looks like I should update the tutorial to mention the possibility of using Windows TensorFlow as a backend.
I also have my own usable but only partially complete and totally undocumented library for doing character-level language model things, which is faster and more flexible than mtg-rnn. I might have a chance to finish that as well over the holidays. It should make it really easy to do things like curriculum learning, but unfortunately it is not Windows friendly. I wonder how hard it would be to port the API over to TensorFlow...
Yep, work's still ongoing. Talcos just showed me this paper; imagine its results applied to MTG art.
Does your code support new stuff like {C} and {E}? I know my local copy of mtg-encode does, but I wasn't able to do a thing with git to apply the changes to your repo.
I also have my own usable but only partially complete and totally undocumented library for doing character-level language model things, which is faster and more flexible than mtg-rnn. I might have a chance to finish that as well over the holidays. It should make it really easy to do things like curriculum learning, but unfortunately it is not Windows friendly. I wonder how hard it would be to port the API over to TensorFlow...
What is this new library you speak of? Is it more effective at generation than mtg-rnn? Porting it to TF would be great in the sense that the audience who could use it would increase, since people with Ubuntu systems can operate as normal, but now Windows people could do it too.
Speaking of porting, what changes exactly did you make to mtg-rnn as compared to char-rnn? I'm thinking about making those same changes to the TF char-rnn, but I'm not exactly sure what to do with it yet.
Does your code support new stuff like {C} and {E}? I know my local copy of mtg-encode does, but I wasn't able to do a thing with git to apply the changes to your repo.
No, it doesn't yet; I haven't looked at it in a while. I was afraid they would add something new like that. The github way to add stuff to my repo is to submit a pull request. If it's just a minor change, it might not be worth the effort. I'll have time to look at it later this week.
What is this new library you speak of? Is it more effective at generation than mtg-rnn?
The new library is here, cleverly hidden behind the "dev" branch on the github repo. It's based on torch-rnn, which is essentially the successor to char-rnn, providing the same capabilities but faster and more efficiently. I used it for a couple of different projects, so I built a more general-purpose interface that gives you more control during training and lets you do more advanced language model things other than just generating text.
The problem with stock char-rnn is that it doesn't give you enough control of the training process. It slurps up all the text in the training corpus and divides it inefficiently, ignoring information about card boundaries and reusing the same divisions over and over again. Since our data is composed of many short independent segments, we can do much better, which mtg-rnn does manually with a custom data loader.
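The gist of what the custom loader does is easy to sketch in Python (this is an illustration of the approach only, not the actual mtg-rnn loader, which is Lua):

# Illustration of the mtg-rnn idea: treat the corpus as independent cards,
# reshuffle them every pass, and emit fresh continuous batches each time,
# instead of char-rnn's fixed once-and-for-all division of one big string.
import random

def card_stream(cards):
    """Yield an endless character stream, reshuffling the cards each pass."""
    cards = list(cards)
    while True:
        random.shuffle(cards)           # new card order every epoch
        for card in cards:
            for ch in card + '\n\n':    # blank line between cards
                yield ch

def batches(cards, batch_size, seq_length):
    # Each batch row advances its own independent, endless stream, so no
    # two epochs ever see the same fixed divisions of the corpus.
    streams = [card_stream(cards) for _ in range(batch_size)]
    while True:
        yield [''.join(next(s) for _ in range(seq_length)) for s in streams]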
The new library is much more powerful. Instead of reading a text file, it can invoke a Linux program that streams training text over a variable number of channels. This is nice because you can write whatever custom, dynamic training regimen you want, and you don't have to do it in Torch7 Lua, which is not my favorite language for string processing.
The drawback is that writing the program to provide the corpus is hard, because getting all of the interprocess communication stuff right is kind of tricky. To make things easier, I had intended to write a single streamer script that would just take a corpus specified as a directory tree of text files and configuration options, and figure out how to output the right training streams based on the configuration. This would make it simple to specify things like randomization of independent segments and also do curriculum learning.
I have some very basic examples in my torch-rnn repo, but nothing particularly usable yet. I hope I'll have some time over the holidays to finish it up. Let me know if you're feeling adventurous and want me to try and explain anything sooner.
As for Windows support, in principle it should be possible to build a similar input API in TensorFlow (I mean, it's a simple interface, right? Just spawn a program and read a whole bunch of output streams from it...). Then the training curriculum script could even be reused, assuming it was written in something portable like Python, which is my current language of choice for it. The trickiest part, I think, will be finding an interprocess communication method that works. Currently I'm using pretty low-level hacks with LuaPosix in Torch7, which probably won't transfer to Windows any better than Torch itself will. TensorFlow is Python, though, so it might be a lot easier to build a portable interface. I'll have to look into it.
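The consumer side of that interface is at least simple to sketch in portable Python (this is the shape of the idea, not the LuaPosix implementation; here the channels are faked by prefixing each line with a channel index, and streamer.py is a hypothetical corpus script):

import subprocess

def stream_channels(cmd):
    """Spawn a corpus streamer and demultiplex its line-oriented output.

    The (hypothetical) streamer prints lines like "3|some encoded card",
    where 3 is the channel feeding row 3 of each minibatch. partition()
    only splits at the first pipe, so card text can contain pipes freely.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            universal_newlines=True)
    for line in proc.stdout:
        chan, _, text = line.rstrip('\n').partition('|')
        yield int(chan), text

# e.g.: for chan, text in stream_channels(['python', 'streamer.py']): ...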
The new library is much more powerful. Instead of reading a text file, it can invoke a Linux program that streams training text over a variable number of channels. This is nice because you can write whatever custom, dynamic training regimen you want
What does this do exactly? Opening a file is pretty simple conceptually; you open a book and read the pages. This streaming over channels is more like having the individual pages shot at you all at once. What advantages, concretely, does it convey over just opening the .txt file of the corpus and splitting it up based on our categories? When you say 'custom, dynamic training regimen', I don't see why that can't be applied to a properly parsed .txt corpus file.
Also, if it's a Linux program, it might not work for Windows TF, which would suck. My goal (not sure if it's achievable, but one can hope) is to have all mtg-rnn stuff in Python so it can work with Windows or Linux, alongside TF. I'm feeling adventurous in that I want to write probably 3 different neural nets in TF (all doing different things with card data; images, syntax and this generation one), so learning tricks in this one couldn't hurt. It feels like an insane goal, given I've never written a neural net from scratch and I have barely enough comp-sci/maths background, but heck. Learning!
I was afraid they would add something new like that. The github way to add stuff to my repo is to submit a pull request. If it's just a minor change, it might not be worth the effort. I'll have time to look at it later this week.
It's trivial; basically wherever you define other symbols, you just add definitions for C and E. Not really worth making a pull request over.
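Schematically, the change is just extending the encoder's symbol tables, something like this (illustrative names only, not mtgencode's actual identifiers):

# Illustrative only: wherever the encoder enumerates valid mana symbols,
# colorless {C} and energy {E} become two more entries.
known_symbols = {'W', 'U', 'B', 'R', 'G', 'X',
                 'C', 'E'}  # 'C' and 'E' are the new additions

def is_mana_symbol(sym):
    return sym.upper() in known_symbols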
What does this do exactly? Opening a file is pretty simple conceptually; you open a book and read the pages. This streaming over channels is more like having the individual pages shot at you all at once. What advantages, concretely, does it convey over just opening the .txt file of the corpus and splitting it up based on our categories? When you say 'custom, dynamic training regimen', I don't see why that can't be applied to a properly parsed .txt corpus file.
Training with minibatches is not like opening a single book - it's more like opening 50 or 100 books at once, depending on the value of batch_size, and reading all of them simultaneously. Ideally, each book should be infinitely long, and the data should be continuous; there shouldn't be any places where the sequence stops and restarts like you get in char-rnn when you reach the end of a segment and have to go back and start over from the beginning.
You're absolutely right, a relatively simple parser can take a text file, split it up into pieces, and spit out 50 or 100 continuous infinite streams of the data it contains. But this parser has to do more than just open the file; it's not a trivial program. You can't encode 50 infinite, continuous streams directly as a text file. If you make the mechanism general, then you can use a more complicated parser if you want to: maybe you train the first 10,000 batches on cards legal in Modern only, and then throw in the rest of the cards and train for another 10,000 batches. That's a very simple instance of curriculum learning. Maybe you use a distribution of 75% cards from Modern and 25% cards not from Modern, even though those aren't the actual ratios in the training data. Maybe you even look at the output of the network periodically during training to figure out what distribution to use.
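To make that first example concrete (pure illustration; modern_cards and other_cards are just lists of already-encoded card strings):

import itertools
import random

# Two-phase curriculum: Modern-legal cards only for the first 10,000
# batches, then a fixed 75/25 mix of Modern and everything else.
def curriculum(modern_cards, other_cards, batch_size):
    for batch_num in itertools.count():
        batch = []
        for _ in range(batch_size):
            if batch_num < 10000 or random.random() < 0.75:
                batch.append(random.choice(modern_cards))
            else:
                batch.append(random.choice(other_cards))
        yield batch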
The streamer script I mentioned would be implemented in Python, so it would be portable to Windows. It fills the role of the parser. Separating the programs and having the parser be its own Python script would mean that the logic to determine the training curriculum could live in mtgencode, not with a specific rnn implementation, and then any rnn library that supported the interface could use it. Supporting the interface should be much, much easier than porting the whole streamer script / parser.
It's trivial; basically wherever you define other symbols, you just add definitions for C and E. Not really worth making a pull request over.
Cool, that's what I was hoping. Shouldn't be a big deal for me to implement the change.