That... seems like a hell of a script. I wouldn't even know where to start with something like that, but I've never worked with advanced python IO, so there's that. What kind of interface would the rnn library need to support this?
In the meantime, I ran the TensorFlow char-rnn code with 3 layers and rnn_size of 512, and I got some interesting results, such as...
shahki, skirch flock 3R
legendary creature ~ elder dragon (rare)
flying
when @ enters the battlefield, if you control three or more artifacts, return target creature card from your graveyard to the battlefield.
at the beginning of your upkeep, if @ is in your graveyard, you may pay {BBR.}. if you do, return @ from your graveyard to the battlefield.
(5/5)
A perfectly legible and sensible legend! Apart from the . in the mana cost. Unfortunately there seems to be some overfitting because one of the cards had the name 'krark~clan shaman' and the other completely copied the text of Demonic Torment. Not sure how to turn up/down the temperature of the new TF sampling script yet. Luckily the mtg-encode script works fine with the output though, which is very convenient. Here's a few more:
long~term intellect 5UU
creature ~ avatar (rare)
trample
when @ enters the battlefield, draw three cards.
(11/11)
// An 11/11 trampling for 7, with upside? Yes, please. In blue no less. At least it's rare!
shambling shieldmage 3B
instant (common)
cast @ only during the declare attackers step and only if you've been attacked this step.
prevent all damage that would be dealt to you this turn. if it's blocking, remove a % counter from it. if that card is returned this way, you may pay 1. if you do, you gain 2 life.
// Impossible instructions are impossible. How has one been attacked if this can be cast only during the declare attackers step?
grixisk strike R
creature ~ elemental (common)
echo 3R
@ attacks each turn if able.
R: @ gains indestructible until end of turn.
sacrifice @: destroy target enchantment.
(2/2)
// I'm afraid this gives red a really good way to deal with enchantments, so that wouldn't fly in today's colour pie world.
lifelink 2RR
instant (rare)
strive ~ @ costs 1U more to cast for each target beyond the first.
return any number of target creatures you control. prevent all combat damage and exile them. then that player shuffles his or her library.
// An actual correct implementation of Strive! That's pretty impressive. Pity the second line is a bit garbled.
destructive retribution 3G
enchantment (rare)
when @ enters the battlefield, search your library for a creature card with power 2 or less, reveal that card, and players.
// REVEAL THE PLAYERS!
veteran war cry 3
creature ~ snake elf and (rare)
hexproof
@ gets +1/+1 as long as you control a permanent other than @.
whenever @ deals combat damage to a player, if @ is in your graveyard, creatures you control get +1/+1 until end of turn.
(3/3)
// Ah, the rare 'and' subtype. I wonder if it's possible to deal combat damage from the graveyard, technically? If yes, yay anthem!
press to dust 2
artifact (rare)
sunburst
T, remove X % counters from @: target creature gets -X/-X until end of turn.
countertype % echois
// Technically... this works? Sunburst implies the number of 'echois' counters (if we assume they're charge counters of a sort), and it's a keyword ability so it technically doesn't need reminder text! And this is a rare example of X being both defined and used. Winner!
I'd forgotten how fun it is to generate random cards. There'll probably be a few more of these posts.
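On the temperature question above: I don't know what option the TF sampling script exposes, so this is only a generic sketch of what "temperature" usually means when sampling from a char-rnn. The function names are made up for illustration and are not part of any TensorFlow API. The raw scores are divided by the temperature before the softmax, so values below 1 make the sampler more conservative (more copying of the corpus), and values above 1 make it more adventurous.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(logits, temperature, rng=random):
    """Draw one index according to the temperature-adjusted distribution."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

If the overfitting on names is bothersome, sampling at a somewhat higher temperature is the usual first thing to try, at the cost of more garbled text.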
That... seems like a hell of a script. I wouldn't even know where to start with something like that, but I've never worked with advanced python IO, so there's that. What kind of interface would the rnn library need to support this?
Basically, the rnn library needs to be able to launch a Python script, and then read input from multiple IO streams. My current version uses low-level posix file operations, which works fine on linux, but there's probably a more portable way to do it with python. The fun part is coordinating so that the streamer knows when it should exit, either when training is done or if the rnn library crashes for some reason.
It's not really that complicated, but it can be tricky to get it to work right if you aren't using some kind of specialized library to coordinate IO. Hopefully it's not too hard to build the APIs, and then users can just edit config files and run the script and not have to care about the gory details.
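For anyone curious what the coordination looks like, here is a minimal sketch of one possible portable approach, assuming the streamer is an ordinary child process that writes to stdout and stderr. It is not the actual script: `run_streamer` and `_pump` are hypothetical names, and it swaps the low-level posix file operations for the standard `subprocess`, `threading` and `queue` modules. One reader thread per stream pushes lines onto a shared queue, and the loop ends once every stream has closed, which is how the parent learns the child is done.

```python
import queue
import subprocess
import sys
import threading

def _pump(stream, name, out_queue):
    """Forward each line of one stream to the shared queue, then signal EOF."""
    for line in stream:
        out_queue.put((name, line.rstrip("\n")))
    out_queue.put((name, None))  # None marks the end of this stream

def run_streamer(cmd):
    """Launch cmd and yield (stream_name, line) pairs until it closes its streams."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    lines = queue.Queue()
    streams = {"stdout": proc.stdout, "stderr": proc.stderr}
    for name, stream in streams.items():
        threading.Thread(target=_pump, args=(stream, name, lines),
                         daemon=True).start()
    open_streams = len(streams)
    try:
        while open_streams:
            name, line = lines.get()
            if line is None:
                open_streams -= 1
            else:
                yield name, line
    finally:
        # If the consumer stops early (say, training crashed), stop the child too.
        if proc.poll() is None:
            proc.terminate()
        proc.wait()

if __name__ == "__main__":
    child = [sys.executable, "-c", "print('batch 1'); print('batch 2')"]
    for name, line in run_streamer(child):
        print(name, line)
```

The `finally` block covers the "rnn library crashes" case from the other direction: if whoever is consuming the lines goes away, the child gets terminated instead of lingering.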
There's actually already something kinda like that. The only part of the corpus that needs more encoding is the rules text (since you can't compress colour, cost, type, p/t, any more than what it is now) and I'm working on figuring out a 'programmatic' representation of it which would be useful for training, but also for other search-related things.
The rules text conversion is the part I find most interesting. I'm imagining something that would define "destroy target creature" as "permanent(creature) move(battlefield, graveyard)", and "target player draws a card" as "card move(library[top], hand)". In this way, the network could see that certain game actions have meaning. For example, moving a card to the battlefield (elvish piper) is associated with higher mana costs than moving a card from library to graveyard (mill), so when crafting an effect, it will be more likely to cost it correctly.
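To make the idea concrete, here is a rough sketch of how the two examples above could be stored and printed. Everything in it is hypothetical: the `Move` class, the `PHRASES` table and `encode_rules_text` are illustration only, and a real converter would have to parse rules text rather than look up whole phrases.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Move:
    """One game action: move a kind of object from one zone to another."""
    subject: str      # e.g. "permanent(creature)" or "card"
    source: str       # zone the object leaves
    destination: str  # zone the object enters

    def encode(self):
        return f"{self.subject} move({self.source}, {self.destination})"

# Hypothetical lookup table covering only the two phrases from the post.
PHRASES = {
    "destroy target creature": Move("permanent(creature)", "battlefield", "graveyard"),
    "target player draws a card": Move("card", "library[top]", "hand"),
}

def encode_rules_text(text):
    """Return the structured form of a known phrase, or the text unchanged."""
    move = PHRASES.get(text.strip().lower())
    return move.encode() if move else text
```

Because both effects share the same `move(source, destination)` shape, the zones become tokens the network sees repeatedly, which is the point of the proposal.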
Obviously I haven't gone deeply into this yet, but I can see it working. Cards would be data arrays with fields like P/T, Loyalty, rules text. They could have local instances that handle values like owner, controller, tapped status. For the system to understand complex mana issues, more work is needed. Mana would need a structure for costs, to handle hybrid, phyrexian and the like. A separate structure for generation would handle things like coming from a snow source, only being able to be used to pay for artifacts, and so on.
Since each point of mana has all sorts of knobs and dials, it's pretty much required that each be treated like a separate object, more akin to a token than an anonymous component of the value "Four red".
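A sketch of what "each point of mana is its own object" might look like, under heavy assumptions: the class and field names are invented here, generic costs are left out, and the payment check ignores everything except color and spending restrictions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CostSymbol:
    """One symbol in a mana cost, listing every way it may be paid."""
    colors: frozenset          # acceptable colors, e.g. {"R"} or {"R", "G"} for hybrid
    life_alternative: int = 0  # 2 for phyrexian mana, 0 otherwise

@dataclass(frozen=True)
class Mana:
    """One point of produced mana, tracked as its own object."""
    color: str
    from_snow_source: bool = False
    spend_only_on: frozenset = frozenset()  # empty means unrestricted

def can_pay(mana, symbol, spell_types):
    """Check whether this point of mana can pay this symbol of this spell."""
    if mana.spend_only_on and not (mana.spend_only_on & spell_types):
        return False
    return mana.color in symbol.colors
```

For example, `can_pay(Mana("R", spend_only_on=frozenset({"artifact"})), CostSymbol(frozenset({"R", "G"})), frozenset({"creature"}))` is false, because that red mana may only be spent on artifacts.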
Hey! I'm working on trying to understand this program and how I can use it; I installed linux for the first time just for it (something I've been meaning to do for a while). How exactly do you make it run on the GPU instead of the CPU? And if information like this is already easy to find online... could someone point me to it?
My first batch generated some great stuff even though it took FOREVER to run, like
dream thief {1}{U}
creature ~ human cleric (common)
indestructible
(1/1)
and
shimmering to the untouth {2}
artifact (uncommon)
whenever a player casts a blue spell, you may gain 1 life.
What you're looking for is hardcast's tutorial (which also includes GPU instructions). Alternatively, if you want to stick to Windows, you can use Python 3.5 and TensorFlow as described in my earlier post. That should get you going; if you have any questions, though, we should be able to fix (most of) them.
It looks like the folder you chose for the -data_dir, data/custom_encoding, doesn't contain a file called input.txt. You may have an input file there, but unless it's called input.txt it won't work (it's picky like that).
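If you want to check a folder before kicking off a long run, something like this would do it. It is a standalone helper written for this post, not part of char-rnn, and `find_training_input` is a made-up name.

```python
from pathlib import Path

def find_training_input(data_dir):
    """Return the path to input.txt in data_dir, or raise with a helpful hint."""
    folder = Path(data_dir)
    candidate = folder / "input.txt"
    if candidate.is_file():
        return candidate
    # List other .txt files so a misnamed corpus is easy to spot.
    others = sorted(p.name for p in folder.glob("*.txt")) if folder.is_dir() else []
    hint = f" Found instead: {', '.join(others)}." if others else ""
    raise FileNotFoundError(f"{candidate} is missing.{hint} Rename your corpus to input.txt.")
```

Calling `find_training_input("data/custom_encoding")` would either return the path or tell you which text files it found instead.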
Has anyone had any luck training with 2 or more GPUs? I know CuTorch theoretically supports it, but my Linux/Python/Torch skills are not quite there.
*EDIT*
Also, does anyone have any suggested training parameters for really high-powered systems? I work for a company where I have access to some pretty powerful hardware, and I find that with the recommended settings the standard training regimen finishes in a pretty short amount of time. If computing power were not an issue, what training parameters would you use?
EDH:
UG Momir Vig, Simic Visionary
BW Vish Kal, Blood Arbiter
BUG The Mimeoplasm
UWR Zedruu the Greathearted
RUG Riku of Two Reflections
UR Niv-Mizzet, Dracogenius
WR Aurelia, the Warleader
Aw, I was all excited about the Windows TensorFlow release, but my GPU isn't nVidia so CUDA isn't an option. Sadness.
It sounded like there was still some catching up to do vis-à-vis getting MTG-specific training improvements into TF. Is that on someone's to-do list already, or is it waiting for a brave soul to step up? (I'm only a barely-competent JavaScript dude, so I'd be starting from scratch re Python...)
I think hardcast_sixdrop has the best idea of what to do for the mtg-specific optimizations, since he made them for the Torch7 network. I briefly considered attempting to implement them in Tensorflow, but I have about 3 separate-but-related projects going on right now, so I don't have time to squeeze this one in too. If you wanna take it on, it won't be easy, but I assume hardcast would be helping you out in that.
It's been a while since anybody's posted any cards! I'm revisiting the stuff I generated last go 'round (with a bug in my head about somehow running an RNN constructed match) and finding all sorts of oddments...
Kraith, Oliro's Courier 0
Enchantment (w})
2: the next 2 damage that would be dealt to you whil this turn is dealt to its controller instead.
If you don't control an enchantment, Kraith, Oliro's Courier becomes a 3/3 artifact creature with flying.
At the beginning of your upkeep, destroy a land.
All that value for the low cost of zero mana! And yes, "w}" is its rarity.
Elder Loomer 2W
Rander (common)
Elder Loomer enters the battlefield tapped.
T, sacrifice Elder Loomer: search your library for a nonland card, reveal it, and put it into your hand. Then shuffle your library.
A Rander is presumably a permanent. Beyond that, it's anyone's guess!
Haunted Ooger 3B
Enchantment ~ Aura (common)
Enchant creature
Flash
Enchanted creature gets -1/-1 for each +1/+1 counter on it.
Whenever enchanted creature attacks, you may draw a card.
That's kind of a cool way of shutting down counter-based pump. Overcosted of course given you'd typically rather just kill the thing.
Most 2U
Creature ~ Horror (common)
Most has flying as long as you control a creature with flying.
Threshold ~ as long as seven or more cards are in your graveyard, Most gets +3/+0 and has "T: target creature gets +1/+0 until end of turn."
(1/1)
Would it stay flying even if the rest of your fliers went away...?
EDIT: Oh! Cool correct uses of X!
Reparity 4G
Sorcery (rare)
Put X X/X green wall creature tokens onto the battlefield, where X is the number of colors of mana spent to cast Reparity.
And they don't even have Defender!
Armorer's Edge 1W
Creature ~ Human Druid (rare)
UU, T: put an X/X blue Rigger Beast creature token onto the battlefield, where X is the total number of lands on the battlefield tapped and attacking.
(1/1)
Wacky but fun tech for an Awaken cycle?
EDIT 2: Amazing--
Jerkargd of Yesi 3RG
Sorcery (uncommon)
Miracle R
Destroy all nonblack creatures. If it's your turn, instead Jerkargd of Yesi deals 1 damage to target creature or player instead.
So much value for one mana, but you've gotta pull out the Miracle at the right time!
The mtgencode repository at https://github.com/billzorn/mtgencode hasn't been updated since summer of last year. I went ahead and bashed in some quick-and-dirty fixes (all I'm really capable of, this being literally the first time I've done anything in Python) to enable Energy counters and vehicles, and a bit of error output in verbose mode when a card from the input corpus comes up invalid. If you want to use the current AllSets.json and get every card into your training data, you can pull from my fork: https://github.com/SabreCat/mtgencode
EDIT: Oh shoot, I didn't realize or had forgotten hardcast_sixdrop is the maintainer of /billzorn/mtgencode. So the above should just be considered a workaround until if/when he gets the chance to review my PRs against his repo!
Added a feature to my fork to strip ability words from rules text, given that they're basically flavor text and thus extra verbiage for the RNN to learn with no mechanical benefit. Any reason this would be a bad idea? I don't expect it to make a huge difference in output quality, but every little bit helps, right? A minor step forward while we're waiting for the next big thing.
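For anyone who wants to see the idea without digging through the fork, here is a simplified sketch of stripping ability words. It is not the code in the fork: the word list is a small illustrative subset, `strip_ability_words` is a hypothetical name, and it assumes the ability word sits at the start of a line followed by a dash (or the "~" that stands in for the dash in the encoded samples in this thread).

```python
import re

# Illustrative subset; the full list of ability words is much longer.
ABILITY_WORDS = ("landfall", "threshold", "strive", "metalcraft", "hellbent")

# An ability word at the start of a line, then a dash or "~", then the rules.
_PREFIX = re.compile(
    r"^(?:" + "|".join(ABILITY_WORDS) + r")\s*[~\u2014-]\s*",
    re.IGNORECASE | re.MULTILINE,
)

def strip_ability_words(rules_text):
    """Remove leading ability words, leaving the rules they label intact."""
    return _PREFIX.sub("", rules_text)
```

Keyword abilities such as echo are untouched, since they carry rules meaning and are not followed by a dash.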
I've pulled most of Elseleth's updates into the main mtgencode repo and pushed an updated output.txt generated from the most recent mtgjson version 3.8.
I'm going to hold off for now on the change to strip ability words, pending a full overhaul of my neural network libraries that would allow me to test it. An option to do something similar will probably be implemented at some point, along with some options to control stripping or forcing reminder text.
It is not clear to me if "extra verbiage" has a mechanical benefit or not. Yes, it's more for the language model to learn, but it's also more that the language model can learn from. We're at the point where the capabilities of hardware far exceed the size of our dataset, so throwing in extra words that might have some relationship to functionality could actually be beneficial. Heck, it could even be beneficial to throw in flavor text.
On a side note, it turns out mtgjson is also open source. They get their data directly from gatherer, which can cause issues when gatherer has inaccuracies, lol. Also, is anyone still working on generating artwork? I was not previously aware of mtg.wtf, but it looks like a very convenient source of card pictures (and other mtgjson-derived data). I may have to modify the html spoilers to take advantage of it.
I saw your comment on the repo that the Energy symbol conflicts with the E rarity symbol. Is that a problem in mtgjson, then? I'm having trouble even finding what rarity "E" is, so I imagine there aren't too many of those...
Looking forward to the options and enhancements you mention! For now I'll maintain my fork with the ability word change, for anyone who might want to give it a spin and see if it has any noticeable effect.
Yeah, gatherer's inaccuracies have been a constant source of pain for the mtgjson folks; I hang out in their Gitter channel and a ton of the Kamigawa flip cards had their names mangled recently. And just for fun, Gatherer also deleted Vintage, Legacy and Commander as formats. They just keep making mistake after mistake, somehow...
For image extraction, have you tried Gatherer Extractor? It's what I used to get all my MTG images for phash recognition. Its only drawback is not getting multiple images of cards like Icatian Javelineers and the basic lands.
I saw your comment on the repo that the Energy symbol conflicts with the E rarity symbol. Is that a problem in mtgjson, then? I'm having trouble even finding what rarity "E" is, so I imagine there aren't too many of those...
This is just a (very minor) internal thing. E is the marker that I use to indicate "special" rarity. I originally chose it because it wasn't being used for anything else, but that's no longer the case, so it might make sense to change it to avoid ambiguity, though I'd be very surprised if the effect would even be noticeable. More of a consistency thing.
My fork of mtgencode now fixes an error I introduced when decoding in verbose mode (whoops), encodes special rarities as I, and provides capitalization when outputting default text or forum formatting, in addition to stripping ability words on encode.
I output a couple thousand cards using the no-ability-words setup, and I'm pretty pleased with the results! I've gotten a bit of overfitting on card names, but the text seems really good. I even got some complete Planeswalkers!
Mirrorpool, the Realm Seer 4G
Planeswalker ~ Chandra (mythic rare)
+1: Put a % counter on an artifact.
-2: Exile target creature card from a graveyard.
-6: You get an emblem with "whenever a player casts a spell, that player puts that card into his or her hand. If you do, shuffle your library.
((4))
Given how often the RNN forgets to add its little "countertype" line, adding a % counter could actually be pretty handy in this format!
Watch of Solitude 2WW
Planeswalker ~ Darati (rare)
+1: Untap Watch of Solitude and untap it.
-3: You gain 1 life for each enchantment you control.
-7: You get an emblem with "creatures you control get +1/+1 until end of turn."
((3))
So close. So very, very close.
Goblin War Cry 2RRR
Planeswalker ~ Nissa (mythic rare)
When Goblin War Cry is put into a graveyard from anywhere, shuffle it into its owner's library.
+1: Untap up to four target creatures.
-2: Would be put into a graveyard from your hand for each 1 damage prevented this way.
-5: Goblin War Cry deals 7 damage to each creature.
((3))
See what I mean about card names? And no idea what's going on with that second ability.
Kavu Thrill~Goder 3RR
Planeswalker ~ Ring (rare)
+1: Look at the top two cards of your library. Put one of them into your hand and the rest on the bottom of your library in any order.
-2: Gain control of all creatures with a +1/+1 counter on it.
-10: Each opponent sacrifices a spell.
countertype % quest.
((4))
That is a sick -2 ability. And what does it mean that it uses quest counters?
Hematic Seer 3R
Planeswalker ~ Ajani (mythic rare)
+1: Create two 1/1 colorless Myr artifact creature tokens.
-2: Destroy target nonbasic land.
-4: Create five 1/1 white Spirit creature tokens with flying.
Whenever a creature is put into an opponent's graveyard from anywhere, shuffle it into its owner's library.
You may cast a creature card at random from your hand.
When Hematic Seer enters the battlefield, if tribute wasn't paid, it gets +2/+2 and choose a color. Creatures you control get +1/+1 and have first strike.
((3))
I've been familiarizing myself with the lovely fan work available at PlaneSculptors, and I'm wondering if that might be a worthy way to bulk up our training data set. It'd be a few thousand cards at least, depending on how discriminating we are in picking sets to add to the corpus (and I'd venture that even in-progress fan cards aren't likely to be much worse than the ancient pre-Modern cards in the official data). What do people think? mtgencode doesn't currently have the ability to parse the XML files PlaneSculptors supplies for download, but I'm thinking that might be within my coding reach as I get more comfortable with Python.
Hey, I've been thinking: how difficult, on a scale of one arbitrary number to another, would it be to use these or a similar neural network and associated algorithm to generate Yugioh cards?
I'm going to have a stab in the dark and say reasonably difficult.
I've set up TensorFlow and CUDA, I hope correctly.
I had Torch working on Linux but had tons of issues with CUDA and couldn't get it working with the GPU, so it was way too slow.
Thanks mate.
For additional info, check the original RNN repo by Karpathy and his subsequent talk video on them. Since he wrote the char-rnn network, he's rather good at explaining them.
Murana Ball 2
Artifact (rare)
Players can't cast players.
Aww
For image extraction, have you tried Gatherer Extractor? It's what I used to get all my MTG images for phash recognition. Its only drawback is not getting multiple images of cards like Icatian Javelineers and the basic lands.
EDIT: Yup, looks like it. pull requested!
Hematic Seer 3R
Planeswalker ~ Ajani (mythic rare)
+1: Create two 1/1 colorless Myr artifact creature tokens.
-2: Destroy target nonbasic land.
-4: Create five 1/1 white Spirit creature tokens with flying.
Whenever a creature is put into an opponent's graveyard from anywhere, shuffle it into its owner's library.
You may cast a creature card at random from your hand.
When Hematic Seer enters the battlefield, if tribute wasn't paid, it gets +2/+2 and choose a color. Creatures you control get +1/+1 and have first strike.
((3))
It just sort of rambled on at the end there.
That... is a lot of shuffling.