By the way, I did see someone put out something fun. They designed an image generation process using TensorFlow that produces vector graphics rather than raster graphics. That is, it thinks in terms of lines and curves rather than pixels, which makes it easier to draw complex sketches because the output is scale-independent.
Heh, I came here exactly to tell you about this, but it seems you got there first. By the way, dunno if you've read about their previous developments, but the network they're using is based on a TensorFlow char-rnn implementation you can find here.
This research got me thinking about turning pixel sprites/textures into vector drawings. There are a bunch of vectorizing technologies around, but none of them are any good if the raster input is too small - too few samples to reliably create smooth lines. From a corpus of vector drawings, you could generate raster images at several different scales (and rotations, deformations, etc.) to train the network.
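If anyone wanted to try that, the data-generation side might look something like this; just a rough sketch, assuming a folder of SVG files and the cairosvg and Pillow libraries (the scale and rotation values are placeholders):

```python
import io
import random

import cairosvg
from PIL import Image

SCALES = [0.5, 1.0, 2.0]        # render each drawing at several sizes
ROTATIONS = [0, 90, 180, 270]   # plus a few rotations

def rasterize_variants(svg_path):
    """Render one vector drawing into a batch of raster training images."""
    variants = []
    for scale in SCALES:
        png_bytes = cairosvg.svg2png(url=svg_path, scale=scale)
        img = Image.open(io.BytesIO(png_bytes)).convert("RGB")
        for angle in ROTATIONS:
            jitter = random.uniform(-10, 10)   # small random deformation on top of the rotation
            variants.append(img.rotate(angle + jitter, expand=True))
    return variants
```

Each raster variant would then be paired with its source vector drawing to make a training example.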
I decided to run one more test last night before I committed to gutting the network code and making it parameterizable for further testing. I did a test on the set of all Magic cards. The results are not bad, actually, with one small problem: every creature is composed of textureless gray blobs. I've enlarged one of the images to give you a clear idea of what I'm talking about.
I think it's supposed to be a Rhox-like creature. You can see the eyes, and what look to be horns set atop a big head, but the body is a featureless gray mass. There are several reasons why this might be happening. For one, I lumped all the images into a single category of object, so this may be a case of the network generalizing over many diverse creatures and settling on an average texture. That and the network that I trained with wasn't very big, so it might not have the capacity to learn different textures for different subjects.
I'm also on the lookout for instances of overfitting. For example, does anyone recognize the other art I've attached? Looks like a man taking a discard/mill spell to the face. While the generator has never actually seen real Magic art, it may have stumbled upon a blob that looked like a silhouette and reshaped it according to the responses of the discriminator.
---
While we're on the subject of convolutional neural networks, I took the opportunity to try out BlindTool Beta (after I saw this video that was posted to the /r/machinelearning subreddit). It's a free Android app that uses convnets to identify objects that are in the view of the phone's camera, and tells you what it thinks it sees. It's not the brightest network out there, but it can distinguish between a thousand different commonly encountered classes of objects. Before I left for work this morning, I went around my apartment testing it out, and it performed very well.
Of course, it's easy to fool it when it has to reason about things it has never seen before. I took the card Mesa Falcon and set it on the table. It told me it was a book cover. But when I picked up the card and tapped on the body text with my thumb, it told me it was an iPod or a personal handheld computer. Don't get me wrong, it's clever that it can take advantage of contextual clues like that; it's just fun to push it to its limits.
I assume the square artifacts are the result of how the image gets divided into different subregions, right?
Would it be possible to have the subregion grid jitter randomly by a few pixels during each feedback loop, so it doesn't end up reinforcing the horizontal and vertical lines?
I see what you're getting at. I'll have to look into the matter further.
By the way, I noticed that not all of the images from this last run come out as blobs. As I generated new ones, I found that they occasionally coalesce into something more reasonable. Sometimes (almost) card-worthy. For example, if Messeel Disciple was printed out and viewed at a great distance, you probably wouldn't even question it. Obviously, upon closer inspection you see that she's missing a face and the resolution is off, but the scene composition of "woman in a red dress and hood gazing upon a magic, floating crystal" is on point.
I did some tweaking and let the training run for a day and a half. I managed to get past the "everything is a blob" stage and the results are more often coherent than not. Of course, for as long as I ran it, there's some possibility of overfitting, so I've attached some examples for y'all to eyeball. Are any of these works obviously derivative? It's hard to do direct comparisons to existing art because if it did copy anything, it may have done so in a subtle way or mixed it in with novel features. That being said, I'm fairly confident that the majority of artworks produced by the system are original, it's just that some of them (like the ones I'm showing you) seem too good to be true.
That being said, many of the artworks are at least somewhat noisy. For example, I've attached one where it's clear we have a monster of some kind charging out of a portal, but it's very hard to make out exactly what's going on. Other times, the artwork is ambiguous or just strange. I've attached an example of what appears to be a man wearing a very big hat of some kind.
Right now I'm training on low resolution images because I can do the training faster and because it's easier for an adversarial approach to converge on good solutions when it doesn't have to contend with lots of fine details. I'm sure we'll eventually be able to produce full-fledged, high resolution artworks that look and feel like Magic art; these are just the first baby steps towards novel art generation.
EDIT: I'm also going to look into messing with vector arithmetic with the art generation vectors. With any luck, I should be able to rotate the "nun" to see her face. That is, if she even has a face. More on that later.
EDIT(2): The more interesting thing I'm noticing is that the lighting of scenes is believable. That suggests it has, from learning how to make believable images, learned a primitive model of lighting and shadow. That would make sense, in the same way that studying faces leads to a 3D geometry of faces that can be used to do rotations. I have some fun experiments to try soon...
My first thought for "Noisy Beast" was that it looked like Chaos Warp. They're not *that* similar, but it might be converging on it.
Not a bad suggestion, it just might be.
------
So, as I mentioned last night, experiments! I just couldn't wait to try out some of the stuff the researchers at Facebook did in the paper.
First, linear interpolation between two images. At the start, we have a sleeping child, and we want to end up with an image of a knight in profile. So I calculate
v(t) = t * v(child) + (1 - t) * v(knight), with t running from 1.0 down to 0.0
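In code, the whole sweep is just a weighted sum of the two latent vectors. A minimal numpy sketch (the step where each vector gets handed to the trained generator is left out, since that part depends on the framework):

```python
import numpy as np

def interpolation_path(v_child, v_knight, steps=10):
    """Latent vectors along the straight line from the child to the knight."""
    return [t * v_child + (1.0 - t) * v_knight
            for t in np.linspace(1.0, 0.0, steps)]

# Each vector in the path would then be rendered by the generator into one frame.
path = interpolation_path(np.random.randn(200), np.random.randn(200))
```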
I've attached a copy of the results. At the halfway point, the child's face melts into a dark frame of some kind. Atop this frame a body emerges, and the frame beneath suggests a quadruped. It appears to have some kind of stinger. As the transformation continues, the body shifts into a more human form. The extra appendages fall away and become background details like rocks. Interestingly, the "stinger" turns into a golden lyre. If you look closely at the final result, you can see horizontal striations suggesting strings.
Many of the results produced by the network are downright bizarre, like the four-legged scorpion man, but it turns out that these strange creatures are not far from aesthetically pleasing results. Moreover, all of the tiny details in the image are deliberate; every element of every image is purposefully chosen to maximize the chances that the output will pass for real Magic art. The knight is more likely to have a lyre than a scorpion's stinger, and more likely to wear armor than a carapace. The mess of pixels that we get out of this process belies a very complex, learned system of associations.
Next, I did a bit of reverse engineering of the vectors. We have a vector of size 200 that we put in, and we get a knight out. The vector encodes all the relevant information about the image, both the content (the person, the armor, the golden lyre), and the presentation of that content (the angle, the lighting, the scale, etc.). However, like card vectors and word vectors, we have no idea what elements control what image features.
I selected elements of the input vector and interpolated them to see what effect that would have on the output image. Among other things, I...
* Melted him into a blob.
* Set him on fire.
* Changed the sky in the background.
* Turned the lyre into a distant island.
* Turned his head into a rectangular prism.
But that took a long time, and it's difficult to figure out which combinations of transformations are content-preserving and which are content-altering. So instead I randomly interpolated elements in an attempt to narrow in on the ones that control the position of the viewer. I've attached some results that I got. Note that because I selected several vector elements at random when making my changes, I ended up destroying/changing some of the content, but you get the general impression of what I'm aiming for. In the top left of the image you'll see the original image. The top right image is a fuzzy zoom-in of the knight's head, and the others are a fuzzy zoom-out accompanied by a fuzzy rotation. With some additional sorcery, we should be able to make the operations less destructive and preserve the original content of the image.
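For anyone who wants to play along, both kinds of edits are just small numpy operations on the 200-element vector; again, the rendering step through the generator is omitted, and the element counts here are arbitrary:

```python
import numpy as np

def sweep_element(v, index, values):
    """Copies of v with one chosen latent element swept across a range of values."""
    out = []
    for val in values:
        v_new = v.copy()
        v_new[index] = val
        out.append(v_new)
    return out

def perturb_random_elements(v, n_elements=10, sigma=0.5, rng=np.random.default_rng()):
    """Copy of v with a handful of randomly chosen elements nudged."""
    v_new = v.copy()
    idx = rng.choice(len(v), size=n_elements, replace=False)
    v_new[idx] += rng.normal(0.0, sigma, size=n_elements)
    return v_new
```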
Effectively, the 2D output image we get is actually a projection from a higher-dimensional space. That is, the network is calculating lighting, occlusion, perspective, etc. with respect to a fuzzy three-dimensional representation. Now that is cool, especially because all of this was learned without the generator ever having seen the actual artwork. It's all inferred through trial and error. That's why it takes days to do the damn training, but the pay-off is worth it.
I'm very excited to see where these sorts of algorithms lead us in the years to come. They hold a great deal of promise.
Anyway, the next step is to find a way to get from card vectors to image vectors. Possibly by way of illustration-content vectors. I'll have to figure that one out. If we can do that, then we can automatically generate appropriate art for the cards we make.
EDIT: I just had to have it come up with a card named "Hat" to go along with the man wearing the hat.
New to the thread, but this ties together two of my favorite things. Very exciting stuff. I will be checking this daily going forward. Hat man looks like Obsianus Golem.
I just want to mention that those results are absolutely amazing and this thread is consistently one of the most interesting parts of my day.
On an unrelated note, I've been training the card generation network on my own computer, but I've been having some problems. Training the default size network on GPU works fine, but whenever I try to increase the size, I pretty quickly get the error:
cuda runtime error (77) : an illegal memory access was encountered at /tmp/luarocks_cutorch-scm-1-6753/cutorch/lib/THC/generic/THCStorage.c:147
Or error 74 (at the same line). Any ideas what might cause this? I'm getting the error with vanilla char-rnn as well, which is weird, because that worked fine before. Especially frustrating because I've also been trying to train it on a database of every SCP object that I compiled recently, but I have to train on CPU if I want any size larger than 128x2. Got some fun results from training a 300x2 network on CPU though!
Somewhat further off topic, but I'm considering trying to run the generative adversarial network on a corpus of MIDI files (translated into a piano-roll format, for readability) or audio files (again, translated into spectrograms), but I'm probably not knowledgeable enough to make the necessary modifications.
Haha, I'm happy you find all this so interesting.
Hrm, interesting. My first thought was to check the github page, but I see that you already posted there and haven't gotten a reply in several days.
You might have already done this, but have you tried using luarocks to update the nn, cutorch, and cunn packages? I just want to make sure it's not some kind of version mismatch. They frequently update and tweak the libraries, and sometimes issues like this can creep in.
If that doesn't help, I'd be happy to sit down and help you dig through everything, to narrow in on the exact problem.
Interesting idea! My first suggestion would have been to use a recurrent network, because the music files could be of variable length. I've seen several people approach the problem that way. But the results I've seen thus far have been lackluster, at least with regards to raw audio. Midi files are more tolerable, of course; if you're sticking with that, you should have better luck. I know that one person used the char-rnn package to write out text that was convertible to midi files. Mind you, I've heard rumors that several research teams are actively working on the problem of audio generation, and it's likely that we'll see some fresh papers on the subject come out this year.
On the subject of convnets applied to music, there's been some progress in using them to do music analysis. Back in July, Kereliuk et al. put out a paper entitled "Deep Learning and Music Adversaries", which used an adversarial approach to develop a robust music classifier. They used convnets on fixed-duration "music frames", represented as sonograms (see attached image). I'm sure there are some other papers out there at this intersection of ideas. After all, the use of AI to study music has a storied history stretching back to the late 1980s with the work of people like David Cope who, I've learned, hosts a yearly workshop on algorithmic computer music.
EDIT: Here's a fun link. You might find useful information there.
EDIT(2): Oh, and here's an LSTM network with a webpage where it just endlessly crafts traditional Irish folk tunes.
EDIT(3): Sorry for the quick edits in rapid succession, but I just realized something interesting. In the final image of the rotation series that I posted yesterday (where we are looking at the knight from behind), there was an unusual streak of gold that I assumed was noise. It is in fact the frame of the golden lyre. We're just seeing it at an odd angle.
By the way, I think Smog Fiend looks a little derivative of a horizontally flipped Air Elemental. I mean, that's pretty obvious, but I figured I'd note it.
You might have already done this, but have you tried using luarocks to update the nn, cutorch, and cunn packages? I just want to make sure it's not some kind of version mismatch. They frequently update and tweak the libraries, and sometimes issues like this can creep in.
If that doesn't help, I'd be happy to sit down and help you dig through everything, to narrow in on the exact problem.
Yeah, I tried updating those (you can update them with luarocks install, right?), but to no avail. I do get a lot of "Removing Non-existent dependency file" warnings when installing cunn and cutorch, though it does say it installed successfully. Would that have anything to do with it? It seems suspect, since one of the "non-existent dependency files" is THCStorage.c.
[edit] did a clean install of CUDA and all the dependencies. Didn't help. Any help at all would be awesome.
If you've got a bit of a handle on trying to rotate the knight, would it be possible to create more intermediate frames and animate them together? Would be interesting to see how well it flows.
You might have already done this, but have you tried using luarocks to update the nn, cutorch, and cunn packages? I just want to make sure it's not some kind of version mismatch. They frequently update and tweak the libraries, and sometimes issues like this can creep in.
If that doesn't help, I'd be happy to sit down and help you dig through everything, to narrow in on the exact problem.
Yeah, I tried updating those, (you can update them with luarocks install, right?) to no avail. I do get a lot of "Removing Non-existent dependency file" warnings when installing cunn and cutorch, though it does say it installed successfully. Would that have anything to do with it? It seems suspect, since one of the "non-existent dependency files" is THCStorage.c.
[edit] did a clean install of CUDA and all the dependencies. Didn't help. Any help at all would be awesome.
Hmm.. I'll need to do some research. Perhaps you could PM me your specs and the environment that you're running everything in? That would be a good start for me. I can try and find some time to dig around for answers.
If you've got a bit of a handle on trying to rotate the knight, would it be possible to create more intermediate frames and animate them together? Would be interesting to see how well it flows.
Yes, it should be possible. I almost have it. The images that I showed you came from a series of random interpolations on pieces of the input vector. I need to go back and see if I can distill a rotation vector that applies to images in general. The authors of the DCGAN paper were able to do it with the network trained on faces (see attached image).
I should be able to produce a turn operation by randomly interpolating images, collecting rotated versions, and averaging together the transformations. Something like that. It'll take some time though.
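Concretely, I expect it to come down to averaging latent-space differences, much like the DCGAN authors did for face pose. A rough sketch of the idea (assuming I've already hand-picked pairs of vectors whose outputs differ mainly by a turn):

```python
import numpy as np

def distill_turn_vector(front_vectors, turned_vectors):
    """Average latent offset between 'facing front' and 'turned' versions of the same scene."""
    diffs = [vt - vf for vf, vt in zip(front_vectors, turned_vectors)]
    return np.mean(diffs, axis=0)

def apply_turn(v, turn_vector, amount=1.0):
    """Nudge any latent vector by some fraction of the distilled turn."""
    return v + amount * turn_vector
```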
The question is whether rotations actually apply to all "objects" or not. It's possible that foreground objects like people and monsters have strongly defined geometries and that background objects have weakly defined geometries. That is, an island in the background might just be a static, painted backdrop. Lots of fun to be had there.
And an even more interesting question is whether we can recover the original 3D representation. You know how you can take multiple 2D images and do a 3D reconstruction from that? Imagine doing that with the art this thing generates. That should be possible if the learned model is well-behaved enough. I'm also looking into that possibility.
EDIT: Oh, I almost forgot. Here's a few mock-ups I made combining generated art with generated cards taken from card dumps. Hopefully I'll have a system for matching cards to artwork before too long. The groundwork has already been laid for that, but there are a few additional steps needed. :-D
Those images you've attached to the cards, they're a bit grainy. Is that a resolution issue or something else? And if it's a resolution issue, can that be resolved?
Looking at the card thumbnails though, they look so colour-appropriate. They look like thumbnails of real cards, honestly. That artwork generator is nuts. Actually, would you be able to provide the artwork generator network checkpoints for us to try using as well?
Those images you've attached to the cards, they're a bit grainy. Is that a resolution issue or something else? And if it's a resolution issue, can that be resolved?
It's due to the resolution of the output. I took high(er) resolution Magic card art and scaled them down to 64x64 images, and used that for the training data. Likewise, the images that the generator learns to create come out the same resolution. When you take that into consideration, the resulting images often look convincingly like the kind of images we trained the system on.
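(The downscaling step itself is trivial; something along these lines with Pillow, folder names made up:)

```python
import os
from PIL import Image

SRC, DST = "art_fullres", "art_64x64"   # placeholder folder names
os.makedirs(DST, exist_ok=True)

for name in os.listdir(SRC):
    img = Image.open(os.path.join(SRC, name)).convert("RGB")
    img.resize((64, 64), Image.LANCZOS).save(os.path.join(DST, name))
```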
I can train on higher resolution images, but that makes the training process more demanding for the generator. Note that it never gets to see what real art looks like; it only knows how good a fake it made when it gets the results back from the discriminator***. Even for very small images, lacking in detail, it takes a very long time (on the order of days) for the generator to hone in on what Magic art ought to look like. It'll take even longer to get a feel for fine details, textures, etc.
Also, I will note that not all of the art comes out looking good upon close inspection. In the first example I've attached, I'm sure there's something exciting and interesting going on in the image. There's definitely a person in the figure. Unfortunately, the network chose such bad lighting and a bad angle that it's almost impossible to make sense of it. In the second image, you have what looks like an abundance of competing elements in the image all thrown together. Mind you, with some transformations of the vectors, I think you could solve the lighting/angle issues and force the compositions to coalesce into something more reasonable. However, the space of all possible images that the generator can produce is very vast, and large regions of that space are uninteresting to us.
My hope is that we can engineer a mapping from cards to artwork that keeps us within the desirable subregions of the image space. I have a plan that I think might work.
Looking at the card thumbnails though, they look so colour-appropriate. They look like thumbnails of real cards, honestly. That artwork generator is nuts. Actually, would you be able to provide the artwork generator network checkpoints for us to try using as well?
Yeah! I can do that later.
Also, I attached a few more example image/card combos that I made the other day.
***EDIT: If you're wondering why we never show the artwork to the generator, remember that neural networks are laziness incarnate. It'd just memorize the artwork in whole or in part and give us knock-offs of pre-existing works, because that would be the least-effort way to fool the discriminator.
EDIT(1): I think the garbage results that show up are signs of underfitting, but I have a solution. We can filter the results using the discriminator to just keep the stable, coherent outputs. I'll have to try that.
Those images you've attached to the cards, they're a bit grainy. Is that a resolution issue or something else? And if it's a resolution issue, can that be resolved?
It's due to the resolution of the output. I took high(er) resolution Magic card art and scaled them down to 64x64 images, and used that for the training data. Likewise, the images that the generator learns to create come out the same resolution. When you take that into consideration, the resulting images often look convincingly like the kind of images we trained the system on.
Yeah, but if you're upscaling you should really do that by running it through waifu2x, to keep the neural net theme going.
The fact that it does a great job is more of a bonus than anything else.
I ran a couple of these through waifu2x - there seems to be very little difference in quality after one upscaling pass. Unsurprising, but it does mean that it might be best to do the art as images that are 128x128 or something so that we can get a little more detail. If the pictures are already looking like real Magic card art at 64x64, then we're not going to get (much) better by just improving the AI.
Well, higher resolution training data would give us more to work with in terms of backgrounds, textures, et cetera.
Of course, we might also run into problems because we don't have a ton of training data. The LSUN training set that was used to train the system to make images of bedrooms consists of 3,033,042 images. Our training set has some 14,000-ish images. That's what had me confused at first, because the authors reported fantastic results "after just one epoch". They didn't say just how long that epoch was, haha. In any case, I'm actually rather surprised that the results are as good as they are, given our limitations.
In the next year or two, I'm sure we'll come up with a scalable system that can churn out incredibly detailed, high resolution images. But there's still the data problem when it comes to small sets of artwork. The solution for artwork generation will probably be a hybrid of image generation and style transfer. For example, the system might learn how to render photo-realistic city scenes, and could re-purpose that knowledge to draw scenes of Ravnica's bustling marketplaces.
EDIT: I just discovered that Google can help with this! Here's a bunch of card images that Google thinks look like "badart_2."
Here's how, in Chrome:
Right-click on an image, and click "Search Google for image."
In the search box, add "mtg" after the thumbnail of the image, and search.
Click "Visually Similar Images"
Not a bad idea! Mind you, the network will try and throw you off by changing up the color palette, rotating the scene, and altering the lighting. Not sure how sensitive Google Images is to those kinds of variations. But it's definitely worth a shot. I might also write a script that compares novel art to the low-res versions of existing Magic art.
Also, I've come to realize that that isn't the only kind of overfitting that we need to watch out for. While it's hard (but not impossible) for the network to stumble upon an arrangement of shapes that looks almost like real Magic card art and then refine it over the course of training, a greater threat comes from the generator stumbling upon images that trigger an extremely strong reaction from the discriminator and then repeating those ad nauseam.
For instance, when I trained on just the elf artworks, the generator started to overfit and produce themes and variations on a single image (see attached), one that conveyed the essence of "elf-ness" and was extremely effective at fooling the discriminator. Now, the discriminator responded by learning to reject that specific image, but the generator just had to find another image that had a similar effect on its counterpart. In this way, the two ended up oscillating between a small number of "perfect" elves.
Now, that's made more difficult by having a very large and diverse data set, but I've noticed that some 2%-3% of the artworks chosen at random are themes and variations on a wall of glowing faces (see attached). I think it rests at the intersection of "spell-like" and "creature-like" art. So that is something to watch out for, but with this latest run it hasn't been much of an issue.
------
I've almost finished integrating the discriminator into the art-making code. Still a few more steps. Once that's in place, I think I can set it up so you pass in a confidence threshold and a number of artworks, and it runs until it has collected that many artworks that are at least that convincing.
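In rough Python terms, the loop I have in mind looks like this; the generate and realness_score functions are stand-ins for the trained generator and discriminator, which actually live in the Torch code:

```python
import numpy as np

LATENT_DIM = 200

def generate(v):            # stand-in for the trained generator
    return np.zeros((64, 64, 3))

def realness_score(img):    # stand-in for the discriminator's "this looks real" confidence
    return float(np.random.rand())

def collect_artworks(n_wanted, min_confidence=0.9, rng=np.random.default_rng()):
    """Keep sampling latent vectors until n_wanted images clear the confidence bar."""
    kept = []
    while len(kept) < n_wanted:
        v = rng.normal(size=LATENT_DIM)
        img = generate(v)
        if realness_score(img) >= min_confidence:
            kept.append((v, img))
    return kept
```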
There's not a lot left to do, but most of my waking hours are spent working on my dissertation, including writing thousands of lines of code. I'm mostly tinkering with the Magic stuff in the evening when I'm tired, but sheer motivation keeps me going, haha.
So yeah, once that's done I'll package that up and try and make it available to y'all.
@nyrt: As promised, I'm looking into your bug problem. I'll let you know what I come up with.
EDIT: Working on that generation code. By the way, I was doing some tests with the discriminator, and I saw a read-out indicating that the discriminator believed a fake artwork was real with 99.98% confidence (it's very rare to get a score that high), so I investigated and encountered what I call the Noseman.
In the image, a purple figure emerges from the shadows. A long, snout-like nose protrudes from his face. He wears a tasseled hat with a silver crest pinned to the front. His arm is held close to his chest, and he points an outstretched finger towards the heavens in a style strongly reminiscent of Da Vinci's John the Baptist. I've attached an image of the Noseman (center), surrounded by perturbations that alter the lighting, angle, and texture. In one case his hat fell off, revealing a full head of hair. In that version, he's also albino and bathed in an evil, red glow. Personally, the top middle one is my favorite. The added lighting gives us a hint of the brown robes that he's wearing.
If anyone has ever seen a Magic art like this one, let me know. As far as I can tell, it's totally original, which impresses me.
I'm also noticing that bad images can become good images with just a little prodding. In the other image I've attached, the leftmost figure is an image that got a bad score from the discriminator. The middle one has alterations that raised the score, and the rightmost has alterations that raised the score even higher. I re-rolled just 20% of the original vector's entries in each case.
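The "prodding" is nothing fancier than a loop like the following sketch, where score_fn stands in for running a vector through the generator and then the discriminator:

```python
import numpy as np

def reroll_improve(v, score_fn, fraction=0.2, tries=50, rng=np.random.default_rng()):
    """Re-roll a random 20% of the latent entries; keep the change whenever the score goes up."""
    best_v, best_score = v.copy(), score_fn(v)
    n = max(1, int(fraction * len(v)))
    for _ in range(tries):
        candidate = best_v.copy()
        idx = rng.choice(len(v), size=n, replace=False)
        candidate[idx] = rng.normal(size=n)   # fresh draws for the chosen entries
        s = score_fn(candidate)
        if s > best_score:
            best_v, best_score = candidate, s
    return best_v, best_score
```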
Maybe it's me, but in the Noseman picture I personally see an Elesh Norn, Grand Cenobite-like figure depicted in the white. Other card arts it evokes include Due Respect and Rout.
But that's probably just me.
Ah, good eye. Perhaps. Hard to say though.
---
I've gotten closer. Looks like I may need some kind of evolutionary algorithm if I want to speed things up though. I say that because most images don't turn out quite right but are in the neighborhood of good images, and we should be able to converge on the right images after just a few iterations. I've noticed that while having a high quality image is ideal, you can get a good response from the discriminator so long as you have a clear subject and focus. Both of the images I've attached get very good marks, with the latter being slightly higher. I'd prefer to get more images like the latter rather than the former, but there's no sense in throwing out the not-quite-as-good version if it can be reforged into a higher quality image.
EDIT: Implemented a prototype solution using an evolutionary algorithm while I ate breakfast. Preliminary results showed promise, but I'll need to tinker with it more this evening.
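Something in this spirit; a minimal sketch rather than the actual prototype, with score_fn again standing in for generator plus discriminator:

```python
import numpy as np

def evolve(score_fn, dim=200, pop_size=32, keep=8, generations=20,
           sigma=0.3, rng=np.random.default_rng()):
    """Tiny evolutionary loop: keep the best latent vectors, mutate them, repeat."""
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(generations):
        scores = np.array([score_fn(v) for v in pop])
        parents = pop[np.argsort(scores)[-keep:]]                      # best 'keep' survive
        children = np.repeat(parents, pop_size // keep, axis=0)
        pop = children + rng.normal(0.0, sigma, size=children.shape)   # mutate
    return max(pop, key=score_fn)
```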
Just a hunch, but I'd be interested in seeing what you get if you tried averaging five or six of the small-perturbation output images together. If I had Photoshop I'd try it out by overlaying each noseguy at 1/9th opacity, but all I have is MS Paint on this laptop.
This isn't really helpful, but the Noseman reminds me of something out of Jim Henson's creature shop. In particular, Farscape's Pilot. That's pretty exciting to me, because it mirrors human creativity, while still being quite novel for Magic art!
On another note, just how many values are in the vectors that describe these images?
EDIT: I'm also curious what happens if you look at how the evolutionary algorithm modifies the vectors, and instead go in the opposite direction. Does it become a mess, or create a viable alternative?
Just a hunch, but I'd be interested in seeing what you get if you tried averaging five or six of the small-perturbation output images together. If I had Photoshop I'd try it out by overlaying each noseguy at 1/9th opacity, but all I have is MS Paint on this laptop.
I can try that later for you. It'd be interesting to compare the average image against the image you would get if you averaged the vectors, as those are not guaranteed to be the same.
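For the curious, the two kinds of averaging differ like so (a trivial numpy sketch; rendering the averaged vector of course requires the generator):

```python
import numpy as np

def pixel_average(images):
    """Average the rendered images directly (what stacking layers at low opacity approximates)."""
    return np.mean(np.stack(images), axis=0)

def latent_average(vectors):
    """Average the latent vectors instead; feeding this to the generator gives a different image."""
    return np.mean(np.stack(vectors), axis=0)
```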
This isn't really helpful, but the Noseman reminds me of something out of Jim Henson's creature shop. In particular, Farscape's Pilot. That's pretty exciting to me, because it mirrors human creativity, while still being quite novel for Magic art!
Haha, you're right, I see what you mean. And yes, I'm absolutely amazed that any of the images turn out as good as that one. The majority of them are so-so, but every now and then it just goes all out and produces a masterpiece. We see the same thing happen with the card generator as well, so I'm not surprised.
I'd imagine we'd get more high quality images on average if we had a larger, more robust dataset. The generator never sees the artwork, but the discriminator does, and in the end, the generator is only as smart as the discriminator. If the discriminator can barely tell the difference between the two versions of the fire elemental that I posted, that means that the generator has little motivation to try harder.
That being said, I'm considering re-doing the training soon and for a longer period of time because we might still be seeing underfitting. At the same time, I'm worried about the discriminator hitting a wall and the generator stumbling upon "perfect" images, causing the whole process to collapse. I'd hate to leave it running for three days and have nothing to show for it. So we'll see.
On another note, just how many values are in the vectors that describe these images?
With the latest network I trained, 200. You can have as few or as many as you'd like, of course. I just picked what I thought was a decently-sized number.
EDIT: I'm also curious what happens if you look at how the evolutionary algorithm modifies the vectors, and instead go in the opposite direction. Does it become a mess, or create a viable alternative?
The first time I tested it this morning, I forgot to flip a sign and everything ended up evolving in the wrong direction. So yes, the images became progressively crappier and less coherent.