Ugh. I cannot convince myself that the code I swiped this from is implementing gradient descent correctly. Orthogonally, I'm not totally sure that gradient descent is really the right approach for minimizing the squared error here. The thing is, if the input to a sigmoid function is deep in its saturated region, then even though the output contributes heavily to the error, it takes a large change in the input to reduce that contribution by even a tiny amount. So, if the initial weights put an output neuron in the wrong state for one input, gradient descent doesn't really nudge the weights feeding that neuron much, unless you kick the updates way up... in which case training always explodes anyway.
It's possible that, by actually doing the work on my "average the input and the output" idea (strictly speaking, it adds the delta, times a scaling factor like 1/2, to the activations), I could figure out what other tweaks it needs to make things stable.
I might also look into using a method besides gradient descent. Got to use my degree somehow. Or... half my degree. Whatever.
ETA: Okay, I don't feel like using paper, and this is too complicated to rubber-duck verbally. So that means you all get a math lesson! Inside a spoiler.
The end goal of neural network training is to get the output "close to" a target. Here, we express this in terms of the squared error between target and output: E = 1/2 (y - t)^2. For things more complicated than a single output neuron, we take half the dot (inner) product of the difference between output (y) and target (t) with itself: E = 1/2 (y - t)·(y - t).
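As a quick sketch of that scoring function in numpy (my own names, not bpnn.py's):

import numpy as np

def squared_error(y, t):
    # E = 1/2 (y - t) . (y - t); for a single output this is just 1/2 (y - t)^2
    d = np.asarray(y, dtype=float) - np.asarray(t, dtype=float)
    return 0.5 * np.dot(d, d)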
Hypothetically, we could use some other scoring system for optimization, but I'm not going to consider that here.
I'm going to focus on my autoencoder setup, which means that each output neuron gets the activation of every hidden neuron, and every hidden neuron gets the activation of every input neuron, plus a bias neuron with constant activation.
I'm further going to stick with bpnn.py's use of tanh, rather than a logistic function. The comments are largely unhelpful, but I believe one motivation is that tanh(0) is zero, rather than one half.
The output of a particular neuron o_j is tanh(the weighted sum of its inputs). The weighted sum is just the inner product of the inputs with the corresponding row of the weight matrix. (So sue me, I never got the row-column conventions consistent...)
Writing y = tanh(x), the derivative dy/dx is 1 - y^2.
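Here's a sketch of the forward pass for that setup in numpy. The names W_h and W_o are mine, and I'm assuming the bias is handled by appending a constant 1 to each layer's inputs; bpnn.py organizes this differently, so treat this as illustration only:

import numpy as np

def forward(x, W_h, W_o):
    # W_h: (n_hidden, n_inputs + 1), W_o: (n_outputs, n_hidden + 1)
    h = np.tanh(W_h @ np.append(x, 1.0))   # hidden activations
    y = np.tanh(W_o @ np.append(h, 1.0))   # output activations
    return h, y

# The handy bit about tanh: the derivative is recoverable from the
# activation alone, d/dx tanh(x) = 1 - tanh(x)^2 = 1 - y^2.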
Consider the partial derivative of the error with respect to a particular weight. I am willing to agree with Wikipedia that it's the product of the following partial derivatives: the error with respect to that output neuron's activation, that activation with respect to the neuron's weighted input sum, and the weighted input sum with respect to the weight.
The last partial is equal to the activation of the neuron that feeds that weight. Okay, strike one against my code, perhaps. I don't see the dependency.
The middle partial is simple: 1 - y^2.
For the first partial, we have to differentiate the squared magnitude of a vector. But every component contributes independently, so per component it's just the derivative of 1/2 (y - t)^2, namely y - t.
So, piece by piece, we multiply (y - t), (1 - y^2), and the activation of the feeding neuron one layer up. This gives us the derivative with respect to every weight, as a matrix the same shape as the weight matrix we used.
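In numpy terms, continuing the forward-pass sketch above (so y, t, h as before; again my names, not bpnn.py's):

# The first two partials collapse into a per-output-neuron "delta";
# the outer product with the feeding activations supplies the third.
delta_o = (y - t) * (1.0 - y**2)                  # shape (n_outputs,)
grad_W_o = np.outer(delta_o, np.append(h, 1.0))   # same shape as W_o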
Oh, there's the dependency. It comes up when we actually update the weights.
Anyway, it's supposed to be possible to propagate to the next layer using only the part of the partial that's dependent on the output. I'm not going to go into detail, because I'm starting to get it.
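Okay, fine, here's my guess at what it looks like anyway, continuing the sketch (this is the standard delta rule as I understand it, not necessarily what bpnn.py does): push delta_o back through the output weights, dropping the bias column since nothing propagates to the constant bias unit, then apply the local tanh derivative:

delta_h = (W_o[:, :-1].T @ delta_o) * (1.0 - h**2)
grad_W_h = np.outer(delta_h, np.append(x, 1.0))   # same shape as W_h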
Anyway, now that we have a bunch of partials, we can do gradient descent. By going against the gradient, we can find a local minimum.
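The update itself is one line per weight matrix; the step size lr below is a made-up illustrative value, not anything bpnn.py uses:

lr = 0.1  # too large and training explodes, too small and it crawls
W_o -= lr * grad_W_o
W_h -= lr * grad_W_h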
The question, then, is whether we want to use the method of gradient descent, given that we can calculate all the partials.
One possibility is using Newton's method. This requires calculating the Hessian matrix of the error function, which gets confusing because it's like a double matrix, with one row and one column per weight; I know that's a defensible object, but I haven't really worked with them directly.
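If you flatten all the weights into one vector w, the "double matrix" is just an ordinary Hessian, and a Newton step looks like this (g and H here are hypothetical stand-ins for the gradient and Hessian of E at w; computing H is the expensive part):

step = np.linalg.solve(H, g)  # solves H @ step = g without forming H^-1
w = w - step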
TL;DR The code is a correct implementation of the algorithm, but I don't like the convergence characteristics.
Hey guys. I'm very busy with the school year starting up, and I cannot figure out why it keeps timing out after a few hours. I'm totally happy if someone forks it and continues developing it, and if anyone figures out a fix I would love a pull request. I just don't have the time or mental energy to do much programming until winter break. I am more than happy to keep hosting it!
I tried installing the neural-style code for cpu, but lua is giving me an error: in the file models/VGG_ILSVRC_19_layers_deploy.prototxt.cpu.lua:7 - which is "table.insert(model, {'pool1', nn.SpatialMaxPooling(2, 2, 2, 2, 0, 0):ceil()})" - the method 'ceil' is a nil value. Any idea why this might be?
I've been having problems with that code as well.
As for your specific problem, that surprises me, because it doesn't sound like a numeric issue like the ones I've been having. Rather, it's saying that the SpatialMaxPooling class does not have a ceil method. That is, of course, false, as we can see from the source code in nn/SpatialMaxPooling.lua...
function SpatialMaxPooling:ceil()
   -- flip the module into ceiling mode when computing output sizes
   self.ceil_mode = true
   return self
end
There. Clearly defined. Definitely not nil.
I'll look into your issue and see why it might be happening.
EDIT: One possibility! The ceiling mode support wasn't added until July 8th. If you installed the nn package before that and never updated, you might not have that functionality. Can you try doing "luarocks install nn" or something like that, and then re-run the code?
EDIT(2): As a teaser, I've attached a candidate artwork for Elseleth's set, in this case for a red-green delve creature (in the style of Leonid Afremov). Things are coming along nicely so far.
EDIT(3): In case you didn't know, the original artwork was for Boreal Druid. I restyled him; now he cultivates a garden of fiery wrath!
That's a very fiery druid. Will you be generating all artwork for the new set with this method? If so, how do you choose the source picture and 'modifier artist' for each new card?
The selection is manual at this point, done according to my needs. In this case I needed a red and green druid figure, so I took the art for Boreal Druid and combined it with "Candle Fire" by Leonid Afremov (see attached). The dancer in that painting is infused with fiery passion, and I wanted that emotion in the Magic art. So I presented the two to the network, and over the course of about 500 iterations (3 minutes), it repainted Boreal Druid to match Afremov's style.
As you can see, some manual labor was involved in this task, because I had to identify what I wanted to convey and find an artist whose style I could repurpose, so things move more slowly than they would under a hypothetical, fully automated approach (which I think is totally possible, by mapping descriptive text to content vectors). Because of that, and because Elseleth wants to do a draft in the near future, I don't think we'll be doing art for the whole set. The rares and mythics, sure, and some commons and uncommons. I'm churning out the artwork in the evenings while he generates the flavor text.
By the way, I'll note that works by impressionists and the like tend to deliver mixed results. Their brushstrokes are so soft, so gentle, that the style transfer doesn't work well at low resolutions: everything gets washed out. I'm sure that once I get my new machine working, I'll be able to do some higher res versions, and we can see about how well the network can transfer "softness" when it has more pixels to work with. That and I really want a high resolution Starry Night Jace for my background, haha.
EDIT: I really think we can conquer the art generation process with techniques like this. That's important for Magic, because a huge part of the game is storytelling, and a lot of that is done through art. Eventually we'll be able to generate wholly new images (with borrowed style but novel content).
And I can only imagine what people will accomplish with well-funded development teams and limitless computational resources. My advisor and I were discussing the possibility of applying this sort of technique on different kinds of media. For example, take a short story, rewrite it in the style of War and Peace, and take Microsoft Sam's voice reading it and re-do that in the style of Morgan Freeman. And so on, and so on. The sky is the limit here.
Today in "maybe I should just switch to PyBrain or whatever": my diagnostics were outright wrong and conveyed no useful information. Now that they're fixed, the convergence problems actually show up in the diagnostics, rather than only under careful inspection of the output.
Annoyingly, convergence is possible; the only thing stopping me from reliably getting a good net out of this stuff is initial weights that don't play well with the training algorithm.
ETA: Ported the toy dataset to PyBrain. I need to figure out what knobs to turn.
@Talcos yeah I checked that ceil was defined in the nn source on GitHub, but my local copy isn't particularly recent. Not much older than my account here, which was opened in June. I'll give updating a go. Should work...
[Edit]Thanks, that fixed that problem.
New problem, regarding 'padding'.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:505] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 574671192
Successfully loaded models/VGG_ILSVRC_19_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv3_4: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv4_4: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
conv5_4: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
/home/ed/torch/install/bin/luajit: /home/mf/torch/install/share/lua/5.1/nn/Sequential.lua:44: bad argument #1 (field padding does not exist)
stack traceback:
[C]: in function 'updateOutput'
/home/mf/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
neural_style.lua:103: in function 'main'
neural_style.lua:350: in main chunk
[C]: in function 'dofile'
...e/mf/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670
Sequential.lua never mentions padding, so I'm lost.
Well, at least we resolved that problem. I'm looking into the second issue. I'll let you know if I find anything.
EDIT: Idea. Try "luarocks install cunn" and then run it again. I've read about people having that sort of issue when both nn and cunn were out of date and they updated nn but didn't think to update cunn.
EDIT(2): Good news! Yesterday, I got my machine almost set up and ready to go. Still need to install Linux and the necessary libraries so I can move the image generation code over and see what I can get, but I'm excited to try. I really want to see a high(er) resolution version of Starry Night Jace.
EDIT(3): In the meantime, here are a few more artworks for the set:
* Barrenton Land, by Vincent Van Gogh (originally an Island artwork for Innistrad)
* Predatory Sanctuary, by an anonymous photographer of a building on fire (originally Godless Shrine).
* Reckless the Unspeakable, by Piet Mondrian (originally Spirit of the Labyrinth)
Once everything is cleaned up and ready to go, I'm sure we'll make the set available.
The fiery Godless Shrine is beautiful. I feel the Spirit one got a little bit lost in translation; you don't really see much of a body anymore. The landscape would be terrific with more detail and fewer artifacts (so presumably higher-resolution rendering). Your new machine: how's the install going, and what's its GPU?
Yeah, I noticed that about the Piet Mondrian style works. As far as I can tell, the network understands minimalism just fine and tries to scrub out small details. And yeah, I definitely want a higher res version of the fire shrine. As for the machine, I hope to have things set up this evening. I've got two Nvidia Geforce cards installed (see attached), and I'll see about parallelizing the workload across both of them. :-D
EDIT: Don't despair, Mustard Fountain. We'll get everything working for you, I promise. I'll continue looking into the problem for you.
EDIT(2): I uploaded a before-and-after picture of Air Elemental that I posted to reddit.
As far as image manipulation via deep learning goes, you might want to take a look at waifu2x http://waifu2x.udp.jp/
It's specialized in upscaling images with almost no visible loss, and it works very well - you can upscale pixel art, even, and it won't look terrible.
Original
Upscaled 2x:
Yeah, I recall hearing about that before. Now, working at higher resolution means the network has more opportunities to incorporate details, but we'll want to use some good noise reduction techniques all the same. And yes, I'm liking what I see. Thanks for sharing that with us.
EDIT: Installing so many different dependencies at this point. But making progress!
EDIT(2): Honestly, I'm exhausted. I'm probably going to set the machine up to grab all the packages I need and then I'll go to bed. But don't worry! Everything *should* be up and running before too long.
So, that answers the 'how to make art' question. Find the closest two vectors to the produced card and select one for style and the other for content, and on average we should end up with a new card art that's appropriate...
Either that, or to give the machine-generated set its own aesthetic, seed the 'style' category with cool computery pictures, and make everything into machine-versions of existing art.
Anyway, currently setting up the python version (https://github.com/andersbll/neural_artistic_style) of the style network. Let's see how well this goes!
EDIT: Dangit, CUDA won't talk to anyone or anything else.
I attempted this, but am getting failures in the python library code it depends on to use CUDA. Apparently I didn't tell it to install cudnn correctly. Sigh.
I did, however, get caffe / deepdream working locally, so that's super cool. What I really want to do is train a new network on magic art so we can dream about goblins and dragons. First step there is massive webscraping.
EDIT: Guided Deepdream can do some really cool stuff. The problem with using existing models trained on standard datasets is that they really want to see things like people and cars. I think if we train our own network on just MTG art, it'll work a lot better, even if the network isn't very good as an actual classifier.
Here's an example of repeated dreaming. Base image from Barren Moor, guided by Honden of Night's Reach. And the output after 1, 6, and many iterations of the standard deepdream algorithm from github. You can see that it starts off trying to respond to the lines and shape from the Honden, but then it just gets lost and spits out a psychedelic menagerie. Some refinement is needed.
Tried again with Watchwolf as the guide image; that seems to have worked out a lot better. The network has a pretty good idea of what bridges and dogs are, so I think there's more it can respond to.
This stuff is suuuuuuuuuuuuuuuuuuuuuuuper trippy. Try the gif of it!
True, that's one possibility, and one we should probably investigate. Now, since Magic art is highly detailed, conventional detectors can do a somewhat decent job, like seeing that the art for Watchwolf has a wolf and a bridge in it. Of course, they fall short when trying to make sense of something like a dragon, which they cannot have seen in real life. Now, there have been several papers on sketch recognition ( http://arxiv.org/pdf/1502.00254.pdf , http://arxiv.org/pdf/1501.07873v3.pdf ). It's possible for a network to know that an image is not real and still map it to the real thing it represents. In theory, we should be able to build an approach that can handle different kinds of media (photographs, drawings, etc.).
By the way, last time I was at a conference, I was speaking with a colleague of mine from the Netherlands, and he put forward some very interesting ideas. His projects involve preserving art and culture, and one of the problems he ran into was that he'd have a painting of a bird, and he'd like to get information about the painting so it can be properly understood and catalogued. But what kind of bird is it? That's not clear. His solution has been to mine the wisdom of the crowds; he's found ingenious ways to scrape websites like reddit to extract expert content and build usable knowledge. You might be able to do the same for dragons. Yes, there are no photographs of dragons to work with, and there are no tagged databases of dragons, but there are tons of drawings of dragons out there, and people talking about those dragons. There are ways for us to assemble such data efficiently and put it to use.
But yeah, that's not something we have the time or energy for at this point. So training on just art might provide a workable solution.
Ultimately, I think the end goal (one we won't reach here, but one that will be achieved eventually), is that you'll be able to hand the machine a labeled style guide (say of Ravnica), much like we do with human Magic artists, and a series of instructions for what to draw.
We ask, in plain English, for an image of people buying things at a market. The network has seen lots of images of people, lots of images of markets, and lots of images of people walking around markets, and it's able to map that understanding to English text and vice versa (see attached). It takes the content description and conjures up a hazy content representation featuring a market, goods, people buying those goods, etc.
Then it has to take that content representation and flesh it out according to the style guide. It knows it wants to put buildings in the background, but what do buildings look like in this world? It consults the style guide and sees that buildings are tall and angular, so that's what it draws (and so on). Now, the style guide isn't going to be comprehensive. For instance, the machine decided it wanted a goat in the background, but you never told it what Ravnican goats look like. Fortunately, the machine knows what goats look like, and, as we have already seen, it can generate goats matching the artistic style that you desire. In this way, the computer can fill in the little details that we would not have thought to mention.
From there, you have a few more layers of filtering, smoothing, lighting calculations, etc. In the end, we have a polished image that is fit for print. Now, this image may not have the same "inspired" quality of real human art, but this does make average-quality commercial art much cheaper. That's important for people like small-time game designers who can't afford legions of human artists. They'll be able to rapidly move from ideas to realities.
Now, for your well-established game companies, this sort of stuff opens up all kinds of new possibilities. Imagine being able to conjure content on the fly. Not just scripted and templated content and interactions, but real, organic experiences. I'm talking stats, gameplay, even artistic style and composition.
EDIT: Sorry about the delay in getting the hardware set up, haha. I'll see about making that happen tonight.
EDIT(2): Speaking of the future to come, I saw a paper about generating interactive narratives by studying stories. Here's a link to an article with a video. Cute.
I think we could in principle get a fair approximation of that functionality for some cards if we took a lot of shortcuts.
Do some sort of procedural creature generation based on the card stats, subtypes, et cetera (it may require a set of tags to train on). Use the information there (three legs, a tail) to build a model. Decent-looking procedural model building was what made Toady One drop the idea of doing 3D entirely, so it's not exactly trivial, but we'll just take another shortcut: build a creature skeleton for Spore, load that, export an image in front of a blank background, have a NN deepdream a color-appropriate background, then stylize it to make the two pieces look like they belong together.
It's not exactly a "practical" or "good" solution but it'd be the most fun I got out of Spore, so that's something.
So I got neural artistic style working. You have to install the dependencies that use CUDA in the right way, which is tricky, and even trickier if you want to use the NVIDIA cuDNN library with it, but I did eventually figure it out with some help from the author of the repository.
For a test run, I wanted to try something a little different. I want to develop my own custom lightweight web server at some point, and I'm calling the project BearMetal. I decided to make a logo for it. You can judge the results for yourself.
Very nice, love it! BearMetal gets an emphatic yes from me, haha.
By the way, I'm still having issues with getting some packages installed for my home machine. On the bright side, the Tesla K40m for the Intel 20-core Phi machine is now installed! I am so excited because that $3500+ graphics card has been sitting on the shelf for ages while we waited to get the right interconnects in stock.
* 12 GB of memory, bandwidth of 288 GB/sec.
* 15 multiprocessors, each supporting up to 2048 threads.
* 1.43 Tflops peak double precision floating point performance at base clock speed, but evidently it can go as high as 1.66 Tflops.
That'll speed things up considerably.
EDIT: And by "things", I primarily mean my research work. But honestly, I could churn out tens of millions of high res image renders a day with that thing, haha.
EDIT(2): By the way, hardcast_sixdrop, what was tricky about using the cuDNN library?
Any chance you can tell me how you did it? I got to the point where I had CUDA installed, had cuDNN downloaded and in the proper folder, and just couldn't make anything acknowledge CUDA existed.
I'm now trying this https://github.com/jcjohnson/neural-style in an Ubuntu VirtualBox VM and it's not going that well either. CUDA just doesn't seem to install, and if I try running it without CUDA (using the -gpu -1 flag) I end up just getting a "C++ Exception" and it fails.
My problem was that it wasn't recognizing the environment variable "CUDNN_ENABLED" in all the right places. You have to set that with "export CUDNN_ENABLED=1" before you build the dependencies (this is relevant for cudarray), and then you have to be careful, whenever you use sudo, to use "sudo -E" instead so it gets the right environment variables. Once I figured that out, it was a matter of doing "make clean" and "sudo python setup.py clean" in cudarray to clear out the old build and then rebuilding as if from scratch.
That said, I don't think CUDA will work in a virtual machine; you need real NVIDIA hardware (and recent hardware for cuDNN: I think they only support Kepler and later, so a 700-series card or a Titan as the minimum). I was getting a different error at first when I tried a setup that didn't use CUDA, and I never managed to get that one resolved. It was some other failure, separate from cudarray.
EDIT: messing around with the style transfer net, I've gotten some really interesting effects by using different pictures of metal ingots as my style guide. Creating art has never been so fun!
You can use other things for styles besides just metal ingots. Mostly I've had good success with uniformish style photos that are dominated by some sort of pattern you could use to retexture another image. Trying it with two random photographs usually results in divergence (infinite loss and a black output image, or something like that).
Some style images work a lot better than others. I think fire and sunset turned out the best here.
EDIT: Yeah that's what I do.