Oops the RNN made yet another card with an unused X in the mana cost.
No, wait, that's a newly spoiled BFZ card!
Well... that actually works, right? Minimum 2/2 but can be a 5/5?
I think he means that it's the kind of card where the use of X is not explicit, which makes it a lot like some of the X cost cards we've been seeing. Of course, unlike the network's mistakes, this card actually works, haha.
---
I heard back from the keepers of the machine that I'm wanting to do experiments on. They'll try setting Torch up for me over the weekend. That should speed up my work by several orders of magnitude, so that's nice. When I get the chance, I'll also try producing a higher res version of Starry Night Jace, just to see how that goes. I'll be sure to keep y'all updated.
Is the implementation of Stack-RNN complete? If yes, I would be interested to try running it on my GPU at home (if I could even make the required 60 stacks fit on 1gb of GPU ram).
Ok, my attempt at dualbooting was an abject 'killed three OSes' failure. I've got windows running again, and I'm not treading out into those waters again.
Anyone have any advice on how to get this to work, subject to a 5 GB 'this is on a memory stick' constraint?
Is the implementation of Stack-RNN complete? If yes, I would be interested to try running it on my GPU at home (if I could even make the required 60 stacks fit on 1gb of GPU ram).
Not quite. I'm getting a NaN error with the recurrent matrix enabled. Looks like a step involved in the calculation of that matrix is messed up. I'm investigating that. Once everything is working, I'll be sure to make it available.
EDIT: With a sequence length of 200, I get about 140 timesteps into the first batch, and then the loss for every timestep after that is NaN, which makes the resulting training loss for the whole batch NaN. If I cut the sequence length in half, I can complete the first batch, but I start getting NaN results halfway through the second batch.
So up until a certain point, everything looks normal. But something must be spinning out of control, and it involves the computation of the recurrent matrix. I'll have to look into that later.
EDIT(2): Oh, fun note. I found a paper that covers neural stacks, queues, and deques, and I also found people who have implemented everything from that paper, like this one. Interesting...
EDIT(3): Wow. This is extremely promising. The training script is way more sophisticated than what we've been using. You can do variable sequence length: when it asks for a batch, you pass back the string and a value indicating how long the sequence is. And you can do stacks, and queues, and deques, and you can even mix them up. Like you can have 2 stacks and 1 queue on the side. The controller is an LSTM network like we're used to using. Definitely going to look into this tonight.
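For anyone wondering what that batch interface amounts to, here's a rough sketch of the idea (not the actual Torch script; the names are mine, and I'm assuming the cards have already been mapped to integer tokens): pad every card in a batch out to the longest one and hand back the true lengths alongside.
-- Pad each encoded card to the longest card in the batch and report the
-- true (unpadded) lengths, which is what the trainer wants to know.
-- Assumes a non-empty batch.
padBatch :: Int -> [[Int]] -> ([[Int]], [Int])
padBatch padToken cards = (map pad cards, lengths)
  where
    lengths = map length cards
    maxLen  = maximum lengths
    pad c   = c ++ replicate (maxLen - length c) padToken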
Ok, my attempt at dualbooting was an abject 'killed three OSes' failure. I've got windows running again, and I'm not treading out into those waters again.
Anyone have any advice on how to get this to work, subject to a 5 GB 'this is on a memory stick' constraint?
Hmm... I'm not as experienced in such matters, but I think that there are several people here that may be able to help you.
I may or may not have found the solution to all of our problems. Assuming this new code works, of course, but it came prepackaged with some test examples and those seem to work just fine, so I'm very hopeful. The code in question came out in between when I started coding the stack-rnn and today (evidently I wasn't the only one who had the idea, haha). I just need to reconfigure it a little bit to serve our needs.
EDIT: I'm currently looking into how I can integrate our current batch loader into the training program. The code is structured the same as before, for the most part, so it shouldn't be too hard.
EDIT(2): Honestly though I'd rather just feed in the plaintext because they already do their own encoding/embedding and I'd hate to have to rip out their code and cram in ours. Hmm...
EDIT(3): By that I meant the mapping to integers, not the encoding of text like conversion of numbers to unary. I think I can make this work just fine.
EDIT(4): Getting closer to getting this working. Just dealing with I/O issues at this point. I'm excited.
Why am I excited? Take a look at the charts. For instance, for the gender conjugation problem, the network has to attach the right gender to the right noun, and the data-structure-augmented architectures get those kinds of problems correct 100% of the time. Why? Well, they have the power to say "Oh, what was the gender supposed to be here again? I forget. Fortunately, I had the foresight to write that down earlier, let me consult my notes."
As you can see in the diagram (ignore all the minor details), what we have is a neural network that's wired to a permanent data store that it can send signals to in order to read/write data. So instead of trying to hold on to every piece of information across timesteps, which y'all know is extremely difficult for the network, it just makes a memo and stores it. So instead of having to remember the information, it just has to remember that it stored that information (which is far less taxing).
So substitute, say, color or type for gender, and then you can see why such a feature would be ever-so-helpful.
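For anyone curious what "send signals to read/write data" looks like in practice, here's a rough sketch (not the actual implementation, and the names are mine) of how the continuous stack in the paper mentioned earlier works, as I understand it: every cell holds a value vector plus a strength between 0 and 1, and the controller emits real-valued pop/push signals instead of discrete operations, so everything stays differentiable.
relu :: Double -> Double
relu = max 0

data Stack = Stack
  { vals :: [[Double]]  -- stored vectors, bottom of the stack first
  , strs :: [Double]    -- how strongly each vector is still "present"
  }

-- Popping with signal u removes up to u units of strength, starting at the top.
pop :: Double -> [Double] -> [Double]
pop u s = [ relu (si - relu (u - sum (drop (i + 1) s))) | (i, si) <- zip [0 ..] s ]

-- Reading blends the topmost vectors so that at most one unit of total
-- strength contributes (assumes a non-empty stack of equal-width vectors).
readTop :: Stack -> [Double]
readTop (Stack vs ss) = foldr (zipWith (+)) zeros weighted
  where
    zeros    = replicate (length (head vs)) 0
    weighted = [ map (w *) v
               | (i, (si, v)) <- zip [0 ..] (zip ss vs)
               , let w = min si (relu (1 - sum (drop (i + 1) ss))) ]

-- One controller step: pop with strength u, push vector v with strength d,
-- then read the new top of the stack.
step :: Double -> Double -> [Double] -> Stack -> (Stack, [Double])
step u d v (Stack vs ss) = (st', readTop st')
  where st' = Stack (vs ++ [v]) (pop u ss ++ [d])
A queue or deque is the same trick, just with reads and pops allowed at the other end as well.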
EDIT(5): Things are looking good. Almost have everything working. The implementation probably isn't the most memory efficient, but that's okay. We can work with that.
EDIT(6): And I think I'll call it a night. I'm still experiencing a slight bug where I want a CUDA-compatible tensor yet somewhere I'm getting a non-CUDA-compatible tensor, but that should be easily fixed. Just need to find out where that data is coming from.
I'm getting good results on the toy data. Next up is the new network topology that I've been wanting to make for like a week (which has a good chance of being dog slow, but...). The code I have has gotten pretty far afield from where I started out. I'm kind of wondering how much the current design resembles Torch.
The "network" is a DAG of expression nodes. It uses a mixture of algebra and caching to make sure that equivalent expressions get the same node. The nodes can return various manipulations of themselves, including taking a partial derivative.
The task of interpreting the nodes falls to evaluators, which dispatch on the class of the nodes, and convert input data and system state into the nodes' output.
I've favored a largely immutable approach to all of this, using mutable data only for caching purposes.
One thing that is different from Torch is that I'm using no contiguous storage or explicitly multi-dimensional math. It may be possible to recover such things from the node structures, but I haven't investigated that area any.
So, in short, I've got one set of data structures for representing annotated network topology, a dumb container type for representing system state, a simple class hierarchy for evaluating the network in terms of the state, and factory functions that convert a network specification into a network topology and initial state.
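To make that a bit more concrete, here's a toy sketch of the shape of the thing (my own names, only a sliver of the functionality, and the node-sharing cache elided): expression nodes, smart constructors that do the algebra so equivalent expressions collapse to the same node, symbolic partial derivatives, and an evaluator that pulls values out of the system state.
import qualified Data.Map as M

data Expr
  = Const Double
  | Var Int              -- an input or system-state slot, by index
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Ord, Show)

-- The "algebra": fold constants and drop identities so that, say,
-- (x * 1) + 0 and x end up as the very same node.
add, mul :: Expr -> Expr -> Expr
add (Const 0) e = e
add e (Const 0) = e
add (Const a) (Const b) = Const (a + b)
add a b = Add a b

mul (Const 0) _ = Const 0
mul _ (Const 0) = Const 0
mul (Const 1) e = e
mul e (Const 1) = e
mul (Const a) (Const b) = Const (a * b)
mul a b = Mul a b

-- Nodes can return manipulations of themselves, e.g. a partial derivative
-- with respect to one variable index.
deriv :: Int -> Expr -> Expr
deriv _ (Const _) = Const 0
deriv i (Var j)   = Const (if i == j then 1 else 0)
deriv i (Add a b) = add (deriv i a) (deriv i b)
deriv i (Mul a b) = add (mul (deriv i a) b) (mul a (deriv i b))

-- The evaluator: dispatch on the node and read values out of the state.
eval :: M.Map Int Double -> Expr -> Double
eval _  (Const c) = c
eval st (Var i)   = M.findWithDefault 0 i st
eval st (Add a b) = eval st a + eval st b
eval st (Mul a b) = eval st a * eval st b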
Say, what are the IP ramifications of training a neural network on a corpus of copyright material? Is the output an original work? Is this a dangerous question?
Sooo.... Maybe the cause of all our woes with X costs is... Engineered Explosives, a card with X in the cost and no X in the text?
Engineered Explosives is misleading, yes, but it's just one card out of 240 X cost cards, the vast majority of which explicitly use the X in the text. The network tries to maximize its chances of making correct predictions, so it'll disregard outliers.
Rather, part of the problem with X cost cards is that, ignoring the cost, the text of the card would still make sense if you substituted a fixed value wherever X is used, and the use of X can occur quite a long way away from the cost (Blaze is relatively easy, Strength of the Tajuru is trickier). Since with our current approach the memory of the X in the cost is easily subject to weakening and alteration, by the time the network gets to the point where it would have used the X, it has to contend with many competing alternatives. There's also the problem that the same network that has to predict X cost cards also has to predict the vast majority of cards that do not have Xs in them, so that can play a role as well.
Now, the use of an external data store won't be a cure-all, because there will still be comprehension problems, in part because we have a relatively small data set to work with. I don't expect the network to "get" Engineered Explosives, for instance. What the data does suggest is that the network will have an easier time producing cleaner, more coherent cards.
Say, what are the IP ramifications of training a neural network on a corpus of copyright material? Is the output an original work? Is this a dangerous question?
Obviously, I am no lawyer, but I think we stand on firm ground here. The texts of the generated cards are independent derivative works. Now, if we wanted to sell a printed t-shirt that had a render of one of our cards that used the Magic fonts, symbols, etc., that might be a different story. But the generated game content itself? I don't think so.
Besides, Magic is a community experience, and it attracts swarms of creatives who churn out fan fiction, cards, artwork. That we're all here talking about Magic reflects our love and engagement with the product.
---
I'm tinkering with the scripts some more before I leave for work. Things are progressing nicely. And hopefully by early next week I'll have Torch set up on the high-end machine so I can test everything quickly with different parameters. There are a lot of unanswered questions. For example, what kind of data structure will work best? A stack? A queue? A deque? And what about combinations of these? For instance, we could have 2 stacks and 2 queues. Or 20 shallow stacks. No clue whatsoever. That's part of the fun though.
EDIT: I think I may have fixed the bug that I was experiencing with the conversion issue. I looked online and found that it's caused by an out-of-date package. This does not surprise me, given that Torch is undergoing a lot of active development. Currently resolving other related issues.
EDIT(2): Aaaannd...
We have training! Now, here's to hoping that the algorithm was implemented correctly (from the looks of it, it is). I hardcoded in some stuff such that it'll only work if you use CUDA support, and right now we're using the old batching scheme (it looks like we can do sequences of variable length, so we can pass in unbroken cards pretty efficiently if we want to). Also, the batch time seems very slow, but I think it's because they bundle a lot of stuff into one epoch (a lot more training goes on). I may need to make sure garbage collection is done regularly to prevent any GPU memory issues, though that'll be a non-issue with the super-fast CPU/GPU setup that'll be coming online soon.
EDIT(3): Oh, and I'll need to update our sampling script for these checkpoints, but that shouldn't be overly difficult.
EDIT(4): I'mma wait to do the training until I can do it on the high-end machine. But I can barely contain my excitement. If it works like we hope that it does, this'll put us light years ahead of where we were. Obviously there'll be a lot of fine-tuning and so forth and that'll take time, but the very idea of having permanent memory storage on the side opens up a ton of new possibilities.
EDIT(5): I lied. I went ahead and started a very small version running just to see how it behaves. I'll check back in on it later.
EDIT(6): And I'm glad I did that, because I identified a small typo that throws off the training process. Fixed.
Suppose we trained a network to convincingly mimic a contemporary artist's style? Then we could turn a rough drawing into what looks like artwork by that artist, without paying the artist a commission. I imagine the artist would be unhappy at being replaced by a program trained to mimic their work. But I suppose there's nothing actually illegal about it. And the artist could do the same thing to save labour.
Suppose we trained a network to convincingly mimic a contemporary artist's style? Then we could turn a rough drawing into what looks like artwork by that artist, without paying the artist a commission. I imagine the artist would be unhappy at being replaced by a program trained to mimic their work. But I suppose there's nothing actually illegal about it. And the artist could do the same thing to save labour.
I think the idea of the software having a master/student relationship with an artist is a very interesting idea. The root volition starts with the human, and the machine helps with the execution of the concept.
But yeah, who knows what the future holds.
A fun thing to do is to combine all that we've done this far. I found a machine-generated card and supplied it with machine-generated art and machine-generated flavor text. Credit goes to Sendai45 for the set symbol, lol. Art credit goes to Van Gogh's Starry Night mixed with Scarwood Treefolk.
I saw the name "Cosmic Treefolk" and could not resist, lol.
EDIT: Runner-up flavor texts included:
* "It's easy for the innocent to speak of fragment. They are limited with fallen silence."
* "We three, though of separate ancestry, join in brotherhood. . . . We dare not hope it has no forge, and we stop fighting when a random body part falls off."
* This spell, like many attributed to Drakna, was dead.
I'm hitting some kind of wall in optimizing this, so I'm moving to another language to see what happens. I'm thinking Haskell; it's a pretty good fit for how a lot of the code is already structured.
ETA: OH YEAH, the gnarly algorithms that have their correctness verified by testing. This'll be... fun. Also, I really need to have Ord implemented, like, all over the place in this code, now that I'm not assuming that RPython might somehow get involved.
I'm still waiting on a report about the Torch installation on the high-end machine. Haven't heard anything yet, but that's probably a good thing - they'd have contacted me sooner if anything went wrong.
One bit of good (non-Magic) related news: I just got word that a paper for which I am the lead author was accepted for publication in an international research journal! The review committee praised both the technical contributions of the work and my evocative writing style (I'm a firm believer that scientific writing doesn't always have to be dull and obtuse).
At the end of the day, I feel that my research team and I have ever so slightly improved the odds for the long-term survival of our species, so that's nice.
Anyway, I'll let y'all know once everything is up and running. I'm currently setting up batches of experiments to be executed for when the time comes, and I'll be sure to make some time for higher resolution renders of art and training a new card generator.
Well, I've ported but not tested the core DAG stuff. (I got it to sign off on some easy-to-write but wrong code that I caught by inspection, so I know this code needs tests, the miracles of static analysis be damned.) It kind of feels like I injected pure math directly into my eyeballs.
To write the network code, I need some functions that "consume" "resources" (in this case, "unused system variables") and keep track of how much they've used up. Not sure if there's any library support for this. It's not Writer as I'm familiar with it, since the functions need to know where they're starting from. Any advice?
Talcos, what's the status of getting stack-augmented networks to play nice with our format in Torch?
My internship just ended, so I'll be spending this week moving home. Once I'm back in the lab, I'm hoping to upgrade my Intel machine with a new GPU, and it would be nice to have something to run on it.
Well, I've ported but not tested the core DAG stuff. (I got it to sign off on some easy-to-write but wrong code that I caught by inspection, so I know this code needs tests, the miracles of static analysis be damned.) It kind of feels like I injected pure math directly into my eyeballs.
It needs tests? Or is this just an admission that your faith in SA is not strong enough? lol, jk.
To write the network code, I need some functions that "consume" "resources" (in this case, "unused system variables") and keep track of how much they've used up. Not sure if there's any library support for this. It's not Writer as I'm familiar with it, since the functions need to know where they're starting from. Any advice?
Could you elaborate on what you mean by this? Do you mean in the sense that all the inputs get used, as in, the graph contains a path from each input to an output? Or do you mean something else?
Good (I think!). I extended this code to incorporate your batch loader: https://github.com/PrajitR/NeuralStacksQueues
It supports any number/combination of stacks, queues, and deques, so that's fun. The authors of the paper showed that, for a number of different tasks, a single-layer LSTM network with external data storage could outperform an LSTM network with 2-8 layers and no external data storage. When we have a pure LSTM network, it turns out that a sizeable number of cells end up as storage units rather than computation units. When we provide external storage, we lessen the burden of having to maintain that fragile short-term memory, so we can make do with fewer cells. Of course, the authors only tested on relatively small and simple problems, so I imagine our networks will be larger than the ones they report, but there are still substantial savings to be had with the use of external memory.
That being said, the implementation I'm working with right now is kinda slow. Like 8 times slower. Part of that is due to the added training costs, but I suspect that there are some inefficiencies. That and I don't know whether the algorithm was properly implemented. It looks okay and it can replicate the results shown in the paper, but I'll have to run everything with Magic cards to see how things go. I may have to tweak some stuff in the code.
But the possibilities are intriguing. The approach seems to be very powerful.
My internship just ended, so I'll be spending this week moving home. Once I'm back in the lab, I'm hoping to upgrade my Intel machine with a new GPU, and it would be nice to have something to run on it.
Congratulations on the successful completion of your internship!
But yeah, I'll keep you posted as to how things go.
To write the network code, I need some functions that "consume" "resources" (in this case, "unused system variables") and keep track of how much they've used up. Not sure if there's any library support for this. It's not Writer as I'm familiar with it, since the functions need to know where they're starting from. Any advice?
Could you elaborate on what you mean by this? Do you mean in the sense that all the inputs get used, as in, the graph contains a path from each input to an output? Or do you mean something else?
Okay, so, have some code excerpts:
module Network where
import Control.Arrow
import Nodes -- Exports Expr, which is an instance of Num. It is the tree type. It also exports a function that converts indices to Expr, "variable"
type System a = (a, Int)
-- Elided some code that I haven't used yet.
synapse :: System Expr -> Expr
synapse (node, offset) = variable offset * node
biasSynapse :: System a -> Expr
biasSynapse (_, offset) = variable offset
neuron :: System [Expr] -> System Expr
neuron ([], offset) = (biasSynapse ([], offset), offset + 1)
neuron (node:nodes, offset) = ((+) (synapse (node, offset)) *** (+) 1) (neuron (nodes, offset + 1)) -- Rewritten until the build system stopped issuing warnings, presumably because it was too confused.
Gotcha. Okay, so question: are you modeling your network on a neuron-by-neuron basis, or using some kind of layer abstraction? When I model networks, I talk about things at the level of layers rather than individual neurons, because I usually have it set up so that one layer merely feeds forward to the next, with a one-to-many connection from each neuron of the previous layer to all the neurons of the next. One advantage of doing it that way is that it's easier to specify and track where inputs are going and how they map to outputs. It does limit me a little bit in terms of flexibility, but analyzing the graph becomes much easier.
Well, I'm building it up from neurons, but the goal is to have the interfaces go high-level. Like, the next function to define is one that takes an entire layer (note that there has to be a map sigmoid in here somewhere, but that's not very interesting), a size for the next layer, and produces "the next" layer from those specifications.
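A minimal sketch of what such a layer-builder might look like, threading the "next unused system variable" counter through the State monad rather than by hand; State, unlike Writer, lets each function see where it's starting from, which is one answer to the earlier question about library support. Only Expr and variable are taken from the Nodes module described above; every other name here is invented.
import Control.Monad (replicateM)
import Control.Monad.State (State, evalState, get, put)
import Nodes (Expr, variable)

-- The state is the index of the next unused system variable, i.e. the same
-- Int that the System tuple threads around by hand.
type Alloc = State Int

-- Grab a fresh weight/bias variable and bump the counter.
fresh :: Alloc Expr
fresh = do
  i <- get
  put (i + 1)
  return (variable i)

-- One neuron: a weight per input plus a bias, summed up.
neuronM :: [Expr] -> Alloc Expr
neuronM inputs = do
  ws   <- mapM (const fresh) inputs
  bias <- fresh
  return (sum (zipWith (*) ws inputs) + bias)

-- A whole layer: size-many neurons, all fed by the same inputs. The "map
-- sigmoid" mentioned above would wrap each neuron's output here, once the
-- Nodes module grows a sigmoid node.
layer :: Int -> [Expr] -> Alloc [Expr]
layer size inputs = replicateM size (neuronM inputs)

-- e.g. evalState (layer 4 someInputs) 0 builds a 4-neuron layer starting
-- from system variable 0 and never reuses a variable index.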
Hey, I finally got stylenet working :). For future reference, if you install cuda and on the next boot ubuntu purplescreens, instead of booting with the default options, go into grub, select 'advanced boot options', then on that menu pick option 3 (the non-default, non-recovery ubuntu version), and boot into it.
Once everything boots in there, you should be able to get to the desktop.
Don't try to use cuda, it won't work. Instead, reboot the system again, and now you can boot ubuntu normally with everything working... once. Turn it off, and you have to go through the whole process again.
Well, I'm building it up from neurons, but the goal is to have the interfaces go high-level. Like, the next function to define is one that takes an entire layer (note that there has to be a map sigmoid in here somewhere, but that's not very interesting), a size for the next layer, and produces "the next" layer from those specifications.
I see. So, in your design, if I have a layer and I pass it a vector of N values, one entry for each neuron, do you split up the incoming data N ways?
Regardless, in the end, whatever you do will be logically equivalent to what I do. I was just curious from a GPU parallelization perspective.
Hey, I finally got stylenet working :). For future reference, if you install cuda and on the next boot ubuntu purplescreens, instead of booting with the default options, go into grub, select 'advanced boot options', then on that menu pick option 3 (the non-default, non-recovery ubuntu version), and boot into it.
Once everything boots in there, you should be able to get to the desktop.
Don't try to use cuda, it won't work. Instead, reboot the system again, and now you can boot ubuntu normally with everything working... once. Turn it off, and you have to go through the whole process again.
I never expected 'dual-booting' to be so literal.
Congrats! Of course, I'm sorry to hear that you ended up with such a complicated process just to boot up with CUDA support.
You guys mentioned someone was working on a set of these?
Ah, yes, Elseleth has been. The original hope was to have a draft around labor day, but he ended up being very busy and to my knowledge the draft has yet to be held. I've been meaning to contact him about that, but then again, I've been very busy myself, haha. The set is more or less done, though I'm sure it could use some more art and flavor work, that sort of thing.
Right now the budget of mental energy I have to spare is being put towards developing better card generating systems. Like I said before, there's a lot of untapped potential.
---
I'm waiting to hear back from the keepers of the high-end machine so I can run my next batch of experiments (and see about training a new, data-structure-augmented network). Should be very soon though.
Ookay, time to start work on some of those optimizations I had in mind.
It's also time to say "This use of the visitor pattern in Python will be different, because I won't be complaining about it. ... As much."
Edit: Reddit still remembers Tromple http://www.reddit.com/r/magicTCG/comments/3kicxx/the_shirts_uabrandys_wife_didnt_make/cuy3cdk?context=2
Haha!