verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
witness_ has quit [Quit: Connection closed for inactivity]
vivekp has joined #mlpack
< ShikharJ> zoq: Should we continue the testing? It's been three days now, and the network is still training on the full test.
witness_ has joined #mlpack
< ShikharJ> zoq: It turns out that a single iteration of the optimizer takes about a second, so for 70,000 images iterated over 2,000 epochs, this takes far too much time (probably more than is required to train). I'll see whether I'm able to get good results within a day of training or not.
< ShikharJ> zoq: I have tmux'd another session which should take a day at most; I'll also spawn some other sessions with different hyperparameters.
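For scale, here is the arithmetic behind that estimate as a minimal C++ sketch; it assumes, per the timing quoted above, that one optimizer iteration processes a single image.

```cpp
#include <iostream>

int main()
{
  // Figures from the discussion: ~1 second per optimizer iteration,
  // 70,000 images, 2,000 epochs (one iteration per image is assumed here).
  const double secondsPerIteration = 1.0;
  const double images = 70000;
  const double epochs = 2000;

  const double totalSeconds = secondsPerIteration * images * epochs;
  std::cout << "Hours per epoch: "
            << (secondsPerIteration * images) / 3600.0
            << std::endl; // ~19.4 hours, matching the per-pass estimate below.
  std::cout << "Days for 2000 epochs: "
            << totalSeconds / 86400.0
            << std::endl; // ~1620 days -- clearly infeasible without changes.
  return 0;
}
```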
< jenkins-mlpack> Project docker mlpack nightly build build #341: SUCCESS in 2 hr 49 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/341/
vivekp has quit [Ping timeout: 265 seconds]
vivekp has joined #mlpack
wenhao has joined #mlpack
< rcurtin> ShikharJ: 2000 epochs seems like a lot to me, do GANs really take that long to train? all of the work I have ever done with neural networks and MNIST usually reaches maximum accuracy within 100 epochs
< rcurtin> note, I am not an expert, so maybe 2k epochs is totally reasonable, I am just curious
< ShikharJ> rcurtin: The O'Reilly test example ran for that many epochs (100,000 epochs with a batch size of 50), so we just went with 2,000. Hopefully the per-epoch evaluation is a lot better with mlpack. Let's see.
< rcurtin> wow, 100k epochs... and was that really necessary to get the performance they got?
< rcurtin> I'm not familiar with the example by the way, so I'd be interested in glancing at the paper or reference if you have it handy
< ShikharJ> Yeah, but seemingly we got better results with a smaller dataset, far fewer epochs, and pre-training. So hopefully things even out.
< ShikharJ> rcurtin: Take a look here for the slightly modified example (https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners).
< ShikharJ> rcurtin: Here is the original paper, though I don't think they have specified the code anywhere (https://arxiv.org/abs/1406.2661).
< rcurtin> oh, I see, actually that is 100k batches, not 100k epochs (if 'epoch' is defined as one full pass over the data)
< rcurtin> since the MNIST training data is 55k points, that actually comes out to roughly 91 full passes over the dataset
< rcurtin> if I am understanding it right
< ShikharJ> Training data is 60,000 points if I'm not wrong, plus 10,000 test points, for a total of 70,000 images.
< rcurtin> that's how I've typically seen it, but if they are using the same mnist package in Python that I've used before, it's 55k training, 5k validation, 10k test
< rcurtin> if it is 60k points, that's ~83 full passes, which to me seems a lot less crazy than 100k passes :)
< ShikharJ> But still, it's a lot. A single pass over the entire dataset of 70,000 images would take over 19 hours.
< rcurtin> right, that seems really long compared to what I would expect
< rcurtin> if the batch size support is not yet ready for your GAN implementation, that can make a huge difference
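To spell out rcurtin's conversion, and to illustrate the mini-batch point, here is a hedged sketch assuming the mlpack::optimization::Adam constructor of the 3.x releases; the step size is illustrative, not the PR's actual value:

```cpp
#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/adam/adam.hpp>

using namespace mlpack::optimization;

int main()
{
  // The conversion above: 100,000 batches of size 50 over 55,000 training
  // points is (100000 * 50) / 55000 ~= 91 full passes; over 60,000 points
  // it is ~83 passes.
  const size_t batchSize = 50;
  const size_t numBatches = 100000;

  // With mini-batches, one optimizer step touches batchSize points instead
  // of the full dataset -- this is where the large speedup comes from.
  // (maxIterations in this API counts single-point evaluations.)
  Adam optimizer(0.0002 /* step size; illustrative */, batchSize,
                 0.9, 0.999, 1e-8, numBatches * batchSize,
                 1e-5, true /* shuffle */);
  // The optimizer would then be handed to the network,
  // e.g. gan.Train(optimizer).
  return 0;
}
```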
< ShikharJ> I've currently spawned a new job with 10,000 images and 10 epochs to see if we get somewhere. Should be done by tomorrow.
< rcurtin> cool, hopefully it performs well :)
ImQ009 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> ShikharJ/mlpack#172 (GAN - 9014d65 : Shikhar Jaiswal): The build has errored.
travis-ci has left #mlpack []
wenhao has quit [Ping timeout: 260 seconds]
< zoq> ShikharJ: Let's see if we can get good results on a smaller subset, we can always run more experiments on the side.
< ShikharJ> zoq: Posted some results which I got, I'll post more tomorrow.
< zoq> ShikharJ: Looks good, do you mind posting the test script you used?
< ShikharJ> zoq: Test script as in the code that got me the output? It's the same as the one in the PR (GANMNISTTest), with the mentioned hyper-parameters changed: epochs limited to 10, ganPreTrain set to 300, and datasetMaxCols set to 10000.
< zoq> ShikharJ: ahh, okay
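For reference, a minimal sketch of the changes described; ganPreTrain and datasetMaxCols are names from the PR's GANMNISTTest, while the file name and surrounding setup here are illustrative only:

```cpp
#include <mlpack/core.hpp>

using namespace mlpack;

int main()
{
  // Hyper-parameters as described above; the rest of this snippet stands in
  // for the real test code in GANMNISTTest.
  const size_t epochs = 10;            // down from 2000
  const size_t ganPreTrain = 300;      // pre-training steps before GAN training
  const size_t datasetMaxCols = 10000; // use 10,000 of the 70,000 images

  arma::mat fullData;
  // "mnist_full.csv" is a placeholder file name.
  data::Load("mnist_full.csv", fullData, true /* fatal on failure */);

  // Keep only the first datasetMaxCols columns
  // (mlpack stores one data point per column).
  arma::mat trainData = fullData.cols(0, datasetMaxCols - 1);

  // ... build the GAN as in GANMNISTTest, pre-train for ganPreTrain steps,
  // then train for `epochs` passes over trainData ...
  return 0;
}
```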
< ShikharJ> This was about 3 times faster than I expected, so a larger dataset can probably also be tested. Let me try the full dataset with 20 epochs. It should take a day, and if the results are just as good as the output of the O'Reilly example, then we're all good to merge.
< zoq> ShikharJ: I think the results are really good for the current settings; finding good parameters for a GAN is difficult
< zoq> agreed
< zoq> as I said before, we can always run more experiments on the side
< ShikharJ> zoq: I had also spawned a couple of jobs for 15 and 20 epochs (10,000 images), let's see how the outputs change for those cases as well.
< ShikharJ> I'll post them as they become available.
< zoq> great, nice to see some load on the machine :)
< zoq> let me install htop :)
< ShikharJ> zoq: I'm sorry this took a while longer than I had planned; I'll get all the tests done before the evaluations.
< zoq> No worries at all, we should take all the time we need to get some good results before we move forward
< zoq> Load average: 2.52
< zoq> still some room left
< ShikharJ> zoq: What's load average?
< zoq> system utilization
< ShikharJ> zoq: I just started the full job, so 3 jobs running now.
< zoq> you can run htop, to see some nice results
< ShikharJ> Load Average 3.05 now.
< zoq> on a 4-core system the max is 4.0
< ShikharJ> I guess that's it, so now we can just wait :)
< zoq> right :)
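For the curious, the same number can be read programmatically; a minimal sketch using the POSIX getloadavg(3) call available on Linux and the BSDs:

```cpp
#include <cstdlib>   // getloadavg (POSIX, Linux/BSD)
#include <iostream>

int main()
{
  // 1-, 5-, and 15-minute load averages, the same values htop or uptime show.
  double loads[3];
  if (getloadavg(loads, 3) == 3)
  {
    std::cout << "Load average: " << loads[0] << " " << loads[1] << " "
              << loads[2] << std::endl;
    // On a 4-core machine, a 1-minute value near 4.0 means the CPUs are
    // fully utilized; e.g. the 3.05 above leaves roughly one core idle.
  }
  return 0;
}
```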
vivekp has quit [Ping timeout: 245 seconds]
vivekp has joined #mlpack
ImQ009 has quit [Quit: Leaving]
vivekp has quit [Ping timeout: 240 seconds]
witness_ has quit [Quit: Connection closed for inactivity]