verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
witness_ has quit [Quit: Connection closed for inactivity]
vivekp has joined #mlpack
< ShikharJ> zoq: Should we continue the testing? It's been three days now, and the network is still training on the full test.
witness_ has joined #mlpack
< ShikharJ> zoq: It turns out that a single iteration of the optimizer takes about a second, so for 70,000 images iterated over 2,000 epochs, this takes far too much time (probably more than is required to train). I'll see whether I'm able to get good results within a day of training or not.
< ShikharJ> zoq: I have tmux'd another session which should take a day at most; I'll also spawn some other sessions with different hyperparameters.
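For scale, here is the arithmetic behind that estimate as a minimal C++ sketch; it assumes, per the timing quoted above, that one optimizer iteration processes a single image.

```cpp
#include <iostream>

int main()
{
  // Figures from the discussion: ~1 second per optimizer iteration,
  // 70,000 images, 2,000 epochs (one iteration per image is assumed here).
  const double secondsPerIteration = 1.0;
  const double images = 70000;
  const double epochs = 2000;

  const double totalSeconds = secondsPerIteration * images * epochs;
  std::cout << "Hours per epoch: "
            << (secondsPerIteration * images) / 3600.0
            << std::endl; // ~19.4 hours, matching the per-pass estimate below.
  std::cout << "Days for 2000 epochs: "
            << totalSeconds / 86400.0
            << std::endl; // ~1620 days -- clearly infeasible without changes.
  return 0;
}
```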
< jenkins-mlpack> Project docker mlpack nightly build build #341: SUCCESS in 2 hr 49 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/341/
vivekp has quit [Ping timeout: 265 seconds]
vivekp has joined #mlpack
wenhao has joined #mlpack
< rcurtin> ShikharJ: 2000 epochs seems like a lot to me, do GANs really take that long to train? all of the work I have ever done with neural networks and MNIST usually reaches maximum accuracy within 100 epochs
< rcurtin> note, I am not an expert, so maybe 2k epochs is totally reasonable, I am just curious
< ShikharJ> rcurtin: The O'Reilly test example ran for that many epochs (100,000 epochs with a batch size of 50), so we just went with 2,000. Hopefully the per-epoch evaluation is a lot better with mlpack. Let's see.
< rcurtin> wow, 100k epochs... and was that really necessary to get the performance they got?
< rcurtin> I'm not familiar with the example by the way, so I'd be interested in glancing at the paper or reference if you have it handy
< ShikharJ> Yeah, but seemingly we got better results with a smaller dataset, far fewer epochs, and pre-training. So hopefully things even out.
< ShikharJ> rcurtin: Take a look here for the slightly modified example (https://www.oreilly.com/learning/generative-adversarial-networks-for-beginners).
< ShikharJ> rcurtin: Here is the original paper, though I don't think they have specified the code anywhere (https://arxiv.org/abs/1406.2661).
< rcurtin> oh, I see, actually that is 100k batches, not 100k epochs (if 'epoch' is defined as one full pass over the data)
< rcurtin> since the MNIST training data is 55k points, that actually comes out to roughly 91 full passes over the dataset
< rcurtin> if I am understanding it right
< ShikharJ> Training data is 60,000 points if I'm not wrong, plus 10,000 test points, for a total of 70,000 images.
< rcurtin> that's how I've typically seen it, but if they are using the same mnist package in Python that I've used before, it's 55k training, 5k validation, 10k test
< rcurtin> if it is 60k points, that's ~83 full passes, which to me seems a lot less crazy than 100k passes :)
< ShikharJ> But still, it's a lot. A single pass over the entire dataset of 70,000 images would take over 19 hours.
< rcurtin> right, that seems really long compared to what I would expect
< rcurtin> if the batch size support is not yet ready for your GAN implementation, that can make a huge difference
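To spell out rcurtin's conversion, and to illustrate the mini-batch point, here is a hedged sketch assuming the mlpack::optimization::Adam constructor of the 3.x releases; the step size is illustrative, not the PR's actual value:

```cpp
#include <mlpack/core.hpp>
#include <mlpack/core/optimizers/adam/adam.hpp>

using namespace mlpack::optimization;

int main()
{
  // The conversion above: 100,000 batches of size 50 over 55,000 training
  // points is (100000 * 50) / 55000 ~= 91 full passes; over 60,000 points
  // it is ~83 passes.
  const size_t batchSize = 50;
  const size_t numBatches = 100000;

  // With mini-batches, one optimizer step touches batchSize points instead
  // of the full dataset -- this is where the large speedup comes from.
  // (maxIterations in this API counts single-point evaluations.)
  Adam optimizer(0.0002 /* step size; illustrative */, batchSize,
                 0.9, 0.999, 1e-8, numBatches * batchSize,
                 1e-5, true /* shuffle */);
  // The optimizer would then be handed to the network,
  // e.g. gan.Train(optimizer).
  return 0;
}
```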
< ShikharJ> I've currently spawned a new job with 10,000 images and 10 epochs to see if we get somewhere. Should be done by tomorrow.
< rcurtin> cool, hopefully it performs well :)
ImQ009 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> ShikharJ/mlpack#172 (GAN - 9014d65 : Shikhar Jaiswal): The build has errored.
travis-ci has left #mlpack []
wenhao has quit [Ping timeout: 260 seconds]
< zoq> ShikharJ: Let's see if we can get good results on a smaller subset, we can always run more experiments on the side.
< ShikharJ> zoq: Posted some results which I got, I'll post more tomorrow.
< zoq> ShikharJ: Looks good, do you mind posting the test script you used?
< ShikharJ> zoq: Test script as in the code that got me the output? It's the same as the one in the PR (GANMNISTTest), with the mentioned hyper-parameters changed: epochs limited to 10, ganPreTrain set to 300, and datasetMaxCols set to 10000.
< zoq> ShikharJ: ahh, okay
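For reference, a minimal sketch of the changes described; ganPreTrain and datasetMaxCols are names from the PR's GANMNISTTest, while the file name and surrounding setup here are illustrative only:

```cpp
#include <mlpack/core.hpp>

using namespace mlpack;

int main()
{
  // Hyper-parameters as described above; the rest of this snippet stands in
  // for the real test code in GANMNISTTest.
  const size_t epochs = 10;            // down from 2000
  const size_t ganPreTrain = 300;      // pre-training steps before GAN training
  const size_t datasetMaxCols = 10000; // use 10,000 of the 70,000 images

  arma::mat fullData;
  // "mnist_full.csv" is a placeholder file name.
  data::Load("mnist_full.csv", fullData, true /* fatal on failure */);

  // Keep only the first datasetMaxCols columns
  // (mlpack stores one data point per column).
  arma::mat trainData = fullData.cols(0, datasetMaxCols - 1);

  // ... build the GAN as in GANMNISTTest, pre-train for ganPreTrain steps,
  // then train for `epochs` passes over trainData ...
  return 0;
}
```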
< ShikharJ> This was about 3 times faster than I expected, so a larger dataset can probably also be tested. Let me try the full dataset with 20 epochs. It should take a day, and if the results are just as good as the output of the O'Reilly example, then we're all good to merge.
< zoq> ShikharJ: I think the results are really good for the current settings; finding good parameters for a GAN is difficult
< zoq> agreed
< zoq> as I said before, we can always run more experiments on the side
< ShikharJ> zoq: I had also spawned a couple of jobs for 15 and 20 epochs (10,000 images), let's see how the outputs change for those cases as well.
< ShikharJ> I'll post them as they become available.
< zoq> great, nice to see some load on the machine :)
< zoq> let me install htop :)
< ShikharJ> zoq: I'm sorry this took a while longer than I had planned; I'll get all the tests done before the evaluations.
< zoq> No worries at all, we should take all the time we need to get some good results before we move forward
< zoq> Load average: 2.52
< zoq> still some room left
< ShikharJ> zoq: What's load average?
< zoq> system utilization
< ShikharJ> zoq: I just started the full job, so 3 jobs running now.
< zoq> you can run htop, to see some nice results
< ShikharJ> Load Average 3.05 now.
< zoq> on a 4-core system the max is 4.0
< ShikharJ> I guess that's it, so now we can just wait :)
< zoq> right :)
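For the curious, the same number can be read programmatically; a minimal sketch using the POSIX getloadavg(3) call available on Linux and the BSDs:

```cpp
#include <cstdlib>   // getloadavg (POSIX, Linux/BSD)
#include <iostream>

int main()
{
  // 1-, 5-, and 15-minute load averages, the same values htop or uptime show.
  double loads[3];
  if (getloadavg(loads, 3) == 3)
  {
    std::cout << "Load average: " << loads[0] << " " << loads[1] << " "
              << loads[2] << std::endl;
    // On a 4-core machine, a 1-minute value near 4.0 means the CPUs are
    // fully utilized; e.g. the 3.05 above leaves roughly one core idle.
  }
  return 0;
}
```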
vivekp has quit [Ping timeout: 245 seconds]
vivekp has joined #mlpack
ImQ009 has quit [Quit: Leaving]
vivekp has quit [Ping timeout: 240 seconds]
witness_ has quit [Quit: Connection closed for inactivity]