verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
wenhao has quit [Ping timeout: 260 seconds]
manish7294 has joined #mlpack
< manish7294> zoq: You there?
< manish7294> I guess Ryan must be sleeping right now.
vivekp has quit [Ping timeout: 260 seconds]
< manish7294> I was debugging the convergence of SGD, AMSGrad and BigBatchSGD on LMNN, and what I found in common is that they don't converge but terminate after the maximum number of iterations. To make them converge even on iris, the tolerance needs to be on the order of at least 1e-03.
< manish7294> And this is not only the case with LMNN; NCA suffers from the same issue.
< manish7294> In case it is an issue.
< manish7294> Whereas L-BFGS converges successfully.
< manish7294> rcurtin: And this may have to do with the 100 iterations idea as L-BFGS works just fine with it.
vivekp has joined #mlpack
< manish7294> zoq: In the adaptive step size search of BigBatchSGD, there is a stepSize calculation at https://github.com/mlpack/mlpack/blob/0128ef719418edd90c2c6cdcfd651f75a044d914/src/mlpack/core/optimizers/bigbatch_sgd/adaptive_stepsize.hpp#L95 , and I was wondering what would happen if batchSize is kept at 1.
manish7294 has quit [Ping timeout: 260 seconds]
sulan_ has joined #mlpack
< ShikharJ> zoq: I posted the results of the 10,000 image dataset on the DCGAN PR. It seems to be a bit slower than the vanilla GAN, primarily because of the Transposed Convolutions, but the results are good. Please take a look.
< ShikharJ> zoq: I'll spend some more time finding better hyper-parameters. Unfortunately, the O'Reilly example for DCGAN doesn't test on MNIST, so we have no way of checking for competitiveness.
< ShikharJ> zoq: DCGAN uses a lot more Convolutions and Transposed Convolutions, and is also a bit deeper than the vanilla implementation, so I guess that made the difference. Probably something we need to keep an eye on from now on is the performance of the convolutional toolbox of mlpack.
< ShikharJ> zoq: Also, in the O'Reilly example they haven't implemented the same DCGAN model as in the paper, as we have, so the difference may also come from there. Now the CelebA dataset is all that remains.
< jenkins-mlpack> Yippee, build fixed!
< jenkins-mlpack> Project docker mlpack nightly build build #347: FIXED in 2 hr 44 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/347/
witness_ has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
wenhao has joined #mlpack
< rcurtin> manish7294: that sounds about right; because SGD uses a different batch every iteration, it's hard to get convergence
< rcurtin> I think this is why most neural network training doesn't converge based on a tolerance but instead stops after a maximum number of iterations (epochs)
< rcurtin> anyway, I would not be surprised if using a larger tolerance (like 1e-3) would give essentially equivalent kNN accuracy results in a fraction of the time, since it takes so many fewer iterations
< rcurtin> I just did a very quick simulation with the covertype-5k dataset; with the regular SGD optimizer, if I just set max_iterations to take five full passes over the data (so, --max_iterations 25000) the resulting kNN accuracy is basically just as good as with a million iterations
< rcurtin> let me try again with the full covertype dataset
< rcurtin> in any case, the idea here would just be that we can set a smaller tolerance or a smaller default number of maximum iterations, and LMNN will converge much quicker but the quality of the solution will be about the same
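Concretely, the idea is a pass-based cap plus a loose tolerance; a minimal sketch, assuming the StandardSGD constructor in src/mlpack/core/optimizers/sgd/ takes (stepSize, batchSize, maxIterations, tolerance):

    #include <mlpack/core.hpp>
    #include <mlpack/core/optimizers/sgd/sgd.hpp>

    using namespace mlpack::optimization;

    // Cap SGD at a handful of passes over the data and use a loose tolerance,
    // instead of waiting on a tight tolerance that is rarely reached.
    StandardSGD BuildSgd(const arma::mat& dataset)
    {
      const size_t passes = 5;       // e.g. five full passes over the data
      const double tolerance = 1e-3; // loose tolerance, as discussed above
      return StandardSGD(0.01 /* stepSize */, 32 /* batchSize */,
          passes * dataset.n_cols, tolerance);
    }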
manish7294 has joined #mlpack
< manish7294> rcurtin: Everything you said looks right on point.
< zoq> ShikharJ: Perhaps it makes sense to write an executable (gan_main.cpp), and to use that to run some parameter ranges? I can run some tests on another machine too.
< manish7294> rcurtin: After a number of iterations, SGD just wanders around the minimum.
< manish7294> Hence, as you said, it's best to have a low number of max iterations.
< zoq> ShikharJ: I think there are a couple of ideas we could look into to improve the conv operations, "Deep Tensor Convolution on Multicores" might be interesting here.
< rcurtin> manish7294: right, exactly. so maybe we can try with a handful of datasets and calculate or plot learning curves
< rcurtin> (i.e. x axis = number of passes over the data, y axis = resulting kNN accuracy)
< rcurtin> and then we can see what a good 'default' number of epochs is
< manish7294> rcurtin: Is there any way to plot the curve while running from the command line?
< manish7294> rcurtin: And should we replace SGD with AMSGrad?
< rcurtin> manish7294: I would say, probably the best way is to write a bash script to cycle the max_iterations, then extract the resulting kNN accuracy into a CSV file or something
< rcurtin> then you could use octave or matplotlib or whatever your favorite plotting library to plot it (or just look at the numbers directly)
< rcurtin> I don't know many good C++ plotting libraries that are easy to use though... it's hard to beat Python for that...
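For the y axis, the kNN accuracy can be computed directly with mlpack's NeighborSearch once a distance matrix has been learned; a rough sketch (the 1-NN accuracy definition and the name 'distance' for the learned matrix are just illustrative choices):

    #include <mlpack/core.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>

    using namespace mlpack::neighbor;

    // Fraction of points whose nearest neighbor in the learned metric shares
    // their label; 'distance' is the linear transformation learned by LMNN.
    double KnnAccuracy(const arma::mat& dataset,
                       const arma::Row<size_t>& labels,
                       const arma::mat& distance)
    {
      const arma::mat transformed = distance * dataset;
      NeighborSearch<NearestNeighborSort> knn(transformed);

      arma::Mat<size_t> neighbors;
      arma::mat distances;
      knn.Search(1, neighbors, distances); // Monochromatic search skips the query point.

      size_t correct = 0;
      for (size_t i = 0; i < labels.n_elem; ++i)
        if (labels[neighbors(0, i)] == labels[i])
          ++correct;

      return (double) correct / labels.n_elem;
    }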
< rcurtin> for the second bit, I'd suggest maybe adding an extra option for --optimizer to include amsgrad and perhaps bbsgd also
< ShikharJ> zoq: I'll run for different parameters, and then set the defaults for the most appropriate ones. Though you can clone the dcgan_test.cpp code and run yourself if you wish, that should work fine.
< manish7294> rcurtin: There is an option for the optimizer, but it currently only supports sgd and lbfgs
< ShikharJ> zoq: Regarding the paper, I only had a brief look through it, though I guess it can come in handy. I'll have to take a deeper look.
< zoq> ShikharJ: I'll see if I can implement the idea over the next few days.
< zoq> ShikharJ: Do you use the same parameter for the Celeb dataset?
< ShikharJ> zoq: I'm yet to run for CelebA; I was digging into the code for support of mini-batches. For CelebA, the layer and kernel sizes correspond to the ones in Soumith Chintala's DCGAN implementation.
< zoq> ShikharJ: I see, I think batch support should be the next big 'milestone' right now.
< ShikharJ> zoq: Agreed, I'll let you know if I face any doubts.
< zoq> ShikharJ: Sounds good.
< rcurtin> manish7294: right, do you think it would be easy to add more there? it should be straightforward I think
< rcurtin> also it may be useful to add a --passes option for SGD-like optimizers (so --max_iterations is only used for L-BFGS)
< manish7294> Ya, no problem. Which ones do you suggest should be there? I personally don't want to keep SGD.
< rcurtin> and --passes would just specify the number of passes over the data. so then maxIterations would be set to data.n_cols * passes
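As a fragment for lmnn_main.cpp, that could look roughly like this (the parameter name, alias, and default are only suggestions):

    // Hypothetical --passes option for SGD-like optimizers; L-BFGS would keep
    // using --max_iterations directly.
    PARAM_INT_IN("passes", "Number of full passes over the data for SGD-like "
        "optimizers.", "p", 50);

    // Later, when setting up the optimizer:
    const size_t passes = (size_t) CLI::GetParam<int>("passes");
    const size_t maxIterations = passes * dataset.n_cols;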
< rcurtin> that's fair. I think it might be useful to leave SGD because people know what it is
< manish7294> right, I will make that change
< rcurtin> but surely AMSgrad and BBSGD are better approaches
< rcurtin> so it's up to you how you'd like to do it
< manish7294> so we can have AMSGrad as the default and SGD in the secondary options
< rcurtin> if you want to remove SGD I would suggest adding comments mentioning that AMSgrad or BBSGD are better alternatives than SGD,
< manish7294> It's just because of divergence
< rcurtin> and if you want to leave it I would suggest adding comments saying that AMSgrad or BBSGD might be better choices :)
< rcurtin> right, understood. the divergence is a hard thing to solve with stock SGD
< rcurtin> (back in a bit)
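A rough sketch of how the extra --optimizer values discussed above could be dispatched in lmnn_main.cpp; the LMNN<MetricType, OptimizerType> template and the LearnDistance() call are assumptions about the in-progress implementation:

    const std::string optimizerType = CLI::GetParam<std::string>("optimizer");

    if (optimizerType == "amsgrad")
    {
      LMNN<SquaredEuclideanDistance, AMSGrad> lmnn(data, labels, k);
      lmnn.LearnDistance(distance);
    }
    else if (optimizerType == "bbsgd")
    {
      LMNN<SquaredEuclideanDistance, BigBatchSGD<AdaptiveStepsize>>
          lmnn(data, labels, k);
      lmnn.LearnDistance(distance);
    }
    else if (optimizerType != "sgd" && optimizerType != "lbfgs")
    {
      Log::Fatal << "Unknown optimizer type '" << optimizerType << "'!"
          << std::endl;
    }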
< zoq> manish7294: It might be worth starting with Adam (or another flavour like AMSgrad) and using SGD afterwards: https://arxiv.org/pdf/1712.07628.pdf
< zoq> I'll see if I can implement SWATS over the next few days, but you could hardcode something similar.
< manish7294> zoq: That's a good thing to do, but I fear we may face divergence as we move from Adam to SGD.
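A hardcoded version of that switch could look roughly like this; 'function' stands in for whatever decomposable objective (LMNN, NCA) is being optimized, and the step sizes and pass counts are only placeholders:

    #include <mlpack/core.hpp>
    #include <mlpack/core/optimizers/adam/adam.hpp>
    #include <mlpack/core/optimizers/sgd/sgd.hpp>

    using namespace mlpack::optimization;

    // Rough, SWATS-like idea done by hand: a few epochs of Adam first, then
    // continue from the same coordinates with plain SGD.
    template<typename FunctionType>
    void AdamThenSgd(FunctionType& function, arma::mat& coordinates)
    {
      // Phase 1: Adam for roughly five passes over the data.
      Adam adam(0.01, 32, 0.9, 0.999, 1e-8, 5 * function.NumFunctions(), 1e-3);
      adam.Optimize(function, coordinates);

      // Phase 2: plain SGD, warm-started from Adam's result.
      StandardSGD sgd(0.001, 32, 10 * function.NumFunctions(), 1e-3);
      sgd.Optimize(function, coordinates);
    }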
< manish7294> rcurtin: I am going to remove the gradient batch precalculation part, as it is not going to work with the new optimizers. I think it won't affect much.
ImQ009 has joined #mlpack
< zoq> manish7294: hm, not sure anyone is going to use BigBatchSGD with a batch size of 1; in that case https://github.com/mlpack/mlpack/blob/0128ef719418edd90c2c6cdcfd651f75a044d914/src/mlpack/core/optimizers/bigbatch_sgd/bigbatch_sgd_impl.hpp#L139 has the same issue.
< zoq> manish7294: Might be a good idea to raise at least a warning.
< manish7294> zoq: Ya, no problem. I was foolish enough to do that and got NaNs in my coordinates matrix.
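Something along these lines might be enough for the warning; the assumption here is that a single-sample batch gives the adaptive step size no gradient spread to estimate from, which is what ends in NaNs:

    #include <mlpack/core.hpp>

    // Possible sanity check for BigBatchSGD: with batchSize < 2 the gradient
    // variance estimate behind the adaptive step size degenerates.
    inline void CheckBigBatchSize(const size_t batchSize)
    {
      if (batchSize < 2)
      {
        mlpack::Log::Warn << "BigBatchSGD: batch size " << batchSize << " is "
            << "too small for the adaptive step size estimate; use a batch "
            << "size of at least 2." << std::endl;
      }
    }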
< ShikharJ> zoq: If you wish to experiment around with the DCGAN code for MNIST, there are only two parameters you can really search over (stepSize and multiplier).
< zoq> ShikharJ: Okay, I guess I'll just write a simple executable since I can't pass any parameter to the test without a rebuild.
< ShikharJ> zoq: What I used to do was make different builds in different tmux sessions.
< ShikharJ> zoq: By increasing the step size and multiplier, we may be able to speed up the tests, but it could potentially lead to lower-quality outputs. So on second thought, instead of searching for better hyper-parameters, I feel that introducing support for the tasks mentioned is what we should spend time working on.
< zoq> ShikharJ: Agreed, as I said on the PR the results are good and they show that it works fine.
< ShikharJ> zoq: Also, the hyper-parameters are going to be set by the user anyway, and need not be similar to what we use by default.
< zoq> ShikharJ: Yeah, the settings have to be tailored to the task, the defaults are just a good starting point.
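A bare-bones gan_main.cpp for sweeping those two parameters might look roughly like this; TrainDCGAN() is a hypothetical stand-in for the network setup and training loop currently living in dcgan_test.cpp:

    #include <cstdlib>
    #include <iostream>

    // Placeholder for the DCGAN setup and training loop from dcgan_test.cpp.
    void TrainDCGAN(const double stepSize, const double multiplier)
    {
      std::cout << "training with stepSize = " << stepSize
                << ", multiplier = " << multiplier << std::endl;
    }

    int main(int argc, char** argv)
    {
      if (argc != 3)
      {
        std::cerr << "usage: gan_main <stepSize> <multiplier>" << std::endl;
        return 1;
      }

      TrainDCGAN(std::atof(argv[1]), std::atof(argv[2]));
      return 0;
    }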
manish7294 has quit [Ping timeout: 260 seconds]
killer_bee[m] has quit [Remote host closed the connection]
prakhar_code[m] has quit [Remote host closed the connection]
prakhar_code[m] has joined #mlpack
< ShikharJ> zoq: I also noticed that the CelebA dataset is over 700 MB (for 200,000 images), so I don't think it would be wise to run the test on the full dataset. I'd rather work on a subset, if you're fine with that?
< zoq> ShikharJ: sounds reasonable
killer_bee[m] has joined #mlpack
< rcurtin> ShikharJ: I took a look at your blog post, the images look great
< rcurtin> I think that the images are really helpful, I suspect this is the reason why deep learning got so popular---the papers had cool pictures ;)
< rcurtin> much more exciting than a bunch of theory :(
< ShikharJ> rcurtin: What's your stand on Geoffrey Hinton?
< ShikharJ> Like his views that Deep Learning is useless and would be replaced by something more radical?
< ShikharJ> Given that Deep Learning craze itself started after Hinton developed the CD-K algorithm for training Deep Belief Networks?
< rcurtin> (I'm getting lunch, let me finish then I'll respond :))
< ShikharJ> Haha, sure.
< rcurtin> ShikharJ: hmm, I'm not sure about Geoffrey Hinton. I can see where he is coming from---deep learning is just curve fitting, so if you want artificial intelligence, maybe something more radical is needed (but you could even debate that point)
< rcurtin> I have heard some interesting things about capsule networks, but I haven't investigated them
< rcurtin> I think a lot of big people in the machine learning field like to say controversial things :)
< ShikharJ> I often fail to see why deep learning is considered so different from statistics itself as well.
< ShikharJ> With statements like these, I sometimes feel that probably no one knows why things work in ML.
< ShikharJ> Obviously leaving aside the statistical ML part.
< ShikharJ> It's almost like some people are adamant that we're going in a very wrong direction with Deep Learning.
< rcurtin> right, I think that many people come from fields that aren't deep learning, and now that deep learning has the spotlight, the feeling is a little like jealousy or envy
< rcurtin> it's pretty easy to get any paper about deep learning accepted somewhere, but if you do something more niche
< rcurtin> like... for instance... dual-tree algorithms :)
< rcurtin> it can be very hard to get those papers accepted
< rcurtin> I think the same was true before deep learning with SVMs and kernel machines
< ShikharJ> Haha :)
< rcurtin> just a reaction to hype and trends I guess
< rcurtin> but I do agree with your statement... deep learning isn't really different than statistics
< rcurtin> just an application of a particularly complex set of curve fitters :)
< ShikharJ> Even Statistical Machine Translation people are unhappy with this. It can be seen easily: pretty much every new grad student is doing Neural Machine Translation, as it is a lot easier to get a paper accepted in the NMT domain :)
< ShikharJ> Though I'd say even GANs were considered a niche area when they first came out. So there's that.
sulan_ has quit [Quit: Leaving]
< ShikharJ> rcurtin: I was wondering if there are any thoughts on moving the mlpack repository from GitHub to GitLab or some other place (like Armadillo has been moved)?
killer_bee[m] has quit [Remote host closed the connection]
prakhar_code[m] has quit [Remote host closed the connection]
ImQ009 has quit [Quit: Leaving]
< rcurtin> ShikharJ: I don't see any particular reason to move away from Github, but if the majority of mlpack developers want to move it, I'm certainly not opposed
< rcurtin> it would be a bit of work to make the transition though
prakhar_code[m] has joined #mlpack
prakhar_code[m] has quit [Remote host closed the connection]
prakhar_code[m] has joined #mlpack