verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has quit [Ping timeout: 245 seconds]
vivekp has joined #mlpack
ImQ009 has joined #mlpack
ImQ009 has quit [Ping timeout: 240 seconds]
ImQ009 has joined #mlpack
manish7294 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#22 (lmnn - 7c6c8a4 : Manish): The build has errored.
travis-ci has left #mlpack []
< manish7294> rcurtin: I have a slightly different observation here. I agree that the recalculation of arma::find and arma::unique is inefficient, but I don't think it is the major troublemaker. I have implemented the precalculation as you advised, but it didn't have much effect. I have made quite a number of runs with timers on almost each and every part. Here are the observations: https://pastebin.com/TejDffks
manish7294 has quit [Quit: Page closed]
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#23 (lmnn - 65920ee : Manish): The build passed.
travis-ci has left #mlpack []
< jenkins-mlpack> Project docker mlpack nightly build build #344: STILL UNSTABLE in 2 hr 41 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/344/
wenhao has joined #mlpack
sulan_ has joined #mlpack
witness_ has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#5041 (mlpack-3.0.2 - 0ace41c : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
< rcurtin> manish7294: right, ok. what were the results of each of those timers?
< rcurtin> note also, in my runs, I used 5k points randomly selected from covertype, not the whole dataset
manish7294 has joined #mlpack
< manish7294> rcurtin: Sorry, I don't have the exact data from the full covertype run, but approximately it was: computing_neighbors - 5 hrs 30 mins, whole Impostors() - 7 hrs 10 mins, the cost and gradient calculation part - 7 hrs 10 mins, total - 13 hrs 30 mins; optimizer - sgd, k = 12, batch size = 500
< manish7294> I also made a run on the first 5k points of the covertype-small dataset and the timings were more or less similar.
< rcurtin> ah, I did it with k=3, let's see if k=12 produces something more similar to yours
< rcurtin> actually, I realize I did it with default k, which is 1
< rcurtin> right, now I see results similar to yours
< rcurtin> but even so, with my timings, 21.6s is spent in Impostors() but only 5s is spent in the computing_neighbors timer
< rcurtin> I know we are on different systems so the optimizations could be somewhat different. also I am using the results before precalculation... so let's update and see what it does
< manish7294> Please try putting a timer over knn train and search.
< manish7294> and a timer over whole impostors
< rcurtin> that should be equivalent to tree_building + computing_neighbors, which was 0.126s + 5.33s, but yes, let me do that
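A minimal sketch of what such separate timers might look like with the mlpack 3.x NeighborSearch API; the "knn_train" and "knn_search" timer names are made up for illustration and are not timers that exist in mlpack:

    #include <mlpack/core.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>

    // Time the reference-tree construction and the search separately, since
    // the built-in 'tree_building' timer only covers the query tree built
    // inside Search().
    void TimedNeighborSearch(const arma::mat& data,
                             const size_t k,
                             arma::Mat<size_t>& neighbors,
                             arma::mat& distances)
    {
      mlpack::Timer::Start("knn_train");
      mlpack::neighbor::KNN knn(data);      // builds the reference tree
      mlpack::Timer::Stop("knn_train");

      mlpack::Timer::Start("knn_search");
      knn.Search(k, neighbors, distances);  // query tree + dual-tree search
      mlpack::Timer::Stop("knn_search");
    }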
< rcurtin> oh, interesting, now that timer is basically the same as the whole impostors timer
< manish7294> Ya, that's what I am hung up on
< rcurtin> hmm. it seems like maybe there is something that is not being timed by NeighborSearch, let me take a quick glance
< manish7294> sure
< rcurtin> ah! I see what it is
< rcurtin> if you look in neighbor_search_impl.hpp, the 'tree_building' timer is started _only_ for the query tree building in the Search() method
< rcurtin> so the building of the reference tree in Train() is not counted
< rcurtin> so then, it seems like a big bottleneck is the continual re-training of the reference tree
< rcurtin> do you remember this idea of using a tree with a MahalanobisDistance metric that I wrote a bit about? it seems like that could be useful here
< rcurtin> either that, or simply reducing the number of times we call Impostors() could be helpful
< manish7294> maybe we can do both
< rcurtin> right, I think that could also be helpful
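A rough sketch of the MahalanobisDistance idea above, assuming the mlpack 3.x metric API; it only shows how the metric could be built from the current transformation L, and wiring it into NeighborSearch (so that the tree on the original data never needs rebuilding) is left out:

    #include <mlpack/core.hpp>
    #include <mlpack/core/metrics/mahalanobis_distance.hpp>

    // Since || L a - L b ||^2 = (a - b)^T (L^T L) (a - b), a single tree built
    // on the untransformed data could be searched with this metric instead of
    // rebuilding a tree on (L * data) after every update.
    mlpack::metric::MahalanobisDistance<> MetricForTransformation(
        const arma::mat& transformation)
    {
      return mlpack::metric::MahalanobisDistance<>(
          transformation.t() * transformation);
    }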
< rcurtin> for acceleration of the rest of EvaluateWithGradient(), I think that by caching the results from Impostors(), we could avoid the calls to metric.Evaluate()
< rcurtin> I suspect that will be a lot of the runtime
< rcurtin> and I think that would be a fairly simple optimization
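A purely illustrative shape of that caching idea (the real Impostors() in the LMNN branch looks different): the neighbor search already produces the distances, so keeping them around lets EvaluateWithGradient() read a cached value instead of calling metric.Evaluate() again for each (point, impostor) pair:

    #include <mlpack/core.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>

    // Hypothetical Impostors()-like helper that hands back the distances the
    // search computed anyway, so the cost/gradient loop can reuse them.
    void ImpostorsWithDistances(const arma::mat& transformedData,
                                const size_t k,
                                arma::Mat<size_t>& impostors,
                                arma::mat& impostorDistances)
    {
      mlpack::neighbor::KNN knn(transformedData);
      knn.Search(k, impostors, impostorDistances);
    }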
< manish7294> for reducing the Impostors() calls, does the earlier bounding idea sound reasonable --- the one mentioned on the PR?
< manish7294> We can try that
< rcurtin> I believe that your idea will work, my only concern is that the overhead of calculating || (M_{t + 1} - M_t) p_i ||^2 for each p_i might cost more than just performing the search
< rcurtin> I guess if we keep oldCoordinates, it is just || p_i(t - 1) - p_i(t) ||^2 (sorry for the confusing notation, I wanted to use both i and t)
< rcurtin> but I do think it could provide some speedup, and maybe would not be too hard to implement. if you want to try and see what happens, I believe it would be worthwhile
< manish7294> correct
< rcurtin> I did not have time yesterday to finish writing my bound, unfortunately
< rcurtin> but I think that the bound I am coming up with could be used in tandem with your idea
< rcurtin> basically I want to calculate || M_{t + 1} - M_t ||^2 (or something like this that does not depend on the points), and if that quantity is small enough we skip the entire Impostors() step
< rcurtin> however, it would definitely be possible to do this check (once I figure out the bound), and if the check fails, then your idea can be done to reduce the number of points we need to find neighbors for
< rcurtin> so both could work in tandem
< rcurtin> do you think that would sound reasonable?
< manish7294> Right, I got that feeling from your idea but couldn't see beyond that
< manish7294> ya sure
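A rough sketch of how the two checks might work in tandem; 'slack', 'oldTransformation', and 'oldCoordinates' are illustrative names, and the actual bound (what 'slack' should be) is exactly the part that still has to be worked out:

    #include <mlpack/core.hpp>

    #include <utility>
    #include <vector>

    // Return the indices of points whose impostors should be recomputed; an
    // empty result means the cached impostors can be reused as-is.
    std::vector<size_t> PointsNeedingRecomputation(
        const arma::mat& dataset,            // points as columns
        const arma::mat& transformation,     // current L
        const arma::mat& oldTransformation,  // L from the last Impostors() call
        arma::mat& oldCoordinates,           // cached L * dataset
        const double slack)
    {
      std::vector<size_t> recompute;

      // Check 1: if the transformation barely moved, skip Impostors() entirely.
      if (arma::norm(transformation - oldTransformation, "fro") < slack)
        return recompute;

      // Check 2: otherwise, only re-search for points whose transformed
      // coordinates moved by more than the slack.
      arma::mat newCoordinates = transformation * dataset;
      for (size_t i = 0; i < dataset.n_cols; ++i)
      {
        if (arma::norm(newCoordinates.col(i) - oldCoordinates.col(i), 2) >= slack)
          recompute.push_back(i);
      }

      oldCoordinates = std::move(newCoordinates);
      return recompute;
    }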
< manish7294> and did you find anything about the divergence?
< rcurtin> not yet, I don't have any solution there. we could try adding a regularization parameter, but I haven't seen any issues with the implementation itself
< rcurtin> I'd like to take a look at the MATLAB solution and see how they solved the problem (or avoided it)
< rcurtin> I see, they have an adaptive learning rate:
< rcurtin> % Update learning rate
< rcurtin> if prev_C > C
< rcurtin> eta = eta * 1.01;
< rcurtin> else
< rcurtin> eta = eta * .5;
< rcurtin> end
< manish7294> Here sgd fails
< rcurtin> so it's a little like momentum... a little. if the step gives an improvement, increase step size by a factor of 1.01; if the step makes things bad, halve the step size
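The same rule in C++ form, just as an illustration; 'stepSize', 'objective', and 'prevObjective' are made-up names, not part of mlpack's optimizer API:

    // If the last step improved the objective, grow the step size slightly;
    // otherwise cut it in half.
    double AdaptStepSize(const double stepSize,
                         const double objective,
                         const double prevObjective)
    {
      return (objective < prevObjective) ? stepSize * 1.01 : stepSize * 0.5;
    }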
< rcurtin> I wonder if some of the cool optimizers Marcus has implemented could be used in place of SGD
< rcurtin> there are a couple in there with adaptive learning rates if I remember right
< manish7294> zoq: Please help us out with the optimizers that have an adaptive learning rate.
< manish7294> as you said earlier, we can try a penalty term too.
< rcurtin> I think BigBatchSGD could be one way to try
< rcurtin> agreed, I think a penalty term is no problem
< rcurtin> if you'd like to try that, go ahead---basically just add || L ||^2 to the objective
< rcurtin> that or b * || L ||^2 for some user-specified parameter b, then you could play around with it and see if you find a good value
< manish7294> I think the gradient will also need a corresponding change
< rcurtin> right, that will be just + b*L
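A small sketch of wiring the penalty into an existing objective/gradient pair; note that the exact gradient of b * ||L||^2 is 2 * b * L, with the factor of 2 absorbable into b. The function and parameter names are illustrative, not the actual LMNN code:

    #include <mlpack/core.hpp>

    // Add the Frobenius-norm penalty b * ||L||^2 to the objective and the
    // corresponding term to the gradient.
    void AddRegularization(const arma::mat& transformation,  // L
                           const double b,                   // penalty weight
                           double& objective,
                           arma::mat& gradient)
    {
      const double frobNorm = arma::norm(transformation, "fro");
      objective += b * frobNorm * frobNorm;
      gradient += 2.0 * b * transformation;
    }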
< manish7294> Okay, I will try these ideas.
< rcurtin> yeah, maybe it will help, let me know how it turns out :)
< manish7294> In one word - awesome :)
< manish7294> great results without any divergence at step size 50
< manish7294> on iris
< manish7294> which was earlier diverging even at 0.1
< rcurtin> ah, that is with the regularization penalty?
< manish7294> yes
< manish7294> tried with vc2 too
< rcurtin> great, that is nice to hear. I expect that with some larger datasets there may need to be some tuning of how much regularization to apply
< rcurtin> hmm, I just realized, another thing we could do is not apply a regularization but force a normalization
< rcurtin> it gives us identical KNN results if we use some transformation matrix L, or if we use 2*L
< rcurtin> thus, it would be reasonable to, after each step, simply normalize the matrix L so that ||L||^2 = 1
< rcurtin> but I think the regularization you are doing now is just fine, it accomplishes roughly the same thing
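A sketch of that normalization step, assuming it would be applied to the transformation matrix after each optimizer update:

    #include <mlpack/core.hpp>

    // Rescaling L does not change the induced k-NN ordering, so force
    // ||L||_F = 1 after each step to keep it from blowing up.
    void NormalizeTransformation(arma::mat& transformation)
    {
      const double frobNorm = arma::norm(transformation, "fro");
      if (frobNorm > 0.0)
        transformation /= frobNorm;
    }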
< manish7294> that sounds promising
< manish7294> will try that too
< rcurtin> the MATLAB implementation does not seem to normalize the learned matrix
< rcurtin> but like I said, I think it is a minor issue. the penalty should be another completely sufficient way of doing it, and the runtime cost of computing either b*|| L ||^2 or b*L should be pretty insignificant
< rcurtin> (as compared to the rest of the calculation)
< manish7294> This may sound stupid, but I made a deadly mistake: I didn't notice the lbfgs flag was set to true this whole time in the above runs, and it got our hopes up. Sorry for that, I should be more careful :(
< rcurtin> no worries, but it sounds like the regularization works for L-BFGS
< rcurtin> I would expect that the amount of penalty to apply (the 'b') would need to be tuned
< manish7294> no, I just made the change in the batch EvaluateWithGradient
< manish7294> moreover lbfgs doesn't suffer from this problem
killer_bee[m] has quit [Ping timeout: 256 seconds]
prakhar_code[m] has quit [Ping timeout: 260 seconds]
< manish7294> maybe I should try normalization.
vivekp has quit [Ping timeout: 264 seconds]
manish7294 has quit [Quit: Page closed]
manish7294 has joined #mlpack
sulan_ has quit [Read error: Connection reset by peer]
sulan_ has joined #mlpack
< manish7294> rcurtin: Hopefully this time I didn't make any mistake: normalizing the coordinates matrix by dividing it by its norm helped avoid the divergence. But it raises the issue of how to update the coordinates; currently I made a change in sgd itself for testing purposes, and I think the optimization time has also increased a bit.
manish7294 has quit [Quit: Page closed]
sulan_ has quit [Quit: Leaving]
prakhar_code[m] has joined #mlpack
ImQ009 has quit [Quit: Leaving]
witness_ has quit [Quit: Connection closed for inactivity]
killer_bee[m] has joined #mlpack