verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has joined #mlpack
manish7294 has joined #mlpack
< manish7294> rcutin: Here are some more results on covertype: k = 12, initial accuracy - 96.7575, final - 96.9964, step_size - 1e-08, total time: 6 hrs, 56 mins, 18.5 secs, optimizer - sgd
< manish7294> k = 15, initial accuracy - 96.3226, final - 96.955, total time: 13 hrs, 47 mins, 27.9 secs, optimizer - lbfgs
< manish7294> In the case of sgd, the step size was kept this low because the optimization was diverging to inf.
< manish7294> rcurtin: Ahh, sorry! I made a mistake while typing your nick again
< rcurtin> no worries
< rcurtin> the improvement seems marginal but I don't think it's a problem
< rcurtin> I think the main focus should be acceleration, I think there is a lot that can be done there
< rcurtin> also you can see if you can increase the convergence tolerance
manish7294_ has joined #mlpack
manish7294 has quit [Ping timeout: 260 seconds]
< manish7294_> rcurtin: One thing worrying me is that instead of terminating with a failure message in case of divergence, I am getting a segmentation fault
< rcurtin> that could make the process take far fewer iterations
< rcurtin> sorry for the lag again, I am playing mariokart :)
< manish7294_> And how's your research paper advancing?
< rcurtin> a segfault is not good, we should investigate that, maybe there is a bug
< rcurtin> haha
< rcurtin> right, I am playing mariokart, not working on it :)
< rcurtin> but I think it is ready
< manish7294_> great, nothing comes between you and mariokart :)
< rcurtin> ;)
< rcurtin> they extended the submission deadline to next week but I think it is ready to submit tomorrow anyway
manish7294_ has quit [Ping timeout: 260 seconds]
manish7294 has joined #mlpack
< rcurtin> by the way I ended up spending all day in meetings so I don't have a kNN bound yet
< rcurtin> another thought for optimization: for SGD at each iteration it is only necessary to compute new impostors for points in the batch
< rcurtin> so you could use knn.Search(querySet.cols(begin, begin + batchSize - 1))
< rcurtin> (I think the 1 is needed, double check that...)
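(A minimal sketch of the batched impostor search described above, assuming mlpack's KNN/NeighborSearch API; the names querySet, begin, and batchSize are illustrative, not the exact LMNN code:)

    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>

    using namespace mlpack::neighbor;

    // Search for the k nearest neighbors of only the current SGD batch,
    // rather than of the full query set.
    void BatchNeighbors(KNN& knn, const arma::mat& querySet,
                        const size_t begin, const size_t batchSize,
                        const size_t k, arma::Mat<size_t>& neighbors,
                        arma::mat& distances)
    {
      // Armadillo's cols(a, b) is inclusive on both ends, hence the "- 1".
      knn.Search(querySet.cols(begin, begin + batchSize - 1), k,
                 neighbors, distances);
    }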
< manish7294> rcurtin: You suggested this earlier, and it is already there in the current implementation :)
< manish7294> And I think that is what created the difference of almost half between the timings of lbfgs and sgd
< rcurtin> ah, sorry, I did not realize that
< rcurtin> I need to look closely at the state of the implementation, I will do that tomorrow
< manish7294> great, but you don't have to hurry, as the research paper must be the top priority :)
< rcurtin> :)
< rcurtin> well I am headed to bed now
< rcurtin> talk later! :)
< manish7294> sure :)
< manish7294> leaving a comment here regarding comparison of sgd and lbfgs w.r.t above batch optimization : sgd computing neighbors time - 3 mins 58.8 secs, lbfgs - 4 hrs, 17 mins, 5.7 secs
< jenkins-mlpack> Project docker mlpack weekly build build #45: FAILURE in 3 hr 13 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20weekly%20build/45/
< jenkins-mlpack> * haritha1313: support for RandSVD in CF
< jenkins-mlpack> * haritha1313: templatized apply
< jenkins-mlpack> * haritha1313: adjusting eps addition
< jenkins-mlpack> * haritha1313: refactoring
< jenkins-mlpack> * haritha1313: style edits
< jenkins-mlpack> * haritha1313: style edits
< jenkins-mlpack> * haritha1313: remove template
< jenkins-mlpack> * haritha1313: edit
< jenkins-mlpack> * haritha1313: debugging
< jenkins-mlpack> * haritha1313: style edit
< jenkins-mlpack> * haritha1313: debugged matrix error
< jenkins-mlpack> * haritha1313: debug emptyctortest
< jenkins-mlpack> * haritha1313: train debug
< jenkins-mlpack> * haritha1313: debug
< jenkins-mlpack> * haritha1313: train debug
< jenkins-mlpack> * haritha1313: regSVD debugging
< jenkins-mlpack> * haritha1313: test time reduction
manish7294 has quit [Ping timeout: 265 seconds]
dasayan05 has joined #mlpack
dasayan05 has quit [Client Quit]
< zoq> rcurtin: mariokart N64?
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
sumedhghaisas has quit [Read error: Connection reset by peer]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas2 has quit [Read error: Connection reset by peer]
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 256 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas3 has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
sumedhghaisas has quit [Ping timeout: 245 seconds]
sumedhghaisas3 has quit [Remote host closed the connection]
< zoq> ShikharJ: In case you missed the last message; see http://www.mlpack.org/irc/
< jenkins-mlpack> Project docker mlpack nightly build build #343: STILL UNSTABLE in 2 hr 39 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/343/
< ShikharJ> zoq: Ah, sure, let's merge. We have been very patient with this, and I guess the time is right :)
< ShikharJ> rcurtin: If you want to take a look, please feel free, and let us know.
< zoq> gradient: Okay, let me set the timer, so that we conform with the merge policy.
sumedhghaisas has joined #mlpack
< zoq> ShikharJ: Sorry, wrong name.
< ShikharJ> zoq: Great :)
< zoq> ShikharJ: Excited to get this merged.
< ShikharJ> zoq: I'm also nearing completion on the DCGAN PR; once it is ready to be tested (by the weekend), we can merge that as well and then focus on other tasks. Thanks for helping me out on this. We've been able to complete this because of you and lozhnikov!
< zoq> ShikharJ: You did everything :)
< ShikharJ> zoq: We must also thank Kris for his patience with the work. He implemented a major portion of the API, which helped us finish off this work without much delay :)
< zoq> ShikharJ: Definitely, Kris provided a great basis to work with.
< sumedhghaisas> Atharva: Hi Atharva
< sumedhghaisas> zoq: Hi Marcus. How are you? We were going to propose a change to the ANN loss function arch. Thought I'd run it by you.
< zoq> sumedhghais: Hey, sure, if this makes things easier.
< sumedhghaisas> zoq: So currently the loss is defined as the last layer, and thus it can only define the loss over the previous layer's output
< sumedhghaisas> This restricts our architecture; we aren't able to implement losses which depend on intermediate layers
< sumedhghaisas> for example, in a VAE the KL loss is defined over the mean and stddev, which are the output of the Reparametrization layer
< sumedhghaisas> I mean, no kind of regularization can be implemented with our arch
< sumedhghaisas> what I propose is
< sumedhghaisas> keeping the loss layer which defines the major loss, and implementing a visitor which collects the extra loss from the remaining layers
< sumedhghaisas> The layer which adds the extra loss will be responsible for adding the corresponding error signals in Backward; as these signals won't affect the layers above, the current Backward arch can be kept intact
< zoq> In this case we don't have to change the main interface, right?
< sumedhghaisas> Exactly
< sumedhghaisas> All we do is expect a Loss() function which returns a double to be added to the loss
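(A rough sketch of what such a visitor could look like, assuming mlpack's boost::variant-based layer storage; LossVisitor and the HasLoss trait are hypothetical names here, not existing mlpack classes:)

    #include <boost/variant.hpp>
    #include <type_traits>

    // Hypothetical SFINAE trait: does LayerType provide a Loss() member?
    template<typename T>
    class HasLoss
    {
      template<typename U> static char Test(decltype(&U::Loss));
      template<typename U> static long Test(...);
     public:
      static const bool value = (sizeof(Test<T>(0)) == sizeof(char));
    };

    // Visitor that collects the extra loss from a layer, or zero if the
    // layer does not define Loss().
    class LossVisitor : public boost::static_visitor<double>
    {
     public:
      template<typename LayerType>
      typename std::enable_if<HasLoss<LayerType>::value, double>::type
      operator()(LayerType* layer) const { return layer->Loss(); }

      template<typename LayerType>
      typename std::enable_if<!HasLoss<LayerType>::value, double>::type
      operator()(LayerType* /* layer */) const { return 0.0; }
    };

    // Inside FFN::Evaluate() the extra losses would then be accumulated:
    //   for (size_t i = 0; i < network.size(); ++i)
    //     loss += boost::apply_visitor(LossVisitor(), network[i]);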
< ShikharJ> sumedhghaisas: That sounds good to me.
< zoq> Sounds like a good idea to me; I thought about a layer which forwards the output, but an extra visitor is cleaner.
< sumedhghaisas> ShikharJ, zoq: Just wanted to make sure I am not missing any corner case where this will collapse
< zoq> Nothing comes to mind, at least for now.
< Atharva> I will go ahead with this then
< sumedhghaisas> zoq: :) Okay, in my understanding the only function where the actual loss is computed is 'Evaluate', right?
< zoq> inside the FFN and RNN class, right
< sumedhghaisas> So that will be the only point of change in my view
< zoq> yeah, if you are going to use the FFN class, that is the part we have to change
< sumedhghaisas> Atharva: Are you clear about the implementation? This will avoid creating an extra VAE class :)
< Atharva> So, we just put a VAE together with the FFN class
< Atharva> But the class had a lot of other functions planned
< sumedhghaisas> And if time permits, RNN as well :)
< Atharva> generate functions for example
< sumedhghaisas> We can convert any architecture to a VAE
< sumedhghaisas> Okay, so let's first implement an FFN extension only and get the gradient test passing
sumedhghaisas has quit [Ping timeout: 260 seconds]
< rcurtin> zoq: nah, I like to play mariokart 8 deluxe online
< rcurtin> it's crazy, I think there are a lot of people out there who practice way too much :)
< zoq> ahh, nintendo switch
manish7294 has joined #mlpack
< zoq> Perhaps we could play a community round? Not sure if anyone else has a Switch, though.
< manish7294> rcurtin: I tried debugging the segfault error coming up during divergence. I think that as the internal values of the coordinates matrix increase to very large values, the KNN search for impostors fails, leading to the error.
< rcurtin> zoq: I'd be up for it :)
< rcurtin> manish7294: hmm, could you print the coordinates matrix? is the segfault coming from KNN?
< rcurtin> I thought that KNN should still work with very large values, so long as there is no nan or inf (I don't know what happens in that case)
< manish7294> rcurtin: https://pastebin.com/0UZE16iK, The error comes when calculating eval in the gradient part, because impostors() outputs garbage values.
< manish7294> the values can be something like 16487695656652...., and then when we query transformedDataset.col(impostors(j, i)), the error pops up.
< rcurtin> ah, that is not good, the coordinates matrix should never be diverging in that way
< manish7294> rcurtin: This happens when the step size is comparatively large; I mentioned this earlier in the PR.
< rcurtin> it's a little hard for me to keep track; there are several different issues being debugged
< rcurtin> if the step size is too large, indeed it will bounce around to extremely large values
< rcurtin> what is the step size being used? I think this is the covertype dataset?
< rcurtin> also, your comment from earlier:
< manish7294> step size is 1 here
< rcurtin> 03:51 < manish7294> leaving a comment here regarding comparison of sgd and lbfgs w.r.t above batch optimization : sgd computing neighbors time - 3 mins 58.8 secs, lbfgs - 4 hrs, 17 mins, 5.7 secs
< rcurtin> the SGD run took 6 hours overall and LBFGS took 13 hours overall, right?
< manish7294> yes
< rcurtin> ah, step size 1 is almost always going to be way way way too large. usually 0.01 or 0.001 or even smaller is closer to the right choice
< manish7294> but with covertype a step size of 1e-06 or greater leads to the same
< rcurtin> ah, hm. I wonder if we need to add some regularization or something, but let's not worry too much about that for now
< rcurtin> if SGD is only spending a total of 4 minutes computing neighbors, then definitely the main bottleneck now is somewhere else; do you know what part is slow?
< manish7294> I think the recalculation of the gradient due to neighbors every time could be a reason, but I can't say it is significant
< rcurtin> you can do some high-level profiling by adding 'Timer::Start()' and 'Timer::Stop()' calls throughout the code
< rcurtin> (or you could use a profiler like gprof or perf or something like this, but for high-level ideas probably using Timer is the easiest way)
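(For reference, a minimal sketch of the Timer usage suggested above; the timer name "computing_impostors" is illustrative:)

    #include <mlpack/core.hpp>

    void SomeExpensiveStep()
    {
      // Everything between Start() and Stop() is accumulated under this
      // name; mlpack command-line programs print all timers on exit when
      // --verbose is given.
      mlpack::Timer::Start("computing_impostors");
      // ... the code being measured ...
      mlpack::Timer::Stop("computing_impostors");
    }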
< manish7294> I think we should do that; it will definitely help
< manish7294> and can we do something about that divergence, like throwing an error or something?
< rcurtin> we could, we would have to catch the condition though
< rcurtin> I'd like to try and reproduce that, so let me check out the code and see if I can get it to happen
< manish7294> good enough
< rcurtin> I really don't think it would be a bad idea to add a penalty term like -|| L || to the objective
< rcurtin> or something like this, it should help keep the entries of the matrix from diverging
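(One simple form of such a penalty, sketched under the assumption that the LMNN objective f(L) is minimized; lambda is a hypothetical regularization strength, and this is not necessarily the exact term rcurtin has in mind:)

    #include <mlpack/core.hpp>

    // Penalty term lambda * ||L||_F^2, added to the objective to discourage
    // the entries of the transformation matrix L from growing without bound.
    double Penalty(const arma::mat& coordinates /* L */, const double lambda)
    {
      const double fro = arma::norm(coordinates, "fro");
      return lambda * fro * fro;
    }

    // The matching gradient contribution is d/dL [lambda * ||L||_F^2]
    //   = 2 * lambda * L, added to the existing LMNN gradient.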
< rcurtin> in any case, let me reproduce it and see
< manish7294> sure, I shall be doing the timings then.
< rcurtin> yeah; if we have the neighbor computation down to 4 minutes with covertype and SGD, this is definitely a great start, and if we can reduce the other part of the computation similarly I think the implementation will be fast
< rcurtin> still a little work to do for L-BFGS, but I think there are still lots of ideas we can try
< rcurtin> I guess we should benchmark with other implementations at some point, but at the very least that MATLAB implementation will never work with covertype... since it builds the entire n x n distance matrix, we could only compare against a dataset of roughly 6-8k points or less
< manish7294> Right, and I think shogun's is roughly based on the same
< manish7294> Ahh! grammatical mistake :)
< rcurtin> huh, I'm not sure I noticed any grammatical mistake
< manish7294> shogun's is
< rcurtin> hmm, I guess technically the implication is that you mean "shogun's implementation"
< manish7294> right :)
< rcurtin> which would work as "shogun's implementation is". I guess I am not sure whether leaving the implementation out makes it grammatically incorrect
< rcurtin> it seems like a pretty small issue either way :)
< manish7294> leaving the implementation out makes it read like "shogun is is" or "shogun has is" :)
< rcurtin> I guess we could go to the grammar stack exchange
< rcurtin> but there sure is a lot of pedantry in that forum :)
< manish7294> let's add this to the legendary list of issues to deal with for now, haha :)
< rcurtin> hah, sounds good :)
< rcurtin> if you like, maybe it might be worthwhile to add a checklist to the LMNN PR of issues to look into, but that is up to you
< rcurtin> but it might be useful to have some way to track the multiple threads of discussion
< manish7294> haha
< manish7294> maybe a comment at the end with some eye-catching material will do
< rcurtin> that works also, however you want to do it. really in the end all I'm looking for is that we can do LMNN on pretty large datasets and it works reasonably well
< rcurtin> if all the implementations are built like the MATLAB one, then yours will already be able to scale much more significantly than anything else, but I think we can make it faster still :)
< rcurtin> I think we are still roughly on track with your timeline; you had written that you planned to be fully done with LMNN by 6/15
< rcurtin> actually, I guess you have time for LMNN and benchmarking or writing the lmnn_main until July 1st
< manish7294> The deadline is closing in; I will need to hurry on the optimization part
< rcurtin> I do think that all the accelerations we do for LMNN will apply to BoostMetric also, which will be nice
< manish7294> Yes, doing the BoostMetric part will be a lot easier :)
< manish7294> thanks to all the LMNN efforts
manish7294 has quit [Quit: Page closed]
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
ImQ009 has joined #mlpack
ImQ009 has quit [Quit: Leaving]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#5029 (master - 4c008c4 : Ryan Curtin): The build passed.
travis-ci has left #mlpack []
< rcurtin> manish7294: I had time to do some quick experimentation. I built mlpack_lmnn and inserted a number of timers, then ran with SGD with a learning rate of 1e-8 and a batch size of 512 on a subset of 5k points from the covertype dataset
< rcurtin> this gave reasonable results, but I found that the whole run took 21.8 seconds; of this time, 21.46 seconds were spent in the outer lmnn_sgd_optimization timer, and 20.5 seconds of that were spent in Constraints::Impostors()
< rcurtin> but the KNN search timer ("computing_neighbors" and "tree_building" will be the parts timed from KNN) only took 3.3 seconds and 0.13 seconds, respectively
< rcurtin> so it seems like there must be some big inefficiency in the other parts of Impostors()
< rcurtin> manish7294: I suspect the inefficiency is in the fact that arma::find() and arma::unique() are being called every time Impostors() is called
< rcurtin> I think you can accelerate things by caching those calculations at the start of the optimization
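(A sketch of that caching idea; the member names uniqueLabels and indexesByClass are hypothetical, but the point is that arma::unique() and arma::find() run once in the constructor instead of inside every Impostors() call:)

    #include <mlpack/core.hpp>
    #include <vector>

    class Constraints
    {
     public:
      Constraints(const arma::Row<size_t>& labels)
      {
        // Compute the distinct labels and the index list for each class once.
        uniqueLabels = arma::unique(labels);
        indexesByClass.resize(uniqueLabels.n_elem);
        for (size_t i = 0; i < uniqueLabels.n_elem; ++i)
          indexesByClass[i] = arma::find(labels == uniqueLabels[i]);
      }

      // Impostors() can then read indexesByClass[c] directly instead of
      // recomputing arma::find() and arma::unique() on each invocation.

     private:
      arma::Row<size_t> uniqueLabels;
      std::vector<arma::uvec> indexesByClass;
    };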
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#5030 (master - 6a59dd5 : Ryan Curtin): The build passed.
travis-ci has left #mlpack []