verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has joined #mlpack
manish7294 has joined #mlpack
< manish7294>
rcutin: Here are some more results on covertype: k = 12, initial accuracy - 96.7575, final - 96.9964, step_size - 1e-08, total time - 6 hrs, 56 mins, 18.5 secs, optimizer - sgd
< manish7294>
k = 15, initial accuracy - 96.3226, final - 96.955, total time - 13 hrs, 47 mins, 27.9 secs, optimizer - lbfgs
< manish7294>
In the case of sgd, the step size was kept this low because the optimization was diverging to inf.
< manish7294>
rcurtin: Ahh, sorry! I made a mistake typing your nick again
< rcurtin>
no worries
< rcurtin>
the improvement seems marginal but I don't think it's a problem
< rcurtin>
I think the main focus should be acceleration; there is a lot that can be done there
< rcurtin>
also you can see if you can increase the convergence tolerance
manish7294_ has joined #mlpack
manish7294 has quit [Ping timeout: 260 seconds]
< manish7294_>
rcurtin: One thing is worrying me: instead of terminating with a failure message in case of divergence, I am getting a segmentation fault
< rcurtin>
that could make the process take far fewer iterations
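(As a concrete illustration of the tolerance suggestion, a minimal sketch using mlpack's SGD optimizer of that era; the constructor argument order (stepSize, batchSize, maxIterations, tolerance) should be double-checked against the actual sgd.hpp header.)

    // Sketch: loosening the convergence tolerance so SGD declares
    // convergence sooner and therefore runs fewer iterations.
    #include <mlpack/core/optimizers/sgd/sgd.hpp>

    using namespace mlpack::optimization;

    // stepSize, batchSize, maxIterations, tolerance -- a looser tolerance
    // (e.g. 1e-4 instead of the default 1e-5) makes it stop earlier.
    SGD<> optimizer(1e-8, 512, 100000, 1e-4);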
< rcurtin>
sorry for the lag again, I am playing mariokart :)
< manish7294_>
And how's your research paper advancing?
< rcurtin>
a segfault is not good, we should investigate that, maybe there is a bug
< rcurtin>
haha
< rcurtin>
right I am playing mariokart not working on it :)
< rcurtin>
but I think it is ready
< manish7294_>
great, nothing comes between mariokart :)
< rcurtin>
;)
< rcurtin>
they extended the submission deadline to next week but I think it is ready to submit tomorrow anyway
manish7294_ has quit [Ping timeout: 260 seconds]
manish7294 has joined #mlpack
< rcurtin>
by the way I ended up spending all day in meetings so I don't have a kNN bound yet
< rcurtin>
another thought for optimization: for SGD at each iteration it is only necessary to compute new impostors for points in the batch
< rcurtin>
so you could use knn.Search(querySet.cols(begin, begin + batchSize - 1))
< rcurtin>
(I think the 1 is needed, double check that...)
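(For reference, a rough sketch of the batch-wise impostor search described above; knn, querySet, k, begin, and batchSize are assumed to be set up as in the LMNN code.)

    // Recompute impostors only for the points in the current SGD batch.
    // Armadillo's cols(a, b) takes an inclusive column range, so the
    // "- 1" is indeed needed.
    arma::Mat<size_t> neighbors;
    arma::mat distances;
    knn.Search(querySet.cols(begin, begin + batchSize - 1), k,
               neighbors, distances);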
< manish7294>
rcurtin: You suggested this earlier and it is already there in the current implementation ;)
< manish7294>
:)
< manish7294>
And I think that is what created the difference of almost half in the timings of L-BFGS and SGD
< rcurtin>
ah, sorry, I did not realize that
< rcurtin>
I need to look closely at the state of the implementation, I will do that tomorrow
< manish7294>
great, but you don't have to hurry, as the research paper must be the top priority :)
< rcurtin>
:)
< rcurtin>
well I am headed to bed now
< rcurtin>
talk later! :)
< manish7294>
sure :)
< manish7294>
leaving a comment here regarding the comparison of SGD and L-BFGS w.r.t. the above batch optimization: SGD neighbor computation time - 3 mins 58.8 secs, L-BFGS - 4 hrs, 17 mins, 5.7 secs
< ShikharJ>
zoq: Ah, sure, let's merge. We have been very patient with this, and I guess the time is right :)
< ShikharJ>
rcurtin: If you want to take a look, please feel free, and let us know.
< zoq>
gradient: Okay, let me set the timer, so that we conform with the merge policy.
sumedhghaisas has joined #mlpack
< zoq>
ShikharJ: Sorry, wrong name.
< ShikharJ>
zoq: Great :)
< zoq>
ShikharJ: Excited to get this merged.
< ShikharJ>
zoq: I'm also nearing completion on the DCGAN PR; once it is ready to be tested (by the weekend), we can merge that as well, and then focus on other tasks. Thanks for helping me out on this. We've been able to complete this because of you and lozhnikov!
< zoq>
ShikharJ: You did everything :)
< ShikharJ>
zoq: We must also thank Kris for his patience with the work. He implemented a major portion of the API, which helped us finish off this work without much delay :)
< zoq>
ShikharJ: Definitely, Kris provided a great basis to work with.
< sumedhghaisas>
Atharva: Hi Atharva
< sumedhghaisas>
zoq: Hi Marcus. How are you? We were going to propose a change to the ANN loss function arch. Thought I would run it by you.
< zoq>
sumedhghais: Hey, sure, if this makes things easier.
< sumedhghaisas>
zoq: So currently the loss is defined as the last layer, and thus it can only be defined over the output of the previous layer
< sumedhghaisas>
This restricts our architecture; we aren't able to implement losses which depend on intermediate layers
< sumedhghaisas>
for example, in a VAE the KL loss is defined over the mean and stddev, which are the output of the Reparametrization layer
< sumedhghaisas>
I mean, no kind of regularization can be implemented with our arch
< sumedhghaisas>
what I propose is
< sumedhghaisas>
Keeping the loss layer, which defines the main loss, and implementing a visitor which collects the extra loss from the remaining layers
< sumedhghaisas>
The layer which adds the extra loss will be responsible for adding the corresponding error signals in Backward; as these signals won't affect the layers above, the current Backward arch can be kept intact
< zoq>
In this case we don't have to change the main interface right?
< sumedhghaisas>
Exactly
< sumedhghaisas>
All we do is expect a Loss() function which returns a double to be added to the loss
< ShikharJ>
sumedhghaisas: That sounds good to me.
< zoq>
Sounds like a good idea to me, I thought about a layer which forwards the output, but an extra visitor is cleaner.
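(A minimal sketch of what such a visitor could look like, following the boost::variant-based visitor pattern mlpack's ANN code uses; the HasLoss detection trait here is hypothetical, standing in for mlpack's own method-detection machinery.)

    #include <boost/variant/static_visitor.hpp>
    #include <type_traits>
    #include <utility>

    // Hypothetical method-detection trait (C++11 detection idiom): true
    // when T has a Loss() member returning double.
    template<typename T, typename = void>
    struct HasLoss : std::false_type { };

    template<typename T>
    struct HasLoss<T, typename std::enable_if<std::is_same<
        decltype(std::declval<T>().Loss()), double>::value>::type>
        : std::true_type { };

    // Collect the extra loss from any layer exposing Loss(); layers
    // without a Loss() member contribute 0.
    class LossVisitor : public boost::static_visitor<double>
    {
     public:
      template<typename LayerType>
      double operator()(LayerType* layer) const { return LossValue(layer); }

     private:
      // Overload chosen when T has a Loss() method.
      template<typename T>
      typename std::enable_if<HasLoss<T>::value, double>::type
      LossValue(T* layer) const { return layer->Loss(); }

      // Fallback for layers without Loss().
      template<typename T>
      typename std::enable_if<!HasLoss<T>::value, double>::type
      LossValue(T* /* layer */) const { return 0.0; }
    };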
< sumedhghaisas>
ShikharJ, zoq: Just wanted to make sure I am not missing any corner case where this would collapse
< zoq>
Nothing comes to mind, at least for now.
< Atharva>
I will go ahead with this then
< sumedhghaisas>
zoq: :) Okay, in my understanding the only function where the actual loss is computed is 'Evaluate', right?
< zoq>
inside the FFN and RNN classes, right
< sumedhghaisas>
So that will be the only point of change in my view
< zoq>
yeah, if you are going to use the FFN class, that is the part we have to change
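(Roughly, the change inside FFN::Evaluate() might then look like the sketch below; 'network', 'outputLayer', 'responses', and 'outputParameterVisitor' are assumed from the existing FFN internals, and LossVisitor is the hypothetical visitor sketched earlier.)

    // After the usual forward pass, take the output layer's loss...
    double res = outputLayer.Forward(std::move(boost::apply_visitor(
        outputParameterVisitor, network.back())), std::move(responses));

    // ...then accumulate whatever extra loss the layers report.
    for (size_t i = 0; i < network.size(); ++i)
      res += boost::apply_visitor(LossVisitor(), network[i]);

    return res;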
< sumedhghaisas>
Atharva: Are you clear about the implementation? This will avoid creating an extra VAE class :)
< Atharva>
So, we just put a VAE together with the FFN class
< Atharva>
But the class had a lot of other functions planned
< sumedhghaisas>
And if time permits, RNN as well :)
< Atharva>
generate functions for example
< sumedhghaisas>
We can convert any architecture to VAE
< sumedhghaisas>
Okay, so let's first implement an FFN extension only and get the gradient test passing
sumedhghaisas has quit [Ping timeout: 260 seconds]
< rcurtin>
zoq: nah, I like to play mariokart 8 deluxe online
< rcurtin>
it's crazy, I think there are a lot of people out there who practice way too much :)
< zoq>
ahh, nintendo switch
manish7294 has joined #mlpack
< zoq>
Perhaps we could play a community round? Not sure if anyone else has a Switch?
< manish7294>
rcurtin: I tried debugging the segfault error coming up during divergence. I think as the coordinates matrix's internal values increase to very large values, the KNN search for impostors fails, leading to the error.
< rcurtin>
zoq: I'd be up for it :)
< rcurtin>
manish7294: hmm, could you print the coordinates matrix? is the segfault coming from KNN?
< rcurtin>
I thought that KNN should still work with very large values, so long as there is no nan or inf (I don't know what happens in that case)
< manish7294>
rcurtin: https://pastebin.com/0UZE16iK. The error comes when calculating eval in the gradient part, because impostors() outputs garbage values.
< manish7294>
the values can be something like 16487695656652...., and then when we query transformedDataset.col(impostors(j, i)), the error pops up.
< rcurtin>
ah, that is not good, the coordinates matrix should never be diverging in that way
< manish7294>
rcurtin: This happens when the step size is comparatively large; I mentioned this earlier in the PR.
< rcurtin>
it's a little hard for me to keep track, there are several different issues being debugged
< rcurtin>
if the step size is too large, indeed it will bounce around to extremely large values
< rcurtin>
what is the step size being used? I think this is the covertype dataset?
< rcurtin>
also, your comment from earlier:
< manish7294>
step size is 1 here
< rcurtin>
03:51 < manish7294> leaving a comment here regarding the comparison of SGD and L-BFGS w.r.t. the above batch optimization: SGD neighbor computation time - 3 mins 58.8 secs, L-BFGS - 4 hrs, 17 mins, 5.7 secs
< rcurtin>
the SGD run took 6 hours overall and LBFGS took 13 hours overall, right?
< manish7294>
yes
< rcurtin>
ah, step size 1 is almost always going to be way way way too large. usually 0.01 or 0.001 or even smaller is closer to the right choice
< manish7294>
but with covertype, a step size of 1e-06 or greater leads to the same
< rcurtin>
ah, hm. I wonder if we need to add some regularization or something, but let's not worry too much about that for now
< rcurtin>
if SGD is only spending a total of 4 minutes computing neighbors, then definitely the main bottleneck now is somewhere else; do you know what part is slow?
< manish7294>
I think the recalculation of the gradient due to the new neighbors every time could be a reason, but I can't say whether it is significant
< rcurtin>
you can do some high-level profiling by adding 'Timer::Start()' and 'Timer::Stop()' calls throughout the code
< rcurtin>
(or you could use a profiler like gprof or perf or something like this, but for high-level ideas probably using Timer is the easiest way)
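(Usage is just a matter of bracketing the suspect region; the timer name and the Impostors() call below are only placeholders.)

    #include <mlpack/core.hpp>

    // The named timer shows up in the program's timing report
    // (e.g. when running mlpack_lmnn with --verbose).
    mlpack::Timer::Start("computing_impostors");
    constraints.Impostors(impostors, transformedDataset, labels);
    mlpack::Timer::Stop("computing_impostors");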
< manish7294>
I think we should do that, it will definitely help
< manish7294>
and can we do something about that divergence, like throwing an error or something?
< rcurtin>
we could, we would have to catch the condition though
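(One way to catch it, sketched as a guard before the impostor search; the check and message are made up, though is_finite() is a real Armadillo member and Log::Fatal is mlpack's usual way to abort with an error.)

    // Hypothetical guard: fail loudly instead of segfaulting on garbage
    // impostor indices once the transformation blows up.
    if (!transformation.is_finite())
    {
      mlpack::Log::Fatal << "LMNN: transformation matrix diverged "
          << "(non-finite values); try a smaller step size!" << std::endl;
    }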
< rcurtin>
I'd like to try and reproduce that, so let me check out the code and see if I can get it to happen
< manish7294>
good enough
< rcurtin>
I really don't think it would be a bad idea to add a penalty term like -|| L || to the objective
< rcurtin>
or something like this, it should help keep the entries of the matrix from diverging
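(A sketch of one such penalty; rcurtin wrote "-|| L ||", but the sign and exact norm are a design choice, and the squared Frobenius norm below is just one common variant that discourages large entries in a minimized objective.)

    #include <armadillo>

    // Penalty added to the LMNN objective to keep the entries of the
    // transformation L from blowing up; lambda is a made-up weight.
    double Regularizer(const arma::mat& transformation, const double lambda)
    {
      return lambda * arma::accu(arma::square(transformation));
    }

    // Its gradient contribution: d/dL (lambda * ||L||_F^2) = 2 * lambda * L.
    arma::mat RegularizerGradient(const arma::mat& transformation,
                                  const double lambda)
    {
      return 2.0 * lambda * transformation;
    }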
< rcurtin>
in any case, let me reproduce it and see
< manish7294>
sure, I shall be doing the timings then.
< rcurtin>
yeah; if we have the computing neighbors down to 4 minutes with covertype and SGD, this is definitely a great start, and if we can reduce the other part of the computation similarly I think the implementation will be fast
< rcurtin>
still a little work to do for L-BFGS, but I think there are still lots of ideas we can try
< rcurtin>
I guess we should benchmark with other implementations at some point, but at the very least that MATLAB implementation will never work with covertype... since it builds the entire n x n distance matrix, we could only compare against a dataset of roughly 6-8k points or less
< manish7294>
Right, and I think shogun's is roughly based on the same
< manish7294>
Ahh! grammatical mistake :)
< rcurtin>
huh, I'm not sure I noticed any grammatical mistake
< manish7294>
shogun's is
< rcurtin>
hmm, I guess technically the implication is that you mean "shogun's implementation"
< manish7294>
right :)
< rcurtin>
which would work as "shogun's implementation is". I guess I am not sure whether leaving the implementation out makes it grammatically incorrect
< rcurtin>
it seems like a pretty small issue either way :)
< manish7294>
leaving the implementation out will be like "shogun is is" or "shogun has is" :)
< rcurtin>
I guess we could go to the grammar stack exchange
< rcurtin>
but there sure is a lot of pedantry in that forum :)
< manish7294>
let's add this to the legendary list of issues to deal with for now, haha :)
< rcurtin>
hah, sounds good :)
< rcurtin>
if you like, maybe it might be worthwhile to add a checklist to the LMNN PR of issues to look into, but that is up to you
< rcurtin>
but it might be useful to have some way to track the multiple threads of discussion
< manish7294>
haha
< manish7294>
maybe a comment at the end with some eye-catching material will do
< rcurtin>
that works also, however you want to do it. really in the end all I'm looking for is that we can do LMNN on pretty large datasets and it works reasonably well
< rcurtin>
if all the implementations are built like the MATLAB one, then yours will already be able to scale much further than anything else, but I think we can make it faster still :)
< rcurtin>
I think we are still roughly on track with your timeline, you had written that you planned to be fully done with LMNN by 6/15
< rcurtin>
actually, I guess you have time for LMNN and benchmarking or writing the lmnn_main until July 1st
< manish7294>
The deadline is closing in; I will need to hurry on the optimization part
< rcurtin>
I do think that all the accelerations we do for LMNN will apply to BoostMetric also, which will be nice
< manish7294>
Yes, doing the boostmetric part will be a lot easier :)
< manish7294>
thanks to all LMNN efforts
manish7294 has quit [Quit: Page closed]
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
ImQ009 has joined #mlpack
ImQ009 has quit [Quit: Leaving]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#5029 (master - 4c008c4 : Ryan Curtin): The build passed.
< rcurtin>
manish7294: I had time to do some quick experimentation. I built mlpack_lmnn and inserted a number of timers, then ran with SGD with a learning rate of 1e-8 and a batch size of 512 on a subset of 5k points from the covertype dataset
< rcurtin>
this gave reasonable results, but I found that the whole run took 21.8 seconds; of this time, 21.46 seconds were spent in the outer lmnn_sgd_optimization timer, and 20.5 of that was spent in Constraints::Impostors()
< rcurtin>
but the KNN search timers ("computing_neighbors" and "tree_building" are the parts timed by KNN) only took 3.3 seconds and 0.13 seconds, respectively
< rcurtin>
so it seems like there must be some big inefficiency in the other parts of Impostors()
< rcurtin>
manish7294: I suspect the inefficiency is in the fact that arma::find() and arma::unique() are being called every time Impostors() is called
< rcurtin>
I think you can accelerate things by caching those calculations at the start of the optimization
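(A sketch of that caching, with hypothetical names: compute the per-class index lists once, up front, instead of calling arma::unique() and arma::find() inside every Impostors() call.)

    // Done once at the start of the optimization:
    arma::Row<size_t> uniqueLabels = arma::unique(labels);
    std::vector<arma::uvec> indicesByClass(uniqueLabels.n_elem);
    for (size_t c = 0; c < uniqueLabels.n_elem; ++c)
      indicesByClass[c] = arma::find(labels == uniqueLabels[c]);

    // Each later Impostors() call can then reuse indicesByClass[c]
    // directly instead of recomputing it.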
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#5030 (master - 6a59dd5 : Ryan Curtin): The build passed.