naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
Anand has joined #mlpack
Anand has quit [Quit: Page closed]
witness___ has joined #mlpack
govg has quit [Read error: Connection reset by peer]
witness___ has quit [Quit: Connection closed for inactivity]
Anand has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
sumedh_ has joined #mlpack
Gys has joined #mlpack
< Gys>
Hello
< Gys>
Can someone help me with the k-means?
< sumedh_>
Gys: hello Gys, what problem are you facing??
< Gys>
hi sumedh_ , thanks for replying
< Gys>
I would like to know if it's possible to do a "weighted k-means"
< Gys>
I mean, is there a way to calculate clusters by considering a weight for each point?
< Gys>
I thought I could simulate this weight by putting several points at the same location. But I would like to be sure that all points at the same location, which together define a weighted point, won't be split into 2 or more clusters...
govg has joined #mlpack
< sumedh_>
Gys: I think the simulation idea would work... points at the same location cannot be split into 2 different clusters... As I haven't implemented k-means in MLPACK... naywhayare will be able to help you out...
< sumedh_>
you can test the simulation by creating a small test case...
< sumedh_>
and check if the weighted point is getting split or not...
< Gys>
Yes, I will. Thanks for your reply!
govg has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
Anand has joined #mlpack
< Anand>
Marcus: It seems that a negative index got passed in the mse unit test which caused the build failure. I have taken care of that case now. It should not give an error. If it still does, then the problem is somewhere else.
< Anand>
Also, let me know your views about weka logistic predictions.
< Anand>
I am starting with linear regression soon.
govg has quit [Ping timeout: 240 seconds]
< naywhayare>
Gys: unfortunately, mlpack doesn't implement weighted k-means, but it would probably be easy to modify the code to do so
Anand has quit [Ping timeout: 246 seconds]
< Gys>
hi naywhayare, I'm diving into the src code :)
< naywhayare>
:)
< naywhayare>
what's implemented there is the standard k-means algorithm, so it should be pretty easy to deal with. if the FastCluster() function is still there (I think I removed it) you can just ignore it; the Cluster() function is the one of interest
< Gys>
ok thanks
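(A minimal sketch of the point-duplication workaround discussed above, assuming mlpack's KMeans<>::Cluster(data, clusters, assignments) interface; the toy data, weights, and cluster count here are made up for illustration:)

    #include <mlpack/core.hpp>
    #include <mlpack/methods/kmeans/kmeans.hpp>
    #include <vector>

    int main()
    {
      // Toy 2-dimensional data: one point per column, with a weight for each.
      arma::mat points = arma::randu<arma::mat>(2, 100);
      arma::vec weights = 1.0 + 4.0 * arma::randu<arma::vec>(100); // in [1, 5)

      // Simulate weights by repeating each column round(weight) times.
      std::vector<size_t> copies(points.n_cols);
      size_t total = 0;
      for (size_t i = 0; i < points.n_cols; ++i)
        total += (copies[i] = (size_t) (weights[i] + 0.5));

      arma::mat expanded(points.n_rows, total);
      for (size_t i = 0, col = 0; i < points.n_cols; ++i)
        for (size_t c = 0; c < copies[i]; ++c)
          expanded.col(col++) = points.col(i);

      // Exact copies of a point always get the same assignment, so a
      // "weighted point" cannot be split across clusters.
      arma::Col<size_t> assignments;
      mlpack::kmeans::KMeans<> k;
      k.Cluster(expanded, 3, assignments);

      return 0;
    }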
govg has joined #mlpack
< jenkins-mlpack>
Starting build #1994 for job mlpack - svn checkin test (previous build: SUCCESS)
govg has quit [Read error: Connection reset by peer]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: L_BFGS works fine for Reg SVD
oldbeardo has quit [Changing host]
oldbeardo has joined #mlpack
< naywhayare>
oldbeardo: ok, cool. does it run much faster than the mlpack implementation of SGD?
< oldbeardo>
naywhayare: it runs quite a bit faster
< oldbeardo>
I didn't mention 1.49 because the build doesn't work
< oldbeardo>
on my system that is
< naywhayare>
oh? can you explain more (about the build not working)?
< oldbeardo>
well, it says Boost 1.49 package not found
< naywhayare>
not "Detected version of Boost is too old. Requested version was 1.49 (or newer)." ?
< oldbeardo>
no, I installed 1.49
< naywhayare>
so it finds the package without the change I made (adding 1.49 to find_package(Boost ...)), but when you add 1.49 to the find_package call, it then fails?
< oldbeardo>
yes
< naywhayare>
where did you install boost?
< naywhayare>
or, how? through the package manager?
< oldbeardo>
/usr/local/lib
< naywhayare>
so, what happens if you call 'cmake -DBOOST_ROOT=/usr/local/ ../' ?
< oldbeardo>
well I didn't try that, I simply removed it
< naywhayare>
ok, can you go ahead and try that? maybe cmake is not noticing the newer version of boost in /usr/local/ and only finding an older version in /usr, and then giving up
< naywhayare>
we definitely need the '1.49' there, though, since the cosine tree code won't compile without boost.heap
< oldbeardo>
I will try it tomorrow, I'm working on Reg SVD for the time being
andrewmw94 has joined #mlpack
< sumedh_>
naywhayare: can we talk about that incremental learning issue right now??
< naywhayare>
sumedh_: can it wait please? I have not had a chance to look into it yet, and am busy with a few other things... the day has just started for me...
< naywhayare>
maybe a couple of hours?
< sumedh_>
sure... no problem :)
< naywhayare>
ok, thanks
< naywhayare>
oldbeardo: so you said that regularized svd with L-BFGS runs much more quickly than the mlpack SGD implementation... but how does it compare to the SGD implementation you wrote by hand?
< oldbeardo>
naywhayare: nice catch, I forgot to compare those two, I will do it today, also what about the unit tests?
< jenkins-mlpack>
Ryan Curtin: Bump minimum required Boost version to 1.49 for boost.heap.
< jenkins-mlpack>
Starting build #1995 for job mlpack - svn checkin test (previous build: SUCCESS)
< naywhayare>
oldbeardo: so for unit tests, what we ought to do is have a few tests that make sure Evaluate() and Gradient() are working correctly
< naywhayare>
usually these are easiest to write with simple synthetic dataset
< naywhayare>
datasets*
< naywhayare>
and then you can calculate the result by hand and hard-code it into the test (just double-check your calculation first ;))
< naywhayare>
then we should also make sure that we can actually perform regularized SVD using some mlpack optimizers on some simple datasets that won't take too long to run
< oldbeardo>
okay, should I remove the Evaluate() and Gradient() functions written for SGD?
< naywhayare>
let's leave them there for now, and maybe we can think of a good way to make them faster
< naywhayare>
representing the gradient as an arma::sp_mat object might be one direction that could provide a solution, but that would require a decent amount of refactoring on the part of the optimizer, and it would need to take some template parameters and so forth...
< oldbeardo>
right, I will leave it as it is for now
< naywhayare>
also, you haven't posted a status update to the blog or mailing list in a while, can you please be sure to do that each week in the future?
< oldbeardo>
sorry about that, I just posted it around half an hour back
< naywhayare>
oh, ok... I think I looked 45 minutes ago, I should have waited a moment, I guess
< jenkins-mlpack>
Ryan Curtin: Refactor RASearch so that it does not accept a leafSize parameter and can build
< jenkins-mlpack>
arbitrary tree types.
< naywhayare>
you could just use 0.5 * bound.Diam(), unless you have a cleverer idea
< naywhayare>
I'm pretty sure that's what the BinarySpaceTree does
< jenkins-mlpack>
Starting build #1997 for job mlpack - svn checkin test (previous build: FAILURE -- last SUCCESS #1995 1 hr 26 min ago)
< andrewmw94>
yeah, I misread the comment. That's what it does too.
< naywhayare>
the cover tree actually calculates the furthest descendant distance, but it only does it because the construction algorithm happens to calculate that anyway, and the node just caches the information
< andrewmw94>
yeah. Should I change the comment on the variable in the BinarySpaceTree?
< andrewmw94>
well, actually I guess it is cached, it just caches the 0.5 * diam rather than the actual distance
< andrewmw94>
so I should probably leave it
< andrewmw94>
but do you mind if I clarify what is cached?
Anand has quit [Ping timeout: 246 seconds]
< naywhayare>
yeah, go for it
< naywhayare>
you don't need to ask before clarifying comments, presuming you don't have doubts that what you wrote isn't correct :)
< andrewmw94>
"don't doubts isn't correct." Let's see. Yes, that is true.
< andrewmw94>
:)
< naywhayare>
yeah, the double negative was a little bit odd, but if I wrote "presuming you have no doubts that what you wrote is correct", that would imply a level of certainty that's higher than necessary
< naywhayare>
but maybe the double negative ends up having the same meaning? I'm not sure. I can debug template errors but not English errors
< andrewmw94>
yeah. I find it amusing how programmers use English in a different way than most people. I frequently nest parentheses for example.
< andrewmw94>
and or is always taken in the inclusive sense
< naywhayare>
nested parentheses are incredibly useful for clarifying potentially ambiguous clauses (like nested boolean operators)
< andrewmw94>
yeah, but English teachers don't seem to like them. I have the traversal working; at least, the output matches that of the other traversers for the first few lines
< naywhayare>
do you mean the output of the algorithm?
< andrewmw94>
yes, the neighbors.csv file
< andrewmw94>
I'll write a test case for it to automate it
< naywhayare>
ok. if it's not the same for some lines, I would try to debug with the smallest dataset you can reproduce the error with
< naywhayare>
so the allknn_test.cpp file has a dataset used for the ExhaustiveSyntheticTest that may be useful here
< naywhayare>
it's 13 points, I think
< naywhayare>
so you can trace the entire traversal through the 13-point tree and make sure no invalid prunes are happening
< andrewmw94>
yeah. I meant it's the same for all of the lines I checked, which happened to be the first few
< naywhayare>
ah, ok
< naywhayare>
you can just diff neighbors_rtree.csv neighbors_kdtree.csv, that's what I usually do
< naywhayare>
and distances_rtree.csv distances_kdtree.csv
< naywhayare>
if you end up having to debug the traversal, which you probably will sooner or later, ideas include commenting out all of the 'return DBL_MAX' in the rules class, so that nothing is pruned
< naywhayare>
and if it still gives the wrong result with nothing pruned, the traversal must not be reaching all of the nodes, or must be reaching some nodes twice, or something like that
< andrewmw94>
yeah. Diff says they're the same
< naywhayare>
cool; I'd test for a handful more datasets (the larger the better... my usual arsenal includes corel.csv, covertype.csv, and LCDM_q.csv/LCDM_r.csv; if you need those files, I can make them available)
< andrewmw94>
That would be great. I had trouble getting most of the datasets last time.
< jenkins-mlpack>
* Ryan Curtin: Minor formatting changes according to the style guide (mostly, I think?).
< jenkins-mlpack>
* Ryan Curtin: First pass -- move files to match naming policy, change initialize() to
< jenkins-mlpack>
Initialize(), standardize comment formatting, fix some Doxygen commands. No
< jenkins-mlpack>
serious functionality changes.
< jenkins-mlpack>
* Ryan Curtin: Fix constructor calls, and automatically construct a cover tree with the default
< jenkins-mlpack>
RASearch constructor.
sumedh_ has quit [Ping timeout: 272 seconds]
< naywhayare>
sumedh__: ok, let's talk about incremental SVD, whenever you're ready. sorry for the delay
< sumedh__>
naywhayare: I will be taking dinner in 15 min :( but okay I can at least tell you the problem...
< sumedh__>
before that...
< naywhayare>
sure, that sounds good. that'll give me a little time to think
< naywhayare>
I'm assuming you are implementing Algorithm 3 in the paper "A Guide to Singular Value Decomposition for Collaborative Filtering"
< sumedh__>
yes... 3 and 4... they are almost similar...
< sumedh__>
sorry... 2 and 3...
< sumedh__>
1 and 4 are done...
< naywhayare>
yeah, ok, I see
< sumedh__>
if you see algorithm 2... the WUpdate and HUpdate are linked with each other...
< sumedh__>
In that we have to call WUpdate and HUpdate for each individual user...
< naywhayare>
I understand the WUpdate and HUpdate functions to be steps 2(a)iii and 2(a)iv
< sumedh__>
yes... but that step just updates the ith row or jth column of the respective matrix
< sumedh__>
not the entire one...
< sumedh__>
after updating both of these vectors... we move on to the next user...
< sumedh__>
our current abstraction is incapable of handling this scenario...
< naywhayare>
my first thought is that you can implement WUpdate() and HUpdate() to only work on one user
< naywhayare>
and each time you call WUpdate you increment the index of the user that is being used
< sumedh__>
yes... that is another solution...
< naywhayare>
do you think that would work?
< sumedh__>
but combining both of these functions would give a better solution...
< naywhayare>
ok, so the choices seem to be that we can keep them separate and use the AMF abstraction, or we can create a new class that implements those in a different way
< sumedh__>
I will just take dinner and come ...
< naywhayare>
ok, sounds good
< naywhayare>
let me know when you're back, and we can continue
< naywhayare>
I'll try and think of any other possible solutions
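(A rough outline of the one-user-per-call idea suggested above; the class name and members are placeholders, the assumption that columns of V correspond to users is mine, and the actual gradient expressions from the paper are omitted:)

    #include <mlpack/core.hpp>

    class IncrementalSVDUpdateRules  // illustrative name only
    {
     public:
      IncrementalSVDUpdateRules() : currentUser(0) { }

      template<typename MatType>
      void WUpdate(const MatType& V, arma::mat& W, const arma::mat& H)
      {
        // Update only the part of W belonging to 'currentUser', using that
        // user's ratings (step 2(a)iii of the paper; expression omitted).
      }

      template<typename MatType>
      void HUpdate(const MatType& V, const arma::mat& W, arma::mat& H)
      {
        // Update only the part of H touched by 'currentUser' (step 2(a)iv),
        // then advance to the next user for the following pair of calls.
        currentUser = (currentUser + 1) % V.n_cols;
      }

     private:
      size_t currentUser;
    };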
< sumedh__>
naywhayare: okay I am back...
< sumedh__>
naywhayare: if I am using an iterator... and I say it++ ... which direction is given priority?? row or column??
< naywhayare>
it will increment the row first, then the column
< naywhayare>
so you'll access all the elements in one column, then move to the next column
< naywhayare>
if you need the other way you can use row_iterator, but row_iterator can be very slow...
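(A tiny example of the column-major behaviour described above: `it++` walks down column 0 before moving to column 1, so this prints 1 2 3 4:)

    #include <iostream>
    #include <armadillo>

    int main()
    {
      arma::mat A("1 3; 2 4");  // row 0 is [1 3], row 1 is [2 4]

      for (arma::mat::const_iterator it = A.begin(); it != A.end(); ++it)
        std::cout << *it << " ";  // column-major order: 1 2 3 4
      std::cout << std::endl;

      return 0;
    }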
< naywhayare>
about the incremental svd... I can't think of anything better. you could modify the termination policy so it only checks for convergence every time all users have been looped over
< naywhayare>
but maybe this doesn't fit into the AMF abstraction very well
< naywhayare>
what do you think? do you think writing another class for incremental SVD is the right option?
< sumedh__>
ohh... I was thinking about how we can make it work with the current abstraction... we have to use an iterator... or sp_mat operations will be slow...
< sumedh__>
changing termination policy is not a good option I agree...
< sumedh__>
so we are left with template specialization... and combining both update functions...
< sumedh__>
or... use some kind of dynamic checking ....
< naywhayare>
I think it's possible to use the current abstraction with the idea I suggested -- increment the user index each time WUpdate() is called and then perform HUpdate() for that user
< naywhayare>
then only check for termination every time the user index is 0
< sumedh__>
we can keep both options open... either implement a single function for both updates or implement 2 separate functions...
< naywhayare>
so I would say if you would rather implement just a single function, we should write a separate class because that doesn't fit in the abstraction very well
< naywhayare>
but with 2 separate functions and a custom termination policy, I think the abstraction could work
< naywhayare>
in my view, either of these options work, with the 2 functions idea being nice because you get some code reuse
< naywhayare>
what do you think?
< sumedh__>
okay... according to the current abstraction, the step function increments the iteration...
< sumedh__>
okay I got it...
< sumedh__>
let's pass the step function a boolean...
< sumedh__>
depending on that boolean step will increase the iteration count...
< sumedh__>
but this boolean has to be generated by update function somehow...
< naywhayare>
try this -- have the TerminationPolicy class be owned by (or the same as) the update rules class
< naywhayare>
then you can hold the iteration index internally in the TerminationPolicy class and access it in the update rules class
< naywhayare>
or something like that. maybe that is helpful?
< sumedh__>
what if we force the update function to return a variable which will be passed to the step function??
< naywhayare>
then we have to modify the AMF abstraction; I'd prefer not to do that, if possible
< sumedh__>
no... we don't have to do that... right now we have ...
< sumedh__>
update.WUpdate(V, W, H);
< sumedh__>
update.HUpdate(V, W, H);
< sumedh__>
t_policy.Step(W, H);
< sumedh__>
now it will be
< sumedh__>
bool w_res = update.WUpdate(...
< sumedh__>
bool h_res = update.HUpdate(...
< sumedh__>
t_policy.Step(W, H, w_res && h_res)
witness___ has joined #mlpack
< sumedh__>
the problem with the other approach is... how can we make the update class own the termination policy class... if we make it a template parameter... it will be difficult to initialize the update object...
< sumedh__>
keeping update rule and termination policy separate is a better design...
< sumedh__>
what about this approach...
< sumedh__>
do
< sumedh__>
{
< sumedh__>
update.WUpdate(..)
< sumedh__>
update.HUpdate(..)
< sumedh__>
} while(update.ISUpdated())
< sumedh__>
t_policy.step()
< naywhayare>
I'm reluctant to make the abstraction more complex, because it means that a user who just wants to try something simple has to do more work and understand a more complex abstraction
< naywhayare>
you could make the UpdateRules policy take a reference to the TerminationPolicy as part of the constructor
< naywhayare>
and the TerminationPolicy holds the iteration number
< sumedh__>
but then update_rule has to be templatized with termination_policy... it's better to keep these two policies separate I think... there has to be a better way..
< naywhayare>
another idea is to have the update rule take a 'const double&' in its constructor, which will represent the iteration number
< naywhayare>
either that or just have the update rule count iterations on its own and also have the termination policy count iterations on its own
< sumedh__>
if we make the update rule own the termination policy then there is no need for a termination policy in AMF... AMF will only interact with the update rule...
< sumedh__>
humm.. you are right...
< sumedh__>
that is a good option...
< sumedh__>
make termination policy part of update rule rather than AMF...
< sumedh__>
naywhayare: what do you say??
< naywhayare>
no, my idea was that the termination rule and update rule can just count the iteration separately
< naywhayare>
I don't think combining the termination policy and update rule is the best idea, because the two things are fairly orthogonal (in most cases, with this case being an exception)
< sumedh__>
yes... keeping them separate is the best choice... but if they count the iteration separately, how do we force termination??
< naywhayare>
I don't know what you mean; the termination should happen when the validation rmse increases
< naywhayare>
but you only want to check the validation rmse when (iteration % numUsers) == 0
< naywhayare>
so you just write some code to do that in a TerminationPolicy class and I think that should work; what do you think?
< sumedh__>
yes.. that's right... but what about other update rules... when they are using termination policies this additional thing has to be turned off... I am confused about that...
< sumedh__>
do we pass numUsers in termination policy??
< naywhayare>
the constructor of the termination policy can take the number of users as input
< naywhayare>
you can write a special termination policy for these algorithms, 'IncrementalValidationRMSETermination'
< naywhayare>
and we'll only use this for Algorithms 2 and 3 (the incremental SVD algorithms)
< naywhayare>
so a user _could_ use IncrementalValidationRMSETermination with some other set of update rules, but it's meant to be used with IncrementalSVDUpdateRules (or whatever you name the class)
< naywhayare>
does that make sense? or maybe I have overlooked something?
< sumedh__>
won't this create more confusion?? :(
< naywhayare>
I don't think so; we're not setting IncrementalValidationRMSETermination to be the default termination policy or anything
< naywhayare>
it's just one of several options users have (including writing their own), and we can write in the documentation that this termination policy is meant to be used with the incremental SVD rules and will probably act weirdly if used with something else
< sumedh__>
yes... that's the issue... the user has to use that termination policy with SVDIncrementalLearning... but yes... the user has to read the documentation...
< naywhayare>
a user could use another termination policy, but it might be slower
< naywhayare>
it'll still work, though
< naywhayare>
it'll just check for convergence at every user
< sumedh__>
and might not converge... because of maxIterations...
< sumedh__>
if there are 6000 users... like in the case of MovieLens...
< naywhayare>
yeah; but in that case, the user should have read the documentation, or just increased the number of iterations
< naywhayare>
I would like to provide some nice typedefs so that users who aren't playing with new ideas can just have some convenient typedefs
< naywhayare>
i.e. 'typedef AMF<> NMF'
< naywhayare>
(because AMF with default parameters is NMF)
< naywhayare>
and that should help clarify things, too, in addition to documentation
< sumedh__>
yes.. that's a very good idea...
< sumedh__>
that will definitely clarify things while reading the code..
< sumedh__>
and most people do read the header file to see the abstraction...
< naywhayare>
yeah; and we can make the abstraction clear in a tutorial, too, which is probably what most users read first
< naywhayare>
I think many users are just thinking "I want to run NMF/SVD/incremental SVD/batch SVD and I don't care how, I just want the right commands to type"
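(A sketch of how such a convenience typedef would read from the user's side; the amf header path and the Apply() call are assumptions based on the interface being discussed, not a confirmed final API:)

    #include <mlpack/core.hpp>
    #include <mlpack/methods/amf/amf.hpp>

    using namespace mlpack::amf;

    // With something like this typedef in place, a user who just wants plain
    // NMF never has to see the policy template parameters.
    typedef AMF<> NMF;

    int main()
    {
      arma::mat V = arma::randu<arma::mat>(20, 30);  // toy data
      arma::mat W, H;

      NMF nmf;
      nmf.Apply(V, 5, W, H);  // rank-5 factorization

      return 0;
    }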
< sumedh__>
the problem still persists for algorithm 3... :(
< naywhayare>
which problem? how?
< sumedh__>
in algorithm 3... we have to change the parameter from numUsers to the total number of non-zero entries...
< naywhayare>
okay... then you can write a very slightly different validation RMSE termination class
< sumedh__>
but the user will have to compute the number of non-zero entries in the matrix...
< sumedh__>
rather than that... let's pass the matrix...
< naywhayare>
that works too
< sumedh__>
but then we have to write 2 different sets of termination rules... one for algorithm 2 and one for algorithm 3...
< sumedh__>
because in algorithm 2 we have to count the number of users...
< sumedh__>
and in algorithm 3 we have to count the number of non-zero entries...
< naywhayare>
you can structure those two classes in such a way as to avoid duplicating code
< naywhayare>
they can hold a ValidationRMSETermination object inside of them and forward the calls when (iteration % numUsers) or (iteration % n_nonzero) is 0
< naywhayare>
and you can use a boolean template parameter to indicate whether or not the number of users or the number of nonzero entries should be used
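(A rough sketch of the wrapping idea described above, assuming a termination policy interface of Initialize(V) / IsConverged(W, H); the class name and the choice of V.n_cols as the sweep length are placeholders:)

    #include <mlpack/core.hpp>

    // Only consult the wrapped policy once per full sweep over the users.
    // For the "complete" incremental variant (algorithm 3), the sweep length
    // would be the number of non-zero entries instead.
    template<class WrappedPolicy>
    class IncrementalTermination
    {
     public:
      IncrementalTermination(WrappedPolicy wrapped = WrappedPolicy())
        : wrapped(wrapped), sweepLength(1), iteration(0) { }

      template<class MatType>
      void Initialize(const MatType& V)
      {
        wrapped.Initialize(V);
        sweepLength = V.n_cols;  // e.g. number of users
      }

      bool IsConverged(arma::mat& W, arma::mat& H)
      {
        // Forward to the wrapped policy only when a full sweep has finished.
        if ((++iteration % sweepLength) == 0)
          return wrapped.IsConverged(W, H);
        return false;
      }

     private:
      WrappedPolicy wrapped;
      size_t sweepLength;
      size_t iteration;
    };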
< sumedh__>
I think dynamic checking is a better way... one could implement just one update function... or separate functions for the W and H updates...
< sumedh__>
we can check it with templates...
< sumedh__>
like the implementation in PrefixedOutStream...
< sumedh__>
where we check if we have a ToString function...
< naywhayare>
that's a really ugly implementation and I'd prefer to avoid reusing that code anywhere else
< naywhayare>
honestly I'd prefer to completely get rid of it because it's so ugly
< naywhayare>
but it's the only way I know of to provide operator<< automatically...
< naywhayare>
what's wrong with something like this? (below)
< naywhayare>
class IncrementalValidationRMSETermination {
< naywhayare>
oops, crap
< naywhayare>
hang on, I'll do this in pastebin, it'll be easier
< sumedh__>
there will be too many TerminationPolicies... and the user may get confused... :( I am only concerned about that...
< naywhayare>
pastebin is overloaded, but when it works, I just write into the window and hit 'post', then give a link to the page it produces
< naywhayare>
so I ended up just putting it on my website, and I gave the link above
< naywhayare>
if each termination policy is documented adequately, I don't think it's a problem to have very many. if the user doesn't know what to do, then they can just use the default
oldbeardo has joined #mlpack
< sumedh__>
ohh... and we don't have to pass anything new in the constructor... the Initialize function in the termination policy takes the V matrix...
< sumedh__>
we can initialize our increment index there...
< andrewmw94>
naywhayare: I'm getting some puzzling results from running the r_tree and the dual_tree algorithms against the corel.csv data
< andrewmw94>
The diff of the distances file only returns one line where they differ, but the diff of the neighbors file returns many
< naywhayare>
so it is possible that two points are equidistant from a particular query point
< naywhayare>
which would explain why the distance results are the same but the neighbor indices are not
< naywhayare>
I didn't think that happened with the corel dataset, but if the distances are right and the neighbors are wrong, it's probably just mis-ordering
< naywhayare>
and you can see that in the first diff section of neighbors, it's just a change in ordering: 2727,35448,36680 vs. 2727,36680,35448
< andrewmw94>
yeah, that's true for most of them. But how would the tree cause that? I thought the BaseCase took care of sorting?
< naywhayare>
but that one different distance in the distances file indicates a bug to me... which was distances_out.csv? the kd-tree or r-tree?
< naywhayare>
BaseCase sorts by distance, but doesn't make any guarantees about identical distances
< naywhayare>
if the distances file is the same but the neighbors is different, then I would still classify the implementation as correct
< naywhayare>
but the neighbors file is a useful place to start when trying to figure out why a particular point gets pruned...
< andrewmw94>
distances_out.csv is the r_tree, distances_out1.csv was the kd-tree
< naywhayare>
that implies a bug in the kd-tree... can you test against allknn run with the -N option (naive)?
< andrewmw94>
let me make sure I remember correctly first.
< andrewmw94>
yeah, that was right
< andrewmw94>
ok, I'll try naive
< naywhayare>
fascinating. let's see what the naive approach gives...
< naywhayare>
if the r-tree results agree with the naive approach, but the kd-tree results don't, then it seems like we have uncovered a kd-tree bug (or... something like that)
< andrewmw94>
once in 37,000 points. That'll be fun to fix...
< naywhayare>
well, it takes a bit of tracing, but the general workflow will look like this:
< naywhayare>
determine the sequence of nodes leading to the point that is getting incorrectly pruned
< naywhayare>
set breakpoints in gdb on Score(size_t, TreeType&) for each of these nodes
< naywhayare>
then figure out which one is getting incorrectly pruned
< naywhayare>
then figure out why that's happening (this is the hardest part...)
< andrewmw94>
ok, so the cover tree says the kd-tree is wrong in the same way. I'll do the naive now
< naywhayare>
if it's a kd-tree bug, I'll dig in and try to figure out what's wrong
< naywhayare>
since I wrote that code
< andrewmw94>
ok. I don't think I mentioned that it was the dual-
< andrewmw94>
tree traverser. I haven't tried --single yet
< oldbeardo>
naywhayare: hey, I tried SGD vs L_BFGS
< oldbeardo>
naywhayare: SGD performs better, taking 3.26 secs as opposed to the 4.50 secs taken by L_BFGS
< naywhayare>
oldbeardo: this is your implementation of SGD, right?
< oldbeardo>
naywhayare: this is the time taken for 10 iterations on the GroupLens dataset
< oldbeardo>
naywhayare: yes, that's right
< naywhayare>
how many iterations does it take to converge?
< oldbeardo>
naywhayare: well, somewhere between 50 and 100 iterations
< oldbeardo>
the relative error converges to 0.19
< oldbeardo>
SGD also performs better in that sense, after 10 iterations the relative error for SGD is 0.23 whereas for L_BFGS it is 0.25
< naywhayare>
okay, I figured SGD would converge better
< naywhayare>
also, as a side note, the svn authentication file has just been completely nuked for a reason I don't understand. a backup is on the way, but until then, svn commits will fail...
< naywhayare>
oldbeardo: I see a couple of options for how we can accelerate the mlpack SGD+regularized SVD implementation
< oldbeardo>
okay, thanks for that
< naywhayare>
the first I already mentioned -- use arma::sp_mat for the gradient calculation
< naywhayare>
another option is to write a template specialization for SGD<RegularizedSVDFunction>::Optimize()
< oldbeardo>
naywhayare: I ran a test with 100 iterations, SGD is still more time efficient, but L_BFGS gives a lower relative error
< oldbeardo>
naywhayare: I guess that makes sense, since SGD is more of a quick approximation
< naywhayare>
yeah, if you used a specialization you could avoid the expensive calls to mat.zeros() for the gradient
< naywhayare>
and just reset the relevant columns and rows to 0
< naywhayare>
a sparse matrix would also be applicable, and the .clear() operation would reset it to 0
< naywhayare>
you would need to refactor SGD so that it could accept mat or sp_mat from the Gradient() call (using templates, probably)
< naywhayare>
andrewmw94: I can reproduce the error for kd-trees with the corel dataset; lots of errors compared to the naive implementation or the cover tree implementation
< naywhayare>
I'll take a look into it, but probably not today
< naywhayare>
you don't need to worry about it though, other than just noting that comparing with the kd-tree results isn't a great idea at the moment :)
< oldbeardo>
naywhayare: what should I do then? I don't really understand the improvements you are suggesting
< naywhayare>
oldbeardo: I suggested two ways in which you can accelerate the SGD optimization, since that is the main problem you are facing with your implementation
< naywhayare>
the first is to use a template specialization for SGD<RegularizedSVDFunction>::Optimize(); if you do that, you can write custom code that calculates the gradient without calling gradient.zeros(), which is probably more like the implementation you wrote by hand
< naywhayare>
the other option is to refactor RegularizedSVDFunction::Gradient so that it takes an arma::sp_mat& as the gradient and not an arma::mat&
< naywhayare>
because the gradient for an individual user is going to be sparse, this might be a good choice
< naywhayare>
is that a decent clarification? if not, how can I help clarify further?
< oldbeardo>
yes, I understood, I'm just not sure how a template specialization might be written
< oldbeardo>
though it sounds like this may be an easier option than using sp_mat
< naywhayare>
take a look in src/mlpack/core/optimizers/lrsdp/lrsdp_function.cpp, down at the bottom
< naywhayare>
there are two template specializations there
< oldbeardo>
okay, which one of the two would you prefer?
< naywhayare>
I think either is fine; it's up to you. I think you're right that the template specialization is easier, so if I was in your position I'd probably choose to do that
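(The rough shape such a specialization would take, mirroring the ones at the bottom of lrsdp_function.cpp; the SGD::Optimize() signature, the 'function' member, and the namespaces are assumptions here, includes are omitted, and the body is only an outline:)

    namespace mlpack {
    namespace optimization {

    // Specialized Optimize() for the regularized SVD objective: instead of
    // zeroing a full dense gradient at every step, touch only the parameters
    // of the user and item involved in the sampled rating.
    template<>
    double SGD<svd::RegularizedSVDFunction>::Optimize(arma::mat& parameters)
    {
      // ... per-rating updates of the relevant rows/columns of 'parameters' ...
      return function.Evaluate(parameters);
    }

    } // namespace optimization
    } // namespace mlpack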
< oldbeardo>
okay, I will give that a try tomorrow, is there anything else that I may be forgetting?
< naywhayare>
about regularized SVD? not that I can think of
< naywhayare>
have you thought about how to integrate QUIC_SVD into the CF class?
< oldbeardo>
I think we had this discussion in the past week
< naywhayare>
we had a discussion on how well QUIC-SVD works for collaborative filtering, but I don't think we can conclude that it works poorly for all datasets
< naywhayare>
therefore, we should make sure that CF<QUIC_SVD> is possible (or something like that) and uses QUIC-SVD for matrix decomposition
< oldbeardo>
I think we can conclude that, GroupLens is a fairly small dataset, the performance will once be worse for bigger datasets
< oldbeardo>
*only be worse
< oldbeardo>
or maybe I'm mistaken, in what case will it work well?
< naywhayare>
my intuition suggests that QUIC-SVD will work better for denser datasets
< naywhayare>
but I'm not certain. either way, if a user could quickly do collaborative filtering with QUIC-SVD, we could easily determine where it performs well and where it does not :)
< oldbeardo>
okay, now for the technical difficulties
< oldbeardo>
QUIC-SVD gives u, v and sigma, how should we handle that?
< naywhayare>
please develop a couple possible ideas for how this can be handled, and then we can talk about which one is best for mlpack
< oldbeardo>
okay, can you give me pointers to resources which deal with problems like these?
< naywhayare>
I am not certain of what resources to point you towards; the SVD itself decomposes a matrix X into U*S*V^T, and CF does a similar decomposition
< naywhayare>
many of the papers you referenced in your proposal or in various IRC conversations should have sufficient detail of how to put the two pieces together, I think
< naywhayare>
cool, it just wraps any other termination policy?
< naywhayare>
that sounds like a good solution
< sumedh__>
yup ... any termination policy :)
< sumedh__>
tried them all...
< naywhayare>
we just need to be sure that IncrementalTermination is documented so that people know to use it with SVDIncrementalLearning
< sumedh__>
okay, I have changed its name to IncompleteIncrementalLearning... it's actually algorithm 2 of the paper...
< sumedh__>
yes... that's very important in this case...
< sumedh__>
I am right now trying to speed up incremental learning for sparse matrices... I guess I have to use row_iterator... it must be faster than addressing each element... right??
< andrewmw94>
naywhayare: just in case you didn't find out already, it seems that the Binary Space Tree traverser works correctly when in single mode; it's a bug in the dual-tree traverser.
< sumedh__>
naywhayare: oops...
< sumedh__>
the paper uses SVDUSER... but we pass the V matrix as m * n... I need to change the implementation...
< jenkins-mlpack>
Starting build #1998 for job mlpack - svn checkin test (previous build: FIXED)
< sumedh__>
naywhayare: there is no sp_mat column iterator :(
< naywhayare>
andrewmw94: yeah, I had it isolated to the dual-tree case
< naywhayare>
sumedh__: the default sp_mat iterator is the column-major iterator
< sumedh__>
but how do I start from a certain column??
< sumedh__>
or end at a certain column??
< naywhayare>
sumedh__: I think begin_col() and end_col() are what you are looking for
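(For reference, a small example of the column-restricted iteration being discussed, assuming Armadillo's sp_mat begin_col()/end_col() iterate over only the non-zero entries of the given column:)

    #include <iostream>
    #include <armadillo>

    int main()
    {
      arma::sp_mat V(5, 4);
      V(0, 1) = 1.0;
      V(3, 1) = 2.0;
      V(2, 2) = 3.0;

      // Visit only the non-zero entries of column 1.
      for (arma::sp_mat::iterator it = V.begin_col(1); it != V.end_col(1); ++it)
        std::cout << "V(" << it.row() << ", " << it.col() << ") = " << *it
                  << std::endl;

      return 0;
    }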