naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
newbie1010101 has joined #mlpack
andrewmw94 has quit [Quit: Leaving.]
govg has quit [Ping timeout: 240 seconds]
newbie1010101 has quit [Quit: Page closed]
naywhayare has joined #mlpack
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
< jenkins-mlpack> Starting build #2102 for job mlpack - svn checkin test (previous build: SUCCESS)
sumedhghaisas has joined #mlpack
oldbeardo has joined #mlpack
< oldbeardo> sumedhghaisas: hey, Ryan told me to use your svd_wrapper implementation for QUIC-SVD
< oldbeardo> how should I go about doing that?
< sumedhghaisas> ohh okay... thank god I didn't commit the changes I made yesterday... it's simple... I think QUIC_SVD implements ExtractSVD(u, v, sigma), correct??
< sumedhghaisas> oldbeardo: SVDWrapper will call a similar Apply function... So you need to implement a template-specialized function for QUIC_SVD...
< oldbeardo> yes, I'm not sure about the order, but something like that
< oldbeardo> by the way, ExtractSVD is to be used as an internal function
< sumedhghaisas> ohh... So which function returns the SVD factorization??
< oldbeardo> well, right now the constructor itself
< sumedhghaisas> ohh then see the template-specialized function for arma::svd.. there, just replace the arma::svd call with your factorizer function...
< sumedhghaisas> that's it...
< sumedhghaisas> and make sure the order of parameters is correct...
< oldbeardo> where is the template specialized function for arma::svd?
< sumedhghaisas> it will be in svd_wrapper_impl.hpp
< sumedhghaisas> in cf module..
< oldbeardo> okay, don't get me wrong here, but how does this help us?
< oldbeardo> isn't it essentially writing a new function for every factorizer?
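The wrapper idea discussed above can be sketched as follows. This is a minimal sketch, not the code in svd_wrapper_impl.hpp: `Matrix`, `PlainSVD`, `FastSVD`, `RunPlainSVD`, `RunFastSVD` and `Factorize` are all hypothetical stand-ins, and no Armadillo or mlpack call is used. It illustrates the trade-off raised in the question: each factorizer does need its own specialization, but that specialization is only an adapter for the differing call, while code that uses the wrapper is written once.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration; none of these are mlpack or
// Armadillo types.
struct Matrix { int rows; int cols; };

struct PlainSVD {};  // Tag for a backend called as (data, u, sigma, v).
struct FastSVD {};   // Tag for a backend called as (data, u, v, sigma).

// Two made-up backends whose native argument orders differ.
inline std::string RunPlainSVD(const Matrix& data, Matrix& u, Matrix& sigma,
                               Matrix& v)
{
  assert(data.rows > 0 && data.cols > 0);
  u = Matrix{data.rows, data.rows};
  sigma = Matrix{data.rows, data.cols};
  v = Matrix{data.cols, data.cols};
  return "plain";
}

inline std::string RunFastSVD(const Matrix& data, Matrix& u, Matrix& v,
                              Matrix& sigma)
{
  assert(data.rows > 0 && data.cols > 0);
  u = Matrix{data.rows, data.rows};
  v = Matrix{data.cols, data.cols};
  sigma = Matrix{data.rows, data.cols};
  return "fast";
}

template<typename Factorizer>
class SVDWrapper
{
 public:
  // Declared only; each supported factorizer provides a specialization.
  std::string Apply(const Matrix& data, Matrix& u, Matrix& sigma,
                    Matrix& v) const;
};

template<>
inline std::string SVDWrapper<PlainSVD>::Apply(
    const Matrix& data, Matrix& u, Matrix& sigma, Matrix& v) const
{
  return RunPlainSVD(data, u, sigma, v);
}

template<>
inline std::string SVDWrapper<FastSVD>::Apply(
    const Matrix& data, Matrix& u, Matrix& sigma, Matrix& v) const
{
  // Only the call changes; note the different parameter order.
  return RunFastSVD(data, u, v, sigma);
}

// Code that uses the wrapper is written once for every factorizer.
template<typename Factorizer>
std::string Factorize(const Matrix& data, Matrix& u, Matrix& sigma, Matrix& v)
{
  return SVDWrapper<Factorizer>().Apply(data, u, sigma, v);
}
```

With this layout, `Factorize<PlainSVD>(...)` and `Factorize<FastSVD>(...)` look identical at the call site, which is the "one wrapper for all" benefit argued for later in the log.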
sumedhghaisas has quit [Ping timeout: 272 seconds]
oldbeardo has quit [Ping timeout: 246 seconds]
sumedhghaisas has joined #mlpack
oldbeardo has joined #mlpack
jbc_ has joined #mlpack
< sumedhghaisas> naywhayare: hey ryan, you free?
jbc_ has quit [Quit: jbc_]
< naywhayare> sumedhghaisas: yeah, I am here
< sumedhghaisas> now as siddharth is going to use svd wrapper I want to rollback the changes I have done to the module...
< sumedhghaisas> how do I do that
< sumedhghaisas> ?
< naywhayare> what do you mean? which changes do you want to roll back?
< sumedhghaisas> I planned the module such that QUIC_SVD can be added... but then Siddharth said he does not need it... so I removed the template from it and changed it to ArmaSVDwrapper...
< sumedhghaisas> but its not committed yet...
< naywhayare> okay, so do you mean that you want to revert your changes to the current svn trunk?
< naywhayare> you can just use svn revert for that
< naywhayare> unless I am misunderstanding
< sumedhghaisas> but that will revert all the files right??
< sumedhghaisas> I want to revert only those 2 files...
< sumedhghaisas> cause I added other things too...
< naywhayare> ah, then you can just use an argument to svn revert
< naywhayare> svn revert svd_wrapper.hpp
< naywhayare> (or whatever the filename is)
< naywhayare> that will just revert the changes in that file
< oldbeardo> sumedhghaisas: actually I'm not going to use it
< naywhayare> I have to go for a little while...
< oldbeardo> I'm writing an Apply() method for QUIC-SVD
< sumedhghaisas> oldbeardo: I still think we should use svd wrapper... cause it will be better for user to use one wrapper for all...
< sumedhghaisas> oldbeardo: I am looking through the cf code right now... one thing I noticed is that the CleanData function is getting called even though it's sometimes not required... is there any reason for that??
< sumedhghaisas> we can just shift it inside ApplyFactorizer...
< oldbeardo> where's that?
< sumedhghaisas> in both the constructors...
< sumedhghaisas> CleanData is always getting called right??
< sumedhghaisas> but it's not required when the factorizer is regularized_svd ... or this is the impression I got from the code... I may have missed something...
< oldbeardo> yes, it is needed irrespective of the factorizer
< oldbeardo> it is used in GetRecommendations()
< sumedhghaisas> ohh okay...
< oldbeardo> naywhayare: is there a reason why cleanedData is sp_mat? I'm not able to use QUIC_SVD because of that, I will have to change the signature of every internal function because of that
< naywhayare> oldbeardo: cleanedData is sp_mat because in the vast majority of collaborative filtering problems, you have an incomplete dataset
< naywhayare> the whole idea being to predict some of the missing values for a user/item combination or something like that
< naywhayare> the case for an arma::mat cleanedData implies that the entire rating matrix is mostly dense, which isn't really something that happens in practice that often
< sumedhghaisas> naywhayare: average_initialization is not performing well ... its very unstable... :(
< naywhayare> you could probably refactor QUIC-SVD and CosineTree pretty easily to templatize MatType so a user can use arma::mat or arma::sp_mat
< naywhayare> sumedhghaisas: how unstable is it with respect to random initialization?
< sumedhghaisas> very much...
< sumedhghaisas> I don't know why it's happening...
< sumedhghaisas> but maybe the initialization is very high...
< naywhayare> hm, can you describe the results you're getting a little further?
< naywhayare> oldbeardo: I'll go ahead and templatize CosineTree
< sumedhghaisas> one sec... I will just post the residue that I printed after every iteration
< naywhayare> I should have that done by the time I go to bed tonight (which is probably 8 to 10 hours from now)
< sumedhghaisas> 8.44638
< sumedhghaisas> 4.84661
< sumedhghaisas> 1.5715
< sumedhghaisas> 1.58544
< sumedhghaisas> 2.13046
< sumedhghaisas> 1.10905
< sumedhghaisas> 2.62636
< sumedhghaisas> 4.1802
< sumedhghaisas> 2.01445
< sumedhghaisas> 2.88132
< sumedhghaisas> 4.08919
< sumedhghaisas> 1.07427
< sumedhghaisas> 3.68883
< sumedhghaisas> 2.76699
< sumedhghaisas> 1.80398
< sumedhghaisas> ^C
< sumedhghaisas> it basically ends around 3...
< sumedhghaisas> where as random initialization ends around 0.99
< sumedhghaisas> this is on a random sparse matrix of size 100 * 100
< sumedhghaisas> I have tried both ways... first average of V is taken ignoring the zero entries ... then in the second version I considered zero entries.. still same performance...
< naywhayare> are you using the exact same matrix for random initialization?
< naywhayare> sorry if that's an obvious question but I wanted to check before digging in deeper :)
< sumedhghaisas> yes.. checked again.. I have added mlpack::math::RandomSeed(10); line before the factorization...
< sumedhghaisas> I will commit the code .. so that you can take a look...
< naywhayare> okay
sumedhghaisas has quit [Ping timeout: 272 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas> naywhayare: there are 2 constructors for CF right now... they can simply be combined with default parameters... what do you think??
< oldbeardo> naywhayare: I just sent you a mail with the changes for QUIC-SVD
< naywhayare> sumedhghaisas: yeah, if they can be combined with default parameters, we should do that
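Combining two constructors through default parameters could look like the sketch below. `Recommender`, its parameter names and its default values are hypothetical and chosen only for illustration; they are not the actual CF constructor signature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class; not mlpack's CF and not its real parameters.
class Recommender
{
 public:
  // One constructor replaces two overloads: callers may omit trailing
  // arguments and get the defaults.
  explicit Recommender(const std::size_t neighborhoodSize = 5,
                       const std::size_t rank = 2) :
      neighborhoodSize(neighborhoodSize),
      rank(rank)
  {
    assert(neighborhoodSize > 0 && rank > 0);
  }

  std::size_t NeighborhoodSize() const { return neighborhoodSize; }
  std::size_t Rank() const { return rank; }

 private:
  std::size_t neighborhoodSize;
  std::size_t rank;
};
```

`Recommender()`, `Recommender(10)` and `Recommender(10, 4)` all resolve to the same constructor, so there is only one body to maintain.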
< naywhayare> oldbeardo: ok, great; when I finish the CosineTree refactorization, I'll commit your changes too
< oldbeardo> this does not deal with the sp_mat issue, though it does build properly
< naywhayare> is the approach of refactoring CosineTree reasonable, or do you think there is a better solution?
< oldbeardo> naywhayare: I think we could turn cleanedData to arma::mat, since all of the processing is finally done row-wise or column-wise
< oldbeardo> though I'm not so sure about the efficiency trade-offs
< naywhayare> yeah, so, for a very sparse and very large dataset, having cleanedData be arma::mat can be really inefficient (memory-wise)
< naywhayare> but, at the same time, the CF class uses NeighborSearch and ends up making a dense rating matrix
< naywhayare> however, I think that's a different problem: at some point, I believe the CF class should use some different NeighborSearch code (or something) that calculates user-item values directly from the factorized W and H matrices
< naywhayare> without explicitly storing W*H
< naywhayare> otherwise CF can't scale to very large sets of users and items (the limit for dense matrices for most computers will be around 50k users, 50k items)
< naywhayare> but that's related to a ticket I opened that we talked about and ultimately resolved with the same conclusion back before the start of GSoC
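The idea of computing user-item values directly from W and H, without storing W*H, can be sketched as below. This is only an illustration under stated assumptions: W is taken to be users x rank and H to be rank x items, both held as plain nested vectors, and `PredictRating` is a hypothetical helper, not part of the CF class or NeighborSearch. It shows the single-entry computation only, not a neighbor search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Dense = std::vector<std::vector<double>>;

// Returns entry (user, item) of W * H without forming the product matrix.
// Assumes w is (users x rank) and h is (rank x items).
inline double PredictRating(const Dense& w, const Dense& h,
                            const std::size_t user, const std::size_t item)
{
  assert(user < w.size());
  assert(!h.empty() && w[user].size() == h.size());
  assert(item < h[0].size());

  double rating = 0.0;
  for (std::size_t k = 0; k < h.size(); ++k)
    rating += w[user][k] * h[k][item];
  return rating;
}
```

Each call costs one pass over the rank dimension, and the only stored data are the two factors, which take (users + items) * rank values rather than users * items.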
< sumedhghaisas> what algorithm does NeighborSearch use?? KNN??
< oldbeardo> naywhayare: true, but because of this scaling issue all SVD algorithms are being limited in their functionality
< naywhayare> sumedhghaisas: dual-tree kNN search, yes
< oldbeardo> naywhayare: also, does my QUIC-SVD solution complete my GSoC project? :D
< naywhayare> oldbeardo: I'm not sure what you mean, many of the SVD algorithms we have (including your regularized SVD implementation) take sparse matrices so that scaling issue isn't a problem
< oldbeardo> naywhayare: Reg SVD takes in a coordinate list
< naywhayare> right, which is essentially a sparse matrix stored in a different format
< naywhayare> the storage requirements of that matrix are not (num_users * num_items)
< naywhayare> instead it's (3 * num_nonzero_ratings)
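The two storage counts just mentioned can be put side by side with a small sketch. The helper names are hypothetical, and the byte figures assume 8-byte doubles.

```cpp
#include <cassert>
#include <cstdint>

// Number of stored values for a dense (num_users x num_items) rating matrix.
inline std::uint64_t DenseEntries(const std::uint64_t numUsers,
                                  const std::uint64_t numItems)
{
  // Guard against overflow of the product.
  assert(numItems == 0 || numUsers <= UINT64_MAX / numItems);
  return numUsers * numItems;
}

// Number of stored values for a coordinate list: one (user, item, rating)
// triple per nonzero rating.
inline std::uint64_t CoordinateListEntries(
    const std::uint64_t numNonzeroRatings)
{
  return 3 * numNonzeroRatings;
}

// Size in bytes if every stored value is a double.
inline std::uint64_t BytesAsDoubles(const std::uint64_t entries)
{
  return entries * sizeof(double);
}
```

For 50,000 users and 50,000 items the dense form holds 2.5 billion values, about 20 GB as doubles. A coordinate list with a hypothetical one million ratings holds 3 million values, about 24 MB.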
< oldbeardo> yes, I get it
< naywhayare> ok, then I've misunderstood what you said I guess
< oldbeardo> I meant to say that users won't be able to use the SVD algorithms, with dense matrices
< oldbeardo> oops, my bad
< naywhayare> that is true... but users won't have dense matrices, in general, for CF
< oldbeardo> AMF has been templatized with MatType, I thought it was specifically using sp_mat
< naywhayare> yeah, you could use arma::mat or arma::sp_mat
< sumedhghaisas> naywhayare: do you have any research paper for dual-tree kNN search.. I will definitely get it on google... but I thought it would be better to ask you...
< oldbeardo> fine, this approach makes sense then
< naywhayare> for CF I think arma::sp_mat is always used, with the exception of the regularized SVD implementation you wrote, which takes a sparse matrix stored as a coordinate list in an arma::mat
< oldbeardo> naywhayare: we could say that this is another indication of not using QUIC-SVD with CF
< naywhayare> the paper describes what dual-tree algorithms are, and there is a section on kNN (nearest neighbor search)
< naywhayare> I hope that's helpful...
< sumedhghaisas> ohh thats your paper.. great... :)
< naywhayare> oldbeardo: right; I do remember that QUIC-SVD seems to perform poorly with the sparse matrices that we tested it with
< naywhayare> I think templatizing so that it can accept sp_mat is a good move for someone who might want to do further testing later, though
< naywhayare> I may eventually find some student who might be interested in answering the question "what is QUIC-SVD even good for?"
< oldbeardo> well, I have the answer to that, it is extremely fast for large dense matrices
< naywhayare> fair enough. perhaps I don't need to find an undergrad
< naywhayare> either way, unless you have serious objections, I think I will go through and templatize both the CosineTree and QUIC-SVD code to allow arbitrary MatType
< naywhayare> it shouldn't take too long, but your arguments that it's not very useful for CF are valid, so I don't really want to make you waste the little time you have left on something that's not really guaranteed to even be used by anyone
< oldbeardo> naywhayare: no, I'm fine with this, you are the researcher, you know better
< naywhayare> oldbeardo: given your work with QUIC-SVD I think you have the better grasp on it :)
< naywhayare> I wish I knew where Michael Holmes was so I could talk with him about it, but he's apparently disappeared since he graduated
< naywhayare> fell off the face of the planet or something
< oldbeardo> naywhayare: heh, what else do I need to finish?
< naywhayare> so other than that, which I'll take care of, I think there are only two other things: we should add a parameter to the CF executable so the user can use your regularized SVD implementation
< naywhayare> and then the tutorials for QUIC-SVD and regularized SVD
< naywhayare> adding the parameter should be pretty straightforward, I think
< oldbeardo> naywhayare: I will go ahead and add comments in quic_svd.hpp and regularized_svd.hpp about how to use the modules, just like I did in sparse_autoencoder.hpp
< oldbeardo> that should be enough right?
< naywhayare> adding those comments would be great, but I'd like to ask for a little more
< naywhayare> we have a bunch of tutorials at http://www.mlpack.org/tutorial.html
< naywhayare> but there's not a tutorial for CF
< naywhayare> it's way too much work to ask you to write the whole thing
< naywhayare> but... if you could throw together a short section on "How to use CF with regularized SVD", then I can work it into a part of a bigger tutorial
< naywhayare> whenever I finally find time...
< naywhayare> do you think that sounds reasonable?
< oldbeardo> right, I could do that, that's just another 4-5 lines, if you need an explanation of the algorithm I had written one in my application I think
< naywhayare> I guess it would actually be two little parts... one for using the command-line CF with regularized SVD, and one for using the C++ interface
< naywhayare> okay, yeah, that sounds reasonable
< naywhayare> I think after this summer that suddenly mlpack has a really functional and adaptable CF implementation, so I think we might find in a few months that people are drawn to mlpack specifically for the CF implementation
< oldbeardo> and trees of course!
< naywhayare> :)
< oldbeardo> where do I need to add the parameter?
< naywhayare> I think CF is a more popular field though, so I wouldn't be surprised if this library which was originally created for tree algorithms suddenly becomes better-known for its CF
< naywhayare> cf_main.cpp
< naywhayare> or, well, at the top... you could add a PARAM_FLAG() or something like that
< naywhayare> although realistically I guess we should add a PARAM_STRING(), which allows the user to choose the factorizer they want
< naywhayare> that way Sumedh can extend it to add his bunch of factorizers too
< naywhayare> take a look at how the "kernel" parameter is handled in kernel_pca_main.cpp
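Choosing the factorizer from a string option could be structured roughly like this. It is a sketch only: the option value is passed in as a plain `std::string` rather than read through mlpack's parameter macros, and `NMFStub`, `RegSVDStub`, `RunCF`, `RunWithFactorizer` and the option values "nmf" and "regsvd" are all hypothetical. It is not the code in cf_main.cpp or kernel_pca_main.cpp.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stub factorizers; stand-ins, not mlpack classes.
struct NMFStub
{
  static std::string Name() { return "NMF stub"; }
};

struct RegSVDStub
{
  static std::string Name() { return "regularized SVD stub"; }
};

// The shared work is written once as a template on the factorizer type.
template<typename Factorizer>
std::string RunCF()
{
  return "ran CF with " + Factorizer::Name();
}

// Maps the user's string choice to a template instantiation.
inline std::string RunWithFactorizer(const std::string& name)
{
  assert(!name.empty());
  if (name == "nmf")
    return RunCF<NMFStub>();
  if (name == "regsvd")
    return RunCF<RegSVDStub>();
  throw std::invalid_argument("unknown factorizer: " + name);
}
```

Adding another factorizer then takes one more `if` branch, which matches the "four or five lines" estimate given later in the log.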
< oldbeardo> why isn't AMF, NMF present over here?
< naywhayare> in cf_main.cpp? I think it's because Sumedh hasn't finished it yet
< naywhayare> the executable Mudit wrote only supported the default factorizer, which I think is NMF
< oldbeardo> okay, then I guess I will wait for him to finish that, otherwise there may be conflicts
< naywhayare> okay; you can either do that, or we can merge it
< sumedhghaisas> ohh sorry did I miss anything?? :) I was reading that paper...
< naywhayare> sumedhghaisas: yeah, we were talking about the changes to the cf_main.cpp program, to allow the user to specify different factorizers
< sumedhghaisas> naywhayare: ohh yes I forgot to finish the CF executable... I will do that right now...
< naywhayare> okay, sounds great... I guess then Siddharth can just extend it a little bit
< naywhayare> should be like four or five lines of code, I think
< sumedhghaisas> I was just waiting to finish AverageInitialization before that... but its producing very bad results...
< sumedhghaisas> did you take a look at that??
< naywhayare> I looked at it briefly over lunch but I haven't compiled it
< naywhayare> I need to do some laundry... let me finish that and then I'll dig deeper into the issue
< sumedhghaisas> ohh okay no problem... that can be added later...
< sumedhghaisas> sure :)
< oldbeardo> naywhayare: see you later
oldbeardo has quit [Quit: Page closed]
< sumedhghaisas> naywhayare: is it okay to add a header file for cf_main??
< sumedhghaisas> cause with templates lot of code can be reduced...
< naywhayare> sumedhghaisas: why not just add a template function to cf_main.cpp?
< naywhayare> if you're only using that function in cf_main.cpp then it just needs to be available there
< sumedhghaisas> yes.. solved it... sorry :) dumb mistake... I added forward declaration and then implementation.. thats why undefined function error occurred
< naywhayare> :)
< sumedhghaisas> naywhayare: okay I have committed the modified cf_main...
< sumedhghaisas> the AMF related errors are huge... takes long time to figure them out...
< sumedhghaisas> CF tutorial is completely necessary...
sumedhghaisas has quit [Ping timeout: 272 seconds]