naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
newbie1010101 has joined #mlpack
andrewmw94 has quit [Quit: Leaving.]
govg has quit [Ping timeout: 240 seconds]
newbie1010101 has quit [Quit: Page closed]
naywhayare has joined #mlpack
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
< jenkins-mlpack>
Starting build #2102 for job mlpack - svn checkin test (previous build: SUCCESS)
sumedhghaisas has joined #mlpack
oldbeardo has joined #mlpack
< oldbeardo>
sumedhghaisas: hey, Ryan told me to use your svd_wrapper implementation for QUIC-SVD
< oldbeardo>
how should I go about doing that?
< sumedhghaisas>
ohh okay... thank god I didn't commit the changes I made yesterday... it's simple... I think QUIC_SVD implements ExtractSVD(u, v, sigma), correct??
< sumedhghaisas>
oldbeardo: SVDWrapper will call a similar Apply function... so you need to implement a template-specialized function for QUIC_SVD...
< oldbeardo>
yes, I'm not sure about the order, but something like that
< oldbeardo>
by the way, ExtractSVD is to be used as an internal function
< sumedhghaisas>
ohh... So which function returns the SVD factorization??
< oldbeardo>
well, right now the constructor itself
< sumedhghaisas>
ohh then see the template-specialized function for arma::svd... there, just replace the arma::svd call with your factorizer function...
< sumedhghaisas>
that's it...
< sumedhghaisas>
and make sure the order of parameters is correct...
< oldbeardo>
where is the template specialized function for arma::svd?
< sumedhghaisas>
it will be in svd_wrapper_impl.hpp
< sumedhghaisas>
in cf module..
< oldbeardo>
okay, don't get me wrong here, but how does this help us?
< oldbeardo>
isn't it essentially writing a new function for every factorizer?
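One way to read the exchange above: yes, each factorizer needs its own adapter, but callers then see a single Apply() regardless of the backend. The following is a minimal sketch of that idea, not the svd_wrapper code itself. Every name in it (DummyArmaSVD, DummyQuicSVD, WrapperSketch) is hypothetical, and the "factorization" is reduced to returning a label.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a backend that factorizes through one call.
struct DummyArmaSVD
{
  static std::string Decompose() { return "arma-style"; }
};

// Hypothetical stand-in for a backend whose constructor does the work.
struct DummyQuicSVD
{
  DummyQuicSVD() : result("quic-style") { }
  std::string result;
};

// Generic wrapper: callers always use Apply(), whatever the backend is.
template<typename Factorizer>
struct WrapperSketch
{
  std::string Apply() const;
};

// One specialization per backend adapts that backend's native interface.
template<>
inline std::string WrapperSketch<DummyArmaSVD>::Apply() const
{
  return DummyArmaSVD::Decompose();
}

template<>
inline std::string WrapperSketch<DummyQuicSVD>::Apply() const
{
  DummyQuicSVD factorizer;  // Construction performs the factorization.
  return factorizer.result;
}
```

The cost is one short specialization per factorizer. The benefit, under these assumptions, is that code written against WrapperSketch<T>::Apply() does not change when a new backend is added.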
sumedhghaisas has quit [Ping timeout: 272 seconds]
oldbeardo has quit [Ping timeout: 246 seconds]
sumedhghaisas has joined #mlpack
oldbeardo has joined #mlpack
jbc_ has joined #mlpack
< sumedhghaisas>
naywhayare: hey Ryan, you free?
jbc_ has quit [Quit: jbc_]
< naywhayare>
sumedhghaisas: yeah, I am here
< sumedhghaisas>
now that Siddharth is going to use the svd wrapper, I want to roll back the changes I have made to the module...
< sumedhghaisas>
how do I do that
< sumedhghaisas>
?
< naywhayare>
what do you mean? which changes do you want to roll back?
< sumedhghaisas>
I planned the module such that QUIC_SVD can be added... but then Siddharth said he does not need it... so I removed the template from it and changed it to ArmaSVDwrapper...
< sumedhghaisas>
but it's not committed yet...
< naywhayare>
okay, so do you mean that you want to revert your changes to the current svn trunk?
< naywhayare>
you can just use svn revert for that
< naywhayare>
unless I am misunderstanding
< sumedhghaisas>
but that will revert all the files right??
< sumedhghaisas>
I want to revert only those 2 files...
< sumedhghaisas>
cause I added other things too...
< naywhayare>
ah, then you can just use an argument to svn revert
< naywhayare>
svn revert svd_wrapper.hpp
< naywhayare>
(or whatever the filename is)
< naywhayare>
that will just revert the changes in that file
< oldbeardo>
sumedhghaisas: actually I'm not going to use it
< naywhayare>
I have to go for a little while...
< oldbeardo>
I'm writing an Apply() method for QUIC-SVD
< sumedhghaisas>
oldbeardo: I still think we should use the svd wrapper... cause it will be better for the user to have one wrapper for all...
< sumedhghaisas>
oldbeardo: I am looking through the cf code right now... one thing I noticed is that the CleanData function is getting called even though it's sometimes not required... is there any reason for that??
< sumedhghaisas>
we can just shift it inside ApplyFactorizer...
< oldbeardo>
where's that?
< sumedhghaisas>
in both the constructors...
< sumedhghaisas>
CleanData is always getting called right??
< sumedhghaisas>
but it's not required when the factorizer is regularized_svd... or this is the impression I got from the code... I may have missed something...
< oldbeardo>
yes, it is needed irrespective of the factorizer
< oldbeardo>
it is used in GetRecommendations()
< sumedhghaisas>
ohh okay...
< oldbeardo>
naywhayare: is there a reason why cleanedData is sp_mat? I'm not able to use QUIC_SVD because of that, I will have to change the signature of every internal function because of that
< naywhayare>
oldbeardo: cleanedData is sp_mat because in the vast majority of collaborative filtering problems, you have an incomplete dataset
< naywhayare>
the whole idea being to predict some of the missing values for a user/item combination or something like that
< naywhayare>
the case for an arma::mat cleanedData implies that the entire rating matrix is mostly dense, which isn't really something that happens in practice that often
< sumedhghaisas>
naywhayare: average_initialization is not performing well... it's very unstable... :(
< naywhayare>
you could probably refactor QUIC-SVD and CosineTree pretty easily to templatize MatType so a user can use arma::mat or arma::sp_mat
< naywhayare>
sumedhghaisas: how unstable is it with respect to random initialization?
< sumedhghaisas>
very much...
< sumedhghaisas>
I don't know why it's happening...
< sumedhghaisas>
but maybe the initialization is very high...
< naywhayare>
hm, can you describe the results you're getting a little further?
< naywhayare>
oldbeardo: I'll go ahead and templatize CosineTree
< sumedhghaisas>
one sec... I will just post the residue that I printed after every iteration
< naywhayare>
I should have that done by the time I go to bed tonight (which is probably 8 to 10 hours from now)
< sumedhghaisas>
8.44638
< sumedhghaisas>
4.84661
< sumedhghaisas>
1.5715
< sumedhghaisas>
1.58544
< sumedhghaisas>
2.13046
< sumedhghaisas>
1.10905
< sumedhghaisas>
2.62636
< sumedhghaisas>
4.1802
< sumedhghaisas>
2.01445
< sumedhghaisas>
2.88132
< sumedhghaisas>
4.08919
< sumedhghaisas>
1.07427
< sumedhghaisas>
3.68883
< sumedhghaisas>
2.76699
< sumedhghaisas>
1.80398
< sumedhghaisas>
^C
< sumedhghaisas>
it basically ends around 3...
< sumedhghaisas>
whereas random initialization ends around 0.99
< sumedhghaisas>
this is on a random sparse matrix of size 100 * 100
< sumedhghaisas>
I have tried both ways... first average of V is taken ignoring the zero entries ... then in the second version I considered zero entries.. still same performance...
< naywhayare>
are you using the exact same matrix for random initialization?
< naywhayare>
sorry if that's an obvious question but I wanted to check before digging in deeper :)
< sumedhghaisas>
yes.. checked again.. I have added mlpack::math::RandomSeed(10); line before the factorization...
< sumedhghaisas>
I will commit the code .. so that you can take a look...
< naywhayare>
okay
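The two variants described above (averaging V with or without its zero entries) can give very different starting scales on a sparse matrix. The sketch below only illustrates that difference. It is not the average_initialization code under discussion and does not diagnose the instability reported here. The names MeanEntry and BaseValue are hypothetical, and so is the choice to fill the factors with a constant c satisfying rank * c * c == mean.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Mean of the entries of V. With ignoreZeros set, only non-zero entries are
// averaged; otherwise zeros count toward the mean as well.
inline double MeanEntry(const Matrix& V, const bool ignoreZeros)
{
  double sum = 0.0;
  std::size_t count = 0;
  for (const auto& row : V)
  {
    for (const double x : row)
    {
      if (!ignoreZeros || x != 0.0)
      {
        sum += x;
        ++count;
      }
    }
  }
  return (count == 0) ? 0.0 : sum / static_cast<double>(count);
}

// Constant c such that a product of all-c factors of the given rank has
// every entry equal to mean, because rank * c * c == mean.
inline double BaseValue(const double mean, const std::size_t rank)
{
  assert(rank > 0 && mean >= 0.0);
  return std::sqrt(mean / static_cast<double>(rank));
}
```

For a matrix where half the entries are zero, the two means differ by a factor of two, so the initial W * H is scaled differently in each variant. Comparing the starting residue of both variants against random initialization on the same seed would show whether the scale is the issue.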
sumedhghaisas has quit [Ping timeout: 272 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas>
naywhayare: there are 2 constructors for CF right now... they can simply be combined with default parameters... what do you think??
< oldbeardo>
naywhayare: I just sent you a mail with the changes for QUIC-SVD
< naywhayare>
sumedhghaisas: yeah, if they can be combined with default parameters, we should do that
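As a sketch of what combining constructors with default parameters looks like in general: the class below is hypothetical, and its parameter names and default values are illustrative rather than taken from the mlpack CF class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class used only to illustrate the pattern.
class CFSketch
{
 public:
  // A single constructor with defaults replaces separate overloads:
  // CFSketch(), CFSketch(n) and CFSketch(n, r) all resolve to it.
  explicit CFSketch(const std::size_t numNeighbors = 5,
                    const std::size_t rank = 0) :
      numNeighbors(numNeighbors),
      rank(rank)
  {
    assert(numNeighbors > 0);
  }

  std::size_t NumNeighbors() const { return numNeighbors; }
  std::size_t Rank() const { return rank; }

 private:
  std::size_t numNeighbors;
  std::size_t rank;
};
```

This only works cleanly when the merged constructors differ in trailing arguments. If the two existing constructors take unrelated first arguments, they would need to stay separate.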
< naywhayare>
oldbeardo: ok, great; when I finish the CosineTree refactorization, I'll commit your changes too
< oldbeardo>
this does not deal with the sp_mat issue, though it does build properly
< naywhayare>
is the approach of refactoring CosineTree reasonable, or do you think there is a better solution?
< oldbeardo>
naywhayare: I think we could turn cleanedData to arma::mat, since all of the processing is finally done row-wise or column-wise
< oldbeardo>
though I'm not so sure about the efficiency trade-offs
< naywhayare>
yeah, so, for a very sparse and very large dataset, having cleanedData be arma::mat can be really inefficient (memory-wise)
< naywhayare>
but, at the same time, the CF class uses NeighborSearch and ends up making a dense rating matrix
< naywhayare>
however, I think that's a different problem: at some point, I believe the CF class should use some different NeighborSearch code (or something) that calculates user-item values directly from the factorized W and H matrices
< naywhayare>
without explicitly storing W*H
< naywhayare>
otherwise CF can't scale to very large sets of users and items (the limit for dense matrices for most computers will be around 50k users, 50k items)
< naywhayare>
but that's related to a ticket I opened that we talked about and ultimately resolved with the same conclusion back before the start of GSoC
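A rough sketch of the idea of computing a user-item value directly from the factors, without ever storing W*H. It assumes W is (users x rank) and H is (rank x items), and uses plain nested vectors rather than Armadillo types. PredictRating and DenseBytes are hypothetical names, and 8-byte doubles are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One entry of W * H, computed on demand as the dot product of a row of W
// and a column of H. Cost is O(rank) per query; no dense product is stored.
inline double PredictRating(const Matrix& W,
                            const Matrix& H,
                            const std::size_t user,
                            const std::size_t item)
{
  assert(user < W.size());
  assert(W[user].size() == H.size());
  double rating = 0.0;
  for (std::size_t k = 0; k < H.size(); ++k)
  {
    assert(item < H[k].size());
    rating += W[user][k] * H[k][item];
  }
  return rating;
}

// Bytes needed to store a dense users x items matrix of 8-byte doubles.
inline std::uint64_t DenseBytes(const std::uint64_t users,
                                const std::uint64_t items)
{
  const std::uint64_t bytesPerDouble = 8;  // Assumption: IEEE 754 double.
  return users * items * bytesPerDouble;
}
```

For the 50k x 50k case mentioned above, DenseBytes gives 20,000,000,000 bytes, roughly 20 GB for the dense product alone, while the two factors at rank r need only about (50k + 50k) * r doubles.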
< sumedhghaisas>
what algorithm does NeighborSearch use?? KNN??
< oldbeardo>
naywhayare: true, but because of this scaling issue all SVD algorithms are being limited in their functionality
< oldbeardo>
naywhayare: also, does my QUIC-SVD solution complete my GSoC project? :D
< naywhayare>
oldbeardo: I'm not sure what you mean, many of the SVD algorithms we have (including your regularized SVD implementation) take sparse matrices so that scaling issue isn't a problem
< oldbeardo>
naywhayare: Reg SVD takes in a coordinate list
< naywhayare>
right, which is essentially a sparse matrix stored in a different format
< naywhayare>
the storage requirements of that matrix are not (num_users * num_items)
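To make the storage point concrete, here is a small sketch of a coordinate list holding (user, item, rating) triples. RatingSketch, Lookup and StoredNumbers are hypothetical names, and the layout is an illustration rather than the exact format the regularized SVD code expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One known rating: which user rated which item, and the value given.
struct RatingSketch
{
  std::size_t user;
  std::size_t item;
  double value;
};

// Returns the stored rating for (user, item), or 0 if none is stored.
// A linear scan keeps the sketch short; real code would index the list.
inline double Lookup(const std::vector<RatingSketch>& ratings,
                     const std::size_t user,
                     const std::size_t item)
{
  for (const RatingSketch& r : ratings)
  {
    if (r.user == user && r.item == item)
      return r.value;
  }
  return 0.0;
}

// Numbers stored by the list: three per known rating, independent of
// how many users and items exist in total.
inline std::size_t StoredNumbers(const std::vector<RatingSketch>& ratings)
{
  return 3 * ratings.size();
}
```

Two ratings take six stored numbers whether the full matrix is 4 x 3 or 50k x 50k, which is why the coordinate list behaves like a sparse matrix for memory purposes.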
< naywhayare>
ok, then I've misunderstood what you said I guess
< oldbeardo>
I meant to say that users won't be able to use the SVD algorithms, with dense matrices
< oldbeardo>
oops, my bad
< naywhayare>
that is true... but users won't have dense matrices, in general, for CF
< oldbeardo>
AMF has been templatized with MatType, I thought it was specifically using sp_mat
< naywhayare>
yeah, you could use arma::mat or arma::sp_mat
< sumedhghaisas>
naywhayare: do you have any research paper for dual-tree kNN search.. I will definitely get it on google... but I thought it would be better to ask you...
< oldbeardo>
fine, this approach makes sense then
< naywhayare>
for CF I think arma::sp_mat is always used, with the exception of the regularized SVD implementation you wrote, which takes a sparse matrix stored as a coordinate list in an arma::mat
< oldbeardo>
naywhayare: we could say that this is another indication of not using QUIC-SVD with CF
< naywhayare>
the paper describes what dual-tree algorithms are, and there is a section on kNN (nearest neighbor search)
< naywhayare>
I hope that's helpful...
< sumedhghaisas>
ohh that's your paper.. great... :)
< naywhayare>
oldbeardo: right; I do remember that QUIC-SVD seems to perform poorly with the sparse matrices that we tested it with
< naywhayare>
I think templatizing so that it can accept sp_mat is a good move for someone who might want to do further testing later, though
< naywhayare>
I may eventually find some student who might be interested in answering the question "what is QUIC-SVD even good for?"
< oldbeardo>
well, I have the answer to that, it is extremely fast for large dense matrices
< naywhayare>
fair enough. perhaps I don't need to find an undergrad
< naywhayare>
either way, unless you have serious objections, I think I will go through and templatize both the CosineTree and QUIC-SVD code to allow arbitrary MatType
< naywhayare>
it shouldn't take too long, but your arguments that it's not very useful for CF are valid, so I don't really want to make you waste the little time you have left on something that's not really guaranteed to even be used by anyone
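A minimal sketch of what templatizing on MatType buys, assuming the algorithm only needs a small shared interface. DenseSketch and SparseSketch are hypothetical stand-ins for arma::mat and arma::sp_mat, and the column squared norm is chosen only as an example of a column-wise computation, not as the actual CosineTree code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical dense matrix: stores every entry in row-major order.
struct DenseSketch
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
  double At(const std::size_t i, const std::size_t j) const
  {
    return data[i * cols + j];
  }
};

// Hypothetical sparse matrix: stores only the non-zero entries.
struct SparseSketch
{
  std::size_t rows;
  std::size_t cols;
  std::map<std::pair<std::size_t, std::size_t>, double> nonzeros;
  double At(const std::size_t i, const std::size_t j) const
  {
    const auto it = nonzeros.find(std::make_pair(i, j));
    return (it == nonzeros.end()) ? 0.0 : it->second;
  }
};

// Written once against the shared interface (rows, cols, At), so the same
// code runs on either storage type.
template<typename MatType>
double ColumnSquaredNorm(const MatType& m, const std::size_t col)
{
  assert(col < m.cols);
  double sum = 0.0;
  for (std::size_t i = 0; i < m.rows; ++i)
    sum += m.At(i, col) * m.At(i, col);
  return sum;
}
```

The template makes the code compile for both types. It does not make it efficient for both, since a loop over every row ignores sparsity, so a real sparse path would likely iterate over stored non-zeros instead.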
< oldbeardo>
naywhayare: no, I'm fine with this, you are the researcher, you know better
< naywhayare>
oldbeardo: given your work with QUIC-SVD I think you have the better grasp on it :)
< naywhayare>
I wish I knew where Michael Holmes was so I could talk with him about it, but he's apparently disappeared since he graduated
< naywhayare>
fell off the face of the planet or something
< oldbeardo>
naywhayare: heh, what else do I need to finish?
< naywhayare>
so other than that, which I'll take care of, I think there are only two other things: we should add a parameter to the CF executable so the user can use your regularized SVD implementation
< naywhayare>
and then the tutorials for QUIC-SVD and regularized SVD
< naywhayare>
adding the parameter should be pretty straightforward, I think
< oldbeardo>
naywhayare: I will go ahead and add comments in quic_svd.hpp and regularized_svd.hpp about how to use the modules, just like I did in sparse_autoencoder.hpp
< oldbeardo>
that should be enough right?
< naywhayare>
adding those comments would be great, but I'd like to ask for a little more
< naywhayare>
it's way too much work to ask you to write the whole thing
< naywhayare>
but... if you could throw together a short section on "How to use CF with regularized SVD", then I can work it into a part of a bigger tutorial
< naywhayare>
whenever I finally find time...
< naywhayare>
do you think that sounds reasonable?
< oldbeardo>
right, I could do that, that's just another 4-5 lines, if you need an explanation of the algorithm I had written one in my application I think
< naywhayare>
I guess it would actually be two little parts... one for using the command-line CF with regularized SVD, and one for using the C++ interface
< naywhayare>
okay, yeah, that sounds reasonable
< naywhayare>
I think that after this summer mlpack suddenly has a really functional and adaptable CF implementation, so we might find in a few months that people are drawn to mlpack specifically for the CF implementation
< oldbeardo>
and trees of course!
< naywhayare>
:)
< oldbeardo>
where do I need to add the parameter?
< naywhayare>
I think CF is a more popular field though, so I wouldn't be surprised if this library which was originally created for tree algorithms suddenly becomes better-known for its CF
< naywhayare>
cf_main.cpp
< naywhayare>
or, well, at the top... you could add a PARAM_FLAG() or something like that
< naywhayare>
although realistically I guess we should add a PARAM_STRING(), which allows the user to choose the factorizer they want
< naywhayare>
that way Sumedh can extend it to add his bunch of factorizers too
< naywhayare>
take a look at how the "kernel" parameter is handled in kernel_pca_main.cpp
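A sketch of the string-parameter approach: one template function does the work, and a small dispatcher maps the user's string to a factorizer type. It deliberately avoids the mlpack PARAM_STRING() machinery. NMFSketch, RegSVDSketch, RunWith, DispatchFactorizer and the accepted strings "NMF" and "RegSVD" are all hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical factorizer types; each only reports its own name here.
struct NMFSketch
{
  static const char* Name() { return "NMF"; }
};

struct RegSVDSketch
{
  static const char* Name() { return "RegSVD"; }
};

// The shared work, written once and instantiated per factorizer type.
template<typename Factorizer>
std::string RunWith()
{
  return std::string("ran CF with ") + Factorizer::Name();
}

// Maps the string given by the user to the matching instantiation.
// Supporting another factorizer means adding one more branch.
inline std::string DispatchFactorizer(const std::string& algorithm)
{
  if (algorithm == "NMF")
    return RunWith<NMFSketch>();
  if (algorithm == "RegSVD")
    return RunWith<RegSVDSketch>();
  throw std::invalid_argument("unknown factorizer: " + algorithm);
}
```

Rejecting unknown strings explicitly, rather than silently falling back to a default, keeps a typo on the command line from running the wrong algorithm.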
< oldbeardo>
why isn't AMF, NMF present over here?
< naywhayare>
in cf_main.cpp? I think it's because Sumedh hasn't finished it yet
< naywhayare>
the executable Mudit wrote only supported the default factorizer, which I think is NMF
< oldbeardo>
okay, then I guess I will wait for him to finish that, otherwise there may be conflicts
< naywhayare>
okay; you can either do that, or we can merge it
< sumedhghaisas>
ohh sorry did I miss anything?? :) I was reading that paper...
< naywhayare>
sumedhghaisas: yeah, we were talking about the changes to the cf_main.cpp program, to allow the user to specify different factorizers
< sumedhghaisas>
naywhayare: ohh yes I forgot to finish the CF executable... I will do that right now...
< naywhayare>
okay, sounds great... I guess then Siddharth can just extend it a little bit
< naywhayare>
should be like four or five lines of code, I think
< sumedhghaisas>
I was just waiting to finish AverageInitialization before that... but it's producing very bad results...
< sumedhghaisas>
did you take a look at that??
< naywhayare>
I looked at it briefly over lunch but I haven't compiled it
< naywhayare>
I need to do some laundry... let me finish that and then I'll dig deeper into the issue
< sumedhghaisas>
ohh okay no problem... that can be added later...
< sumedhghaisas>
sure :)
< oldbeardo>
naywhayare: see you later
oldbeardo has quit [Quit: Page closed]
< sumedhghaisas>
naywhayare: is it okay to add a header file for cf_main??
< sumedhghaisas>
cause with templates a lot of code can be reduced...
< naywhayare>
sumedhghaisas: why not just add a template function to cf_main.cpp?
< naywhayare>
if you're only using that function in cf_main.cpp then it just needs to be available there
< sumedhghaisas>
yes.. solved it... sorry :) dumb mistake... I added a forward declaration and then the implementation.. that's why the undefined function error occurred
< naywhayare>
:)
< sumedhghaisas>
naywhayare: okay I have committed the modified cf_main...
< sumedhghaisas>
the AMF-related errors are huge... it takes a long time to figure them out...
< sumedhghaisas>
a CF tutorial is absolutely necessary...
sumedhghaisas has quit [Ping timeout: 272 seconds]