verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
palash123 has joined #mlpack
palash has quit [Ping timeout: 256 seconds]
palash has joined #mlpack
palash123 has quit [Ping timeout: 245 seconds]
palash123 has joined #mlpack
palash has quit [Ping timeout: 255 seconds]
palash has joined #mlpack
palash123 has quit [Ping timeout: 255 seconds]
palash123 has joined #mlpack
palash has quit [Ping timeout: 245 seconds]
govg has quit [Ping timeout: 245 seconds]
palash has joined #mlpack
palash123 has quit [Ping timeout: 240 seconds]
dinesh_ has quit [Ping timeout: 255 seconds]
govg has joined #mlpack
dineshraj01 has joined #mlpack
palash123 has joined #mlpack
palash has quit [Ping timeout: 240 seconds]
palash has joined #mlpack
palash123 has quit [Ping timeout: 245 seconds]
palash123 has joined #mlpack
palash has quit [Ping timeout: 245 seconds]
dineshraj01 has quit [Read error: Connection reset by peer]
dineshraj01 has joined #mlpack
palash has joined #mlpack
palash123 has quit [Ping timeout: 245 seconds]
palash123 has joined #mlpack
palash has quit [Ping timeout: 276 seconds]
dineshraj01 has quit [Ping timeout: 248 seconds]
vivekp has quit [Ping timeout: 258 seconds]
vivekp has joined #mlpack
dhawalht has joined #mlpack
< dhawalht>
Hey, I want to contribute to your open-source project. I am a GSoC-2017 aspirant
< dhawalht>
reply me to dhawalharkawat14@gmail.com
dhawalht has quit [Quit: Page closed]
vivekp has quit [Ping timeout: 240 seconds]
mikeling has joined #mlpack
vivekp has joined #mlpack
palash has joined #mlpack
palash123 has quit [Ping timeout: 276 seconds]
rcurtin_ has joined #mlpack
cult- has left #mlpack []
palash123 has joined #mlpack
palash123 has left #mlpack []
palash123 has joined #mlpack
palash has quit [Ping timeout: 252 seconds]
palash has joined #mlpack
palash123 has quit [Ping timeout: 252 seconds]
palash123 has joined #mlpack
palash has quit [Ping timeout: 255 seconds]
daasankur has joined #mlpack
govg has quit [Ping timeout: 258 seconds]
govg has joined #mlpack
< layback>
i would like to talk to someone with knowledge about the collaborative filtering parts of mlpack. i am benchmarking it against other software with the movielens data set, using a random .8/.2 training/testing split. common benchmarks show ~.9, but for mlpack I get ~3.5 for NMF and ~.05 with RegSVD. i'm not sure how they can differ that much, or why the resulting error is so different from others. my guess is
< layback>
that it has something to do with the kNN evaluation protocol (or whatever you may call it). is there something that i may be grossly overlooking? (ps. i have found an old Agrawal document showing results ~.9 with RegSVD, albeit with an older cli interface)
< rcurtin_>
layback: hi there; the error measure you are using is RMSE I guess?
< layback>
rcurtin_: yes!
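For reference, RMSE is the square root of the mean squared difference between predicted and held-out ratings. A minimal Python sketch of that definition follows; the function name `rmse` and its arguments are illustrative assumptions, not part of mlpack.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences.

    Hypothetical helper for illustration; it is not an mlpack function.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Sum the squared differences, average them, then take the square root.
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(actual))
```

On a 1 to 5 rating scale, a value near 0.9 means predictions are off by a little under one star in the root-mean-square sense.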
< rcurtin>
let me find the movielens dataset, hang on
< layback>
it's the 1m one i've used!
< rcurtin>
ok, I am only going to use the 100k one so it doesn't take so long to run simulations
< rcurtin>
but we should check that the format is right
< rcurtin>
the input CSV should be three columns: user id, movie id, rating
< layback>
yes!
< rcurtin>
ok, great
< rcurtin>
next question: what are you using to calculate the RMSE, and what are you setting the rank of the decomposition to?
< layback>
so, i've tried ranks in [5, 100] and get errors ranging over [3.8, 2.9] for default NMF, and for RegSVD i've tried rank 20 (the same rank used in the agrawal document i mentioned).
< layback>
but say ~3.5 for NMF with rank 20.
< layback>
sorry this got messy, but for RegSVD with -R 20 i get ~.02
< rcurtin>
ok, and you're using the -T option to calculate RMSE?
< rcurtin>
ok, great, let me try that and see what I get
< layback>
i do split my datasets in an external script i have, but i assure you it is just randomly split .8/.2.
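A random .8/.2 split of (user id, movie id, rating) rows can be sketched in Python as below. This is only an illustration of the idea, not layback's actual script; the function name `split_ratings`, the default seed, and the in-memory row format are assumptions.

```python
import random


def split_ratings(rows, test_fraction=0.2, seed=42):
    """Shuffle rating rows and split them into (train, test) lists.

    `rows` is any iterable of (user_id, movie_id, rating) tuples.
    Hypothetical helper for illustration only.
    """
    rows = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    rng = random.Random(seed)
    rng.shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    # The first n_test shuffled rows form the test set, the rest the training set.
    return rows[n_test:], rows[:n_test]
```

Every input row ends up in exactly one of the two lists, so the test ratings are never seen during training.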
< rcurtin>
ok, I see the same results for default NMF... I am playing with the --min_residue parameter, which will control how exact the decomposition is
< rcurtin>
it makes it take longer to converge, so it may be a little bit until I get results...
< layback>
yes! i feel like i've tried mixing every possible parameter, hehe. I very much appreciate the help!
< rcurtin>
(seems like the 1M dataset actually had 1M + 204 ratings)
< rcurtin>
if I was smart, I would have just used the mlpack_preprocess_split program, but I decided to do it by hand for some reason :)
< rcurtin>
I can't seem to get RMSE < 2.6 with the NMF decomposition on the 1M dataset
< rcurtin>
I'm playing with SVDIncompleteIncremental and SVDCompleteIncremental decompositions, it seems like the results are trash... I think maybe they are being run with not-great parameters for the optimizers and can't be tuned through the command-line program
< rcurtin>
I think I'll update the GSoC description for the CF project to possibly include working with those algorithms to set reasonable defaults for the optimizers
< layback>
hmm, ok, so I can confirm ~.88 RMSE on the 1M dataset as per your instructions. so there seems to be something wrong with how i prepare my data, I guess. yeah, the thing that really tripped me up was that all of them were showing such different results from each other.
< rcurtin>
yeah, ideally, the defaults should be configured in such a way that each algorithm type converges to something similar
< rcurtin>
but for now I guess RegSVD is the right one to use
< rcurtin>
glad I could help sort it out!
< layback>
i think the problem comes from me also trying to figure out how to get a good one-class rating, i.e. "like" or "don't know", and i might have passed that to my movielens data as well. if you have any tips for that, please pass them along. otherwise i'll continue tinkering.
< layback>
thanks a lot for the help! very helpful.
< rcurtin>
what do you mean by "one-class rating"?
< rcurtin>
not sure I follow completely
< rcurtin>
also, sure, glad to help, that's why I'm here :)
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#1777 (master - ec3f224 : Ryan Curtin): The build was broken.
< layback>
instead of having a rating in [1, 5] for each entry, the actual data i want to work with only has "likes", so I don't really have an actual rating; basically only 5-star ratings and nothing else. so either i know that a user likes something, or i don't know anything.
< layback>
so a row in my dataset is just an indication of a "like" between an item and a user.
< rcurtin>
ah, ok, I guess that is a slightly different problem
< rcurtin>
I think maybe any value for "like" will work (like 1 should be fine) but I think you will have to calculate RMSE differently
mikeling has quit [Quit: Connection closed for inactivity]
palash has joined #mlpack
palashahuja has joined #mlpack
palash123 has quit [Ping timeout: 255 seconds]
palash has quit [Ping timeout: 258 seconds]
palash has joined #mlpack
palash has quit [Remote host closed the connection]
palashahuja has quit [Ping timeout: 245 seconds]
gtank has quit [Remote host closed the connection]
gtank has joined #mlpack
daasankur has quit [Ping timeout: 260 seconds]
palashahuja has joined #mlpack
< layback>
well, i guess i can calculate the RMSE to get some indication about the model, but the actual evaluation should probably be of some kind of precision.
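One common precision-style measure for "like"-only data is precision at k: the fraction of the top k recommended items that the user actually liked in the held-out data. The Python sketch below illustrates that idea under assumed inputs; the name `precision_at_k` and its arguments are hypothetical and not part of mlpack.

```python
def precision_at_k(recommended, liked, k):
    """Fraction of the top-k recommended items that appear in `liked`.

    `recommended` is a ranked list of item ids (best first) and `liked`
    is the set of held-out items the user liked. Hypothetical helper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in liked)
    # Dividing by k (not len(top_k)) penalizes lists shorter than k.
    return hits / k
```

Averaging this value over all test users gives a single score that does not depend on predicting an exact rating.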
< layback>
since movielens dataset is used in every written text ever, my thinking is that it would be helpful to have it as a benchmark in some kind of way.