naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
andrewmw94 has left #mlpack []
sumedhghaisas has joined #mlpack
< jenkins-mlpack>
Starting build #1944 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack>
Ryan Curtin: Inline the simple R tree descent heuristic.
udit_s has quit [Ping timeout: 245 seconds]
sumedhghaisas has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Read error: Connection reset by peer]
andrewmw94 has joined #mlpack
< andrewmw94>
21:41 < naywhayare> andrewmw94: in looking through your code, I noticed lots of java-like idioms that I think won't work at runtime in C++. if you want me to look through these at some point and provide some feedback, I can do that
< andrewmw94>
21:41 < naywhayare> but otherwise I'll wait until you say it's finished
< andrewmw94>
21:41 < naywhayare> so that I don't comment on things you already know about and have made a mental note to redo
< andrewmw94>
naywhayare: yeah, I wrote it quickly in quasi-C++ while reading the paper.
< andrewmw94>
I'm going to go over it all again and look for bugs and invalid C++ code
oldbeardo has joined #mlpack
< naywhayare>
andrewmw94: ok; let me know if you need any help
< naywhayare>
oldbeardo: the solution I proposed won't be generic, but it wouldn't be generic anyway because you're taking the CosineNodeQueue as an argument
< oldbeardo>
naywhayare: I did what I thought would be a good solution, I think I mentioned it later yesterday
< oldbeardo>
I will send you the code in some time, hopefully it will be okay with you :)
< naywhayare>
yes, I saw what your solution was, but it's going to be slower because it has to maintain the considerBasis bool vector
< oldbeardo>
naywhayare: won't that cost be trivial? also, the priority queue method had some other problem, I had implemented it
< naywhayare>
the size of considerBasis is, at maximum, the number of nodes in the tree -- which is, at maximum, the number of columns in the dataset
< naywhayare>
you have to iterate over that entire vector each time
< naywhayare>
although it's true that early on in the tree-building process the overhead is not much, if a user asks for a very close approximation of the svd, your algorithm will be noticeably slower
< oldbeardo>
well, I still think that the bool vector cost would be trivial in comparison to the other calculations
< naywhayare>
ok, if you want another justification, it adds needless complexity to the code; it's not necessary to maintain that vector because all of the basis vectors that can be a part of V correspond exactly to the basis vectors of all of the nodes in the priority queue
< oldbeardo>
true, but iterating over the priority queue is just as complicated
< oldbeardo>
there's no support for random access, as it is implemented as a heap
< naywhayare>
but... you don't need random access. you access it sequentially
< oldbeardo>
how?
Anand_ has joined #mlpack
udit_s has joined #mlpack
< oldbeardo>
if you are thinking in terms of a normal for loop, you will still need to provide an index for access, which is not an option
< naywhayare>
I had not realized that priority_queue doesn't provide an iterator. hang on...
< oldbeardo>
the only other option is to copy the queue for each computation that you need
< naywhayare>
no, that's not the only other option. the priority queue takes a container template type; by default this is vector<T>
< Anand_>
Marcus : The design looks good and the idea of keeping the timing and other metrics separate is also fine.
< naywhayare>
create the vector<T> object and then create the priority_queue with the constructor that takes an existing vector object
< naywhayare>
then when you need to iterate over the priority queue, iterate over the vector
< Anand_>
Marcus : I think we need to call the metrics function somewhere. Right?
< naywhayare>
the priority queue isn't an actual container; it's an "adapter" that just works with an underlying container
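A minimal sketch of the idea naywhayare describes: since std::priority_queue is only an adapter over an underlying container (vector<T> by default), one option is to keep the vector yourself and maintain the heap property with the standard heap algorithms, leaving the elements free to be iterated over directly. (Illustrative only, not the code that was ultimately committed.)

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main()
    {
      // Maintain the heap in a plain vector so it can also be iterated.
      std::vector<double> heap;

      // push: append, then restore the heap property.
      for (double v : { 0.3, 0.9, 0.1, 0.5 })
      {
        heap.push_back(v);
        std::push_heap(heap.begin(), heap.end());
      }

      // Top of the heap (largest element with the default comparator).
      std::cout << "top: " << heap.front() << std::endl;

      // pop: move the top to the back, then remove it.
      std::pop_heap(heap.begin(), heap.end());
      heap.pop_back();

      // Sequential access over the remaining elements (heap layout order,
      // not sorted order).
      for (double v : heap)
        std::cout << v << " ";
      std::cout << std::endl;
    }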
< marcus_zoq>
Anand_: Okay, great, if you set run: ['metric'] in the config file the script calls the 'RunMetrics()' function.
< Anand_>
Cool!
< Anand_>
Will the current design work for other libraries too?
< udit_s>
naywhayare: hello !
< naywhayare>
udit_s: I took a look at your code last night
< naywhayare>
only a few comments. instead of having all the ancillary csv files, can you encode the datasets directly into the tests because they're so simple?
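For context, a sketch of what encoding a dataset directly into a test might look like, using Armadillo's string initializer (the values and names here are made up, not taken from the decision stump tests):

    #include <mlpack/core.hpp>

    int main()
    {
      // Hypothetical tiny two-class dataset encoded inline instead of
      // shipped as a separate CSV; points are columns (mlpack convention).
      arma::mat data("1.0 1.1 0.9 5.0 5.1 4.9;"
                     "2.0 2.1 1.9 8.0 8.1 7.9");
      arma::Row<size_t> labels("0 0 0 1 1 1");

      data.print("data");
      labels.print("labels");
      return 0;
    }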
< udit_s>
okay. what else ?
< oldbeardo>
naywhayare: ummm, okay, I still feel that the basis should be kept separate
< marcus_zoq>
Anand_: We need to change the design (e.g. rename the RunMethod() method). But then it should work.
< naywhayare>
oldbeardo: that is going to incur copying and managing every basis vector, not just the ones that are kept
< Anand_>
Marcus : Yeah, I meant the design after renaming and adding the build function. Should work, right?
< naywhayare>
udit_s: if you can provide some comments on what the tests are actually testing, that would be nice; the names of the tests are helpful but I don't completely understand what each of them aims to do
< naywhayare>
udit_s: other than that, it looks good to me. so if you make those changes, I'll start integrating it into trunk
< oldbeardo>
naywhayare: I have changed the implementation, it does not use join_rows() anymore if that's what you mean
< naywhayare>
when I integrate it into trunk, I'll probably go through and make a number of changes to make it faster
< udit_s>
naywhayare: cool. will get back to you in sometime.
< naywhayare>
oldbeardo: join_rows() is a huge cost, I am glad you've taken care of that. but for your solution to be competitive you must ensure that you are never copying the basis vector
< oldbeardo>
naywhayare: I'm not, just when I'm storing it in the CosineNode object, that's something that has to be done
< naywhayare>
I don't know what you mean by that
< oldbeardo>
naywhayare: I think you are worried about me copying the useless vectors at each step right?
< oldbeardo>
naywhayare: I'm not doing that, I'm using references to the useful vectors
< EighthOctave>
When I run 'make test', it fails with the message 'mlpack-1.0.8/src/mlpack/tests/nmf_test.cpp(271): fatal error in "SparseNMFALSTest": difference{1.10311e-05%} between w(i){0.0076869749921349793} and dw(i){0.0076869758400955075} exceeds 1.0000000000000001e-05%'
< EighthOctave>
any idea what's going on?
< naywhayare>
EighthOctave: that looks like a test tolerance issue. a lot of mlpack tests are randomized, and have some small probability of failure
< naywhayare>
it looks like in your case, the probability of failure for that test isn't small enough...
< EighthOctave>
shouldn't make test continue though?
< EighthOctave>
and not end with "fatal error"
< naywhayare>
it shouldn't continue that particular test; but it should continue the rest of the tests in the test suite
< EighthOctave>
I've run multiple times, and it always fails in that spot
< EighthOctave>
the difference is always slightly higher than the allowed tolerance
< naywhayare>
yeah; although the tests are randomized, the random seed is never set, so it is effectively fixed at compile time
< marcus_zoq>
the
< marcus_zoq>
Sorry, wrong window!
< naywhayare>
so every time you run the test, it will fail the exact same way because the RNG is always initialized the same way
< EighthOctave>
I see
< naywhayare>
I bet if you run that test only ('bin/mlpack_test -t NMFTest/SparseNMFALSTest') it'll work
< EighthOctave>
trying...
< EighthOctave>
still fails.
< oldbeardo>
naywhayare: any thoughts?
< EighthOctave>
./mlpack_test -t NMFTest/SparseNMFALSTest Running 1 test case... /home/may/lib/mlpack-1.0.8/src/mlpack/tests/nmf_test.cpp(271): fatal error in "SparseNMFALSTest": difference{3.5123%} between w(i){0.0030099676514796503} and dw(i){0.0031195351589538797} exceeds 1.0000000000000001e-05%
< naywhayare>
oldbeardo: yes I have thoughts, please hang on
< naywhayare>
EighthOctave: ok, that failed differently, at least. can you file a bug report on Trac?
< EighthOctave>
sure
< naywhayare>
EighthOctave: I think that this is not an actual issue with NMF (I suspect NMF works fine), but I think it's an issue with the test
< naywhayare>
thank you
< naywhayare>
oldbeardo: although the amortized cost of std::vector<> insert/delete operations is O(1), it is still much faster to only maintain one vector (or priority_queue) than many
< naywhayare>
the only thing I don't know is _how_ much faster it will be
< marcus_zoq>
Anand_: Looks like there is a bug in my code: if you set run: ['timing', 'metric'], the metrics were not run. So please check out the latest version. Btw, I use the following command to test the script: 'make run LOG=false CONFIG=small_config.yaml'
< naywhayare>
I'm still unclear on why you are so opposed to trying my idea
< oldbeardo>
naywhayare: I'm not opposed to it, I tried it once, and it wasn't making things easier
Anand_ has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< Anand_>
Marcus : Ok. Can you share the new config file?
< oldbeardo>
naywhayare: I sent you a mail, let me know
< naywhayare>
oldbeardo: ok, I will look through it
< Anand_>
Marcus : Ok, thanks!
< naywhayare>
EighthOctave: thank you for the bug report, I will try to find some time to look at it. could you add some information about the system you are running on please?
< EighthOctave>
naywhayare: added a few details. let me know if you need anything specific
< naywhayare>
yeah... version of gcc and boost, if you don't mind
< naywhayare>
(or clang if you're using clang)
govg has joined #mlpack
govg has quit [Ping timeout: 272 seconds]
oldbeardo has quit [Quit: Page closed]
udit_s has quit [Ping timeout: 245 seconds]
< EighthOctave>
naywhayare: info added
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
udit_s has joined #mlpack
Anand_ has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< naywhayare>
EighthOctave: great, thanks
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: I saw your mail, how do you suggest we proceed?
< naywhayare>
oldbeardo: I would like you to try the idea I've suggested, unless you see some big problem I have overlooked that makes it impossible, or if you have another solution that addresses the problems I pointed out (or justification that those problems are not important)
Anand__ has joined #mlpack
Anand_ has quit [Ping timeout: 246 seconds]
< oldbeardo>
naywhayare: all of them are valid points, especially since you are thinking of datasets that are gigabytes in size
< oldbeardo>
the first point most of all, even I don't think it's a good solution
< oldbeardo>
however, my concern with the priority queue idea is that the basis should be independent of the queue, like it is in the algorithm
< oldbeardo>
because all the implementation problems arose from that
< oldbeardo>
I will try to come up with something that is independent, but first I will try out your idea
< naywhayare>
well, but the basis isn't independent of the queue. the basis corresponds only to basis vectors of nodes that are still in the queue
< oldbeardo>
at the end of the iteration, yes. but the problem arises at step 3(c) of the algorithm
< naywhayare>
right... which is where I suggested adding a parameter to the modified gram-schmidt function that will take an additional vector if necessary
< naywhayare>
and that additional vector would be the orthonormalized vector for the left child, when you were doing step 3c for the right child
< naywhayare>
that ModifiedGramSchmidt() function is already reasonably restricted to the case of cosine trees anyway
< oldbeardo>
right, so I need to remove the AddToBasis() function, and write two versions of MGS
< naywhayare>
yeah -- fortunately, one of those MGS functions can call the other, so it shouldn't be too hard
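A rough sketch of the "two versions of MGS, one calling the other" idea, with a plain std::vector of basis vectors standing in for the cosine tree's priority queue (names and signatures are illustrative, not the project's actual code):

    #include <mlpack/core.hpp>
    #include <vector>

    // Orthonormalize newVector against the current basis (the vectors of
    // the nodes still in the queue).
    void ModifiedGramSchmidt(const std::vector<arma::vec>& basis,
                             arma::vec& newVector)
    {
      for (size_t i = 0; i < basis.size(); ++i)
        newVector -= arma::dot(basis[i], newVector) * basis[i];

      // (Degenerate case of a zero vector ignored for brevity.)
      newVector /= arma::norm(newVector, 2);
    }

    // Variant for step 3(c): also orthonormalize against one additional
    // vector -- e.g. the left child's already-orthonormalized vector while
    // the right child is being processed -- by calling the version above.
    void ModifiedGramSchmidt(const std::vector<arma::vec>& basis,
                             arma::vec& newVector,
                             const arma::vec& extraVector)
    {
      ModifiedGramSchmidt(basis, newVector);
      newVector -= arma::dot(extraVector, newVector) * extraVector;
      newVector /= arma::norm(newVector, 2);
    }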
< oldbeardo>
okay, I will get back soon, as you said we are near 15th June
< naywhayare>
again, if it isn't done by the deadline, don't worry. it's okay if things fall a little behind in the name of code quality
< naywhayare>
much better to have less high-quality, rigorously-tested and fast code than lots of lower-quality it-works-but-it's-slow code
< naywhayare>
also I guess it's not a deadline, just a schedule :)
< oldbeardo>
certainly, but I would be happy if I stick to it :)
< naywhayare>
I will do most of the tedious integration with trunk, when your code is done
< oldbeardo>
thanks for that
< oldbeardo>
naywhayare: also, apart from this issue is everything else alright?
< oldbeardo>
oh, I saw one more thing, I will also have to write two versions of MonteCarloError(), won't I?
< naywhayare>
oldbeardo: yeah, no problems. you might need two versions of MonteCarloError(), I am not sure
< naywhayare>
either way, if you do, they should be quite similar and should be able to reuse a lot of the same code
< oldbeardo>
okay, will get back tomorrow
oldbeardo has quit [Quit: Page closed]
Anand__ has quit [Ping timeout: 246 seconds]
< udit_s>
naywhayare: done.
< naywhayare>
looks good
< naywhayare>
do you want to add decision_stump/ to mlpack/methods/, and then add decision_stump_test.cpp to mlpack/tests/ and update the CMake configuration?
< udit_s>
on my end ?
< naywhayare>
yeah, commit those things to trunk/
< udit_s>
ok. hold on.
< naywhayare>
sure, I can walk you through the process of what needs to be changed, etc.
< udit_s>
I think that would be better.
< udit_s>
right now, I'm just updating the required folders on my clone of your trunk repo.
< udit_s>
what changes do I make in my decision_stump_tests.cpp ?
sumedhghaisas has joined #mlpack
< sumedhghaisas>
naywhayare: Found any bugs in the code??
< naywhayare>
udit_s: hang on, impromptu meeting... I'll get back to you shortly
< naywhayare>
sumedhghaisas: I wasn't looking for them, do you want me to look?
< naywhayare>
the website with the paper seems to be down so I can't look at it right now
< naywhayare>
but a good place to start might be to see if they have results for other datasets, and see if you are able to reproduce those other results
< sumedhghaisas>
I checked the code twice... I couldn't find any... yes, one thing...
< sumedhghaisas>
they mentioned that... I matrix...
< sumedhghaisas>
Only consider filled entries...
< naywhayare>
what is "I matrix"?
< sumedhghaisas>
naywhayare: I have added a condition so that only non-zero entries will be considered...
< sumedhghaisas>
If user 1 rates item 1, then I(1,1) will be one...
< sumedhghaisas>
except zero...
< sumedhghaisas>
naywhayare: I have the paper.. should I send it to you??
sumedhghaisas has quit [Remote host closed the connection]
sumedhghaisas has joined #mlpack
< naywhayare>
sumedhghaisas: yeah, can you email me the paper?
< sumedhghaisas>
okay... will do that right away...
< naywhayare>
thank you
< sumedhghaisas>
got my mail??
< naywhayare>
yeah
< naywhayare>
ok, so on the movielens dataset, with momentum of 0.9, we should get an RMSE of about 0.89
< naywhayare>
(from Figure 1)
< sumedhghaisas>
yes... around that...
< sumedhghaisas>
wait... in our code are we calculating RMSE??
< sumedhghaisas>
I didn't know what you need but okay :)
< naywhayare>
there's a section on "Priority Queue Iterators"
< naywhayare>
plus an example
< naywhayare>
the usual iterators have a caveat, but it shouldn't matter for us:
< naywhayare>
"Iterators do not visit heap elements in any specific order."
< naywhayare>
for gram-schmidt orthonormalization that shouldn't matter though
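A small sketch of the Boost.Heap usage being discussed; boost::heap::priority_queue exposes begin()/end() iterators, with the documented caveat that the iteration order is unspecified:

    #include <boost/heap/priority_queue.hpp>
    #include <iostream>

    int main()
    {
      boost::heap::priority_queue<double> pq;
      pq.push(0.3);
      pq.push(0.9);
      pq.push(0.1);

      // Unlike std::priority_queue, Boost.Heap containers can be iterated;
      // the elements are not visited in any particular order.
      for (boost::heap::priority_queue<double>::const_iterator it = pq.begin();
           it != pq.end(); ++it)
        std::cout << *it << std::endl;

      std::cout << "top: " << pq.top() << std::endl;
      return 0;
    }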
< oldbeardo>
right
govg has quit [Ping timeout: 264 seconds]
< naywhayare>
we might have to modify the CMake configuration to require boost.heap, but that's easy
< naywhayare>
(I'll show you how, if it's necessary)
< sumedhghaisas>
naywhayare: how exactly is J different from I??
< sumedhghaisas>
I am unable to figure out...
< naywhayare>
I'm not sure -- it just says "Let J be the indicator of A", which is quite unclear
< sumedhghaisas>
but it also says its a {0, 1} matrix of size n * m...
< naywhayare>
yeah, it's definitely an indicator of some sort. I think what it means is this --
< naywhayare>
you split your dataset into a training and test set; both matrices of size n * m
< naywhayare>
so if some A_{ij} is part of the test set, then A_{ij} in the training set will be empty
< naywhayare>
so I think J_{ij} just indicates whether or not a point is part of the test set
< naywhayare>
after you run SVD and get a reconstructed A', you compare only the test values
< oldbeardo>
thanks sumedhghaisas, naywhayare
< sumedhghaisas>
ohhh...
< sumedhghaisas>
so we have to choose some random points...
< sumedhghaisas>
and compute their indices...
< naywhayare>
oldbeardo: hopefully there isn't anything I overlooked...
< naywhayare>
sumedhghaisas: yeah, you could try picking 3 random test points for each user, then removing them from the input matrix
< naywhayare>
then calculate the RMSE on those test points, and hopefully it will be somewhere close to 0.89
< naywhayare>
I doubt it will be exactly the same, because our train/test set isn't exactly the same, but hopefully somewhere close
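A rough sketch of that sampling step (an illustration under assumptions, not the project's code): pick three random ratings per user, remember them in a sparse held-out matrix, and zero them in the training matrix. Users are taken to be columns here.

    #include <mlpack/core.hpp>

    void SampleTestSet(arma::mat& A,          // ratings matrix (modified in place)
                       arma::sp_mat& B,       // held-out ratings (output)
                       const size_t perUser = 3)
    {
      B.zeros(A.n_rows, A.n_cols);
      for (size_t user = 0; user < A.n_cols; ++user)
      {
        // Indices of the items this user actually rated.
        arma::uvec rated = arma::find(A.col(user) != 0);
        if (rated.n_elem < perUser)
          continue;

        // Pick perUser of them at random.
        rated = arma::shuffle(rated);
        for (size_t i = 0; i < perUser; ++i)
        {
          B(rated[i], user) = A(rated[i], user); // remember the held-out rating
          A(rated[i], user) = 0;                 // remove it from the training data
        }
      }
    }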
< oldbeardo>
naywhayare: I will hang around for some more time to clarify if I have questions
< sumedhghaisas>
norm = sqrt(accu(WH % WH) / nm);
< sumedhghaisas>
if (iteration != 0)
< sumedhghaisas>
{
< sumedhghaisas>
oldResidue = residue;
< sumedhghaisas>
residue = fabs(normOld - norm);
< sumedhghaisas>
residue /= normOld;
< sumedhghaisas>
}
< sumedhghaisas>
this is how we are computing residue....
< naywhayare>
right, but we're computing RMSE not residue, so this will have to happen after AMF has completed
< naywhayare>
reconstruct the data matrix after AMF is done, then calculate the RMSE only on the test points that you withheld from the data
< sumedhghaisas>
I was saying... the paper mentions calculating validation RMSE... we could indeed divide the dataset into 3 parts...
< sumedhghaisas>
and the residue will be calculated on validation set...
< sumedhghaisas>
little more expensive but less prone to overfitting...
< sumedhghaisas>
naywhayare: sound good??
< naywhayare>
yeah, but what does the API for this look like? what will we need to change?
< sumedhghaisas>
hmm... what more parameters do we need...
< naywhayare>
the validation set, or a way to create the validation set
< sumedhghaisas>
ratio between sets for one...
< naywhayare>
we now have a few ideas for how to terminate AMF, so I think we should make the termination condition a template parameter
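A sketch of the kind of design being suggested (all names here are illustrative, not mlpack's actual API): the termination condition becomes a policy class supplied as a template parameter, so a residue-based check and a validation-RMSE-based check can be swapped without touching the factorizer itself.

    // One possible termination policy: stop when the relative change in the
    // norm of W * H drops below a threshold.
    class ResidueTermination
    {
     public:
      explicit ResidueTermination(const double minResidue = 1e-5) :
          minResidue(minResidue) { }

      bool IsConverged(const double residue) const
      {
        return residue < minResidue;
      }

     private:
      double minResidue;
    };

    // The factorizer only knows that its policy has IsConverged().
    template<typename TerminationPolicy = ResidueTermination>
    class FactorizerSketch
    {
     public:
      explicit FactorizerSketch(const TerminationPolicy& policy = TerminationPolicy()) :
          policy(policy) { }

      // Inside Apply(), each iteration would end with something like:
      //   if (policy.IsConverged(residue)) break;

     private:
      TerminationPolicy policy;
    };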
< sumedhghaisas>
I think testing set can be computed without any information from the user...
< naywhayare>
well, but the user may have a testing set of their own they want to use (like the netflix dataset), or they may want to tune the size of it
< sumedhghaisas>
like set it to 10 percent... hardcoded...
< naywhayare>
for now, because we're debugging the svd with momentum, why don't you just go ahead and make the modifications necessary to sample a validation / test set in the same way that they do in the paper?
< sumedhghaisas>
how can they have a different testing set from the training set?? I am a little confused...
< naywhayare>
when you download the netflix dataset, it comes with a pre-split training set and testing set
oldbeardo has quit [Quit: Page closed]
< sumedhghaisas>
ohhh... sorry for that... you just put a zero for entries taken as testing set...
< naywhayare>
yes, but in the paper they sample very specifically -- three random ratings from each user
< sumedhghaisas>
and later compare them against W * H
< naywhayare>
yes
< sumedhghaisas>
okay, for them the testing set and validation set are the same...
< naywhayare>
yes, I think that is true
< sumedhghaisas>
can we add function parameter for this value '3'?
< sumedhghaisas>
ohh.. then separate testing set will be an issue...
< sumedhghaisas>
okay, then either a separate testing set should be provided... or a default testing set will be computed with a certain ratio...
< naywhayare>
so like I was suggesting I think it's easiest to solve this problem with templates
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< naywhayare>
either way, for now, let's just try and make it work, so don't worry about making the API nice yet
< sumedhghaisas>
yeah... thats true... I will test it with hardcoded 3...
< sumedhghaisas>
okay... how to remove the entries from the matrix ?? copying is a huge overhead...
< sumedhghaisas>
ahh... computing I(i,j)...
< naywhayare>
just setting A_{ij} = 0 should be sufficient
< sumedhghaisas>
yes...
< naywhayare>
you can hold another sparse matrix too, and just set the value of ij to the old value of A_{ij}
< sumedhghaisas>
I am thinking .... now that we have introduced I(i,j) ... how to pass it to update_rule??
< naywhayare>
no, you don't need I(i, j) for the training
< naywhayare>
the only thing you're trying to change is the termination condition, I thought
< naywhayare>
I thought your goal was to use validation RMSE
< naywhayare>
if you have some validation set B (a sparse matrix, where the only nonzero values are the validation values)
< naywhayare>
then you just iterate over the nonzero values and compare against the value of W*H for that entry
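A sketch of that computation (illustrative, with made-up names): iterate over the nonzero entries of the sparse validation matrix B and compare each one against the corresponding entry of the reconstruction W * H.

    #include <mlpack/core.hpp>
    #include <cmath>

    double ValidationRMSE(const arma::sp_mat& B,  // held-out ratings only
                          const arma::mat& W,
                          const arma::mat& H)
    {
      double sumSquaredError = 0.0;
      size_t count = 0;

      // sp_mat iterators visit only the nonzero (held-out) entries.
      for (arma::sp_mat::const_iterator it = B.begin(); it != B.end(); ++it)
      {
        const double predicted = arma::dot(W.row(it.row()), H.col(it.col()));
        sumSquaredError += std::pow(*it - predicted, 2.0);
        ++count;
      }

      return std::sqrt(sumSquaredError / count);
    }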
< sumedhghaisas>
yes... but we somehow have to delete those entries from the training data... like, ignore them while training, right??
< naywhayare>
yes, you do that just by setting A_ij = 0
< naywhayare>
that removes the entry
< naywhayare>
so suppose you're constructing this sparse validation matrix B from A
< sumedhghaisas>
yes... so now update_rule will require A(i,j)...
< naywhayare>
you pick some point A_ij to sample... then do B_ij = A_ij and A_ij = 0
< sumedhghaisas>
yes... look at equations 5 and 6...
< sumedhghaisas>
delta computation will require I matrix ... and the computation is done in update_rule...
< naywhayare>
but the I matrix is just the nonzero entries in the data matrix
< naywhayare>
so you can just perform that calculation for all nonzero entries in the matrix
< sumedhghaisas>
yes... until now I am doing the exact same thing...
< sumedhghaisas>
but now we have testing points in that data...
< naywhayare>
no, you subtracted the testing points from the data when you set A_ij = 0 for all testing points (i, j)
arcane has joined #mlpack
< naywhayare>
arcane: I didn't realize you submitted patches for all of #350; trac didn't send an email indicating you attached all of those
< naywhayare>
it only sent me one for the comment, and I thought "huh, well, I guess good thing he pointed that out" without realizing you'd done it all already :)
< arcane>
naywhayare, oh I wonder why that happened. Good you got one mail at least :)
< arcane>
are the changes correct ?
< naywhayare>
haven't looked yet; the ticket is quite simple, so they're probably right
< naywhayare>
that'll be a big API change, so I should bump the version I release next to 1.1.0 instead of 1.0.9
< sumedhghaisas>
naywhayare: okay, so we are also making changes to the data matrix... I thought changes to I(i,j) alone would suffice...
< sumedhghaisas>
then either we have to copy the data or remove the const...
< naywhayare>
remove the const for now
< naywhayare>
we can figure out a better solution later. I think a better priority for now is just to make the algorithm
< naywhayare>
*just to make the algorithm work
< arcane>
well, I am curious. how does it matter if it is 1.0.9 or 1.1.0? I thought all 1.0.9 meant was the next higher version, or maybe I am wrong
< sumedhghaisas>
yeah... you are right...
< naywhayare>
so not everyone agrees on this, but usually major version bumps (1.x.x -> 2.x.x) are reserved for major changes; minor version bumps (1.0.x -> 1.1.x) are reserved for smaller API changes; and patch bumps (1.0.8 -> 1.0.9) are reserved for fixes or improvements that don't have any API changes
< arcane>
ah ok !
< naywhayare>
like I said, not everyone agrees on how it should be done
< naywhayare>
and also mlpack's API has been horrendously unstable, too. I've been meaning to finalize some of the abstractions, but it's never easy to do that...
< jenkins-mlpack>
Starting build #1945 for job mlpack - svn checkin test (previous build: SUCCESS)
< arcane>
yup ... it may not be easy to keep the API simple enough all the time
< arcane>
naywhayare, I was looking at #320. It explains that we cannot instantiate BinarySpaceTree<BallBound<> > because of the lack of a Metric() function and so on. Looking through the code of BinarySpaceTree, I found that it uses bound.Diameter(). BallBound does not implement this function. I imagine it would be just twice the radius?
< naywhayare>
yeah, just twice the radius
< arcane>
so implementing this should be simple enough. This along with the fix for MetricType should make ballbound usable with BST
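The fix arcane describes is essentially one accessor; a standalone sketch of the idea (not the actual mlpack BallBound source):

    #include <mlpack/core.hpp>

    // A ball bound is a center plus a radius; the diameter that
    // BinarySpaceTree asks for via bound.Diameter() is just twice the radius.
    class BallBoundSketch
    {
     public:
      BallBoundSketch(const arma::vec& center, const double radius) :
          center(center), radius(radius) { }

      double Radius() const { return radius; }
      double Diameter() const { return 2.0 * radius; }

     private:
      arma::vec center;
      double radius;
    };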
udit_s has quit [Quit: Leaving]
< arcane>
ok
< naywhayare>
yeah; it should be pretty straightforward. one of the things I need to think about is that the BinarySpaceTree can't actually support arbitrary metrics
< naywhayare>
it can only support LMetric<> (Euclidean space metrics)
< naywhayare>
so the abstractions will need to change a little bit, but your fix for now will be just fine, and the refactoring of abstractions will probably be unrelated to your changes anyway
< jenkins-mlpack>
* Ryan Curtin: Remove leafSize parameter from DTB constructor.
< sumedhghaisas>
naywhayare: In all this we forgot the fact that with their parameters my code diverges in 4 iterations... :(
< sumedhghaisas>
I have implemented all the necessary requirements now...
< sumedhghaisas>
but the output is this...
< sumedhghaisas>
I am printing the residue of all the iterations...
< sumedhghaisas>
0: inf
< sumedhghaisas>
1: 1.27 * e+60
< sumedhghaisas>
2: inf
< sumedhghaisas>
2.7 * e250
< sumedhghaisas>
the last value is the RMSE
< sumedhghaisas>
my god... thats a really big value...
< sumedhghaisas>
naywhayare: sorry the last value is square of RMSE...
< naywhayare>
sumedhghaisas: for the residue to be inf, normOld must be 0
< sumedhghaisas>
ohh ignore the first one...
< naywhayare>
right, but on iteration 2 it happens again
< sumedhghaisas>
yes...
< naywhayare>
this means the update rule must be returning either W = 0 or H = 0
< naywhayare>
in iteration 1
< sumedhghaisas>
not necessarily zero... can be some small finite amount right??
< naywhayare>
it's have to be extremely small
< naywhayare>
*it'd have to be
< naywhayare>
you should investigate whether W or H is very close to 0, and find what situation led to that
< sumedhghaisas>
if I decrease momentum and step... everything works fine...
< naywhayare>
well, sure, but we should find out exactly why that is happening for the given choice of momentum and step
< sumedhghaisas>
okay, I will check what's happening to W and H... can you take a look at my update rule implementation and check it against the paper's equations?? I think it's correct but I may have missed a minor point...
< naywhayare>
yes, I will do that now
< naywhayare>
I'm looking at svd_batchlearning.hpp, and equations 5 and 6 in the paper
< sumedhghaisas>
yeah... correct...
< naywhayare>
shouldn't line 64 be -= not +=?
< naywhayare>
same with line 99
< naywhayare>
wait, I see deltaW and deltaH are the negative gradients, not the positive gradients
< naywhayare>
so I think that's correct as it is
< sumedhghaisas>
yeah...
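For reference, a rough sketch of the kind of momentum update under discussion (a paraphrase of equations 5 and 6 as they come up here, with regularization and any scaling omitted; not the actual svd_batchlearning.hpp code). deltaW holds the accumulated negative-gradient step, which is why it is added to W rather than subtracted.

    #include <mlpack/core.hpp>

    void WUpdate(const arma::sp_mat& V,   // ratings, held-out entries zeroed
                 arma::mat& W,
                 const arma::mat& H,
                 arma::mat& deltaW,       // carried over between iterations
                 const double stepSize,
                 const double momentum)
    {
      // Keep a momentum-weighted fraction of the previous step.
      deltaW *= momentum;

      // Only the filled (nonzero) entries of V contribute -- this plays the
      // role of the indicator matrix I in the paper.
      for (arma::sp_mat::const_iterator it = V.begin(); it != V.end(); ++it)
      {
        const double error = *it - arma::dot(W.row(it.row()), H.col(it.col()));
        deltaW.row(it.row()) += stepSize * error * H.col(it.col()).t();
      }

      // deltaW is the negative-gradient step, so it is added, not subtracted.
      W += deltaW;
    }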
< sumedhghaisas>
okay, this is weird... let me paste the output here...
< sumedhghaisas>
3.0236e+05 3.0236e+05
< sumedhghaisas>
1.8642e+05 1.8667e+05
< sumedhghaisas>
2.9620e+02 3.1039e+02
< sumedhghaisas>
inf
< sumedhghaisas>
-8.8345e+22 -8.8377e+22
< sumedhghaisas>
-5.4506e+22 -5.4525e+22
< sumedhghaisas>
-8.8620e+19 -8.8651e+19
< sumedhghaisas>
1.26008e+60
< sumedhghaisas>
4.8000e+125 4.8017e+125
< sumedhghaisas>
2.9614e+125 2.9625e+125
< sumedhghaisas>
4.8149e+122 4.8167e+122
< sumedhghaisas>
inf
< sumedhghaisas>
4.35836e+250
< sumedhghaisas>
I am printing W matrix...
< sumedhghaisas>
these look like sudden jumps...
< naywhayare>
how about the H matrix?
< sumedhghaisas>
It's just too large to tell anything from it...
< naywhayare>
yeah, that's definitely diverging in a bad way
< naywhayare>
but knowing what the H matrix is may clarify things
< naywhayare>
I have to leave, sorry... back in 10/15 minutes
< sumedhghaisas>
okay...
< naywhayare>
alright, back
< naywhayare>
guess it took a little longer than I thought
< sumedhghaisas>
still puzzled by the sudden jumps in the values... :( I printed H... but it made no sense... so I printed its sum...
< sumedhghaisas>
same there... sudden jumps...
< naywhayare>
very large numbers?
< naywhayare>
or very small?
< sumedhghaisas>
very large...
< naywhayare>
ok; what about the norm of W*H?
< sumedhghaisas>
I should have printed the average... wait...
< sumedhghaisas>
okay here is the output...
< sumedhghaisas>
naywhayare:
< sumedhghaisas>
W value :
< sumedhghaisas>
3.0251e+05 3.0246e+05
< sumedhghaisas>
1.8674e+05 1.8641e+05
< sumedhghaisas>
2.6779e+02 2.6383e+02
< sumedhghaisas>
Average H : 9221419744100
< sumedhghaisas>
Norm WH : 1.11032e+13
< sumedhghaisas>
inf
< sumedhghaisas>
W value :
< sumedhghaisas>
-8.8617e+22 -8.8564e+22
< sumedhghaisas>
-5.4659e+22 -5.4627e+22
< sumedhghaisas>
-7.7873e+19 -7.7827e+19
< sumedhghaisas>
Average H : 0
< sumedhghaisas>
Norm WH : 1.40924e+73
< sumedhghaisas>
1.26922e+60
< sumedhghaisas>
W value :
< sumedhghaisas>
4.8742e+125 4.8713e+125
< sumedhghaisas>
3.0064e+125 3.0046e+125
< sumedhghaisas>
4.2833e+122 4.2807e+122
< sumedhghaisas>
Average H : 4610722377450
< sumedhghaisas>
Norm WH : inf
< sumedhghaisas>
inf
< sumedhghaisas>
1.48409e+251
< naywhayare>
ok, so the norm is actually overflowing, not being divided by 0
< sumedhghaisas>
yes... but H is zero in the second iteration...
< naywhayare>
how are you calculating the average?
< sumedhghaisas>
sum divided by number of entries... simple average...
< naywhayare>
take the abs of each entry
< sumedhghaisas>
ahh...
< naywhayare>
see if the average H changes to something not zero
< naywhayare>
that will tell us a little bit more about what's happening
< sumedhghaisas>
you are right... not zero now...
< sumedhghaisas>
W value :
< sumedhghaisas>
3.0251e+05 3.0236e+05
< sumedhghaisas>
1.8659e+05 1.8653e+05
< sumedhghaisas>
2.7447e+02 2.6797e+02
< sumedhghaisas>
Average H : 24996606
< sumedhghaisas>
Norm WH : 1.10941e+13
< sumedhghaisas>
inf
< sumedhghaisas>
W value :
< sumedhghaisas>
-8.8475e+22 -8.8434e+22
< sumedhghaisas>
-5.4578e+22 -5.4552e+22
< sumedhghaisas>
-7.9344e+19 -7.9307e+19
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : 1.40195e+73
< sumedhghaisas>
1.26369e+60
< sumedhghaisas>
W value :
< sumedhghaisas>
4.8307e+125 4.8285e+125
< sumedhghaisas>
2.9799e+125 2.9785e+125
< sumedhghaisas>
4.3322e+122 4.3302e+122
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : inf
< sumedhghaisas>
inf
< sumedhghaisas>
5.464e+250
< sumedhghaisas>
average H didn't change??
< naywhayare>
so if you vary the momentum parameter to something that makes it converge (a smaller value?) what happens then?
< sumedhghaisas>
lets try 0.2...
< sumedhghaisas>
better see what happens for 0..
< sumedhghaisas>
W value :
< sumedhghaisas>
3.0230e+05 3.0265e+05
< sumedhghaisas>
1.8653e+05 1.8660e+05
< sumedhghaisas>
2.9095e+02 3.0353e+02
< sumedhghaisas>
Average H : 25012053
< sumedhghaisas>
Norm WH : 1.11027e+13
< sumedhghaisas>
inf
< sumedhghaisas>
W value :
< sumedhghaisas>
-8.8544e+22 -8.8627e+22
< sumedhghaisas>
-5.4614e+22 -5.4665e+22
< sumedhghaisas>
-8.7011e+19 -8.7093e+19
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : 1.40898e+73
< sumedhghaisas>
1.26904e+60
< sumedhghaisas>
W value :
< sumedhghaisas>
4.8690e+125 4.8735e+125
< sumedhghaisas>
3.0032e+125 3.0060e+125
< sumedhghaisas>
4.7847e+122 4.7892e+122
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : inf
< sumedhghaisas>
inf
< sumedhghaisas>
8.1877e+250
< sumedhghaisas>
no drastic change...
< sumedhghaisas>
and average H getting saturated...
< sumedhghaisas>
or something...
< sumedhghaisas>
momentum is not making significant contribution in this case...
< naywhayare>
okay, but this converges when you use a smaller learning parameter?
< sumedhghaisas>
yes... okay let me find that exactly...
< sumedhghaisas>
with 0.0000001 definitely converging... getting final residue less than e-8...
< sumedhghaisas>
naywhayare: diverging in 4 iterations... for 0.000001
< naywhayare>
okay
< naywhayare>
let me do just a bit of reading
< sumedhghaisas>
okay this may help... 0.0000003...
< sumedhghaisas>
giving almost same results in no time...
< sumedhghaisas>
same as 0.0000001...
< sumedhghaisas>
my god... it's less than NMF...
< naywhayare>
so it's well-known for any gradient descent type method, the choice of step size (or learning rate) can make a huge difference
< naywhayare>
when the step size is too large, the gradient descent method may step completely over the minimum and then diverge
< naywhayare>
but if it's too small it'll take a long time
< naywhayare>
this seems to be exactly what you are experiencing
< sumedhghaisas>
and again diverging for 0.0000004... in just 4 iterations...
< sumedhghaisas>
so 0.0000003 is just perfect...
< sumedhghaisas>
yeah... same kind of experience...
< sumedhghaisas>
I don't know how 0.0002 works...
< naywhayare>
it seems to me like your implementation is fine
< naywhayare>
it does converge, after all, when the step size is small enough
< sumedhghaisas>
yes... but then what about results ... we are off by miles...
< naywhayare>
for validation RMSE?
< sumedhghaisas>
*paper
< sumedhghaisas>
parameter
< sumedhghaisas>
not even converging with their parameters...
< naywhayare>
if you use their momentum of 0.9, can you find a learning rate that will converge?
< sumedhghaisas>
okay let me see..
< sumedhghaisas>
with 0.0000003 and 0.9 final residue is 0.23... nothing compared to e-8... but still...
< naywhayare>
try making the learning rate smaller
< sumedhghaisas>
not good with 0.0000001 either... initial performance is good...
< sumedhghaisas>
so I guess momentum should be decreased with time..
< sumedhghaisas>
0.00000001 and 0.9... final residue of e-4... took 78 sec... too much time...
< sumedhghaisas>
I think momentum of 0.9 is too large...
< naywhayare>
alright
< naywhayare>
are you calculating the validation RMSE?
< sumedhghaisas>
no... I have commented that code for testing... lets check that...
< naywhayare>
yeah; let's see if we can produce validation RMSE numbers as good as the ones in the paper
< naywhayare>
and if you can, I think we can say the algorithm's working, and their implementation's parameters seem to behave differently than ours
< sumedhghaisas>
oohh... I have to choose the numbers from the non-zero entries... didn't consider that...
< sumedhghaisas>
naywhayare: I am really sleepy now... I will do it first thing after I wake up... is that fine??