naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
andrewmw94 has left #mlpack []
sumedhghaisas has joined #mlpack
< jenkins-mlpack> Starting build #1944 for job mlpack - svn checkin test (previous build: SUCCESS)
udit_s has joined #mlpack
< jenkins-mlpack> Project mlpack - svn checkin test build #1944: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1944/
< jenkins-mlpack> Ryan Curtin: Inline the simple R tree descent heuristic.
udit_s has quit [Ping timeout: 245 seconds]
sumedhghaisas has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Read error: Connection reset by peer]
andrewmw94 has joined #mlpack
< andrewmw94> 21:41 < naywhayare> andrewmw94: in looking through your code, I noticed lots of java-like idioms that I think won't work at runtime in C++. if you want me to look through these at some point and provide some feedback, I can do that
< andrewmw94> 21:41 < naywhayare> but otherwise I'll wait until you say it's finished
< andrewmw94> 21:41 < naywhayare> so that I don't comment on things you already know about and have made a mental note to redo
< andrewmw94> naywhayare: yeah, I wrote it quickly in quasi-C++ while reading the paper.
< andrewmw94> I'm going to go over it all again and look for bugs and invalid C++ code
oldbeardo has joined #mlpack
< naywhayare> andrewmw94: ok; let me know if you need any help
< naywhayare> oldbeardo: the solution I proposed won't be generic, but it wouldn't be generic anyway because you're taking the CosineNodeQueue as an argument
< oldbeardo> naywhayare: I did what I thought would be a good solution, I think I mentioned it later yesterday
< oldbeardo> I will send you the code in some time, hopefully it will be okay with you :)
< oldbeardo> look at my message at 17:06
EighthOctave has joined #mlpack
< naywhayare> yes, I saw what your solution was, but it's going to be slower because it has to maintain the considerBasis bool vector
< oldbeardo> naywhayare: won't that cost be trivial? also, the priority queue method had some other problem, I had implemented it
< naywhayare> the size of considerBasis is, at maximum, the number of nodes in the tree -- which is, at maximum, the number of columns in the dataset
< naywhayare> you have to iterate over that entire vector each time
< naywhayare> although it is true that, early on in the tree building process, the overhead is not much, if a user asks for a very close approximation of the svd, your algorithm will be noticeably slower
< oldbeardo> well, I still think that the bool vector cost would be trivial in comparison to the other calculations
< naywhayare> ok, if you want another justification, it adds needless complexity to the code; it's not necessary to maintain that vector because all of the basis vectors that can be a part of V correspond exactly to the basis vectors of all of the nodes in the priority queue
< oldbeardo> true, but iterating over the priority queue is just as complicated
< oldbeardo> there's no support for random access, as it is implemented as a heap
< naywhayare> but... you don't need random access. you access it sequentially
< oldbeardo> how?
Anand_ has joined #mlpack
udit_s has joined #mlpack
< oldbeardo> if you are thinking in terms of a normal for loop, you will still need to provide an index for access, which is not an option
< naywhayare> I had not realized that priority_queue doesn't provide an iterator. hang on...
< oldbeardo> the only other option is to copy the queue for each computation that you need
< naywhayare> no, that's not the only other option. the priority queue takes a container template type; by default this is vector<T>
< Anand_> Marcus : The design looks good and the idea of keeping the timing and other metrics separate is also fine.
< naywhayare> create the vector<T> object and then create the priority_queue with the constructor that takes an existing vector object
< naywhayare> then when you need to iterate over the priority queue, iterate over the vector
< Anand_> Marcus : I think we need to call the metrics function somewhere. Right?
< naywhayare> the priority queue isn't an actual container; it's an "adapter" that just works with an underlying container
< marcus_zoq> Anand_: Okay, great, if you set run: ['metric'] in the config file the script calls the 'RunMetrics()' function.
< Anand_> Cool!
< Anand_> Will the current design work for other libraries too?
< udit_s> naywhayare: hello !
< naywhayare> udit_s: I took a look at your code last night
< naywhayare> only a few comments. instead of having all the ancillary csv files, can you encode the datasets directly into the tests because they're so simple?
< udit_s> okay. what else ?
< oldbeardo> naywhayare: ummm, okay, I still feel that the basis should be kept separate
< marcus_zoq> Anand_: We need to change the design (e.g. rename the RunMethod() method). But then it should work.
< naywhayare> oldbeardo: that is going to incur copying and managing every basis vector, not just the ones that are kept
< Anand_> Marcus : Yeah, I meant the design after renaming and adding the build function. Should work, right?
< naywhayare> udit_s: if you can provide some comments on what the tests are actually testing, that would be nice; the names of the test are helpful but I don't completely understand what each of them aims to do
< naywhayare> udit_s: other than that, it looks good to me. so if you make those changes, I'll start integrating it into trunk
< oldbeardo> naywhayare: I have changed the implementation, it does not use join_rows() anymore if that's what you mean
< naywhayare> when I integrate it into trunk, I'll probably go through and make a number of changes to make it faster
< udit_s> naywhayare: cool. will get back to you in sometime.
< naywhayare> oldbeardo: join_rows() is a huge cost, I am glad you've taken care of that. but for your solution to be competitive you must ensure that you are never copying the basis vector
< oldbeardo> naywhayare: I'm not, just when I'm storing it in the CosineNode object, that's something that has to be done
< naywhayare> I don't know what you mean by that
< oldbeardo> naywhayare: I think you are worried about me copying the useless vectors at each step right?
< oldbeardo> naywhayare: I'm not doing that, I'm using the references, to the useful vectors
< EighthOctave> When I run 'make test', it fails with the message 'mlpack-1.0.8/src/mlpack/tests/nmf_test.cpp(271): fatal error in "SparseNMFALSTest": difference{1.10311e-05%} between w(i){0.0076869749921349793} and dw(i){0.0076869758400955075} exceeds 1.0000000000000001e-05%'
< EighthOctave> any idea what's going on?
< naywhayare> EighthOctave: that looks like a test tolerance issue. a lot of mlpack tests are randomized, and have some small probability of failure
< naywhayare> it looks like in your case, the probability of failure for that test isn't small enough...
< EighthOctave> shouldn't make test continue though?
< EighthOctave> and not end with "fatal error"
< naywhayare> it shouldn't continue that particular test; but it should continue the rest of the tests in the test suite
< EighthOctave> I've run multiple times, and it always fails in that spot
< EighthOctave> the difference is always slightly higher than the allowed tolerance
< naywhayare> yeah; although the tests are randomized, the random seed is never set at runtime, so the seed is effectively fixed at compile time
< marcus_zoq> the
< marcus_zoq> Sorry, wrong window!
< naywhayare> so every time you run the test, it will fail the exact same way because the RNG is always initialized the same way
< EighthOctave> I see
< naywhayare> I bet if you run that test only ('bin/mlpack_test -t NMFTest/SparseNMFALSTest') it'll work
< EighthOctave> trying...
< EighthOctave> still fails.
< oldbeardo> naywhayare: any thoughts?
< EighthOctave> ./mlpack_test -t NMFTest/SparseNMFALSTest Running 1 test case... /home/may/lib/mlpack-1.0.8/src/mlpack/tests/nmf_test.cpp(271): fatal error in "SparseNMFALSTest": difference{3.5123%} between w(i){0.0030099676514796503} and dw(i){0.0031195351589538797} exceeds 1.0000000000000001e-05%
< naywhayare> oldbeardo: yes I have thoughts, please hang on
< naywhayare> EighthOctave: ok, that failed differently, at least. can you file a bug report on Trac?
< EighthOctave> sure
< naywhayare> EighthOctave: I think that this is not an actual issue with NMF (I suspect NMF works fine), but I think it's an issue with the test
< naywhayare> thank you
< naywhayare> oldbeardo: although the amortized cost of std::vector<> insert/delete operations is O(1), it is still much faster to only maintain one vector (or priority_queue) than many
< naywhayare> the only thing I don't know is _how_ much faster it will be
< marcus_zoq> Anand_: Looks like there is a bug in my code, if you set run: ['timing', 'metric'] we did not run the 'metrics'. So please checkout the latest version. Btw I use the following command to test the script: 'make run LOG=false CONFIG=small_config.yaml'
< naywhayare> I'm still unclear on why you are so opposed to trying my idea
< oldbeardo> naywhayare: I'm not opposed to it, I tried it once, and it wasn't making things easier
Anand_ has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< Anand_> Marcus : Ok. Can you share the new config file?
< marcus_zoq> Anand_: It's in your branch: https://github.com/zoq/benchmarks/blob/gsoc14/small_config.yaml
< oldbeardo> naywhayare: I sent you a mail, let me know
< naywhayare> oldbeardo: ok, I will look through it
< Anand_> Marcus : Ok, thanks!
< naywhayare> EighthOctave: thank you for the bug report, I will try to find some time to look at it. could you add some information about the system you are running on please?
< EighthOctave> naywhayare: added a few details. let me know if you need anything specific
< naywhayare> yeah... version of gcc and boost, if you don't mind
< naywhayare> (or clang if you're using clang)
govg has joined #mlpack
govg has quit [Ping timeout: 272 seconds]
oldbeardo has quit [Quit: Page closed]
udit_s has quit [Ping timeout: 245 seconds]
< EighthOctave> naywhayare: info added
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
udit_s has joined #mlpack
Anand_ has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< naywhayare> EighthOctave: great, thanks
oldbeardo has joined #mlpack
< oldbeardo> naywhayare: I saw your mail, how do you suggest we proceed?
< naywhayare> oldbeardo: I would like you to try the idea I've suggested, unless you see some big problem I have overlooked that makes it impossible, or if you have another solution that addresses the problems I pointed out (or justification that those problems are not important)
Anand__ has joined #mlpack
Anand_ has quit [Ping timeout: 246 seconds]
< oldbeardo> naywhayare: all of them are valid points, especially since you are thinking of datasets which are GBs in size
< oldbeardo> the first point most of all, even I don't think it's a good solution
< oldbeardo> however, my concern with the priority queue idea is that the basis should be independent of the queue, like it is in the algorithm
< oldbeardo> because all implementation problems were present because of that
< oldbeardo> I will try to come up with something that is independent, but first I will try out your idea
< naywhayare> well, but the basis isn't independent of the queue. the basis corresponds only to basis vectors of nodes that are still in the queue
< oldbeardo> at the end of the iteration, yes. but the problem arises at step 3(c) of the algorithm
< naywhayare> right... which is where I suggested adding a parameter to the modified gram-schmidt function that will take an additional vector if necessary
< naywhayare> and that additional vector would be the orthonormalized vector for the left child, when you were doing step 3c for the right child
< naywhayare> that ModifiedGramSchmidt() function is already reasonably restricted to the case of cosine trees anyway
< oldbeardo> right, so I need to remove the AddToBasis() function, and write two versions of MGS
< naywhayare> yeah -- fortunately, one of those MGS functions can call the other, so it shouldn't be too hard
< oldbeardo> okay, I will get back soon, as you said we are near 15th June
< naywhayare> again, if it isn't done by the deadline, don't worry. it's okay if things fall a little behind in the name of code quality
< naywhayare> much better to have less high-quality, rigorously-tested and fast code than lots of lower-quality it-works-but-it's-slow code
< naywhayare> also I guess it's not a deadline, just a schedule :)
< oldbeardo> certainly, but I would be happy if I stick to it :)
< naywhayare> I will do most of the tedious integration with trunk, when your code is done
< oldbeardo> thanks for that
< oldbeardo> naywhayare: also, apart from this issue is everything else alright?
< oldbeardo> oh, I saw one more thing, I will also have to write two versions of MonteCarloError(), won't I?
< naywhayare> oldbeardo: yeah, no problems. you might need two versions of MonteCarloError(), I am not sure
< naywhayare> either way, if you do, they should be quite similar and should be able to reuse a lot of the same code
< oldbeardo> okay, will get back tomorrow
oldbeardo has quit [Quit: Page closed]
Anand__ has quit [Ping timeout: 246 seconds]
< udit_s> naywhayare: done.
< naywhayare> looks good
< naywhayare> do you want to add decision_stump/ to mlpack/methods/, and then add decision_stump_test.cpp to mlpack/tests/ and update the CMake configuration?
< udit_s> on my end ?
< naywhayare> yeah, commit those things to trunk/
< udit_s> ok. hold on.
< naywhayare> sure, I can walk you through the process of what needs to be changed, etc.
< udit_s> I think that would be better.
< udit_s> right now, I'm just updating the required folders on my clone of your trunk repo.
< udit_s> what changes do I make in my decision_stump_tests.cpp ?
sumedhghaisas has joined #mlpack
< sumedhghaisas> naywhayare: Found any bugs in the code??
< naywhayare> udit_s: hang on, impromptu meeting... I'll get back to you shortly
< naywhayare> sumedhghaisas: I wasn't looking for them, do you want me to look?
< naywhayare> the website with the paper seems to be down so I can't look at it right now
< naywhayare> but a good place to start might be to see if they have results for other datasets, and see if you are able to reproduce those other results
< sumedhghaisas> I checked the code twice... I couldn't find any... yes, one thing...
< sumedhghaisas> they mentioned that... I matrix...
< sumedhghaisas> Only consider filled entries...
< naywhayare> what is "I matrix"?
< sumedhghaisas> naywhayare: I have added condition such that non-zero entries will be considered...
< sumedhghaisas> If user 1 rates item 1 then I(1,1) will be one...
< sumedhghaisas> otherwise zero...
< sumedhghaisas> naywhayare: I have the paper.. should I send it to you??
sumedhghaisas has quit [Remote host closed the connection]
sumedhghaisas has joined #mlpack
< naywhayare> sumedhghaisas: yeah, can you email me the paper?
< sumedhghaisas> okay... will do that right away...
< naywhayare> thank you
< sumedhghaisas> got my mail??
< naywhayare> yeah
< naywhayare> ok, so on the movielens dataset, with momentum of 0.9, we should get an RMSE of about 0.89
< naywhayare> (from Figure 1)
< sumedhghaisas> yes... around that...
< sumedhghaisas> wait... in our code are we calculating RMSE??
< naywhayare> no idea
oldbeardo has joined #mlpack
< naywhayare> our code calculates residue (|| A - \hat{A} ||_F^2)
< naywhayare> that looks different than their formula for RMSE
< sumedhghaisas> yes...
< sumedhghaisas> my god...
< naywhayare> haha
< naywhayare> I'm trying to figure out how we can obtain J_{ij} from our dataset
< sumedhghaisas> you mean I_{ij}??
< naywhayare> no, J_{ij}, which is used in equation 1 (on the first page)
< naywhayare> I_{ij} is just whether or not the object j was scored by user i
< oldbeardo> naywhayare: you talked about using an instantiated vector object for the priority queue, could you give me an example? I couldn't find it here: http://www.cplusplus.com/reference/queue/priority_queue/priority_queue/
< naywhayare> hm. (for the movielens dataset) "we randomly select three scores from each user as test data"
< sumedhghaisas> I and J look same to me...
< naywhayare> that makes things more complicated and I'm not sure if we'll be able to replicate their result for the movielens dataset
< naywhayare> oldbeardo: line 32 uses the constructor that takes a comparison function, but we want to pass in a container too, so something like
< naywhayare> std::vector<T> vector;
< naywhayare> std::priority_queue<T, std::vector<T> > pq(comparison_function, vector);
< naywhayare> where I guess comparison_function should be a function pointer to CosineNode::operator<() (or something like that ?)
< oldbeardo> currently I'm using this
< oldbeardo> typedef std::priority_queue<CosineNode*, std::vector<CosineNode*>, CompareCosineNode> CosineNodeQueue;
< naywhayare> right, that looks correct
< naywhayare> then I guess you'd need
< naywhayare> std::vector<CosineNode*> vec; /* or some other name */
< naywhayare> CosineNodeQueue queue(&CompareCosineNode /* I think? might need to change just a little */, vec);
< naywhayare> wait, hang on, there's an issue:
< sumedhghaisas> I generally use boost heap as priority queue... read somewhere that it is faster...
< naywhayare> "A priority_queue keeps internally a comparing function and a container object as data, which are copies of comp and ctnr respectively."
< naywhayare> so I think my idea with priority_queue isn't going to work
< oldbeardo> that sucks
< naywhayare> huh, thanks sumedh; it looks like boost::priority_queue does what we need -- and provides iterators too
< oldbeardo> great! uses the same kind of interface?
< naywhayare> seems similar enough -- http://www.boost.org/doc/libs/1_55_0/doc/html/heap.html
< sumedhghaisas> I didn't know what you need but okay :)
< naywhayare> there's a section on "Priority Queue Iterators"
< naywhayare> plus an example
< naywhayare> the usual iterators have a caveat, but it shouldn't matter for us:
< naywhayare> "Iterators do not visit heap elements in any specific order."
< naywhayare> for gram-schmidt orthonormalization that shouldn't matter though
< oldbeardo> right
govg has quit [Ping timeout: 264 seconds]
< naywhayare> we might have to modify the CMake configuration to require boost.heap, but that's easy
< naywhayare> (I'll show you how, if it's necessary)
< sumedhghaisas> naywhayare: how exactly J is different than I??
< sumedhghaisas> I am unable to figure out...
< naywhayare> I'm not sure -- it just says "Let J be the indicator of A", which is quite unclear
< sumedhghaisas> but it also says its a {0, 1} matrix of size n * m...
< naywhayare> yeah, it's definitely an indicator of some sort. I think what it means is this --
< naywhayare> you split your dataset into a training and test set; both matrices of size n * m
< naywhayare> so if some A_{ij} is part of the test set, then A_{ij} in the training set will be empty
< naywhayare> so I think J_{ij} just indicates whether or not a point is part of the test set
< naywhayare> after you run SVD and get a reconstructed A', you compare only the test values
< oldbeardo> thanks sumedhghaisas, naywhayare
< sumedhghaisas> ohhh...
< sumedhghaisas> so we have to choose some random points...
< sumedhghaisas> and compute its index...
< naywhayare> oldbeardo: hopefully there isn't anything I overlooked...
< naywhayare> sumedhghaisas: yeah, you could try picking 3 random test points for each user, then removing them from the input matrix
< naywhayare> then calculate the RMSE on those test points, and hopefully it will be somewhere close to 0.89
< naywhayare> I doubt it will be exactly the same, because our train/test set isn't exactly the same, but hopefully somewhere close
< oldbeardo> naywhayare: I will hang around for some more time to clarify if I have questions
< sumedhghaisas> norm = sqrt(accu(WH % WH) / nm);
< sumedhghaisas> if (iteration != 0)
< sumedhghaisas> {
< sumedhghaisas> oldResidue = residue;
< sumedhghaisas> residue = fabs(normOld - norm);
< sumedhghaisas> residue /= normOld;
< sumedhghaisas> }
< sumedhghaisas> this is how we are computing residue....
< naywhayare> right, but we're computing RMSE not residue, so this will have to happen after AMF has completed
< naywhayare> reconstruct the data matrix after AMF is done, then calculate the RMSE only on the test points that you withheld from the data
< sumedhghaisas> I was saying... paper mentions calculating validation RMSE... we can divide dataset into 3 parts indeed...
< sumedhghaisas> and the residue will be calculated on validation set...
< sumedhghaisas> little more expensive but less prone to overfitting...
< sumedhghaisas> naywhayare: sound good??
< naywhayare> yeah, but what does the API for this look like? what will we need to change?
< sumedhghaisas> humm... what more parameters we need...
< naywhayare> the validation set, or a way to create the validation set
< sumedhghaisas> ratio between sets for one...
< naywhayare> we now have a few ideas for how to terminate AMF, so I think we should make the termination condition a template parameter
< sumedhghaisas> I think testing set can be computed without any information from the user...
< naywhayare> well, but the user may have a testing set of their own they want to use (like the netflix dataset), or they may want to tune the size of it
< sumedhghaisas> like set it to 10 percent... hardcoded...
< naywhayare> for now, because we're debugging the svd with momentum, why don't you just go ahead and make the modifications necessary to sample a validation / test set in the same way that they do in the paper?
< sumedhghaisas> how can they have different testing set than training set?? I am a little confused...
< naywhayare> when you download the netflix dataset, it comes with a pre-split training set and testing set
oldbeardo has quit [Quit: Page closed]
< sumedhghaisas> ohhh... sorry for that... you just put a zero for entries taken as testing set...
< naywhayare> yes, but in the paper they sample very specifically -- three random ratings from each user
< sumedhghaisas> and later compare them against W * H
< naywhayare> yes
< sumedhghaisas> okay for them testing set and validation set are the same...
< naywhayare> yes, I think that is true
< sumedhghaisas> can we add function parameter for this value '3'?
< sumedhghaisas> ohh.. then separate testing set will be an issue...
< sumedhghaisas> okay then either separate testing set should be provided... or default testing set will be computed with certain ratio...
< naywhayare> so like I was suggesting I think it's easiest to solve this problem with templates
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< naywhayare> either way, for now, let's just try and make it work, so don't worry about making the API nice yet
< sumedhghaisas> yeah... thats true... I will test it with hardcoded 3...
< sumedhghaisas> okay... how to remove the entries from the matrix ?? copying is a huge overhead...
< sumedhghaisas> ahh... computing I(i,j)...
< naywhayare> just setting A_{ij} = 0 should be sufficient
< sumedhghaisas> yes...
< naywhayare> you can hold another sparse matrix too, and just set the value of ij to the old value of A_{ij}
< sumedhghaisas> I am thinking .... now that we have introduced I(i,j) ... how to pass it to update_rule??
< naywhayare> no, you don't need I(i, j) for the training
< naywhayare> the only thing you're trying to change is the termination condition, I thought
< naywhayare> I thought your goal was to use validation RMSE
< naywhayare> if you have some validation set B (a sparse matrix, where the only nonzero values are the validation values)
< naywhayare> then you just iterate over the nonzero values and compare against the value of W*H for that entry
< sumedhghaisas> yes ... but we somehow have to delete those entries from training data... like ignore it while training right??
< naywhayare> yes, you do that just by setting A_ij = 0
< naywhayare> that removes the entry
< naywhayare> so suppose you're constructing this sparse validation matrix B from A
< sumedhghaisas> yes... so now update_rule will require A(i,j)...
< naywhayare> you pick some point A_ij to sample... then do B_ij = A_ij and A_ij = 0
< sumedhghaisas> yes... look at equation 5 and 6...
< sumedhghaisas> delta computation will require I matrix ... and the computation is done in update_rule...
< naywhayare> but the I matrix is just the nonzero entries in the data matrix
< naywhayare> so you can just perform that calculation for all nonzero entries in the matrix
< sumedhghaisas> yes... until now I am doing the exact same thing...
< sumedhghaisas> but now we have testing points in that data...
< naywhayare> no, you subtracted the testing points from the data when you set A_ij = 0 for all testing points (i, j)
arcane has joined #mlpack
< naywhayare> arcane: I didn't realize you submitted patches for all of #350; trac didn't send an email indicating you attached all of those
< naywhayare> it only sent me one for the comment, and I thought "huh, well, I guess good thing he pointed that out" without realizing you'd done it all already :)
< arcane> naywhayare, oh I wonder why that happened. Good you got one mail at least :)
< arcane> are the changes correct ?
< naywhayare> haven't looked yet; the ticket is quite simple, so they're probably right
< naywhayare> that'll be a big API change, so I should bump the version I release next to 1.1.0 instead of 1.0.9
< sumedhghaisas> naywhayare: okay so we are also making changes to data matrix... I thought only changes to I(i,j) will suffice...
< sumedhghaisas> then either we have to copy the data or remove the const...
< naywhayare> remove the const for now
< naywhayare> we can figure out a better solution later. I think a better priority for now is just to make the algorithm
< naywhayare> *just to make the algorithm work
< arcane> well I am curious. how does it matter if it is 1.0.9 or 1.1.0. I thought all 1.0.9 meant was the next higher version, or maybe I am wrong
< sumedhghaisas> yeah... you are right...
< naywhayare> so not everyone agrees on this, but usually major version bumps (1.x.x -> 2.x.x) are reserved for major changes; minor version bumps (1.0.x -> 1.1.x) are reserved for smaller API changes; and patch bumps (1.0.8 -> 1.0.9) are reserved for fixes or improvements that don't have any API changes
< arcane> ah ok !
< naywhayare> like I said, not everyone agrees on how it should be done
< naywhayare> and also mlpack's API has been horrendously unstable, too. I've been meaning to finalize some of the abstractions, but it's never easy to do that...
< jenkins-mlpack> Starting build #1945 for job mlpack - svn checkin test (previous build: SUCCESS)
< arcane> yup ... it may not be easy to keep the API simple enough all the time
< arcane> naywhayare, I was looking at #320. It explains that we can not instantiate BinarySpaceTree<BallBound<> > because of the lack of Metric() function and all. Looking through the code of BinarySpaceTree, I found that it uses bound.Diameter(). Ballbound does not implement this function. I imagine it would be just twice the radius ?
< naywhayare> yeah, just twice the radius
< arcane> so implementing this should be simple enough. This along with the fix for MetricType should make ballbound usable with BST
udit_s has quit [Quit: Leaving]
< arcane> ok
< naywhayare> yeah; it should be pretty straightforward. one of the things I need to think about is that the BinarySpaceTree can't actually support arbitrary metrics
< naywhayare> it can only support LMetric<> (Euclidean space metrics)
< naywhayare> so the abstractions will need to change a little bit, but your fix for now will be just fine, and the refactoring of abstractions will probably be unrelated to your changes anyway
< arcane> right
arcane has quit [Quit: Leaving]
< jenkins-mlpack> Project mlpack - svn checkin test build #1945: UNSTABLE in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1945/
< jenkins-mlpack> * saxena.udit: Decision Stump added
< jenkins-mlpack> * Ryan Curtin: Remove leafSize parameter from DTB constructor.
< sumedhghaisas> naywhayare: In this we forgot the fact that with their parameters my code is diverging in 4 iteration... :(
< sumedhghaisas> I have implemented all the necessary requirements now...
< sumedhghaisas> but the output is this...
< sumedhghaisas> I am printing the residue of all the iterations...
< sumedhghaisas> 0: inf
< sumedhghaisas> 1: 1.27 * e+60
< sumedhghaisas> 2: inf
< sumedhghaisas> 2.7 * e250
< sumedhghaisas> the last value is the RMSE
< sumedhghaisas> my god... thats a really big value...
< sumedhghaisas> naywhayare: sorry the last value is square of RMSE...
< naywhayare> sumedhghaisas: for the residue to be inf, then normOld must be 0
< sumedhghaisas> ohh ignore the first one...
< naywhayare> right, but on iteration 2 it happens again
< sumedhghaisas> yes...
< naywhayare> this means the update rule must be returning either W = 0 or H = 0
< naywhayare> in iteration 1
< sumedhghaisas> not necessarily zero... can be some small finite amount right??
< naywhayare> it's have to be extremely small
< naywhayare> *it'd have to be
< naywhayare> you should investigate whether W or H is very close to 0, and find what situation led to that
< sumedhghaisas> if I decrease momentum and step... everything works fine...
< naywhayare> well, sure, but we should find out exactly why that is happening for the given choice of momentum and step
< sumedhghaisas> okay I will check whats happening to W and H... can you take a look at my update rule implementation and check it with papers equation?? I think its correct but I may have missed a minor point...
< naywhayare> yes, I will do that now
< naywhayare> I'm looking at svd_batchlearning.hpp, and equations 5 and 6 in the paper
< sumedhghaisas> yeah... correct...
< naywhayare> shouldn't line 64 be -= not +=?
< naywhayare> same with line 99
< naywhayare> wait, I see deltaW and deltaH are the negative gradients, not the positive gradients
< naywhayare> so I think that's correct as it is
< sumedhghaisas> yeah...
< sumedhghaisas> okay this weird... let me paste the output here...
< sumedhghaisas> 3.0236e+05 3.0236e+05
< sumedhghaisas> 1.8642e+05 1.8667e+05
< sumedhghaisas> 2.9620e+02 3.1039e+02
< sumedhghaisas> inf
< sumedhghaisas> -8.8345e+22 -8.8377e+22
< sumedhghaisas> -5.4506e+22 -5.4525e+22
< sumedhghaisas> -8.8620e+19 -8.8651e+19
< sumedhghaisas> 1.26008e+60
< sumedhghaisas> 4.8000e+125 4.8017e+125
< sumedhghaisas> 2.9614e+125 2.9625e+125
< sumedhghaisas> 4.8149e+122 4.8167e+122
< sumedhghaisas> inf
< sumedhghaisas> 4.35836e+250
< sumedhghaisas> I am printing W matrix...
< sumedhghaisas> these looks like sudden jumps...
< naywhayare> how about the H matrix?
< sumedhghaisas> It's just too large to predict anything...
< naywhayare> yeah, that's definitely diverging in a bad way
< naywhayare> but knowing what the H matrix is may clarify things
< naywhayare> I have to leave, sorry... back in 10/15 minutes
< sumedhghaisas> okay...
< naywhayare> alright, back
< naywhayare> guess it took a little longer than I thought
< sumedhghaisas> still puzzled by sudden jumps in the value... :( I printed H.. but made no sense... so I printed its sum...
< sumedhghaisas> same there... sudden jumps...
< naywhayare> very large numbers?
< naywhayare> or very small?
< sumedhghaisas> very large...
< naywhayare> ok; what about the norm of W*H?
< sumedhghaisas> I should have printed the average... wait...
< sumedhghaisas> okay here is the output...
< sumedhghaisas> naywhayare:
< sumedhghaisas> W value :
< sumedhghaisas> 3.0251e+05 3.0246e+05
< sumedhghaisas> 1.8674e+05 1.8641e+05
< sumedhghaisas> 2.6779e+02 2.6383e+02
< sumedhghaisas> Average H : 9221419744100
< sumedhghaisas> Norm WH : 1.11032e+13
< sumedhghaisas> inf
< sumedhghaisas> W value :
< sumedhghaisas> -8.8617e+22 -8.8564e+22
< sumedhghaisas> -5.4659e+22 -5.4627e+22
< sumedhghaisas> -7.7873e+19 -7.7827e+19
< sumedhghaisas> Average H : 0
< sumedhghaisas> Norm WH : 1.40924e+73
< sumedhghaisas> 1.26922e+60
< sumedhghaisas> W value :
< sumedhghaisas> 4.8742e+125 4.8713e+125
< sumedhghaisas> 3.0064e+125 3.0046e+125
< sumedhghaisas> 4.2833e+122 4.2807e+122
< sumedhghaisas> Average H : 4610722377450
< sumedhghaisas> Norm WH : inf
< sumedhghaisas> inf
< sumedhghaisas> 1.48409e+251
< naywhayare> ok, so the norm is actually overflowing, not being divided by 0
< sumedhghaisas> yes... but H is zero in the second iteration...
< naywhayare> how are you calculating the average?
< sumedhghaisas> sum divided by number of entries... simple average...
< naywhayare> take the abs of each entry
< sumedhghaisas> ahh...
< naywhayare> see if the average H changes to something not zero
< naywhayare> that will tell us a little bit more about what's happening
< sumedhghaisas> you are right... not zero now...
< sumedhghaisas> W value :
< sumedhghaisas> 3.0251e+05 3.0236e+05
< sumedhghaisas> 1.8659e+05 1.8653e+05
< sumedhghaisas> 2.7447e+02 2.6797e+02
< sumedhghaisas> Average H : 24996606
< sumedhghaisas> Norm WH : 1.10941e+13
< sumedhghaisas> inf
< sumedhghaisas> W value :
< sumedhghaisas> -8.8475e+22 -8.8434e+22
< sumedhghaisas> -5.4578e+22 -5.4552e+22
< sumedhghaisas> -7.9344e+19 -7.9307e+19
< sumedhghaisas> Average H : 9219297273400
< sumedhghaisas> Norm WH : 1.40195e+73
< sumedhghaisas> 1.26369e+60
< sumedhghaisas> W value :
< sumedhghaisas> 4.8307e+125 4.8285e+125
< sumedhghaisas> 2.9799e+125 2.9785e+125
< sumedhghaisas> 4.3322e+122 4.3302e+122
< sumedhghaisas> Average H : 9219297273400
< sumedhghaisas> Norm WH : inf
< sumedhghaisas> inf
< sumedhghaisas> 5.464e+250
< sumedhghaisas> average H didn't change??
< naywhayare> so if you vary the momentum parameter to something that makes it converge (a smaller value?) what happens then?
< sumedhghaisas> let's try 0.2...
< sumedhghaisas> better to see what happens for 0 first...
< sumedhghaisas> W value :
< sumedhghaisas> 3.0230e+05 3.0265e+05
< sumedhghaisas> 1.8653e+05 1.8660e+05
< sumedhghaisas> 2.9095e+02 3.0353e+02
< sumedhghaisas> Average H : 25012053
< sumedhghaisas> Norm WH : 1.11027e+13
< sumedhghaisas> inf
< sumedhghaisas> W value :
< sumedhghaisas> -8.8544e+22 -8.8627e+22
< sumedhghaisas> -5.4614e+22 -5.4665e+22
< sumedhghaisas> -8.7011e+19 -8.7093e+19
< sumedhghaisas> Average H : 9219297273400
< sumedhghaisas> Norm WH : 1.40898e+73
< sumedhghaisas> 1.26904e+60
< sumedhghaisas> W value :
< sumedhghaisas> 4.8690e+125 4.8735e+125
< sumedhghaisas> 3.0032e+125 3.0060e+125
< sumedhghaisas> 4.7847e+122 4.7892e+122
< sumedhghaisas> Average H : 9219297273400
< sumedhghaisas> Norm WH : inf
< sumedhghaisas> inf
< sumedhghaisas> 8.1877e+250
< sumedhghaisas> no drastic change...
< sumedhghaisas> and average H is getting saturated...
< sumedhghaisas> or something...
< sumedhghaisas> momentum is not making a significant contribution in this case...
< naywhayare> okay, but this converges when you use a smaller learning parameter?
< sumedhghaisas> yes... okay let me find that exactly...
< sumedhghaisas> with 0.0000001 definitely converging... getting final residue less than e-8...
< sumedhghaisas> naywhayare: diverging within 4 iterations... for 0.000001
< naywhayare> okay
< naywhayare> let me do just a bit of reading
< sumedhghaisas> okay this may help... 0.0000003...
< sumedhghaisas> giving almost same results in no time...
< sumedhghaisas> same as 0.0000001...
< sumedhghaisas> my god... it's less than NMF...
< naywhayare> so it's well known that for any gradient-descent-type method, the choice of step size (or learning rate) can make a huge difference
< naywhayare> when the step size is too large, the gradient descent method may step completely over the minimum and then diverge
< naywhayare> but if it's too small it'll take a long time
< naywhayare> this seems to be exactly what you are experiencing
< sumedhghaisas> and again diverging for 0.0000004... in just 4 iterations...
< sumedhghaisas> so 0.0000003 is just perfect...
< sumedhghaisas> yeah... same kind of experience...
< sumedhghaisas> I don't know how 0.0002 works...
< naywhayare> it seems to me like your implementation is fine
< naywhayare> it does converge, after all, when the step size is small enough
< sumedhghaisas> yes... but then what about results ... we are off by miles...
< naywhayare> for validation RMSE?
< sumedhghaisas> *paper
< sumedhghaisas> parameter
< sumedhghaisas> not even converging with their parameters...
< naywhayare> if you use their momentum of 0.9, can you find a learning rate that will converge?
< sumedhghaisas> okay let me see..
< sumedhghaisas> with 0.0000003 and 0.9 final residue is 0.23... nothing compared to e-8... but still...
< naywhayare> try making the learning rate smaller
< sumedhghaisas> not good with 0.0000001 either... initial performance is good...
< sumedhghaisas> so I guess momentum should be decreased with time..
< sumedhghaisas> 0.00000001 and 0.9... final residue of e-4... took 78 sec... too much time...
< sumedhghaisas> I think momentum of 0.9 is too large...
< naywhayare> alright
< naywhayare> are you calculating the validation RMSE?
< sumedhghaisas> no... I have commented that code out for testing... let's check that...
< naywhayare> yeah; let's see if we can produce validation RMSE numbers as good as the ones in the paper
< naywhayare> and if you can, I think we can say the algorithm's working, and their implementation's parameters seem to behave differently than ours
< sumedhghaisas> oohh... I have to compute it from the non-zero entries only... didn't consider that...
< sumedhghaisas> naywhayare: I am really sleepy now... I will do it first thing after I wake up... is that fine??
< naywhayare> of course - go get some sleep!