naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
andrewmw94 has left #mlpack []
sumedhghaisas has joined #mlpack
< jenkins-mlpack>
Starting build #1944 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack>
Ryan Curtin: Inline the simple R tree descent heuristic.
udit_s has quit [Ping timeout: 245 seconds]
sumedhghaisas has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Read error: Connection reset by peer]
andrewmw94 has joined #mlpack
< andrewmw94>
21:41 < naywhayare> andrewmw94: in looking through your code, I noticed lots of java-like idioms that I think won't work at runtime in C++. if you want me to look through these at some point and provide some feedback, I can do that
< andrewmw94>
21:41 < naywhayare> but otherwise I'll wait until you say it's finished
< andrewmw94>
21:41 < naywhayare> so that I don't comment on things you already know about and have made a mental note to redo
< andrewmw94>
naywhayare: yeah, I wrote it quickly in quasi-C++ while reading the paper.
< andrewmw94>
I'm going to go over it all again and look for bugs and invalid C++ code
oldbeardo has joined #mlpack
< naywhayare>
andrewmw94: ok; let me know if you need any help
< naywhayare>
oldbeardo: the solution I proposed won't be generic, but it wouldn't be generic anyway because you're taking the CosineNodeQueue as an argument
< oldbeardo>
naywhayare: I did what I thought would be a good solution, I think I mentioned it later yesterday
< oldbeardo>
I will send you the code in some time, hopefully it will be okay with you :)
< naywhayare>
yes, I saw what your solution was, but it's going to be slower because it has to maintain the considerBasis bool vector
< oldbeardo>
naywhayare: won't that cost be trivial? also, the priority queue method had some other problem, I had implemented it
< naywhayare>
the size of considerBasis is, at maximum, the number of nodes in the tree -- which is, at maximum, the number of columns in the dataset
< naywhayare>
you have to iterate over that entire vector each time
< naywhayare>
although it's true that early on in the tree-building process the overhead is not much, if a user asks for a very close approximation of the svd, your algorithm will be noticeably slower
< oldbeardo>
well, I still think that the bool vector cost would be trivial in comparison to the other calculations
< naywhayare>
ok, if you want another justification, it adds needless complexity to the code; it's not necessary to maintain that vector because all of the basis vectors that can be a part of V correspond exactly to the basis vectors of all of the nodes in the priority queue
< oldbeardo>
true, but iterating over the priority queue is just as complicated
< oldbeardo>
there's no support for random access, as it is implemented as a heap
< naywhayare>
but... you don't need random access. you access it sequentially
< oldbeardo>
how?
Anand_ has joined #mlpack
udit_s has joined #mlpack
< oldbeardo>
if you are thinking in terms of a normal for loop, you will still need to provide an index for access, which is not an option
< naywhayare>
I had not realized that priority_queue doesn't provide an iterator. hang on...
< oldbeardo>
the only other option is to copy the queue for each computation that you need
< naywhayare>
no, that's not the only other option. the priority queue takes a container template type; by default this is vector<T>
< Anand_>
Marcus : The design looks good and the idea of keeping the timing and other metrics separate is also fine.
< naywhayare>
create the vector<T> object and then create the priority_queue with the constructor that takes an existing vector object
< naywhayare>
then when you need to iterate over the priority queue, iterate over the vector
< Anand_>
Marcus : I think we need to call the metrics function somewhere. Right?
< naywhayare>
the priority queue isn't an actual container; it's an "adapter" that just works with an underlying container
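A minimal sketch of the idea naywhayare describes: since std::priority_queue is only an adapter over an underlying container (vector<T> by default), one option is to keep the vector yourself and maintain the heap property with the standard heap algorithms, leaving the elements free to be iterated over directly. (Illustrative only, not the code that was ultimately committed.)

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main()
    {
      // Maintain the heap in a plain vector so it can also be iterated.
      std::vector<double> heap;

      // push: append, then restore the heap property.
      for (double v : { 0.3, 0.9, 0.1, 0.5 })
      {
        heap.push_back(v);
        std::push_heap(heap.begin(), heap.end());
      }

      // Top of the heap (largest element with the default comparator).
      std::cout << "top: " << heap.front() << std::endl;

      // pop: move the top to the back, then remove it.
      std::pop_heap(heap.begin(), heap.end());
      heap.pop_back();

      // Sequential access over the remaining elements (heap layout order,
      // not sorted order).
      for (double v : heap)
        std::cout << v << " ";
      std::cout << std::endl;
    }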
< marcus_zoq>
Anand_: Okay, great, if you set run: ['metric'] in the config file the script calls the 'RunMetrics()' function.
< Anand_>
Cool!
< Anand_>
Will the current design work for other libraries too?
< udit_s>
naywhayare: hello !
< naywhayare>
udit_s: I took a look at your code last night
< naywhayare>
only a few comments. instead of having all the ancillary csv files, can you encode the datasets directly into the tests because they're so simple?
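For context, a sketch of what encoding a dataset directly into a test might look like, using Armadillo's string initializer (the values and names here are made up, not taken from the decision stump tests):

    #include <mlpack/core.hpp>

    int main()
    {
      // Hypothetical tiny two-class dataset encoded inline instead of
      // shipped as a separate CSV; points are columns (mlpack convention).
      arma::mat data("1.0 1.1 0.9 5.0 5.1 4.9;"
                     "2.0 2.1 1.9 8.0 8.1 7.9");
      arma::Row<size_t> labels("0 0 0 1 1 1");

      data.print("data");
      labels.print("labels");
      return 0;
    }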
< udit_s>
okay. what else ?
< oldbeardo>
naywhayare: ummm, okay, I still feel that the basis should be kept separate
< marcus_zoq>
Anand_: We need to change the design (e.g. rename the RunMethod() method). But then it should work.
< naywhayare>
oldbeardo: that is going to incur copying and managing every basis vector, not just the ones that are kept
< Anand_>
Marcus : Yeah, I meant the design after renaming and adding the build function. Should work, right?
< naywhayare>
udit_s: if you can provide some comments on what the tests are actually testing, that would be nice; the names of the tests are helpful but I don't completely understand what each of them aims to do
< naywhayare>
udit_s: other than that, it looks good to me. so if you make those changes, I'll start integrating it into trunk
< oldbeardo>
naywhayare: I have changed the implementation, it does not use join_rows() anymore if that's what you mean
< naywhayare>
when I integrate it into trunk, I'll probably go through and make a number of changes to make it faster
< udit_s>
naywhayare: cool. will get back to you in sometime.
< naywhayare>
oldbeardo: join_rows() is a huge cost, I am glad you've taken care of that. but for your solution to be competitive you must ensure that you are never copying the basis vector
< oldbeardo>
naywhayare: I'm not, just when I'm storing it in the CosineNode object, that's something that has to be done
< naywhayare>
I don't know what you mean by that
< oldbeardo>
naywhayare: I think you are worried about me copying the useless vectors at each step right?
< oldbeardo>
naywhayare: I'm not doing that, I'm using references to the useful vectors
< EighthOctave>
When I run 'make test', it fails with the message 'mlpack-1.0.8/src/mlpack/tests/nmf_test.cpp(271): fatal error in "SparseNMFALSTest": difference{1.10311e-05%} between w(i){0.0076869749921349793} and dw(i){0.0076869758400955075} exceeds 1.0000000000000001e-05%'
< EighthOctave>
any idea what's going on?
< naywhayare>
EighthOctave: that looks like a test tolerance issue. a lot of mlpack tests are randomized, and have some small probability of failure
< naywhayare>
it looks like in your case, the probability of failure for that test isn't small enough...
< EighthOctave>
shouldn't make test continue though?
< EighthOctave>
and not end with "fatal error"
< naywhayare>
it shouldn't continue that particular test; but it should continue the rest of the tests in the test suite
< EighthOctave>
I've run multiple times, and it always fails in that spot
< EighthOctave>
the difference is always slightly higher than the allowed tolerance
< naywhayare>
yeah; although the tests are randomized, the random seed is never set, so it is effectively fixed at compile time
< marcus_zoq>
the
< marcus_zoq>
Sorry, wrong window!
< naywhayare>
so every time you run the test, it will fail the exact same way because the RNG is always initialized the same way
< EighthOctave>
I see
< naywhayare>
I bet if you run that test only ('bin/mlpack_test -t NMFTest/SparseNMFALSTest') it'll work
< EighthOctave>
trying...
< EighthOctave>
still fails.
< oldbeardo>
naywhayare: any thoughts?
< EighthOctave>
./mlpack_test -t NMFTest/SparseNMFALSTest Running 1 test case... /home/may/lib/mlpack-1.0.8/src/mlpack/tests/nmf_test.cpp(271): fatal error in "SparseNMFALSTest": difference{3.5123%} between w(i){0.0030099676514796503} and dw(i){0.0031195351589538797} exceeds 1.0000000000000001e-05%
< naywhayare>
oldbeardo: yes I have thoughts, please hang on
< naywhayare>
EighthOctave: ok, that failed differently, at least. can you file a bug report on Trac?
< EighthOctave>
sure
< naywhayare>
EighthOctave: I think that this is not an actual issue with NMF (I suspect NMF works fine), but I think it's an issue with the test
< naywhayare>
thank you
< naywhayare>
oldbeardo: although the amortized cost of std::vector<> insert/delete operations is O(1), it is still much faster to only maintain one vector (or priority_queue) than many
< naywhayare>
the only thing I don't know is _how_ much faster it will be
< marcus_zoq>
Anand_: Looks like there is a bug in my code: if you set run: ['timing', 'metric'], the metrics were not run. So please check out the latest version. Btw, I use the following command to test the script: 'make run LOG=false CONFIG=small_config.yaml'
< naywhayare>
I'm still unclear on why you are so opposed to trying my idea
< oldbeardo>
naywhayare: I'm not opposed to it, I tried it once, and it wasn't making things easier
Anand_ has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< Anand_>
Marcus : Ok. Can you share the new config file?
< oldbeardo>
naywhayare: I sent you a mail, let me know
< naywhayare>
oldbeardo: ok, I will look through it
< Anand_>
Marcus : Ok, thanks!
< naywhayare>
EighthOctave: thank you for the bug report, I will try to find some time to look at it. could you add some information about the system you are running on please?
< EighthOctave>
naywhayare: added a few details. let me know if you need anything specific
< naywhayare>
yeah... version of gcc and boost, if you don't mind
< naywhayare>
(or clang if you're using clang)
govg has joined #mlpack
govg has quit [Ping timeout: 272 seconds]
oldbeardo has quit [Quit: Page closed]
udit_s has quit [Ping timeout: 245 seconds]
< EighthOctave>
naywhayare: info added
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
udit_s has joined #mlpack
Anand_ has quit [Ping timeout: 246 seconds]
Anand_ has joined #mlpack
< naywhayare>
EighthOctave: great, thanks
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: I saw your mail, how do you suggest we proceed?
< naywhayare>
oldbeardo: I would like you to try the idea I've suggested, unless you see some big problem I have overlooked that makes it impossible, or if you have another solution that addresses the problems I pointed out (or justification that those problems are not important)
Anand__ has joined #mlpack
Anand_ has quit [Ping timeout: 246 seconds]
< oldbeardo>
naywhayare: all of them are valid points, especially since you are thinking of datasets that are gigabytes in size
< oldbeardo>
the first point most of all, even I don't think it's a good solution
< oldbeardo>
however, my concern with the priority queue idea is that the basis should be independent of the queue, like it is in the algorithm
< oldbeardo>
because all the implementation problems arose from that
< oldbeardo>
I will try to come up with something that is independent, but first I will try out your idea
< naywhayare>
well, but the basis isn't independent of the queue. the basis corresponds only to basis vectors of nodes that are still in the queue
< oldbeardo>
at the end of the iteration, yes. but the problem arises at step 3(c) of the algorithm
< naywhayare>
right... which is where I suggested adding a parameter to the modified gram-schmidt function that will take an additional vector if necessary
< naywhayare>
and that additional vector would be the orthonormalized vector for the left child, when you were doing step 3c for the right child
< naywhayare>
that ModifiedGramSchmidt() function is already reasonably restricted to the case of cosine trees anyway
< oldbeardo>
right, so I need to remove the AddToBasis() function, and write two versions of MGS
< naywhayare>
yeah -- fortunately, one of those MGS functions can call the other, so it shouldn't be too hard
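A rough sketch of the "two versions of MGS, one calling the other" idea, with a plain std::vector of basis vectors standing in for the cosine tree's priority queue (names and signatures are illustrative, not the project's actual code):

    #include <mlpack/core.hpp>
    #include <vector>

    // Orthonormalize newVector against the current basis (the vectors of
    // the nodes still in the queue).
    void ModifiedGramSchmidt(const std::vector<arma::vec>& basis,
                             arma::vec& newVector)
    {
      for (size_t i = 0; i < basis.size(); ++i)
        newVector -= arma::dot(basis[i], newVector) * basis[i];

      // (Degenerate case of a zero vector ignored for brevity.)
      newVector /= arma::norm(newVector, 2);
    }

    // Variant for step 3(c): also orthonormalize against one additional
    // vector -- e.g. the left child's already-orthonormalized vector while
    // the right child is being processed -- by calling the version above.
    void ModifiedGramSchmidt(const std::vector<arma::vec>& basis,
                             arma::vec& newVector,
                             const arma::vec& extraVector)
    {
      ModifiedGramSchmidt(basis, newVector);
      newVector -= arma::dot(extraVector, newVector) * extraVector;
      newVector /= arma::norm(newVector, 2);
    }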
< oldbeardo>
okay, I will get back soon, as you said we are near 15th June
< naywhayare>
again, if it isn't done by the deadline, don't worry. it's okay if things fall a little behind in the name of code quality
< naywhayare>
much better to have less high-quality, rigorously-tested and fast code than lots of lower-quality it-works-but-it's-slow code
< naywhayare>
also I guess it's not a deadline, just a schedule :)
< oldbeardo>
certainly, but I would be happy if I stick to it :)
< naywhayare>
I will do most of the tedious integration with trunk, when your code is done
< oldbeardo>
thanks for that
< oldbeardo>
naywhayare: also, apart from this issue is everything else alright?
< oldbeardo>
oh, I saw one more thing, I will also have to write two versions of MonteCarloError(), won't I?
< naywhayare>
oldbeardo: yeah, no problems. you might need two versions of MonteCarloError(), I am not sure
< naywhayare>
either way, if you do, they should be quite similar and should be able to reuse a lot of the same code
< oldbeardo>
okay, will get back tomorrow
oldbeardo has quit [Quit: Page closed]
Anand__ has quit [Ping timeout: 246 seconds]
< udit_s>
naywhayare: done.
< naywhayare>
looks good
< naywhayare>
do you want to add decision_stump/ to mlpack/methods/, and then add decision_stump_test.cpp to mlpack/tests/ and update the CMake configuration?
< udit_s>
on my end ?
< naywhayare>
yeah, commit those things to trunk/
< udit_s>
ok. hold on.
< naywhayare>
sure, I can walk you through the process of what needs to be changed, etc.
< udit_s>
I think that would be better.
< udit_s>
right now, I'm just updating the required folders on my clone of your trunk repo.
< udit_s>
what changes do I make in my decision_stump_tests.cpp ?
sumedhghaisas has joined #mlpack
< sumedhghaisas>
naywhayare: Found any bugs in the code??
< naywhayare>
udit_s: hang on, impromptu meeting... I'll get back to you shortly
< naywhayare>
sumedhghaisas: I wasn't looking for them, do you want me to look?
< naywhayare>
the website with the paper seems to be down so I can't look at it right now
< naywhayare>
but a good place to start might be to see if they have results for other datasets, and see if you are able to reproduce those other results
< sumedhghaisas>
I checked the code twice... I couldn't find any... yes, one thing...
< sumedhghaisas>
they mentioned that... I matrix...
< sumedhghaisas>
Only consider filled entries...
< naywhayare>
what is "I matrix"?
< sumedhghaisas>
naywhayare: I have added a condition so that only non-zero entries will be considered...
< sumedhghaisas>
If user 1 rates item 1, then I(1,1) will be one...
< sumedhghaisas>
except zero...
< sumedhghaisas>
naywhayare: I have the paper.. should I send it to you??
sumedhghaisas has quit [Remote host closed the connection]
sumedhghaisas has joined #mlpack
< naywhayare>
sumedhghaisas: yeah, can you email me the paper?
< sumedhghaisas>
okay... will do that right away...
< naywhayare>
thank you
< sumedhghaisas>
got my mail??
< naywhayare>
yeah
< naywhayare>
ok, so on the movielens dataset, with momentum of 0.9, we should get an RMSE of about 0.89
< naywhayare>
(from Figure 1)
< sumedhghaisas>
yes... around that...
< sumedhghaisas>
wait... in our code are we calculating RMSE??
< sumedhghaisas>
I didn't know what you need but okay :)
< naywhayare>
there's a section on "Priority Queue Iterators"
< naywhayare>
plus an example
< naywhayare>
the usual iterators have a caveat, but it shouldn't matter for us:
< naywhayare>
"Iterators do not visit heap elements in any specific order."
< naywhayare>
for gram-schmidt orthonormalization that shouldn't matter though
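A small sketch of the Boost.Heap usage being discussed; boost::heap::priority_queue exposes begin()/end() iterators, with the documented caveat that the iteration order is unspecified:

    #include <boost/heap/priority_queue.hpp>
    #include <iostream>

    int main()
    {
      boost::heap::priority_queue<double> pq;
      pq.push(0.3);
      pq.push(0.9);
      pq.push(0.1);

      // Unlike std::priority_queue, Boost.Heap containers can be iterated;
      // the elements are not visited in any particular order.
      for (boost::heap::priority_queue<double>::const_iterator it = pq.begin();
           it != pq.end(); ++it)
        std::cout << *it << std::endl;

      std::cout << "top: " << pq.top() << std::endl;
      return 0;
    }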
< oldbeardo>
right
govg has quit [Ping timeout: 264 seconds]
< naywhayare>
we might have to modify the CMake configuration to require boost.heap, but that's easy
< naywhayare>
(I'll show you how, if it's necessary)
< sumedhghaisas>
naywhayare: how exactly is J different from I??
< sumedhghaisas>
I am unable to figure out...
< naywhayare>
I'm not sure -- it just says "Let J be the indicator of A", which is quite unclear
< sumedhghaisas>
but it also says its a {0, 1} matrix of size n * m...
< naywhayare>
yeah, it's definitely an indicator of some sort. I think what it means is this --
< naywhayare>
you split your dataset into a training and test set; both matrices of size n * m
< naywhayare>
so if some A_{ij} is part of the test set, then A_{ij} in the training set will be empty
< naywhayare>
so I think J_{ij} just indicates whether or not a point is part of the test set
< naywhayare>
after you run SVD and get a reconstructed A', you compare only the test values
< oldbeardo>
thanks sumedhghaisas, naywhayare
< sumedhghaisas>
ohhh...
< sumedhghaisas>
so we have to choose some random points...
< sumedhghaisas>
and compute their indices...
< naywhayare>
oldbeardo: hopefully there isn't anything I overlooked...
< naywhayare>
sumedhghaisas: yeah, you could try picking 3 random test points for each user, then removing them from the input matrix
< naywhayare>
then calculate the RMSE on those test points, and hopefully it will be somewhere close to 0.89
< naywhayare>
I doubt it will be exactly the same, because our train/test set isn't exactly the same, but hopefully somewhere close
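A rough sketch of that sampling step (an illustration under assumptions, not the project's code): pick three random ratings per user, remember them in a sparse held-out matrix, and zero them in the training matrix. Users are taken to be columns here.

    #include <mlpack/core.hpp>

    void SampleTestSet(arma::mat& A,          // ratings matrix (modified in place)
                       arma::sp_mat& B,       // held-out ratings (output)
                       const size_t perUser = 3)
    {
      B.zeros(A.n_rows, A.n_cols);
      for (size_t user = 0; user < A.n_cols; ++user)
      {
        // Indices of the items this user actually rated.
        arma::uvec rated = arma::find(A.col(user) != 0);
        if (rated.n_elem < perUser)
          continue;

        // Pick perUser of them at random.
        rated = arma::shuffle(rated);
        for (size_t i = 0; i < perUser; ++i)
        {
          B(rated[i], user) = A(rated[i], user); // remember the held-out rating
          A(rated[i], user) = 0;                 // remove it from the training data
        }
      }
    }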
< oldbeardo>
naywhayare: I will hang around for some more time to clarify if I have questions
< sumedhghaisas>
norm = sqrt(accu(WH % WH) / nm);
< sumedhghaisas>
if (iteration != 0)
< sumedhghaisas>
{
< sumedhghaisas>
oldResidue = residue;
< sumedhghaisas>
residue = fabs(normOld - norm);
< sumedhghaisas>
residue /= normOld;
< sumedhghaisas>
}
< sumedhghaisas>
this is how we are computing residue....
< naywhayare>
right, but we're computing RMSE not residue, so this will have to happen after AMF has completed
< naywhayare>
reconstruct the data matrix after AMF is done, then calculate the RMSE only on the test points that you withheld from the data
< sumedhghaisas>
I was saying... the paper mentions calculating validation RMSE... we could indeed divide the dataset into 3 parts...
< sumedhghaisas>
and the residue will be calculated on validation set...
< sumedhghaisas>
little more expensive but less prone to overfitting...
< sumedhghaisas>
naywhayare: sound good??
< naywhayare>
yeah, but what does the API for this look like? what will we need to change?
< sumedhghaisas>
hmm... what more parameters do we need...
< naywhayare>
the validation set, or a way to create the validation set
< sumedhghaisas>
ratio between sets for one...
< naywhayare>
we now have a few ideas for how to terminate AMF, so I think we should make the termination condition a template parameter
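A sketch of the kind of design being suggested (all names here are illustrative, not mlpack's actual API): the termination condition becomes a policy class supplied as a template parameter, so a residue-based check and a validation-RMSE-based check can be swapped without touching the factorizer itself.

    // One possible termination policy: stop when the relative change in the
    // norm of W * H drops below a threshold.
    class ResidueTermination
    {
     public:
      explicit ResidueTermination(const double minResidue = 1e-5) :
          minResidue(minResidue) { }

      bool IsConverged(const double residue) const
      {
        return residue < minResidue;
      }

     private:
      double minResidue;
    };

    // The factorizer only knows that its policy has IsConverged().
    template<typename TerminationPolicy = ResidueTermination>
    class FactorizerSketch
    {
     public:
      explicit FactorizerSketch(const TerminationPolicy& policy = TerminationPolicy()) :
          policy(policy) { }

      // Inside Apply(), each iteration would end with something like:
      //   if (policy.IsConverged(residue)) break;

     private:
      TerminationPolicy policy;
    };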
< sumedhghaisas>
I think testing set can be computed without any information from the user...
< naywhayare>
well, but the user may have a testing set of their own they want to use (like the netflix dataset), or they may want to tune the size of it
< sumedhghaisas>
like set it to 10 percent... hardcoded...
< naywhayare>
for now, because we're debugging the svd with momentum, why don't you just go ahead and make the modifications necessary to sample a validation / test set in the same way that they do in the paper?
< sumedhghaisas>
how can they have a different testing set from the training set?? I am a little confused...
< naywhayare>
when you download the netflix dataset, it comes with a pre-split training set and testing set
oldbeardo has quit [Quit: Page closed]
< sumedhghaisas>
ohhh... sorry for that... you just put a zero for entries taken as testing set...
< naywhayare>
yes, but in the paper they sample very specifically -- three random ratings from each user
< sumedhghaisas>
and later compare them against W * H
< naywhayare>
yes
< sumedhghaisas>
okay, for them the testing set and validation set are the same...
< naywhayare>
yes, I think that is true
< sumedhghaisas>
can we add function parameter for this value '3'?
< sumedhghaisas>
ohh.. then separate testing set will be an issue...
< sumedhghaisas>
okay, then either a separate testing set should be provided... or a default testing set will be computed with a certain ratio...
< naywhayare>
so like I was suggesting I think it's easiest to solve this problem with templates
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
< naywhayare>
either way, for now, let's just try and make it work, so don't worry about making the API nice yet
< sumedhghaisas>
yeah... thats true... I will test it with hardcoded 3...
< sumedhghaisas>
okay... how to remove the entries from the matrix ?? copying is a huge overhead...
< sumedhghaisas>
ahh... computing I(i,j)...
< naywhayare>
just setting A_{ij} = 0 should be sufficient
< sumedhghaisas>
yes...
< naywhayare>
you can hold another sparse matrix too, and just set the value of ij to the old value of A_{ij}
< sumedhghaisas>
I am thinking .... now that we have introduced I(i,j) ... how to pass it to update_rule??
< naywhayare>
no, you don't need I(i, j) for the training
< naywhayare>
the only thing you're trying to change is the termination condition, I thought
< naywhayare>
I thought your goal was to use validation RMSE
< naywhayare>
if you have some validation set B (a sparse matrix, where the only nonzero values are the validation values)
< naywhayare>
then you just iterate over the nonzero values and compare against the value of W*H for that entry
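A sketch of that computation (illustrative, with made-up names): iterate over the nonzero entries of the sparse validation matrix B and compare each one against the corresponding entry of the reconstruction W * H.

    #include <mlpack/core.hpp>
    #include <cmath>

    double ValidationRMSE(const arma::sp_mat& B,  // held-out ratings only
                          const arma::mat& W,
                          const arma::mat& H)
    {
      double sumSquaredError = 0.0;
      size_t count = 0;

      // sp_mat iterators visit only the nonzero (held-out) entries.
      for (arma::sp_mat::const_iterator it = B.begin(); it != B.end(); ++it)
      {
        const double predicted = arma::dot(W.row(it.row()), H.col(it.col()));
        sumSquaredError += std::pow(*it - predicted, 2.0);
        ++count;
      }

      return std::sqrt(sumSquaredError / count);
    }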
< sumedhghaisas>
yes... but we somehow have to delete those entries from the training data... like, ignore them while training, right??
< naywhayare>
yes, you do that just by setting A_ij = 0
< naywhayare>
that removes the entry
< naywhayare>
so suppose you're constructing this sparse validation matrix B from A
< sumedhghaisas>
yes... so now update_rule will require A(i,j)...
< naywhayare>
you pick some point A_ij to sample... then do B_ij = A_ij and A_ij = 0
< sumedhghaisas>
yes... look at equations 5 and 6...
< sumedhghaisas>
delta computation will require I matrix ... and the computation is done in update_rule...
< naywhayare>
but the I matrix is just the nonzero entries in the data matrix
< naywhayare>
so you can just perform that calculation for all nonzero entries in the matrix
< sumedhghaisas>
yes... until now I am doing the exact same thing...
< sumedhghaisas>
but now we have testing points in that data...
< naywhayare>
no, you subtracted the testing points from the data when you set A_ij = 0 for all testing points (i, j)
arcane has joined #mlpack
< naywhayare>
arcane: I didn't realize you submitted patches for all of #350; trac didn't send an email indicating you attached all of those
< naywhayare>
it only sent me one for the comment, and I thought "huh, well, I guess good thing he pointed that out" without realizing you'd done it all already :)
< arcane>
naywhayare, oh I wonder why that happened. Good you got one mail at least :)
< arcane>
are the changes correct ?
< naywhayare>
haven't looked yet; the ticket is quite simple, so they're probably right
< naywhayare>
that'll be a big API change, so I should bump the version I release next to 1.1.0 instead of 1.0.9
< sumedhghaisas>
naywhayare: okay, so we are also making changes to the data matrix... I thought changes to I(i,j) alone would suffice...
< sumedhghaisas>
then either we have to copy the data or remove the const...
< naywhayare>
remove the const for now
< naywhayare>
we can figure out a better solution later. I think a better priority for now is just to make the algorithm
< naywhayare>
*just to make the algorithm work
< arcane>
well, I am curious. how does it matter if it is 1.0.9 or 1.1.0? I thought all 1.0.9 meant was the next higher version, or maybe I am wrong
< sumedhghaisas>
yeah... you are right...
< naywhayare>
so not everyone agrees on this, but usually major version bumps (1.x.x -> 2.x.x) are reserved for major changes; minor version bumps (1.0.x -> 1.1.x) are reserved for smaller API changes; and patch bumps (1.0.8 -> 1.0.9) are reserved for fixes or improvements that don't have any API changes
< arcane>
ah ok !
< naywhayare>
like I said, not everyone agrees on how it should be done
< naywhayare>
and also mlpack's API has been horrendously unstable, too. I've been meaning to finalize some of the abstractions, but it's never easy to do that...
< jenkins-mlpack>
Starting build #1945 for job mlpack - svn checkin test (previous build: SUCCESS)
< arcane>
yup ... it may not be easy to keep the API simple enough all the time
< arcane>
naywhayare, I was looking at #320. It explains that we cannot instantiate BinarySpaceTree<BallBound<> > because of the lack of a Metric() function and so on. Looking through the code of BinarySpaceTree, I found that it uses bound.Diameter(). BallBound does not implement this function. I imagine it would be just twice the radius?
< naywhayare>
yeah, just twice the radius
< arcane>
so implementing this should be simple enough. This along with the fix for MetricType should make ballbound usable with BST
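The fix arcane describes is essentially one accessor; a standalone sketch of the idea (not the actual mlpack BallBound source):

    #include <mlpack/core.hpp>

    // A ball bound is a center plus a radius; the diameter that
    // BinarySpaceTree asks for via bound.Diameter() is just twice the radius.
    class BallBoundSketch
    {
     public:
      BallBoundSketch(const arma::vec& center, const double radius) :
          center(center), radius(radius) { }

      double Radius() const { return radius; }
      double Diameter() const { return 2.0 * radius; }

     private:
      arma::vec center;
      double radius;
    };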
udit_s has quit [Quit: Leaving]
< arcane>
ok
< naywhayare>
yeah; it should be pretty straightforward. one of the things I need to think about is that the BinarySpaceTree can't actually support arbitrary metrics
< naywhayare>
it can only support LMetric<> (Euclidean space metrics)
< naywhayare>
so the abstractions will need to change a little bit, but your fix for now will be just fine, and the refactoring of abstractions will probably be unrelated to your changes anyway
< jenkins-mlpack>
* Ryan Curtin: Remove leafSize parameter from DTB constructor.
< sumedhghaisas>
naywhayare: In all this we forgot the fact that with their parameters my code diverges in 4 iterations... :(
< sumedhghaisas>
I have implemented all the necessary requirements now...
< sumedhghaisas>
but the output is this...
< sumedhghaisas>
I am printing the residue of all the iterations...
< sumedhghaisas>
0: inf
< sumedhghaisas>
1: 1.27 * e+60
< sumedhghaisas>
2: inf
< sumedhghaisas>
2.7 * e250
< sumedhghaisas>
the last value is the RMSE
< sumedhghaisas>
my god... thats a really big value...
< sumedhghaisas>
naywhayare: sorry the last value is square of RMSE...
< naywhayare>
sumedhghaisas: for the residue to be inf, normOld must be 0
< sumedhghaisas>
ohh ignore the first one...
< naywhayare>
right, but on iteration 2 it happens again
< sumedhghaisas>
yes...
< naywhayare>
this means the update rule must be returning either W = 0 or H = 0
< naywhayare>
in iteration 1
< sumedhghaisas>
not necessarily zero... can be some small finite amount right??
< naywhayare>
it's have to be extremely small
< naywhayare>
*it'd have to be
< naywhayare>
you should investigate whether W or H is very close to 0, and find what situation led to that
< sumedhghaisas>
if I decrease momentum and step... everything works fine...
< naywhayare>
well, sure, but we should find out exactly why that is happening for the given choice of momentum and step
< sumedhghaisas>
okay, I will check what's happening to W and H... can you take a look at my update rule implementation and check it against the paper's equations?? I think it's correct but I may have missed a minor point...
< naywhayare>
yes, I will do that now
< naywhayare>
I'm looking at svd_batchlearning.hpp, and equations 5 and 6 in the paper
< sumedhghaisas>
yeah... correct...
< naywhayare>
shouldn't line 64 be -= not +=?
< naywhayare>
same with line 99
< naywhayare>
wait, I see deltaW and deltaH are the negative gradients, not the positive gradients
< naywhayare>
so I think that's correct as it is
< sumedhghaisas>
yeah...
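For reference, a rough sketch of the kind of momentum update under discussion (a paraphrase of equations 5 and 6 as they come up here, with regularization and any scaling omitted; not the actual svd_batchlearning.hpp code). deltaW holds the accumulated negative-gradient step, which is why it is added to W rather than subtracted.

    #include <mlpack/core.hpp>

    void WUpdate(const arma::sp_mat& V,   // ratings, held-out entries zeroed
                 arma::mat& W,
                 const arma::mat& H,
                 arma::mat& deltaW,       // carried over between iterations
                 const double stepSize,
                 const double momentum)
    {
      // Keep a momentum-weighted fraction of the previous step.
      deltaW *= momentum;

      // Only the filled (nonzero) entries of V contribute -- this plays the
      // role of the indicator matrix I in the paper.
      for (arma::sp_mat::const_iterator it = V.begin(); it != V.end(); ++it)
      {
        const double error = *it - arma::dot(W.row(it.row()), H.col(it.col()));
        deltaW.row(it.row()) += stepSize * error * H.col(it.col()).t();
      }

      // deltaW is the negative-gradient step, so it is added, not subtracted.
      W += deltaW;
    }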
< sumedhghaisas>
okay, this is weird... let me paste the output here...
< sumedhghaisas>
3.0236e+05 3.0236e+05
< sumedhghaisas>
1.8642e+05 1.8667e+05
< sumedhghaisas>
2.9620e+02 3.1039e+02
< sumedhghaisas>
inf
< sumedhghaisas>
-8.8345e+22 -8.8377e+22
< sumedhghaisas>
-5.4506e+22 -5.4525e+22
< sumedhghaisas>
-8.8620e+19 -8.8651e+19
< sumedhghaisas>
1.26008e+60
< sumedhghaisas>
4.8000e+125 4.8017e+125
< sumedhghaisas>
2.9614e+125 2.9625e+125
< sumedhghaisas>
4.8149e+122 4.8167e+122
< sumedhghaisas>
inf
< sumedhghaisas>
4.35836e+250
< sumedhghaisas>
I am printing W matrix...
< sumedhghaisas>
these look like sudden jumps...
< naywhayare>
how about the H matrix?
< sumedhghaisas>
It's just too large to tell anything from it...
< naywhayare>
yeah, that's definitely diverging in a bad way
< naywhayare>
but knowing what the H matrix is may clarify things
< naywhayare>
I have to leave, sorry... back in 10/15 minutes
< sumedhghaisas>
okay...
< naywhayare>
alright, back
< naywhayare>
guess it took a little longer than I thought
< sumedhghaisas>
still puzzled by the sudden jumps in the values... :( I printed H... but it made no sense... so I printed its sum...
< sumedhghaisas>
same there... sudden jumps...
< naywhayare>
very large numbers?
< naywhayare>
or very small?
< sumedhghaisas>
very large...
< naywhayare>
ok; what about the norm of W*H?
< sumedhghaisas>
I should have printed the average... wait...
< sumedhghaisas>
okay here is the output...
< sumedhghaisas>
naywhayare:
< sumedhghaisas>
W value :
< sumedhghaisas>
3.0251e+05 3.0246e+05
< sumedhghaisas>
1.8674e+05 1.8641e+05
< sumedhghaisas>
2.6779e+02 2.6383e+02
< sumedhghaisas>
Average H : 9221419744100
< sumedhghaisas>
Norm WH : 1.11032e+13
< sumedhghaisas>
inf
< sumedhghaisas>
W value :
< sumedhghaisas>
-8.8617e+22 -8.8564e+22
< sumedhghaisas>
-5.4659e+22 -5.4627e+22
< sumedhghaisas>
-7.7873e+19 -7.7827e+19
< sumedhghaisas>
Average H : 0
< sumedhghaisas>
Norm WH : 1.40924e+73
< sumedhghaisas>
1.26922e+60
< sumedhghaisas>
W value :
< sumedhghaisas>
4.8742e+125 4.8713e+125
< sumedhghaisas>
3.0064e+125 3.0046e+125
< sumedhghaisas>
4.2833e+122 4.2807e+122
< sumedhghaisas>
Average H : 4610722377450
< sumedhghaisas>
Norm WH : inf
< sumedhghaisas>
inf
< sumedhghaisas>
1.48409e+251
< naywhayare>
ok, so the norm is actually overflowing, not being divided by 0
< sumedhghaisas>
yes... but H is zero in the second iteration...
< naywhayare>
how are you calculating the average?
< sumedhghaisas>
sum divided by number of entries... simple average...
< naywhayare>
take the abs of each entry
< sumedhghaisas>
ahh...
< naywhayare>
see if the average H changes to something not zero
< naywhayare>
that will tell us a little bit more about what's happening
< sumedhghaisas>
you are right... not zero now...
< sumedhghaisas>
W value :
< sumedhghaisas>
3.0251e+05 3.0236e+05
< sumedhghaisas>
1.8659e+05 1.8653e+05
< sumedhghaisas>
2.7447e+02 2.6797e+02
< sumedhghaisas>
Average H : 24996606
< sumedhghaisas>
Norm WH : 1.10941e+13
< sumedhghaisas>
inf
< sumedhghaisas>
W value :
< sumedhghaisas>
-8.8475e+22 -8.8434e+22
< sumedhghaisas>
-5.4578e+22 -5.4552e+22
< sumedhghaisas>
-7.9344e+19 -7.9307e+19
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : 1.40195e+73
< sumedhghaisas>
1.26369e+60
< sumedhghaisas>
W value :
< sumedhghaisas>
4.8307e+125 4.8285e+125
< sumedhghaisas>
2.9799e+125 2.9785e+125
< sumedhghaisas>
4.3322e+122 4.3302e+122
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : inf
< sumedhghaisas>
inf
< sumedhghaisas>
5.464e+250
< sumedhghaisas>
average H didn't change??
< naywhayare>
so if you vary the momentum parameter to something that makes it converge (a smaller value?) what happens then?
< sumedhghaisas>
lets try 0.2...
< sumedhghaisas>
better see what happens for 0..
< sumedhghaisas>
W value :
< sumedhghaisas>
3.0230e+05 3.0265e+05
< sumedhghaisas>
1.8653e+05 1.8660e+05
< sumedhghaisas>
2.9095e+02 3.0353e+02
< sumedhghaisas>
Average H : 25012053
< sumedhghaisas>
Norm WH : 1.11027e+13
< sumedhghaisas>
inf
< sumedhghaisas>
W value :
< sumedhghaisas>
-8.8544e+22 -8.8627e+22
< sumedhghaisas>
-5.4614e+22 -5.4665e+22
< sumedhghaisas>
-8.7011e+19 -8.7093e+19
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : 1.40898e+73
< sumedhghaisas>
1.26904e+60
< sumedhghaisas>
W value :
< sumedhghaisas>
4.8690e+125 4.8735e+125
< sumedhghaisas>
3.0032e+125 3.0060e+125
< sumedhghaisas>
4.7847e+122 4.7892e+122
< sumedhghaisas>
Average H : 9219297273400
< sumedhghaisas>
Norm WH : inf
< sumedhghaisas>
inf
< sumedhghaisas>
8.1877e+250
< sumedhghaisas>
no drastic change...
< sumedhghaisas>
and average H getting saturated...
< sumedhghaisas>
or something...
< sumedhghaisas>
momentum is not making significant contribution in this case...
< naywhayare>
okay, but this converges when you use a smaller learning parameter?
< sumedhghaisas>
yes... okay let me find that exactly...
< sumedhghaisas>
with 0.0000001 definitely converging... getting final residue less than e-8...
< sumedhghaisas>
naywhayare: diverging in 4 iterations... for 0.000001
< naywhayare>
okay
< naywhayare>
let me do just a bit of reading
< sumedhghaisas>
okay this may help... 0.0000003...
< sumedhghaisas>
giving almost same results in no time...
< sumedhghaisas>
same as 0.0000001...
< sumedhghaisas>
my god... it's less than NMF...
< naywhayare>
so it's well-known for any gradient descent type method, the choice of step size (or learning rate) can make a huge difference
< naywhayare>
when the step size is too large, the gradient descent method may step completely over the minimum and then diverge
< naywhayare>
but if it's too small it'll take a long time
< naywhayare>
this seems to be exactly what you are experiencing
< sumedhghaisas>
and again diverging for 0.0000004... in just 4 iterations...
< sumedhghaisas>
so 0.0000003 is just perfect...
< sumedhghaisas>
yeah... same kind of experience...
< sumedhghaisas>
I don't know how 0.0002 works...
< naywhayare>
it seems to me like your implementation is fine
< naywhayare>
it does converge, after all, when the step size is small enough
< sumedhghaisas>
yes... but then what about results ... we are off by miles...
< naywhayare>
for validation RMSE?
< sumedhghaisas>
*paper
< sumedhghaisas>
parameter
< sumedhghaisas>
not even converging with their parameters...
< naywhayare>
if you use their momentum of 0.9, can you find a learning rate that will converge?
< sumedhghaisas>
okay let me see..
< sumedhghaisas>
with 0.0000003 and 0.9 final residue is 0.23... nothing compared to e-8... but still...
< naywhayare>
try making the learning rate smaller
< sumedhghaisas>
not good with 0.0000001 either... initial performance is good...
< sumedhghaisas>
so I guess momentum should be decreased with time..
< sumedhghaisas>
0.00000001 and 0.9... final residue of e-4... took 78 sec... too much time...
< sumedhghaisas>
I think momentum of 0.9 is too large...
< naywhayare>
alright
< naywhayare>
are you calculating the validation RMSE?
< sumedhghaisas>
no... I have commented that code for testing... lets check that...
< naywhayare>
yeah; let's see if we can produce validation RMSE numbers as good as the ones in the paper
< naywhayare>
and if you can, I think we can say the algorithm's working, and their implementation's parameters seem to behave differently than ours
< sumedhghaisas>
oohh... I have to choose the numbers from the non-zero entries... didn't consider that...
< sumedhghaisas>
naywhayare: I am really sleepy now... I will do it first thing after I wake up... is that fine??