naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Read error: No route to host]
< jenkins-mlpack> Starting build #1936 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack> Project mlpack - svn checkin test build #1936: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1936/
< jenkins-mlpack> andrewmw94: change a comment to be more accurate. Add on --r_tree option to the knn_all program. Doesn't do anything yet.
udit_s has joined #mlpack
udit_s has quit [Ping timeout: 260 seconds]
udit_s has joined #mlpack
udit_s has quit [Client Quit]
udit_s has joined #mlpack
udit_s has quit [Ping timeout: 245 seconds]
udit_s has joined #mlpack
udit_s has quit [Ping timeout: 260 seconds]
udit_s has joined #mlpack
udit_s has quit [Ping timeout: 245 seconds]
udit_s has joined #mlpack
andrewmw94 has joined #mlpack
Anand has joined #mlpack
oldbeardo has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
< oldbeardo> naywhayare: I tested the implementation on the GroupLens100k dataset
< oldbeardo> I don't think it is suitable for Collaborative Filtering, since the rating matrix is quite sparse
< oldbeardo> creating a basis which will capture the variation in the complete matrix is not easy, and will definitely not be low rank
< oldbeardo> so using QUIC-SVD will defeat the purpose of speeding things up
udit_s has quit [Read error: Connection reset by peer]
< andrewmw94> naywhayare: So I'm trying to compile my code before I submit it, so I don't break the build, but I'm stumped on this error:
< andrewmw94> error: ‘RectangleTree’ is not a class template
< andrewmw94> class RectangleTree<StatisticType, MatType, SplitType, DescentType>::
< andrewmw94> I tried to set up the templates in the same way as the BinarySpaceTree and the SingleTreeTraverser
< andrewmw94> as far as I can tell, the only differences are the names and the removal of BoundType and addition of DescentType
< andrewmw94> any idea what I did wrong?
udit_s has joined #mlpack
< naywhayare> oldbeardo: so if we find a rating matrix that is less sparse, you think QuicSVD will work better?
< naywhayare> andrewmw94: what line of what file is that?
< oldbeardo> naywhayare: it does work, on the US declaration matrix as mentioned in the paper
< oldbeardo> naywhayare: I think it's something I mentioned the day before
< naywhayare> ok; so you think we should add the declaration dataset as a test dataset?
< andrewmw94> naywhayare: /mlpack/core/tree/rectangle_tree/rectangle_tree_traverser.hpp:24:7 You will also want to see /mlpack/core/tree/rectangle_tree/rectangle_tree.hpp:16:0
< naywhayare> and you're right, you did mention that over the weekend; I missed it
< andrewmw94> I'll wait to avoid confusion with oldbeardo
< oldbeardo> thanks andrewmw94
< oldbeardo> naywhayare: well the dataset is around 450mb in size
< oldbeardo> it's approx. a 4000 x 4000 matrix
< naywhayare> eek, that is pretty large. is it sparse?
< oldbeardo> nope, it's actually the grayscale of an image
< oldbeardo> which itself is 2mb in size
< naywhayare> okay
< naywhayare> so I'm glad you've tested on the declaration dataset because it allows us to see that quic-svd is working, but that's way too large to put into src/mlpack/tests/
< oldbeardo> yes, it was good to see it work
< naywhayare> but you say that GroupLens isn't low-rank?
< naywhayare> I'm surprised about this because that suggests that NMF wouldn't work well either
< oldbeardo> no, I'm saying that finding a basis of low rank through cosine tree construction is not possible
< naywhayare> andrewmw94: rectangle_tree_traverser.hpp:53 doesn't have a closing semicolon. maybe that is the issue?
< naywhayare> oldbeardo: ok, I see
< naywhayare> let me do a little bit of quick reading to understand why
< oldbeardo> naywhayare: okay, thanks andrewmw94, now I will do the waiting bit
< naywhayare> oldbeardo: I'm not seeing why quic-svd isn't applicable to low-rank matrices. In equation 2 they lay out the definition of the optimal k-rank approximation, and then all of the rest of the paper seems focused on LRMA (low-rank matrix approximation)
< naywhayare> it seems to me that the speedup of quic-svd is also dependent on the rank of the matrix (lower rank -> faster quic-svd)
< oldbeardo> naywhayare: I said that only after running it on the GroupLens100k matrix
< oldbeardo> to give some perspective, the relative error starts at around 0.04 for the declaration matrix
< oldbeardo> whereas for the GroupLens matrix it starts at 0.6, and hovers around 0.55 after 15-20 iterations
< naywhayare> ok; when you say relative error, do you mean relative reconstruction error?
< oldbeardo> I mean monteCarloError / root.frobNormSquared()
< naywhayare> oh, ok, I see
< oldbeardo> and I think I understand why
< naywhayare> can you explain that further?
< oldbeardo> the basis vectors chosen are the centroids of the columns in the node
< oldbeardo> for a sparse matrix, that vector won't be highly representative of anything
< oldbeardo> does that make sense?
< naywhayare> yeah, it makes sense
< naywhayare> what if you made a transformation to the grouplens dataset and substituted each 0 value with the average rating for that item?
< naywhayare> I think that might help that issue
< naywhayare> basically I want to get one small dataset that we have good performance on to ensure that the algorithm is working, so we can use that for our tests
< naywhayare> it doesn't have to be the grouplens dataset but it would be convenient if it was, because then we wouldn't have to add a new dataset (which takes up space and makes the distribution larger)
< oldbeardo> right, I'll try that. what about testing the cosine tree implementation?
< naywhayare> I spent the weekend relaxing after my paper deadline instead of thinking about trees :) I finally have some time again, so I'll make room to figure something out later today
< oldbeardo> heh, okay, so finally done with the NIPS submission eh?
< naywhayare> yeah. I actually gave up on one of the papers about 24 hours before the deadline, so I went home. but while I was laying in bed I thought "there's one more thing I didn't try..."
< naywhayare> so I got back up and went back to lab to try it... and then I got good results, which was good, except it meant I now had to stay up all night and write the paper
< oldbeardo> nice, are the deadlines always so thoughtfully placed? I mean, a weekend right after it
< naywhayare> yeah, I appreciated that. much better than a sunday night deadline
< oldbeardo> okay, anyway I'll try that out, we will have a chat tomorrow about trees
< oldbeardo> and best of luck :)
< andrewmw94> naywhayare: I don't think the semi colon is the problem. After fixing it I still get:
< andrewmw94> [ 40%] Building CXX object src/mlpack/methods/cf/CMakeFiles/cf.dir/cf_main.cpp.o
< andrewmw94> Building CXX object src/mlpack/methods/gmm/CMakeFiles/gmm.dir/gmm_main.cpp.o
< andrewmw94> In file included from /home/awells/Development/mlpack/mlpack/trunk/src/mlpack/../mlpack/core/tree/rectangle_tree.hpp:16:0,
< andrewmw94> from /home/awells/Development/mlpack/mlpack/trunk/src/mlpack/../mlpack/methods/neighbor_search/neighbor_search.hpp:16,
< andrewmw94> from /home/awells/Development/mlpack/mlpack/trunk/src/mlpack/methods/cf/cf.hpp:14,
< andrewmw94> from /home/awells/Development/mlpack/mlpack/trunk/src/mlpack/methods/cf/cf_main.cpp:9:
< andrewmw94> /home/awells/Development/mlpack/mlpack/trunk/src/mlpack/../mlpack/core/tree/rectangle_tree/rectangle_tree_traverser.hpp:24:7: error: ‘RectangleTree’ is not a class template
< andrewmw94> class RectangleTree<StatisticType, MatType, SplitType, DescentType>::
< oldbeardo> andrewmw94: you missed a semi-colon in rectangle_tree_traverser.hpp
< oldbeardo> sorry, I think naywhayare already mentioned that
< andrewmw94> yeah, I fixed that, but I don't want to commit, since it will break the build
< andrewmw94> let me turn off compilation so you can see the sources I have
< andrewmw94> ok, it's committed
< andrewmw94> remove the comment on line 16 of neighbor_search.hpp and on lines 275 -279 of all_knn.cpp to get the version that I have
< andrewmw94> sorry, allknn_main.cpp
< naywhayare> ok, let me do that now
< jenkins-mlpack> Starting build #1937 for job mlpack - svn checkin test (previous build: SUCCESS)
< naywhayare> it's gotta be a missing bracket or something
< naywhayare> that's what the errors look like
< naywhayare> but I haven't pinpointed anything
< andrewmw94> ok. I'm going to get lunch, but thanks for the tip
< naywhayare> I'll keep looking
udit_s has quit [Quit: Leaving]
< naywhayare> andrewmw94: the problem is in src/mlpack/core/tree/rectangle_tree.hpp, lines 8 and 9:
< naywhayare> #define __MLPACK_CORE_TREE_RECTINGLE_TREE_RECTANGLE_TREE_HPP
< naywhayare> that's the same macro used in src/mlpack/core/tree/rectangle_tree/rectangle_tree.hpp
< naywhayare> so as a result rectangle_tree.hpp never actually gets included
< naywhayare> anyway, fixing that reveals a whole host of other errors, but those should be easier to solve
< naywhayare> one that I see that I know how to solve is the use of 'RectangleTree&' in RTreeSplit
< naywhayare> you'll have to use the full templated name and make it a templatized function, using RectangleTree<StatisticType, MatType, SplitType, DescentType>
< naywhayare> you can use RectangleTree& in RectangleTreeTraverser because RectangleTreeTraverser is inside of the RectangleTree class, but RTreeSplit is not
oldbeardo has quit [Quit: Page closed]
< jenkins-mlpack> Project mlpack - svn checkin test build #1937: SUCCESS in 32 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1937/
< jenkins-mlpack> andrewmw94: comment out compilation. Some small changes to support R tree in all_knn
udit_s has joined #mlpack
< udit_s> naywhayare: hello ! how's it going ?
< naywhayare> pretty good, now that I have relaxed a little bit
< naywhayare> sorry that I didn't respond to your earlier messages in time before you left
< udit_s> did your submission spill over the weekend ?
< naywhayare> no, but catching up on sleep did :)
< udit_s> :D
< naywhayare> I took a look through your code; it looks like it's a lot more flexible now
< naywhayare> I'd like to expand the tests a little bit, but that shouldn't be too hard, then we can integrate it into trunk
< udit_s> yeah, so I did have a few things to talk to you about.
< naywhayare> I haven't yet compiled it and run the tests, though
< naywhayare> ok, go ahead
< udit_s> especially the test cases and unit tests.
< udit_s> I don't have a satisfactory test case to test the code out. And I wanted some suggestions on writing more unit tests - what other extreme cases can we come up with ?
< naywhayare> so the easiest test cases to write are the simple ones
< naywhayare> one good example you already wrote -- ensure that the tree doesn't split when all the labels are from one class
< naywhayare> another idea would be to generate a very simple dataset... values below 0 have class label 0, and values above 0 have class label 1
< naywhayare> then make sure that the split value is somewhere around 0, and that everything is perfectly classified (because a decision stump can do perfect classification for that)
< udit_s> noted. go on.
< udit_s> also, did you go through the concept of buckets ?
< udit_s> was this what you had in mind ?
< naywhayare> another idea is to make sure the binning is working right; pass in a dataset with mixed labels but with fewer points than inpBucketSize and make sure it doesn't split
< udit_s> or something similar ?
< naywhayare> yeah, your idea makes sense
< naywhayare> I haven't gone through the exact code that sets up the bins, but the general idea is what OneR does, I think
< naywhayare> another test is to set up a dataset with three or four classes, where each class falls into a specific non-overlapping range; then, ensure the stump gets perfect classification
< naywhayare> and then the last idea I have is to do the same thing with three or four classes, but make them very slightly overlapping, and check that the classification from the stump is pretty good (but not perfect)
< udit_s> okay - I was lacking these ideas. What do you think about the other stuff, documentation and the like...
< udit_s> efficiency,
< udit_s> formatting, and the CMakeLists.txt ?
< naywhayare> I'll look at the efficiency of the code when we merge it into trunk. I think there are some improvements that could be done, but it's a lot easier to do them when we have good tests in place
< naywhayare> so that I can make a change, run the test, and make sure I didn't break everything
< naywhayare> the CMake configuration seems just fine
< udit_s> okay.
< naywhayare> the documentation seems fine too, although I'll probably add a lot to the PROGRAM_INFO() macro
< udit_s> okay, strangely, I've been getting this error when I compile decision_stump_main.cpp
< udit_s> undefined reference to symbol '_ZN5boost15program_options8validateERNS_3anyERKSt6vectorISsSaISsEEPSsi'
< naywhayare> add -lboost_program_options
< udit_s> thanks, works fine now.
< udit_s> so, let me get back to you after I have those tests written.
< naywhayare> ok, sounds good. send me a message if you have any problems or questions about writing the tests or about Boost.Test
< udit_s> I actually spent today reading up on perceptrons and was trying to come up with an implementation
< udit_s> so, I had a few points about that too.
< udit_s> But I think that'll only start after tomorrow.
< udit_s> ok then,
udit_s has left #mlpack []
govg has joined #mlpack
< andrewmw94> naywhayare: I'm stuck again. I get an error saying that HRectBound does not name a type, but I definitely included it and the macro for the hrectbound.hpp file is defined. Any idea why? Do you still have to provide something for the template even though all values have default arguments? (I commited the latest code if you want to see. Line 16 of rectangle_tree.hpp should be commented out. It's there to verify that hrectbound.hp
< jenkins-mlpack> Starting build #1938 for job mlpack - svn checkin test (previous build: SUCCESS)
< naywhayare> andrewmw94: it's in namespace bound
< naywhayare> so use bound::HRectBound<>
< andrewmw94> ah. Duh. I removed the template from the BSP tree to make the R tree, since it only uses rectangles, and missed that
< jenkins-mlpack> Project mlpack - svn checkin test build #1938: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1938/
< jenkins-mlpack> andrewmw94: fixed some mistakes. Still commented out to allow build.
govg has quit [Quit: leaving]
< naywhayare> andrewmw94: if you have further problems, feel free to ask. I am here to help :)
< naywhayare> also, as I'm sure you now know (if you didn't already before), debugging complex c++ can be a nightmare...
sumedhghaisas has joined #mlpack
< sumedhghaisas> naywhayare: you there??
< naywhayare> sumedhghaisas: yeah, I am here
< sumedhghaisas> naywhayare: okay... In AMF if residue goes below minResidue... then the loop is stopped...
< naywhayare> right
< sumedhghaisas> but for SVD function this causes a major issue... even the paper mentions that...
< sumedhghaisas> when residue starts to increase we have to prune it..
< sumedhghaisas> all the SVD algorithms mention this point, like... when residue starts to increase, prune...
< naywhayare> by prune, you mean terminate the algorithm?
< sumedhghaisas> yeah... terminate the loop and return the answer...
< naywhayare> you're referring to Algorithm 1, step 2, right? "repeat until the validation RMSE starts to increase"
< sumedhghaisas> yes...
< naywhayare> okay
< naywhayare> I understand the problem
< sumedhghaisas> this is common in all the later algorithms...
< sumedhghaisas> I tested it on couple of datasets.. (subset of MovieLens)
< sumedhghaisas> for movie lens the residue starts to increase from e-6....
< sumedhghaisas> then almost never decrease...
< naywhayare> so one option is to change the condition 'residue > minResidue' to 'lastResidue - residue < tolerance'
< naywhayare> where tolerance is some parameter
< naywhayare> and when residue > lastResidue, that will still terminate (assuming tolerance > 0, which it should be)
< naywhayare> actually, we should probably normalize this... ((lastResidue - residue) / residue) < tolerance
< naywhayare> this way the tolerance parameter doesn't depend on the norms of the matrices
< naywhayare> I think this is a better condition than 'residue > minResidue', because minResidue is a parameter that will need to be set differently for every input matrix
< sumedhghaisas> Yeah... you are right normalization is better...
< naywhayare> and if you do it like that, that will solve your convergence problems too
< sumedhghaisas> so it means that the residue should decrease by a certain percentage with respect to the last residue...
< sumedhghaisas> or else the algorithm will stop...
< sumedhghaisas> sounds good...
< sumedhghaisas> ohh that should be (last_residue - residue) / last_residue right??
< naywhayare> sure, that's probably more stable
< sumedhghaisas> this solves all the problems for SVD batch learning... there is one remaining for SVD momentum...
< sumedhghaisas> in that same paper...
< sumedhghaisas> algorithm 4
< sumedhghaisas> step 1
< sumedhghaisas> sorry step 2
< sumedhghaisas> we have to initialize those two matrices... but they are dependent on the dataset...
< sumedhghaisas> either we have to take the dataset as a parameter in the constructor...
< sumedhghaisas> so there cannot be a default constructor, so the AMF constructor won't work the way it is defined now...
< naywhayare> right, they need to be initialized to the same size as W and H, and filled with zeros
< sumedhghaisas> yes... another way.. create a function inside all update rules... can be called initialize...
< sumedhghaisas> AMf will call that function...
< sumedhghaisas> only one time...
< naywhayare> yes, I think that is the right way to do it
< naywhayare> it can't really be done in the constructor unless we specify the rank of the decomposition
< naywhayare> but the rank is specified when Apply() is called
< sumedhghaisas> yeah rank... I did forget that...
< naywhayare> and I'd like to keep it that way, because a user may want to call Apply() multiple times with different rank parameters, but a different object isn't always necessary
< naywhayare> so let's do that -- add an Initialize() function which takes the data matrix V and the rank r (I don't think you need anything more than just that for now)
< sumedhghaisas> Yeah so Initialize(size_t rank, const arma::mat& dataset)... correct??
< naywhayare> yeah, that's fine, although I'd reverse the parameter ordering because Apply() is ordered as 'dataset, rank' not 'rank, dataset' (but that's a very minor issue :))
< sumedhghaisas> yeah...
< sumedhghaisas> okay.. will do that...
< sumedhghaisas> another minor point...
< naywhayare> sure, go ahead
< sumedhghaisas> It will be better to return the residue in Apply function...
< naywhayare> yeah, it returns void right now
< naywhayare> better to return the residue, you are right
< sumedhghaisas> okay thanks... will try to make the commit today only...
< sumedhghaisas> anyways your paper submission... how did it go??
< sumedhghaisas> completed everything???
< naywhayare> I got both papers done and submitted
< naywhayare> at the last minute...
< naywhayare> but it's good to have them done. I don't know if they'll be accepted, but at least I got them submitted, so I'm happy with that
< sumedhghaisas> both?? you submitted two papers?? woah...
< sumedhghaisas> good to hear that :) considering the amount of work you have done... they will definitely be accepted...
< sumedhghaisas> I am learning the Git system side by side... I love it... especially branches... Do you mind if I upload the trunk to GitHub and update it regularly?? working with Git is really fun...
< naywhayare> try using git2svn: https://github.com/etgryphon/git2svn
< naywhayare> I think that's what marcus uses
< sumedhghaisas> this looks nice... will try that...
< sumedhghaisas> what do you think the default value of tolerance should be ?? 0.01??
< naywhayare> 1e-5 is the usual value I use
< sumedhghaisas> tolerance cannot be less than zero and cannot be higher than 1, right??
< naywhayare> but run the NMF tests and make sure that value of 1e-5 doesn't make the tests take forever
< naywhayare> yeah, it should be between 0 and 1
< sumedhghaisas> I dont think it will affect the tests...
< sumedhghaisas> cause minResidue remains the same...
< naywhayare> I thought you were removing minResidue and only using the tolerance
< sumedhghaisas> okay... I will first remove minResidue and check it on MovieLens dataset...
andrewmw94 has quit [Quit: Leaving.]
< sumedhghaisas> naywhayare: Okay NMF and SVD both working fine with tolerance...
< sumedhghaisas> But I have added a condition that the loop runs at least 2 times... this is required cause initially I have set oldResidue to DBL_MAX...
< naywhayare> your condition is (iteration > 1) && ((oldResidue - residue) / oldResidue) < tolerance ?
< naywhayare> if so that seems reasonable to me
< sumedhghaisas> yeah similar....
< sumedhghaisas> its in while...
< sumedhghaisas> so iteration < 4 && ...
< sumedhghaisas> cause at the end of second run.. iteration is 3...
< naywhayare> okay...
< sumedhghaisas> naywhayare: I am getting this error when I run svn commit
< sumedhghaisas> sumedh@sumedh-Aspire-5742:~/trunk$ svn commit
< sumedhghaisas> svn: E155015: Commit failed (details follow):
< sumedhghaisas> svn: E155015: Aborting commit: '/home/sumedh/trunk/src/mlpack/core/math/random.hpp' remains in conflict
< sumedhghaisas> I did run svn update as you said...
< naywhayare> did you modify random.hpp?
< sumedhghaisas> I did ... then I used SVN update...
< sumedhghaisas> I thought this will clear things out...
< naywhayare> and it made the file conflicted?
< sumedhghaisas> yes...
< naywhayare> you can do 'svn revert random.hpp'
< naywhayare> or, you should be able to do that
< sumedhghaisas> how can I unstage that file from the commit??
< sumedhghaisas> sumedh@sumedh-Aspire-5742:~/trunk$ svn revert random.hpp
< sumedhghaisas> Skipped 'random.hpp'
< sumedhghaisas> and the conflict is still there
< naywhayare> you can choose to only commit some files with 'svn commit path/to/file1 path/to/file2 path/to/file3 ...'
< sumedhghaisas> okay that would be better....