naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 252 seconds]
Anand has joined #mlpack
< jenkins-mlpack> Project mlpack - nightly matrix build build #470: STILL UNSTABLE in 1 hr 34 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/470/
Anand has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
andrewmw94 has joined #mlpack
< jenkins-mlpack> Starting build #1926 for job mlpack - svn checkin test (previous build: SUCCESS)
sumedhghaisas has quit [Ping timeout: 240 seconds]
< jenkins-mlpack> Project mlpack - svn checkin test build #1926: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1926/
< jenkins-mlpack> andrewmw94: more R Tree stuff
oldbeardo has joined #mlpack
< oldbeardo> naywhayare: just sent you the updated files
< oldbeardo> naywhayare: hoping to get a review today :)
Anand_ has joined #mlpack
< naywhayare> oldbeardo: I am looking through it now
< oldbeardo> okay
< naywhayare> this looks fine to me, but my main thought is that the CosineTree and CosineNode classes can probably be easily merged
< naywhayare> but I want to investigate that more before I say it's possible and straightforward :)
< oldbeardo> yeah, I gave that a thought, I don't think it's that easy
< naywhayare> let me continue reading and thinking about it, and I'll get back to you on that issue
< oldbeardo> also, is the output of test.cpp alright?
< oldbeardo> I mean is that how it's supposed to work?
< naywhayare> consider writing the test to use the Boost unit test framework and do all of the tests automatically
< naywhayare> the test you have written definitely ensures that the code compiles and runs, but without any output it's not possible to say whether or not the cosine tree worked
< naywhayare> testing the cosine tree on its own might be a little difficult; we'd have to think about properties of the tree that are always true
< naywhayare> testing QUIC-SVD is a little easier -- you just test that the residual || A - \hat{A} ||_F <= some tolerance, where \hat{A} is the matrix reconstructed from the approximate SVD provided by QUIC-SVD
< naywhayare> so we should try and think of a way to test that the cosine tree has been constructed properly. I'm not sure how to do that; I'll have to do a little reading before I have any ideas
< oldbeardo> okay, the code actually does give an output
< oldbeardo> it prints out the Monte Carlo error estimate after each splitting step
< naywhayare> ok, I missed that
< naywhayare> so when you look at the Monte Carlo error estimates, is there something in specific you are looking for that could be automated?
< oldbeardo> I was looking for proof that the implementation works
< oldbeardo> I thought you will be knowing better about what things to look for to confirm that
< oldbeardo> but at least for the GroupLens100k dataset the error estimate decreases at each step
< naywhayare> well, that's one simple test -- ensure that the estimate decreases at each step
< oldbeardo> okay, how did they test the algorithm when they came up with it?
< naywhayare> and they didn't test the cosine tree at all
< oldbeardo> okay, so what should I do?
< naywhayare> you should find properties of the cosine tree that you can use to test it and guarantee that your implementation works
< naywhayare> one place to start is to ensure that the error estimate decreases at every step
< oldbeardo> well, apart from that I'm quite clueless, that's the only test I could come up with
sumedhghaisas has joined #mlpack
< oldbeardo> sumedhghaisas: did you make any changes to 'cf.hpp'?
< sumedhghaisas> yes... I just changed its base from NMF to AMF(Alternating Matrix factorization)
< sumedhghaisas> oldbeardo: Anything went wrong??
Anand_ has quit [Ping timeout: 240 seconds]
< oldbeardo> aah, okay
< oldbeardo> you have written AMF?
< sumedhghaisas> yes... just a slight change from NMF...
< oldbeardo> because some of the files have the author name as Rajendran Mohan
< oldbeardo> did you also include <set>, <map> and <iostream>?
< sumedhghaisas> Ohh... cause they are from NMF update rules only... I have just modified them accordingly...
< sumedhghaisas> No...
< sumedhghaisas> wait let me see...
< sumedhghaisas> Ohh... my mistake... I was doing some experiments and forgot to remove these includes...
< sumedhghaisas> will remove it in the next commit...
< oldbeardo> yup, sure, was wondering what purpose they served
< sumedhghaisas> QUICK SVD... is it a technique based on alternating updating??
< sumedhghaisas> *updation
< sumedhghaisas> otherwise a better abstraction is needed for CF module...
< oldbeardo> yes, we would be needing that, QUIC-SVD is not really an optimization algorithm in itself
< sumedhghaisas> okay... I am sure we will find a good way when working code is ready....
< oldbeardo> yeah, let's hope so
oldbeardo has quit [Quit: Page closed]
< naywhayare> andrewmw94: any chance you can make your commit messages more descriptive? :)
< naywhayare> I can see you're actually doing implementation now (and have been for a handful of days) not just setting up the files
< jenkins-mlpack> Starting build #1927 for job mlpack - svn checkin test (previous build: SUCCESS)
< andrewmw94> yeah. I'll try, but the stuff is so broad I didn't think it would help that much
< andrewmw94> I defined the interface and jump from place to place as I realize what things I forgot
< andrewmw94> Also, random question, is there a standard way you give reference papers in the code?
< andrewmw94> I'm slightly modifying the R-Tree algorithms because we only use points, but I still thought I should reference the paper
< andrewmw94> for now, it's just a comment in one of the files
< naywhayare> this actually came up as a ticket once, let me find it
< naywhayare> general consensus was 'no citations in -h, BiBTeX citations in Doxygen comments, links to external documentation'
< andrewmw94> "-h"?
< naywhayare> mlpack_program -h
< andrewmw94> ahh
< naywhayare> when you type that, it gives a bunch of helpful information that's written in the PROGRAM_INFO() macro at the top of every ..._main.cpp file
< naywhayare> for what you're doing, it shouldn't be something you encounter, I don't think
< andrewmw94> yeah
< andrewmw94> "I do enough latex that I read it the same as normal text, so I might be biased"
< andrewmw94> I like that comment
< naywhayare> I have a plugin for pidgin that will render latex that's written in IM messages, I love it
< jenkins-mlpack> Project mlpack - svn checkin test build #1927: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1927/
< jenkins-mlpack> andrewmw94: R tree stuff
< andrewmw94> Do you happen to know off the top of your head how efficient it is to change columns in an arma::mat?
< andrewmw94> I assume swapping them is basically as fast as swapping elements in an array?
< naywhayare> yeah, I'm not sure exactly how it's implemented, but it should be O(d) where d is the number of rows
< naywhayare> whether it's 2d, 3d, xd, I don't know though; you'd have to look at the implementation
< andrewmw94> alrighty
< andrewmw94> kind of a silly question, but should I worry about code that makes what I claim are reasonable assumptions about the tree's parameters?
< andrewmw94> specifically, splitting a node that only has one point in it would cause the code to crash
< andrewmw94> but that should only happen if the maximum number of points is 0.
< naywhayare> with kd-trees this situation is avoided because the function that determines the point index to split on will return -1 when it can't find the point to split on
< naywhayare> is there no easy way to fit in a check like that into the R tree code?
< naywhayare> if there actually isn't, I would document the assumption in the function's documentation and then also describe those datasets which will cause the construction algorithm to crash in the R tree class documentation
< andrewmw94> well, the algorithm in the original paper is to find the worst pair of points to place in the same rectangle
< andrewmw94> which doesn't work if there's only one point
< naywhayare> so this situation only occurs when the dataset has only one point?
< andrewmw94> no, that's fine since it doesn't split. It would occur if leafSize is set to 0.
< andrewmw94> or something less than 0 I guess.
< andrewmw94> where leafSize, as in the BSP tree, is the maximum number of points in each leaf
< naywhayare> oh, ok. I would hope no user is dumb enough to set leafSize to 0
< andrewmw94> I hope
< naywhayare> so I think your assumption is perfectly reasonable then
< naywhayare> you can write about that assumption in the documentation if you like, but honestly, I think it's not necessary; any user who actually reads what the leafSize parameter is should be able to reasonably deduce that the leaf size should be greater than 0
< naywhayare> otherwise the tree structure doesn't make any sense
< andrewmw94> I'll have a boost::assert just in case, but yeah. It's more of a, "something could theoretically cause this to crash" than something I would think could happen
< naywhayare> there's also Log::Assert, but that function is mostly underused and it's unclear why that should be chosen over just assert() or boost::assert() or whatever else
< naywhayare> the idea was that it would print a backtrace too... and I think it does this on some systems, but it doesn't give you a line number or human-understandable error message or anything
< naywhayare> I don't think anyone has had the time, motivation, or even recognized that this was a problem since Log::Assert was originally written, though
< andrewmw94> ahh. So I should use that in the vain hope that it will sometime be fixed "for the next release" ?
< naywhayare> also how to solve it or whether Log::Assert can provide anything over other functionality is unclear too
< naywhayare> sure, I guess, it's probably a good idea to use it, but if you use assert() or boost::assert() that's not a huge problem either
< naywhayare> all three have the intended effect -- when the program is compiled with debugging symbols, it stops instead of segfaulting
sumedhghaisas has quit [Ping timeout: 245 seconds]
andrewmw94 has left #mlpack []
< jenkins-mlpack> Starting build #1928 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack> Project mlpack - svn checkin test build #1928: SUCCESS in 32 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1928/
< jenkins-mlpack> andrewmw94: added code for accessing immediate child nodes (need to think of a way to rename this to be less confusing). Some more quasi-code to split nodes and insert points.