naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas has quit [Ping timeout: 252 seconds]
< oldbeardo>
naywhayare: just sent you the updated files
< oldbeardo>
naywhayare: hoping to get a review today :)
Anand_ has joined #mlpack
< naywhayare>
oldbeardo: I am looking through it now
< oldbeardo>
okay
< naywhayare>
this looks fine to me, but my main thought is that the CosineTree and CosineNode classes can probably be easily merged
< naywhayare>
but I want to investigate that more before I say it's possible and straightforward :)
< oldbeardo>
yeah, I gave that a thought, I don't think it's that easy
< naywhayare>
let me continue reading and thinking about it, and I'll get back to you on that issue
< oldbeardo>
also, is the output of test.cpp alright?
< oldbeardo>
I mean is that how it's supposed to work?
< naywhayare>
consider writing the test to use the Boost unit test framework and do all of the tests automatically
< naywhayare>
the test you have written definitely ensures that the code compiles and runs, but without any output it's not possible to say whether or not the cosine tree worked
< naywhayare>
testing the cosine tree on its own might be a little difficult; we'd have to think about properties of the tree that are always true
< naywhayare>
testing QUIC-SVD is a little easier -- you just test that the residual || A - \hat{A} ||_F <= some tolerance, where \hat{A} is the matrix reconstructed from the approximate SVD provided by QUIC-SVD
< naywhayare>
so we should try and think of a way to test that the cosine tree has been constructed properly. I'm not sure how to do that; I'll have to do a little reading before I have any ideas
< oldbeardo>
okay, the code actually does give an output
< oldbeardo>
it prints out the Monte Carlo error estimate after each splitting step
< naywhayare>
ok, I missed that
< naywhayare>
so when you look at the Monte Carlo error estimates, is there something in specific you are looking for that could be automated?
< oldbeardo>
I was looking for proof that the implementation works
< oldbeardo>
I thought you will be knowing better about what things to look for to confirm that
< oldbeardo>
but at least for the GroupLens100k dataset the error estimate decreases at each step
< naywhayare>
well, that's one simple test -- ensure that the estimate decreases at each step
< oldbeardo>
okay, how did they test the algorithm when they came up with it?
< naywhayare>
general consensus was 'no citations in -h, BiBTeX citations in Doxygen comments, links to external documentation'
< andrewmw94>
"-h"?
< naywhayare>
mlpack_program -h
< andrewmw94>
ahh
< naywhayare>
when you type that, it gives a bunch of helpful information that's written in the PROGRAM_INFO() macro at the top of every ..._main.cpp file
< naywhayare>
for what you're doing, it shouldn't be something you encounter, I don't think
< andrewmw94>
yeah
< andrewmw94>
"I do enough latex that I read it the same as normal text, so I might be biased"
< andrewmw94>
I like that comment
< naywhayare>
I have a plugin for pidgin that will render latex that's written in IM messages, I love it
< andrewmw94>
Do you happen to know off the top of your head how efficient it is to change columns in an arma::mat?
< andrewmw94>
I assume swapping them is basically as fast as swapping elements in an array?
< naywhayare>
yeah, I'm not sure exactly how it's implemented, but it should be O(d) where d is the number of rows
< naywhayare>
whether it's 2d, 3d, xd, I don't know though; you'd have to look at the implementation
< andrewmw94>
alrighty
< andrewmw94>
kind of a silly question, but should I worry about code that makes what I claim are reasonable assumptions about the tree's parameters?
< andrewmw94>
specifically, splitting a node that only has one point in it would cause the code to crash
< andrewmw94>
but that should only happen if the maximum number of points is 0.
< naywhayare>
with kd-trees this situation is avoided because the function that determines the point index to split on will return -1 when it can't find the point to split on
< naywhayare>
is there no easy way to fit in a check like that into the R tree code?
< naywhayare>
if there actually isn't, I would document the assumption in the function's documentation and then also describe those datasets which will cause the construction algorithm to crash in the R tree class documentation
< andrewmw94>
well, the algorithm in the original paper is to find the worst pair of points to place in the same rectangle
< andrewmw94>
which doesn't work if there's only one point
< naywhayare>
so this situation only occurs when the dataset has only one point?
< andrewmw94>
no, that's fine since it doesn't split. It would occur if leafSize is set to 0.
< andrewmw94>
or something less than 0 I guess.
< andrewmw94>
where leafSize, as in the BSP tree, is the maximum number of points in each leaf
< naywhayare>
oh, ok. I would hope no user is dumb enough to set leafSize to 0
< andrewmw94>
I hope
< naywhayare>
so I think your assumption is perfectly reasonable then
< naywhayare>
you can write about that assumption in the documentation if you like, but honestly, I think it's not necessary; any user who actually reads what the leafSize parameter is should be able to reasonably deduce that the leaf size should be greater than 0
< naywhayare>
otherwise the tree structure doesn't make any sense
< andrewmw94>
I'll have a boost::assert just in case, but yeah. It's more of a, "something could theoretically cause this to crash" than something I would think could happen
< naywhayare>
there's also Log::Assert, but that function is mostly underused and it's unclear why that should be chosen over just assert() or boost::assert() or whatever else
< naywhayare>
the idea was that it would print a backtrace too... and I think it does this on some systems, but it doesn't give you a line number or human-understandable error message or anything
< naywhayare>
I don't think anyone has had the time, motivation, or even recognized that this was a problem since Log::Assert was originally written, though
< andrewmw94>
ahh. So I should use that in the vain hope that it will sometime be fixed "for the next release" ?
< naywhayare>
also how to solve it or whether Log::Assert can provide anything over other functionality is unclear too
< naywhayare>
sure, I guess, it's probably a good idea to use it, but if you use assert() or boost::assert() that's not a huge problem either
< naywhayare>
all three have the intended effect -- when the program is compiled with debugging symbols, it stops instead of segfaulting
sumedhghaisas has quit [Ping timeout: 245 seconds]
andrewmw94 has left #mlpack []
< jenkins-mlpack>
Starting build #1928 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack>
andrewmw94: added code for accessing immediate child nodes (need to think of a way to rename this to be less confusing). Some more quasi-code to split nodes and insert points.