#mlpack on 2014-05-30 — irc logs at libera.irclog.whitequark.org

2014-05-21 16:24 naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

00:20 sumedhghaisas has quit [Ping timeout: 252 seconds]

03:51 Anand has joined #mlpack

05:34 < jenkins-mlpack> Project mlpack - nightly matrix build build #470: STILL UNSTABLE in 1 hr 34 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/470/

06:40 Anand has quit [Ping timeout: 240 seconds]

11:32 sumedhghaisas has joined #mlpack

12:04 andrewmw94 has joined #mlpack

12:10 < jenkins-mlpack> Starting build #1926 for job mlpack - svn checkin test (previous build: SUCCESS)

12:27 sumedhghaisas has quit [Ping timeout: 240 seconds]

12:43 < jenkins-mlpack> Project mlpack - svn checkin test build #1926: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1926/

12:43 < jenkins-mlpack> andrewmw94: more R Tree stuff

13:58 oldbeardo has joined #mlpack

14:03 < oldbeardo> naywhayare: just sent you the updated files

15:36 < oldbeardo> naywhayare: hoping to get a review today :)

15:54 Anand_ has joined #mlpack

16:17 < naywhayare> oldbeardo: I am looking through it now

16:22 < oldbeardo> okay

16:22 < naywhayare> this looks fine to me, but my main thought is that the CosineTree and CosineNode classes can probably be easily merged

16:22 < naywhayare> but I want to investigate that more before I say it's possible and straightforward :)

16:23 < oldbeardo> yeah, I gave that a thought, I don't think it's that easy

16:23 < naywhayare> let me continue reading and thinking about it, and I'll get back to you on that issue

16:23 < oldbeardo> also, is the output of test.cpp alright?

16:23 < oldbeardo> I mean is that how it's supposed to work?

16:25 < naywhayare> consider writing the test to use the Boost unit test framework and do all of the tests automatically

16:26 < naywhayare> the test you have written definitely ensures that the code compiles and runs, but without any output it's not possible to say whether or not the cosine tree worked

16:26 < naywhayare> testing the cosine tree on its own might be a little difficult; we'd have to think about properties of the tree that are always true

16:27 < naywhayare> testing QUIC-SVD is a little easier -- you just test that the residual || A - \hat{A} ||_F <= some tolerance, where \hat{A} is the matrix reconstructed from the approximate SVD provided by QUIC-SVD
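
The residual check described above can be sketched as follows. This is a minimal illustration, not the mlpack test: the `Matrix` alias and the function names `FrobeniusResidual` and `ReconstructionWithinTolerance` are hypothetical, and a plain nested `std::vector` stands in for `arma::mat` so the block has no external dependencies. With Armadillo the same quantity would presumably be computed with `arma::norm(A - AHat, "fro")`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense row-major matrix; a stand-in for arma::mat in this sketch.
using Matrix = std::vector<std::vector<double>>;

// Frobenius norm of (a - aHat): the square root of the sum of squared
// entrywise differences.
double FrobeniusResidual(const Matrix& a, const Matrix& aHat)
{
  assert(a.size() == aHat.size());
  double sumSq = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    assert(a[i].size() == aHat[i].size());
    for (std::size_t j = 0; j < a[i].size(); ++j)
    {
      const double diff = a[i][j] - aHat[i][j];
      sumSq += diff * diff;
    }
  }
  return std::sqrt(sumSq);
}

// True when the reconstruction aHat is within `tolerance` of the original a.
bool ReconstructionWithinTolerance(const Matrix& a,
                                   const Matrix& aHat,
                                   double tolerance)
{
  return FrobeniusResidual(a, aHat) <= tolerance;
}
```

In a unit test, `aHat` would be the matrix rebuilt from the approximate SVD factors, and the tolerance would be chosen relative to the norm of the original matrix.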

16:28 < naywhayare> so we should try and think of a way to test that the cosine tree has been constructed properly. I'm not sure how to do that; I'll have to do a little reading before I have any ideas

16:39 < oldbeardo> okay, the code actually does give an output

16:40 < oldbeardo> it prints out the Monte Carlo error estimate after each splitting step

16:46 < naywhayare> ok, I missed that

16:47 < naywhayare> so when you look at the Monte Carlo error estimates, is there something specific you are looking for that could be automated?

16:47 < oldbeardo> I was looking for proof that the implementation works

16:48 < oldbeardo> I thought you would know better what to look for to confirm that

16:49 < oldbeardo> but at least for the GroupLens100k dataset the error estimate decreases at each step

16:49 < naywhayare> well, that's one simple test -- ensure that the estimate decreases at each step

16:54 < oldbeardo> okay, how did they test the algorithm when they came up with it?

16:56 < naywhayare> very poorly: http://mlpack.org/trac/browser/tags/mlpack-0.4/mlpack/quicsvd/quicsvd.h

16:56 < naywhayare> and they didn't test the cosine tree at all

17:02 < oldbeardo> okay, so what should I do?

17:02 < naywhayare> you should find properties of the cosine tree that you can use to test it and guarantee that your implementation works

17:03 < naywhayare> one place to start is to ensure that the error estimate decreases at every step
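
The "estimate decreases at every step" check could be automated along these lines. This is only a sketch: `IsStrictlyDecreasing` is a hypothetical helper, and it assumes the per-step Monte Carlo error estimates have already been collected into a vector. Because the estimates are stochastic, a real test might need to allow a small tolerance rather than demand a strict decrease.

```cpp
#include <algorithm>
#include <vector>

// Returns true when every estimate is strictly smaller than the one before it.
// An empty or single-element sequence is treated as decreasing.
bool IsStrictlyDecreasing(const std::vector<double>& estimates)
{
  // adjacent_find returns end() when no adjacent pair satisfies the
  // predicate, i.e. when no step failed to decrease.
  return std::adjacent_find(estimates.begin(), estimates.end(),
      [](double previous, double next) { return next >= previous; })
      == estimates.end();
}
```

Inside a Boost unit test this would typically be wrapped in a single check on the recorded estimates.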

17:04 < oldbeardo> well, apart from that I'm quite clueless, that's the only test I could come up with

17:46 sumedhghaisas has joined #mlpack

18:18 < oldbeardo> sumedhghaisas: did you make any changes to 'cf.hpp'?

18:19 < sumedhghaisas> yes... I just changed its base from NMF to AMF (Alternating Matrix Factorization)

18:20 < sumedhghaisas> oldbeardo: Did anything go wrong??

18:20 Anand_ has quit [Ping timeout: 240 seconds]

18:21 < oldbeardo> aah, okay

18:21 < oldbeardo> you have written AMF?

18:21 < sumedhghaisas> yes... just a slight change from NMF...

18:22 < oldbeardo> because some of the files have the author name as Rajendran Mohan

18:23 < oldbeardo> did you also include <set>, <map> and <iostream>?

18:23 < sumedhghaisas> Ohh... because they are from the NMF update rules only... I have just modified them accordingly...

18:23 < sumedhghaisas> No...

18:23 < sumedhghaisas> wait let me see...

18:24 < sumedhghaisas> Ohh... my mistake... I was doing some experiments and forgot to remove these includes...

18:24 < sumedhghaisas> will remove it in the next commit...

18:25 < oldbeardo> yup, sure, was wondering what purpose they served

18:26 < sumedhghaisas> QUIC-SVD... is it a technique based on alternating updating??

18:26 < sumedhghaisas> *updation

18:27 < sumedhghaisas> otherwise a better abstraction is needed for CF module...

18:30 < oldbeardo> yes, we would be needing that, QUIC-SVD is not really an optimization algorithm in itself

18:32 < sumedhghaisas> okay... I am sure we will find a good way when working code is ready....

18:33 < oldbeardo> yeah, let's hope so

18:34 oldbeardo has quit [Quit: Page closed]

20:38 < naywhayare> andrewmw94: any chance you can make your commit messages more descriptive? :)

20:39 < naywhayare> I can see you're actually doing implementation now (and have been for a handful of days) not just setting up the files

20:42 < jenkins-mlpack> Starting build #1927 for job mlpack - svn checkin test (previous build: SUCCESS)

21:02 < andrewmw94> yeah. I'll try, but the stuff is so broad I didn't think it would help that much

21:02 < andrewmw94> I defined the interface and jump from place to place as I realize what things I forgot

21:03 < andrewmw94> Also, random question, is there a standard way you give reference papers in the code?

21:03 < andrewmw94> I'm slightly modifying the R-Tree algorithms because we only use points, but I still thought I should reference the paper

21:04 < andrewmw94> for now, it's just a comment in one of the files

21:07 < naywhayare> this actually came up as a ticket once, let me find it

21:07 < naywhayare> http://www.mlpack.org/trac/ticket/201

21:07 < naywhayare> general consensus was 'no citations in -h, BibTeX citations in Doxygen comments, links to external documentation'

21:08 < andrewmw94> "-h"?

21:08 < naywhayare> mlpack_program -h

21:08 < andrewmw94> ahh

21:09 < naywhayare> when you type that, it gives a bunch of helpful information that's written in the PROGRAM_INFO() macro at the top of every ..._main.cpp file

21:09 < naywhayare> for what you're doing, it shouldn't be something you encounter, I don't think

21:09 < andrewmw94> yeah

21:10 < andrewmw94> "I do enough latex that I read it the same as normal text, so I might be biased"

21:10 < andrewmw94> I like that comment

21:11 < naywhayare> I have a plugin for pidgin that will render latex that's written in IM messages, I love it

21:15 < jenkins-mlpack> Project mlpack - svn checkin test build #1927: SUCCESS in 33 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1927/

21:15 < jenkins-mlpack> andrewmw94: R tree stuff

21:50 < andrewmw94> Do you happen to know off the top of your head how efficient it is to change columns in an arma::mat?

21:51 < andrewmw94> I assume swapping them is basically as fast as swapping elements in an array?

21:51 < naywhayare> yeah, I'm not sure exactly how it's implemented, but it should be O(d) where d is the number of rows

21:52 < naywhayare> whether it's 2d, 3d, xd, I don't know though; you'd have to look at the implementation
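
To illustrate why a column swap should cost O(d): Armadillo stores dense matrices in column-major order, so each column is one contiguous run of d values and a swap exchanges two such runs. The sketch below shows this on a flat `std::vector`. `SwapColumns` is a hypothetical helper, not Armadillo's implementation; with `arma::mat` the member function `swap_cols()` would be the call to use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Swap columns c1 and c2 of a column-major matrix with nRows rows that is
// stored in one flat vector. The work is proportional to nRows.
void SwapColumns(std::vector<double>& data,
                 std::size_t nRows,
                 std::size_t c1,
                 std::size_t c2)
{
  assert(nRows > 0 && data.size() % nRows == 0);
  const std::size_t nCols = data.size() / nRows;
  assert(c1 < nCols && c2 < nCols);
  if (c1 == c2)
    return;  // Nothing to do; also avoids swapping a range with itself.

  auto first = data.begin() + static_cast<std::ptrdiff_t>(c1 * nRows);
  auto second = data.begin() + static_cast<std::ptrdiff_t>(c2 * nRows);
  std::swap_ranges(first, first + static_cast<std::ptrdiff_t>(nRows), second);
}
```

The constant factor depends on how the swap is implemented (a temporary per element or a whole temporary column), which is the "2d, 3d" question raised above.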

21:52 < andrewmw94> alrighty

22:09 < andrewmw94> kind of a silly question, but should I worry about code that makes what I claim are reasonable assumptions about the tree's parameters?

22:09 < andrewmw94> specifically, splitting a node that only has one point in it would cause the code to crash

22:09 < andrewmw94> but that should only happen if the maximum number of points is 0.

22:13 < naywhayare> with kd-trees this situation is avoided because the function that determines the point index to split on will return -1 when it can't find the point to split on

22:14 < naywhayare> is there no easy way to fit in a check like that into the R tree code?

22:14 < naywhayare> if there actually isn't, I would document the assumption in the function's documentation and then also describe those datasets which will cause the construction algorithm to crash in the R tree class documentation

22:14 < andrewmw94> well, the algorithm in the original paper is to find the worst pair of points to place in the same rectangle

22:14 < andrewmw94> which doesn't work if there's only one point

22:15 < naywhayare> so this situation only occurs when the dataset has only one point?

22:15 < andrewmw94> no, that's fine since it doesn't split. It would occur if leafSize is set to 0.

22:16 < andrewmw94> or something less than 0 I guess.

22:16 < andrewmw94> where leafSize, as in the BSP tree, is the maximum number of points in each leaf

22:16 < naywhayare> oh, ok. I would hope no user is dumb enough to set leafSize to 0

22:16 < andrewmw94> I hope

22:16 < naywhayare> so I think your assumption is perfectly reasonable then

22:17 < naywhayare> you can write about that assumption in the documentation if you like, but honestly, I think it's not necessary; any user who actually reads what the leafSize parameter is should be able to reasonably deduce that the leaf size should be greater than 0

22:17 < naywhayare> otherwise the tree structure doesn't make any sense

22:18 < andrewmw94> I'll have a boost::assert just in case, but yeah. It's more of a, "something could theoretically cause this to crash" than something I would think could happen

22:19 < naywhayare> there's also Log::Assert, but that function is mostly underused and it's unclear why that should be chosen over just assert() or boost::assert() or whatever else

22:19 < naywhayare> the idea was that it would print a backtrace too... and I think it does this on some systems, but it doesn't give you a line number or human-understandable error message or anything

22:20 < naywhayare> I don't think anyone has had the time, motivation, or even recognized that this was a problem since Log::Assert was originally written, though

22:22 < andrewmw94> ahh. So I should use that in the vain hope that it will sometime be fixed "for the next release" ?

22:22 < naywhayare> also how to solve it or whether Log::Assert can provide anything over other functionality is unclear too

22:22 < naywhayare> sure, I guess, it's probably a good idea to use it, but if you use assert() or boost::assert() that's not a huge problem either

22:22 < naywhayare> all three have the intended effect -- when the program is compiled with debugging symbols, it stops instead of segfaulting
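
A minimal sketch of the leafSize assumption discussed above, combining an assertion with a guard in the spirit of the kd-tree check. `CanSplit` and `NeedsSplit` are hypothetical names, not part of the R tree code. Note that standard `assert()` is compiled out when `NDEBUG` is defined, which is why the guard is kept as well; Boost's counterpart is the `BOOST_ASSERT` macro.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical guard: picking the worst pair of points needs at least two
// points in the node.
bool CanSplit(std::size_t numPoints)
{
  return numPoints >= 2;
}

// Hypothetical check: a leaf is split only when it holds more than leafSize
// points and a split is actually possible.
bool NeedsSplit(std::size_t numPoints, std::size_t leafSize)
{
  // Documented assumption: leafSize is the maximum number of points per leaf
  // and must be positive. This stops debug builds early on bad input.
  assert(leafSize > 0);

  // The CanSplit() guard still protects builds where assert() is disabled.
  return numPoints > leafSize && CanSplit(numPoints);
}
```

With a positive leafSize the guard never changes the result, so it costs little and only matters when the assumption is violated.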

22:42 sumedhghaisas has quit [Ping timeout: 245 seconds]

22:43 andrewmw94 has left #mlpack []

22:48 < jenkins-mlpack> Starting build #1928 for job mlpack - svn checkin test (previous build: SUCCESS)

23:21 < jenkins-mlpack> Project mlpack - svn checkin test build #1928: SUCCESS in 32 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/1928/

23:21 < jenkins-mlpack> andrewmw94: added code for accessing immediate child nodes (need to think of a way to rename this to be less confusing). Some more quasi-code to split nodes and insert points.