#mlpack on 2014-08-07 — irc logs at libera.irclog.whitequark.org

2014-05-21 16:24 naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/

00:17 < jenkins-mlpack> Project mlpack - svn checkin test build #2080: SUCCESS in 1 hr 29 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2080/

00:17 < jenkins-mlpack> sumedhghaisas: * minor changes

01:54 jbc__ has joined #mlpack

02:04 < jbc__> naywhayare: yeah, so I think what happens in that test case is the probabilities of the outcomes are 50/50 (getting rid of non-unique) and because it’s equal probability the classifier just uses the first index of the maximum probability - hence getting stuck on zero

02:05 < jbc__> as that’s the first label, and each label has equal prob
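
[Editor's sketch] The tie-breaking behaviour jbc__ describes can be reproduced with a minimal argmax that keeps the first maximum it sees. `FirstArgMax` is a hypothetical helper written for illustration, not mlpack's classifier code; it only assumes that ties are resolved in favour of the lowest index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): returns the index of the
// largest probability, keeping the earliest index when values tie.
std::size_t FirstArgMax(const std::vector<double>& probs)
{
  assert(!probs.empty());
  std::size_t best = 0;
  for (std::size_t i = 1; i < probs.size(); ++i)
  {
    // Strict '>' means an equal value never replaces an earlier one,
    // so a 50/50 split always yields label 0.
    if (probs[i] > probs[best])
      best = i;
  }
  return best;
}
```

With equal probabilities for every label, a loop of this shape would return index 0 every time, which matches the "stuck on zero" symptom.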

03:45 < jenkins-mlpack> Starting build #2081 for job mlpack - svn checkin test (previous build: SUCCESS)

04:30 govg_ has joined #mlpack

05:16 < jenkins-mlpack> Project mlpack - svn checkin test build #2081: SUCCESS in 1 hr 30 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2081/

05:16 < jenkins-mlpack> michaelfox99: GMM::Save() now adds type information to XML files

06:15 < jenkins-mlpack> Project mlpack - nightly matrix build build #552: STILL FAILING in 2 hr 15 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/552/

06:15 < jenkins-mlpack> * michaelfox99: phi.hpp no longer used, see GaussianDistribution::Probability()

06:15 < jenkins-mlpack> * michaelfox99: GMM::Save() now adds type information to XML files

06:15 < jenkins-mlpack> * sumedhghaisas: * minor changes

06:15 < jenkins-mlpack> * sumedhghaisas: * modified PlainSVD module to return normalized frobenius norm

06:15 < jenkins-mlpack> * modified PlainSVD tests

06:15 < jenkins-mlpack> * added row_col_iterator operator-- tests

06:15 < jenkins-mlpack> * sumedhghaisas: * changed row_col_iterator::operator-- implementation

06:15 < jenkins-mlpack> * added documentation to termination policies

06:15 < jenkins-mlpack> * minor fix of PlainSVD module

06:15 < jenkins-mlpack> * Ryan Curtin: Oops, I fixed this backwards. The actual error is the output.

06:15 < jenkins-mlpack> * Ryan Curtin: Fix incorrect check (how did this happen?).

07:06 < jenkins-mlpack> Starting build #2082 for job mlpack - svn checkin test (previous build: SUCCESS)

07:11 < jenkins-mlpack> Project mlpack - svn checkin test build #2082: FAILURE in 4 min 45 sec: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2082/

07:11 < jenkins-mlpack> michaelfox99: phi.hpp no longer used, see GaussianDistribution::Probability()

09:18 sumedhghaisas has joined #mlpack

09:26 jenkins-mlpack has quit [Read error: Connection reset by peer]

09:43 jenkins-mlpack has joined #mlpack

10:05 jenkins-mlpack has quit [Read error: Connection reset by peer]

10:11 jenkins-mlpack has joined #mlpack

11:46 sumedhghaisas has quit [Remote host closed the connection]

12:37 < jenkins-mlpack> Starting build #2083 for job mlpack - svn checkin test (previous build: FAILURE -- last SUCCESS #2081 8 hr 51 min ago)

12:47 < marcus_zoq> naywhayare: hm, the benchmark job failed again

14:07 < jenkins-mlpack> Yippie, build fixed!

14:07 < jenkins-mlpack> Project mlpack - svn checkin test build #2083: FIXED in 1 hr 30 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2083/

14:07 < jenkins-mlpack> michaelfox99: removed #include <mlpack/methods/gmm/phi.hpp> which included only unused functions

14:40 < naywhayare> marcus_zoq: I am not sure why it failed

14:40 < naywhayare> FATAL: Unable to delete script file /tmp/hudson4018934327994967650.sh

14:40 < naywhayare> but I can't come up with a reason for why that happened

14:40 < naywhayare> I can remove those hudson* scripts as jenkins after destroying the kerberos ticket

14:43 < marcus_zoq> naywhayare: Okay, It's worth a try.

14:43 < naywhayare> I found this: https://issues.jenkins-ci.org/browse/JENKINS-12235

14:44 < naywhayare> most of the comments are unhelpful, but some people seem to suggest that the issue is that the ssh connection sits idle and times out because the job doesn't produce any output

14:44 < naywhayare> somewhere in there, someone suggests adding ClientAliveInterval 60 to the sshd_config of the slave in question

14:44 < naywhayare> so I've tried doing that for shoeshine, and we can see if it works

14:44 < marcus_zoq> Okay, sounds like a good plan.

14:45 < naywhayare> do you want to restart the job now, or should we make other changes too?

14:47 < marcus_zoq> naywhayare: I think we need to kill the benchmark process before restarting.

14:47 < naywhayare> oh... you're right, it's still running in the background

14:48 < naywhayare> ok, I've killed all associated processes

14:49 < marcus_zoq> naywhayare: Okay, thanks! I'll restart the job.

15:25 < jenkins-mlpack> Starting build #2084 for job mlpack - svn checkin test (previous build: FIXED)

15:27 andrewmw94 has joined #mlpack

15:28 < naywhayare> andrewmw94: I hope you have nice ways of filtering out jenkins emails... sorry about all those. :(

15:29 < andrewmw94> no problem. I can just sort by recipient. I caught some (I hope small) ailment on the flight so I didn't notice them until now.

15:30 < andrewmw94> Hopefully being tired won't introduce too many bugs :)

15:32 < naywhayare> :)

15:33 < naywhayare> I need to figure out how to get jenkins to send just one email when things in the matrix build are broken

15:35 < naywhayare> instead of one for every single possible broken configuration... which will only get worse, exponentially, as I get around to adding boost versions and compiler versions to the matrix

15:35 < naywhayare> (the end result of this is that every system in the build farm is at 100% load every second of every day, building configurations of mlpack that nobody even uses anyway, I guess)

15:37 < andrewmw94> yeah. That sounds complicated. I'm not sure how you can guarantee you are seeing the same bug in each configuration. I guess you could just make an assumption like "If revision XXXX is failing test _____ on N configurations, only report it once."

15:39 < naywhayare> yeah, I mean, ideally I want an email that just says "you broke the build on this large list of configurations: (plus a link to the build log of each failing configuration)"
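
[Editor's sketch] The "report it once" assumption could be approximated by grouping failures on (revision, test) and listing the affected configurations under each group. This is only an illustration of the grouping idea; `Failure` and `GroupFailures` are hypothetical names, and nothing here corresponds to an actual Jenkins API or plugin.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one failing matrix cell.
struct Failure
{
  int revision;
  std::string test;
  std::string configuration;
};

// Key: (revision, test name). Value: every configuration that failed it.
typedef std::map<std::pair<int, std::string>, std::vector<std::string> >
    FailureGroups;

// Collapse per-configuration failures so that each (revision, test) pair
// appears once, carrying the list of configurations it broke.
FailureGroups GroupFailures(const std::vector<Failure>& failures)
{
  FailureGroups grouped;
  for (const Failure& f : failures)
    grouped[std::make_pair(f.revision, f.test)].push_back(f.configuration);
  return grouped;
}
```

One email per map entry would then carry the whole list of broken configurations, under the assumption andrewmw94 states that the same failing test at the same revision is the same bug.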

16:56 < jenkins-mlpack> Project mlpack - svn checkin test build #2084: SUCCESS in 1 hr 30 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2084/

16:56 < jenkins-mlpack> * Ryan Curtin: Some formatting fixes.

16:56 < jenkins-mlpack> * Ryan Curtin: Change names of functions.

16:56 < jenkins-mlpack> * Ryan Curtin: Add some const and fix some formatting.

16:56 < jenkins-mlpack> * Ryan Curtin: Add header comments, fix header guard naming to be in line with the rest of the

16:56 < jenkins-mlpack> files.

16:56 < jenkins-mlpack> * Ryan Curtin: Add header comments and clean up BiBTeX citation a bit.

16:56 < jenkins-mlpack> Starting build #2085 for job mlpack - svn checkin test (previous build: SUCCESS)

17:25 Anand has joined #mlpack

17:44 Anand has quit [Ping timeout: 246 seconds]

18:28 < jenkins-mlpack> Project mlpack - svn checkin test build #2085: SUCCESS in 1 hr 32 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2085/

18:28 < jenkins-mlpack> Ryan Curtin: Simplify a matrix calculation.

20:13 < jenkins-mlpack> Project mlpack - nightly matrix build build #553: STILL FAILING in 5 hr 27 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/553/

20:13 < jenkins-mlpack> michaelfox99: removed #include <mlpack/methods/gmm/phi.hpp> which included only unused functions

21:49 jbc__ has quit [Quit: jbc__]

21:51 jbc__ has joined #mlpack

21:52 jbc__ has quit [Client Quit]

21:57 < andrewmw94> naywhayare:

21:57 < andrewmw94> I have a design question about X trees

21:58 < naywhayare> sure, go ahead

21:58 < andrewmw94> Ok, so the paper calls for storing a binary tree in each node to keep track of the past split dimensions

21:58 < andrewmw94> (X trees and R* trees have a dimension along which the split is performed)

21:59 < naywhayare> blah, what's the name of the paper again? "X tree" isn't a good search term

21:59 < andrewmw94> http://www.dbs.ifi.lmu.de/Publikationen/Papers/x-tree.ps

21:59 < naywhayare> ha... ps. how archaic

21:59 < andrewmw94> as a side note, is there a reason these papers are usually made by germans?

21:59 < andrewmw94> yeah, haven't seen a postscript file in a long time

22:00 < naywhayare> I dunno, marcus_zoq might? maybe databases are especially important in Germany?

22:00 < andrewmw94> yeah, I don't know. I just noticed it when I was reading this paper.

22:01 < andrewmw94> anyways, the relevant section is section 3.3

22:01 < naywhayare> seems like it's mostly the folks who did the original R tree

22:01 < andrewmw94> there are a lot of superfluous proofs with the important information spread around them. I didn't like the setup here.

22:02 < naywhayare> sounds like the papers I write :)

22:02 < andrewmw94> it seems like they could have summarized the whole algorithm in half a page, but then I guess there isn't enough to publish

22:02 < andrewmw94> though they did have a lot of tests. I just wish they would separate them so I didn't have to read proofs of things that I find obvious

22:04 < naywhayare> it may be that the audience of the conference they submitted it to couldn't be assumed to be familiar with the logic of their proofs and reasoning, so they had to be explicit

22:04 < naywhayare> I run into this when I submit dual-tree algorithms papers... the reviewers are always saying "I have no idea what dual-tree algorithms are, and I can't understand any of this"

22:05 < naywhayare> so either reviewers aren't interested in reading references to get up to speed, or, my writing is too dense. not yet sure which

22:05 < naywhayare> anyway, I am looking at section 3.3

22:05 < andrewmw94> hmm, interesting. I've often wondered why people can't seem to follow my proofs

22:05 < andrewmw94> maybe I should look into this more

22:06 < naywhayare> I always hand my proofs to people with minimal background in what I do and try to see if they can follow it. if not, I try to simplify it

22:07 < naywhayare> (I'm not always successful with the simplification)

22:07 < naywhayare> I see what you mean about the verbosity... on page 8 one paragraph starts with

22:08 < naywhayare> According to Lemma 1, for finding an overlap-free split we have to determine a dimension according to which all MBRs of S have been split previously.

22:08 < naywhayare> then, two paragraphs later,

22:08 < naywhayare> According to Lemma 1, we may find an overlap-free split if there is a dimension according to which all MBRs of S have been split.

22:08 < naywhayare> although they are finding overlap-free splits, their paper is not overlap-free :)

22:09 < andrewmw94> haha. Indeed.

22:09 < andrewmw94> I think I have other problems with my proofs. I had a graph theory class last semester where I got perfect scores on most tests, but practically all of them required talking to the teacher to explain what I wrote. Though the teacher didn't speak English natively, so that may have been part of the problem.

22:10 < naywhayare> verbosity is part of a solution, but on a test you're time-constrained so that may not be the right answer

22:10 < naywhayare> it's hard to write a good proof, because readers may come from wildly varying backgrounds, and so the same logical jumps that you make effortlessly may not be obvious to them

22:11 < naywhayare> and vice versa too, so you may end up writing a proof that seems overly verbose and obvious to some others

22:11 < andrewmw94> yeah. That's what really bugs me. I'm reading some books in symbolic logic, and one of them has proofs for the simplest things.

22:12 < naywhayare> perhaps the intended audience has background knowledge lower than yours?

22:12 < andrewmw94> a paraphrased example in programming lingo: If two strings are identical, the ith character of each string is identical

22:13 < andrewmw94> I doubt it.

22:13 < andrewmw94> It's a graduate level book.

22:13 < naywhayare> ah, yeah, that type of clarity. very verbose, but it also leaves no room for interpretation

22:14 < andrewmw94> yeah. I find it silly. If I didn't see that immediately why would you think I could follow your proof?

22:14 < andrewmw94> actually, I usually can see the result immediately but can't follow the proof ;)

22:14 < andrewmw94> but maybe they just want to introduce the terminology that way or something. I don't know

22:15 < naywhayare> yeah, the difference between intuitive understanding of results and rigorous proofs of them is often quite large

22:15 < naywhayare> for instance, the O(N) runtime proofs for dual-tree nearest neighbor search were hypothesized in 2005 or 2006, but it took another two or three years of hard work to actually prove it

22:16 < naywhayare> anyway, I have to get up and go soon... what was your question?

22:16 < naywhayare> was it how to store this binary tree of previous split dimensions?

22:17 < andrewmw94> yeah, should I do that in the RectangleTree

22:17 < naywhayare> does the RectangleTree already store its split dimension?

22:17 < andrewmw94> or do I need to create a new class for the X tree, where that is the only difference

22:17 < andrewmw94> no, it would only be used by the X tree.

22:17 < andrewmw94> on the other hand, most users will want to use the X tree

22:18 < andrewmw94> according to the authors at least, it should be better than the others when D > 8 or so

22:18 < andrewmw94> where D is the number of dimensions

22:19 < naywhayare> if the tree stores its split dimension, then you'd already be storing that entire tree implicitly and you could just look at the split dimensions of all the parents

22:19 < naywhayare> however, if it doesn't, that makes things a bit more difficult

22:19 < andrewmw94> Yeah, that's not stored currently.

22:20 < naywhayare> I think at this point the easiest thing to do is just have the RectangleTree nodes store their split dimension, like the BinarySpaceTree, and then use that

22:20 < naywhayare> it'd require minor refactoring of the existing RectangleTree splitting classes though

22:21 < naywhayare> unless you have a better idea; that's all I can come up with on short notice

22:21 < andrewmw94> Yeah, that's the best I have so far too
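
[Editor's sketch] The idea of storing the split dimension in each node and reading the history off the parents can be sketched as follows. `Node` is a hypothetical stand-in, not mlpack's RectangleTree, and the field names are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tree node (not mlpack's RectangleTree).
struct Node
{
  const Node* parent;    // nullptr for the root.
  std::size_t splitDim;  // Dimension along which this node was split.
};

// Walk up the parent pointers and collect the split dimension of every
// ancestor, nearest ancestor first. The split history is then implicit in
// the tree and needs no separate structure.
std::vector<std::size_t> AncestorSplitDimensions(const Node& node)
{
  std::vector<std::size_t> dims;
  for (const Node* p = node.parent; p != nullptr; p = p->parent)
    dims.push_back(p->splitDim);
  return dims;
}
```

The cost is one extra `size_t` per node, in exchange for not maintaining a second tree alongside the first.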

22:21 < naywhayare> the alternative seems to be either making a different class entirely for the X tree, or refactoring the DescentType API very seriously (which does not sound easy or fun)

22:22 < andrewmw94> agreed

22:22 < naywhayare> one alternate, and I don't know how much work it would be, is to allow the RectangleTree to be created with an instantiated DescentType object

22:22 < naywhayare> instead of just relying on static functions of DescentType

22:22 < naywhayare> and then build the binary tree with the split dimensions inside of XTreeSplit, storing the tree as a member of XTreeSplit

22:23 < naywhayare> that would allow you to keep the split dimension out of the RectangleTree class and also to not store that tree after construction time (since it isn't necessary after you make the tree)
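
[Editor's sketch] The alternative naywhayare outlines, passing an instantiated policy object to the tree instead of calling static functions, might look roughly like this. `SketchXTreeSplit` and `SketchTree` are hypothetical names, and the split history is kept as a flat vector for brevity rather than the binary tree the paper calls for.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical split policy that carries state (illustration only).
class SketchXTreeSplit
{
 public:
  // Remember which dimension a split used.
  void RecordSplit(const std::size_t dim) { history.push_back(dim); }

  const std::vector<std::size_t>& History() const { return history; }

 private:
  // Simplification: a flat list instead of a binary tree of split dimensions.
  std::vector<std::size_t> history;
};

// Hypothetical tree that owns a policy instance instead of relying on
// static functions of the policy type.
template<typename SplitType>
class SketchTree
{
 public:
  explicit SketchTree(SplitType splitIn) : split(std::move(splitIn)) { }

  // A real tree would also partition its points here; this sketch only
  // forwards the bookkeeping to the policy object.
  void SplitAlong(const std::size_t dim) { split.RecordSplit(dim); }

  const SplitType& Split() const { return split; }

 private:
  SplitType split;
};
```

In this arrangement the node class itself stays free of X-tree-specific fields, which is the benefit described above.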

22:23 < andrewmw94> yeah, I thought about that. I'm not sure how I would efficiently map between nodes in that class and nodes of the RectangleTree

22:24 < andrewmw94> I'm also not sure about the "unused after construction" assumption. One of the main goals of R*trees and X trees seems to be the ability to dynamically build the tree

22:24 < naywhayare> ah. in that case, I think that keeping the split dimension stored in the node is the best idea

22:24 < andrewmw94> ok. Thanks for the help

22:25 < naywhayare> and the only other idea I can think of is to add another auxiliary class (kind of like StatisticType) that just holds whatever else a particular tree variant needs

22:25 < naywhayare> so, for example, the X tree would hold the XTreeExtraData struct, or something like that, but the RectangleTree would hold an empty struct

22:25 < naywhayare> that seems a bit... clunky though

22:25 < andrewmw94> yeah
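
[Editor's sketch] The auxiliary-class idea (an extra template parameter, in the spirit of StatisticType) could be sketched like this. Only the name `XTreeExtraData` comes from the conversation; `NoExtraData`, `SketchTreeNode`, and the `splitHistory` member are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical empty payload for tree variants that need nothing extra.
struct NoExtraData { };

// Payload an X tree variant might carry; the member is an assumption.
struct XTreeExtraData
{
  std::vector<std::size_t> splitHistory;
};

// Hypothetical node type parameterized on the per-variant payload.
template<typename ExtraDataType = NoExtraData>
class SketchTreeNode
{
 public:
  ExtraDataType& ExtraData() { return extra; }
  const ExtraDataType& ExtraData() const { return extra; }

 private:
  ExtraDataType extra;
};
```

Every instantiation pays for one more template parameter even when the payload is empty, which may be part of why the idea is called clunky.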

23:08 jbc__ has joined #mlpack

23:20 andrewmw94 has left #mlpack []

23:57 jbc__ has quit [Quit: jbc__]