naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< jenkins-mlpack> Project mlpack - svn checkin test build #2080: SUCCESS in 1 hr 29 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2080/
< jenkins-mlpack> sumedhghaisas: * minor changes
jbc__ has joined #mlpack
< jbc__> naywhayare: yeah, so I think what happens in that test case is the probabilities of the outcomes are 50/50 (getting rid of non-unique) and because it’s equal probability the classifier just uses the first index of the maximum probability - hence getting stuck on zero
< jbc__> as that’s the first label, and each label has equal prob
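The tie-breaking behaviour jbc__ describes can be sketched as follows. This is a minimal illustration, not mlpack's classifier code: `FirstArgMax` is a hypothetical name, and the strict `>` comparison is an assumption about how such a scan is typically written. With equal probabilities the scan never leaves index 0, which matches the "stuck on zero" symptom.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mlpack): returns the index of the largest
// probability. Ties resolve to the lowest index.
std::size_t FirstArgMax(const std::vector<double>& probabilities)
{
  assert(!probabilities.empty());

  std::size_t best = 0;
  for (std::size_t i = 1; i < probabilities.size(); ++i)
  {
    // Strict comparison: an equal value never displaces the earlier index.
    if (probabilities[i] > probabilities[best])
      best = i;
  }
  return best;
}
```

Calling `FirstArgMax({0.5, 0.5})` yields label 0 no matter how many points are classified, so a test that expects both labels to appear would fail.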
< jenkins-mlpack> Starting build #2081 for job mlpack - svn checkin test (previous build: SUCCESS)
govg_ has joined #mlpack
< jenkins-mlpack> Project mlpack - svn checkin test build #2081: SUCCESS in 1 hr 30 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2081/
< jenkins-mlpack> michaelfox99: GMM::Save() now adds type information to XML files
< jenkins-mlpack> Project mlpack - nightly matrix build build #552: STILL FAILING in 2 hr 15 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/552/
< jenkins-mlpack> * michaelfox99: phi.hpp no longer used, see GaussianDistribution::Probability()
< jenkins-mlpack> * michaelfox99: GMM::Save() now adds type information to XML files
< jenkins-mlpack> * sumedhghaisas: * minor changes
< jenkins-mlpack> * sumedhghaisas: * modified PlainSVD module to return normalized frobenius norm
< jenkins-mlpack> * modified PlainSVD tests
< jenkins-mlpack> * added row_col_iterator operator-- tests
< jenkins-mlpack> * sumedhghaisas: * changed row_col_iterator::operator-- implementation
< jenkins-mlpack> * added documentation to termination policies
< jenkins-mlpack> * minor fix of PlainSVD module
< jenkins-mlpack> * Ryan Curtin: Oops, I fixed this backwards. The actual error is the output.
< jenkins-mlpack> * Ryan Curtin: Fix incorrect check (how did this happen?).
< jenkins-mlpack> Starting build #2082 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack> Project mlpack - svn checkin test build #2082: FAILURE in 4 min 45 sec: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2082/
< jenkins-mlpack> michaelfox99: phi.hpp no longer used, see GaussianDistribution::Probability()
sumedhghaisas has joined #mlpack
jenkins-mlpack has quit [Read error: Connection reset by peer]
jenkins-mlpack has joined #mlpack
jenkins-mlpack has quit [Read error: Connection reset by peer]
jenkins-mlpack has joined #mlpack
sumedhghaisas has quit [Remote host closed the connection]
< jenkins-mlpack> Starting build #2083 for job mlpack - svn checkin test (previous build: FAILURE -- last SUCCESS #2081 8 hr 51 min ago)
< marcus_zoq> naywhayare: hm, the benchmark job failed again
< jenkins-mlpack> Yippie, build fixed!
< jenkins-mlpack> Project mlpack - svn checkin test build #2083: FIXED in 1 hr 30 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2083/
< jenkins-mlpack> michaelfox99: removed #include <mlpack/methods/gmm/phi.hpp> which included only unused functions
< naywhayare> marcus_zoq: I am not sure why it failed
< naywhayare> FATAL: Unable to delete script file /tmp/hudson4018934327994967650.sh
< naywhayare> but I can't come up with a reason for why that happened
< naywhayare> I can remove those hudson* scripts as jenkins after destroying the kerberos ticket
< marcus_zoq> naywhayare: Okay, It's worth a try.
< naywhayare> most of the comments are unhelpful, but some people seem to suggest that the issue is that the ssh connection sits idle and times out because the job doesn't produce any output
< naywhayare> somewhere in there, someone suggests adding ClientAliveInterval 60 to the sshd_config of the slave in question
< naywhayare> so I've tried doing that for shoeshine, and we can see if it works
< marcus_zoq> Okay, sounds like a good plan.
< naywhayare> do you want to restart the job now, or should we make other changes too?
< marcus_zoq> naywhayare: I think we need to kill the benchmark process before restarting.
< naywhayare> oh... you're right, it's still running in the background
< naywhayare> ok, I've killed all associated processes
< marcus_zoq> naywhayare: Okay, thanks! I'll restart the job.
< jenkins-mlpack> Starting build #2084 for job mlpack - svn checkin test (previous build: FIXED)
andrewmw94 has joined #mlpack
< naywhayare> andrewmw94: I hope you have nice ways of filtering out jenkins emails... sorry about all those. :(
< andrewmw94> no problem. I can just sort by recipient. I caught some (I hope small) ailment on the flight so I didn't notice them until now.
< andrewmw94> Hopefully being tired won't introduce too many bugs :)
< naywhayare> :)
< naywhayare> I need to figure out how to get jenkins to send just one email when things in the matrix build are broken
< naywhayare> instead of one for every single possible broken configuration... which will only get worse, exponentially, as I get around to adding boost versions and compiler versions to the matrix
< naywhayare> (the end result of this is that every system in the build farm is at 100% load every second of every day, building configurations of mlpack that nobody even uses anyway, I guess)
< andrewmw94> yeah. That sounds complicated. I'm not sure how you can guarantee you are seeing the same bug in each configuration. I guess you could just make an assumption like "If revision XXXX is failing test _____ on N configurations, only report it once."
< naywhayare> yeah, I mean, ideally I want an email that just says "you broke the build on this large list of configurations: (plus a link to the build log of each failing configuration)"
< jenkins-mlpack> Project mlpack - svn checkin test build #2084: SUCCESS in 1 hr 30 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2084/
< jenkins-mlpack> * Ryan Curtin: Some formatting fixes.
< jenkins-mlpack> * Ryan Curtin: Change names of functions.
< jenkins-mlpack> * Ryan Curtin: Add some const and fix some formatting.
< jenkins-mlpack> * Ryan Curtin: Add header comments, fix header guard naming to be in line with the rest of the
< jenkins-mlpack> files.
< jenkins-mlpack> * Ryan Curtin: Add header comments and clean up BiBTeX citation a bit.
< jenkins-mlpack> Starting build #2085 for job mlpack - svn checkin test (previous build: SUCCESS)
Anand has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
< jenkins-mlpack> Project mlpack - svn checkin test build #2085: SUCCESS in 1 hr 32 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2085/
< jenkins-mlpack> Ryan Curtin: Simplify a matrix calculation.
< jenkins-mlpack> Project mlpack - nightly matrix build build #553: STILL FAILING in 5 hr 27 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/553/
< jenkins-mlpack> michaelfox99: removed #include <mlpack/methods/gmm/phi.hpp> which included only unused functions
jbc__ has quit [Quit: jbc__]
jbc__ has joined #mlpack
jbc__ has quit [Client Quit]
< andrewmw94> naywhayare:
< andrewmw94> I have a design question about X trees
< naywhayare> sure, go ahead
< andrewmw94> Ok, so the paper calls for storing a binary tree in each node to keep track of the past split dimensions
< andrewmw94> (X trees and R* trees have a dimension along which the split is performed)
< naywhayare> blah, what's the name of the paper again? "X tree" isn't a good search term
< naywhayare> ha... ps. how archaic
< andrewmw94> as a side note, is there a reason these papers are usually made by germans?
< andrewmw94> yeah, haven't seen a postscript file in a long time
< naywhayare> I dunno, marcus_zoq might? maybe databases are especially important in Germany?
< andrewmw94> yeah, I don't know. I just noticed it when I was reading this paper.
< andrewmw94> anyways, the relevant section is section 3.3
< naywhayare> seems like it's mostly the folks who did the original R tree
< andrewmw94> there are a lot of superfluous proofs with the important information spread around them. I didn't like the setup here.
< naywhayare> sounds like the papers I write :)
< andrewmw94> it seems like they could have summarized the whole algorithm in half a page, but then I guess there isn't enough to publish
< andrewmw94> though they did have a lot of tests. I just wish they would separate them so I didn't have to read proofs of things that I find obvious
< naywhayare> it may be that the audience of the conference they submitted it to couldn't be assumed to be familiar with the logic of their proofs and reasoning, so they had to be explicit
< naywhayare> I run into this when I submit dual-tree algorithms papers... the reviewers are always saying "I have no idea what dual-tree algorithms are, and I can't understand any of this"
< naywhayare> so either reviewers aren't interested in reading references to get up to speed, or, my writing is too dense. not yet sure which
< naywhayare> anyway, I am looking at section 3.3
< andrewmw94> hmm, interesting. I've often wondered why people can't seem to follow my proofs
< andrewmw94> maybe I should look into this more
< naywhayare> I always hand my proofs to people with minimal background in what I do and try to see if they can follow it. if not, I try to simplify it
< naywhayare> (I'm not always successful with the simplification)
< naywhayare> I see what you mean about the verbosity... on page 8 one paragraph starts with
< naywhayare> According to Lemma 1, for finding an overlap-free split we have to determine a dimension according to which all MBRs of S have been split previously.
< naywhayare> then, two paragraphs later,
< naywhayare> According to Lemma 1, we may find an overlap-free split if there is a dimension according to which all MBRs of S have been split.
< naywhayare> although they are finding overlap-free splits, their paper is not overlap-free :)
< andrewmw94> haha. Indeed.
< andrewmw94> I think I have other problems with my proofs. I had a graph theory class last semester where I got perfect scores on most tests, but practically all of them required talking to the teacher to explain what I wrote. Though the teacher didn't speak English natively, so that may have been part of the problem.
< naywhayare> verbosity is part of a solution, but on a test you're time-constrained so that may not be the right answer
< naywhayare> it's hard to write a good proof, because readers may come from wildly varying backgrounds, and so the same logical jumps that you make effortlessly may not be obvious to them
< naywhayare> and vice versa too, so you may end up writing a proof that seems overly verbose and obvious to some others
< andrewmw94> yeah. That's what really bugs me. I'm reading some books in symbolic logic, and one of them has proofs for the simplest things.
< naywhayare> perhaps the intended audience has background knowledge lower than yours?
< andrewmw94> a paraphrased example in programming lingo: If two strings are identical, the ith character of each string is identical
< andrewmw94> I doubt it.
< andrewmw94> It's a graduate level book.
< naywhayare> ah, yeah, that type of clarity. very verbose, but it also leaves no room for interpretation
< andrewmw94> yeah. I find it silly. If I didn't see that immediately why would you think I could follow your proof?
< andrewmw94> actually, I usually can see the result immediately but can't follow the proof ;)
< andrewmw94> but maybe they just want to introduce the terminology that way or something. I don't know
< naywhayare> yeah, the difference between intuitive understanding of results and rigorous proofs of them is often quite large
< naywhayare> for instance, the O(N) runtime proofs for dual-tree nearest neighbor search were hypothesized in 2005 or 2006, but it took another two or three years of hard work to actually prove it
< naywhayare> anyway, I have to get up and go soon... what was your question?
< naywhayare> was it how to store this binary tree of previous split dimensions?
< andrewmw94> yeah, should I do that in the RectangleTree
< naywhayare> does the RectangleTree already store its split dimension?
< andrewmw94> or do I need to create a new class for the X tree, where that is the only difference
< andrewmw94> no, it would only be used by the X tree.
< andrewmw94> on the other hand, most users will want to use the X tree
< andrewmw94> according to the authors at least, it should be better than the others when D > 8 or so
< andrewmw94> where D is the number of dimensions
< naywhayare> if the tree stores its split dimension, then you'd already be storing that entire tree implicitly and you could just look at the split dimensions of all the parents
< naywhayare> however, if it doesn't, that makes things a bit more difficult
< andrewmw94> Yeah, that's not stored currently.
< naywhayare> I think at this point the easiest thing to do is just have the RectangleTree nodes store their split dimension, like the BinarySpaceTree, and then use that
< naywhayare> it'd require minor refactoring of the existing RectangleTree splitting classes though
< naywhayare> unless you have a better idea; that's all I can come up with on short notice
< andrewmw94> Yeah, that's the best I have so far too
< naywhayare> the alternative seems to be either making a different class entirely for the X tree, or refactoring the DescentType API very seriously (which does not sound easy or fun)
< andrewmw94> agreed
< naywhayare> one alternate, and I don't know how much work it would be, is to allow the RectangleTree to be created with an instantiated DescentType object
< naywhayare> instead of just relying on static functions of DescentType
< naywhayare> and then build the binary tree with the split dimensions inside of XTreeSplit, storing the tree as a member of XTreeSplit
< naywhayare> that would allow you to keep the split dimension out of the RectangleTree class and also to not store that tree after construction time (since it isn't necessary after you make the tree)
< andrewmw94> yeah, I thought about that. I'm not sure how I would efficiently map between nodes in that class and nodes of the RectangleTree
< andrewmw94> I'm also not sure about the "unused after construction" assumption. One of the main goals of R*trees and X trees seems to be the ability to dynamically build the tree
< naywhayare> ah. in that case, I think that keeping the split dimension stored in the node is the best idea
< andrewmw94> ok. Thanks for the help
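The option settled on above, storing the split dimension in each node and recovering the history by walking up the parents, might look roughly like this. It is a sketch under stated assumptions: `Node`, `kNoSplit`, `RecordSplit`, and `SplitHistory` are hypothetical names for illustration and are not mlpack's RectangleTree API, and the flat list it returns is a simplification of the binary split-history tree the paper describes.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Sentinel meaning "this node has not been split yet" (assumed convention).
constexpr std::size_t kNoSplit = std::numeric_limits<std::size_t>::max();

// Hypothetical stand-in for a tree node; not mlpack's RectangleTree.
struct Node
{
  Node* parent = nullptr;
  std::size_t splitDimension = kNoSplit;
};

// Remember the dimension along which a node was split.
void RecordSplit(Node& node, const std::size_t dim, const std::size_t numDims)
{
  assert(dim < numDims);
  node.splitDimension = dim;
}

// Collect split dimensions from the given node up to the root, nearest first.
// Nodes that were never split are skipped.
std::vector<std::size_t> SplitHistory(const Node& node)
{
  std::vector<std::size_t> dims;
  for (const Node* n = &node; n != nullptr; n = n->parent)
  {
    if (n->splitDimension != kNoSplit)
      dims.push_back(n->splitDimension);
  }
  return dims;
}
```

Because the history lives in the nodes themselves, it stays available when points are inserted after the initial build, which is the dynamic-construction concern andrewmw94 raised.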
< naywhayare> and the only other idea I can think of is to add another auxiliary class (kind of like StatisticType) that just holds whatever else a particular tree variant needs
< naywhayare> so, for example, the X tree would hold the XTreeExtraData struct, or something like that, but the RectangleTree would hold an empty struct
< naywhayare> that seems a bit... clunky though
< andrewmw94> yeah
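For comparison, the auxiliary-class idea naywhayare calls clunky could be sketched like this. The name `XTreeExtraData` comes from the discussion; `EmptyExtraData`, `SketchTree`, and `LastSplit` are hypothetical, and nothing here reflects how mlpack actually implements its trees.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// What a plain rectangle tree would hold: nothing.
struct EmptyExtraData { };

// What an X tree would hold (assumed contents): its split history.
struct XTreeExtraData
{
  std::vector<std::size_t> splitHistory;
};

static_assert(std::is_empty<EmptyExtraData>::value,
    "the default extra data type carries no state");

// Hypothetical tree template; the extra data plays a role similar to
// StatisticType but is owned by the tree variant rather than the user.
template<typename ExtraDataType = EmptyExtraData>
class SketchTree
{
 public:
  ExtraDataType& ExtraData() { return extra; }
  const ExtraDataType& ExtraData() const { return extra; }

 private:
  ExtraDataType extra;
};

// Most recent split dimension recorded for an X-tree-style node.
std::size_t LastSplit(const SketchTree<XTreeExtraData>& tree)
{
  assert(!tree.ExtraData().splitHistory.empty());
  return tree.ExtraData().splitHistory.back();
}
```

One cost of this design is that even an empty member typically occupies at least one byte per node unless it is folded away (for example via the empty base optimization), and every tree variant gains one more template parameter.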
jbc__ has joined #mlpack
andrewmw94 has left #mlpack []
jbc__ has quit [Quit: jbc__]