naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< jenkins-mlpack> Project mlpack - svn checkin test build #2007: SUCCESS in 1 hr 20 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2007/
< jenkins-mlpack> sumedhghaisas: * added local minima storing functionality to termination policies
andrewmw94 has quit [Quit: Leaving.]
andrewmw94 has joined #mlpack
andrewmw94 has quit [Client Quit]
Anand has joined #mlpack
< Anand> Marcus : I have modified the mlpack interface for all methods, and added linear regression for weka, scikit, and mlpack. Please have a look. I will add linear regression for shogun and matlab today and then merge into master later today!
Anand has quit [Ping timeout: 246 seconds]
naywhayare has joined #mlpack
< jenkins-mlpack> Project mlpack - nightly matrix build build #512: FAILURE in 5 hr 1 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/512/
< jenkins-mlpack> * sumedhghaisas: * added local minima storing functionality to termination policies
< jenkins-mlpack> * Ryan Curtin: Lengthen comments that weren't 80 columns long. This may be the most trivial
< jenkins-mlpack> fix ever in my long, decorated history of trivial commits.
< jenkins-mlpack> * Ryan Curtin: Very minor changes.
< jenkins-mlpack> * saxena.udit: IsDistinct() improved.
< jenkins-mlpack> * Ryan Curtin: Don't use arma::unique() because it's slow.
< jenkins-mlpack> * Ryan Curtin: Use bool instead of int for tracking convergence.
< jenkins-mlpack> * Ryan Curtin: Fix some formatting issues; no functionality change.
< jenkins-mlpack> * Ryan Curtin: Const-correctness and 80-character lines... very trivial fix, no functionality
< jenkins-mlpack> change.
< jenkins-mlpack> * saxena.udit: Entropy calculation improved.
< jenkins-mlpack> * andrewmw94: R tree now has dataset and indices
< jenkins-mlpack> * Ryan Curtin: Include mlpack/core.hpp.
< jenkins-mlpack> * Ryan Curtin: Another test to make sure the correct splitting attribute is used.
< jenkins-mlpack> * Ryan Curtin: Fix some formatting, fix backwards entropy splitting, add getters/setters, and
< jenkins-mlpack> comment a little bit about the internal structure of the class.
naywhayare has joined #mlpack
sumedhghaisas has joined #mlpack
Anand has quit [Ping timeout: 246 seconds]
Anand has joined #mlpack
< Anand> Marcus : Does Shogun support linear regression from Python just like logistic regression? Look at how we did logistic regression for shogun. We didn't have to write any C code.
Anand has quit [Ping timeout: 246 seconds]
< jenkins-mlpack> Starting build #2008 for job mlpack - svn checkin test (previous build: SUCCESS)
< jenkins-mlpack> Project mlpack - svn checkin test build #2008: SUCCESS in 1 hr 20 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2008/
< jenkins-mlpack> saxena.udit: Changes are part of perceptron code review, as discussed with Ryan
sumedh_ has joined #mlpack
sumedhghaisas has quit [Ping timeout: 245 seconds]
< jenkins-mlpack> Starting build #2009 for job mlpack - svn checkin test (previous build: SUCCESS)
udit_s has joined #mlpack
andrewmw94 has joined #mlpack
< naywhayare> andrewmw94: my solution to the weird corel dataset bug is in http://www.mlpack.org/trac/changeset/16808
< naywhayare> I had to modify the tree abstraction very slightly, by adding the function MinimumBoundDistance()
< naywhayare> I implemented this in HRectBound (r16809) and then added a function MinimumBoundDistance() to the tree types; for RectangleTree, it just passes on HRectBound::MinWidth()
< andrewmw94> ok. I'm not sure I understand the error in the dual tree traverser, but it's good to know it is fixed.
< andrewmw94> does the MinimumBoundDistance() return the MinWidth() for BSP trees too? Because I don't think that matches the comment.
< naywhayare> yeah, that is what it returns
< naywhayare> oh, hang on... I have botched my terminology
< naywhayare> I have to divide everything by 2... MinimumBoundDistance = MinWidth / 2
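A minimal sketch, for illustration only, of the delegation described above: a tree node answering MinimumBoundDistance() from its bound's cached minimum width, halved. The member name "bound" and the class layout are assumptions, not the actual RectangleTree code.

    // Sketch: expose MinimumBoundDistance() by delegating to the bound's
    // cached MinWidth(), divided by two as noted above.
    template<typename BoundType>
    class ExampleTreeNode
    {
     public:
      // Minimum distance from the node's center to any edge of its bound.
      double MinimumBoundDistance() const { return bound.MinWidth() / 2.0; }

     private:
      BoundType bound; // e.g. an HRectBound, which caches its minimum width.
    };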
< andrewmw94> ahh. I think I get it now
< andrewmw94> how long did it take you to find this? It looks painful.
< naywhayare> probably 3-4 hours to find the bug, and then I spent about a day thinking about the right way to fix it
< andrewmw94> not too bad I guess...
< andrewmw94> but I don't envy you
< naywhayare> well... I also wrote all the code, so going into it I had some idea of what the bug was
< naywhayare> the actual code that was wrong was an attempt at a clever way to prune a node combination without actually doing the O(d) MinDistance() calculation between them
< andrewmw94> ooh, that reminds me. When I was looking through the neighbor search code, I saw that you have a bunch of different ways to calculate bounds (5 I think)
< andrewmw94> which seems like it could be slower than just using the one that works best. But then I thought, maybe we could change it so higher up in the tree, where a prune would save a ton of computation, it does more precise bounds checking. But as it reaches the bottom, it just does something fast.
< andrewmw94> Do you think that would have potential?
< naywhayare> yes, I think that would be a good idea
< naywhayare> how to implement it is a little less clear...
< naywhayare> I guess you could do it based on node.NumDescendants()
< naywhayare> but some of those 5 bounds also depend on bounds propagating from children or parents, so we'd need to also be sure that we weren't breaking those
< naywhayare> each of those bounds can be derived using different variants of the triangle inequality
< naywhayare> and they are basically bounds on "what is the largest possible nearest neighbor distance of any query point in the given query node, given everything we know so far?"
< naywhayare> and each of the bounds considers some different aspect of the "everything we know so far"
< andrewmw94> yeah. Changing it would be complicated. I was thinking it could be based on the depth of the tree below this node. R trees are balanced, so that gives you branchingfactor^depth * minFillLeaves as a lower bound.
< andrewmw94> but we wouldn't want it specific to R trees
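A rough sketch of the idea floated above, not anything in mlpack's neighbor search code: spend the exact bound computation only where a prune pays off, i.e. on nodes with many descendants, and use a cheap cached value otherwise. The function name, the cheapCachedBound parameter, and the threshold are made up for illustration; NumDescendants() and MinDistance() are the tree methods mentioned in the conversation.

    #include <cstddef>

    // Illustration only: near the root a successful prune saves a lot of work,
    // so the exact O(d) MinDistance() bound is worth computing; near the leaves
    // a looser, already-available bound is used instead.
    template<typename TreeType>
    double PruningBound(const TreeType& queryNode,
                        const TreeType& referenceNode,
                        const double cheapCachedBound) // e.g. propagated from a parent
    {
      const std::size_t preciseThreshold = 1000; // hypothetical tuning parameter

      if (queryNode.NumDescendants() > preciseThreshold)
        return queryNode.MinDistance(referenceNode); // exact, but O(d)
      else
        return cheapCachedBound; // loose, but essentially free
    }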
Anand has joined #mlpack
< Anand> Marcus : I was talking about linear regression for shogun. Logistic regression is already done.
< Anand> You used modshogun for multiclass logistic regression. Are there similar imports and methods for linear regression?
< Anand> Also, we need to think about matlab linear regression code
< jenkins-mlpack> Project mlpack - svn checkin test build #2009: SUCCESS in 1 hr 19 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2009/
< jenkins-mlpack> saxena.udit: Minor improvement. No major functionality changes
< jenkins-mlpack> Starting build #2010 for job mlpack - svn checkin test (previous build: SUCCESS)
Anand has quit [Ping timeout: 246 seconds]
< udit_s> naywhayare: Hey! Did you get around to the AdaBoost mail?
< jenkins-mlpack> Project mlpack - svn checkin test build #2010: SUCCESS in 1 hr 35 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2010/
< jenkins-mlpack> * Ryan Curtin: Oops, this needed to be divided by 2.
< jenkins-mlpack> * Ryan Curtin: Use slightly safer Width().
< jenkins-mlpack> * Ryan Curtin: Use the bound's cached MinWidth() for MinimumBoundDistance().
< jenkins-mlpack> * Ryan Curtin: Test MinWidth().
< jenkins-mlpack> * Ryan Curtin: Add MinWidth(), which is a better solution than having the tree calculate it by
< jenkins-mlpack> hand.
< jenkins-mlpack> * Ryan Curtin: Fix elusive bug that only occurred in particularly rare situations.
< jenkins-mlpack> * Ryan Curtin: Add MinimumBoundDistance().
< jenkins-mlpack> This represents the minimum distance between the center of a node and any edge
< jenkins-mlpack> of the bound. Note that for ball bounds, this is equivalent to the furthest
< jenkins-mlpack> descendant distance.
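To make the ball-bound note above concrete: for a ball, the minimum distance from the center to the edge is the radius, which is also the furthest any descendant point can be from the center. A sketch under the assumption that the bound type offers a Radius() accessor; this is illustrative, not the mlpack implementation.

    // Sketch: for a ball-shaped bound, the center-to-edge distance and the
    // furthest descendant distance coincide -- both equal the radius.
    template<typename BallBoundType>
    double MinimumBoundDistance(const BallBoundType& bound)
    {
      return bound.Radius();
    }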
Anand has joined #mlpack
< Anand> Marcus : I don't see how the current code gives us the predicted labels. I am not concerned about the MSE calculation, just the predicted labels.
< marcus_zoq> Anand: Ah okay, we need to modify the timing code in the way we did for logistic regression. So if the user passes more than one file, we use the second file as the test set. We can assume that the last row of the training set contains the responses. Afterwards, we can use 'model.apply(RealFeatures(testSet.T)).get_labels()' to get the labels. Does this make sense?
< Anand> Ok. So, is this apply(..) method applicable for linear regression too?
< Anand> If yes, then I got it
< marcus_zoq> Anand: The apply function is capable of predicting the labels.
< Anand> Ok. Got it. And what about matlab?
< marcus_zoq> Anand: We need to rewrite the matlab code; it looks like the regress function isn't able to predict new data. We can use a combination of fitlm and feval.
< marcus_zoq> Anand: If you like I can make the necessary changes.
< Anand> I guess fitlm is like mnrfit used in logistic regression and feval is like mnrval, right?
< Anand> maybe I can do this myself!
< marcus_zoq> Anand: Right, you can use matlab on the build server, right?
< Anand> Oh, yes, I will need to run it on the build server. I will try. Otherwise, I will make the changes and you can have a look then!
< marcus_zoq> Anand: Okay
Anand has quit [Ping timeout: 246 seconds]
udit_s has quit [Quit: Leaving]
sumedh_ has quit [Ping timeout: 255 seconds]
oldbeardo has joined #mlpack
< oldbeardo> naywhayare: there?
< naywhayare> oldbeardo: only sort of
< naywhayare> I am helping someone inspect a car today... but I have my phone, which is maybe ok for this :)
< oldbeardo> naywhayare: hmmm, that may not be enough
< naywhayare> :-(
< naywhayare> you can leave messages in the channel and I will try to answer when I can
sumedh_ has joined #mlpack
< oldbeardo> naywhayare: I also wanted to inform you that I may be unavailable for a week starting Saturday; I'm switching cities
< oldbeardo> naywhayare: I will try to add the tests by tomorrow
oldbeardo has quit [Quit: Page closed]
< jenkins-mlpack> Starting build #2011 for job mlpack - svn checkin test (previous build: SUCCESS)
udit_s has joined #mlpack
< udit_s> naywhayare: Hey, are you free now?
< naywhayare> udit_s: yeah, I am back now
< udit_s> I'll complete the gaussian distribution test for the perceptron by tomorrow. Other than that, you said you were done with the Perceptron?
< naywhayare> yeah, I think so
< naywhayare> I still need to go through decision_stump_main.cpp and perceptron_main.cpp, but that shouldn't take long... I just keep forgetting
< naywhayare> I am learning about AdaBoost now so that I can respond well to your email :)
< udit_s> Okay. Whenever you're free, let's talk about AdaBoost, because I'm kinda stuck on those two points I've mentioned.
< udit_s> Oh. Cool.
< naywhayare> right; I am working on an answer now. give me a handful of minutes (maybe 20 to 30? I need to do some reading) and I will have a good response for you
< udit_s> Sure. Take your time. Actually, I just wanted to catch you before I go to sleep. I wanted to get started on it by tomorrow.
< udit_s> Let's talk tomorrow then? Say, 1300 UTC?
< udit_s> We could have a discussion similar to what we did before the Perceptron...
< udit_s> And I'll work on it over the weekend.
< naywhayare> okay
< naywhayare> I should be awake by 1300 UTC
< udit_s> What would you prefer? Would now (the next hour or so) be a better time for you?
< jenkins-mlpack> Project mlpack - svn checkin test build #2011: SUCCESS in 1 hr 18 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2011/
< jenkins-mlpack> siddharth.950: Adding Regularized SVD Code
< naywhayare> nah, let's do it tomorrow morning
< naywhayare> (sorry for the slow response, I stepped out)
< udit_s> okay.
udit_s has quit [Quit: Leaving]
sumedh__ has joined #mlpack
sumedh_ has quit [Read error: Connection reset by peer]
andrewmw94 has left #mlpack []