< jenkins-mlpack>
Ryan Curtin: Const parameters will cause this to be two billion times faster.
koderok has quit [Ping timeout: 252 seconds]
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: did you get a chance to look at the code?
oldbeardo has quit [Ping timeout: 240 seconds]
oldbeardo has joined #mlpack
oldbeardo has quit [Quit: Page closed]
oldbeardo has joined #mlpack
koderok has joined #mlpack
oldbeardo has quit [Quit: Page closed]
koderok has quit [Ping timeout: 265 seconds]
Anand has joined #mlpack
< Anand>
Hi!
< Anand>
I'm finally able to run the benchmarks!
< Anand>
I removed libboost, rebuilt Boost 1.55 from source, and then rebuilt mlpack. It works now.
koderok has joined #mlpack
< marcus_zoq>
Anand: Great :)
< naywhayare>
Anand: good to hear that you got it working
< Anand>
It had to work! :)
< Anand>
Now, I was planning to get started with the metrics implementation.
< Anand>
The metrics that I am going to implement are very different from measuring run times.
< Anand>
E.g., I will need counts of true/false positives and negatives after running a method on a dataset.
< naywhayare>
should be pretty straightforward to implement, assuming you have the true labels of a dataset and the predicted labels
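A minimal sketch of the counting described here, assuming the true and predicted labels are already available as arrays; the function name, signature, and the convention that one class is treated as "positive" are purely illustrative:

    import numpy as np

    def confusion_counts(true_labels, predicted_labels, positive=1):
        # Count true/false positives and negatives, treating `positive`
        # as the class of interest.
        true_labels = np.asarray(true_labels)
        predicted_labels = np.asarray(predicted_labels)
        tp = int(np.sum((predicted_labels == positive) & (true_labels == positive)))
        fp = int(np.sum((predicted_labels == positive) & (true_labels != positive)))
        tn = int(np.sum((predicted_labels != positive) & (true_labels != positive)))
        fn = int(np.sum((predicted_labels != positive) & (true_labels == positive)))
        return tp, fp, tn, fn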
< Anand>
Exactly! So, my question is: do we store the predicted labels?
< naywhayare>
depends on the method and the dataset
< naywhayare>
I assume that in Marcus's current set of datasets, there are no labels (runtime doesn't depend on labels, in general)
< naywhayare>
but for many of those datasets, I would imagine that labels might exist
< naywhayare>
for instance, the covertype dataset is one of my favorites to use for nearest neighbor search for testing; it has labels, but I generally ignore them since I'm mostly concerned with runtime
< Anand>
yes, true labels will always be there
< Anand>
what about predicted labels?
< naywhayare>
depends on the algorithm you're using
< naywhayare>
for instance, the NaiveBayesClassifier class will output predictions with its Predict() method
< naywhayare>
sorry, the Classify() function, not the Predict() function
< naywhayare>
alternatively, if you run the 'nbc' executable, it'll save its predictions to a file with the '--output' parameter
< marcus_zoq>
And if we run the nbc method, we save the output by default in the 'output.csv' file.
< marcus_zoq>
Right
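A rough sketch of picking up those saved predictions for metric computation, assuming the nbc output lands in 'output.csv' as mentioned above; 'labels.csv' is a hypothetical file holding the true labels of the test set:

    import numpy as np

    # 'output.csv' is the predictions file written by nbc (see above);
    # 'labels.csv' is a hypothetical file with the corresponding true labels.
    predicted = np.genfromtxt('output.csv', delimiter=',')
    truth = np.genfromtxt('labels.csv', delimiter=',')

    # With both vectors in hand, metrics like accuracy follow directly.
    accuracy = np.mean(predicted == truth)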
< Anand>
Ok!
< Anand>
Are there methods where such vectors/outputs are not returned/saved?
< Anand>
I will need to modify their code, then
< naywhayare>
allknn and allkfn do not predict classes, so those wouldn't be relevant to your metric
< Anand>
right
< naywhayare>
there are such things as k-nearest-neighbor classifiers, but you'd need to write a little extra code to make that work
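A small sketch of the "little extra code" such a classifier needs, assuming the neighbor indices for each query point are already available (e.g. from allknn's output); this is plain majority voting, not the faster tree-based approach mentioned later, and the names are illustrative:

    import numpy as np
    from collections import Counter

    def knn_classify(neighbor_indices, reference_labels):
        # neighbor_indices: (num_queries, k) matrix of reference-set indices,
        # as produced by a nearest-neighbor search such as allknn.
        predictions = []
        for row in neighbor_indices:
            # Predict the most common label among the k nearest neighbors.
            votes = Counter(reference_labels[i] for i in row)
            predictions.append(votes.most_common(1)[0][0])
        return np.array(predictions)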
< naywhayare>
the same applies to logistic regression, LARS, and linear regression: all of these give numerical predictions but not class label predictions
< naywhayare>
let me look at other classifiers... maybe k-means would be one you can use?
< naywhayare>
nca (neighborhood components analysis) is one, although it's not specifically a classifier -- it's a metric learning method
< naywhayare>
so the goal is to learn a weighting matrix on the data that improves classification
< naywhayare>
HMMs and GMMs can do classification... sort of. they can return the log-likelihood of points being in a certain class... but that's not a label
< Anand>
k-means is a multi-class classifier. I will use it
< Anand>
HMMs and GMMs talk about likelihood: correct. Whether or not they are classifiers depends on the dataset being used
< Anand>
if the dataset has labels, we'd better talk about labels too
< naywhayare>
so usually what you might do, if you have likelihoods (or numerical predictions from regression), is to pick some kind of cutoff and say "if greater than <cutoff>, class 0; otherwise, class 1" to assign labels
< naywhayare>
in these situations, you can also generate a ROC graph by moving the cutoff, which might be something interesting to display too
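A small sketch of the cutoff idea and the ROC sweep, assuming 'scores' holds per-point likelihoods (or regression outputs) and 'truth' holds binary 0/1 labels; the variable names and the direction of the cutoff assignment (above the cutoff gets class 1 here) are just illustrative:

    import numpy as np

    def roc_points(scores, truth):
        # Sweep the cutoff over every observed score; at each cutoff,
        # points scoring above it are assigned class 1, the rest class 0.
        points = []
        for cutoff in np.unique(scores):
            predicted = (scores > cutoff).astype(int)
            tp = np.sum((predicted == 1) & (truth == 1))
            fp = np.sum((predicted == 1) & (truth == 0))
            fn = np.sum((predicted == 0) & (truth == 1))
            tn = np.sum((predicted == 0) & (truth == 0))
            tpr = tp / (tp + fn) if (tp + fn) else 0.0
            fpr = fp / (fp + tn) if (fp + tn) else 0.0
            points.append((fpr, tpr))
        return points  # (false positive rate, true positive rate) pairs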
< Anand>
exactly!
< Anand>
So, I guess I will need to add certain things to the methods to make all of them work. Right?
< naywhayare>
potentially, yes
< naywhayare>
and it's possible that the metric you're implementing will not be applicable to a lot of the methods that mlpack implements
< naywhayare>
Udit is going to implement some weak classifiers, I think, so there should be more applicable methods by the end of the summer
< Anand>
the metrics I discussed are useful mainly for classifiers
< Anand>
mlpack has some methods which fall directly under this category
< Anand>
others have to be made to fall
< Anand>
:P
< naywhayare>
I agree :)
< naywhayare>
also, if you modify allknn or allkfn to have a classifier, I will gladly integrate that into the trunk code
< naywhayare>
or maybe I will get around to writing that myself, because I know a fast way to do knn classification with trees
< naywhayare>
it'll be at least June until I get to that though...
< Anand>
Ok. I guess I will start with classifiers like nbc then
< Anand>
I will have a look at allknn too
< Anand>
HMMs will work too
< naywhayare>
yeah; NBC would be the easiest to start with because it already does what you need
< Anand>
yeah
< Anand>
Ok then. I will let you know after looking at all the methods
< Anand>
I will go now! Bye! :)
< marcus_zoq>
Sounds like a good plan. Bye!
< Anand>
Good bye Marcus! :) Please let me know more!