koderok has joined #mlpack
< jenkins-mlpack> Project mlpack - nightly matrix build build #454: STILL UNSTABLE in 1 hr 32 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20nightly%20matrix%20build/454/
< jenkins-mlpack> Ryan Curtin: Const parameters will cause this to be two billion times faster.
koderok has quit [Ping timeout: 252 seconds]
oldbeardo has joined #mlpack
< oldbeardo> naywhayare: did you get a chance to look at the code?
oldbeardo has quit [Ping timeout: 240 seconds]
oldbeardo has joined #mlpack
oldbeardo has quit [Quit: Page closed]
oldbeardo has joined #mlpack
koderok has joined #mlpack
oldbeardo has quit [Quit: Page closed]
koderok has quit [Ping timeout: 265 seconds]
Anand has joined #mlpack
< Anand> Hi!
< Anand> I am finally able to run the benchmarks!
< Anand> I removed libboost, rebuilt Boost 1.55 from source, then rebuilt mlpack. It works now
koderok has joined #mlpack
< marcus_zoq> Anand: Great :)
< naywhayare> Anand: good to hear that you got it working
< Anand> It had to work! :)
< Anand> Now, I was planning to get started with the metrics implementation.
< Anand> The metrics that I am going to implement are very different from measuring run times.
< Anand> E.g., I will need counts of true/false positives and negatives after running a method on a dataset.
< naywhayare> should be pretty straightforward to implement, assuming you have the true labels of a dataset and the predicted labels
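A minimal sketch of the counting Anand describes, assuming binary labels held in Armadillo vectors (the container type mlpack uses); the label values and variable names here are purely illustrative:

```cpp
#include <armadillo>
#include <iostream>

int main()
{
  // Illustrative true and predicted binary labels (1 = positive class).
  arma::Col<size_t> trueLabels = {0, 1, 1, 0, 1};
  arma::Col<size_t> predicted  = {0, 1, 0, 0, 1};

  // Count true/false positives and negatives by comparing element-wise.
  size_t tp = 0, fp = 0, tn = 0, fn = 0;
  for (size_t i = 0; i < trueLabels.n_elem; ++i)
  {
    if (predicted[i] == 1 && trueLabels[i] == 1)      ++tp;
    else if (predicted[i] == 1 && trueLabels[i] == 0) ++fp;
    else if (predicted[i] == 0 && trueLabels[i] == 0) ++tn;
    else                                              ++fn;
  }

  std::cout << "TP=" << tp << " FP=" << fp
            << " TN=" << tn << " FN=" << fn << std::endl;
  return 0;
}
```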
< Anand> Exactly! So, my question is: do we store the predicted labels?
< naywhayare> depends on the method and the dataset
< naywhayare> I assume that in Marcus's current set of datasets, there are no labels (runtime doesn't depend on labels, in general)
< naywhayare> but for many of those datasets, I would imagine that labels might exist
< naywhayare> for instance, the covertype dataset is one of my favorites to use for nearest neighbor search for testing; it has labels, but I generally ignore them since I'm mostly concerned with runtime
< Anand> yes, true labels will always be there
< Anand> what about predicted labels?
< naywhayare> depends on the algorithm you're using
< naywhayare> for instance, the NaiveBayesClassifier class will output predictions with its Predict() method
< Anand> Take any example
< naywhayare> sorry, the Classify() function, not the Predict() function
< naywhayare> alternately, if you run the 'nbc' executable, it'll save its predictions to a file with the '--output' parameter
< marcus_zoq> And if we run the nbc method, we save the output by default in the 'output.csv' file.
< marcus_zoq> Right
< Anand> Ok!
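A rough sketch of the NaiveBayesClassifier usage naywhayare and marcus_zoq describe, based on the mlpack 1.0.x API as I understand it; the filenames and the placeholder labels are assumptions, not from the benchmark code:

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/naive_bayes/naive_bayes_classifier.hpp>

using namespace mlpack;

int main()
{
  // "train.csv"/"test.csv" are placeholder filenames; mlpack stores
  // points as columns, so Load() transposes the CSV automatically.
  arma::mat trainData, testData;
  data::Load("train.csv", trainData, true);
  data::Load("test.csv", testData, true);

  // Placeholder training labels; real ones would come with the dataset.
  arma::Col<size_t> labels(trainData.n_cols);
  for (size_t i = 0; i < labels.n_elem; ++i)
    labels[i] = i % 2;

  // Train a two-class model, then get predicted labels via Classify().
  // (Per the discussion, the 'nbc' executable does the same thing and
  // saves its predictions via --output, to output.csv by default.)
  naive_bayes::NaiveBayesClassifier<> nbc(trainData, labels, 2);
  arma::Col<size_t> predictions;
  nbc.Classify(testData, predictions);

  // 'predictions' is the vector the true/false positive and negative
  // counts are computed against.
  return 0;
}
```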
< Anand> Are there methods where such vectors/outputs are not returned/saved?
< Anand> I will need to modify their code, then
< naywhayare> allknn and allkfn do not predict classes, so those wouldn't be relevant to your metric
< Anand> right
< naywhayare> there are such things as k-nearest-neighbor classifiers, but you'd need to write a little extra code to make that work
< naywhayare> the same applies to logistic regression, LARS, and linear regression: all of these give numerical predictions but not class label predictions
< naywhayare> let me look at other classifiers... maybe k-means would be one you can use?
< naywhayare> nca (neighborhood components analysis) is one, although it's not specifically a classifier -- it's a metric learning method
< naywhayare> so the goal is to learn a weighting matrix on the data that improves classification
< naywhayare> HMMs and GMMs can do classification... sort of. they can return the log-likelihood of points being in a certain class... but that's not a label
< Anand> kmeans is a multi-class classifier. I will use it
< Anand> HMMs and GMMs talk about likelihood: correct. Whether they act as classifiers depends on the dataset being used
< Anand> if the dataset talks about labels, we'd better talk about labels too
< naywhayare> so usually what you might do if you have likelihoods (or numerical predictions from regression) is to give some kind of cut-off, and you say "if greater than <cutoff>, class 0; otherwise, class 1" to assign labels
< naywhayare> in these situations, you can also generate a ROC graph by moving the cutoff, which might be something interesting to display too
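A quick sketch of the cutoff idea naywhayare describes: sweep a threshold over the scores (e.g., log-likelihoods) and collect a (FPR, TPR) point per cutoff, which traces out the ROC graph. All of the data here is made up for illustration:

```cpp
#include <armadillo>
#include <iostream>

int main()
{
  // Made-up scores (e.g., log-likelihoods) and true binary labels.
  arma::vec scores = {0.9, 0.8, 0.55, 0.4, 0.3};
  arma::Col<size_t> truth = {1, 1, 0, 1, 0};

  const size_t positives = arma::accu(truth);
  const size_t negatives = truth.n_elem - positives;

  // Use each observed score as the cutoff: score >= cutoff -> class 1.
  for (const double cutoff : scores)
  {
    size_t tp = 0, fp = 0;
    for (size_t i = 0; i < scores.n_elem; ++i)
    {
      if (scores[i] >= cutoff)
      {
        if (truth[i] == 1) ++tp;
        else               ++fp;
      }
    }

    // One ROC point per cutoff.
    std::cout << "cutoff=" << cutoff
              << " TPR=" << (double) tp / positives
              << " FPR=" << (double) fp / negatives << std::endl;
  }
  return 0;
}
```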
< Anand> exactly!
< Anand> So, I guess I will need to add certain things to the methods to make all of them work. Right?
< naywhayare> potentially, yes
< naywhayare> and it's possible that the metric you're implementing will not be applicable to a lot of the methods that mlpack implements
< naywhayare> Udit is going to implement some weak classifiers, I think, so there should be more applicable methods by the end of the summer
< Anand> the metrics I discussed are useful mainly for classifiers
< Anand> mlpack has some methods which fall directly under this category
< Anand> others have to be made to fall
< Anand> :P
< naywhayare> I agree :)
< naywhayare> also, if you modify allknn or allkfn to have a classifier, I will gladly integrate that into the trunk code
< naywhayare> or maybe I will get around to writing that myself, because I know a fast way to do knn classification with trees
< naywhayare> it'll be at least June until I get to that though...
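A little sketch of the simple (non-tree-accelerated) version of such a knn classifier: run allknn, then take a majority vote over the labels of each query point's neighbors. The AllkNN calls follow the mlpack 1.0.x API as I recall it, so treat the exact signatures, filenames, and placeholder labels as assumptions:

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>
#include <map>

using namespace mlpack;

int main()
{
  // Placeholder files; the reference points carry the known labels.
  arma::mat referenceData, queryData;
  data::Load("reference.csv", referenceData, true);
  data::Load("query.csv", queryData, true);

  // Placeholder reference labels; real ones come with the dataset.
  arma::Col<size_t> refLabels(referenceData.n_cols, arma::fill::zeros);

  // Find the k nearest reference points for every query point.
  const size_t k = 5;
  neighbor::AllkNN allknn(referenceData, queryData);
  arma::Mat<size_t> neighbors;
  arma::mat distances;
  allknn.Search(k, neighbors, distances);

  // Majority vote over the labels of each query point's k neighbors;
  // neighbors(j, q) holds the index of query q's j-th nearest neighbor.
  arma::Col<size_t> predictions(queryData.n_cols);
  for (size_t q = 0; q < queryData.n_cols; ++q)
  {
    std::map<size_t, size_t> votes;
    for (size_t j = 0; j < k; ++j)
      ++votes[refLabels[neighbors(j, q)]];

    size_t bestLabel = 0, bestCount = 0;
    for (const auto& v : votes)
      if (v.second > bestCount) { bestLabel = v.first; bestCount = v.second; }
    predictions[q] = bestLabel;
  }
  return 0;
}
```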
< Anand> Ok. I guess I will start with classifiers like nbc then
< Anand> I will have a look at allknn too
< Anand> HMMs will work too
< naywhayare> yeah; NBC would be the easiest to start with because it already does what you need
< Anand> yeah
< Anand> Ok then. I will let you know after looking at all the methods
< Anand> I will go now! Bye! :)
< marcus_zoq> Sounds like a good plan. Bye!
< Anand> Goodbye Marcus! :) Please let me know more!
Anand has quit [Quit: Page closed]
koderok has quit [Ping timeout: 264 seconds]