< jenkins-mlpack>
Ryan Curtin: Const parameters will cause this to be two billion times faster.
koderok has quit [Ping timeout: 252 seconds]
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: did you get a chance to look at the code?
oldbeardo has quit [Ping timeout: 240 seconds]
oldbeardo has joined #mlpack
oldbeardo has quit [Quit: Page closed]
oldbeardo has joined #mlpack
koderok has joined #mlpack
oldbeardo has quit [Quit: Page closed]
koderok has quit [Ping timeout: 265 seconds]
Anand has joined #mlpack
< Anand>
Hi!
< Anand>
I'm finally able to run the benchmarks!
< Anand>
I removed libboost, rebuilt Boost 1.55 from source, and then rebuilt mlpack. It works now.
koderok has joined #mlpack
< marcus_zoq>
Anand: Great :)
< naywhayare>
Anand: good to hear that you got it working
< Anand>
It had to work! :)
< Anand>
Now, I was planning to get started with the metrics implementation.
< Anand>
The metrics that I am going to implement are very different from measuring run times.
< Anand>
E.g., I will need counts of true/false positives and negatives after running a method on a dataset.
< naywhayare>
should be pretty straightforward to implement, assuming you have the true labels of a dataset and the predicted labels
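A minimal sketch of the counting described here, assuming the true and predicted labels are already available as arrays; the function name, signature, and the convention that one class is treated as "positive" are purely illustrative:

    import numpy as np

    def confusion_counts(true_labels, predicted_labels, positive=1):
        # Count true/false positives and negatives, treating `positive`
        # as the class of interest.
        true_labels = np.asarray(true_labels)
        predicted_labels = np.asarray(predicted_labels)
        tp = int(np.sum((predicted_labels == positive) & (true_labels == positive)))
        fp = int(np.sum((predicted_labels == positive) & (true_labels != positive)))
        tn = int(np.sum((predicted_labels != positive) & (true_labels != positive)))
        fn = int(np.sum((predicted_labels != positive) & (true_labels == positive)))
        return tp, fp, tn, fn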
< Anand>
Exactly! So, my question is: do we store the predicted labels?
< naywhayare>
depends on the method and the dataset
< naywhayare>
I assume that in Marcus's current set of datasets, there are no labels (runtime doesn't depend on labels, in general)
< naywhayare>
but for many of those datasets, I would imagine that labels might exist
< naywhayare>
for instance, the covertype dataset is one of my favorites to use for nearest neighbor search for testing; it has labels, but I generally ignore them since I'm mostly concerned with runtime
< Anand>
yes, true labels will always be there
< Anand>
what about predicted labels?
< naywhayare>
depends on the algorithm you're using
< naywhayare>
for instance, the NaiveBayesClassifier class will output predictions with its Predict() method
< naywhayare>
sorry, the Classify() function, not the Predict() function
< naywhayare>
alternatively, if you run the 'nbc' executable, it'll save its predictions to a file with the '--output' parameter
< marcus_zoq>
And if we run the nbc method, we save the output by default in the 'output.csv' file.
< marcus_zoq>
Right
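A rough sketch of picking up those saved predictions for metric computation, assuming the nbc output lands in 'output.csv' as mentioned above; 'labels.csv' is a hypothetical file holding the true labels of the test set:

    import numpy as np

    # 'output.csv' is the predictions file written by nbc (see above);
    # 'labels.csv' is a hypothetical file with the corresponding true labels.
    predicted = np.genfromtxt('output.csv', delimiter=',')
    truth = np.genfromtxt('labels.csv', delimiter=',')

    # With both vectors in hand, metrics like accuracy follow directly.
    accuracy = np.mean(predicted == truth)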
< Anand>
Ok!
< Anand>
Are there methods where such vectors/outputs are not returned/saved?
< Anand>
I will need to modify their code, then
< naywhayare>
allknn and allkfn do not predict classes, so those wouldn't be relevant to your metric
< Anand>
right
< naywhayare>
there are such things as k-nearest-neighbor classifiers, but you'd need to write a little extra code to make that work
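A small sketch of the "little extra code" such a classifier needs, assuming the neighbor indices for each query point are already available (e.g. from allknn's output); this is plain majority voting, not the faster tree-based approach mentioned later, and the names are illustrative:

    import numpy as np
    from collections import Counter

    def knn_classify(neighbor_indices, reference_labels):
        # neighbor_indices: (num_queries, k) matrix of reference-set indices,
        # as produced by a nearest-neighbor search such as allknn.
        predictions = []
        for row in neighbor_indices:
            # Predict the most common label among the k nearest neighbors.
            votes = Counter(reference_labels[i] for i in row)
            predictions.append(votes.most_common(1)[0][0])
        return np.array(predictions)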
< naywhayare>
the same applies to logistic regression, LARS, and linear regression: all of these give numerical predictions but not class label predictions
< naywhayare>
let me look at other classifiers... maybe k-means would be one you can use?
< naywhayare>
nca (neighborhood components analysis) is one, although it's not specifically a classifier -- it's a metric learning method
< naywhayare>
so the goal is to learn a weighting matrix on the data that improves classification
< naywhayare>
HMMs and GMMs can do classification... sort of. they can return the log-likelihood of points being in a certain class... but that's not a label
< Anand>
k-means is a multi-class classifier. I will use it
< Anand>
HMMs and GMMs talk about likelihood: correct. Whether or not they are classifiers depends on the dataset being used
< Anand>
if the dataset has labels, we'd better talk about labels too
< naywhayare>
so usually what you might do, if you have likelihoods (or numerical predictions from regression), is to pick some kind of cutoff and say "if greater than <cutoff>, class 0; otherwise, class 1" to assign labels
< naywhayare>
in these situations, you can also generate a ROC graph by moving the cutoff, which might be something interesting to display too
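A small sketch of the cutoff idea and the ROC sweep, assuming 'scores' holds per-point likelihoods (or regression outputs) and 'truth' holds binary 0/1 labels; the variable names and the direction of the cutoff assignment (above the cutoff gets class 1 here) are just illustrative:

    import numpy as np

    def roc_points(scores, truth):
        # Sweep the cutoff over every observed score; at each cutoff,
        # points scoring above it are assigned class 1, the rest class 0.
        points = []
        for cutoff in np.unique(scores):
            predicted = (scores > cutoff).astype(int)
            tp = np.sum((predicted == 1) & (truth == 1))
            fp = np.sum((predicted == 1) & (truth == 0))
            fn = np.sum((predicted == 0) & (truth == 1))
            tn = np.sum((predicted == 0) & (truth == 0))
            tpr = tp / (tp + fn) if (tp + fn) else 0.0
            fpr = fp / (fp + tn) if (fp + tn) else 0.0
            points.append((fpr, tpr))
        return points  # (false positive rate, true positive rate) pairs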
< Anand>
exactly!
< Anand>
So, I guess I will need to add certain things to the methods to make all of them work. Right?
< naywhayare>
potentially, yes
< naywhayare>
and it's possible that the metric you're implementing will not be applicable to a lot of the methods that mlpack implements
< naywhayare>
Udit is going to implement some weak classifiers, I think, so there should be more applicable methods by the end of the summer
< Anand>
the metrics I discussed are useful mainly for classifiers
< Anand>
mlpack has some methods which fall directly under this category
< Anand>
others have to be made to fall
< Anand>
:P
< naywhayare>
I agree :)
< naywhayare>
also, if you modify allknn or allkfn to have a classifier, I will gladly integrate that into the trunk code
< naywhayare>
or maybe I will get around to writing that myself, because I know a fast way to do knn classification with trees
< naywhayare>
it'll be at least June until I get to that though...
< Anand>
Ok. I guess I will start with classifiers like nbc then
< Anand>
I will have a look at allknn too
< Anand>
HMMs will work too
< naywhayare>
yeah; NBC would be the easiest to start with because it already does what you need
< Anand>
yeah
< Anand>
Ok then. I will let you know after looking at all the methods
< Anand>
I will go now! Bye! :)
< marcus_zoq>
Sounds like a good plan. Bye!
< Anand>
Good bye Marcus! :) Please let me know more!