< naywhayare>
oldbeardo: before the summer starts, we should work out some times when I will try to be available on IRC every day
< naywhayare>
the time zone difference makes it a little difficult; I think the difference between you and me is 9 and a half hours
< naywhayare>
so probably when you do most of your work, I will be sleeping, but maybe we can arrange some times so that I am available at the beginning and end of your workdays
< oldbeardo>
I don't think it will be an issue
< naywhayare>
ok; if I am responsive enough now, then we can just keep doing what we are already doing :)
< oldbeardo>
right now I leave early because I get internet access only till 11pm
< oldbeardo>
once I'm back at home I can continue working till 2-3 am
< naywhayare>
ok, if that works for you
< naywhayare>
I can start waking up a little earlier too, if that's easier for you
< oldbeardo>
yeah, it would be
< oldbeardo>
not to sound rude, but I thought Ajinkya was my mentor
< naywhayare>
ah, I thought we were co-mentoring, but you're right
< naywhayare>
I knew he was helping with one of the CF projects but I couldn't remember if it was Sumedh or you
< naywhayare>
I should have just looked it up
< oldbeardo>
well, I have no problem if you were my mentor too :)
< naywhayare>
ok, well if you have something set up with him, then that's good. I'll try to start waking up earlier anyway... that would probably be good for my productivity :)
< oldbeardo>
yup, it will be your lunch time soon, won't it?
< oldbeardo>
I'm asking because I'm going to upload the Softmax Regression code soon; it would be great if we could finish it today, so that the next thing I work on is QUIC-SVD
< naywhayare>
I'm actually cooking lunch at home today
< naywhayare>
so I won't be away from a computer
< oldbeardo>
oh great
< oldbeardo>
naywhayare: have you ever worked with GPUs?
< naywhayare>
oldbeardo: no, I haven't. I have basic knowledge of GPUs and I have attended some introductory presentations on using CUDA and similar libraries
< naywhayare>
but I would not call myself an expert in any way :)
< oldbeardo>
okay, no problem, I uploaded the code just now
< oldbeardo>
IRC looks so empty now that the GSoC formalities are over
< naywhayare>
yeah, not many people idling in here anymore
< naywhayare>
still more active than it was two years ago :)
< naywhayare>
let me download the code you uploaded, hang on
< oldbeardo>
okay, wait, are you saying that two years back you were the only active member on IRC? :D
< naywhayare>
basically, yeah. even jenkins-mlpack wasn't here :)
< naywhayare>
one of the old guys who used to work for the fastlab (james cline / Rodya) used to idle in here
< naywhayare>
but he works for Rdio now and I don't think he's got time for mlpack anymore
< oldbeardo>
what's Rdio?
< naywhayare>
streaming music service; I don't think they're that popular, but I think they're doing okay
< naywhayare>
I've never used it and I don't know too much about them
< oldbeardo>
okay, just saw their website; it looks like they're in the early stages, and their service isn't available in India
< naywhayare>
yeah, I think they are still a pretty small company
< naywhayare>
ok, the test looks good; I think maybe ComputeAccuracy() should be in the test and not in SoftmaxRegression, though
< naywhayare>
I can't think of cases other than testing when a user might want to call ComputeAccuracy()
< naywhayare>
I could see that maybe a user might want to say "ok, how did the model do?" and print that as output
< naywhayare>
but realistically I think a better idea might be to have a function in src/mlpack/core/ somewhere that scores predictions given true labels
< naywhayare>
good to see the change to sp_mat worked fine
< naywhayare>
although you only call GetGroundTruthMatrix() once, in the constructor of SoftmaxRegressionFunction; do you think that it should just be inlined into the constructor since it's not used anywhere else?
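(For illustration, the inlined version might look roughly like this; the member and argument names are guesses, not the actual mlpack code, and groundTruth is assumed to be an arma::sp_mat member:)

    // Sketch: build the numClasses x numPoints ground truth matrix directly
    // in the constructor, instead of calling GetGroundTruthMatrix().
    SoftmaxRegressionFunction::SoftmaxRegressionFunction(
        const arma::mat& data,
        const arma::Row<size_t>& labels,
        const size_t numClasses) :
        data(data),
        numClasses(numClasses)
    {
      // One-hot encoding: a 1 in row labels(i) of column i, for each point i.
      groundTruth.zeros(numClasses, labels.n_elem);
      for (size_t i = 0; i < labels.n_elem; ++i)
        groundTruth(labels(i), i) = 1.0;
    }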
< oldbeardo>
wait, I'll have to look, I forgot how I had arranged it
< oldbeardo>
well, I made it this way so that if the code is needed for some other implementation, it can easily be copied
< oldbeardo>
and inlining won't provide any significant advantage
< naywhayare>
the only advantage it provides is code simplicity
< naywhayare>
I think if someone else needs it, we can refactor it out then, if you think that's reasonable
< oldbeardo>
I think it's better this way, I guess I'm a sucker for abstraction
< naywhayare>
ok, we can leave it as is then
< naywhayare>
I spend a lot of time trying to figure out how to keep APIs clean, so when I see a function I often think "do we need another function? can we get rid of it?"
< oldbeardo>
okay, I haven't dealt with issues like these, what are the problems that arise because of this?
< naywhayare>
when the API gets very complex it can be confusing for new users
< naywhayare>
so ideally you want to make it as simple as possible for users to do most of what they need to do
< naywhayare>
but you also want to provide flexibility so more advanced users can do more complex things
< naywhayare>
so finding the right cutoff for what to provide as functions, what to have as template parameters, and so forth, can be quite difficult
< naywhayare>
in my opinion, Epetra is an example of a very bloated API
< naywhayare>
at one point mlpack used Epetra for sparse matrix support, but... Epetra is impossible to understand; it's incredibly complex, with a million different types and functions and so forth
< oldbeardo>
right, I get your point
< naywhayare>
I'm sure Epetra does some very useful things, but... wow, it would take a long time to learn what it does
< oldbeardo>
can't we make the internal functions private, instead of inlining?
< naywhayare>
a private function would only be useful to that class
< naywhayare>
I don't have a huge problem with leaving it public; SoftmaxRegressionFunction isn't a class most users should be using anyway
< naywhayare>
kind of an internal class
< oldbeardo>
okay, any other issues with the code?
< naywhayare>
yeah, the ComputeAccuracy() function, and then I think we should adapt the logistic_regression_main.cpp executable to use SoftmaxRegression, and then make a softmax_regression_main.cpp
< naywhayare>
alternatively, just make one softmax_regression_main.cpp that does a superset of what logistic_regression_main.cpp does
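(A rough outline of what such a softmax_regression_main.cpp could look like, modeled on the other mlpack 1.x executables; the option names and description text here are illustrative, not the real program:)

    // Hypothetical sketch of softmax_regression_main.cpp; all option names
    // are made up for illustration.
    #include <mlpack/core.hpp>
    #include "softmax_regression.hpp"

    PROGRAM_INFO("Softmax Regression", "A multi-class generalization of "
        "logistic regression.");

    PARAM_STRING_REQ("input_file", "Training dataset.", "i");
    PARAM_STRING_REQ("labels_file", "Labels for the training set.", "l");
    PARAM_INT("number_of_classes", "Number of classes.", "c", 2);

    using namespace mlpack;

    int main(int argc, char** argv)
    {
      CLI::ParseCommandLine(argc, argv);

      arma::mat data;
      data::Load(CLI::GetParam<std::string>("input_file"), data, true);

      arma::mat labels;
      data::Load(CLI::GetParam<std::string>("labels_file"), labels, true);

      // ... train a SoftmaxRegression model and report its predictions ...

      return 0;
    }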
< oldbeardo>
I took the ComputeAccuracy() function from the LogisticRegression class itself
< naywhayare>
huh, why did I accept that then? let me look at its history
< naywhayare>
weird, I don't think I spent enough time with this code before I accepted it
< naywhayare>
let me think about it for a little while. if you can think of cases where ComputeAccuracy() or ComputeError() would be useful to an end user, let me know
< naywhayare>
I think maybe the better option is to remove ComputeAccuracy() and then add some other method somewhere that computes accuracy given a list of predictions and true labels
< naywhayare>
so it's not just in logistic regression or softmax regression
< oldbeardo>
it will be useful to check the training set accuracy
< oldbeardo>
but yes, this can be made a common function
< naywhayare>
ok, I agree
< naywhayare>
now we have to figure out where to put the common function :)
< naywhayare>
I think maybe we will need a new namespace or directory under src/mlpack/core/
< naywhayare>
I'm not sure ComputeAccuracy() would fit under anything that's already there... arma_extend, data (which is save/load/change labels), dists, kernels, math, metrics, optimizers, tree, util
< oldbeardo>
how about core/performance
< naywhayare>
yeah, let's do that for now and maybe we'll think of something better later
< naywhayare>
performance sort of implies runtime performance, not classification performance, but I don't have a better word
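(A minimal sketch of what that common function could look like, under the hypothetical core/performance placement; nothing here exists in mlpack yet:)

    // Hypothetical header: src/mlpack/core/performance/accuracy.hpp
    #include <mlpack/core.hpp>

    namespace mlpack {
    namespace performance /* hypothetical namespace */ {

    // Fraction of predictions matching the true labels, in [0, 1].
    inline double Accuracy(const arma::Row<size_t>& predictions,
                           const arma::Row<size_t>& labels)
    {
      return (double) arma::accu(predictions == labels) / labels.n_elem;
    }

    } // namespace performance
    } // namespace mlpack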
< naywhayare>
one last concern: the LogisticRegression code allows one to specify a decision boundary
< naywhayare>
this is common for most two-class classifiers, but could we generalize that to the multi-class case?
< oldbeardo>
ummm, I don't think so, I will check it out anyway
< naywhayare>
I think maybe the user could pass some weighting vector, then you could multiply the class probabilities for each point by the weighting vector before selecting the max
< naywhayare>
think about the two-class case... if you want a decision boundary of 0.5, you could pass [1 1] (the default), but if you want a decision boundary of 0.25, you could pass something different, like [1.5 0.5]?
< naywhayare>
I haven't done the math yet to figure out what the exact relation is
< naywhayare>
but I think something like that might work
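(A sketch of the weighting idea in Armadillo; the names probabilities and weights are hypothetical. In the two-class case this picks class 1 exactly when p_1 > w_0 / (w_0 + w_1), so the weights determine where the boundary sits:)

    #include <armadillo>

    // probabilities: numClasses x numPoints matrix of class probabilities.
    // weights: numClasses-length weighting vector ([1 1 ... 1] = unweighted).
    arma::Row<arma::uword> WeightedPredict(const arma::mat& probabilities,
                                           const arma::vec& weights)
    {
      // Scale each point's class probabilities by the per-class weights.
      arma::mat scaled = probabilities;
      scaled.each_col() %= weights;

      arma::Row<arma::uword> predictions(scaled.n_cols);
      for (arma::uword i = 0; i < scaled.n_cols; ++i)
      {
        arma::uword maxClass;
        const arma::vec col = scaled.col(i);
        col.max(maxClass); // index of the largest weighted probability
        predictions[i] = maxClass;
      }
      return predictions;
    }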
< oldbeardo>
see, the problem there is that you assume a linear separation in the binary case
< oldbeardo>
so you have to check only for a single line
< oldbeardo>
in the multiclass case you will have to check for an exponential number of line combinations to come up with the class
< naywhayare>
hang on a second, let me sketch something out
< oldbeardo>
at least that's what my intuition tells me
< naywhayare>
ok, hang on, uploading some images
< oldbeardo>
sure
< oldbeardo>
which site are you using? because some of them are blocked
< naywhayare>
my own :)
< oldbeardo>
heh, nice one
< naywhayare>
the hard part is getting them off my phone
< naywhayare>
I drew three gaussians, and tried to draw what I thought the modified decision surface would look like
< naywhayare>
to classify a point, just like in softmax regression, you find the probability of the point coming from each of those gaussians and select the gaussian with maximum probability
< naywhayare>
but if I double the probability of the point coming from g2, then I get a modified decision surface like in the image, without having to calculate numerous lines; I just multiply the probability estimate for g2 by 2 (or whatever factor)
< naywhayare>
but I'm not sure that's the same thing that is happening in logistic regression
< oldbeardo>
I'm actually a little confused by this
< oldbeardo>
I have no idea about the Gaussian interpretation of Logistic Regression
< naywhayare>
no, I wasn't using logistic regression specifically as an example
< naywhayare>
this is some classifier that has no relation to logistic regression or softmax regression
< naywhayare>
the similarity is that it classifies in the same way softmax regression does
< naywhayare>
I think that you should be able to represent softmax regression as a series of PDFs in some space (like these three gaussians)
< oldbeardo>
is this an existing algorithm or are you trying to come up with a new one?
< naywhayare>
no, I'm trying to extend the idea of a modifiable decision boundary from logistic regression to softmax regression
< naywhayare>
I don't know if anyone has done that before
< naywhayare>
ideally I would like to entirely remove the logistic regression module and replace it with softmax regression
< naywhayare>
but to do this we need to be sure that softmax regression has the same functionality; the only thing it's missing is the decision boundary parameter
< naywhayare>
so I was trying to think of a multiclass generalization to the decision boundary parameter
< oldbeardo>
actually we shouldn't
< naywhayare>
you don't think so?
< oldbeardo>
right, I don't; this might be a silly reason, but I'll say it anyway
< naywhayare>
it's ok, go ahead :)
< oldbeardo>
while testing today I used the same Gaussians as in the Logistic Regression test
< oldbeardo>
they had base points as "1.0 1.0 1.0" and "9.0 9.0 9.0"
< oldbeardo>
using that dataset, I was getting an accuracy of 52%
< oldbeardo>
so I worked out the math; it turns out Softmax does not give the Logistic cost function when num_classes = 2
< oldbeardo>
I did this mentally so I may be incorrect
< naywhayare>
I thought that it worked out to be the same... hang on, let me look up that site that said it was the same
< oldbeardo>
the point I'm making is that Softmax has a bias towards feature vectors with a higher norm
< naywhayare>
can you explain why that is? I'm trying to understand
< oldbeardo>
okay
< oldbeardo>
the probability for a class is exp(lin_j) / sum(exp(lin_i))
< oldbeardo>
if you take num_classes = 2 it becomes exp(lin_0) / (exp(lin_0) + exp(lin_1))
< oldbeardo>
which is 1 / (1 + exp(lin_1 - lin_0))
< naywhayare>
lin_j = \theta_j^T * x?
< naywhayare>
just to be sure we are on the right page
< oldbeardo>
yes
< naywhayare>
ok
< oldbeardo>
now this is not the same as sigmoid(lin_0)
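(Writing the algebra out, with lin_j = \theta_j^\top x and \sigma the sigmoid:)

    P(y = 0 \mid x)
      = \frac{e^{\theta_0^\top x}}{e^{\theta_0^\top x} + e^{\theta_1^\top x}}
      = \frac{1}{1 + e^{(\theta_1 - \theta_0)^\top x}}
      = \sigma\left( (\theta_0 - \theta_1)^\top x \right)

so the two-class softmax is the sigmoid of the difference of the two linear terms, not sigmoid(lin_0) alone; it coincides with logistic regression only under the reparameterization \theta \leftarrow \theta_0 - \theta_1.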
< oldbeardo>
so, the learned weights are in favour of the class which has higher norm for its data points
< oldbeardo>
at least that's what I inferred from the printed probabilities
< naywhayare>
I need to spend some time thinking about why this is true
< naywhayare>
but I see what you mean
< naywhayare>
I have to go in a few moments, so I wanted to clarify one more thing -- you asked about Mat_extra_bones.hpp and so forth
< naywhayare>
SpMat_extra_bones.hpp, specifically
< naywhayare>
in the Armadillo code, the class Mat is defined in Mat_bones.hpp
< naywhayare>
at the bottom of the file, it has this nice stanza:
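(the stanza itself is missing from the log; from Armadillo's Mat_bones.hpp it is presumably:)

    #ifdef ARMA_EXTRA_MAT_PROTO
      #include ARMA_INCFILE_WRAP(ARMA_EXTRA_MAT_PROTO)
    #endif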
< naywhayare>
but that's inside the definition of the Mat class
< naywhayare>
so to extend the functionality of the Mat class, you just have to define ARMA_EXTRA_MAT_PROTO and create that file, and include some things in it
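(As a sketch of how that hook might be used; the exact define in mlpack's arma_extend may differ:)

    // This must be defined before <armadillo> is first included; Armadillo
    // then pulls the extra prototypes into the Mat class definition.
    #define ARMA_EXTRA_MAT_PROTO mlpack/core/arma_extend/Mat_extra_bones.hpp
    #include <armadillo>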
< oldbeardo>
okay, nice naming though :)
< naywhayare>
it's the same for SpMat, Col, Row, and so forth
< naywhayare>
ah, you can thank Conrad for the bones/meat naming; that was his idea
< naywhayare>
anyway, the SpMat_extra_bones.hpp file just adds the batch constructors for SpMat if the version of Armadillo being used is older than 3.810.0
< naywhayare>
most of the stuff in arma_extend/ is backports so that mlpack can work with older versions of Armadillo while still utilizing new functionality
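(A standalone illustration of the SpMat batch constructor being backported; this is plain Armadillo, not mlpack code:)

    #include <armadillo>

    int main()
    {
      // Build a sparse matrix from (row, col) locations and their values in
      // one batch, instead of inserting elements one at a time.
      arma::umat locations(2, 3);
      locations(0, 0) = 0; locations(1, 0) = 0; // nonzero at (0, 0)
      locations(0, 1) = 1; locations(1, 1) = 2; // nonzero at (1, 2)
      locations(0, 2) = 2; locations(1, 2) = 1; // nonzero at (2, 1)

      arma::vec values("1.0 2.0 3.0");
      arma::sp_mat m(locations, values); // the batch constructor
      m.print("m:");

      return 0;
    }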
< oldbeardo>
okay, also I wanted to ask
< oldbeardo>
I was thinking of shifting to Ubuntu 14.04, will the latest armadillo work?
< naywhayare>
it should, and if it doesn't, we'll just fix the bugs :)
< naywhayare>
anything else before I go? I assume you're about to leave too
< oldbeardo>
no, that's it, yup I'm about to
< naywhayare>
ok, I'm gonna go then. see you later