naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has quit [Quit: leaving]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Quit: Lost terminal]
govg has joined #mlpack
oldbeardo has joined #mlpack
oldbeardo has quit [Ping timeout: 246 seconds]
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: ping me whenever you are free
< naywhayare>
oldbeardo: I am free now
< oldbeardo>
okay, so what should we do about SoftmaxRegression?
< naywhayare>
so, last we talked, it was about how softmax regression is equivalent to logistic regression in the two-class case
< naywhayare>
but you last said that you hadn't found a suitable explanation
< naywhayare>
is that still the case, or did you find something to clarify that?
< oldbeardo>
well, I couldn't find a way of explaining how the two models are working so differently
< naywhayare>
ah, okay, but you agree that they should be working the same?
< oldbeardo>
mathematically, yes, but I'm still not convinced that in practice they always will, mainly because right now they aren't behaving the same
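(For reference, the equivalence being discussed: with two classes, the softmax probability of class 1 collapses to a logistic sigmoid in the difference of the two parameter vectors, so the two models should describe the same decision boundary.)

\[
P(y = 1 \mid x) = \frac{e^{\theta_1^T x}}{e^{\theta_0^T x} + e^{\theta_1^T x}}
                = \frac{1}{1 + e^{-(\theta_1 - \theta_0)^T x}},
\]

so setting \(\theta = \theta_1 - \theta_0\) gives exactly the logistic regression model \(P(y = 1 \mid x) = 1 / (1 + e^{-\theta^T x})\).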
< naywhayare>
okay
< naywhayare>
let me take a look at the code for a moment
< oldbeardo>
sure, it would be better if you could have a look at the tests first, make sure that I'm checking the right things
< naywhayare>
okay, I will do that
< naywhayare>
I remember looking at the tests in the past; I think they are fine
< naywhayare>
but I'll look again
< oldbeardo>
okay, sure
< jenkins-mlpack>
Starting build #2113 for job mlpack - svn checkin test (previous build: STILL UNSTABLE -- last SUCCESS #2109 6 days 21 hr ago)
< naywhayare>
oldbeardo: so I ran SoftmaxRegressionTwoClasses
< naywhayare>
the accuracy is 100%...
< oldbeardo>
yes, that's because I made the test that way
< naywhayare>
oh, I see, the Gaussians are different
< naywhayare>
okay, let me change them a bit and see what happens
< naywhayare>
(when I say the Gaussians are different, I mean with respect to LogisticRegressionTest/LogisticRegressionSGDGaussianTest)
< oldbeardo>
yes, try something like 1.0 1.0 1.0 and 9.0 9.0 9.0
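A minimal sketch of the kind of test data being discussed, assuming the same setup as the existing logistic regression Gaussian tests (the class sizes, identity covariances, and variable names here are illustrative, not the actual test code):

    #include <mlpack/core.hpp>

    using namespace mlpack::distribution;

    int main()
    {
      // Two well-separated 3-dimensional Gaussians, one per class.
      GaussianDistribution g1(arma::vec("1.0 1.0 1.0"), arma::eye<arma::mat>(3, 3));
      GaussianDistribution g2(arma::vec("9.0 9.0 9.0"), arma::eye<arma::mat>(3, 3));

      arma::mat data(3, 1000);
      arma::Row<size_t> labels(1000);
      for (size_t i = 0; i < 500; ++i)
      {
        data.col(i) = g1.Random();        // class 0
        labels[i] = 0;
        data.col(500 + i) = g2.Random();  // class 1
        labels[500 + i] = 1;
      }

      // One would then train LogisticRegression and SoftmaxRegression on
      // (data, labels) and compare their predictions on this set.
    }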
< naywhayare>
oldbeardo: so I took a look at the parameters of the model for softmax regression and logistic regression
< naywhayare>
and I noticed that there's not an intercept term
< naywhayare>
so, for those 3-dimensional Gaussians, the logistic regression model has these parameters (or something close to these):
< naywhayare>
-7.8147 0.5665 0.6082 0.6016
< naywhayare>
where the big -7.8147 term is the intercept term
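Reading those four numbers as [intercept, weights], the fitted logistic regression model is roughly

\[
P(y = 1 \mid x) \approx \frac{1}{1 + \exp\!\big(-(-7.8147 + 0.5665\,x_1 + 0.6082\,x_2 + 0.6016\,x_3)\big)},
\]

which is how the intercept term shifts the decision boundary away from the origin.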
< naywhayare>
for softmax regression on the same data, I got this matrix:
< naywhayare>
-0.0325 0.0325
< naywhayare>
-0.0625 0.0625
< naywhayare>
-0.0471 0.0471
< oldbeardo>
right, that makes sense
< naywhayare>
so I thought, I can add the intercept to the input data for now by inserting a row of ones
< naywhayare>
that ends up being equivalent to adding an intercept to the model
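A minimal sketch of that step, assuming the dataset is an Armadillo matrix with one column per point (the function name is illustrative):

    #include <armadillo>

    // Prepend a row of ones to a (dimensions x points) dataset; the weight the
    // model learns for this row then behaves like an intercept (bias) term.
    void AddInterceptRow(arma::mat& data)
    {
      data.insert_rows(0, arma::ones<arma::rowvec>(data.n_cols));
    }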
< naywhayare>
that gave me this model:
< naywhayare>
0.3610 -0.3610
< naywhayare>
-0.0544 0.0544
< naywhayare>
-0.0766 0.0766
< naywhayare>
-0.0648 0.0648
< naywhayare>
the accuracy is better (96.2%), but it's still wrong, so I'm trying to dig a little bit deeper into the Evaluate() and Gradient() functions
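For reference, the usual softmax regression objective and per-class gradient, which is what Evaluate() and Gradient() are expected to compute (the exact regularization term in the implementation may differ):

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j} 1\{y_i = j\}\,
            \log \frac{e^{\theta_j^T x_i}}{\sum_{l} e^{\theta_l^T x_i}}
          + \frac{\lambda}{2} \sum_{j} \lVert \theta_j \rVert^2,
\qquad
\nabla_{\theta_j} J = -\frac{1}{m} \sum_{i=1}^{m}
            x_i \big( 1\{y_i = j\} - P(y_i = j \mid x_i) \big) + \lambda\,\theta_j.
\]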
< naywhayare>
it's interesting that the theta values for each class cancel out...
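That mirror symmetry is expected: for each point the class probabilities sum to one, so (ignoring regularization) the per-class gradients sum to zero,

\[
\nabla_{\theta_0} J + \nabla_{\theta_1} J
  = -\frac{1}{m} \sum_{i} x_i \big( (1\{y_i = 0\} + 1\{y_i = 1\}) - (P_0 + P_1) \big) = 0,
\]

and assuming a zero (or otherwise symmetric) initialization, the two columns stay exact negatives of each other, \(\theta_0 = -\theta_1\); the corresponding logistic regression weights are the difference \(\theta_1 - \theta_0 = 2\theta_1\).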
< oldbeardo>
nice debugging man! I never thought of that
< naywhayare>
thanks :)
< oldbeardo>
naywhayare: I think I know what the problem is now
< oldbeardo>
the intercept term is treated specially in LogisticRegression, but not in my implementation
< naywhayare>
yeah, but it should end up being equivalent to running the model with no special intercept term and an added row of ones
< oldbeardo>
no, I mean the gradient calculation for the bias term is different than for all the other weights
< oldbeardo>
that's not the case in my implementation
< naywhayare>
in logistic regression, gradient[0] = -arma::accu(responses - sigmoids)
< naywhayare>
if the first row of the data was entirely ones, then this would be equivalent to