naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has quit [Quit: leaving]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Quit: Lost terminal]
govg has joined #mlpack
oldbeardo has joined #mlpack
oldbeardo has quit [Ping timeout: 246 seconds]
oldbeardo has joined #mlpack
< oldbeardo> naywhayare: ping me whenever you are free
< naywhayare> oldbeardo: I am free now
< oldbeardo> okay, so what should we do about SoftmaxRegression?
< naywhayare> so, last we talked, it was about how softmax regression is equivalent to logistic regression in the two-class case
< naywhayare> but you last said that you hadn't found a suitable explanation
< naywhayare> is that still the case, or did you find something to clarify that?
< oldbeardo> well, I couldn't find a way of explaining how the two models are working so differently
< naywhayare> ah, okay, but you agree that they should be working the same?
< oldbeardo> mathematically, yes, but I'm still not convinced that they always will in practice, mainly because right now they aren't
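
For context, the equivalence being discussed: with two classes, softmax regression with class weight vectors w_0 and w_1 gives P(y = 1 | x) = exp(w_1'x) / (exp(w_0'x) + exp(w_1'x)) = 1 / (1 + exp(-(w_1 - w_0)'x)), which is exactly the logistic regression model with weight vector w_1 - w_0. Only the difference of the two class weight vectors is identifiable, so the two models should make the same predictions even though their parameters look different.
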
< naywhayare> okay
< naywhayare> let me take a look at the code for a moment
< oldbeardo> sure, it would be better if you could have a look at the tests first, make sure that I'm checking the right things
< naywhayare> okay, I will do that
< naywhayare> I remember looking at the tests in the past; I think they are fine
< naywhayare> but I'll look again
< oldbeardo> okay, sure
< jenkins-mlpack> Starting build #2113 for job mlpack - svn checkin test (previous build: STILL UNSTABLE -- last SUCCESS #2109 6 days 21 hr ago)
< naywhayare> oldbeardo: so I ran SoftmaxRegressionTwoClasses
< naywhayare> the accuracy is 100%...
< oldbeardo> yes, that's because I set up the test that way
< naywhayare> oh, I see, the Gaussians are different
< naywhayare> okay, let me change them a bit and see what happens
< naywhayare> (when I say the Gaussians are different, I mean with respect to LogisticRegressionTest/LogisticRegressionSGDGaussianTest)
< oldbeardo> yes, try something like 1.0 1.0 1.0 and 9.0 9.0 9.0
< naywhayare> oldbeardo: so I took a look at the parameters of the model for softmax regression and logistic regression
< naywhayare> and I noticed that there's not an intercept term
< naywhayare> so, for those 3-dimensional gaussians, the logistic regression model has these parameters (or something close to these):
< naywhayare> -7.8147 0.5665 0.6082 0.6016
< naywhayare> where the big -7.8147 term is the intercept term
< naywhayare> for softmax regression on the same data, I got this matrix:
< naywhayare> -0.0325 0.0325
< naywhayare> -0.0625 0.0625
< naywhayare> -0.0471 0.0471
< oldbeardo> right, that makes sense
< naywhayare> so I thought, I can add the intercept to the input data for now by doing
< naywhayare> data.insert_rows(0, arma::ones<arma::rowvec>(points));
< naywhayare> that just inserts a row of ones, which ends up being equivalent to adding an intercept to the model
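
A minimal standalone sketch of that trick, assuming mlpack's column-major convention (one point per column); the dataset and its dimensions here are made up:

    #include <armadillo>

    int main()
    {
      // Hypothetical 3-dimensional dataset with one point per column.
      arma::mat data = arma::randu<arma::mat>(3, 10);
      const size_t points = data.n_cols;

      // Prepend a row of ones; the weight learned for that row then acts as
      // an intercept (bias) term, since the model has no explicit intercept.
      data.insert_rows(0, arma::ones<arma::rowvec>(points));

      return 0;
    }
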
< naywhayare> that gave me this model:
< naywhayare> 0.3610 -0.3610
< naywhayare> -0.0544 0.0544
< naywhayare> -0.0766 0.0766
< naywhayare> -0.0648 0.0648
< naywhayare> the accuracy is better (96.2%), but it's still wrong, so I'm trying to dig a little bit deeper into the Evaluate() and Gradient() functions
< naywhayare> it's interesting that the theta values for each class cancel out...
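
(A side note on why the two columns mirror each other: softmax probabilities are unchanged if the same vector is added to every class's weights, so only the difference between the two columns is determined by the data; the L2 regularizer then favors the symmetric solution where the columns are exact negatives of each other.)
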
< oldbeardo> nice debugging man! I never thought of that
< naywhayare> thanks :)
< oldbeardo> naywhayare: I think I know what the problem is now
< oldbeardo> the intercept term is treated specially in Logistic, but not in my implementation
< naywhayare> yeah, but it should end up being equivalent to running the model with no special intercept term and an added row of ones
< oldbeardo> no, I mean the gradient calculation for the bias term is different from all the other weights
< oldbeardo> that's not the case in my implementation
< naywhayare> in logistic regression, gradient[0] = -arma::accu(responses - sigmoids)
< naywhayare> if the first row of the data was entirely ones, then this would be equivalent to
< naywhayare> -((responses - sigmoids) * data.row(0))
< naywhayare> so I think that should work out to be the same thing
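
A quick standalone check of that claim, using made-up vectors rather than mlpack's actual types:

    #include <armadillo>
    #include <cmath>

    int main()
    {
      // When the "feature" is a row of ones, the accu() form of the intercept
      // gradient matches the generic feature-dot-residual form.
      const size_t points = 100;
      const arma::rowvec ones = arma::ones<arma::rowvec>(points);
      const arma::rowvec responses = arma::randu<arma::rowvec>(points);
      const arma::rowvec sigmoids = arma::randu<arma::rowvec>(points);

      const double a = -arma::accu(responses - sigmoids);
      const double b = -arma::dot(responses - sigmoids, ones);

      return (std::abs(a - b) < 1e-10) ? 0 : 1;
    }
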
< oldbeardo> there's no regularization for the bias term
< naywhayare> oh!
< naywhayare> you're right, and there shouldn't be
< oldbeardo> wow, finally a fruitful discussion about this! :)
< naywhayare> I'm removing the regularization from the bias term to see what it does...
< naywhayare> okay, now I get 100% accuracy and these parameters:
< naywhayare> 6.9251e+01 -2.1942e-09
< naywhayare> -4.8352e+00 -5.5676e-10
< naywhayare> -4.6970e+00 6.3588e-09
< naywhayare> -4.7268e+00 3.0040e-09
< oldbeardo> nice!
< naywhayare> so if I subtract off the second column's terms (which are basically zero), it looks just like the logistic regression case
< naywhayare> here are the changes I made:
< naywhayare> line 113:
< naywhayare> weightDecay = 0.5 * lambda * (arma::accu(parameters % parameters) - arma::accu(parameters.row(0) % parameters.row(0)));
< naywhayare> line 144: (right after gradient calculation)
< naywhayare> gradient.row(0) -= lambda * parameters.row(0);
< naywhayare> those are hacky and they assume that the first row of the data is all ones
< naywhayare> but you could adapt that into a working solution for the intercept
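
For reference, a sketch of how that could be folded in less hackily, still assuming row 0 of `parameters` corresponds to a prepended row of ones (this is not the actual mlpack code):

    #include <armadillo>

    // L2 weight decay that skips the intercept row (row 0).
    double WeightDecay(const arma::mat& parameters, const double lambda)
    {
      const arma::mat nonBias = parameters.rows(1, parameters.n_rows - 1);
      return 0.5 * lambda * arma::accu(nonBias % nonBias);
    }

    // Its gradient: lambda * parameters for every row except the intercept row.
    arma::mat WeightDecayGradient(const arma::mat& parameters, const double lambda)
    {
      arma::mat gradient = lambda * parameters;
      gradient.row(0).zeros();
      return gradient;
    }
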
< oldbeardo> great, it's good to finally have an explanation
< naywhayare> yeah, I am glad we have figured it out :)
< naywhayare> until you pointed out that the gradient calculation was different, I was running out of tricks
< naywhayare> I was doing things like checking for invalid memory accesses, etc.
< oldbeardo> I got the idea right after you said it was lacking an intercept term
< naywhayare> :)
< jenkins-mlpack> Project mlpack - svn checkin test build #2113: STILL UNSTABLE in 1 hr 30 min: http://big.cc.gt.atl.ga.us:8080/job/mlpack%20-%20svn%20checkin%20test/2113/
< jenkins-mlpack> Ryan Curtin: Update HISTORY.txt.
oldbeardo has quit [Quit: Page closed]
govg has quit [Quit: leaving]
govg has joined #mlpack