naywhayare changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
govg has quit [Quit: leaving]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
govg has quit [Changing host]
govg has joined #mlpack
govg has quit [Quit: Lost terminal]
govg has joined #mlpack
oldbeardo has joined #mlpack
oldbeardo has quit [Ping timeout: 246 seconds]
oldbeardo has joined #mlpack
< oldbeardo>
naywhayare: ping me whenever you are free
< naywhayare>
oldbeardo: I am free now
< oldbeardo>
okay, so what should we do about SoftmaxRegression?
< naywhayare>
so, last we talked, it was about how softmax regression is equivalent to logistic regression in the two-class case
< naywhayare>
but you last said that you hadn't found a suitable explanation
< naywhayare>
is that still the case, or did you find something to clarify that?
< oldbeardo>
well, I couldn't find a way of explaining how the two models are working so differently
< naywhayare>
ah, okay, but you agree that they should be working the same?
< oldbeardo>
mathematically, yes, but I'm still not convinced that in practice they always will, mainly because right now they aren't behaving the same
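(For reference, the equivalence being discussed: with two classes, the softmax probability of class 1 collapses to a logistic sigmoid in the difference of the two parameter vectors, so the two models should describe the same decision boundary.)

\[
P(y = 1 \mid x) = \frac{e^{\theta_1^T x}}{e^{\theta_0^T x} + e^{\theta_1^T x}}
                = \frac{1}{1 + e^{-(\theta_1 - \theta_0)^T x}},
\]

so setting \(\theta = \theta_1 - \theta_0\) gives exactly the logistic regression model \(P(y = 1 \mid x) = 1 / (1 + e^{-\theta^T x})\).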
< naywhayare>
okay
< naywhayare>
let me take a look at the code for a moment
< oldbeardo>
sure, it would be better if you could have a look at the tests first, make sure that I'm checking the right things
< naywhayare>
okay, I will do that
< naywhayare>
I remember looking at the tests in the past; I think they are fine
< naywhayare>
but I'll look again
< oldbeardo>
okay, sure
< jenkins-mlpack>
Starting build #2113 for job mlpack - svn checkin test (previous build: STILL UNSTABLE -- last SUCCESS #2109 6 days 21 hr ago)
< naywhayare>
oldbeardo: so I ran SoftmaxRegressionTwoClasses
< naywhayare>
the accuracy is 100%...
< oldbeardo>
yes, that's because I made the test that way
< naywhayare>
oh, I see, the Gaussians are different
< naywhayare>
okay, let me change them a bit and see what happens
< naywhayare>
(when I say the Gaussians are different, I mean with respect to LogisticRegressionTest/LogisticRegressionSGDGaussianTest)
< oldbeardo>
yes, try something like 1.0 1.0 1.0 and 9.0 9.0 9.0
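A minimal sketch of the kind of test data being discussed, assuming the same setup as the existing logistic regression Gaussian tests (the class sizes, identity covariances, and variable names here are illustrative, not the actual test code):

    #include <mlpack/core.hpp>

    using namespace mlpack::distribution;

    int main()
    {
      // Two well-separated 3-dimensional Gaussians, one per class.
      GaussianDistribution g1(arma::vec("1.0 1.0 1.0"), arma::eye<arma::mat>(3, 3));
      GaussianDistribution g2(arma::vec("9.0 9.0 9.0"), arma::eye<arma::mat>(3, 3));

      arma::mat data(3, 1000);
      arma::Row<size_t> labels(1000);
      for (size_t i = 0; i < 500; ++i)
      {
        data.col(i) = g1.Random();        // class 0
        labels[i] = 0;
        data.col(500 + i) = g2.Random();  // class 1
        labels[500 + i] = 1;
      }

      // One would then train LogisticRegression and SoftmaxRegression on
      // (data, labels) and compare their predictions on this set.
    }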
< naywhayare>
oldbeardo: so I took a look at the parameters of the model for softmax regression and logistic regression
< naywhayare>
and I noticed that there's not an intercept term
< naywhayare>
so, for those 3-dimensional Gaussians, the logistic regression model has these parameters (or something close to these):
< naywhayare>
-7.8147 0.5665 0.6082 0.6016
< naywhayare>
where the big -7.8147 term is the intercept term
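Reading those four numbers as [intercept, weights], the fitted logistic regression model is roughly

\[
P(y = 1 \mid x) \approx \frac{1}{1 + \exp\!\big(-(-7.8147 + 0.5665\,x_1 + 0.6082\,x_2 + 0.6016\,x_3)\big)},
\]

which is how the intercept term shifts the decision boundary away from the origin.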
< naywhayare>
for softmax regression on the same data, I got this matrix:
< naywhayare>
-0.0325 0.0325
< naywhayare>
-0.0625 0.0625
< naywhayare>
-0.0471 0.0471
< oldbeardo>
right, that makes sense
< naywhayare>
so I thought, I can add the intercept to the input data for now by inserting a row of ones
< naywhayare>
that ends up being equivalent to adding an intercept to the model
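A minimal sketch of that step, assuming the dataset is an Armadillo matrix with one column per point (the function name is illustrative):

    #include <armadillo>

    // Prepend a row of ones to a (dimensions x points) dataset; the weight the
    // model learns for this row then behaves like an intercept (bias) term.
    void AddInterceptRow(arma::mat& data)
    {
      data.insert_rows(0, arma::ones<arma::rowvec>(data.n_cols));
    }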
< naywhayare>
that gave me this model:
< naywhayare>
0.3610 -0.3610
< naywhayare>
-0.0544 0.0544
< naywhayare>
-0.0766 0.0766
< naywhayare>
-0.0648 0.0648
< naywhayare>
the accuracy is better (96.2%), but it's still wrong, so I'm trying to dig a little bit deeper into the Evaluate() and Gradient() functions
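For reference, the usual softmax regression objective and per-class gradient, which is what Evaluate() and Gradient() are expected to compute (the exact regularization term in the implementation may differ):

\[
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j} 1\{y_i = j\}\,
            \log \frac{e^{\theta_j^T x_i}}{\sum_{l} e^{\theta_l^T x_i}}
          + \frac{\lambda}{2} \sum_{j} \lVert \theta_j \rVert^2,
\qquad
\nabla_{\theta_j} J = -\frac{1}{m} \sum_{i=1}^{m}
            x_i \big( 1\{y_i = j\} - P(y_i = j \mid x_i) \big) + \lambda\,\theta_j.
\]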
< naywhayare>
it's interesting that the theta values for each class cancel out...
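That mirror symmetry is expected: for each point the class probabilities sum to one, so (ignoring regularization) the per-class gradients sum to zero,

\[
\nabla_{\theta_0} J + \nabla_{\theta_1} J
  = -\frac{1}{m} \sum_{i} x_i \big( (1\{y_i = 0\} + 1\{y_i = 1\}) - (P_0 + P_1) \big) = 0,
\]

and assuming a zero (or otherwise symmetric) initialization, the two columns stay exact negatives of each other, \(\theta_0 = -\theta_1\); the corresponding logistic regression weights are the difference \(\theta_1 - \theta_0 = 2\theta_1\).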
< oldbeardo>
nice debugging man! I never thought of that
< naywhayare>
thanks :)
< oldbeardo>
naywhayare: I think I know what the problem is now
< oldbeardo>
the intercept term is treated specially in LogisticRegression, but not in my implementation
< naywhayare>
yeah, but it should end up being equivalent to running the model with no special intercept term and an added row of ones
< oldbeardo>
no, I mean the gradient calculation for the bias term is different than for all the other weights
< oldbeardo>
that's not the case in my implementation
< naywhayare>
in logistic regression, gradient[0] = -arma::accu(responses - sigmoids)
< naywhayare>
if the first row of the data was entirely ones, then this would be equivalent to