verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< rcurtin>
rsv: is the gradient undefined when all dimensions in x are 0, or when any dimension in x is 0?
< rsv>
i think any dimension in x (these are the components of the parameters vector, right?)
< rcurtin>
yeah
< rcurtin>
okay
< rcurtin>
I was gonna say, if it only happens when all dimensions in x are zero, that probably happens so little that you don't need to worry about it
< rcurtin>
I don't have a great answer; I've never considered this particular problem
< rcurtin>
a quick search reveals that one technique people use is subgradient descent, but the abstractions mlpack has in place won't work for that
< rcurtin>
the algorithm you proposed in the slides could definitely work, but you'd have to do a decent amount of refactoring to make that work (and you'd probably have to throw away the Evaluate()/Gradient() functions in LogisticRegressionFunction and just implement the algorithm by hand)
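[Editor's note: a minimal sketch of what a hand-rolled subgradient step might look like, using Armadillo as mlpack does. The function name, arguments, and update below are illustrative assumptions, not mlpack API and not the algorithm from the slides.]

    #include <armadillo>

    // One subgradient-descent step for L1-regularized logistic regression,
    // done by hand instead of through Evaluate()/Gradient().
    //   data:   d x n matrix of points (one point per column)
    //   labels: n-element row vector of 0/1 labels
    //   theta:  d-element parameter vector (updated in place)
    void SubgradientStep(const arma::mat& data,
                         const arma::rowvec& labels,
                         arma::vec& theta,
                         const double lambda,
                         const double stepSize)
    {
      // Predicted probabilities p_i = sigmoid(theta^T x_i) for every point.
      const arma::rowvec expz = arma::exp(theta.t() * data);
      const arma::rowvec p = expz / (1.0 + expz);

      // Gradient of the unregularized negative log-likelihood.
      const arma::vec gradLoss = data * (p - labels).t();

      // A valid subgradient of lambda * ||theta||_1 is lambda * sign(theta_j),
      // where any value in [-lambda, lambda] is allowed at theta_j = 0;
      // arma::sign() picks 0 there.
      theta -= stepSize * (gradLoss + lambda * arma::sign(theta));
    }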
< rsv>
what do you mean by refactoring?
< rsv>
i didn't realize that L1 regularization would be so much more complicated than L2 (which is already implemented in mlpack)
< rcurtin>
yeah, so the issue is that L2-regularized regression is differentiable, which means you can use standard optimizers like SGD and L-BFGS
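[Editor's note: for reference, the two objectives being contrasted; \ell(\theta) denotes the log-likelihood and \lambda the regularization strength (notation assumed, not from the log).]

    L2:  J(\theta) = -\ell(\theta) + (\lambda / 2) \|\theta\|_2^2    (smooth everywhere)
    L1:  J(\theta) = -\ell(\theta) + \lambda \|\theta\|_1            (|\theta_j| is not differentiable at \theta_j = 0)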
< rcurtin>
but as you've pointed out this is not the case for L1-regularized regression (I hadn't realized this until you pointed it out)
< rcurtin>
this means that you can't use mlpack's SGD or L-BFGS implementations, and instead you'll have to use something like the algorithm suggested in the slides
< rcurtin>
(or, you could just declare that the derivative at x = 0 is 0... but I don't know how that will affect the algorithm)
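[Editor's note: declaring the derivative at x = 0 to be 0 amounts to picking one particular subgradient of |x|; Armadillo's sign() already uses that convention, as this small illustrative example shows.]

    #include <armadillo>

    int main()
    {
      // arma::sign() maps negative entries to -1, positive entries to +1,
      // and zero entries to 0, which is exactly the "derivative at x = 0
      // is 0" convention: one particular choice of subgradient for |x|.
      arma::vec theta = { -2.5, 0.0, 1.3 };
      arma::vec sub = arma::sign(theta);
      sub.print("chosen subgradient of ||theta||_1:");
      return 0;
    }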
< rsv>
right, okay
< rsv>
i'll have to think about how much this matters for the x=0 case...