verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
kris1 has left #mlpack []
kris1 has joined #mlpack
kris1 has left #mlpack []
kris1 has joined #mlpack
kris1 has quit [Client Quit]
vivekp has quit [Ping timeout: 268 seconds]
vivekp has joined #mlpack
mikeling is now known as mikeling|brb
sumedhghaisas has quit [Ping timeout: 260 seconds]
mikeling|brb is now known as mikeling
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
micyril has joined #mlpack
vivekp has quit [Ping timeout: 255 seconds]
vivekp has joined #mlpack
partobs-mdp has joined #mlpack
andrzejk_ has joined #mlpack
micyril has quit [Quit: Page closed]
andrzejk_ has quit [Quit: Textual IRC Client: www.textualapp.com]
kris1 has joined #mlpack
kris1 has quit [Quit: kris1]
shikhar has joined #mlpack
kris1 has joined #mlpack
kris1 has quit [Quit: kris1]
govg_ has joined #mlpack
govg_ has quit [Quit: leaving]
kris1 has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
< partobs-mdp> zoq: I've read your comment on the task API pull request. I found a bug in the task instance generator: it was emitting numbers starting from the *most* significant bit, while the HAM paper reports results with the representation starting from the *least* significant bit.
vivekp has quit [Ping timeout: 240 seconds]
< partobs-mdp> zoq: So, in essence, this became an Add+Reverse task. As the HAM paper reports, vanilla LSTM _miserably_ failed the Reverse task.
< partobs-mdp> zoq: If this matters, the paper uses considerably more data for training - 1000 batches * 50 samples each
< partobs-mdp> zoq: Oh, and another thing: they used x-entropy loss, not MSE loss.
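(For context, a small sketch of what least-significant-bit-first emission means; the helper name and Armadillo types here are just illustrative, not code from the task generator:)

    #include <armadillo>

    // Encode a nonnegative integer as a fixed number of bits, least significant
    // bit first, e.g. 6 -> {0, 1, 1} rather than the MSB-first {1, 1, 0}.
    arma::colvec BinaryLSBFirst(size_t value, const size_t bits)
    {
      arma::colvec code(bits, arma::fill::zeros);
      for (size_t i = 0; i < bits; ++i, value >>= 1)
        code(i) = value & 1;
      return code;
    }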
< zoq> partobs-mdp: Besides starting from the *least* significant bit, I'm not sure they used a binary encoding for the special symbols (+,=) as you proposed.
< zoq> Also, I think using more data for training especially for longer sequences could improve the results, but that is easy to test.
< zoq> Perhaps a good idea to switch to cross entropy, yes.
< partobs-mdp> zoq: In progress - I'm implementing a CrossEntropyError layer right now.
< partobs-mdp> zoq: The linking is still running. In the meantime, could you take a glance at the current implementations and tell me if they look right to you?
< partobs-mdp> zoq: The error evaluation: return -arma::sum(target * arma::log(input) + (1. - target) * arma::log(1. - input));
< partobs-mdp> zoq: The gradient evaluation: output = (1. - target) / (1. - input) - target / input;
vivekp has joined #mlpack
< zoq> partobs-mdp: you can also write the gradient more compactly as (input - target) / ((1.0 - input) * input)
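(For context, a minimal sketch of such a loss layer, using element-wise products (%) and divisions; the class name and signatures are illustrative rather than the exact mlpack layer API:)

    #include <armadillo>

    // Illustrative binary cross-entropy loss; input and target are assumed to
    // hold values in (0, 1).
    class CrossEntropyError
    {
     public:
      // Forward pass: the summed cross-entropy of the predictions.
      double Forward(const arma::mat& input, const arma::mat& target)
      {
        return -arma::accu(target % arma::log(input) +
            (1.0 - target) % arma::log(1.0 - input));
      }

      // Backward pass: derivative of the loss with respect to the input, which
      // simplifies to (input - target) / (input % (1 - input)).
      void Backward(const arma::mat& input, const arma::mat& target,
                    arma::mat& output)
      {
        output = (1.0 - target) / (1.0 - input) - target / input;
      }
    };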
< partobs-mdp> zoq: Keep getting this message while trying to build mlpack/models fork:
< partobs-mdp> zoq: error: ‘CrossEntropyError’ was not declared in this scope
< partobs-mdp> No idea what happened - the declaration was more or less copy-pasted from MeanSquaredError
< partobs-mdp> Of course, I added the new files to CMakeLists.txt in the layer/ directory
< zoq> Have you added the new layer in layer_types.hpp and layer.hpp?
< zoq> ah just layer_types.hpp
< partobs-mdp> zoq: Oops :) I didn't add it there, so I'll try that and report the results in a few minutes
< zoq> okay
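(For reference, registering a new layer roughly means forward-declaring it in layer_types.hpp and adding its pointer type to the LayerTypes variant; this excerpt is illustrative and heavily abbreviated:)

    // layer_types.hpp (abbreviated): forward declaration of the new layer ...
    template<typename InputDataType, typename OutputDataType>
    class CrossEntropyError;

    // ... and its entry in the variant so the layer visitors can dispatch to it.
    using LayerTypes = boost::variant<
        Add<arma::mat, arma::mat>*,
        Linear<arma::mat, arma::mat>*,
        /* ... all other layers ... */
        CrossEntropyError<arma::mat, arma::mat>*
    >;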
< kris1> The Reset function in FFN now just initializes the parameters of the network using the initialization rule. It does not reset the individual layers.
< zoq> kris1: The NetworkInitialization class that i used for the parameter initialization calls the Reset function: https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/init_rules/network_init.hpp#L94
< partobs-mdp> zoq: Implemented the CrossEntropyError layer. Getting slightly worse performance than with MeanSquaredError. The problem I noticed is that the objective sometimes "explodes", like this: 6.6 -> 6.0 -> 12.6 -> >500
< partobs-mdp> zoq: Clipping the gradient to the range [-5, +5] only makes the situation worse (at 125 epochs) - it gets only 20% vs. ~85% at maxLen = 10
< partobs-mdp> zoq: Right now testing on 500 epochs
< zoq> partobs-mdp: which task?
< partobs-mdp> zoq: CopyTask
< zoq> partobs-mdp: hm, maybe you can test another initialization method, like Gaussian
< partobs-mdp> zoq: Can you outline how to change the initialization method?
< partobs-mdp> zoq: The problem meanwhile is getting worse - even though the model gets really low objective values (~0.3), it then explodes to completely insane values (~2k, 4x the objective before optimization)
< zoq> partobs-mdp: instead of doing RNN<> you can use: RNN<NegativeLogLikelihood<>, GaussianInitialization>
< zoq> Another idea is to change the way you train - I made a comment on that in the models repo; right now you train a single sample x times in a row.
< zoq> I think an easy solution would be to provide an interface to train on arma::field or to train on arma::mat.
< partobs-mdp> zoq: But right now the code uses fixed-length arma::mat objects for training. I also use optimizer parameters to set the epoch count. Shouldn't it already train properly?
< partobs-mdp> zoq: error: ‘GaussianInitialization’ was not declared in this scope
< zoq> partobs-mdp: You have to include gaussian_init.hpp from init_rules.
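(A minimal usage sketch under those suggestions; the layer sizes and rho below are placeholders, not values from the actual task code:)

    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>
    #include <mlpack/methods/ann/rnn.hpp>
    #include <mlpack/methods/ann/init_rules/gaussian_init.hpp>

    using namespace mlpack::ann;

    int main()
    {
      const size_t inputSize = 2, hiddenSize = 40, outputSize = 2, rho = 10;

      // Recurrent model with Gaussian weight initialization instead of the
      // default initialization rule.
      RNN<NegativeLogLikelihood<>, GaussianInitialization> model(rho);
      model.Add<Linear<> >(inputSize, hiddenSize);
      model.Add<LSTM<> >(hiddenSize, hiddenSize, rho);
      model.Add<Linear<> >(hiddenSize, outputSize);
      model.Add<SigmoidLayer<> >();
    }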
< zoq> partobs-mdp: You already use fixed length arma::mat for training? let me check the code
< zoq> partobs-mdp: Ah I see, in this case nevermind, my bad.
< partobs-mdp> zoq: Managed to compile that, right now evaluating the new model
< zoq> okay, let's see if that improves the results
< partobs-mdp> zoq: so far nothing explodes - waiting for optimizer to get into <1 objective zone :)
< partobs-mdp> zoq: 25k iterations, the explosions are back :( Bumped from steady ~16 to >500
< zoq> you already use gradient clipping right?
< partobs-mdp> yes, I clip to [-5, +5]
< partobs-mdp> maybe I should clip to something smaller - or try to tune the LR
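(The clipping being discussed amounts to something like the following; ClipGradient is a hypothetical helper name, and where it gets called from is exactly the "dirty" part mentioned further below:)

    #include <armadillo>

    // Clamp every gradient entry to [-limit, +limit] before the parameter
    // update; limit = 5 corresponds to the [-5, +5] range above.
    inline void ClipGradient(arma::mat& gradient, const double limit)
    {
      gradient = arma::clamp(gradient, -limit, limit);
    }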
< zoq> hm, yeah, I was thinking of switching Adam for MiniBatchSGD.
< partobs-mdp> [INFO ] RNN::RNN(): final objective of trained model is 0.490519.
< partobs-mdp> [INFO ] Final score: 0.304688
< partobs-mdp> Overfit?
< partobs-mdp> If so, what if I add dropout to my model?
< zoq> Worth a test, at the end of the network?
< partobs-mdp> zoq: Or even after each ReLU?
< partobs-mdp> *after each nonlinearity?
< zoq> Maybe you can also increase the sample size?
< rcurtin> I am watching quietly, partobs-mdp: do you mean that even with the [-5,+5] gradient clipping that the objective explodes then re-converges to 0.490519?
< zoq> hm, yeah we have to be careful with the dropout rate.
< partobs-mdp> rcurtin: yes, it explodes all the way up to ~2k xent-loss, and then goes down to 0.5 (even 2-3 time)
< partobs-mdp> *times
< zoq> partobs-mdp: Maybe you can push the cross entropy layer? Maybe you missed something there?
< rcurtin> yeah, that is what I was thinking
< partobs-mdp> zoq: Sure, but it's only fair to warn you: I added the gradient clipping *there* (I know, it's horribly dirty, but we can fix that once we get something working)
< partobs-mdp> clipping to [-2, +2] seems hopeless
< partobs-mdp> Pushed. Added two dropout layers - it kind of optimizes the objective, but it's doing it *real* slow.
< partobs-mdp> Maybe learning rate = 1e-3 is too small here?
< partobs-mdp> By the way, how do I set dropout rate from model.Add()?
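(For reference: the Dropout layer's constructor takes the drop ratio, which Add() forwards, so something like the line below should do it; 0.3 is just a placeholder value:)

    // Dropout layer that zeroes roughly 30% of its inputs during training.
    model.Add<Dropout<> >(0.3);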
< rcurtin> partobs-mdp: if I am understanding this right, there have to be some restrictions on the input and target for the cross entropy layer---the input/target values must be in [0, 1], right?
< rcurtin> my best guess is that if this is happening in a later layer of the network, then 'input' is getting closer and closer to either 0 or 1 and this causes instability in the Backward() calculation
< rcurtin> that would also cause instability in Forward() I guess, because log(0) = -Inf
< zoq> yeah, maybe you could use: trunc_log
< zoq> I guess target / input is also not safe, since input could be zero? so target / (input + eps)?
< rcurtin> I would have thought that clipping to small values would solve this type of problem though---I think that adding epsilon to the denominator effectively does the same
< rcurtin> or, perhaps, is the clipping just causing the objective to explode more slowly? (in which case it would be working correctly, I guess)
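(A sketch of the guarded variants being suggested: trunc_log() avoids -inf for zero arguments, and a small eps keeps the gradient's denominators away from zero. The value of eps is arbitrary here:)

    #include <armadillo>

    const double eps = 1e-10;

    // In Forward(), arma::log(...) becomes arma::trunc_log(...); in Backward(),
    // the denominators get a small offset so they can never reach zero:
    void Backward(const arma::mat& input, const arma::mat& target,
                  arma::mat& output)
    {
      output = (1.0 - target) / (1.0 - input + eps) - target / (input + eps);
    }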
kris1 has quit [Quit: kris1]
< partobs-mdp> zoq: rcurtin: Added 1e-2 to denominator in the gradient computation, used trunc_log, still getting explosions
< rcurtin> hmm, can you try intercepting when these explosions happen and inspecting what exactly the cause of the explosion is?
< rcurtin> I guess you could do this with a debugger like gdb and catch when the objective gets very large, or when the gradient gets very large, or something like this
< rcurtin> I think the first thing to do is track down exactly what is causing the gradient to explode (unfortunately that could be very time-consuming); maybe zoq has a better idea?
kris1 has joined #mlpack
< zoq> I agree, it's time-consuming, but in the end we get some insights that could be helpful
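(One low-tech way to catch the moment things blow up, as a sketch: a hypothetical helper that flags suspicious objective or gradient values, which a breakpoint can then be set on. The threshold is arbitrary:)

    #include <cmath>
    #include <mlpack/core.hpp>

    // Returns true (and logs a warning) when the objective or gradient looks
    // like it is exploding, so the offending iteration can be inspected in gdb.
    inline bool Exploded(const double objective, const arma::mat& gradient,
                         const double threshold = 1e3)
    {
      const bool bad = !std::isfinite(objective) ||
          std::abs(objective) > threshold ||
          !gradient.is_finite() ||
          arma::abs(gradient).max() > threshold;

      if (bad)
        mlpack::Log::Warn << "objective/gradient exceeded " << threshold << "\n";

      return bad;
    }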
kris1_ has joined #mlpack
kris1 has quit [Ping timeout: 260 seconds]
kris1_ is now known as kris1
< shikhar> rcurtin: A quick question regarding templatizing the gradient parameter for parallel SGD. I found that logistic regression, regularized SVD, NCA, RNN and FFN implement the DecomposableFunctionType interface in their Gradient function.
< shikhar> Should I go about changing them all? I don't see any issues otherwise.
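(The change being discussed is roughly this kind of signature generalization, sketched for one DecomposableFunctionType; whether arma::sp_mat or something else becomes the concrete sparse type is up to the parallel SGD design:)

    // Current decomposable form: the i-th partial gradient is always dense.
    void Gradient(const arma::mat& parameters,
                  const size_t i,
                  arma::mat& gradient);

    // Templatized form: the caller picks the gradient type, so parallel SGD can
    // request a sparse gradient (e.g. arma::sp_mat) while dense callers keep
    // passing arma::mat.
    template<typename GradType>
    void Gradient(const arma::mat& parameters,
                  const size_t i,
                  GradType& gradient);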
kris1 has quit [Ping timeout: 260 seconds]
kris1 has joined #mlpack
partobs-mdp has quit [Remote host closed the connection]
kris1 has quit [Ping timeout: 260 seconds]
mikeling has quit [Quit: Connection closed for inactivity]
shikhar has quit [Quit: WeeChat 1.7]
kris1 has joined #mlpack
< kris1> FFN2 -> FFN1: I want to pass the delta and gradient from FFN1 to FFN2. So I would basically have to do ffn2.outlayertype.Delta() = ffn1.network().front().Delta(), but I am confused about how I would pass the gradient values?
< zoq> kris1: The same way should work for the Gradient.
sumedhghaisas has joined #mlpack
< kris1> I have a hard time understanding this. So if I set the gradients of the output layer of ffn2 to the gradient from ffn1's front layer and then call the ffn2.Gradient() function, will this equal backpropagation through the whole combined network ffn2 -> ffn1?
< kris1> This is a very rough implementation of the GAN idea I am talking about; have a look if possible: https://gist.github.com/kris-singh/fb455fa809634bc8f1afc2872407352d
hello_ has joined #mlpack
hello_ has quit [Client Quit]
< zoq> kris1: So if I understand you right, you would like to share the delta and gradient parameters?
< kris1> Well, yes, kind of. I want to pass the gradients and delta from ffn1 to ffn2, so this is not exactly sharing. I want to backpropagate through both FFN1 and FFN2 combined.
< zoq> But if you just want to backpropagate through FFN1 and afterwards through FFN2 using the error (delta) from FFN1, why do you need the gradients? Maybe I missed something?
< kris1> Hmm, yes, sorry. I just need to pass the errors, and those would be multiplied by the local gradients.
< kris1> Could you have a look at the gist https://gist.github.com/kris-singh/fb455fa809634bc8f1afc2872407352d. Just wanted to get your thoughts on it.
< zoq> So yes, something like generator.outputlayer.Delta() = discriminator.network.front().Delta(); should work. In generator.Gradient() we have to make sure that this delta is used, but besides that it looks good.
< zoq> ah you call the Gradient function from the FFN class right?
< zoq> In this case, it should work right away.
< kris1> Yes the Gradient function from the FFN class.
< kris1> Okay thanks i will test it out then on a simple task.
< zoq> Okay, let us know if you run into any problems.
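(A pseudocode-level sketch of the chaining agreed on above; every accessor and call name here is hypothetical shorthand for however the members are actually reached, e.g. via the ann visitors, not the exact FFN API:)

    // 1) Forward a generated sample through both networks (hypothetical calls).
    generator.Forward(noise, fakeSample);
    discriminator.Forward(fakeSample, prediction);

    // 2) Backward through the discriminator; its front layer's delta is now the
    //    error with respect to the generator's output.
    discriminator.Backward();

    // 3) Hand that delta to the generator as its output error, then compute the
    //    generator's gradients. Together this amounts to backpropagating through
    //    the combined discriminator -> generator stack.
    generator.OutputLayerDelta() = discriminator.FrontLayerDelta();
    generator.Gradient();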
andrzejk_ has joined #mlpack
shikhar has joined #mlpack
andrzejk_ has quit [Quit: Textual IRC Client: www.textualapp.com]
shikhar has quit [Quit: WeeChat 1.7]