ChanServ changed the topic of #mlpack to: "Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or join #mlpack-temp and chat there."
beaver1 has joined #mlpack
beaver1 has quit [Remote host closed the connection]
unreal_ has joined #mlpack
unreal_ is now known as Guest40437
Guest40437 has quit [Remote host closed the connection]
sheep28 has joined #mlpack
sheep28 has quit [Killed (TheMaster (Spam is not permitted on freenode.))]
vivekp has quit [Ping timeout: 272 seconds]
witness has joined #mlpack
philips29 has joined #mlpack
philips29 has quit [Remote host closed the connection]
vivekp has joined #mlpack
MLJens has joined #mlpack
< MLJens> Hello. Regarding my question from yesterday: yes, I checked the dimensions of the labels and the training sets.
< MLJens> For MNIST, I have 28x28 = 784 input neurons.
< MLJens> The digits are from 0 to 9, so I have 10 output neurons.
< MLJens> For faster computation, I limited the number of training examples to 991.
< MLJens> My training set has 784 rows and 991 columns.
< MLJens> My labels have one row and 991 columns.
< MLJens> So I stepped into the function network.train, which raises the exception.
< MLJens> The error occurs in the function 'double Optimize(DecomposableFunctionType& function, arma::mat& iterate)'.
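(For reference, a rough sketch of the kind of setup being described, assuming the mlpack 3.x FFN API; the hidden layer size and fill values are placeholders, not MLJens's actual code:)

    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    using namespace mlpack::ann;

    int main()
    {
      // 784 x 991: one column per training example, one row per pixel.
      arma::mat trainData(784, 991, arma::fill::randn);
      // 1 x 991: one class label per training example.
      arma::mat trainLabels(1, 991, arma::fill::ones);

      FFN<NegativeLogLikelihood<>> network;
      network.Add<Linear<>>(784, 100);
      network.Add<SigmoidLayer<>>();
      network.Add<Linear<>>(100, 10);
      network.Add<LogSoftMax<>>();

      network.Train(trainData, trainLabels);
    }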
< zoq> MLJens: What does the target vector for digit 0 look like?
< MLJens> You mean in the trainset? {0 0 0 0 ... 1.0 1.0 ... 0 0 0 0}^T So white pixels are zero and black pixels are 1.0. I did some preprocessing so that only zeros and ones are part of the training sets.
< MLJens> It's interesting that after the exception is thrown, the trainset has 104802528 rows and 34772056277008 columns.
< zoq> Actually I was talking about the target/labels vector
< zoq> I guess it looks like: [0, .... 0, 1, .... 1, 2, ... 2, ...]?
< MLJens> Correct, it has one row and 991 (number of training examples) columns.
< MLJens> In my post on Stack Overflow, the old code is shown with a vector for the labels, but I have since changed that to a matrix with one row and nExample columns.
< zoq> Instead of starting with 0, can you start with 1 as the label, so 0 becomes 1, 1 becomes 2, and so on?
< MLJens> Here you can find two pictures showing the structure of my training and label sets: https://stackoverflow.com/questions/52382543/performing-mnist-example-with-mlpack
< zoq> looks good
< zoq> Can you print the labels vector and post the result, or part of the result?
jhei21 has joined #mlpack
jhei21 has quit [Remote host closed the connection]
witness has quit [Quit: Connection closed for inactivity]
n000g24 has joined #mlpack
n000g24 has quit [Remote host closed the connection]
akhandait_ has quit [Quit: Connection closed for inactivity]
MLJens has quit [Ping timeout: 252 seconds]
MLJens has joined #mlpack
< MLJens> Hi, here is the labels.raw_print() result:
< MLJens> Label matrix: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2
< MLJens> Oh, that is not complete.
< MLJens> It looks like the vector I posted.
< MLJens> You get 99 x 0, followed by 99 x 1, followed by 99 x 2, and so on.
< MLJens> Ok, I added trainset.raw_print() and labels.raw_print() and redirected the output to a text file.
< MLJens> The text file is now on my Google Drive: https://drive.google.com/open?id=1X2NLt5hAy4XVNDq5nnl_P_-GYdiiqsI9
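(As an aside, an equivalent way to dump the matrices for inspection is Armadillo's save() with the raw_ascii format; a minimal sketch with placeholder matrices:)

    #include <armadillo>

    int main()
    {
      arma::mat trainset(784, 991, arma::fill::randn);
      arma::mat labels(1, 991, arma::fill::zeros);

      // Write both matrices as plain text files for inspection.
      trainset.save("trainset.txt", arma::raw_ascii);
      labels.save("labels.txt", arma::raw_ascii);

      return 0;
    }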
< zoq> MLJens: Can you increase the labels by one?
< zoq> the value of the labels
< MLJens> Thank you so much - that fixed this problem.
< MLJens> Ok, my code is still training - I don't know if everything works now - but the current error has disappeared.
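(A minimal sketch of the fix zoq suggests, assuming the labels are held in an arma::mat with one row; the loss layer in this mlpack version expects class labels 1..numClasses rather than 0..9:)

    #include <armadillo>

    int main()
    {
      // Labels loaded as digits 0-9, one row and 991 columns.
      arma::mat labels(1, 991, arma::fill::zeros);

      // Shift every label up by one: 0 -> 1, 1 -> 2, ..., 9 -> 10.
      labels += 1.0;

      return 0;
    }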
< MLJens> One more question: as the training is taking quite a long time, is there a way to plot the convergence of the solver for each iteration?
< zoq> great, perhaps it's a good idea to decrease the number of training iterations
< zoq> did you build with -DDEBUG=OFF?
< zoq> the optimizer should print the loss to the console
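(A small sketch of one way to surface that output, assuming mlpack's Log::Info stream is what the optimizer writes to; the exact messages depend on the optimizer and mlpack version:)

    #include <mlpack/core.hpp>

    int main()
    {
      // Informational output (including the optimizer's per-iteration
      // objective messages) is silent by default; enable it explicitly.
      mlpack::Log::Info.ignoreInput = false;

      // ... build and Train() the network as before; the printed objective
      // values can then be redirected to a file and plotted.
    }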
< MLJens> Thanks, I will check that. I have just 50 training examples, so I don't want to decrease it any more. In the end, there are 60000 examples :-)
< MLJens> Greetings from Berlin Ostbahnhof - I saw you're located in Berlin too.
< zoq> Thanks :)
< zoq> Next time we can meet up to solve the issue :)
< MLJens> That would be even better, but in the end you helped me a lot.
< MLJens> Now I'm going to let it train, as I have to leave for a concert at Lido. If it hasn't finished by tomorrow, I'll write here again. If it has... I'm sure some more questions will come up, as I want to get deeper into this library.
< zoq> Sounds good, have fun, and see you tomorrow.
< MLJens> Thank you, see you...
MLJens has quit [Quit: Page closed]
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
caiojcarvalho has joined #mlpack
caiojcarvalho has quit [Client Quit]
caiojcarvalho has joined #mlpack
caiojcarvalho has quit [Client Quit]
caiojcarvalho has joined #mlpack
caiojcarvalho has quit [Client Quit]
cjlcarvalho has joined #mlpack
wenhao has joined #mlpack
ImQ009 has joined #mlpack
ImQ009 has quit [Quit: Leaving]
sophiebits has joined #mlpack
sophiebits has quit [Remote host closed the connection]
< rcurtin> zoq: I am looking at the LayerNorm code and trying to understand it... I am thinking that maybe the line 'output.each_row() %= gamma' should be 'output.each_col() %= gamma', where gamma has length equal to input.n_rows (i.e. the dimension of the input)
< rcurtin> from what I can tell of the paper, I think that we are learning one gamma value and one beta value for each dimension of the input
< rcurtin> let me know what you think... I am not a huge expert on layer normalization, this is only based on a quick review of the paper and code :)
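(A small sketch of the distinction being discussed, assuming the input holds one data point per column and gamma holds one scale per input dimension, i.e. input.n_rows elements:)

    #include <armadillo>

    int main()
    {
      arma::mat input(784, 991, arma::fill::randn);  // dimensions x points
      arma::mat output = input;

      // One gamma value per input dimension (one per row of the input).
      arma::colvec gamma(input.n_rows, arma::fill::ones);

      // output.each_row() %= gamma would require gamma to have n_cols
      // elements (one per data point); scaling per dimension instead uses
      // each_col(), which applies gamma element-wise down every column.
      output.each_col() %= gamma;

      return 0;
    }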
wenhao has quit [Ping timeout: 252 seconds]
mdoep26 has joined #mlpack
harding19 has joined #mlpack
mdoep26 has quit [Remote host closed the connection]
miaows14 has joined #mlpack
harding19 has quit [Remote host closed the connection]
miaows14 has quit [Remote host closed the connection]
koenig14 has joined #mlpack
koenig14 has quit [Killed (Sigyn (Spam is off topic on freenode.))]
< rcurtin> yep, that is a nice explanation. I agree with that, and that's the operation we're doing with the lines
< rcurtin> output = input.each_row() - mean;
< rcurtin> output.each_row() /= arma::sqrt(variance + eps);
< rcurtin> but then I don't follow the need for the 'gamma' or 'beta' parameters (or what their shape should be)
< rcurtin> I think that 'gamma' and 'beta' correspond to g and b in Eq.5 of the paper, with the sentence:
< rcurtin> "They also learn an adaptive bias b and gain g for each neuron after the normalization."
< rcurtin> to me that seems to imply that there's one bias and gain parameter for each neuron, and the number of neurons (I think) would be equal to the number of features (input.n_rows)
< rcurtin> I could be wrong---like I said I am not an expert, so correct me if I am wrong :)
< rcurtin> Shikhar's link doesn't seem to mention the bias/gain parameter. In fact I wonder if you get effectively the same thing if you remove bias/gain from LayerNorm, and then create a network with a LayerNorm layer followed immediately by a Linear layer
< rcurtin> (this would mean there are no parameters to be learned in the LayerNorm layer, I suppose)
< rcurtin> forgive me if what I wrote makes no sense please :)
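(A hedged sketch of the forward pass as read above from Eq. 5 of the paper, with one gain and one bias per input dimension; this illustrates the interpretation being proposed, not necessarily the current LayerNorm implementation:)

    #include <armadillo>

    int main()
    {
      const double eps = 1e-8;
      arma::mat input(784, 32, arma::fill::randn);   // dimensions x points

      // Per-column statistics: each data point is normalized over its own
      // dimensions (rows).
      arma::rowvec mean = arma::mean(input, 0);
      arma::rowvec variance = arma::var(input, 1, 0);

      arma::mat output = input.each_row() - mean;
      output.each_row() /= arma::sqrt(variance + eps);

      // One adaptive gain g and bias b per neuron/dimension
      // (input.n_rows of each), applied after normalization.
      arma::colvec gamma(input.n_rows, arma::fill::ones);
      arma::colvec beta(input.n_rows, arma::fill::zeros);

      output.each_col() %= gamma;
      output.each_col() += beta;

      return 0;
    }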