ChanServ changed the topic of #mlpack to: Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or also you could join #mlpack-temp and chat there.
beaver1 has joined #mlpack
beaver1 has quit [Remote host closed the connection]
unreal_ has joined #mlpack
unreal_ is now known as Guest40437
Guest40437 has quit [Remote host closed the connection]
sheep28 has joined #mlpack
sheep28 has quit [Killed (TheMaster (Spam is not permitted on freenode.))]
vivekp has quit [Ping timeout: 272 seconds]
witness has joined #mlpack
philips29 has joined #mlpack
philips29 has quit [Remote host closed the connection]
vivekp has joined #mlpack
MLJens has joined #mlpack
< MLJens> Hello. Regarding my request from yesterday: yes, I checked the dimensions of the labels and the training sets.
< MLJens> For MNIST, I have 28x28 = 784 input neurons.
< MLJens> The digits range from 0 to 9, so I have 10 output neurons.
< MLJens> For faster computation, I limited the number of training examples to 991.
< MLJens> My training set has 784 rows and 991 columns.
< MLJens> My labels have one row and 991 columns.
< MLJens> So I stepped into the function network.Train(), which raises the exception.
< MLJens> The error occurs in the function 'double Optimize(DecomposableFunctionType& function, arma::mat& iterate)'.
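(A minimal sketch of the setup described above, assuming mlpack's FFN class with a NegativeLogLikelihood output layer; the file names, the hidden layer size of 100, and the choice of activation are illustrative assumptions, not MLJens' exact code.)

    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    using namespace mlpack::ann;

    int main()
    {
      // 784 rows (pixels) x 991 columns (training examples).
      arma::mat trainData;
      mlpack::data::Load("mnist_train.csv", trainData, true);

      // 1 row x 991 columns of digit labels.
      arma::mat trainLabels;
      mlpack::data::Load("mnist_labels.csv", trainLabels, true);

      FFN<NegativeLogLikelihood<>> network;
      network.Add<Linear<>>(784, 100);  // 784 input neurons -> hidden layer
      network.Add<SigmoidLayer<>>();
      network.Add<Linear<>>(100, 10);   // hidden layer -> 10 output neurons
      network.Add<LogSoftMax<>>();

      // This is the call that raised the exception described above.
      network.Train(trainData, trainLabels);
    }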
< zoq> MLJens: What does the target vector for digit 0 look like?
< MLJens> You mean in the training set? {0 0 0 0 ... 1.0 1.0 ... 0 0 0 0}^T So white pixels are zero and black pixels are 1.0. I did some preprocessing so that only zeros and ones appear in the training set.
< MLJens> Interestingly, after the exception is thrown, the training set has 104802528 rows and 34772056277008 columns.
< zoq> Actually I was talking about the target/labels vector
< zoq> I guess it looks like: [0, .... 0, 1, .... 1, 2, ... 2, ...]?
< MLJens> Correct, it has one row and 991 (number of training examples) columns.
< MLJens> In my post on Stack Overflow, the old code used a vector for the labels, but I have since changed that to a matrix with one row and nExample columns.
< zoq> Instead of starting with 0, can you start the labels at 1, so 0 becomes 1, 1 becomes 2, and so on?
< MLJens> Here you can find two pictures showing the structure of my training set and label set: https://stackoverflow.com/questions/52382543/performing-mnist-example-with-mlpack
< zoq> looks good
< zoq> can you print the labels vector and post the result or part of the result
jhei21 has joined #mlpack
jhei21 has quit [Remote host closed the connection]
witness has quit [Quit: Connection closed for inactivity]
n000g24 has joined #mlpack
n000g24 has quit [Remote host closed the connection]
akhandait_ has quit [Quit: Connection closed for inactivity]
MLJens has quit [Ping timeout: 252 seconds]
MLJens has joined #mlpack
< MLJens> Hi, here is the labels.raw_print() result:
< MLJens> Label matrix: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2
< MLJens> Oh, that is not complete.
< MLJens> It looks like the vector I posted.
< MLJens> There are 99 zeros, followed by 99 ones, followed by 99 twos, and so on.
< MLJens> OK, I added trainset.raw_print() and labels.raw_print() and redirected the output to a text file.
< MLJens> The text file is now on my Google Drive: https://drive.google.com/open?id=1X2NLt5hAy4XVNDq5nnl_P_-GYdiiqsI9
< zoq> MLJens: Can you increase the labels by one?
< zoq> the value of the labels
< MLJens> Thank you so much - that fixed the problem.
< MLJens> OK, my code is still training - I don't know if everything works now, but the current error has disappeared.
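(For reference, a minimal sketch of the fix zoq suggested, assuming the labels are held in an arma::mat row vector; at the time of this log, the NegativeLogLikelihood output layer expected class labels starting at 1 rather than 0.)

    // Shift the digit labels from 0..9 to 1..10 before calling Train(),
    // since the output layer indexes classes starting at 1.
    arma::mat trainLabels;        // 1 x 991 row vector of digits 0..9
    // ... load the labels as before ...
    trainLabels += 1.0;           // labels are now 1..10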
< MLJens> One more question: as the training is taking quite a long time, is there a way to plot the convergence of the solver for each iteration?
< zoq> great, perhaps it's a good idea to decrease the number of training iterations
< zoq> did you build with -DDEBUG=OFF?
< zoq> the optimizer should print the loss to the console
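(A small sketch of one way to surface the loss output zoq mentions, assuming mlpack's Log::Info stream; the optimizer writes its progress there, but the stream is silent unless it is enabled in the program.)

    #include <mlpack/core.hpp>

    int main()
    {
      // Enable informational output so the optimizer's objective values are
      // printed to the console during training.
      mlpack::Log::Info.ignoreInput = false;

      // ... build and train the network as before; the optimizer will now
      // log its current objective as training progresses ...
    }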
< MLJens> Thanks, I will check that. I have only 50 training examples, so I don't want to decrease it any further. In the end, there will be 60000 examples :-)
< MLJens> Greetings from Berlin Ostbahnhof - I saw that you're located in Berlin too.
< zoq> Thanks :)
< zoq> Next time we can meet up to solve the issue :)
< MLJens> That would be better, but in the end you helped me a lot.
< MLJens> Now I'm going to let it train, as I have to leave for a concert at Lido. If it hasn't finished by tomorrow, I'll write here again. If it has... I'm sure some more questions will come up, as I want to dig deeper into this library.
< zoq> Sounds good, have fun and see you tomorrow.
< MLJens> Thank you, see you...
MLJens has quit [Quit: Page closed]
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
caiojcarvalho has joined #mlpack
caiojcarvalho has quit [Client Quit]
caiojcarvalho has joined #mlpack
caiojcarvalho has quit [Client Quit]
caiojcarvalho has joined #mlpack
caiojcarvalho has quit [Client Quit]
cjlcarvalho has joined #mlpack
wenhao has joined #mlpack
ImQ009 has joined #mlpack
ImQ009 has quit [Quit: Leaving]
sophiebits has joined #mlpack
sophiebits has quit [Remote host closed the connection]
< rcurtin> zoq: I am looking at the LayerNorm code and trying to understand it... I am thinking that maybe the line 'output.each_row() %= gamma' should be 'output.each_col() %= gamma', where gamma has length equal to input.n_rows (i.e. the dimension of the input)
< rcurtin> from what I can tell of the paper, I think that we are learning one gamma value and one beta value for each dimension of the input
< rcurtin> let me know what you think... I am not a huge expert on layer normalization, this is only based on a quick review of the paper and code :)
wenhao has quit [Ping timeout: 252 seconds]
mdoep26 has joined #mlpack
harding19 has joined #mlpack
mdoep26 has quit [Remote host closed the connection]
miaows14 has joined #mlpack
harding19 has quit [Remote host closed the connection]
miaows14 has quit [Remote host closed the connection]
koenig14 has joined #mlpack
koenig14 has quit [Killed (Sigyn (Spam is off topic on freenode.))]
< rcurtin> yep, that is a nice explanation. I agree with that, and that's the operation we're doing with the lines
< rcurtin> output = input.each_row() - mean;
< rcurtin> output.each_row() /= arma::sqrt(variance + eps);
< rcurtin> but then I don't follow the need for the 'gamma' or 'beta' parameters (or what their shape should be)
< rcurtin> I think that 'gamma' and 'beta' correspond to g and b in Eq.5 of the paper, with the sentence:
< rcurtin> "They also learn an adaptive bias b and gain g for each neuron after the normalization."
< rcurtin> to me that seems to imply that there's one bias and one gain parameter for each neuron, and the number of neurons (I think) would be equal to the number of features (input.n_rows)
< rcurtin> I could be wrong---like I said I am not an expert, so correct me if I am wrong :)
< rcurtin> Shikhar's link doesn't seem to mention the bias/gain parameter. In fact I wonder if you get effectively the same thing if you remove bias/gain from LayerNorm, and then create a network with a LayerNorm layer followed immediately by a Linear layer
< rcurtin> (this would mean there are no parameters to be learned in the LayerNorm layer, I suppose)
< rcurtin> forgive me if what I wrote makes no sense please :)
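(To make the shape question concrete, here is a small Armadillo sketch of the per-dimension gain and bias rcurtin is describing: gamma and beta as column vectors of length input.n_rows, applied with each_col after the each_row normalization quoted above. This illustrates the proposed fix, not the LayerNorm code as it stood.)

    #include <armadillo>

    int main()
    {
      arma::mat input(784, 32, arma::fill::randn);  // features x batch
      const double eps = 1e-8;

      // Per-sample statistics, computed over the feature dimension.
      arma::rowvec mean = arma::mean(input, 0);
      arma::rowvec variance = arma::var(input, 1, 0);

      // Normalization as in the two lines quoted above.
      arma::mat output = input.each_row() - mean;
      output.each_row() /= arma::sqrt(variance + eps);

      // One gain and one bias per input dimension (g and b in Eq. 5 of the
      // paper, as read above), so gamma and beta have length input.n_rows
      // and are applied down each column.
      arma::vec gamma(input.n_rows, arma::fill::ones);
      arma::vec beta(input.n_rows, arma::fill::zeros);

      output.each_col() %= gamma;
      output.each_col() += beta;

      return 0;
    }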