ChanServ changed the topic of #mlpack to: "Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or you could also join #mlpack-temp and chat there."
< rcurtin> also, I believe (but am not 100% sure) that I will at least be in California during the mentor summit
< rcurtin> so I will be sure to at least drop by the area :)
wiking has quit [Quit: ZNC 1.7.1 - https://znc.in]
wiking has joined #mlpack
ricklly has quit [Ping timeout: 252 seconds]
< ShikharJ> rcurtin: As far as I understand, a neuron in this context is nothing but a simple activation (computed by matrix operations). Also, if you read the abstract, it specifically mentions that the gain and bias parameters follow from the BatchNorm technique. And equation 4 states that these vectors have the same dimensions as the mean. So we need to have just a single vector
< ShikharJ> for a single Forward routine.
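For reference, equation 4 of the Layer Normalization paper (Ba et al., 2016) that both are referring to is, as far as I recall, roughly

    h^t = f[ (g / \sigma^t) \odot (a^t - \mu^t) + b ]

where \mu^t and \sigma^t are the mean and standard deviation of the summed inputs a^t, and g and b are the gain and bias; worth double-checking against the paper itself.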
robertohueso has joined #mlpack
< rcurtin> ShikharJ: right, so to me this would imply that g and b should be vectors of length input.n_rows, and that we should be using each_col() when g and b are multiplied and added instead of each_row()
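A minimal Armadillo sketch of the shapes rcurtin is suggesting here (illustrative names, not the actual mlpack LayerNorm members):

    // input is (n_rows x n_cols): one column per point in the batch.
    arma::mat input = arma::randu<arma::mat>(5, 3);
    // One gain and one bias entry per dimension, i.e. length input.n_rows.
    arma::vec g = arma::ones<arma::vec>(input.n_rows);
    arma::vec b = arma::zeros<arma::vec>(input.n_rows);
    // Multiply the gain into every column, then add the bias to every column.
    input.each_col() %= g;
    input.each_col() += b;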
< zoq> rcurtin: That sounds correct to me; I'll incorporate that into the open PR. Shikhar, what do you think?
< zoq> rcurtin: If you are in California, let me know :)
< rcurtin> definitely. I will probably order the plane tickets today or tomorrow; just need to double-check
< zoq> another race?
robertohueso has quit [Quit: Leaving.]
< rcurtin> no, actually this would be to travel out to the Berkeley office of my new company
< rcurtin> but I am pretty sure I would find some racing to do while I was out there :)
ImQ009 has joined #mlpack
< zoq> I see; it would be great if it works out, but it sounds like it will
robertohueso has joined #mlpack
robertohueso has quit [Quit: Leaving.]
< ShikharJ> rcurtin: No, quite the opposite. In LayerNorm, we have the mean vector as 1 x n_cols, so the g and b vectors should also be of the same shape. If you look carefully at equation 4, you'll see that we're doing an element-wise multiplication with g and (x - mu), which wouldn't be valid if you make the shape of the g vector n_rows instead of n_cols.
< ShikharJ> rcurtin: I mean you'll have to do element-wise multiplication with each_row().
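For contrast, a sketch of the shapes ShikharJ is describing (again with illustrative names), where g and b match the 1 x n_cols mean vector and are applied with each_row():

    // input is (n_rows x n_cols); the mean is taken over each column (each point).
    arma::mat input = arma::randu<arma::mat>(5, 3);
    arma::rowvec mu = arma::mean(input, 0);  // 1 x n_cols
    // Under this reading, g and b would also be 1 x n_cols, one entry per point.
    arma::rowvec g = arma::ones<arma::rowvec>(input.n_cols);
    arma::rowvec b = arma::zeros<arma::rowvec>(input.n_cols);
    input.each_row() %= g;
    input.each_row() += b;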
< rcurtin> (hang on, I'm in a meeting. I'll respond when I have a chance, possibly a few hours)
robertohueso has joined #mlpack
< rcurtin> ShikharJ: it seems to me like equation 4 in the paper is assuming that the input a^t is one single point with dimension (n_rows x 1)---so, that is, the paper assumes a batch size of 1
< rcurtin> thus the output h^t should have size (n_rows x 1) also
< rcurtin> \mu^t is a scalar, since it's the mean of all elements in a^t; so is \sigma^t
< rcurtin> in this case, for elementwise multiplication to make sense, then g would have to have the shape (n_rows x 1) also
< rcurtin> if we generalize to larger batch sizes... then A^t (let's call it capitalized since it's a matrix now not a vector) has size (n_rows x n_cols) where n_cols is the batch size
< rcurtin> and in this case I agree, the mean vector has size (1 x n_cols), and the operation (A^t - \mu^t) would actually be implemented as A^t.each_row() -= \mu^t
< rcurtin> but it only makes sense to learn a bias and gain (b and g) for all points, instead of one for each point
< rcurtin> so g and b must have size (n_rows x 1)
< rcurtin> I hope this makes sense, I am not sure if I wrote it well. But to me it made sense when I considered that the equations are written for a batch size of 1, then I manually generalized them from there
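Putting rcurtin's generalization together, a minimal standalone Armadillo sketch of the forward pass described above (assumed names and a small epsilon added for stability; this is not the actual mlpack implementation):

    #include <armadillo>

    int main()
    {
      // A^t: (n_rows x n_cols), one column per point, so the batch size is n_cols.
      arma::mat input = arma::randu<arma::mat>(5, 3);

      // Per-point statistics: mean and standard deviation over the rows of each column.
      arma::rowvec mean = arma::mean(input, 0);               // 1 x n_cols
      arma::rowvec stdev = arma::stddev(input, 1, 0) + 1e-8;  // 1 x n_cols

      // Normalize each column (each point) by its own mean and standard deviation.
      arma::mat norm = input;
      norm.each_row() -= mean;
      norm.each_row() /= stdev;

      // One gain and bias per dimension (length n_rows), shared across all points.
      arma::vec g = arma::ones<arma::vec>(input.n_rows);
      arma::vec b = arma::zeros<arma::vec>(input.n_rows);
      norm.each_col() %= g;
      norm.each_col() += b;

      norm.print("layer-normalized output");
      return 0;
    }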
wannabeOG has joined #mlpack
wannabeOG has quit [Client Quit]
< ShikharJ> rcurtin: I see the issue here. Sure, this needs to be fixed; let me open a PR.
< rcurtin> I think Marcus said he would handle it, but I am not sure if he's already done it yet :)
cjlcarvalho has quit [Ping timeout: 252 seconds]
< ShikharJ> rcurtin: This is a nice catch; I had only referred to the TensorFlow documentation, and maybe I misunderstood it (https://www.tensorflow.org/api_docs/python/tf/contrib/layers/layer_norm).
< zoq> Haven't started, so if Shikhar would like to handle it, please feel free.
< ShikharJ> zoq: Cool.
< ShikharJ> zoq: Opened one, please let me know if that's okay.
ImQ009 has quit [Quit: Leaving]
robertohueso has left #mlpack []