ChanServ changed the topic of #mlpack to: "Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or you could also join #mlpack-temp and chat there."
< rcurtin> also, I believe (but am not 100% sure) that I will at least be in California during the mentor summit
< rcurtin> so I will be sure to at least drop by the area :)
wiking has quit [Quit: ZNC 1.7.1 - https://znc.in]
wiking has joined #mlpack
ricklly has quit [Ping timeout: 252 seconds]
< ShikharJ> rcurtin: As far as I understand, a neuron in this context is nothing but a simple activation (computed by matrix operations). Also, if you read the abstract, it specifically mentions that the gain and bias parameters follow from the BatchNorm technique. And equation 4 states that these vectors have the same dimensions as the mean. So we need to have just a single vector
< ShikharJ> for a single Forward routine.
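For reference, equation 4 of the Layer Normalization paper (Ba et al., 2016) that both are referring to is, as far as I recall, roughly

    h^t = f[ (g / \sigma^t) \odot (a^t - \mu^t) + b ]

where \mu^t and \sigma^t are the mean and standard deviation of the summed inputs a^t, and g and b are the gain and bias; worth double-checking against the paper itself.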
robertohueso has joined #mlpack
< rcurtin> ShikharJ: right, so to me this would imply that g and b should be vectors of length input.n_rows, and that we should be using each_col() when g and b are multiplied and added instead of each_row()
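A minimal Armadillo sketch of the shapes rcurtin is suggesting here (illustrative names, not the actual mlpack LayerNorm members):

    // input is (n_rows x n_cols): one column per point in the batch.
    arma::mat input = arma::randu<arma::mat>(5, 3);
    // One gain and one bias entry per dimension, i.e. length input.n_rows.
    arma::vec g = arma::ones<arma::vec>(input.n_rows);
    arma::vec b = arma::zeros<arma::vec>(input.n_rows);
    // Multiply the gain into every column, then add the bias to every column.
    input.each_col() %= g;
    input.each_col() += b;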
< zoq> rcurtin: That sounds correct to me; I'll incorporate that into the open PR. Shikhar, what do you think?
< zoq> rcurtin: If you are in California, let me know :)
< rcurtin> definitely. I will probably order the plane tickets today or tomorrow; just need to double-check
< zoq> another race?
robertohueso has quit [Quit: Leaving.]
< rcurtin> no, actually this would be to travel out to the Berkeley office of my new company
< rcurtin> but I am pretty sure I would find some racing to do while I was out there :)
ImQ009 has joined #mlpack
< zoq> I see; it would be great if it works out, but it sounds like it will
robertohueso has joined #mlpack
robertohueso has quit [Quit: Leaving.]
< ShikharJ> rcurtin: No, quite the opposite. In LayerNorm, we have the mean vector as 1 x n_cols, so the g and b vectors should also be of the same shape. If you look carefully at equation 4, you'll see that we're doing an element-wise multiplication with g and (x - mu), which wouldn't be valid if you make the shape of the g vector n_rows instead of n_cols.
< ShikharJ> rcurtin: I mean you'll have to do element-wise multiplication with each_row().
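For contrast, a sketch of the shapes ShikharJ is describing (again with illustrative names), where g and b match the 1 x n_cols mean vector and are applied with each_row():

    // input is (n_rows x n_cols); the mean is taken over each column (each point).
    arma::mat input = arma::randu<arma::mat>(5, 3);
    arma::rowvec mu = arma::mean(input, 0);  // 1 x n_cols
    // Under this reading, g and b would also be 1 x n_cols, one entry per point.
    arma::rowvec g = arma::ones<arma::rowvec>(input.n_cols);
    arma::rowvec b = arma::zeros<arma::rowvec>(input.n_cols);
    input.each_row() %= g;
    input.each_row() += b;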
< rcurtin> (hang on, I'm in a meeting. I'll respond when I have a chance, possibly a few hours)
robertohueso has joined #mlpack
< rcurtin> ShikharJ: it seems to me like equation 4 in the paper is assuming that the input a^t is one single point with dimension (n_rows x 1)---so, that is, the paper assumes a batch size of 1
< rcurtin> thus the output h^t should have size (n_rows x 1) also
< rcurtin> \mu^t is a scalar, since it's the mean of all elements in a^t; so is \sigma^t
< rcurtin> in this case, for elementwise multiplication to make sense, then g would have to have the shape (n_rows x 1) also
< rcurtin> if we generalize to larger batch sizes... then A^t (let's call it capitalized since it's a matrix now not a vector) has size (n_rows x n_cols) where n_cols is the batch size
< rcurtin> and in this case I agree, the mean vector has size (1 x n_cols), and the operation (A^t - \mu^t) would actually be implemented as A^t.each_row() -= \mu^t
< rcurtin> but it only makes sense to learn a bias and gain (b and g) for all points, instead of one for each point
< rcurtin> so g and b must have size (n_rows x 1)
< rcurtin> I hope this makes sense, I am not sure if I wrote it well. But to me it made sense when I considered that the equations are written for a batch size of 1, then I manually generalized them from there
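Putting rcurtin's generalization together, a minimal standalone Armadillo sketch of the forward pass described above (assumed names and a small epsilon added for stability; this is not the actual mlpack implementation):

    #include <armadillo>

    int main()
    {
      // A^t: (n_rows x n_cols), one column per point, so the batch size is n_cols.
      arma::mat input = arma::randu<arma::mat>(5, 3);

      // Per-point statistics: mean and standard deviation over the rows of each column.
      arma::rowvec mean = arma::mean(input, 0);               // 1 x n_cols
      arma::rowvec stdev = arma::stddev(input, 1, 0) + 1e-8;  // 1 x n_cols

      // Normalize each column (each point) by its own mean and standard deviation.
      arma::mat norm = input;
      norm.each_row() -= mean;
      norm.each_row() /= stdev;

      // One gain and bias per dimension (length n_rows), shared across all points.
      arma::vec g = arma::ones<arma::vec>(input.n_rows);
      arma::vec b = arma::zeros<arma::vec>(input.n_rows);
      norm.each_col() %= g;
      norm.each_col() += b;

      norm.print("layer-normalized output");
      return 0;
    }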
wannabeOG has joined #mlpack
wannabeOG has quit [Client Quit]
< ShikharJ> rcurtin: I see the issue here. Sure, this needs to be fixed; let me open a PR.
< rcurtin> I think Marcus said he would handle it, but I am not sure if he's already done it yet :)
cjlcarvalho has quit [Ping timeout: 252 seconds]
< ShikharJ> rcurtin: This is a nice catch; I had only referred to the TensorFlow documentation, and maybe I misunderstood it (https://www.tensorflow.org/api_docs/python/tf/contrib/layers/layer_norm).
< zoq> Haven't started, so if Shikhar would like to handle it, please feel free.
< ShikharJ> zoq: Cool.
< ShikharJ> zoq: Opened one, please let me know if that's okay.
ImQ009 has quit [Quit: Leaving]
robertohueso has left #mlpack []