verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas_ has joined #mlpack
sumedhghaisas_ has quit [Read error: Connection reset by peer]
sumedhghaisas_ has joined #mlpack
sumedhghaisas__ has joined #mlpack
sumedhghaisas_ has quit [Read error: Connection reset by peer]
< sumedhghaisas__> zoq: I just printed how many times Evaluate is called. And I know that Evaluate does not use the parameters, right?
< sumedhghaisas__> now I did some more analysis...
< sumedhghaisas__> I realised that Evaluate is also called in Gradient..
< sumedhghaisas__> is there a reason for doing that?
< sumedhghaisas__> ahh... and the other call was hiding in plain sight
< sumedhghaisas__> so at the end of SGD... the overall objective is returned after calling Evaluate on every function...
< sumedhghaisas__> but technically for RNN we only have 1 function to optimize... so we are making an extra Evaluate call for every Train call.
< rcurtin> sumedhghaisas__: I thought about this, at least for typical SGD we can't really avoid the Evaluate() call at the start or end of optimization since we need to calculate the full objective function
< sumedhghaisas__> yes... I agree. But maybe we can do something for the special case of only 1 objective function.
< sumedhghaisas__> ahh wait. Do we have GD?
< rcurtin> yes, core/optimizers/gradient_descent/
< rcurtin> but, I am a little confused about why you say there is only one objective function for RNNs (though I am not too familiar with that code)
< rcurtin> we should have one objective function per data point
< rcurtin> so i.e. if we are training on 1M data points then there are 1M calls to Evaluate(iterate, i)
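(A minimal sketch of the decomposable-objective interface rcurtin describes here, with one Evaluate()/Gradient() per data point as mlpack's SGD expects; the class and its toy squared-error objective are illustrative, not the RNN code.)

    #include <mlpack/core.hpp>

    // A decomposable objective: SGD calls Evaluate(iterate, i) and
    // Gradient(iterate, i, g) once per data point i, plus full-objective
    // sweeps over all NumFunctions() points at the start/end of optimization.
    class ExampleDecomposableFunction
    {
     public:
      size_t NumFunctions() const { return data.n_cols; }

      // Objective contribution of data point i (toy squared error).
      double Evaluate(const arma::mat& coordinates, const size_t i)
      {
        const double e = arma::dot(coordinates, data.col(i)) - responses(i);
        return e * e;
      }

      // Gradient contribution of data point i.
      void Gradient(const arma::mat& coordinates, const size_t i,
                    arma::mat& gradient)
      {
        const double e = arma::dot(coordinates, data.col(i)) - responses(i);
        gradient = 2.0 * e * data.col(i);
      }

     private:
      arma::mat data;         // one column per data point
      arma::rowvec responses;
    };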
< sumedhghaisas__> yes even I thought so. So I was not paying attention to it.
< sumedhghaisas__> But when I printed the Evaluate calls, there were many so that made me look into it
< sumedhghaisas__> wait...
< sumedhghaisas__> let me paste the section
< rcurtin> sure
< sumedhghaisas__> arma::mat inputTemp, labelsTemp;
< sumedhghaisas__> for (size_t i = 0; i < (10 + offset); i++)
< sumedhghaisas__> {
< sumedhghaisas__> for (size_t j = 0; j < trainReberGrammarCount; j++)
< sumedhghaisas__> {
< sumedhghaisas__> inputTemp = trainInput.at(0, j);
< sumedhghaisas__> labelsTemp = trainLabels.at(0, j);
< sumedhghaisas__> model.Train(inputTemp, labelsTemp, opt);
< sumedhghaisas__> }
< sumedhghaisas__> }
< sumedhghaisas__> so as I understand from this, we are calling Train for every training data
< rcurtin> for each training sequence I think
< sumedhghaisas__> indeed
< sumedhghaisas__> but I am not sure why exactly?
< rcurtin> but I guess here the sequences are one element long? I am not sure (I haven't spent much time looking at this test)
< sumedhghaisas__> umm... don't think so. Let me see
< sumedhghaisas__> const size_t inputSize = 7;
< sumedhghaisas__> 7 elements long
< rcurtin> ah sorry, hang on
< rcurtin> inputTemp is a matrix, and trainInput is an arma::field<arma::mat>
< sumedhghaisas__> yup...
< rcurtin> so trainInput.at(0, j) refers to a matrix, not a single point
< sumedhghaisas__> yes
< sumedhghaisas__> maybe we can change the Train function to accept the field? I am trying to think of a reason why it might not be possible
< rcurtin> hmm, I am not sure about that one, that is a tricky question
< rcurtin> ideally you would want to train on all sequences, but not all sequences are the same length, and I'm not sure having a different Train() signature for FFN and RNN is a good thing
< rcurtin> zoq might have some good input here, but I suspect he is probably asleep now (I guess it's 6am there) so maybe it will be tomorrow before we hear from him :)
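(Just to sketch the idea being discussed -- this overload does not exist in mlpack; it only illustrates what a Train() accepting an arma::field of variable-length sequences could look like.)

    #include <mlpack/core.hpp>

    // Hypothetical signature: each field element holds one sequence as a
    // matrix (inputSize x sequenceLength), so sequences of different lengths
    // can be passed to a single Train() call.
    template<typename OptimizerType>
    void Train(const arma::field<arma::mat>& predictors,
               const arma::field<arma::mat>& responses,
               OptimizerType& optimizer);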
< sumedhghaisas__> ahhh... variable length sequences
< rcurtin> I think that the inputSize set to 7 means that the dimension of the input data is 7, not that it is seven elements long
< sumedhghaisas__> but wait... we are not changing rho... so sequence length has to be constant
< sumedhghaisas__> yeah... zoq may shine some light on this puzzle
< rcurtin> yeah, I am sorry I am not more helpful here
< rcurtin> I am playing with the mlpack neural networks in my spare time, but I have not gotten very far yet, I have mostly focused on basic networks and CNNs
< rcurtin> no RNNs for me quite yet... :)
< sumedhghaisas__> Ohh yeah ... this RNN framework is tough to understand. I have been going over this for a long time and every day I figure out something new.
< sumedhghaisas__> So the question I have for him is... does 'rho' break the sequence? or does it only break the backprop through time.
< sumedhghaisas__> Cause if it only breaks the backprop through time... then I have reason to believe that our Backward implementation is wrong..
< sumedhghaisas__> okay. I may have another question for you.
< sumedhghaisas__> So I have completed the batch norm for FNN. But how should I implement it for CNN?
< rcurtin> hmm, maybe it would be good to submit a PR for FNN batch norm, then I can think about how it might be adapted for CNN?
sumedhghaisas__ has quit [Read error: Connection reset by peer]
sumedhghaisas__ has joined #mlpack
< rcurtin> it seems to me that the CNN batch norm layer is a "spatial" norm, so the calculation might need to change, but maybe you can use Armadillo functions on subviews for this task?
< rcurtin> avoiding copies might be a little bit tricky, but I think definitely not impossible
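(A rough illustration of the "spatial" normalization idea with Armadillo slices; the function name and the width x height x channels layout are assumptions for the sketch, not mlpack's batch norm.)

    #include <mlpack/core.hpp>
    #include <cmath>

    // Normalize each channel (slice) of a (width x height x channels) cube
    // by its own mean and variance, the way a spatial batch norm treats a
    // convolutional activation map.
    void SpatialNormalize(arma::cube& input, const double eps = 1e-8)
    {
      for (size_t c = 0; c < input.n_slices; ++c)
      {
        const double mean = arma::mean(arma::vectorise(input.slice(c)));
        const double var = arma::var(arma::vectorise(input.slice(c)));
        input.slice(c) = (input.slice(c) - mean) / std::sqrt(var + eps);
      }
    }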
< sumedhghaisas__> ahh yes... that's correct.
< sumedhghaisas__> sorry, the problem was different
< sumedhghaisas__> So the Forward function signature for convolution also accepts arma::mat
< sumedhghaisas__> I thought it would be cube
< sumedhghaisas__> but it isn't
< rcurtin> if I remember right there are two overloads
< rcurtin> one for cube, one for mat
sumedhghaisas_ has joined #mlpack
sumedhghaisas__ has quit [Read error: Connection reset by peer]
< sumedhghaisas_> template<typename eT>
< sumedhghaisas_> void Forward(const arma::Mat<eT>&& input, arma::Mat<eT>&& output);
< sumedhghaisas_> this is the only one I could find
< rcurtin> ah, take a look in convolution_rules/
< rcurtin> I think that the Convolution2D layer stores the shape of the images
< rcurtin> then reshapes the input matrix into a cube, then calls the ConvolutionRule to do the convolution processing
< rcurtin> double-check on that, but I am pretty sure that is right
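(Sketch of the reshape rcurtin describes, using Armadillo's advanced cube constructor to alias the flat input without copying; inputWidth/inputHeight stand in for whatever shape information the layer stores.)

    #include <mlpack/core.hpp>

    // Assuming `input` holds one flattened image and the layer knows
    // inputWidth/inputHeight: alias the memory as a cube (copy_aux_mem =
    // false, so no copy) and hand the slices to the ConvolutionRule.
    void ForwardSketch(arma::mat& input, const size_t inputWidth,
                       const size_t inputHeight)
    {
      const size_t maps = input.n_elem / (inputWidth * inputHeight);
      arma::cube inputAsCube(input.memptr(), inputWidth, inputHeight, maps,
                             false, false);
      // ... run the convolution on inputAsCube.slice(0) .. slice(maps - 1) ...
    }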
< sumedhghaisas_> yes precisely... so now we need 2 separate batch norm layers... 1 for FNN and 1 for CNN
< sumedhghaisas_> cause there is no way to tell if the input to batch norm is 2D or 3D?
< rcurtin> right, I think that is true unless we can use some template metaprogramming to find some information about the previous layer
< sumedhghaisas_> or the user has to input that while creating the layer...
< sumedhghaisas_> I like that solution... I feel the user should not be forced to do that
< rcurtin> I agree, but I don't know if it's easily possible :)
< rcurtin> since the previous layer is encoded by boost::variant it could be hard to determine what the layer actually is
< sumedhghaisas_> nope... can't think of any solution right now
< sumedhghaisas_> so if and else it is for now :)
< sumedhghaisas_> so the user has to input the input dimensions... should I go with that?
< rcurtin> I think for now that is ok, maybe marcus will have some other comments in the PR
< rcurtin> but it is probably worth spending a little while thinking about ways that the previous layer type could be detected
< sumedhghaisas_> So I was thinking that boost::variant is not static... cause I can just write an if statement and change the type
< sumedhghaisas_> so is it even possible to get the type at static time?
< rcurtin> I am not sure, I have not looked into it deeply
< rcurtin> I know that boost::visitor is able to provide very fast speeds, but I am not 100% certain of all the internal details
< sumedhghaisas_> yes it does.. I did some timing measurements, and for reasonable class loads... it's as fast as normal function calls
< sumedhghaisas_> okay I will keep researching on the side.
< rcurtin> so either they have done something impressive, or in our case the cost of virtual functions isn't that high to begin with :)
< rcurtin> (but I think that the cost is high, I simply haven't verified that so I can't say for sure)
< sumedhghaisas_> haha... true. All I got from reading some blog posts is that they have an integer as part of the variant storing which type it is...
< sumedhghaisas_> but how do they do this is above my paygrade
< rcurtin> :)
< rcurtin> I do wish the boost internals were better commented, sometimes it can be really difficult to read
< rcurtin> although given that it is template metaprogramming, it is already difficult to read no matter what
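(A small self-contained example of the mechanism being discussed: boost::variant stores a type index internally -- exposed as which() -- and boost::apply_visitor dispatches on it at runtime. Generic types here, not mlpack's layer types.)

    #include <boost/variant.hpp>
    #include <iostream>

    struct PrintVisitor : public boost::static_visitor<>
    {
      void operator()(const int x) const    { std::cout << "int: " << x << "\n"; }
      void operator()(const double x) const { std::cout << "double: " << x << "\n"; }
    };

    int main()
    {
      boost::variant<int, double> v = 3.14;
      std::cout << "stored type index: " << v.which() << "\n";  // prints 1
      boost::apply_visitor(PrintVisitor(), v);                  // runtime dispatch
      return 0;
    }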
< sumedhghaisas_> ahh and about your suggestion of using cube... If what marcus says is true and with variable length sequences we have a reshape... then the cost is huge
< sumedhghaisas_> for both cube and vector
< sumedhghaisas_> so I was trying std::list... what do you think?
< sumedhghaisas_> I agree. Did you know that templates are Turing complete? It's like C++ has 2 languages built inside it.
< rcurtin> ah searching google for 'std list' is not a great idea
< sumedhghaisas_> :P I checked
< sumedhghaisas_> I meant linked list... to avoid the overhead of reshape
< rcurtin> if you never need to access a particular index of the list, and always only iterate over it from the end or beginning, then I agree, std::list is a better call
< rcurtin> and std::list is definitely better if a reshape or resize operation will be taking place
< rcurtin> but I thought that the size of the cube would be fixed for a given network
< sumedhghaisas_> yes... that's what I noticed... so if you think about it... both in backward and gradient
< rcurtin> it is possible that there is a detail here I am overlooking, I still have to bring myself fully up to speed on GRUs
< sumedhghaisas_> as we go through BPTT
< sumedhghaisas_> we only need access to consecutive outputs in BPTT
< sumedhghaisas_> never random
< rcurtin> right, in that case list is a better choice than vector
< sumedhghaisas_> and std::list is a doubly linked list.. so that solves the problem
< rcurtin> I agree
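(Sketch of the access pattern just described: the per-step outputs are only walked sequentially, newest to oldest, during BPTT, so a std::list avoids the reallocation a resize would force on a cube or vector. Names are illustrative only.)

    #include <mlpack/core.hpp>
    #include <list>

    // One stored output per forward step.
    std::list<arma::mat> stepOutputs;

    void BackwardThroughTimeSketch()
    {
      // Reverse iteration only -- no random indexing needed.
      for (auto it = stepOutputs.rbegin(); it != stepOutputs.rend(); ++it)
      {
        const arma::mat& stepOutput = *it;
        // ... compute this step's deltas/gradients from stepOutput ...
        (void) stepOutput;
      }
    }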
< sumedhghaisas_> okay. working on that now...
< rcurtin> like I said I still need to understand why a resize may be necessary, but I will try and put time into that tomorrow
< sumedhghaisas_> yes... me too.
< sumedhghaisas_> I am confused about that. But you know what? That problem eventually boils down to the use of 'rho'.
< sumedhghaisas_> if 'rho' is just breaking BPTT then resize is definitely needed. Cause we may have longer chains than 'rho' but we just backpropagate till 'rho' length and then reset the backpropagation
< sumedhghaisas_> but like I said if thats the case I think the implementation of Backward and Gradient may be wrong
< sumedhghaisas_> we have to discuss this tomorrow with zoq
< rcurtin> sure, I will be around to chat, or if I am stuck in a meeting, I'll read through the logs later
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
sumedhghaisas_ has joined #mlpack
mikeling has joined #mlpack
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
< ironstark> rcurtin: I am still facing issues. That's why I benchmarked a new library this week rather than upgrading shogun. Also I read the shogun implementations and they already specify the options available so I don't think upgrading implementations of that library is necessary.
sumedhghaisas_ has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
kris1_ has joined #mlpack
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
kris1_ has quit [Client Quit]
kris1_ has joined #mlpack
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
kris1_ has quit [Client Quit]
kris1_ has joined #mlpack
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
< zoq> sumedhghais: About the reber grammar test, you are right about the rho parameter, we should update the parameter for each step, right now the number of previous steps kept in memory is constant. There are cases where you are only interested in the last steps but in this case, we are interested in all previous steps. Meaning it's something we have to fix.
< zoq> About the rho parameter, it breaks the sequence but also limits the number of backpropagation steps to take back in time. As pointed out before there are cases where you could limit the number of steps to take back in time, so the rho value in the model should be different from the value of the recurrent or LSTM layer, but that's not implemented right now. So, the rho value that is passed to the RNN class is just a
< zoq> cheap way to figure out the initial sequence length, without looking at the actual input size of the first layer, so the value might be misleading if you don't want to take every previous step into account.
< zoq> I guess we could somehow figure out if we should take all previous steps into account or not, like in Torch you can just set rho to 99999 to achieve the same, but I'm not sure that's necessary, and I think if we know the rho parameter we can speed up the process e.g. by preallocating memory.
< zoq> About adding another constructor for arma::field and arma::mat, I guess it makes sense, we could speed up the optimization process. I thought about the problem some time ago and figured I'd like to provide the same interface for all models and went with the arma::mat option, but I think if someone wants to train the model on sequences with a variable size it makes sense to add another constructor.
< zoq> About arma::cube and the conv layer, some time ago the conv layer supported arma::cube and arma::mat as input, but that comes with the problem that you have to take care that every layer is capable of handling both types, which is a burden that I wanted to avoid. I think we can completely avoid the use of arma::cube inside the conv layer, we have to refactor the convolution operation but I think that is
< zoq> something we should do anyway.
< zoq> About figuring out if the input to the batch norm is 2D or 3D: it would be relatively easy to figure that out at runtime (by taking a look at the inputWidth/outputWidth), but at compile time it is another story.
vivekp has quit [Ping timeout: 255 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 246 seconds]
vivekp has joined #mlpack
shikhar has quit [Quit: WeeChat 1.7]
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
< rcurtin> ironstark: ok, I can try and help you today if you would like to take some time to do that
sgupta has quit [Ping timeout: 260 seconds]
shikhar has joined #mlpack
< rcurtin> zoq: sumedhghaisas: I think looking at inputWidth/outputWidth at runtime is fine, it might give a slowdown but I suspect that a single "if" statement is going to have pretty negligible effect there
sgupta has joined #mlpack
< kris1_> if there is a function like fnc1(arma::mat&& input) { fnc2(input); } where fnc2(arma::mat&& a) { a = 0; }, then in fnc1 should I call fnc2(std::move(input))? Since input is already an rvalue when fnc1 gets it, why do we need std::move again?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2563 (master - f7df912 : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
kris1_ has quit [Quit: kris1_]
< rcurtin> kris1_: this has to do with reference collapsing---the compiler will collapse arma::mat&& & into arma::mat&, so you need the second std::move() to cause that to not happen
< rcurtin> http://thbecker.net/articles/rvalue_references/section_01.html is a good rvalue reference tutorial although it is quite long
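(Self-contained version of kris1_'s example: inside fnc1 the named parameter `input` is an lvalue even though its declared type is arma::mat&&, so the inner call needs std::move() to bind to fnc2's rvalue reference again.)

    #include <mlpack/core.hpp>
    #include <utility>

    void fnc2(arma::mat&& a) { a.zeros(); }

    void fnc1(arma::mat&& input)
    {
      // fnc2(input);          // does not compile: `input` is an lvalue here
      fnc2(std::move(input));  // ok: cast back to an rvalue
    }

    int main()
    {
      arma::mat m(3, 3, arma::fill::ones);
      fnc1(std::move(m));  // m ends up zeroed via fnc1 -> fnc2
      return 0;
    }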
kris1 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2564 (master - b3e963f : Marcus Edel): The build was broken.
travis-ci has left #mlpack []
sumedhghaisas_ has joined #mlpack
< ironstark> rcurtin: can you please help me with the shogun install
< ironstark> I also wanted to know the initial centroid values you were talking about in the PR comment. I wanted to know the initial values that are used by mlpack for centroids
< ironstark> I got the second part. Just need help with the shogun install now.
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< sumedhghaisas_> zoq: Hey Marcus...
< sumedhghaisas_> have couple of questions for you if you are free?
< zoq> sumedhghais: Kinda, please go ahead.
shikhar has quit [Quit: WeeChat 1.7]
< kris1> thanks rcurtin i will have a look
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
< rcurtin> ironstark: sounds good, I am happy to help
< rcurtin> but I think I have meetings for the next two hours
< rcurtin> so the response might be a little slow :(
< rcurtin> can you tell me what the issue is and how to reproduce it?
kris1 has quit [Quit: kris1]
< ironstark> I'll try to build shogun once again
< ironstark> and as issues pop up will ask you
< rcurtin> sure, I will try to answer quickly
kris1 has joined #mlpack
shikhar has joined #mlpack
< ironstark> this is the error message
mikeling has quit [Quit: Connection closed for inactivity]
< rcurtin> ok, thanks, let me take a look momentarily
sumedhghaisas_ has joined #mlpack
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
shikhar has quit [Quit: WeeChat 1.7]
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< rcurtin> ironstark: I can't reproduce the issue; I tried to build on slake (one of the benchmarking systems)
< rcurtin> I configured with 'cmake -DBUILD_META_EXAMPLES=OFF ../'
< rcurtin> I'll try reconfiguring with the exact options from shogun_install.sh
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< rcurtin> I found that whoever reinstalled gekko had only touched sda, not sdb
< rcurtin> and since the system was running RAID1...
< rcurtin> I just booted a rescue CD into sdb and am rebuilding the RAID1 :)
< zoq> nice, mirroring for the win :)
sumedhghaisas has joined #mlpack
< rcurtin> ironstark: I can't reproduce the problem on slake, even with the same CMake parameters as shogun_install.sh
< sumedhghaisas> zoq: Hey Marcus... There?
< zoq> sumedhghais: yes
< sumedhghaisas> Hey... bunch of questions for you.
< sumedhghaisas> haha
< zoq> sure, go ahead
< sumedhghaisas> okay I will start with the most basic and the hardest. So exactly what is the purpose of 'rho'? does it define the sequence length? or is it used to break the BPTT chain?
< zoq> Have you seen my messages: http://www.mlpack.org/irc/? Both, it breaks the sequence but also defines the amount of steps.
< sumedhghaisas> ahh no I didn't get them. Sorry
< sumedhghaisas> Give me a moment to go through them
< sumedhghaisas> ahhh makes sense now...
< sumedhghaisas> so that was my confusion... the rho of the model can be different from the rho of the layers, right?
< zoq> yes
< zoq> as I said the name is kinda misleading
< sumedhghaisas> okay then I have a reason to believe that the implementation of Backward might be wrong
< zoq> open for ideas to rename the parameter
< sumedhghaisas> so in Backward of LSTM we reset backwardStep to 0
< sumedhghaisas> and then access outParameter.size() - backwardStep - 1...
< sumedhghaisas> so we are technically accessing only the last 'rho' section...
< sumedhghaisas> always...
< sumedhghaisas> But Forward is implemented correctly to support multiple 'rho' sections in a sequence
< sumedhghaisas> that was another reason why I had this confusion
< zoq> hm, I can't find outParameter.size() - backwardStep - 1
< zoq> can you point me to the line?
< zoq> backwardStep should be increased in each step
< sumedhghaisas> okay wait...
< zoq> and backwardStep should be reset to zero if rho is reached
< sumedhghaisas> ahh sorry... it's cellParameter in LSTM
< sumedhghaisas> so here in this code we access the output to use in backward using backwardStep... but imagine 'rho' for the model is 100 and for the LSTM is 20
< sumedhghaisas> so there are 5 parts in BPTT
< sumedhghaisas> but cellParameter.size() - backwardStep - 1 will always give the last 'rho' part
< sumedhghaisas> cause we are resetting the backwardStep
< zoq> yes, you are right, model rho != layer rho is something that isn't handled at the moment.
< sumedhghaisas> ahh okay. Okay so that doubt is clear for me now
< sumedhghaisas> and I shifted the implementation to list with my last commit ... it is slower for fixed rho...
< sumedhghaisas> but according to my calculation it should be faster on variable rho...
< zoq> I can't see an easy fix for the problem and I'm not sure it's that important.
< zoq> hm, okay, is the difference huge?
< sumedhghaisas> 2 secs
< zoq> oh, okay
< sumedhghaisas> but wait... I compared it with the older build
< sumedhghaisas> 1.5 sec
< sumedhghaisas> cause changing the parameters has a huge effect on the runtime
< sumedhghaisas> wait... lets see if boost::list is faster than std::list
< zoq> yes, I guess we could use cube if rho is fixed
< zoq> and provide some alias e.g. GRUConst or something like that
< zoq> do you test in debug or release mode?
< sumedhghaisas> oops didn't check that... I guess the default is Debug right?
< sumedhghaisas> maybe the difference is not that much in Release
< sumedhghaisas> okay. WIll check that and get back to you...
< zoq> release
< zoq> is the default
< sumedhghaisas> ahh then I checked in release...
< sumedhghaisas> okay... 2 different implementations then
< sumedhghaisas> same for LSTM?
< sumedhghaisas> another thing to discuss. So Evaluate is called in the Gradient call... why is that?
< zoq> if we go that way
< sumedhghaisas> which? GRUConst?
< zoq> Gradient (Backward pass Gradient calculation) is called inside the optimizer class and Evaluate performs the Forward pass.
< zoq> yes, GRUConst
< sumedhghaisas> yes but Evaluate is also called in the optimizer right before the gradient.
< sumedhghaisas> So why do we need to call it again in Gradient? I am confused
< sumedhghaisas> It might be relevant if we are using the parameters that are being passed
< zoq> It's only called to get the initial performance
< sumedhghaisas> umm sorry? didn't get that
< sumedhghaisas> so gradient descent based optimizers will call Evaluate - Gradient - Evaluate - Gradient - .... In this order right?
< zoq> ah, gradient descent, I was talking about sgd
< zoq> let's see
< sumedhghaisas> is it different for sgd? let me see
< zoq> looks the same
< sumedhghaisas> ahh... but the first call to Gradient will fail
< sumedhghaisas> cause the calls are Gradient - Evaluate - Gradient - Evaluate -...
< zoq> is the first Gradient call the one of the optimizer or the inside the RNN class?
< sumedhghaisas> ummm... sorry didn't get that
< zoq> one step should be Evaluate (Forward) - Gradient (Backward/Gradient) right?
< sumedhghaisas> yup...
< sumedhghaisas> that's what I was thinking
< sumedhghaisas> but the Gradient function inside RNN... also internally calls Evaluate
< sumedhghaisas> I think that may be unnecessary
< zoq> but when we remove that step who is calling the Forward pass?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2567 (master - 9dab9c6 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
< sumedhghaisas> the optimizer... it is calling the Evaluate...
< zoq> yes, to keep track of the progress, after we updated the parameters
< zoq> maybe I missed something :)
< zoq> we are talking about this line:
< zoq> and this:
< zoq> right?
< sumedhghaisas> yup... so imagine the second iteration of sgd... the first is a little bit tricky to handle
< sumedhghaisas> so for the second iteration... the Gradient will be called first... right?
< sumedhghaisas> although if you think about it... we don't have to run the Forward pass there as it is already done in the first iteration of sgd... for the special case where the number of functions is 1
< zoq> yes, in this case the extra Evaluation function is unnecessary
< sumedhghaisas> there is another extra call... so at the end of sgd ... the overall objective is computed again...
< sumedhghaisas> it would be for each training sequence....
< zoq> To get the overall performance over the complete dataset.
< sumedhghaisas> but for RNN we are calling Train on each training sequence individually right?
< zoq> So do you say since we call Evaluate after the Gradient step in iteration > 1 we don't have to call Evaluate in the Gradient step inside the model?
< sumedhghaisas> in this scenario... yes
< zoq> we don't necessarily call Train for each sequence, I did that because the RNN class can't handle arma::field
< sumedhghaisas> yeah... I was thinking the same
< zoq> if it could handle arma::field we could avoid calling Evaluate twice in each step
< sumedhghaisas> yes... that's what I was going to suggest... took too long, sorry :)
< zoq> I think I get your point, you are talking about training a single sequence
< sumedhghaisas> but still the problem remains...
< sumedhghaisas> still we have 2 Evaluate calls in 1 iteration...
< sumedhghaisas> So imagine... SGD starts with 1 iteration
< sumedhghaisas> 1 training example
< zoq> yes
< sumedhghaisas> and in that iteration both the calls are happening with the same example...
< sumedhghaisas> so they are the same
< zoq> yes, so we could handle that inside the RNN class, right?
< sumedhghaisas> yeah... But I could not figure out how. No elegant solution I can see
< zoq> and avoid the extra Evaluate call if numfunctions = 1?
< sumedhghaisas> ohh.. but we avoid that with support for field, right?
< sumedhghaisas> I was assuming the numFunctions will always be more than 1
< zoq> I guess it would still be useful
< sumedhghaisas> okay. That case is easy to handle. But how to avoid the extra call when numFunctions are greater than 1?
< zoq> for sgd we pass the index of the current sample, so we could use that parameter to keep track of already evaluated samples?
< sumedhghaisas> ahh yes...
< zoq> but for gd, hm ...
< sumedhghaisas> true... gd will pose a problem
< sumedhghaisas> what about this?
< sumedhghaisas> so we create 2 different Evaluate functions
< sumedhghaisas> 1 always returns the last Forward pass result
< sumedhghaisas> and 1 actually does the forward pass
< sumedhghaisas> according to all the current optimizers that can be associated with ANN
< sumedhghaisas> this method will work
< zoq> good idea
< sumedhghaisas> and this works also for FNN and CNN
< zoq> yes, right, nice solution
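(Rough sketch of the idea just agreed on, with illustrative names rather than the real RNN interface: Gradient() caches the objective from the Forward pass it already performs, and a cheap Evaluate() variant returns that cached value instead of running another Forward pass.)

    #include <mlpack/core.hpp>

    class CachedObjectiveSketch
    {
     public:
      // Full evaluation: run a Forward pass and cache the result.
      double Evaluate(const arma::mat& parameters, const size_t i)
      {
        return currentObjective = ForwardPass(parameters, i);
      }

      // Gradient caches the objective from its own Forward pass, so the
      // optimizer's follow-up Evaluate() can reuse it.
      void Gradient(const arma::mat& parameters, const size_t i,
                    arma::mat& gradient)
      {
        currentObjective = ForwardPass(parameters, i);
        BackwardPass(gradient);
      }

      // Cheap variant: hand back the last Forward pass result, no recompute.
      double LastEvaluate() const { return currentObjective; }

     private:
      // Placeholders for the real network passes.
      double ForwardPass(const arma::mat& /* parameters */,
                         const size_t /* i */) { return 0.0; }
      void BackwardPass(arma::mat& /* gradient */) { }

      double currentObjective = 0.0;
    };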
< sumedhghaisas> okay so I will try to fix this first. Task number 1
< sumedhghaisas> So now to batch norm... sorry I am taking too much of your time :)
< zoq> nah, no worries
< sumedhghaisas> okay so I didn't quite get how you want me to determine if the matrix is 2D or 3D? I mean at runtime
< zoq> I think you can figure it out by looking at inputHeight/outputWidth, which is set automatically
< sumedhghaisas> ahh... that is done at the initialization you mean?
< zoq> yes
< sumedhghaisas> so if inputHeight > 1 then 3D or else 2D?
< zoq> I'm not quite sure that's enough information; yes, if inputHeight/inputWidth > 1 -> 3D
< sumedhghaisas> ahh okay. Got it.
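(Minimal sketch of the runtime check just settled on, assuming the layer stores inputWidth/inputHeight the way other mlpack ann layers do.)

    #include <cstddef>

    // Flat (2D) inputs leave the width/height hints at 1; outputs coming from
    // a convolutional layer set them to the image dimensions.
    bool IsSpatialInput(const size_t inputWidth, const size_t inputHeight)
    {
      return (inputWidth > 1 || inputHeight > 1);
    }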
< sumedhghaisas> Okay that task I can do now.
< zoq> Maybe there is a better solution, we should think about it.
< sumedhghaisas> you mean static time check?
< zoq> yes
< sumedhghaisas> I will think about it. Okay so on a final note
< sumedhghaisas> I will implement GRUConst right now
< sumedhghaisas> maybe I can try creating a variable length one... do you have a test in mind for that?
< sumedhghaisas> I can try tonight
< zoq> Konstantin is working on a neat test over here: https://github.com/mlpack/mlpack/pull/1005
< sumedhghaisas> ahh he is also implementing the Neural Turing Machine?
< zoq> he is working on HAM
< sumedhghaisas> ahh okay. Okay I saw those tests but can they be achieved without memory?
< sumedhghaisas> I just wanted to test the variable length for LSTM and GRU
< sumedhghaisas> can I create a variable length reber grammar test?
< zoq> Hierarchical Attentive Memory - LSTM/GRU should handle sequences up to >= 50
< sumedhghaisas> ahh ... so they should pass the add test... okay will try that.
< sumedhghaisas> and his tests are going to help me a lot in NTM :)
< zoq> yes, maybe we can add even more tasks in the future
< zoq> and yes I think you could also use the Reber grammar test
< zoq> I have to check the data, the size should be variable
kris1 has quit [Quit: kris1]