verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
sumedhghaisas_ has joined #mlpack
sumedhghaisas_ has quit [Read error: Connection reset by peer]
sumedhghaisas_ has joined #mlpack
sumedhghaisas__ has joined #mlpack
sumedhghaisas_ has quit [Read error: Connection reset by peer]
< sumedhghaisas__> zoq: I just printed how many times Evaluate is called. And I know that Evaluate does not use the parameters, right?
< sumedhghaisas__> now I did some more analysis...
< sumedhghaisas__> I realised that Evaluate is also called in Gradient..
< sumedhghaisas__> is there a reason for doing that?
< sumedhghaisas__> ahh... and the other call was hiding in plain sight
< sumedhghaisas__> so at the end of SGD... the overall objective is returned after calling Evaluate on every function...
< sumedhghaisas__> but technically for RNN we only have 1 function to optimize... so we are making an extra Evaluate call for every Train call.
< rcurtin> sumedhghaisas__: I thought about this, at least for typical SGD we can't really avoid the Evaluate() call at the start or end of optimization since we need to calculate the full objective function
< sumedhghaisas__> yes... I agree. But maybe we can do something for the special case of only 1 objective function.
< sumedhghaisas__> ahh wait. Do we have GD?
< rcurtin> yes, core/optimizers/gradient_descent/
< rcurtin> but, I am a little confused about why you say there is only one objective function for RNNs (though I am not too familiar with that code)
< rcurtin> we should have one objective function per data point
< rcurtin> so i.e. if we are training on 1M data points then there are 1M calls to Evaluate(iterate, i)
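(A minimal sketch of the decomposable-objective interface rcurtin describes here, with one Evaluate()/Gradient() per data point as mlpack's SGD expects; the class and its toy squared-error objective are illustrative, not the RNN code.)

    #include <mlpack/core.hpp>

    // A decomposable objective: SGD calls Evaluate(iterate, i) and
    // Gradient(iterate, i, g) once per data point i, plus full-objective
    // sweeps over all NumFunctions() points at the start/end of optimization.
    class ExampleDecomposableFunction
    {
     public:
      size_t NumFunctions() const { return data.n_cols; }

      // Objective contribution of data point i (toy squared error).
      double Evaluate(const arma::mat& coordinates, const size_t i)
      {
        const double e = arma::dot(coordinates, data.col(i)) - responses(i);
        return e * e;
      }

      // Gradient contribution of data point i.
      void Gradient(const arma::mat& coordinates, const size_t i,
                    arma::mat& gradient)
      {
        const double e = arma::dot(coordinates, data.col(i)) - responses(i);
        gradient = 2.0 * e * data.col(i);
      }

     private:
      arma::mat data;         // one column per data point
      arma::rowvec responses;
    };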
< sumedhghaisas__> yes even I thought so. So I was not paying attention to it.
< sumedhghaisas__> But when I printed the Evaluate calls, there were many so that made me look into it
< sumedhghaisas__> wait...
< sumedhghaisas__> let me paste the section
< rcurtin> sure
< sumedhghaisas__> arma::mat inputTemp, labelsTemp;
< sumedhghaisas__> for (size_t i = 0; i < (10 + offset); i++)
< sumedhghaisas__> {
< sumedhghaisas__> for (size_t j = 0; j < trainReberGrammarCount; j++)
< sumedhghaisas__> {
< sumedhghaisas__> inputTemp = trainInput.at(0, j);
< sumedhghaisas__> labelsTemp = trainLabels.at(0, j);
< sumedhghaisas__> model.Train(inputTemp, labelsTemp, opt);
< sumedhghaisas__> }
< sumedhghaisas__> }
< sumedhghaisas__> so as I understand from this, we are calling Train for every training data
< rcurtin> for each training sequence I think
< sumedhghaisas__> indeed
< sumedhghaisas__> but I am not sure why exactly?
< rcurtin> but I guess here the sequences are one element long? I am not sure (I haven't spent much time looking at this test)
< sumedhghaisas__> umm... don't think so. Let me see
< sumedhghaisas__> const size_t inputSize = 7;
< sumedhghaisas__> 7 elements long
< rcurtin> ah sorry, hang on
< rcurtin> inputTemp is a matrix, and trainInput is an arma::field<arma::mat>
< sumedhghaisas__> yup...
< rcurtin> so trainInput.at(0, j) refers to a matrix, not a single point
< sumedhghaisas__> yes
< sumedhghaisas__> maybe we can change the Train function to accept the field? I am trying to think of a reason why it might not be possible
< rcurtin> hmm, I am not sure about that one, that is a tricky question
< rcurtin> ideally you would want to train on all sequences, but not all sequences are the same length, and I'm not sure having a different Train() signature for FFN and RNN is a good thing
< rcurtin> zoq might have some good input here, but I suspect he is probably asleep now (I guess it's 6am there) so maybe it will be tomorrow before we hear from him :)
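(Just to sketch the idea being discussed -- this overload does not exist in mlpack; it only illustrates what a Train() accepting an arma::field of variable-length sequences could look like.)

    #include <mlpack/core.hpp>

    // Hypothetical signature: each field element holds one sequence as a
    // matrix (inputSize x sequenceLength), so sequences of different lengths
    // can be passed to a single Train() call.
    template<typename OptimizerType>
    void Train(const arma::field<arma::mat>& predictors,
               const arma::field<arma::mat>& responses,
               OptimizerType& optimizer);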
< sumedhghaisas__> ahhh... variable length sequences
< rcurtin> I think that the inputSize set to 7 means that the dimension of the input data is 7, not that it is seven elements long
< sumedhghaisas__> but wait... we are not changing rho... so sequence length has to be constant
< sumedhghaisas__> yeah... zoq may shine some light on this puzzle
< rcurtin> yeah, I am sorry I am not more helpful here
< rcurtin> I am playing with the mlpack neural networks in my spare time, but I have not gotten very far yet, I have mostly focused on basic networks and CNNs
< rcurtin> no RNNs for me quite yet... :)
< sumedhghaisas__> Ohh yeah ... this RNN framework is tough to understand. I have been going over this for a long time and every day I figure out something new.
< sumedhghaisas__> So the question I have for him is... does 'rho' break the sequence? or does it only break the backprop through time.
< sumedhghaisas__> Cause if it only breaks the backprop through time... then I have reason to believe that our Backward implementation is wrong..
< sumedhghaisas__> okay. I may have another question for you.
< sumedhghaisas__> So I have completed the batch norm for FNN. But how should I implement it for CNN?
< rcurtin> hmm, maybe it would be good to submit a PR for FNN batch norm, then I can think about how it might be adapted for CNN?
sumedhghaisas__ has quit [Read error: Connection reset by peer]
sumedhghaisas__ has joined #mlpack
< rcurtin> it seems to me that the CNN batch norm layer is a "spatial" norm, so the calculation might need to change, but maybe you can use Armadillo functions on subviews for this task?
< rcurtin> avoiding copies might be a little bit tricky, but I think definitely not impossible
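(A rough illustration of the "spatial" normalization idea with Armadillo slices; the function name and the width x height x channels layout are assumptions for the sketch, not mlpack's batch norm.)

    #include <mlpack/core.hpp>
    #include <cmath>

    // Normalize each channel (slice) of a (width x height x channels) cube
    // by its own mean and variance, the way a spatial batch norm treats a
    // convolutional activation map.
    void SpatialNormalize(arma::cube& input, const double eps = 1e-8)
    {
      for (size_t c = 0; c < input.n_slices; ++c)
      {
        const double mean = arma::mean(arma::vectorise(input.slice(c)));
        const double var = arma::var(arma::vectorise(input.slice(c)));
        input.slice(c) = (input.slice(c) - mean) / std::sqrt(var + eps);
      }
    }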
< sumedhghaisas__> ahh yes... that's correct.
< sumedhghaisas__> sorry, the problem was different
< sumedhghaisas__> So the Forward function signature for convolution also accepts arma::mat
< sumedhghaisas__> I thought it would be cube
< sumedhghaisas__> but it isn't
< rcurtin> if I remember right there are two overloads
< rcurtin> one for cube, one for mat
sumedhghaisas_ has joined #mlpack
sumedhghaisas__ has quit [Read error: Connection reset by peer]
< sumedhghaisas_> template<typename eT>
< sumedhghaisas_> void Forward(const arma::Mat<eT>&& input, arma::Mat<eT>&& output);
< sumedhghaisas_> this is the only one I could find
< rcurtin> ah, take a look in convolution_rules/
< rcurtin> I think that the Convolution2D layer stores the shape of the images
< rcurtin> then reshapes the input matrix into a cube, then calls the ConvolutionRule to do the convolution processing
< rcurtin> double-check on that, but I am pretty sure that is right
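(Sketch of the reshape rcurtin describes, using Armadillo's advanced cube constructor to alias the flat input without copying; inputWidth/inputHeight stand in for whatever shape information the layer stores.)

    #include <mlpack/core.hpp>

    // Assuming `input` holds one flattened image and the layer knows
    // inputWidth/inputHeight: alias the memory as a cube (copy_aux_mem =
    // false, so no copy) and hand the slices to the ConvolutionRule.
    void ForwardSketch(arma::mat& input, const size_t inputWidth,
                       const size_t inputHeight)
    {
      const size_t maps = input.n_elem / (inputWidth * inputHeight);
      arma::cube inputAsCube(input.memptr(), inputWidth, inputHeight, maps,
                             false, false);
      // ... run the convolution on inputAsCube.slice(0) .. slice(maps - 1) ...
    }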
< sumedhghaisas_> yes precisely... so now we need 2 separate batch norm layers... 1 for FNN and 1 for CNN
< sumedhghaisas_> cause there is no way to tell if the input to batch norm is 2D or 3D?
< rcurtin> right, I think that is true unless we can use some template metaprogramming to find some information about the previous layer
< sumedhghaisas_> or the user has to input that while creating the layer...
< sumedhghaisas_> I like that solution... I feel the user should not be forced to do that
< rcurtin> I agree, but I don't know if it's easily possible :)
< rcurtin> since the previous layer is encoded by boost::variant it could be hard to determine what the layer actually is
< sumedhghaisas_> nope... can't think of any solution right now
< sumedhghaisas_> so if and else it is for now :)
< sumedhghaisas_> so the user has to input the input dimensions... should I go with that?
< rcurtin> I think for now that is ok, maybe marcus will have some other comments in the PR
< rcurtin> but it is probably worth spending a little while thinking about ways that the previous layer type could be detected
< sumedhghaisas_> So I was thinking that boost::variant is not static... cause I can just write an if statement and change the type
< sumedhghaisas_> so is it even possible to get the type at static time?
< rcurtin> I am not sure, I have not looked into it deeply
< rcurtin> I know that boost::visitor is able to provide very fast speeds, but I am not 100% certain of all the internal details
< sumedhghaisas_> yes it does.. I did some timing measurements, and for reasonable class loads... it's as fast as normal function calls
< sumedhghaisas_> okay I will keep researching on the side.
< rcurtin> so either they have done something impressive, or in our case the cost of virtual functions isn't that high to begin with :)
< rcurtin> (but I think that the cost is high, I simply haven't verified that so I can't say for sure)
< sumedhghaisas_> haha... true. All I got from reading some blog posts is that they have an integer as part of the variant storing which type it is...
< sumedhghaisas_> but how do they do this is above my paygrade
< rcurtin> :)
< rcurtin> I do wish the boost internals were better commented, sometimes it can be really difficult to read
< rcurtin> although given that it is template metaprogramming, it is already difficult to read no matter what
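(A small self-contained example of the mechanism being discussed: boost::variant stores a type index internally -- exposed as which() -- and boost::apply_visitor dispatches on it at runtime. Generic types here, not mlpack's layer types.)

    #include <boost/variant.hpp>
    #include <iostream>

    struct PrintVisitor : public boost::static_visitor<>
    {
      void operator()(const int x) const    { std::cout << "int: " << x << "\n"; }
      void operator()(const double x) const { std::cout << "double: " << x << "\n"; }
    };

    int main()
    {
      boost::variant<int, double> v = 3.14;
      std::cout << "stored type index: " << v.which() << "\n";  // prints 1
      boost::apply_visitor(PrintVisitor(), v);                  // runtime dispatch
      return 0;
    }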
< sumedhghaisas_> ahh and about your suggestion of using cube... If what marcus says is true and with variable length sequences we have a reshape... then the cost is huge
< sumedhghaisas_> for both cube and vector
< sumedhghaisas_> so I was trying std::list... what do you think?
< sumedhghaisas_> I agree. Did you know that templates are Turing complete? It's like C++ has 2 languages built inside it.
< rcurtin> ah searching google for 'std list' is not a great idea
< sumedhghaisas_> :P I checked
< sumedhghaisas_> I meant linked list... to avoid the overhead of reshape
< rcurtin> if you never need to access a particular index of the list, and always only iterate over it from the end or beginning, then I agree, std::list is a better call
< rcurtin> and std::list is definitely better if a reshape or resize operation will be taking place
< rcurtin> but I thought that the size of the cube would be fixed for a given network
< sumedhghaisas_> yes... that's what I noticed... so if you think about it... both in backward and gradient
< rcurtin> it is possible that there is a detail here I am overlooking, I still have to bring myself fully up to speed on GRUs
< sumedhghaisas_> as we go through BPTT
< sumedhghaisas_> we only need access to consecutive outputs in BPTT
< sumedhghaisas_> never random
< rcurtin> right, in that case list is a better choice than vector
< sumedhghaisas_> and std::list is a doubly linked list.. so that solves the problem
< rcurtin> I agree
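(Sketch of the access pattern just described: the per-step outputs are only walked sequentially, newest to oldest, during BPTT, so a std::list avoids the reallocation a resize would force on a cube or vector. Names are illustrative only.)

    #include <mlpack/core.hpp>
    #include <list>

    // One stored output per forward step.
    std::list<arma::mat> stepOutputs;

    void BackwardThroughTimeSketch()
    {
      // Reverse iteration only -- no random indexing needed.
      for (auto it = stepOutputs.rbegin(); it != stepOutputs.rend(); ++it)
      {
        const arma::mat& stepOutput = *it;
        // ... compute this step's deltas/gradients from stepOutput ...
        (void) stepOutput;
      }
    }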
< sumedhghaisas_> okay. working on that now...
< rcurtin> like I said I still need to understand why a resize may be necessary, but I will try and put time into that tomorrow
< sumedhghaisas_> yes... me too.
< sumedhghaisas_> I am confused about that. But you know what? That problem eventually boils down to the use of 'rho'.
< sumedhghaisas_> if 'rho' is just breaking BPTT then resize is definitely needed. Cause we may have longer chains than 'rho' but we just backpropagate till 'rho' length and then reset the backpropagation
< sumedhghaisas_> but like I said if thats the case I think the implementation of Backward and Gradient may be wrong
< sumedhghaisas_> we have to discuss this tomorrow with zoq
< rcurtin> sure, I will be around to chat, or if I am stuck in a meeting, I'll read through the logs later
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
sumedhghaisas_ has joined #mlpack
mikeling has joined #mlpack
govg has joined #mlpack
govg has quit [Client Quit]
govg has joined #mlpack
< ironstark> rcurtin: I am still facing issues. That's why I benchmarked a new library this week rather than upgrading shogun. Also I read the shogun implementations and they already specify the options available so I don't think upgrading implementations of that library is necessary.
sumedhghaisas_ has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
kris1_ has joined #mlpack
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
kris1_ has quit [Client Quit]
kris1_ has joined #mlpack
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
kris1_ has quit [Client Quit]
kris1_ has joined #mlpack
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
< zoq> sumedhghais: About the reber grammar test, you are right about the rho parameter, we should update the parameter for each step, right now the number of previous steps kept in memory is constant. There are cases where you are only interested in the last steps but in this case, we are interested in all previous steps. Meaning it's something we have to fix.
< zoq> About the rho parameter, it breaks the sequence but also limits the number of backpropagation steps to take back in time. As pointed out before there are cases where you could limit the number of steps to take back in time, so the rho value in the model should be different from the value of the recurrent or LSTM layer, but that's not implemented right now. So, the rho value that is passed to the RNN class is just a
< zoq> cheap way to figure out the initial sequence length, without looking at the actual input size of the first layer, so the value might be misleading if you don't want to take every previous step into account.
< zoq> I guess we could somehow figure out if we should take all previous steps into account or not, like in Torch you can just set rho to 99999 to achieve the same, but I'm not sure that's necessary, and I think if we know the rho parameter we can speed up the process e.g. by preallocating memory.
< zoq> About adding another constructor for arma::field and arma::mat, I guess it makes sense, we could speed up the optimization process. I thought about the problem some time ago and figured I'd like to provide the same interface for all models and went with the arma::mat option, but I think if someone wants to train the model on sequences with a variable size it makes sense to add another constructor.
< zoq> About arma::cube and the conv layer, some time ago the conv layer supported arma::cube and arma::mat as input, but that comes with the problem that you have to take care that every layer is capable of handling both types, which is a burden that I wanted to avoid. I think we can completely avoid the use of arma::cube inside the conv layer, we have to refactor the convolution operation but I think that is
< zoq> something we should do anyway.
< zoq> About figuring out if the input to the batch norm is 2D or 3D: it would be relatively easy to figure that out at runtime (by taking a look at the inputWidth/outputWidth), but at compile time it is another story.
vivekp has quit [Ping timeout: 255 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 246 seconds]
vivekp has joined #mlpack
shikhar has quit [Quit: WeeChat 1.7]
kris1_ has quit [Quit: kris1_]
kris1_ has joined #mlpack
< rcurtin> ironstark: ok, I can try and help you today if you would like to take some time to do that
sgupta has quit [Ping timeout: 260 seconds]
shikhar has joined #mlpack
< rcurtin> zoq: sumedhghaisas: I think looking at inputWidth/outputWidth at runtime is fine, it might give a slowdown but I suspect that a single "if" statement is going to have pretty negligible effect there
sgupta has joined #mlpack
< kris1_> if there is a function like fnc1(arma::mat&& input) { fnc2(input); } where fnc2(arma::mat&& a) { a = 0; }, then in fnc1 should I call fnc2(std::move(input))? Since input is already an rvalue when fnc1 gets it, why do we need std::move again?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2563 (master - f7df912 : Ryan Curtin): The build was broken.
travis-ci has left #mlpack []
kris1_ has quit [Quit: kris1_]
< rcurtin> kris1_: this has to do with reference collapsing---the compiler will collapse arma::mat&& & into arma::mat&, so you need the second std::move() to cause that to not happen
< rcurtin> http://thbecker.net/articles/rvalue_references/section_01.html is a good rvalue reference tutorial although it is quite long
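(Self-contained version of kris1_'s example: inside fnc1 the named parameter `input` is an lvalue even though its declared type is arma::mat&&, so the inner call needs std::move() to bind to fnc2's rvalue reference again.)

    #include <mlpack/core.hpp>
    #include <utility>

    void fnc2(arma::mat&& a) { a.zeros(); }

    void fnc1(arma::mat&& input)
    {
      // fnc2(input);          // does not compile: `input` is an lvalue here
      fnc2(std::move(input));  // ok: cast back to an rvalue
    }

    int main()
    {
      arma::mat m(3, 3, arma::fill::ones);
      fnc1(std::move(m));  // m ends up zeroed via fnc1 -> fnc2
      return 0;
    }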
kris1 has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2564 (master - b3e963f : Marcus Edel): The build was broken.
travis-ci has left #mlpack []
sumedhghaisas_ has joined #mlpack
< ironstark> rcurtin: can you please help me with the shogun install
< ironstark> I also wanted to know the initial centroid values you were talking about in the PR comment. I wanted to know the initial values that are used by mlpack for centroids
< ironstark> I got the second part. Just need help with the shogun install now.
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< sumedhghaisas_> zoq: Hey Marcus...
< sumedhghaisas_> have couple of questions for you if you are free?
< zoq> sumedhghais: Kinda, please go ahead.
shikhar has quit [Quit: WeeChat 1.7]
< kris1> thanks rcurtin i will have a look
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
< rcurtin> ironstark: sounds good, I am happy to help
< rcurtin> but I think I have meetings for the next two hours
< rcurtin> so the response might be a little slow :(
< rcurtin> can you tell me what the issue is and how to reproduce it?
kris1 has quit [Quit: kris1]
< ironstark> I'll try to build shogun once again
< ironstark> and as issues pop up will ask you
< rcurtin> sure, I will try to answer quickly
kris1 has joined #mlpack
shikhar has joined #mlpack
< ironstark> this is the error message
mikeling has quit [Quit: Connection closed for inactivity]
< rcurtin> ok, thanks, let me take a look momentarily
sumedhghaisas_ has joined #mlpack
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
shikhar has quit [Quit: WeeChat 1.7]
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< rcurtin> ironstark: I can't reproduce the issue; I tried to build on slake (one of the benchmarking systems)
< rcurtin> I configured with 'cmake -DBUILD_META_EXAMPLES=OFF ../'
< rcurtin> I'll try reconfiguring with the exact options from shogun_install.sh
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< rcurtin> I found that whoever reinstalled gekko had only touched sda, not sdb
< rcurtin> and since the system was running RAID1...
< rcurtin> I just booted a rescue CD into sdb and am rebuilding the RAID1 :)
< zoq> nice, mirroring for the win :)
sumedhghaisas has joined #mlpack
< rcurtin> ironstark: I can't reproduce the problem on slake, even with the same CMake parameters as shogun_install.sh
< sumedhghaisas> zoq: Hey Marcus... There?
< zoq> sumedhghais: yes
< sumedhghaisas> Hey... bunch of questions for you.
< sumedhghaisas> haha
< zoq> sure, go ahead
< sumedhghaisas> okay I will start with the most basic and the hardest. So exactly what is the purpose of 'rho'? does it define the sequence length? or is it used to break the BPTT chain?
< zoq> Have you seen my messages: http://www.mlpack.org/irc/? Both, it breaks the sequence but also defines the amount of steps.
< sumedhghaisas> ahh no I didn't get them. Sorry
< sumedhghaisas> Give me a moment to go through them
< sumedhghaisas> ahhh makes sense now...
< sumedhghaisas> so that was my confusion... the rho of the model can be different from the rho of the layers, right?
< zoq> yes
< zoq> as I said the name is kinda misleading
< sumedhghaisas> okay then I have a reason to believe that the implementation of Backward might be wrong
< zoq> open for ideas to rename the parameter
< sumedhghaisas> so in Backward of LSTM we reset backwardStep to 0
< sumedhghaisas> and then access outParameter.size() - backwardStep - 1...
< sumedhghaisas> so we are technically accessing only the last 'rho' section...
< sumedhghaisas> always...
< sumedhghaisas> But Forward is implemented correctly to support multiple 'rho' sections in a sequence
< sumedhghaisas> that was another reason why I had this confusion
< zoq> hm, I can't find outParameter.size() - backwardStep - 1
< zoq> can you point me to the line?
< zoq> backwardStep should be increased in each step
< sumedhghaisas> okay wait...
< zoq> and backwardStep should be reset to zero if rho is reached
< sumedhghaisas> ahh sorry... it's cellParameter in LSTM
< sumedhghaisas> so here in this code we access the output to use in backward using backwardStep... but imagine 'rho' for the model is 100 and for the LSTM is 20
< sumedhghaisas> so there are 5 parts in BPTT
< sumedhghaisas> but cellParameter.size() - backwardStep - 1 will always give the last 'rho' part
< sumedhghaisas> cause we are resetting the backwardStep
< zoq> yes, you are right, model rho != layer rho is something that isn't handled at the moment.
< sumedhghaisas> ahh okay. Okay so that doubt is clear for me now
< sumedhghaisas> and I shifted the implementation to list with my last commit ... it is slower for fixed rho...
< sumedhghaisas> but according to my calculation it should be faster on variable rho...
< zoq> I can't see an easy fix for the problem and I'm not sure it's that important.
< zoq> hm, okay, is the difference huge?
< sumedhghaisas> 2 secs
< zoq> oh, okay
< sumedhghaisas> but wait... I compared it with the older build
< sumedhghaisas> 1.5 sec
< sumedhghaisas> cause changing the parameters has a huge effect on the runtime
< sumedhghaisas> wait... lets see if boost::list is faster than std::list
< zoq> yes, I guess we could use cube if rho is fixed
< zoq> and provide some alias e.g. GRUConst or something like that
< zoq> do you test in debug or release mode?
< sumedhghaisas> oops didn't check that... I guess the default is Debug right?
< sumedhghaisas> maybe the difference is not that much in Release
< sumedhghaisas> okay. WIll check that and get back to you...
< zoq> release
< zoq> is the default
< sumedhghaisas> ahh then I checked in release...
< sumedhghaisas> okay... 2 different implementations then
< sumedhghaisas> same for LSTM?
< sumedhghaisas> another thing to discuss. So Evaluate is called in the Gradient call... why is that?
< zoq> if we go that way
< sumedhghaisas> which? GRUConst?
< zoq> Gradient (Backward pass Gradient calculation) is called inside the optimizer class and Evaluate performs the Forward pass.
< zoq> yes, GRUConst
< sumedhghaisas> yes but Evaluate is also called in the optimizer right before the gradient.
< sumedhghaisas> So why do we need to call it again in Gradient? I am confused
< sumedhghaisas> It might be relevant if we are using the parameters that are being passed
< zoq> It's only called to get the initial performance
< sumedhghaisas> umm sorry? didn't get that
< sumedhghaisas> so gradient descent based optimizers will call Evaluate - Gradient - Evaluate - Gradient - .... In this order right?
< zoq> ah, gradient descent, I was talking about sgd
< zoq> let's see
< sumedhghaisas> is it different for sgd? let me see
< zoq> looks the same
< sumedhghaisas> ahh... but the first call to Gradient will fail
< sumedhghaisas> cause the calls are Gradient - Evaluate - Gradient - Evaluate -...
< zoq> is the first Gradient call the one of the optimizer or the inside the RNN class?
< sumedhghaisas> ummm... sorry didn't get that
< zoq> one step should be Evaluate (Forward) - Gradient (Backward/Gradient) right?
< sumedhghaisas> yup...
< sumedhghaisas> that's what I was thinking
< sumedhghaisas> but the Gradient function inside RNN... also internally calls Evaluate
< sumedhghaisas> I think that may be unnecessary
< zoq> but when we remove that step who is calling the Forward pass?
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#2567 (master - 9dab9c6 : Ryan Curtin): The build is still failing.
travis-ci has left #mlpack []
< sumedhghaisas> the optimizer... it is calling the Evaluate...
< zoq> yes, to keep track of the progress, after we updated the parameters
< zoq> maybe I missed something :)
< zoq> we are talking about this line:
< zoq> and this:
< zoq> right?
< sumedhghaisas> yup... so imagine the second iteration of sgd... the first is a little bit tricky to handle
< sumedhghaisas> so for the second iteration... the Gradient will be called first... right?
< sumedhghaisas> although if you think about it... we don't have to run the Forward pass there as it is already done in the first iteration of sgd... for the special case where the number of functions is 1
< zoq> yes, in this case the extra Evaluation function is unnecessary
< sumedhghaisas> there is another extra call... so at the end of sgd ... the overall objective is computed again...
< sumedhghaisas> it would be for each training sequence....
< zoq> To get the overall performance over the complete dataset.
< sumedhghaisas> but for RNN we are calling Train on each training sequence individually right?
< zoq> So do you say since we call Evaluate after the Gradient step in iteration > 1 we don't have to call Evaluate in the Gradient step inside the model?
< sumedhghaisas> in this scenario... yes
< zoq> we don't necessarily call Train for each sequence, I did that because the RNN class can't handle arma::field
< sumedhghaisas> yeah... I was thinking the same
< zoq> if it could handle arma::field we could avoid calling Evaluate twice in each step
< sumedhghaisas> yes... that's what I was going to suggest... took too long, sorry :)
< zoq> I think I get your point, you are talking about training a single sequence
< sumedhghaisas> but still the problem remains...
< sumedhghaisas> still we have 2 Evaluate calls in 1 iteration...
< sumedhghaisas> So imagine... SGD starts with 1 iteration
< sumedhghaisas> 1 training example
< zoq> yes
< sumedhghaisas> and in that iteration both the calls are happening with the same example...
< sumedhghaisas> so they are the same
< zoq> yes, so we could handle that inside the RNN class, right?
< sumedhghaisas> yeah... But I could not figure out how. No elegant solution I can see
< zoq> and avoid the extra Evaluate call if numfunctions = 1?
< sumedhghaisas> ohh.. but we avoid that with support for field, right?
< sumedhghaisas> I was assuming the numFunctions will always be more than 1
< zoq> I guess it would still be useful
< sumedhghaisas> okay. That case is easy to handle. But how to avoid the extra call when numFunctions are greater than 1?
< zoq> for sgd we pass the index of the current sample, so we could use that parameter to keep track of already evaluated samples?
< sumedhghaisas> ahh yes...
< zoq> but for gd, hm ...
< sumedhghaisas> true... gd will pose a problem
< sumedhghaisas> what about this?
< sumedhghaisas> so we create 2 different Evaluate functions
< sumedhghaisas> 1 always returns the last Forward pass result
< sumedhghaisas> and 1 actually does the forward pass
< sumedhghaisas> according to all the current optimizers that can be associated with ANN
< sumedhghaisas> this method will work
< zoq> good idea
< sumedhghaisas> and this works also for FNN and CNN
< zoq> yes, right, nice solution
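(Rough sketch of the idea just agreed on, with illustrative names rather than the real RNN interface: Gradient() caches the objective from the Forward pass it already performs, and a cheap Evaluate() variant returns that cached value instead of running another Forward pass.)

    #include <mlpack/core.hpp>

    class CachedObjectiveSketch
    {
     public:
      // Full evaluation: run a Forward pass and cache the result.
      double Evaluate(const arma::mat& parameters, const size_t i)
      {
        return currentObjective = ForwardPass(parameters, i);
      }

      // Gradient caches the objective from its own Forward pass, so the
      // optimizer's follow-up Evaluate() can reuse it.
      void Gradient(const arma::mat& parameters, const size_t i,
                    arma::mat& gradient)
      {
        currentObjective = ForwardPass(parameters, i);
        BackwardPass(gradient);
      }

      // Cheap variant: hand back the last Forward pass result, no recompute.
      double LastEvaluate() const { return currentObjective; }

     private:
      // Placeholders for the real network passes.
      double ForwardPass(const arma::mat& /* parameters */,
                         const size_t /* i */) { return 0.0; }
      void BackwardPass(arma::mat& /* gradient */) { }

      double currentObjective = 0.0;
    };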
< sumedhghaisas> okay so I will try to fix this first. Task number 1
< sumedhghaisas> So now to batch norm... sorry I am taking too much of your time :)
< zoq> nah, no worries
< sumedhghaisas> okay so I didn't quite get how you want me to determine if the matrix is 2D or 3D? I mean at runtime
< zoq> I think you can figure it out by looking at inputHeight/outputWidth, which is set automatically
< sumedhghaisas> ahh... that is done at the initialization you mean?
< zoq> yes
< sumedhghaisas> so if inputHeight > 1 then 3D or else 2D?
< zoq> I'm not quite sure that's enough information; yes, if inputHeight/inputWidth > 1 -> 3D
< sumedhghaisas> ahh okay. Got it.
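(Minimal sketch of the runtime check just settled on, assuming the layer stores inputWidth/inputHeight the way other mlpack ann layers do.)

    #include <cstddef>

    // Flat (2D) inputs leave the width/height hints at 1; outputs coming from
    // a convolutional layer set them to the image dimensions.
    bool IsSpatialInput(const size_t inputWidth, const size_t inputHeight)
    {
      return (inputWidth > 1 || inputHeight > 1);
    }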
< sumedhghaisas> Okay that task I can do now.
< zoq> Maybe there is a better solution, we should think about it.
< sumedhghaisas> you mean static time check?
< zoq> yes
< sumedhghaisas> I will think about it. Okay so on a final note
< sumedhghaisas> I will implement GRUConst right now
< sumedhghaisas> maybe I can try creating a variable length one... do you have a test in mind for that?
< sumedhghaisas> I can try tonight
< zoq> Konstantin is working on a neat test over here: https://github.com/mlpack/mlpack/pull/1005
< sumedhghaisas> ahh he is also implementing the Neural Turing Machine?
< zoq> he is working on HAM
< sumedhghaisas> ahh okay. Okay I saw those tests but can they be achieved without memory?
< sumedhghaisas> I just wanted to test the variable length for LSTM and GRU
< sumedhghaisas> can I create a variable length reber grammar test?
< zoq> Hierarchical Attentive Memory - LSTM/GRU should handle sequences up to >= 50
< sumedhghaisas> ahh ... so they should pass the add test... okay will try that.
< sumedhghaisas> and his tests are going to help me a lot in NTM :)
< zoq> yes, maybe we can add even more tasks in the future
< zoq> and yes I think you could also use the Reber grammar test
< zoq> I have to check the data, the size should be variable
kris1 has quit [Quit: kris1]