ChanServ changed the topic of #mlpack to: "Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or also you could join #mlpack-temp and chat there."
cjlcarvalho has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#5560 (master - d3e2227 : Ryan Curtin): The build has errored.
travis-ci has left #mlpack []
Shravan has joined #mlpack
< Shravan> Hello people
Shravan has quit [Client Quit]
davida has quit [Ping timeout: 256 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 246 seconds]
vivekp has joined #mlpack
mrohit[m] has quit [Ping timeout: 250 seconds]
mrohit[m] has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
< zoq> Shravan: Hello there!
vivekp has quit [Ping timeout: 252 seconds]
vivekp has joined #mlpack
robertohueso has joined #mlpack
vivekp has quit [Ping timeout: 252 seconds]
vivekp has joined #mlpack
davida has joined #mlpack
< davida> zoq: Regarding setting the initial value of rho and changing it during training, what is rho's impact on the cube input? Does the number of slices in the cube have to match rho?
cjlcarvalho has quit [Ping timeout: 246 seconds]
< zoq> The number of steps to backpropagate through time. The number of slices should be >= rho.
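(A minimal sketch of the relationship zoq describes, with placeholder sizes; it assumes mlpack's usual cube layout of input dimension x sequences x time steps, where each slice is one time step.)

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/rnn.hpp>

using namespace mlpack::ann;

// Each slice of the cube is one time step; rho (the BPTT length) should
// not exceed the number of slices.
const size_t inputSize = 10, numSequences = 100, timeSteps = 20;
arma::cube X(inputSize, numSequences, timeSteps, arma::fill::zeros);

const size_t rho = 20;   // rho <= X.n_slices
RNN<> model(rho);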
< davida> zoq: Thx.
< ShikharJ> rcurtin: Are you there?
< rcurtin> yeah---sort of
< rcurtin> I am at a doctor's appointment this morning so lots of waiting...
< davida> zoq: I am setting model.Rho() = currentRho (size of my word - always shorter than number of slices in my cube) before I call model.Train(X,Y, optimizer). My optimizer is SGD with a clipped VanillaUpdate with batchsize and nbrIterations set to 1. I think that is all correct but I am getting a matrix multiplication error 50x1 and 0x0. All I have really done is reduced my list of 1500 datapoints down to 1. Any idea what might be wrong?
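(Roughly the failing setup described above, in sketch form; the variable names are placeholders and the exact clipped-update optimizer construction is elided since it isn't shown in the log.)

```cpp
// currentRho is the length of the current word, always shorter than the
// number of slices in the input cube X.
model.Rho() = currentRho;       // shorten BPTT for this sequence
model.Train(X, Y, optimizer);   // SGD with a clipped VanillaUpdate,
                                // batch size 1, one iteration
```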
davida has quit [Ping timeout: 256 seconds]
davida has joined #mlpack
< davida> zoq: apologies if you replied to my last question. I got disconnected so lost all history. I am having difficulty with changing Rho on the RNN. I'm getting a matrix multiplication error, 50x1 and 0x0. Not sure how it ends up with a 0x0 matrix. The 50x1 makes sense as I have 50 nodes and only one datapoint now.
< rcurtin> davida: just FYI the channel is logged at http://www.mlpack.org/irc/ so you can see any responses even if you are not in the channel
< rcurtin> (in this case zoq didn't reply while you were out of the channel)
< davida> rcurtin: Thx.
< davida> zoq: I have narrowed the problem down to the line: model.Rho() = currentRho. When I comment this line out it works fine (i.e. uses all slices in my cube). Is there something else that needs to be done to update Rho on the model?
< davida> zoq: some additional info after debugging. I tried setting my "currentRho" to different numbers, say 10, 15, 20. All failed with the matrix multiplication problem. The only number that works is when "currentRho" matches the original "rho" set when creating the model with RNN<> model(rho). It looks like somewhere in the RNN code the original rho is being used and not reset when the user changes it.
< ShikharJ> rcurtin: Does every parameter in a class that is used in the Forward and Evaluate routines (and is not updated in Reset()) need to be serialized?
< ShikharJ> rcurtin: I see that in the GAN class we're making use of certain variables in Forward() and Evaluate() which are not serialized; these either need to be set using Reset() or need to be explicitly serialized, right?
< rcurtin> ShikharJ: right, exactly. serialize() should be able to take an entirely uninitialized object and restore it to the same state as what's been saved
< rcurtin> it's not always true that every parameter will need to be serialized, since some (like "reset" for instance) may always be one value after being loaded
< rcurtin> but probably most of those other GAN parameters need to be serialized. It's not too hard to write a serialization test, which is often helpful for debugging problems with serialization
< rcurtin> for the more complex classes good serialization tests can be hard though (which is actually the reason for that serialization PR fix in the first place---the serialization didn't handle a case we weren't testing for)
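(A sketch of the round-trip pattern such a test uses; the names here are illustrative, and it assumes the type is Boost.Serialization-compatible, as mlpack's serialize() members are.)

```cpp
#include <sstream>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/nvp.hpp>

// Save `original`, then restore into a default-constructed `restored`;
// if serialize() covers all necessary state, the two should now behave
// identically (e.g. produce the same Forward()/Evaluate() results).
template<typename T>
void CheckRoundTrip(T& original, T& restored)
{
  std::stringstream stream;
  {
    boost::archive::text_oarchive oa(stream);
    oa << BOOST_SERIALIZATION_NVP(original);
  }
  {
    boost::archive::text_iarchive ia(stream);
    ia >> boost::serialization::make_nvp("original", restored);
  }
}
```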
< ShikharJ> rcurtin: Alright, that makes sense to me. I'll send a PR for the GAN-related changes tomorrow?
< ShikharJ> rcurtin: When do you plan on doing the release?
< ShikharJ> I think we should merge your PR for now.
< rcurtin> ShikharJ: probably late this week or early next week; there are just a few outstanding bugs that still need to be handled
< rcurtin> I've been pretty swamped fixing lots of bugs and also working on ensmallen so I have been less productive towards this release than I hoped :)
< davida> How can I train my RNN, then get it to predict an outcome based on a shorter set of time-steps? I don't want to pad my input time slices. Is there a way to do this?
< davida> I basically need to train and utilise my RNN with variable-length inputs and get predictions based on variable-length inputs as well. However, it seems that when I pad with zeros I cannot get the network to train at all.
< davida> I could get it to work by hand coding the Forward and Backward propagation steps but I want to use MLPACK for efficiency and also for future more complex networks.
< davida> What options are there?
< rcurtin> davida: sorry that I haven't been following the conversation much, and I don't know if I will be too helpful here because I am not too familiar with the RNN implementation
< rcurtin> but what happens if you pass in non-padded data that has less than rho slices?
< davida> The software gives me matrix multiplication errors (mentioned above to zoq but I guess he is busy).
< davida> And Rho is only for BPTT.
< davida> In the predict mode, we are doing forward propagation only.
< rcurtin> line 160 of rnn_impl.hpp seems to loop between 0 and rho for each point to do prediction
< rcurtin> I suspect that if this was changed to 'seqNum < predictors.n_slices' instead of 'seqNum < rho', then it would work in the way you want it to
< rcurtin> however, I am not sure---zoq will have to verify if this would be a reasonable change to make
< davida> I did see that, but somewhere there is an exception being thrown whenever rho != n_slices
< rcurtin> hm, in this case, I am not sure; let's wait for his response
< rcurtin> sorry I am not more helpful here...
< davida> ok, thx. Is zoq in European time?
< rcurtin> yeah, he is in Berlin
< robertohueso> Oh I was in Berlin this weekend :)
< rcurtin> I have never been, but I would love to go at some point in the future. I hear it is a beautiful city and a lot of fun to visit
< zoq> davida: Besides setting seqNum < predictors.n_slices you also have to modify the rho parameter inside the recurrent layer. You could manually reset the value if you train one epoch at a time.
< zoq> Ideally we would reset the rho from within the RNN class. I'll see if I can do this in the next few days. Supporting arbitrary sequence lengths is definitely useful.
< davida> zoq: Thx. My recurrent layer is defined like this: Recurrent<>* recurrent = new Recurrent<>( add, lookup, linear, sigmoidLayer, rho); How do you change rho based on the pointer *recurrent ?
< davida> I actually thought the model(rho) and the recurrent( ..., ..., ..., ..., rho) were the same thing.
< zoq> davida: Add size_t& Rho() { return rho; } to the recurrent layer and do recurrent->Rho() after the size changes; same for the RNN class.
< davida> I see. I need to change the implementation.
< zoq> davida: It's the same idea, but in some cases you might like to use a different value for each layer or model.
< zoq> davida: Unfortunately, yes.
< rcurtin> zoq: happy to wait on a patch to release 3.0.4, or alternately we can release 3.0.5 shortly after 3.0.4 with the fix
< davida> OK - so I need to change both the model.Rho() and then the recurrent->Rho() if I wish to limit the number of slices used in Backward. Does this also affect Predict()?
< zoq> davida: Predict as well, right.
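(Putting the suggestion together as a sketch: it assumes the Recurrent<> layer gains the Rho() accessor zoq describes, and reuses the names from davida's snippet above.)

```cpp
// Accessor to add inside the Recurrent<> layer (the member is assumed to
// be the layer's existing `rho` field):
//   size_t& Rho() { return rho; }

// Then, whenever the sequence length changes:
recurrent->Rho() = currentRho;   // BPTT length of the recurrent layer
model.Rho() = currentRho;        // BPTT length of the RNN<> model
model.Train(X, Y, optimizer);    // or model.Predict(X, predictions);
                                 // Predict() is affected as well
```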
< zoq> davida: Not sure what your plans are, but I like the idea of doing the Coursera homework in mlpack and publishing each one as a simple tutorial.
< zoq> rcurtin: Absolutely, I think I can figure this out in the next few days.
< zoq> davida: Not sure if it's okay to publish solutions for the 'homework'?
< davida> That is what I am trying to do: converting the code from Python/TensorFlow to C++/mlpack.
< davida> I could ask Andrew Ng if he is OK with us doing that.
< zoq> ah nice, I'll let you know once I have a patch
< zoq> davida: Thanks, let's see what he thinks about the idea.