ChanServ changed the topic of #mlpack to: "Due to ongoing spam on freenode, we've muted unregistered users. See http://www.mlpack.org/ircspam.txt for more information, or also you could join #mlpack-temp and chat there."
cjlcarvalho has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#5560 (master - d3e2227 : Ryan Curtin): The build has errored.
travis-ci has left #mlpack []
Shravan has joined #mlpack
< Shravan> Hello people
Shravan has quit [Client Quit]
davida has quit [Ping timeout: 256 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 246 seconds]
vivekp has joined #mlpack
mrohit[m] has quit [Ping timeout: 250 seconds]
mrohit[m] has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
< zoq> Shravan: Hello there!
vivekp has quit [Ping timeout: 252 seconds]
vivekp has joined #mlpack
robertohueso has joined #mlpack
vivekp has quit [Ping timeout: 252 seconds]
vivekp has joined #mlpack
davida has joined #mlpack
< davida> zoq: Regarding setting the initial value of rho and changing it during training, what is rho's impact on the cube input? Does the number of slices in the cube have to match rho?
cjlcarvalho has quit [Ping timeout: 246 seconds]
< zoq> The number of steps to backpropagate through time. The number of slices should be >= rho.
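(A minimal sketch of the relationship zoq describes, with placeholder sizes; it assumes mlpack's usual cube layout of input dimension x sequences x time steps, where each slice is one time step.)

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/rnn.hpp>

using namespace mlpack::ann;

// Each slice of the cube is one time step; rho (the BPTT length) should
// not exceed the number of slices.
const size_t inputSize = 10, numSequences = 100, timeSteps = 20;
arma::cube X(inputSize, numSequences, timeSteps, arma::fill::zeros);

const size_t rho = 20;   // rho <= X.n_slices
RNN<> model(rho);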
< davida> zoq: Thx.
< ShikharJ> rcurtin: Are you there?
< rcurtin> yeah---sort of
< rcurtin> I am at a doctor's appointment this morning so lots of waiting...
< davida> zoq: I am setting model.Rho() = currentRho (size of my word - always shorter than number of slices in my cube) before I call model.Train(X,Y, optimizer). My optimizer is SGD with a clipped VanillaUpdate with batchsize and nbrIterations set to 1. I think that is all correct but I am getting a matrix multiplication error 50x1 and 0x0. All I have really done is reduced my list of 1500 datapoints down to 1. Any idea what might be wrong?
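(Roughly the failing setup described above, in sketch form; the variable names are placeholders and the exact clipped-update optimizer construction is elided since it isn't shown in the log.)

```cpp
// currentRho is the length of the current word, always shorter than the
// number of slices in the input cube X.
model.Rho() = currentRho;       // shorten BPTT for this sequence
model.Train(X, Y, optimizer);   // SGD with a clipped VanillaUpdate,
                                // batch size 1, one iteration
```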
davida has quit [Ping timeout: 256 seconds]
davida has joined #mlpack
< davida> zoq: apologies if you replied to my last question. I got disconnected so lost all history. I am having difficulty with changing Rho on the RNN. I'm getting a matrix multiplication error, 50x1 and 0x0. Not sure how it ends up with a 0x0 matrix. The 50x1 makes sense as I have 50 nodes and only one datapoint now.
< rcurtin> davida: just FYI the channel is logged at http://www.mlpack.org/irc/ so you can see any responses even if you are not in the channel
< rcurtin> (in this case zoq didn't reply while you were out of the channel)
< davida> rcurtin: Thx.
< davida> zoq: I have narrowed the problem down to the line: model.Rho() = currentRho. When I comment this line out it works fine (i.e. uses all slices in my cube). Is there something else that needs to be done to update Rho on the model?
< davida> zoq: some additional info after debugging. I tried setting my "currentRho" to different numbers, say 10, 15, 20. All failed with the matrix multiplication problem. The only number that works is when "currentRho" matches the original "rho" set when creating the model with RNN<> model(rho). It looks like somewhere in the RNN code the original rho is being used and not reset when the user changes it.
< ShikharJ> rcurtin: Does every parameter in a class that is used in the Forward and Evaluate routines (and is not updated in Reset()) need to be serialized?
< ShikharJ> rcurtin: I see that in the GAN class we're making use of certain variables in Forward() and Evaluate() which are not serialized; these either need to be set using Reset() or need to be explicitly serialized, right?
< rcurtin> ShikharJ: right, exactly. serialize() should be able to take an entirely uninitialized object and restore it to the same state as what's been saved
< rcurtin> it's not always true that every parameter will need to be serialized, since some (like "reset" for instance) may always be one value after being loaded
< rcurtin> but probably most of those other GAN parameters need to be serialized. It's not too hard to write a serialization test, which is often helpful for debugging problems with serialization
< rcurtin> for the more complex classes good serialization tests can be hard though (which is actually the reason for that serialization PR fix in the first place---the serialization didn't handle a case we weren't testing for)
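(A sketch of the round-trip pattern such a test uses; the names here are illustrative, and it assumes the type is Boost.Serialization-compatible, as mlpack's serialize() members are.)

```cpp
#include <sstream>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/nvp.hpp>

// Save `original`, then restore into a default-constructed `restored`;
// if serialize() covers all necessary state, the two should now behave
// identically (e.g. produce the same Forward()/Evaluate() results).
template<typename T>
void CheckRoundTrip(T& original, T& restored)
{
  std::stringstream stream;
  {
    boost::archive::text_oarchive oa(stream);
    oa << BOOST_SERIALIZATION_NVP(original);
  }
  {
    boost::archive::text_iarchive ia(stream);
    ia >> boost::serialization::make_nvp("original", restored);
  }
}
```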
< ShikharJ> rcurtin: Alright, that makes sense to me. I'll send a PR for the GAN-related changes tomorrow?
< ShikharJ> rcurtin: When do you plan on doing the release?
< ShikharJ> I think we should merge your PR for now.
< rcurtin> ShikharJ: probably late this week or early next week; there are just a few outstanding bugs that still need to be handled
< rcurtin> I've been pretty swamped fixing lots of bugs and also working on ensmallen so I have been less productive towards this release than I hoped :)
< davida> How can I train my RNN, then get it to predict an outcome based on a shorter set of time-steps? I don't want to pad my input time slices. Is there a way to do this?
< davida> I basically need to train and utilise my RNN with variable-length inputs and get predictions based on variable-length inputs as well. However, it seems that when I pad with zeros I cannot get the network to train at all.
< davida> I could get it to work by hand coding the Forward and Backward propagation steps but I want to use MLPACK for efficiency and also for future more complex networks.
< davida> What options are there?
< rcurtin> davida: sorry that I haven't been following the conversation much, and I don't know if I will be too helpful here because I am not too familiar with the RNN implementation
< rcurtin> but what happens if you pass in non-padded data that has less than rho slices?
< davida> The software gives me matrix multiplication errors (mentioned above to zoq but I guess he is busy).
< davida> And Rho is only for BPTT.
< davida> In the predict mode, we are doing forward propagation only.
< rcurtin> line 160 of rnn_impl.hpp seems to loop between 0 and rho for each point to do prediction
< rcurtin> I suspect that if this was changed to 'seqNum < predictors.n_slices' instead of 'seqNum < rho', then it would work in the way you want it to
< rcurtin> however, I am not sure---zoq will have to verify if this would be a reasonable change to make
< davida> I did see that, but somewhere there is an exception being thrown whenever rho != n_slices
< rcurtin> hm, in this case, I am not sure; let's wait for his response
< rcurtin> sorry I am not more helpful here...
< davida> ok, thx. Is zoq in European time?
< rcurtin> yeah, he is in Berlin
< robertohueso> Oh I was in Berlin this weekend :)
< rcurtin> I have never been, but I would love to go at some point in the future. I hear it is a beautiful city and a lot of fun to visit
< zoq> davida: Besides setting seqNum < predictors.n_slices you also have to modify the rho parameter inside the recurrent layer. You could manually reset the value if you train one epoch at a time.
< zoq> Ideally we would reset the rho from within the RNN class. I'll see if I can do this in the next few days. Supporting arbitrary sequence lengths is definitely useful.
< davida> zoq: Thx. My recurrent layer is defined like this: Recurrent<>* recurrent = new Recurrent<>( add, lookup, linear, sigmoidLayer, rho); How do you change rho based on the pointer *recurrent ?
< davida> I actually thought the model(rho) and the recurrent( ..., ..., ..., ..., rho) were the same thing.
< zoq> davida: Add size_t& Rho() { return rho; } to the recurrent layer and do recurrent->Rho() after the size changes; same for the RNN class.
< davida> I see. I need to change the implementation.
< zoq> davida: It's the same idea, but in some cases you might like to use a different value for each layer or model.
< zoq> davida: Unfortunately, yes.
< rcurtin> zoq: happy to wait on a patch to release 3.0.4, or alternately we can release 3.0.5 shortly after 3.0.4 with the fix
< davida> OK - so I need to change both the model.Rho() and then the recurrent->Rho() if I wish to limit the number of slices used in Backward. Does this also affect Predict()?
< zoq> davida: Predict as well, right.
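(Putting the suggestion together as a sketch: it assumes the Recurrent<> layer gains the Rho() accessor zoq describes, and reuses the names from davida's snippet above.)

```cpp
// Accessor to add inside the Recurrent<> layer (the member is assumed to
// be the layer's existing `rho` field):
//   size_t& Rho() { return rho; }

// Then, whenever the sequence length changes:
recurrent->Rho() = currentRho;   // BPTT length of the recurrent layer
model.Rho() = currentRho;        // BPTT length of the RNN<> model
model.Train(X, Y, optimizer);    // or model.Predict(X, predictions);
                                 // Predict() is affected as well
```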
< zoq> davida: Not sure what your plans are, but I like the idea of doing the Coursera homework in mlpack and publishing each one as a simple tutorial.
< zoq> rcurtin: Absolutely, I think I can figure this out in the next few days.
< zoq> davida: Not sure if it's okay to publish solutions for the 'homework'?
< davida> That is what I am trying to do: converting the code from Python/TensorFlow to C++/mlpack.
< davida> I could ask Andrew Ng if he is OK with us doing that.
< zoq> ah nice, I'll let you know once I have a patch
< zoq> davida: Thanks, let's see what he thinks about the idea.