#mlpack on 2018-11-26 — irc logs at libera.irclog.whitequark.org

2018-11-12 22:39 ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"

00:00 mrohit[m] has quit [Remote host closed the connection]

00:10 mrohit[m] has joined #mlpack

06:24 vivekp has quit [Ping timeout: 250 seconds]

06:24 vivekp has joined #mlpack

06:50 Megatron3 has joined #mlpack

06:51 Megatron3 has quit [Client Quit]

07:12 < davida> rcurtin: The tests I have run so far on the Windows 10 setup with the latest pull of MLPACK (3.0.4 + any recent changes on the master) are showing no problems.

07:13 < davida> My DeepLearning.ai exercises are all converging correctly on Windows.

07:13 < davida> Now really hoping that zoq can release the change to RNN to allow different sized samples.

08:51 pd09041999 has joined #mlpack

09:16 zoq_ has joined #mlpack

09:18 gtank____ has joined #mlpack

09:25 gtank____ is now known as gtank___

10:33 adm64 has joined #mlpack

10:34 < adm64> is there a way to perform evaluation (e.g. r-squared) with mlpack? i'm using decision forest

11:09 zoq_ is now known as zoq

11:40 < davida> adm64: there is an MSE class in MLPACK with an Evaluate() function. Can this help you? http://www.mlpack.org/docs/mlpack-3.0.4/doxygen/classmlpack_1_1cv_1_1MSE.html
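
For the r-squared question above, a minimal sketch of the usual definition, R^2 = 1 - SS_res / SS_tot, written in plain C++ with the standard library only. The function name `RSquared` is hypothetical and is not part of mlpack; it assumes the true responses and the model's predictions have already been collected into two equally sized vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an mlpack function): coefficient of determination.
// R^2 = 1 - SS_res / SS_tot, where
//   SS_res = sum of squared residuals (truth - predicted),
//   SS_tot = sum of squared deviations of truth from its mean.
// If every true value is identical, SS_tot is zero and the result is undefined.
double RSquared(const std::vector<double>& truth,
                const std::vector<double>& predicted)
{
  assert(!truth.empty() && truth.size() == predicted.size());

  double mean = 0.0;
  for (double y : truth)
    mean += y;
  mean /= static_cast<double>(truth.size());

  double ssRes = 0.0;
  double ssTot = 0.0;
  for (std::size_t i = 0; i < truth.size(); ++i)
  {
    const double residual = truth[i] - predicted[i];
    const double deviation = truth[i] - mean;
    ssRes += residual * residual;
    ssTot += deviation * deviation;
  }

  return 1.0 - ssRes / ssTot;
}
```

Since the mean squared error is SS_res divided by the number of points, the same value could also be obtained from an MSE figure as 1 - MSE / (population variance of the true responses).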

11:41 < adm64> thanks @davida - anything in the command line? that's what we currently use

11:47 < ShikharJ> davida: That's good to hear!

11:49 pd09041999 has quit [Ping timeout: 268 seconds]

11:50 akhandait has joined #mlpack

11:53 < akhandait> zoq: Do we have some functionality for loading a dataset partially and continuously as we train a model? Or is loading the entire dataset at once before we begin training the only option? That leads to memory shortages and limits the size of models and datasets.

11:53 < akhandait> I am talking about something like the Dataloader functionality in pytorch

11:54 < davida> akhandait: Could you batch train in a loop (around model.Train()) and modify the dataset in each loop with the new input data?
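
A rough sketch of the chunked loop davida describes, in plain C++ with the standard library only. It only computes which slice of the dataset each loop iteration would cover; the helper name `ChunkRanges` is hypothetical, and the loading and training steps are left out because they depend on the model and storage format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split numPoints data points into consecutive
// half-open ranges [begin, end) of at most chunkSize points each.
// Each range is the slice one loop iteration would load and train on.
std::vector<std::pair<std::size_t, std::size_t>> ChunkRanges(
    std::size_t numPoints, std::size_t chunkSize)
{
  assert(chunkSize > 0);

  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  for (std::size_t begin = 0; begin < numPoints; begin += chunkSize)
  {
    // The last chunk may be shorter than chunkSize.
    ranges.emplace_back(begin, std::min(begin + chunkSize, numPoints));
  }
  return ranges;
}
```

In use, the training loop would iterate over the returned ranges, load only the columns in each range, and pass that slice to the training call. Whether repeated training calls continue from the current weights or start over is worth checking for the model class in question.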

11:56 < davida> adm64: Sorry, I am not familiar with the command line tools. I am using the MLPACK C++ library in my own code.

11:57 adm64 has quit [Quit: Page closed]

11:57 < akhandait> Yeah, but that will lead to a lot of overhead in loading the data each time. Also, we would need to break the dataset beforehand.

11:57 < akhandait> I don’t think that’s a tidy approach

11:57 < akhandait> We also can’t shuffle the data after each epoch then

11:58 < akhandait> davida:

11:59 < davida> akhandait: ... but isn't that what PyTorch Dataloader is basically doing under the hood anyway?

11:59 < davida> You can shuffle the data by setting the Shuffle = True in the optimizer flag.

11:59 < davida> optimizer parameter.

12:00 < davida> If I recall the code well, each loop thru' the total dataset will be shuffled.

12:01 < akhandait> davida: I really doubt that’s what the dataloader does, but I am not sure, I will check their source.

12:01 < davida> So if your dataset is 1000 and your batchsize is 100 and your maxIterations is 1000, each batch would get shuffled 100 times

12:02 < akhandait> But we will still need to break the dataset according to our batch size every time we want to train

12:02 pd09041999 has joined #mlpack

12:03 < davida> ... as it takes 10 batches of 100 to loop thru' your dataset. Excuse me, the maxIterations would need to be larger since, if I recall well, maxIterations is counted in individual samples rather than batches. So for 100 shuffles maxIterations would need to be 100000.
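
The arithmetic in davida's example can be written out as a short sketch. It assumes, as davida recalls rather than as confirmed optimizer behavior, that maxIterations counts individual data points and that the dataset is reshuffled once per full pass. The names `ShufflePlan` and `PlanShuffles` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of one training run.
struct ShufflePlan
{
  std::size_t batchesPerPass; // Batches needed to see every point once.
  std::size_t fullPasses;     // Complete passes (and so reshuffles) that fit.
};

// Assumption: maxIterations is measured in data points, not in batches.
ShufflePlan PlanShuffles(std::size_t numPoints,
                         std::size_t batchSize,
                         std::size_t maxIterations)
{
  assert(numPoints > 0 && batchSize > 0);

  ShufflePlan plan;
  // Round up so a final partial batch is still counted.
  plan.batchesPerPass = (numPoints + batchSize - 1) / batchSize;
  // Each full pass consumes numPoints iterations.
  plan.fullPasses = maxIterations / numPoints;
  return plan;
}
```

With 1000 points and a batch size of 100, this gives 10 batches per pass. A maxIterations of 1000 covers a single pass, while 100000 covers 100 passes, which matches the corrected figure in the message above.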

12:06 < akhandait> davida: I think that will work, but that will only shuffle the data in a batch and not the entire dataset before we make batches. So in every epoch, the batches will have the same data.

12:12 < davida> akhandait: I am not clear on what you mean by batch here. The code in the optimizer shuffles the entire dataset once per loop thru' the dataset. If you mean when you add more data to the dataset, then you could pre-shuffle the new dataset in the Armadillo matrix with some pretty simple code.

12:13 < davida> Use the arma::shuffle function

12:13 < davida> http://arma.sourceforge.net/docs.html#shuffle

12:18 < akhandait> That’s not quite what I was saying, but it’s okay. Thanks. I will try these things. :)

13:10 pd09041999 has quit [Ping timeout: 246 seconds]

13:23 pd09041999 has joined #mlpack

13:44 pd09041999 has quit [Ping timeout: 244 seconds]

13:45 < jenkins-mlpack2> Project docker mlpack nightly build build #135: STILL UNSTABLE in 8 hr 31 min: http://ci.mlpack.org/job/docker%20mlpack%20nightly%20build/135/

14:51 saurabh has joined #mlpack

14:52 saurabh has quit [Client Quit]

14:52 saurabh has joined #mlpack

15:00 saurabh97 has joined #mlpack

15:24 pd09041999 has joined #mlpack

15:25 pd09041999 has quit [Max SendQ exceeded]

16:05 saurabh has quit [Quit: Leaving]

16:05 saurabh97 has quit [Quit: Leaving]

16:30 akhandait has quit [Quit: Connection closed for inactivity]

18:28 < davida> zoq: Did you manage to make any progress on the RNN sequence inputs we talked about last week?

23:42 vivekp has quit [Ping timeout: 250 seconds]

23:45 vivekp has joined #mlpack