verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
witness_ has quit [Quit: Connection closed for inactivity]
vivekp has joined #mlpack
witness_ has joined #mlpack
< ShikharJ> zoq: Are you there?
< jenkins-mlpack> Yippee, build fixed!
< jenkins-mlpack> Project docker mlpack nightly build build #366: FIXED in 2 hr 57 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/366/
< zoq> ShikharJ: Great results, will take a look at the PR later today.
witness_ has quit [Quit: Connection closed for inactivity]
< ShikharJ> zoq: I was hoping we could discuss the design for the DualOptimizer PR. If you're free later in the day, please ping me then.
< zoq> ShikharJ: If you like we can talk about the design now.
< Atharva> zoq: For VAE models, MNIST data is needed. Should we store it as csv in the models repo, or should we store it in the original idx-ubyte format?
< Atharva> Another option would be to download it the first time someone builds the models repo.
< ShikharJ> Atharva: I have the transposed MNIST data available in csv format, let me know if you need it.
< Atharva> ShikharJ: What do you mean by transposes here?
< Atharva> transposed*
< ShikharJ> Every image is a column of size 784. The original dataset had images laid across rows.
< zoq> Atharva: csv is fine; hdf5 might be another option. We could compress the dataset and uncompress it as a build step, something like: https://github.com/mlpack/mlpack/blob/6fd5e527b54bf83993f98f8ea894734aa620bb62/src/mlpack/tests/CMakeLists.txt#L185-L189
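For readers of the log: the build-step idea zoq links could look roughly like this in the models repo's CMakeLists.txt (a sketch modeled on the linked mlpack snippet; the target name and archive filename are assumptions):

```cmake
# Sketch: keep only the compressed archive in the repo and extract it
# at build time, so nobody has to commit the raw csv.
add_custom_command(TARGET mlpack_models
    POST_BUILD
    COMMAND ${CMAKE_COMMAND} -E tar xzf ${CMAKE_CURRENT_SOURCE_DIR}/mnist.tar.gz
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    COMMENT "Extracting MNIST dataset...")
```

Using `${CMAKE_COMMAND} -E tar` avoids requiring a separate `unzip`/`tar` binary on the build machine.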
< ShikharJ> So the transposed dataset has dimensions 784 x 70,000 instead of the original 70,000 x 784.
< Atharva> ShikharJ: got it :), I thought by transposed you meant something else in this case
< Atharva> zoq: Thanks, can armadillo load hdf5 files?
< zoq> Atharva: Yes, but you have to build armadillo with hdf5 support.
< Atharva> zoq: Okay, if hdf5 isn't much smaller than csv, I guess it's better to go with csv, as otherwise people will have to build armadillo differently for this one task.
< zoq> Atharva: Agreed.
< Atharva> ShikharJ: Where do you have it? on some repo?
< ShikharJ> Atharva: On my laptop, and on savannah server.
< Atharva> Can you give me the link?
< ShikharJ> Atharva: If there's a server you need it on, I can scp the zip file?
< ShikharJ> Atharva: The zip is about 17 MBs, so maybe I can send it over mail as well
< zoq> ShikharJ: We can put it in the jenkins folder.
< ShikharJ> zoq: I'm not sure if I understand what you mean by jenkins folder?
< ShikharJ> Did you mean jenkins-conf repository?
< zoq> If we move the file into the jenkins workspace, someone can download it over http.
< zoq> This is just another possibility.
< ShikharJ> zoq: Ideally we should try and upload the dataset directly to mlpack/models?
< zoq> ShikharJ: Agreed, what's the size of the compressed dataset (tar.gz)?
< ShikharJ> zoq: I only have the zip file with me, and that's about 17 MBs.
< ShikharJ> I'm not sure whether a tarball would have a significantly different size.
< zoq> ShikharJ: Right, it's just that on some systems you have to install unzip or something like that to extract the archive.
< ShikharJ> zoq: I'll upload the tarball to mlpack/models. Atharva, you can download from there then.
< zoq> ShikharJ: Great, thanks!
< Atharva> ShikharJ: Thanks!
< ShikharJ> Interesting, the tarball is about 13.9 MBs in size.
< Atharva> How much had you thought it should be?
< ShikharJ> Atharva: I wasn't expecting it to be about 5/6th the size of a zip file.
< Atharva> Ohh, okay
< ShikharJ> zoq: I have opened a pull request for the dataset. Atharva, you can download from the same once it is merged.
< ShikharJ> zoq: Are you still around, maybe we can discuss the design of the dual optimizer further?
< zoq> ShikharJ: Sure, I'm here.
< ShikharJ> zoq: As far as I can see, when we create an optimizer object, it internally calls the optimize function. But what that optimize function does is not clear to me.
< ShikharJ> zoq: More specifically, what functionality does optimize() offer?
< zoq> ShikharJ: The optimizer will call the Evaluate function to get the current loss, in our case this will run the Forward pass. Afterwards the optimizer will call the Gradient function to get the gradients for the update step, so this will call the Forward/Backward and Gradient function in case of a network.
< zoq> See the example optimizer.
< zoq> In our case ObjectiveFunction is the GAN class.
< ShikharJ> zoq: Alright, so if we create two optimizers, we'll need to provide two Evaluate, two Gradient, and two Forward functions (because we're calling the discriminator and generator Forward at the same time inside GAN::Forward), right?
< zoq> That's an option, but we could use the same Evaluate function if we could distinguish between the functions somehow.
< zoq> ShikharJ: If you like I can write a dummy class with the idea I have in mind.
< ShikharJ> zoq: Hmm, I can't think of an approach that would be fast here; do you have an idea of how this can be achieved? I can only think of a template-based solution, but that would lead to runtime loss.
< ShikharJ> zoq: Please go ahead, I can't think of a good solution for this problem.
< zoq> Sure, maybe I missed something :) I'll see if I can put some time into the idea later today.
< ShikharJ> zoq: Also regarding the models directory, what name would you prefer? Maybe a generic name like datasets/ ?
< zoq> yeah, or data
< ShikharJ> I'll make the changes and push again.
< zoq> ShikharJ: Okay, thanks again.
petris has quit [Remote host closed the connection]
petris has joined #mlpack
petris has quit [Remote host closed the connection]
petris has joined #mlpack
witness_ has joined #mlpack
< ShikharJ> zoq: I was thinking, in the meantime, we can try completing the RBM PR?
< zoq> ShikharJ: I think that is a great idea :)
< ShikharJ> zoq: Great, I have updated the WGAN PR, and I'll update the blog with a post as well.
< zoq> ShikharJ: Awesome, I'll take a second look at the changes later today; we should be able to merge the code in the next few days.
< Atharva> ShikharJ: The dataset you uploaded doesn't have the labels; we don't need them for generative models, but I guess it would be better to have them.
< Atharva> Or do you need them for GANs? Sorry, I'm not sure.
< ShikharJ> Atharva: Yeah, I did that to avoid problems with the GAN code. We can provide them separately; I have the original full dataset as well.
< Atharva> Oh cool, maybe you could commit the csv with labels to the PR :)
< Atharva> No rush, do it when you get time
< ShikharJ> Atharva: For most other cases, the kaggle_train_test_dataset.zip should suffice; I don't really see a need for adding labels separately, as I think it already has them.
< Atharva> Ahh yes, it has. No problem then.
vivekp has quit [Ping timeout: 245 seconds]
ImQ009 has joined #mlpack
petris has quit [Quit: Bye bye.]
petris has joined #mlpack
ImQ009 has quit [Quit: Leaving]
< Atharva> sumedhghaisas: You there?