verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
kris__ has quit [Quit: Connection closed for inactivity]
govg has quit [Ping timeout: 240 seconds]
< lozhnikov> kris1: I ran your code last night; it doesn't work. Here is the result: https://usercontent.irccloud-cdn.com/file/NED2Qwcy/keras-results.png
< lozhnikov> kris1: I don't understand your idea. Right now we sample noise variables on each Gradient() call, and batchSize and noiseSize are completely different variables. Could you elaborate a bit?
govg has joined #mlpack
Ramnath has joined #mlpack
< Ramnath> I have just started working on mlpack
< Ramnath> the command mlpack_logistic_regression -t dataset.csv -v is not working for me
< Ramnath> I need help
kris1 has joined #mlpack
< kris1> lozhnikov: What I am trying to say is that the discriminator is trained on an m:1 ratio of real data to fake data. The problem, I feel, is that the discriminator is only learning real values. We should have an m:m ratio for the discriminator.
< kris1> Also, I tried the example by Eric Jang yesterday. It is learning too slowly, imho.
< kris1> I think I should change the implementation to use 2 training functions and check if that works. What do you think?
< lozhnikov> kris1: Why did you decide that we sample real data more often than noise data? We sample noise data on each Gradient() call
< kris1> Yup, but in each call to Gradient() we actually end up computing the real-data gradient m times and the fake-data gradient 1 time. So the ratio is still m:1 for every Gradient() call.
< lozhnikov> I think that the current optimizer architecture doesn't allow implementing 2 training functions without crutches
< lozhnikov> we get only one sample of real data at each call
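(A minimal sketch, with hypothetical names and not the actual mlpack GAN code, of the per-call picture lozhnikov describes: each Gradient() call touches one real column and one freshly sampled noise column, so the per-call real:fake ratio is 1:1.)

    #include <armadillo>

    // Hypothetical illustration only; names and structure are not mlpack's.
    const arma::uword kNoiseDim = 10;

    void GradientSketch(const arma::mat& realData, const arma::uword i)
    {
      arma::vec real = realData.col(i);               // the single real sample for this call
      arma::vec noise(kNoiseDim, arma::fill::randn);  // noise is resampled on every call
      // The discriminator sees `real` and G(noise); the generator sees only G(noise).
      // The forward/backward passes themselves are omitted from this sketch.
      (void) real; (void) noise;
    }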
< kris1> Hmmm, oh yes, sorry... you are right.
< kris1> Just one thing: while updating the generator, are the discriminator gradients zero?
< kris1> I.e., we should not train the discriminator when training the generator.
< lozhnikov> Ramnath: I guess you should specify the train labels, some test data and the output. For example:
< lozhnikov> ./bin/mlpack_logistic_regression -t train_nonlinsep.txt -l train_labels_nonlinsep.txt -T test_nonlinsep.txt -o output_test_labels.txt
< lozhnikov> Could you describe your issue in more detail?
< lozhnikov> kris1: in that case we omit a batch
< lozhnikov> kris1: as for me it is better to train the discriminator network too
< kris1> "In that case we omit a batch": can you explain this a bit more?
< kris1> The paper states that when training the generator we should not train the discriminator.
< lozhnikov> the optimizer calls the Gradient() function on a real sample. If we don't train the discriminator network, we omit that sample
< kris1> But I think that is okay... don't you think we could keep an offset value for going through that batch again?
< lozhnikov> I think we shouldn't omit samples. Could you explain your idea with the offset?
kris__ has joined #mlpack
< lozhnikov> What does the figure mean?
< kris__> It's the generated distribution vs. the real distribution.
< kris__> Sorry for the scales. I automated the plotting and forgot to take care of the scales.
keonkim has quit [Ping timeout: 246 seconds]
< kris__> What I meant is: whenever we train the generator, we could keep an offset value indicating that we skipped batches up to x. Then, when we are updating the discriminator, we could index with i - x. This would involve some bound checking, but it should be doable.
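(A rough sketch of the bookkeeping kris__ seems to be describing; all names here are hypothetical, not code from the GAN PR.)

    #include <cstddef>

    // While the generator is being trained, real batches are skipped; `offset`
    // (the "x" above) records how far we skipped, so the discriminator can later
    // index back into the real data as i - offset.
    std::size_t RealIndex(const std::size_t i, const std::size_t offset,
                          const std::size_t numRealCols)
    {
      // The bound checking kris__ mentions: don't underflow or run past the data.
      return (i >= offset) ? (i - offset) % numRealCols : 0;
    }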
keonkim has joined #mlpack
< lozhnikov> I think that's not a good idea
< lozhnikov> that's hard to debug and hard to understand
< kris__> Hmmm... well, I think we should give training the generator alone a try. Other than that I don't see any reason why the algorithm is not working. Do you have any ideas?
< lozhnikov> yeah, I do.
< lozhnikov> 1. It is reasonable to try to minimize -log(D(G(z))) (the fact that I didn't get results on the digit dataset doesn't mean that this technique won't work on the mnist dataset). This technique increases the gradient of the generator significantly.
< lozhnikov> 2. Try to vary some parameters.
< lozhnikov> 3. Try the Adam optimizer.
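(For reference, the two generator objectives behind point 1, in standard GAN notation; this is textbook material rather than anything specific to the mlpack implementation.)

    \min_G \; \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
      \quad \text{(original, saturating loss)}
    \qquad \text{vs.} \qquad
    \min_G \; \mathbb{E}_{z \sim p_z}\bigl[-\log D(G(z))\bigr]
      \quad \text{(alternative)}

The alternative gives much larger gradients early in training, when D(G(z)) is close to zero, which is the "increases the gradient of the generator" effect mentioned in point 1.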
< kris__> Hmmm, I think 1 looks reasonable. As for varying the parameters: the parameters I am using are from the examples, so I think they are correct. Moreover, I am only playing with the epoch parameter, since we are using the mini-batch optimizer while most implementations use the Adam optimizer.
< kris__> I don't think Adam would work for our case since it doesn't support batch training yet
< lozhnikov> of course I mean try to use mini-batch Adam
< kris__> Should I implement mini-batch Adam in that case?
< lozhnikov> If the first and the second point don't provide good results, that looks reasonable
< kris__> Okay, I will try that on the 1d gaussian example. It is the easiest and fastest.
< kris__> Also the implementation of the resize layer now works. Maybe you could check it out.
< lozhnikov> I'll look through the code today
mikeling has joined #mlpack
Ramnath has quit [Ping timeout: 260 seconds]
shikhar has joined #mlpack
shikhar has quit [Quit: WeeChat 1.4]
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< ironstark> zoq: rcurtin: I had a question regarding the data dump. I wanted to create a small data dump and work on that. Can I create one filled with random values?
< ironstark> I was thinking that since NBC is implemented in almost every library I'll use that algorithm
< ironstark> and fill a table with dummy values of runtime, accuracy, Precision and Recall
< ironstark> and then create some JS charts first and present them.. once we finalize the charts I will try to integrate them
< ironstark> Also, I needed to discuss how I should submit the work for the final evaluation
< zoq> ironstark: If you'd like to start with random numbers, sure, go ahead. Creating some reasonable values is simple through e.g. make METHODBLOCK=NBC LOG=True CONFIG=... . It probably makes sense to use sqlite instead of mysql as the driver; since sqlite is a simple file, there is no need to set up the database first. In this case you have to use driver : 'sqlite' instead of driver : 'mysql' in the config file.
< zoq> ironstark: About the final evaluation, I think a blog post is a good way to show what you did over the last weeks, including a description of what work was done, what code got merged, what code didn't get merged, and what's left to do. It's a good idea to start on the post early so that we have a chance to look it over.
< ironstark> Okay, I will write a blog post for weeks 8 to 11
< kris__> zoq: Why can't I use Adam with mini-batch SGD right now?
< kris__> The implementation of Adam right now seems consistent with update policies.
< kris__> So why can't i just use MiniBatchSGD< Adam, NoDecay>(....)
< zoq> kris__: There is no particular reason, in fact, there is an open issue: https://github.com/mlpack/mlpack/issues/1047 that not only makes sure we can train in batch mode but also consolidates SGD and MinibatchSGD.
< kris__> Hmmm, ok... then let me try that out and see what bugs I get from using that.
< zoq> okay, let me know if you need any help with the integration.
< kris__> zoq: The whole thing compiled. I used it in my test as well and it is working now. Should I be worried about its correctness?
< zoq> kris__: If it works for you, it should be fine, I can take a look if you like.
< kris__> Hmm, I updated the changes in the GAN PR. For the test, this is the file
< zoq> kris__: Looks good, also you don't have to define an alias 'using AdamBatchSGD = MiniBatchSGDType<AdamUpdate, NoDecay>;' you could just use 'MiniBatchSGDType<AdamUpdate, NoDecay>;'
< kris__> Hmmmm, I know... I will remove that.
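(A minimal sketch of the instantiation being discussed. The template usage MiniBatchSGDType<AdamUpdate, NoDecay> matches what zoq quotes above; the constructor arguments are assumed here to mirror MiniBatchSGD's (batchSize, stepSize, maxIterations, tolerance, shuffle), and the exact headers and defaults may differ.)

    #include <mlpack/core/optimizers/minibatch_sgd/minibatch_sgd.hpp>
    #include <mlpack/core/optimizers/adam/adam_update.hpp>

    using namespace mlpack::optimization;

    int main()
    {
      // Mini-batch SGD driven by the Adam update rule, with no step-size decay.
      // The numbers below are placeholders, not recommended GAN settings.
      MiniBatchSGDType<AdamUpdate, NoDecay> optimizer(
          10,      // batchSize
          0.001,   // stepSize
          100000,  // maxIterations
          1e-5,    // tolerance
          true);   // shuffle
      // The optimizer would then be passed to something like gan.Train(optimizer).
    }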
< kris__> lozhnikov: I have a question about the GAN: the discriminator loss should ideally go down in each iteration, and the generator loss should go up initially and then go down. Is that right?
< kris__> I am talking about the ideal situation here.
< lozhnikov> kris__: we don't compute the generator loss
< lozhnikov> the Evaluate() function returns the discriminator loss
< lozhnikov> when you train the generator, the discriminator loss increases
< kris__> When we train the generator we are getting this loss, right: -log(D(G(z))). I have changed the implementation so that it now optimizes this loss; earlier we were optimizing log(1 - D(G(z))).
< kris__> Is that right?
< kris__> I have tested it out and it works.
< kris__> I was comparing our loss at each iteration with their loss at each iteration.
< lozhnikov> actually, I am looking forward to the results on the mnist dataset. Did you try that?
< kris__> Hmmm, I sent you the code for that yesterday, right: gan_keras.cpp. You found that we were not learning anything.
< kris__> I thought you ran it on the train4.txt input.
< lozhnikov> yes, I have sent the result
< kris__> Yes, I saw that. train4.txt is actually an mnist example consisting of only 7's; if we are not able to learn that, I think the full mnist would be much harder.
< lozhnikov> Did you try the updated version on the same test?
< lozhnikov> I ran that code on the full mnist dataset. I didn't extract sevens since you didn't tell me to do that
< kris__> That's okay. The network should have learnt something.
< kris__> Ahhh, I did not try the updated version... because it takes time to train that on the machine. I wanted to check first that the implementation works on the easier gaussian example, and then run it.
< lozhnikov> I think it is reasonable to try the Digit dataset first
< lozhnikov> I didn't get good results on that but I didn't try the Adam optimizer
< kris__> But we don't have the parameters for the digits dataset. So I think it's better if we stick with the example that we have the parameters for.
< kris__> So, yes, we can use the Adam optimizer in the batch setting. So, regarding the points that you mentioned in the morning: 1. change to -log(D(G(z))), 2. parameter tuning, 3. Adam optimizer.
< lozhnikov> the Digit dataset doesn't take too much time for training
< kris__> So 1 and 3 are done.
< kris__> And now we don't need to tune the parameters since we have them.
< kris__> Okay, I will give it a try.
< kris__> Could you just have a look at the updated code though, just to be sure that it is correct?
< lozhnikov> ok, I'll look in the evening
< kris__> I do have a question though. Here are our losses for 4 epochs on the gaussian example I just mentioned: https://usercontent.irccloud-cdn.com/file/6eODESZM/output.txt
< lozhnikov> why is the discriminator loss equal to zero?
< kris__> Just a moment...
vivekp has joined #mlpack
< kris__> These are the losses from the tensorflow implementation...
< kris__> Regarding the discriminator loss being 0: while logging, I set it to 0 when the mini-batch changes. Only the logging variable though, not the actual loss.
< kris__> My question was that, as you can see, their discriminator loss goes down and then the generator loss goes up.
< kris__> I think that should be the case in our implementation also.
< lozhnikov> I think that's not necessarily the case. The discriminator loss grows as we train the generator and decreases as we train the discriminator
< kris__> Hmmm, yes, I think that makes some sense.
< kris__> But I think if the parameters are the same, our loss should be close to their implementation's.
< kris__> I will plot the generated curve and the real data curve and see.
< kris__> Our current implementation's discriminator loss increases and decreases rapidly after, say, 2000 iterations.
< kris__> That is wrong, I think.
< lozhnikov> that's a minimax game. In that case the loss function doesn't mean anything useful on its own. If the loss function grows, that only means that the generator is training
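(The minimax game lozhnikov refers to, in the standard GAN formulation: the two networks push the same value function in opposite directions, so neither loss is expected to decrease monotonically.)

    \min_G \max_D \; V(D, G)
      = \mathbb{E}_{x \sim p_\text{data}}\bigl[\log D(x)\bigr]
      + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]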
< kris__> Well, that's the thing: it doesn't keep growing, it goes up and down, like 1 ---> 9, in just one batch iteration.
< kris__> I can make a plot if you like.
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< kris__> On the digits dataset, should I run the program with the same set of parameters, or should I just use the parameters for the mnist dataset that we have from the keras example?
kris1 has quit [Quit: kris1]
< lozhnikov> kris__: that's a completely different dataset, therefore it requires completely different parameters
kris1 has joined #mlpack
kris1 has quit [Ping timeout: 248 seconds]
< kris__> lozhnikov: I got this result with these parameters for my changes...
< kris__> ./gan.o -i ./digits_train.arm -o ./output -v -s -e 500 -r 0.1 -g 2 -N 32 -b 10 -G 128 -D 128 -t 0
< kris__> I did not use the Adam optimizer, btw... I will try that now.
< lozhnikov> hmm... it seems the previous results were better than these. the first row doesn't look like digits
< kris__> Yes, I agree.
< kris__> I have changed the parameters a little in the example I am running now; let's see.
< kris__> There is another problem with the present implementation: the Evaluate() function actually evaluates the generator and discriminator on new points rather than the ones they were trained on. Also, the evaluation for the generator is wrong: we evaluate on both the real and the generated data. It should be only on the fake data.
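(A small hedged sketch of the fix kris__ is pointing at: the generator's objective should be computed only over generated samples, for example the mean of -log(D(G(z))) over the fake batch. The function below is illustrative only and not part of the GAN PR.)

    #include <armadillo>

    // `discOnFake` holds D(G(z)) for each sample of the fake batch.
    double GeneratorLoss(const arma::vec& discOnFake)
    {
      return -arma::mean(arma::log(discOnFake));
    }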
< kris__> I have a new solution in mind that is much closer to the paper and also uses your ideas.
< kris__> But I will try the present solution first; if that does not work, I will implement the new one.
< lozhnikov> currently the Evaluate() function doesn't calculate the generator loss
< lozhnikov> actually, the Evaluate() function doesn't influence the training algorithm
< kris__> I know that... but when I say closer to the algorithm, I mean alternating the training of the generator and the discriminator. Right now this is being done simultaneously.
< kris__> That is the part i am concerned with.
< kris__> My solution is that, instead of using a single column for the noise, we could use a noise matrix with trainData.n_cols columns. The predictors matrix would then have 2 * trainData.n_cols columns. We could follow the same procedure as right now, except that in the Gradient() function we would use trainData[i + offset] and fakeData[i + trainData.n_cols] when training the discriminator, where offset = #GenTrained * batchSize (the number of skipped batches), and when training
< kris__> the generator we would just use fakeData[i + trainData.n_cols].
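(A hypothetical sketch of the layout kris__ proposes: real columns first, then an equal number of generated/fake columns, with index arithmetic selecting the half the current update should see. Names are illustrative only; this is not the GAN PR's code, and the sketch assumes the fake samples share the real data's dimensionality.)

    #include <armadillo>

    // Real data in the first half of the predictors matrix, fake data in the second.
    arma::mat BuildPredictors(const arma::mat& trainData, const arma::mat& fakeData)
    {
      return arma::join_rows(trainData, fakeData);  // 2 * trainData.n_cols columns
    }

    // Discriminator update: real sample at column i + offset, where
    // offset = (#generator updates) * batchSize, i.e. the skipped batches.
    arma::uword RealColumn(const arma::uword i, const arma::uword offset)
    {
      return i + offset;
    }

    // Generator update (and the fake half in general): column i + trainData.n_cols.
    arma::uword FakeColumn(const arma::uword i, const arma::uword numRealCols)
    {
      return i + numRealCols;
    }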
< lozhnikov> kris__: I tried the latest changes with the Adam optimizer. I also didn't get good results on the Digit dataset. However, the changes look good; I didn't find any issues
< kris__> Hmmm, okay. Were you able to look at the resize layer?
< lozhnikov> not yet, I am going to do that
< lozhnikov> kris__: it is hard to understand that idea and hard to debug
< lozhnikov> so, I don't recommend doing that
< kris__> Well, other than that I have no ideas :( for fixing the present implementation. I will try our implementation again on the 1d gaussian dataset (just because it is faster and easier to debug).
< kris__> Should I go for the 2 train function approach? That should work straight away, I guess.
< lozhnikov> again, I think the current optimizer API doesn't allow doing that without crutches
< kris__> I mean using 2 optimizers like in the examples: one for training the generator network and one for training the discriminator network. It was one of my initial commits on the GAN PR.
< lozhnikov> I mean the same. The optimizer API doesn't allow training on a single batch
< kris__> ./gan.o -i ./digits_train.arm -o ./output -v -s -e 1000 -r 0.001 -g 4 -N 100 -b 10 -G 64 -D 128 -t 0
< kris__> using the Adam optimizer.
< kris__> If you look at the code here, https://github.com/AYLIEN/gan-intro/blob/master/gan.py, the train function generates the dataset and trains the network on it. So I did not understand your comment above
< lozhnikov> actually, that's a style issue. As for me, generating the dataset inside the Train() function looks ugly
< kris__> Any ideas then how we can improve the results? Ideas for alternating training of the generator and the discriminator?
kris1 has joined #mlpack
kris1 has quit [Quit: kris1]