verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
kris__ has quit [Quit: Connection closed for inactivity]
< lozhnikov>
kris1: I don't understand your idea. Right now we sample noise variables each Gradient() call. And batchSize and noiseSize are completely different variables. Could you elaborate a bit?
govg has joined #mlpack
Ramnath has joined #mlpack
< Ramnath>
i have just started working on mlpack
< Ramnath>
mlpack_logistic_regression -t dataset.csv -v. This command is not working for me
< Ramnath>
need help
kris1 has joined #mlpack
< kris1>
Lozhnikov: What I'm trying to say is that the discriminator is trained on an m:1 ratio of real data to fake data. The problem with that, I feel, is that the discriminator is mostly learning real values. We should have an m:m ratio for the discriminator.
< kris1>
Also, I tried the example by Eric Jang yesterday. It is learning too slowly imho.
< kris1>
I think I should switch the implementation to using 2 training functions and check if that works. What do you think?
< lozhnikov>
kris1: Why did you decide that we sample real data more often than noise data? We sample noise data each Gradient() call
< kris1>
Yup, but in each call to Gradient() we actually end up computing the real-data gradient m times and the fake-data gradient 1 time. So the ratio is still m:1 for every Gradient() call.
< lozhnikov>
I think that the current optimizer architecture doesn't allow implementing 2 training functions without crutches
< lozhnikov>
we get only one sample of real data at each call
< kris1>
Hmmm ohh yes sorry ….. you are right….
< kris1>
Just one thing: while updating the generator, are the discriminator gradients zeros?
< kris1>
i.e. we should not train the discriminator when training the generator.
< lozhnikov>
Ramnath: I guess you should specify the train labels, some test data and the output. For example:
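A hypothetical invocation along those lines (the file names are illustrative, and the exact flag names vary by mlpack version):

```
mlpack_logistic_regression -t dataset.csv -l labels.csv -T test.csv -o predictions.csv -v
```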
< kris__>
It's the generated distribution vs. the real distribution.
< kris__>
Sorry for the scales. I automated the thing and forgot to take care of the scales.
keonkim has quit [Ping timeout: 246 seconds]
< kris__>
What I meant is: whenever we train the generator, we could have an offset value indicating we skipped batches up to x. Then when we are updating the discriminator we could do i - x. This would involve some bounds checking but should be doable.
keonkim has joined #mlpack
< lozhnikov>
I think that's not a good idea
< lozhnikov>
that's hard to debug and hard to understand
< kris__>
hmmm......... well, I think we should give training the generator alone a try. Other than that I don't see any reason the algorithm is not working. Do you have any ideas?
< lozhnikov>
yeah, I do.
< lozhnikov>
1. It is reasonable to try to minimize -log(D(G(z))) (if I didn't get results on the digit dataset it doesn't mean that this technique doesn't work on the mnist dataset). This technique increases the gradient of the generator significantly.
< lozhnikov>
2. Try to vary some parameters.
< lozhnikov>
3. Try the Adam optimizer.
< kris__>
Hmmm, I think 1 looks reasonable. As for varying parameters: the parameters I am using are from the examples, so I think they are correct, and I am only playing with the epoch parameter. Moreover, we are using the mini-batch optimizer while most implementations use the Adam optimizer.
< kris__>
I don't think Adam would work for our case since it doesn't support batch training yet
< lozhnikov>
of course I mean try to use mini-batch Adam
< kris__>
Should I implement mini-batch Adam in that case?
< lozhnikov>
If the first and the second point don't provide good results, that looks reasonable
< kris__>
Okay, I will try that on the 1d Gaussian example. It is the easiest and fastest.
< kris__>
Also the implementation of the resize layer now works. Maybe you could check it out.
< lozhnikov>
I'll look through the code today
mikeling has joined #mlpack
Ramnath has quit [Ping timeout: 260 seconds]
shikhar has joined #mlpack
shikhar has quit [Quit: WeeChat 1.4]
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< ironstark>
zoq: rcurtin: I had a question regarding the data dump.. I wanted to create a small data dump and work on that.. can I create one filled with random values?
< ironstark>
I was thinking that since NBC is implemented in almost every library I'll use that algorithm
< ironstark>
and fill a table with dummy values of runtime, accuracy, Precision and Recall
< ironstark>
and then create some JS charts first and present them.. once we finalize the charts I will try to integrate them
< ironstark>
Also, I needed to discuss how should I submit the work for final evaluation
< zoq>
ironstark: If you like to start with random numbers, sure, go ahead. Creating some reasonable values is simple through e.g. make METHODBLOCK=NBC LOG=True CONFIG=... It probably makes sense to use sqlite instead of mysql as the driver, since sqlite is a simple file and there is no need to set up the database first. In this case you have to use driver : 'sqlite' instead of driver : 'mysql' in the config file.
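A sketch of what that config change might look like (only the driver values are from zoq's message; the surrounding keys and file name are illustrative assumptions, not the benchmark system's actual schema):

```yaml
# Illustrative config fragment: swap the MySQL driver for SQLite.
database:
  driver: 'sqlite'            # was: driver: 'mysql'
  database: 'benchmark.db'    # SQLite stores everything in this one file
```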
< zoq>
ironstark: About the final evaluation, I think a blog post is a good way to show what you did over the last weeks, including a description of what work was done, what code got merged, what code didn't get merged, and what's left to do. It's a good idea to start on the post early so that we have a chance to look it over.
< ironstark>
okay I will write a blog for the weeks 8 to 11
< kris__>
zoq: Why can't I use Adam with mini-batch SGD right now?
< kris__>
The implementation of Adam right now seems consistent with update policies.
< kris__>
So why can't i just use MiniBatchSGD< Adam, NoDecay>(....)
< zoq>
kris__: There is no particular reason, in fact, there is an open issue: https://github.com/mlpack/mlpack/issues/1047 that not only makes sure we can train in batch mode but also consolidates SGD and MinibatchSGD.
< kris__>
Hmmm ok... then let me try that out and see what bugs I get from using that.
< zoq>
okay, let me know if you need any help with the integration.
< kris__>
zoq: The whole thing compiled. I used it in my test also and it is working now. Should I be worried about its correctness?
< zoq>
kris__: If it works for you, it should be fine, I can take a look if you like.
< kris__>
Hmm, I updated the changes in the GAN PR. For the test, this is the file
< zoq>
kris__: Looks good, also you don't have to define an alias 'using AdamBatchSGD = MiniBatchSGDType<AdamUpdate, NoDecay>;' you could just use 'MiniBatchSGDType<AdamUpdate, NoDecay>;'
< kris__>
Hmmmm, I know... I will remove that.
< kris__>
Lozhnikov: I have a question about GANs: the discriminator loss should ideally go down each iteration, and the generator loss should go up initially and then go down. Is that right?
< kris__>
I'm talking about the ideal situation here.
< lozhnikov>
kris__: we don't compute the generator loss
< lozhnikov>
the Evaluate() function returns the discriminator loss
< lozhnikov>
when you train the generator, the discriminator loss increases
< kris__>
When we train the generator we are getting this loss, right: -log(D(G(z))). I have changed the implementation; we are optimizing this loss now, whereas earlier we were getting log(1 - D(G(z)))
< kris__>
I was comparing our loss at each iteration with their loss at each iteration.
< lozhnikov>
actually, I am looking forward to the results on the mnist dataset. Did you try that?
< kris__>
Hmmm, I sent you the code for that yesterday, right? gan_keras.cpp. You found that we were not learning anything.
< kris__>
I thought you ran it on the train4.txt input.
< lozhnikov>
yes, I have sent the result
< kris__>
Yes, I saw that. train4.txt is actually an MNIST example consisting of only 7's; if we are not able to learn that, I think the full MNIST would be much harder.
< lozhnikov>
Did you try the updated version on the same test?
< lozhnikov>
I ran that code on the full mnist dataset. I didn't extract sevens since you didn't tell me to do that
< kris__>
That's okay. The network should have learned something.
< kris__>
Ahhh, I did not try the updated version... because it takes time to train that on my machine. I wanted to check that the implementation works on an easier Gaussian example first and then run it.
< lozhnikov>
I think it is reasonable to try the Digit dataset first
< lozhnikov>
I didn't get good results on that but I didn't try the Adam optimizer
< kris__>
But we don't have the parameters for the digits dataset. So i think it's better if we stick with the example that we have the parameters for.
< kris__>
So yes, we can use the Adam optimizer in the batch setting. So I think of the points that you mentioned in the morning: 1. change to -log(D(G(z))), 2. parameter tuning, 3. Adam optimizer.
< lozhnikov>
the Digit dataset doesn't take too much time for training
< kris__>
So 1 and 3 are done.
< kris__>
And now we don't need to tune the parameters since we have them.
< kris__>
Okay, I will give it a try.
< kris__>
Could you just have a look at the update code, though, just to be sure that it is correct?
< kris__>
Regarding the discriminator loss being 0: while logging, I set it to 0 when the mini-batch changes. Only the logging variable though, not the actual loss.
< kris__>
My question was: you can see that their discriminator loss goes down and their generator loss goes up.
< kris__>
I think that should be the case in our implementation also.
< lozhnikov>
I think that's not obligatory. The discriminator loss grows as we train the generator and decreases as we train the discriminator
< kris__>
Hmmm, yes, I think that makes some sense...
< kris__>
but I think if the parameters are the same, our system's loss should be close to their implementation's.
< kris__>
I will plot the generated curve and the real-data curve and see.
< kris__>
Our current implementation's discriminator loss function increases and decreases rapidly after, say, 2000 iterations.
< kris__>
That is wrong i think
< lozhnikov>
that's a minimax game. In that case the loss function doesn't mean anything useful. If the loss function grows, that means only that the generator is training
< kris__>
Well, that's the thing: it doesn't keep growing, it goes up and down, like 1 ---> 9, in just one batch iteration.
< kris__>
I can make a plot if you like.
kris1 has quit [Quit: kris1]
kris1 has joined #mlpack
< kris__>
On the digits dataset, should I run the program with the same set of parameters... or should I just use the parameters for the MNIST dataset that we have from the Keras example?
kris1 has quit [Quit: kris1]
< lozhnikov>
kris__: that's a completely different dataset, therefore it requires completely different parameters
< kris__>
I did not use the Adam optimizer, btw... I will try that now...
< lozhnikov>
hmm... it seems the previous results were better than these. the first row doesn't look like digits
< kris__>
Yes i agree....
< kris__>
I have changed the parameters a little in the example I am running now; let's see...
< kris__>
There is another problem with the present implementation: the Evaluate() function actually evaluates the generator and discriminator on new points rather than the ones it was trained on. Also, the Evaluate() function for the generator is wrong: we evaluate on both the real + generated data. It should be only on the fake data.
< kris__>
I have a new solution in mind that is much closer to the paper and also uses your ideas.
< kris__>
But I will try the present solution first; if that does not work, I will implement it.
< lozhnikov>
currently the Evaluate() function doesn't calculate the generator loss
< lozhnikov>
actually, the Evaluate() function doesn't influence the training algorithm
< kris__>
I know that... but when I say closer to the algorithm, I mean alternating the training of the generator and discriminator. Right now this is being done simultaneously.
< kris__>
That is the part i am concerned with.
< kris__>
My solution is that, instead of using a column of size one for the noise, we could use a matrix of size trainData.n_cols. Then the predictors matrix = 2 * trainData.n_cols. We could follow the same procedure as right now, except that in the gradient function we would use trainData[i + offset] and fakeData[i + trainData.n_cols] when training the discriminator, where offset = #GenTrained * batchSize (the number of skipped batches). And when training
< kris__>
the generator, we just have to use fakeData[i + trainData.n_cols].
< lozhnikov>
kris__: I tried the latest changes with the Adam optimizer. I also didn't get good results on the Digit dataset. However, these changes look good; I didn't find any issues.
< kris__>
Hmmm, okay. Were you able to look at the resize layer?
< lozhnikov>
not yet, I am going to do that
< lozhnikov>
kris__: it is hard to understand that idea and hard to debug
< lozhnikov>
so, I don't recommend doing that
< kris__>
Well, other than that I have no ideas :( for fixing the present implementation. I will again try out our implementation on the 1d Gaussian dataset (just because it is faster and easier to debug).
< kris__>
Should I go for the 2 train-function approach? That should work straight away, I guess.
< lozhnikov>
again, I think the optimizer API currently doesn't allow doing that without crutches
< kris__>
I mean using 2 optimizers like in the examples: one for training the generator network and one for training the discriminator network. It was one of my initial commits on the GAN PR.
< lozhnikov>
I mean the same. The optimizer API doesn't allow training on a single batch