verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has joined #mlpack
sulan_ has joined #mlpack
sulan_ has quit [Read error: Connection reset by peer]
< manish7294>
zoq: Thanks for solving the issue and these good suggestions. The changing batch size has made the batch precalculation of LMNN redundant :)
< manish7294>
Either way, it was not making much of a difference.
< manish7294>
AMSGrad also works great :)
< manish7294>
rcurtin: As per the findings, BigBatchSGD (both adaptive search and line search) or AMSGrad are good options to replace SGD.
__sulan__ has quit [Quit: Leaving]
< rcurtin>
manish7294: great to hear the different optimizers worked better; do you have benchmarking results for them?
< rcurtin>
I saw your comments on the LMNN PR also; I haven't had a chance to dig in deeply, but did calling Impostors() only once every 100 iterations help?
< manish7294>
rcurtin: I have mostly tested them on iris, vc2, and a 5k-point covertype dataset; looking at the results, I would say they are quite similar, but these optimizers help in avoiding divergence.
< manish7294>
Calling Impostors() only every 100 iterations is leading to errors.
< manish7294>
Let me verify the 100 iteration idea once again
witness_ has quit [Quit: Connection closed for inactivity]
< manish7294>
With SGD on the 5k covertype data I am getting "[WARN ] SGD: converged to -nan; terminating with failure. Try a smaller step size?" within a second of starting, and with BigBatchSGD it does not seem to converge.
< manish7294>
With BigBatchSGD, the coordinate values seem to keep oscillating between a few values.
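For context, a minimal sketch of what swapping SGD for AMSGrad could look like, assuming the ensmallen-style optimizer interface (in the mlpack version under discussion the same classes may live under mlpack::optimization instead); the FunctionType object stands in for the actual LMNN objective discussed above, and the hyperparameters are illustrative defaults only.

#include <ensmallen.hpp>

// Optimize a separable objective (e.g. the LMNN objective above) with AMSGrad
// instead of SGD; 'coordinates' holds the learned transformation matrix.
template<typename FunctionType>
void OptimizeWithAMSGrad(FunctionType& function, arma::mat& coordinates)
{
  // Step size, batch size, beta1, beta2, epsilon, max iterations, tolerance.
  ens::AMSGrad optimizer(0.01, 32, 0.9, 0.999, 1e-8, 100000, 1e-5);
  optimizer.Optimize(function, coordinates);
}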
sumedhghaisas has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
< sumedhghaisas>
Atharva: Hi Atharva
< sumedhghaisas>
How's it going?
< Atharva>
I am just about to post to the blog.
< sumedhghaisas>
Maybe we can speed up the mail thread with IRC :)
< Atharva>
It's done.
< sumedhghaisas>
Nice! I will take a look at it later
< Atharva>
The tasks for this week
< sumedhghaisas>
umm... Have you updated the PR?
vivekp has joined #mlpack
< Atharva>
No, I am just trying to debug the failing Jacobian test, but I am not quite sure what that test does.
< Atharva>
The gradient check is passing with the KL loss added to the total loss
< sumedhghaisas>
The Jacobian test is failing?
< sumedhghaisas>
Huh... well, push away and let's see why that test is not happy.
< Atharva>
Okay
< sumedhghaisas>
Also about the VAE class, which aspect of VAE do you think cannot be emulated by FFN?
vivekp has quit [Read error: Connection reset by peer]
< Atharva>
For example, the Encode function, GenerateRandom, GenerateSpecific, SampleOutput. Also, I had to make the Evaluate and Backward functions loop over all the layers, collecting extra loss, which is 0 almost all the time.
< Atharva>
Even in a VAE network, for one layer, it's too much work.
< sumedhghaisas>
The loop is mostly static... which shouldn't cause any delay.
< sumedhghaisas>
The extra loss functionality is not just for VAE
< sumedhghaisas>
it extends the FFN functionality to produce L1 and L2 regularized layers, which is a huge improvement over the current framework
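A simplified sketch of the extra-loss accumulation being described here, not mlpack's actual visitor-based implementation: Evaluate() adds each layer's extra loss term to the objective, which is how a repar layer's KL divergence or an L1/L2 regularization penalty folds in. LayerType and its Loss() method are assumed interfaces for illustration.

#include <vector>

// Sum the extra loss contributed by each layer; most layers return 0, while a
// repar layer would return its KL term and a regularizer layer its penalty.
template<typename LayerType>
double ExtraLoss(const std::vector<LayerType*>& network)
{
  double loss = 0.0;
  for (LayerType* layer : network)
    loss += layer->Loss();
  return loss;
}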
< Atharva>
Yes, I understand, but the generate and encode functionalities will have to forward pass through some layers of the network, with custom inputs
< Atharva>
With multiple repar layers, it will prove tougher.
< sumedhghaisas>
The Encode function is nothing but a forward pass of a parametric model (feedforward, CNN, or RNN), thus we do not need to make any extra effort for it.
< Atharva>
Yes but partial
< Atharva>
Yeah
< sumedhghaisas>
If we look at VAE as a special model, we restrict the user's ability to improve upon it
vivekp has joined #mlpack
< sumedhghaisas>
we will restrict them to the functionality given by us
< Atharva>
That makes sense
< sumedhghaisas>
if we indeed look at it as a specific case of FFN and make sure the current architecture supports it
< sumedhghaisas>
we not only make sure VAE can be implemented, but the user can also use the extra FFN features to improve upon it
< sumedhghaisas>
For example, if you implement VAE class
< Atharva>
Yeah, I never thought of it that way
< sumedhghaisas>
you have to make sure you support hierarchical VAE, beta VAE, regularized VAE
< sumedhghaisas>
although with FFN, multiple repar layers would achieve the hierarchical aspect
< sumedhghaisas>
a specialized repar layer with beta will achieve beta-VAE, and so on
< sumedhghaisas>
minimal changes
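For reference, the beta-VAE objective that such a specialized repar layer would implement simply scales the KL term of the usual ELBO; beta = 1 recovers the standard VAE:

\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)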
< sumedhghaisas>
Although I am still not 100 percent sure we can emulate it :)
< sumedhghaisas>
So some thinking is required there
< Atharva>
So, let's go ahead and start making some models with the FFN class, and if some functions prove too complex, then we can give a thought to VAE class.
< Atharva>
If not, then we are good.
< sumedhghaisas>
that would be risky, as a new class shift is not a simple one
< sumedhghaisas>
Let's look at the aspects of VAE that we cannot satisfy right now
< sumedhghaisas>
1) Generation
< sumedhghaisas>
what else?
< sumedhghaisas>
hmmm
< sumedhghaisas>
Okay, how do we implement generation in FFN?
< Atharva>
The generation can be random or controlled
< Atharva>
We need to think about both cases
< sumedhghaisas>
indeed
< sumedhghaisas>
Okay, give it some thought; let's try involving Ryan and Marcus as well and see if they have some thoughts on it
< Atharva>
Yeah, can you explain how you said we would implement Encode?
< Atharva>
Can we do a partial forward pass with the FFN class?
< sumedhghaisas>
Encode is not a direct feature of VAE, but generation is
< sumedhghaisas>
Encode happens as a part of Forward
< sumedhghaisas>
ahh partial pass
< sumedhghaisas>
that's what I was thinking
< Atharva>
Yes, but we should be able to have just the encodings if we want to.
< Atharva>
From those encodings, we should be able to operate the Generate functions independently
< sumedhghaisas>
I agree, we should; that could be achieved with a partial pass
< Atharva>
Yeah
< sumedhghaisas>
If we do the partial pass and access the layer's output parameter
< sumedhghaisas>
we will get the encoding
< Atharva>
and then Generate either randomly, or with a sample of our choice
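A hypothetical sketch of what that partial pass could look like: run only the encoder layers and read off the latent encoding. The Forward(input, output, begin, end) overload and the reparLayerIndex argument are assumptions for illustration, not an existing mlpack API at the time of this discussion.

#include <mlpack/methods/ann/ffn.hpp>

// Partial forward pass through layers 0 .. reparLayerIndex only, returning the
// latent encoding instead of the full network output (hypothetical API).
arma::mat Encode(mlpack::ann::FFN<>& vae,
                 const arma::mat& input,
                 const size_t reparLayerIndex)
{
  arma::mat latent;
  vae.Forward(input, latent, 0, reparLayerIndex);
  return latent;
}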
< sumedhghaisas>
If we define the final layer as a distribution layer, the current architecture should produce conditional samples
< sumedhghaisas>
For example, the current architecture's Predict outputs the last layer's output
< Atharva>
Yes, but a VAE outputs a distribution
< Atharva>
parameters to a distribution
< sumedhghaisas>
if the last layer outputs a distribution, we sample from it to generate conditional samples
< Atharva>
We should be able to then sample from that
< Atharva>
Exactly
< sumedhghaisas>
yes, but that's only conditional
< sumedhghaisas>
How do we produce unconditional samples?
< Atharva>
Sorry, what exactly do you mean by unconditional samples?
< sumedhghaisas>
for that we need to start the forward propagation from the repar layer
< sumedhghaisas>
Oh, conditional samples are samples from P(Z | X), whereas unconditional ones are from P(Z)
< sumedhghaisas>
basically, conditional samples come from the posterior over the latents and unconditional samples from the latent prior
< Atharva>
Yeah
< Atharva>
We need to start from repar layer for that
< sumedhghaisas>
Yes. Now that's the puzzler.
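Under the same hypothetical partial-pass assumption, unconditional generation would amount to drawing z from the prior N(0, I) and forwarding it through the decoder layers only; GenerateRandom and the layer-index arguments below are illustrative names, not an existing mlpack API.

#include <mlpack/methods/ann/ffn.hpp>

// Sample z ~ N(0, I) and decode it by forwarding through layers
// reparLayerIndex + 1 .. lastLayerIndex (hypothetical partial-pass API).
arma::mat GenerateRandom(mlpack::ann::FFN<>& vae,
                         const size_t latentSize,
                         const size_t reparLayerIndex,
                         const size_t lastLayerIndex)
{
  arma::mat z = arma::randn<arma::mat>(latentSize, 1);
  arma::mat sample;
  vae.Forward(z, sample, reparLayerIndex + 1, lastLayerIndex);
  return sample;
}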
< Atharva>
I just pushed the latest changes
< sumedhghaisas>
Okay. Let's keep thinking about this and complete this week's work first. Let's hope we find some solution by then.
< sumedhghaisas>
I will take a look at it tonight :)
< Atharva>
Sure!
< Atharva>
sumedhghaisas: You there?
manish7294 has quit [Ping timeout: 260 seconds]
< rcurtin>
manish7294: I think we need to debug the idea a little bit more; recalculating impostors only once every 100 iterations should work just fine
< rcurtin>
if you like, you could try recalculating only every other iteration
< rcurtin>
just for debugging
< rcurtin>
but it should be no problem, since all we are calculating in Impostors() is the indices of the impostors
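A simplified sketch of the caching idea described here: the impostor indices are recomputed only every recalcEvery iterations and reused in between. ConstraintType and its Impostors() signature are stand-ins for the LMNN constraint class in the PR, not a confirmed mlpack API.

#include <armadillo>

// Recompute the cached impostor indices only every 'recalcEvery' iterations;
// on all other iterations the previously computed indices are reused.
template<typename ConstraintType>
void UpdateImpostors(ConstraintType& constraint,
                     arma::Mat<size_t>& impostorIndices,
                     const arma::mat& transformedDataset,
                     const size_t iteration,
                     const size_t recalcEvery = 100)
{
  if (iteration % recalcEvery == 0)
    constraint.Impostors(impostorIndices, transformedDataset);
}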
ImQ009 has quit [Quit: Leaving]
witness_ has joined #mlpack
witness_ has quit [Quit: Connection closed for inactivity]