verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has joined #mlpack
sulan_ has joined #mlpack
sulan_ has quit [Read error: Connection reset by peer]
sulan_ has joined #mlpack
__sulan__ has joined #mlpack
sulan_ has quit [Ping timeout: 264 seconds]
< jenkins-mlpack> Project docker mlpack nightly build build #346: STILL UNSTABLE in 2 hr 21 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/346/
< zoq> manish7294: https://github.com/mlpack/mlpack/pull/1429 should fix the issue, also did you test AMSGrad?
ImQ009 has joined #mlpack
wenhao has joined #mlpack
manish7294 has joined #mlpack
< manish7294> zoq: Thanks for solving the issue and for the good suggestions. The changing batch size has made the batch precalculation in LMNN redundant :)
< manish7294> Either way, it was not making much of a difference.
< manish7294> AMSGrad also works great :)
< manish7294> rcurtin: As per the findings, BigBatchSGD (both adaptive search and line search) and AMSGrad are good options to replace SGD.
__sulan__ has quit [Quit: Leaving]
< rcurtin> manish7294: great to hear the different optimizers worked better; do you have benchmarking results for them?
< rcurtin> I saw your comments on the LMNN PR also; I haven't had a chance to dig in deeply, but did calling Impostors() only once every 100 iterations help?
< manish7294> rcurtin: I have mostly tested them on the iris, vc2, and covertype (5k points) datasets, and looking at the results I would say they are quite similar, but they help in avoiding divergence.
< manish7294> Calling Impostors() every 100 iterations is leading to errors.
< manish7294> Let me verify the 100-iteration idea once again.
witness_ has quit [Quit: Connection closed for inactivity]
< manish7294> With SGD on the 5k covertype data I am getting "[WARN ] SGD: converged to -nan; terminating with failure. Try a smaller step size?" within a second of starting, and with BigBatchSGD it does not seem to converge.
< manish7294> With BigBatchSGD, the coordinate values seem to keep oscillating between a few values.
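(For context, a minimal sketch of how an alternative optimizer such as AMSGrad is dropped in through mlpack's optimizer interface, which is what the discussion above is about. The ToyFunction below is only a stand-in for the real LMNN objective, the parameter values are illustrative, and the mlpack 3.x decomposable-function API is assumed.)

    #include <mlpack/core.hpp>
    #include <mlpack/core/optimizers/adam/adam.hpp>  // AMSGrad is defined alongside the Adam variants.
    #include <cmath>

    using namespace mlpack::optimization;

    // Toy separable objective f(x) = sum_i (x_i - i)^2, standing in for the LMNN
    // objective; any class with the batch Evaluate()/Gradient() API works here.
    class ToyFunction
    {
     public:
      double Evaluate(const arma::mat& x, const size_t begin, const size_t batchSize)
      {
        double result = 0.0;
        for (size_t i = begin; i < begin + batchSize; ++i)
          result += std::pow(x(i) - (double) i, 2);
        return result;
      }

      void Gradient(const arma::mat& x, const size_t begin, arma::mat& gradient,
                    const size_t batchSize)
      {
        gradient.zeros(x.n_rows, x.n_cols);
        for (size_t i = begin; i < begin + batchSize; ++i)
          gradient(i) = 2 * (x(i) - (double) i);
      }

      size_t NumFunctions() const { return 10; }
      void Shuffle() { /* nothing to shuffle in this toy objective */ }
    };

    int main()
    {
      ToyFunction f;
      arma::mat coordinates(10, 1, arma::fill::randu);

      // AMSGrad(stepSize, batchSize, beta1, beta2, eps, maxIterations, tolerance, shuffle).
      AMSGrad optimizer(0.01, 2, 0.9, 0.999, 1e-8, 100000, 1e-9, true);
      optimizer.Optimize(f, coordinates);

      coordinates.print("optimized coordinates (should approach 0, 1, ..., 9)");
    }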
sumedhghaisas has joined #mlpack
vivekp has quit [Read error: Connection reset by peer]
< sumedhghaisas> Atharva: Hi Atharva
< sumedhghaisas> How's it going?
< Atharva> I am just about to post to the blog.
< sumedhghaisas> Maybe we can speed up the mail thread with IRC :)
< Atharva> It's done.
< sumedhghaisas> Nice! I will take a look at it later
< Atharva> The tasks for this week
< sumedhghaisas> umm... Have you updated the PR?
vivekp has joined #mlpack
< Atharva> No, I am just trying to debug the failing Jacobian test, but I am not quite sure what that test does.
< Atharva> The gradient check is passing with the KL loss added to the total loss
< sumedhghaisas> The Jacobian test is failing?
< sumedhghaisas> huh... Well, push away and let's see why that test is not happy
< Atharva> Okay
< sumedhghaisas> Also about the VAE class, which aspect of VAE do you think cannot be emulated by FFN?
vivekp has quit [Read error: Connection reset by peer]
< Atharva> For example, the Encode function, GenerateRandom, GenerateSpecific, SampleOutput. Also, I had to make the Evaluate and Backward functions loop over all the layers collecting the extra loss, which is 0 almost all the time.
< Atharva> Even in a VAE network it's only there for one layer, so it's too much work.
< sumedhghaisas> The loop is mostly static... which shouldn't cause any delay.
< sumedhghaisas> The extra loss functionality is not just for VAE
< sumedhghaisas> it extends the FFN functionality to produce L1 and L2 regularized layers, which is a huge improvement over the current framework
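(A minimal, self-contained sketch of the "extra loss" idea described above; the names here are hypothetical and not mlpack's actual visitor machinery. Each layer may report an optional loss term, and the network adds the sum of those terms to its objective, which is what lets KL terms and L1/L2 penalties be expressed as ordinary layers.)

    #include <memory>
    #include <vector>

    // Hypothetical layer interface: every layer reports an optional extra loss.
    struct Layer
    {
      virtual ~Layer() {}
      virtual double Loss() const { return 0.0; }  // Default: no extra loss.
    };

    // A reparametrization layer would return its KL divergence term...
    struct ReparLayer : Layer
    {
      double kl = 0.0;  // Filled in during the forward pass.
      double Loss() const override { return kl; }
    };

    // ...and an L2-regularized layer its weight penalty.
    struct L2RegularizedLinear : Layer
    {
      double squaredWeightNorm = 0.0;  // ||W||^2, updated after each step.
      double lambda = 1e-4;
      double Loss() const override { return lambda * squaredWeightNorm; }
    };

    // The network's Evaluate() simply adds the accumulated extra loss, which is
    // 0 for almost every layer.
    double ExtraLoss(const std::vector<std::unique_ptr<Layer>>& network)
    {
      double loss = 0.0;
      for (const auto& layer : network)
        loss += layer->Loss();
      return loss;
    }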
< Atharva> Yes, I understand, but the generate and encode functionalities will have to do a forward pass through some layers of the network, with custom inputs
< Atharva> With multiple repar layers, it will prove tougher
< sumedhghaisas> The Encode function is nothing but a forward pass of a parametric model (feedforward, CNN, or RNN), thus we do not need to make any extra effort for it.
< Atharva> Yes, but a partial one
< Atharva> Yeah
< sumedhghaisas> If we look at VAE as a special model, we restrict the user from improving upon it
vivekp has joined #mlpack
< sumedhghaisas> we will restrict them to using the functionality given by us
< Atharva> That makes sense
< sumedhghaisas> if we indeed look at it as a specific case of FFN and make sure the current architecture supports it
< sumedhghaisas> we not only make sure VAE can be implemented, but also that the user can use the extra FFN features to improve upon it
< sumedhghaisas> For example, if you implement VAE class
< Atharva> Yeah, I never thought of it that way
< sumedhghaisas> you have to make sure you support hierarchical VAE, beta VAE, regularized VAE
< sumedhghaisas> although with FFN, multiple repar layers would achieve the hierarchical aspect
< sumedhghaisas> a specialized repar layer with beta will achieve beta-VAE, and so on
< sumedhghaisas> minimal changes
< sumedhghaisas> Although I am still not 100 percent sure we can emulate it :)
< sumedhghaisas> So some thinking is required there
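(A rough sketch of the kind of model being discussed: a VAE written as a plain FFN with a reparametrization layer in the middle. The Reparametrization layer here refers to the in-progress layer from the PR, and the layer sizes, output loss, and constructor arguments are illustrative assumptions rather than a fixed API.)

    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    using namespace mlpack::ann;

    int main()
    {
      // VAE as an ordinary FFN; MeanSquaredError stands in for the reconstruction loss.
      FFN<MeanSquaredError<>> vae;

      // Encoder: x -> (mean, log-variance) of q(z | x) for a 20-dimensional latent.
      vae.Add<Linear<>>(784, 400);
      vae.Add<ReLULayer<>>();
      vae.Add<Linear<>>(400, 2 * 20);

      // In-progress repar layer: samples z = mean + stddev * eps (elementwise) and
      // contributes the KL term through the extra-loss mechanism discussed above.
      vae.Add<Reparametrization<>>(20);

      // Decoder: z -> reconstruction of x.
      vae.Add<Linear<>>(20, 400);
      vae.Add<ReLULayer<>>();
      vae.Add<Linear<>>(400, 784);
      vae.Add<SigmoidLayer<>>();
    }

Training would then just be vae.Train(data, data, optimizer) with the target equal to the input, so hierarchical or beta-VAE variants reduce to stacking or parameterizing repar layers rather than writing a new class.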
< Atharva> So, let's go ahead and start making some models with the FFN class, and if some functions prove too complex, then we can give a thought to a VAE class.
< Atharva> If not, then we are good.
< sumedhghaisas> that would be risky, as a shift to a new class is not a simple one
< sumedhghaisas> Let's look at the aspects of VAE that we cannot satisfy right now
< sumedhghaisas> 1) Generation
< sumedhghaisas> what else?
< sumedhghaisas> hmmm
< sumedhghaisas> Okay how do we implement generation in FFN
< Atharva> The generation can be random or controlled
< Atharva> We need to think about both cases
< sumedhghaisas> indeed
< sumedhghaisas> okay, give it some thought; let's try involving Ryan and Marcus as well and see if they have some thoughts on it
< Atharva> Yeah, can you explain how you said we would implement Encode?
< Atharva> Can we do partial forward pass with FFN class?
< sumedhghaisas> Encode is not a direct feature of VAE, but generation is
< sumedhghaisas> Encode happens as a part of Forward
< sumedhghaisas> ahh partial pass
< sumedhghaisas> thats what I was thinking
< Atharva> Yes, but we should be able to get just the encodings if we want to.
< Atharva> From those encodings, we should be able to operate the Generate functions independently
< sumedhghaisas> I agree. We should, that could be achieved with partial pass
< Atharva> Yeah
< sumedhghaisas> If we do the partial pass and access the layers output parameter
< sumedhghaisas> we will get encoding
< Atharva> and then Generate either randomly, or with a sample of our choice
< sumedhghaisas> If we define the final layer as a distribution layer, the current architecture should produce conditional samples
< sumedhghaisas> For example, the current architecture's Predict outputs the last layer's output
< Atharva> Yes, but a VAE outputs a distribution
< Atharva> parameters to a distribution
< sumedhghaisas> if the last layer outputs a distribution, we sample from it to generate conditional samples
< Atharva> We should be able to then sample from that
< Atharva> Exactly
< sumedhghaisas> yes, but those are only conditional
< sumedhghaisas> How do we produce unconditional samples?
< Atharva> Sorry, what exactly do you mean by unconditional samples?
< sumedhghaisas> for that we need to start the forward propagation from the Repar layer
< sumedhghaisas> ohh, conditional samples are samples from P(Z | X), whereas unconditional are from P(Z)
< sumedhghaisas> basically conditional are samples from posterior over the latents and unconditional are samples from latent prior
< Atharva> Yeah
< Atharva> We need to start from repar layer for that
< sumedhghaisas> yes. Now that's the puzzler.
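(One possible shape of the answer, as a sketch: unconditional samples only need the decoder half, so if the decoder layers are reachable on their own, for example as a separate FFN sharing the trained decoder weights, drawing from the prior P(Z) and decoding the draw is enough. The decoder network below is hypothetical; a partial forward pass was not an existing FFN feature at the time of this discussion.)

    #include <mlpack/core.hpp>
    #include <mlpack/methods/ann/ffn.hpp>
    #include <mlpack/methods/ann/layer/layer.hpp>

    using namespace mlpack::ann;

    int main()
    {
      // Hypothetical decoder-only network; in practice its weights would have to
      // be shared with (or copied from) the decoder layers of the trained VAE,
      // which is exactly the part the current FFN interface does not yet cover.
      FFN<MeanSquaredError<>> decoder;
      decoder.Add<Linear<>>(20, 400);
      decoder.Add<ReLULayer<>>();
      decoder.Add<Linear<>>(400, 784);
      decoder.Add<SigmoidLayer<>>();
      decoder.ResetParameters();  // Random weights here, trained weights in practice.

      // Unconditional sample: z ~ N(0, I), i.e. the latent prior P(Z), decoded to
      // data space. Conditional samples would instead come from a full forward
      // pass: encode x, sample z ~ q(z | x) in the repar layer, then decode.
      arma::mat z = arma::randn<arma::mat>(20, 1);
      arma::mat sample;
      decoder.Predict(z, sample);
    }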
< Atharva> I just pushed the latest changes
< sumedhghaisas> Okay. Let's keep thinking about this and complete this week's work first. Let's hope we find some solution by then.
< sumedhghaisas> I will take a look at it tonight :)
< Atharva> Sure!
< Atharva> sumedhghaisas: You there?
manish7294 has quit [Ping timeout: 260 seconds]
< rcurtin> manish7294: I think we need to debug the idea a little bit more. Recalculating impostors only once every 100 iterations should work just fine
< rcurtin> if you like, you could try recalculating only every other iteration
< rcurtin> just for debugging
< rcurtin> but it should be no problem, since all we are calculating in Impostors() is the indices of the impostors
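(A minimal, self-contained sketch of the caching schedule rcurtin is describing; all names here are hypothetical rather than the PR's actual code. The expensive impostor search is rerun only once every `range` evaluations, and the cached index matrix, which is all Impostors() produces, is reused in between.)

    #include <armadillo>
    #include <cstddef>

    class CachedImpostors
    {
     public:
      explicit CachedImpostors(const size_t range) : range(range), counter(0) {}

      // Return impostor indices for the current transformation, recomputing them
      // only every `range` calls and reusing the cached result otherwise.
      const arma::Mat<size_t>& Get(const arma::mat& transformedData)
      {
        if (counter++ % range == 0)
          Recompute(transformedData);
        return indices;
      }

     private:
      void Recompute(const arma::mat& transformedData)
      {
        // Placeholder for the real impostor search (a k-nearest-neighbor query
        // over the transformed points); here we only size the cache.
        indices.set_size(3, transformedData.n_cols);
      }

      size_t range, counter;
      arma::Mat<size_t> indices;
    };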
ImQ009 has quit [Quit: Leaving]
witness_ has joined #mlpack
witness_ has quit [Quit: Connection closed for inactivity]