verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has joined #mlpack
manish7294 has joined #mlpack
< manish7294> rcurtin: Finally, results on covertype are out: K = 10, Initial Accuracy - 96.9761, Final Accuracy - 97.3618, Total Time - 13 hrs, 23 mins, 1.7 secs, Optimizer - L-BFGS
manish7294 has quit [Ping timeout: 260 seconds]
< rcurtin> manish7294: results seem decent, but that took a really long time. did you compile with -DDEBUG=OFF?
< rcurtin> also do you know how many iterations of L-BFGS were used?
< rcurtin> I was writing up my theory today for pruning but something was wrong, the results did not make sense
< rcurtin> so I have some error I need to fix, I will try tomorrow
manish7294 has joined #mlpack
manish7294_ has joined #mlpack
< manish7294_> rcurtin: I can only tell you the number of iterations if I can scroll back through the log. I've tried many things to make that happen, but I can't figure out why I can't scroll back. :(
< manish7294_> I have used -DDEBUG=OFF here
< rcurtin> are you using screen?
< rcurtin> should be ctrl+a + esc then scroll
< manish7294_> with screen too, it's not working
< manish7294_> let me try once more
< manish7294_> No, it didn't work
< rcurtin> ok
< rcurtin> that's strange
< rcurtin> the scrollback buffer should still be there
< rcurtin> sorry for the slightly slow responses... I am playing mariokart online
< manish7294_> and total computing neighbors time was 6 hrs 15 mins 58.1 secs
< rcurtin> so I respond while waiting for the next race :)
< manish7294_> great, should be having fun
< rcurtin> that sounds about right. if that was on a benchmarking system, each search should take roughly 30 sexs
< rcurtin> secs
< rcurtin> excuse me
< manish7294_> :)
< rcurtin> I had to correct the typo so I missed the start :)
< rcurtin> another thing you could try is, e.g., only doing the NN search for impostors every 100 iterations or something like this
< manish7294_> Now, I am able to scroll back 1950 lines but still can't reach the last iteration's log since I used --verbose
< manish7294_> How can we keep count of iterations inside the LMNNFunction to make the above happen?
< rcurtin> ah, sorry, I forgot L-BFGS only prints the iteration number in debug mode
< rcurtin> so there will be no output
< rcurtin> maybe we should change the optimizer to print in verbose mode too
< manish7294_> Temporarily we can just cout
< manish7294_> Or should we be making a permanent change?
< rcurtin> I don't have a particular preference, I think it is fine as is
< rcurtin> but if you want to change it I am fine with that also
< manish7294_> okay so I will make a temporary cout for myself
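(A hedged sketch of what a verbose-mode print inside the optimizer's main loop could look like; Log::Info is mlpack's verbose-level output stream, but the variable names below are illustrative and not necessarily the ones used in the actual L-BFGS implementation.)

    // Once per L-BFGS iteration, report progress at verbose level so that
    // --verbose shows the iteration count without needing a debug build.
    Log::Info << "L-BFGS iteration " << itNum << "; objective "
        << overallObjective << "." << std::endl;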
< manish7294_> Please check this: 'How can we keep count of iterations inside the LMNNFunction to make the above happen?'
< rcurtin> you could just have a size_t that you increment each time Evaluate() is called
< rcurtin> that wouldn't be perfect since Evaluate() may be called more than once per iteration
< rcurtin> but it can still be helpful for reducing the cost of the LMNNFunction evaluations
< manish7294_> Sure, Let's try it out.
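(A minimal sketch of the counter idea above, assuming an LMNNFunction-like class; the class layout, member names, and RecomputeImpostors() helper are illustrative rather than mlpack's actual API.)

    #include <mlpack/core.hpp>

    class LMNNFunction
    {
     public:
      double Evaluate(const arma::mat& transformation)
      {
        // Evaluate() may be called more than once per L-BFGS iteration, so
        // this is only an approximate iteration count.
        if (evalCounter++ % updateInterval == 0)
          RecomputeImpostors(transformation);  // Expensive neighbor search.

        mlpack::Log::Info << "Evaluate() call " << evalCounter << "."
            << std::endl;

        double objective = 0.0;
        // ... accumulate the LMNN objective using the cached impostors ...
        return objective;
      }

     private:
      void RecomputeImpostors(const arma::mat& transformation) { /* ... */ }

      size_t evalCounter = 0;       // Incremented on every Evaluate() call.
      size_t updateInterval = 100;  // Recompute impostors every 100 calls.
    };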
< rcurtin> ok, time for bed now---talk to you tomorrow!
< manish7294_> have a good night :)
< ShikharJ> zoq: It seems that our implementation converges within 10 epochs on the 10,000-image dataset. I found no practical difference (at least not visually) between the 10 and 20 epoch results.
< ShikharJ> zoq: Might be because we're taking a big generator multiplier (about 10). I'll investigate for different multiplier steps as well.
manish7294_ has quit [Quit: Page closed]
< zoq> ShikharJ: worth testing out
manish7294 has quit [Ping timeout: 265 seconds]
< jenkins-mlpack> Project docker mlpack nightly build build #342: UNSTABLE in 2 hr 37 min: http://masterblaster.mlpack.org/job/docker%20mlpack%20nightly%20build/342/
< zoq> ShikharJ: Do you think we could save the weights after a couple of iterations? It might be a neat way to visualize the optimization process afterwards?
< ShikharJ> zoq: Sure, I'll see what I can do.
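(A hedged sketch of periodically dumping the weights during training; it assumes the model exposes its parameters as an arma::mat through a Parameters() accessor and uses data::Save(), which writes an Armadillo matrix to disk. The training-loop hook and variable names are illustrative.)

    // Inside the training loop: every `saveInterval` epochs, write the current
    // parameters to disk so the optimization process can be visualized later.
    if (epoch % saveInterval == 0)
    {
      mlpack::data::Save("weights_epoch_" + std::to_string(epoch) + ".csv",
          gan.Parameters());
    }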
sumedhghaisas has joined #mlpack
dasayan05 has joined #mlpack
sumedhghaisas2 has joined #mlpack
dasayan05 has left #mlpack []
sumedhghaisas_ has joined #mlpack
< sumedhghaisas_> Atharva: Hi Atharva
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
< Atharva> sumedhghaisas_: Hi Sumedh
< Atharva> The numerical derivative test is failing; I think we have the derivatives wrong.
< sumedhghaisas_> Hi Atharva
< sumedhghaisas_> @Atharva
< Atharva> Hey
< sumedhghaisas_> Have you fixed the input thing to SoftplusFunction::Deriv
< Atharva> Yes
< sumedhghaisas_> ohh okay. So the only remaining problem is adding the KL loss to the total loss?
< Atharva> Yeah, but the gradient check is failing
< sumedhghaisas_> As a preliminary test, try the gradient test without adding klBackward to Backward
< sumedhghaisas_> The gradient test depends on the total loss
< Atharva> I am trying that
< sumedhghaisas_> Sure thing... let me know whether that works or not
< Atharva> Oh, but that will only be true for a VAE network, right?
< Atharva> I am just trying the gradient test with a simple network with a linear and repar layer
< sumedhghaisas_> No... Remember we were discussing this before the project started?
< sumedhghaisas_> we somehow need to be able to add the KL loss to the overall loss
< sumedhghaisas_> And I think this is also super useful for the future, since with this functionality we will be able to add regularization-style tricks to a layer
< sumedhghaisas_> think about it... KL is just another regularization
< Atharva> Yeah, but only if the network is a VAE, right? In this case, aren't we just using a sampling layer to train a simple neural network?
< sumedhghaisas_> Use linear + repar + linear to test the gradients
< sumedhghaisas_> Not just VAE, any kind of NN
< sumedhghaisas_> we haven't made any VAE specific changes, have we?
< Atharva> Okay, will do that
< sumedhghaisas_> The Repar layer is just a layer with extra regularization
< sumedhghaisas_> that is KL
< Atharva> I will also add the KL loss to the backward function
< sumedhghaisas_> Ahh wait...
< sumedhghaisas_> test first then add...
< sumedhghaisas_> the gradient test has to pass before adding KL error
< sumedhghaisas_> Is the procedure clear to you? If you have any doubts let me know
< Atharva> Yeah. and after adding KL as well, right?
< sumedhghaisas_> After adding KL error the gradient test should indeed fail
< sumedhghaisas_> Have you looked at how gradient test works?
< Atharva> Yes
< Atharva> I am a little confused as to why it will fail
< sumedhghaisas_> it's delta(Loss) / delta(parameters)
< sumedhghaisas_> we estimate this numerically
< sumedhghaisas_> there is one little flaw in our architecture... currently the loss is only 'reconstruction'
< sumedhghaisas_> so delta(Loss) would be wrong numerically when we also consider the error signal from KL
< sumedhghaisas_> if we add the error signal from KL, we need to add KL loss to the loss function
< sumedhghaisas_> which we aren't able to do right now
< Atharva> Understood
< sumedhghaisas_> So if the gradient test passes without KL error, our gradients are correct except for KL
< sumedhghaisas_> Let's see if that's the case
< sumedhghaisas_> Then let's decide how to add the KL loss to the overall loss
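(For reference, a self-contained sketch of the check being described; this is not mlpack's CheckGradient() implementation, just the central-difference idea: perturb each parameter by a small eps and compare (Loss(p + eps) - Loss(p - eps)) / (2 * eps) against the analytic gradient.)

    #include <mlpack/core.hpp>
    #include <algorithm>

    // FunctionType is assumed to provide Evaluate(params) and
    // Gradient(params, gradient), like the functions the optimizers use.
    template<typename FunctionType>
    double RelativeGradientError(FunctionType& f,
                                 arma::mat& params,
                                 const double eps = 1e-6)
    {
      arma::mat analytic;
      f.Gradient(params, analytic);

      arma::mat numeric(arma::size(params));
      for (arma::uword i = 0; i < params.n_elem; ++i)
      {
        const double orig = params(i);
        params(i) = orig + eps;
        const double lossPlus = f.Evaluate(params);
        params(i) = orig - eps;
        const double lossMinus = f.Evaluate(params);
        params(i) = orig;
        numeric(i) = (lossPlus - lossMinus) / (2.0 * eps);
      }

      // Small (e.g. < 1e-4) when the analytic gradient matches the numerical one.
      return arma::norm(analytic - numeric, "fro") /
          std::max(arma::norm(analytic, "fro") + arma::norm(numeric, "fro"),
                   1e-12);
    }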
< Atharva> Another thing: do you think it would be better to add a boolean parameter to the repar layer's constructor for whether the user wants to use KL or not, just for that extra functionality?
< Atharva> We could make two cases in the backward function then
< sumedhghaisas_> umm I am not sure if that will be helpful
< sumedhghaisas_> repar without KL is just like auto encoders
< sumedhghaisas_> user can add a bottleneck layer to achieve the same performance
< Atharva> But there is no random sampling in autoencoders, repar will still have the random sampling
< sumedhghaisas_> yes... but there is no loss term to tell the layer to control the distribution
< sumedhghaisas_> it can overfit to every point it sees
< sumedhghaisas_> the problem observed in autoencoders
< Atharva> Ohhh yes, it will overfit like crazy
< sumedhghaisas_> indeed
< sumedhghaisas_> is the test passing?
< Atharva> Wait, I will give it a go
< Atharva> To take a break from this, I started with the VAE class yesterday
< Atharva> It's failing, very badly : critical check CheckGradient(function) <= 1e-4 failed [0.99999998701936665 > 0.0001]
< Atharva> linear
< Atharva> repar
< Atharva> linear
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas> Atharva: hmmm
< sumedhghaisas> Lets check the gradients again then
< sumedhghaisas> The code online is the one you are trying?
< Atharva> Yeah, with the latest changes you suggested
< sumedhghaisas> Atharva: the PR code is the latest?
< sumedhghaisas> but I still see the SoftplusFunction::Deriv issue in that code
< Atharva> Yeah I made those changes
< Atharva> haven't committed them yet
< sumedhghaisas> Okay. Make sure the code you are running is the latest
< sumedhghaisas> Have you tried debugging where the error is?
< Atharva> I am trying to calculate the gradients again, do you think the error could be elsewhere?
< sumedhghaisas> I am not sure. I need to look at the new code you are running
< Atharva> Okay, I will commit it
< Atharva> pushed
< sumedhghaisas> Atharva: Yes I saw. I gave it a very quick look but I have to complete some other work.
< sumedhghaisas> Try to isolate the error by only using mean or only using stddev
< sumedhghaisas> this way you would know where the error lies
< Atharva> Sure, I will try and debug it
< sumedhghaisas> also check for size consistency in the network, make sure every layer is getting the correct size input
< Atharva> Okay
< Atharva> So, I think today's sync isn't necessary now
< Atharva> I will try and debug the code
< sumedhghaisas> Atharva: Ahh wait... I see the problem
< sumedhghaisas> We need to approximate the gradient with a constant Gaussian sample
< sumedhghaisas> or the stochasticity in the sample will disturb the computation
< Atharva> Ohkay
< sumedhghaisas> Atharva: Okay I suggest adding a boolean to the layer, stochastic=True
< sumedhghaisas> if user passes false, always assign a constant value to the sample
< sumedhghaisas> This will help only in testing I guess
< sumedhghaisas> But its important
< Atharva> Yeah, but when we don't set a seed, it's always a constant sample
< sumedhghaisas> umm... I don't think so.
< sumedhghaisas> When the seed is the same, the sequence of random numbers is the same, but each individual random number is not the same
< sumedhghaisas> so if I run the program again and again, the same random numbers will be generated
< Atharva> Okay, I will do this
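(A hedged sketch of the flag being discussed; the member and function names are illustrative and not the PR's actual code. The point is that when `stochastic` is false the layer reuses a fixed sample, so the numerical and analytic gradients see exactly the same function.)

    // Illustrative Forward() of a reparametrization-style layer; `mean`,
    // `stddev`, `gaussianSample`, and `stochastic` are class members.
    void Forward(const arma::mat& input, arma::mat& output)
    {
      // ... compute the `mean` and `stddev` matrices from `input` (omitted) ...

      if (stochastic)
      {
        gaussianSample = arma::randn<arma::mat>(mean.n_rows, mean.n_cols);
      }
      else
      {
        // Any fixed value works; it only matters that the sample does not
        // change between the Evaluate() calls of the gradient check.
        gaussianSample.ones(mean.n_rows, mean.n_cols);
      }

      output = mean + stddev % gaussianSample;  // % = elementwise product.
    }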
ImQ009 has joined #mlpack
< sumedhghaisas> rcurtin: Hey Ryan
< sumedhghaisas> Do you think we should have a NormalDistribution to support a matrix of univariate Gaussian distributions?
< rcurtin> sumedhghaisas: that would just be a GaussianDistribution with a diagonal covariance matrix, right?
< sumedhghaisas> rcurtin: it can be done that way. But the problem occurs when there is a batch of distributions
< sumedhghaisas> like in VAE
< rcurtin> hmm, so I read some VAE papers but it was like 3 years ago and I think I forgot everything
< sumedhghaisas> for example
< rcurtin> so assume that I don't know much :)
< sumedhghaisas> Okay, so let me describe the problem
< sumedhghaisas> So we pass a batch of points through the encoder, which converts each point to a Gaussian distribution
< sumedhghaisas> so we have a batch of gaussian distributions, where each distribution has a fixed size
< sumedhghaisas> and each variable is independent of each other in the distribution
< sumedhghaisas> This would be too hard to represent with our current setup. :(
< rcurtin> hmm, it seems to me like you could use the current GaussianDistribution class, and you would initialize the mean with the vector of means, and the covariance with diag(variances)
< rcurtin> however one problem with that is that the covariance will take d x d memory, but really since the covariance is diagonal, we should only need 'd' elements
< sumedhghaisas> yes... and the batch will make it worse
< sumedhghaisas> rather than b * d memory
< rcurtin> right, I guess d will be equal to the batch size?
< sumedhghaisas> it will take b * d * d
< rcurtin> or wait would it be b*d*d
< rcurtin> right
< rcurtin> I see
< rcurtin> hmm, so a couple of ideas spring to mind. you could write a new class that is made for multivariate gaussian distributions but is specific to diagonal covariances
< rcurtin> you could also templatize the existing GaussianDistribution class so that it takes whether or not the covariance is diagonal as a parameter, but I think maybe that is a little bit confusing
< rcurtin> or you could just work with the matrix of means and variances directly in the VAE classes
< rcurtin> I think any of those could be fine, but I agree, the existing GaussianDistribution would not work for this
< sumedhghaisas> huh... templatizing it would also work I guess
< rcurtin> right, I guess it would be template<bool DiagonalCovariance> class GaussianDistribution
< rcurtin> but I don't know if that makes it too complex
< rcurtin> I guess you could use using declarations to make it simpler again...
< rcurtin> template<bool DiagonalCovariance> class BaseGaussianDistribution;
< rcurtin> using GaussianDistribution = BaseGaussianDistribution<false>;
< rcurtin> using DiagonalGaussianDistribution = BaseGaussianDistribution<true>; // or some other name, I don't know if that is a good one
< rcurtin> anyway, that is just one possible idea
< sumedhghaisas> I agree... sounds confusing
< sumedhghaisas> Naming it NormalDistribution would be confusing?
< rcurtin> I think it may be confusing, but comments in the class description should be sufficient to clarify for users
< rcurtin> I can't think of too many other names that are not way too long
< rcurtin> GaussianDistributionExceptTheCovarianceIsDiagonal :)
< sumedhghaisas> haha :P
< sumedhghaisas> okay another issue
< sumedhghaisas> For this, shall we continue with NormalDistribution plus extensive documentation?
< sumedhghaisas> The name is also consistent with other libraries
< sumedhghaisas> I prefer treating it as a matrix of normal distributions rather than a batch of Gaussian distributions where each distribution has a diagonal covariance. What do you think?
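(A hedged sketch of what such a NormalDistribution could look like: a matrix of independent univariate Gaussians that stores only the means and standard deviations, so a batch of b distributions of dimension d needs b * d memory rather than b * d * d. The class and member names are illustrative.)

    #include <mlpack/core.hpp>

    class NormalDistribution
    {
     public:
      // Each column is one distribution of the batch, each row one dimension.
      NormalDistribution(const arma::mat& mean, const arma::mat& stdDev) :
          mean(mean), stdDev(stdDev) { }

      // Elementwise log-density of x under the corresponding univariate
      // Gaussian: -log(sigma) - 0.5 * log(2 * pi) - 0.5 * ((x - mu) / sigma)^2.
      arma::mat LogProbability(const arma::mat& x) const
      {
        return -arma::log(stdDev) - 0.5 * std::log(2.0 * arma::datum::pi)
            - 0.5 * arma::square((x - mean) / stdDev);
      }

      // Draw one sample from every distribution in the batch.
      arma::mat Sample() const
      {
        return mean + stdDev % arma::randn<arma::mat>(arma::size(mean));
      }

     private:
      arma::mat mean;    // d x b means.
      arma::mat stdDev;  // d x b standard deviations.
    };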
< sumedhghaisas> The other issue is regarding the FFN and RNN architecture
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
sumedhghaisas2 has quit [Ping timeout: 260 seconds]
travis-ci has joined #mlpack
< travis-ci> mlpack/mlpack#5024 (master - 1917b1a : Ryan Curtin): The build has errored.
travis-ci has left #mlpack []
sumedhghaisas2 has joined #mlpack
< rcurtin> sorry, I had a meeting, but it is done now
< rcurtin> I think NormalDistribution is fine if that's what you'd like to do
< rcurtin> what's the FNN/RNN architecture issue?
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 276 seconds]
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
haritha1313 has joined #mlpack
vivekp has quit [Ping timeout: 245 seconds]
< haritha1313> rcurtin: Hi, I had a doubt in Armadillo. Thought you might be able to help me out. Is there any function I can use to compare multiple values simultaneously? E.g., if I want to get all rows which have the value 3 in column 1 and 4 in column 2, without using find() in a loop.
< haritha1313> Something like checking for a pair
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 255 seconds]
< rcurtin> haritha1313: I see what you mean, I can't think of an immediate function for that
< rcurtin> but, I wonder if you could use a clever lambda in .transform() or something like this
< rcurtin> I am not sure if you could change the size of the matrix during such a call, though
< rcurtin> I don't think it would be efficient, but you could use something like sum(a == b) where a is the matrix you're interested in, and b is a matrix with 3s in column 1, 4s in column 2, and nans everywhere else
< haritha1313> Right now I am using find() for the first value, and then using the returned indices with any() for the second value. It seemed to be a bit slow.
< rcurtin> given the complexity it may be better to just write a for loop over each column
< rcurtin> er, rather, loop over each row (although since Armadillo is column major it is faster to iterate over columns)
< rcurtin> (so maybe it is worth transposing the matrix)
< haritha1313> Will a nested loop have lower complexity than find() and any()?
< rcurtin> possibly; find() will turn into a loop, and any() may turn into another loop
< rcurtin> so if you can do it all as one loop, I don't know that there would be any faster way
sumedhghaisas has quit [Ping timeout: 260 seconds]
< haritha1313> Actually I'm trying out stuff on the movielens-1m dataset, so I needed it to be fast enough for 1 million entries.
< haritha1313> Thanks for helping :) . I'll try it as nested loop itself, the column major point you mentioned will be helpful.
< rcurtin> the matrix format is going to be Nx3, right? where each column is user id, item id, rating
< haritha1313> yes
< rcurtin> or are you representing it as a huge sparse matrix?
< rcurtin> ah, ok
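(A hedged sketch of the single-loop approach discussed above, for the N x 3 matrix whose columns are user id, item id, rating; the function name is illustrative and the values 3 and 4 are just the ones from the example.)

    #include <mlpack/core.hpp>
    #include <vector>

    // Return the indices of the entries whose first value is 3 and second is 4.
    arma::uvec MatchingEntries(const arma::mat& ratings /* N x 3 */)
    {
      // Transpose so each entry is a column; Armadillo is column-major, so
      // iterating over the columns of the 3 x N matrix is cache-friendly.
      const arma::mat data = ratings.t();

      std::vector<arma::uword> matches;
      for (arma::uword i = 0; i < data.n_cols; ++i)
      {
        if (data(0, i) == 3 && data(1, i) == 4)
          matches.push_back(i);
      }

      return arma::uvec(matches);
    }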
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Read error: Connection reset by peer]
haritha1313 has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 260 seconds]
< ShikharJ> zoq: It seems as though with a lower gradient multiplier, only the time to convergence increases, with no major visual change in the output.
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has joined #mlpack
< zoq> ShikharJ: hmm, okay, I guess we could rerun the experiments with some other parameters, but since the results are just fine for the smaller dataset I would say let's go ahead and merge the code, so that we can continue with the next part. What do you think?
ImQ009 has quit [Quit: Leaving]