verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has joined #mlpack
manish7294 has joined #mlpack
< manish7294>
rcurtin: Finally, results on covertype are out. K = 10, initial accuracy - 96.9761, final accuracy - 97.3618, total time - 13 hrs 23 mins 1.7 secs, optimizer - L-BFGS
manish7294 has quit [Ping timeout: 260 seconds]
< rcurtin>
manish7294: results seem decent, but that took a really long time. did you compile with -DDEBUG=OFF?
< rcurtin>
also do you know how many iterations of L-BFGS were used?
< rcurtin>
I was writing up my theory today for pruning but something was wrong, the results did not make sense
< rcurtin>
so I have some error I need to fix, I will try tomorrow
manish7294 has joined #mlpack
manish7294_ has joined #mlpack
< manish7294_>
rcurtin: I can only tell you the number of iterations if I can scroll back the log. I've tried many things to make that happen, but I can't figure out why I can't scroll back. :(
< manish7294_>
I used -DDEBUG=OFF here
< rcurtin>
are you using screen?
< rcurtin>
it should be ctrl+a then esc, then scroll
< manish7294_>
I'm using screen too, but it's still not working
< manish7294_>
let me try once more
< manish7294_>
No, it didn't work
< rcurtin>
ok
< rcurtin>
that's strange
< rcurtin>
the scrollback buffer should still be there
< rcurtin>
sorry for the slightly slow responses... I am playing mariokart online
< manish7294_>
and the total neighbor-computation time was 6 hrs 15 mins 58.1 secs
< rcurtin>
so I respond while waiting for the next race :)
< manish7294_>
great, you must be having fun
< rcurtin>
that sounds about right. if that was on a benchmarking system, each search should take roughly 30 sexs
< rcurtin>
secs
< rcurtin>
excuse me
< manish7294_>
:)
< rcurtin>
I had to correct the typo so I missed the start :)
< rcurtin>
another thing you could try is, e.g., only doing the NN search for impostors every 100 iterations or something like this
< manish7294_>
Now I am able to scroll back 1950 lines, but I still can't reach the last iteration's log since I used --verbose
< manish7294_>
How can we keep count of iterations inside the LMNNFunction to make the above happen?
< rcurtin>
ah, sorry, I forgot L-BFGS only prints the iteration number in debug mode
< rcurtin>
so there will be no output
< rcurtin>
maybe we should change the optimizer to print in verbose mode too
< manish7294_>
Temporarily we can just cout
< manish7294_>
Or should we be making a permanent change?
< rcurtin>
I don't have a particular preference, I think it is fine as is
< rcurtin>
but if you want to change it I am fine with that also
< manish7294_>
okay so I will make a temporary cout for myself
< manish7294_>
Please check this: 'How can we keep count of iterations inside the LMNNFunction to make the above happen?'
< rcurtin>
you could just have a size_t that you increment each time Evaluate() is called
< rcurtin>
that wouldn't be perfect since Evaluate() may be called more than once per iteration
< rcurtin>
but it can still be helpful for reducing the cost of the LMNNFunction evaluations
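A minimal sketch of the counter rcurtin describes, assuming a hypothetical recompute interval and helper name (this is not mlpack's actual LMNNFunction code):

    #include <mlpack/core.hpp>

    // Sketch only: count Evaluate() calls and redo the impostor search
    // periodically.  All names here are hypothetical.
    class LMNNFunctionSketch
    {
     public:
      LMNNFunctionSketch() : evaluateCount(0), recomputeInterval(100) { }

      double Evaluate(const arma::mat& transformation)
      {
        // L-BFGS may call Evaluate() more than once per iteration, so this
        // counter is only an approximation of the iteration number.
        ++evaluateCount;

        // Re-run the (expensive) neighbor search for impostors only every
        // `recomputeInterval` evaluations instead of on every call.
        if (evaluateCount % recomputeInterval == 0)
          RecomputeImpostors(transformation);

        return 0.0; // placeholder for the real LMNN objective value
      }

     private:
      void RecomputeImpostors(const arma::mat& /* transformation */)
      {
        // The kNN search on the transformed data would go here.
      }

      size_t evaluateCount;
      size_t recomputeInterval;
    };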
< manish7294_>
Sure, let's try it out.
< rcurtin>
ok, time for bed now---talk to you tomorrow!
< manish7294_>
have a good night :)
< ShikharJ>
zoq: It seems that our implementation converges within 10 epochs on the 10,000 image dataset. I found no practical difference (at least not visually) between the 10 and 20 epoch results.
< ShikharJ>
zoq: Might be because we're using a big generator multiplier (about 10). I'll investigate different multiplier steps as well.
< zoq>
ShikharJ: Do you think we could save the weights after every couple of iterations? It might be a neat way to visualize the optimization process afterwards.
< ShikharJ>
zoq: Sure, I'll see what I can do.
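A rough sketch of the periodic checkpointing zoq suggests; the epoch loop and the parameters matrix here are stand-ins (not the actual GAN test code), and only the data::Save() call reflects mlpack's API:

    #include <mlpack/core.hpp>
    #include <string>

    int main()
    {
      // Stand-in for whatever the GAN's Parameters() returns.
      arma::mat parameters(100, 1, arma::fill::randu);

      for (size_t epoch = 1; epoch <= 20; ++epoch)
      {
        // ... one epoch of training would go here ...

        // Save a snapshot every few epochs so the optimization process can
        // be visualized later by reloading each checkpoint.
        if (epoch % 5 == 0)
        {
          mlpack::data::Save("gan_weights_epoch_" + std::to_string(epoch) +
              ".bin", parameters);
        }
      }
    }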
sumedhghaisas has joined #mlpack
dasayan05 has joined #mlpack
sumedhghaisas2 has joined #mlpack
dasayan05 has left #mlpack []
sumedhghaisas_ has joined #mlpack
< sumedhghaisas_>
Atharva: Hi Atharva
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
< Atharva>
sumedhghaisas_: Hi Sumedh
< Atharva>
The numerical derivative test is failing; I think we have the derivatives wrong.
< sumedhghaisas_>
Hi Atharva
< sumedhghaisas_>
@Atharva
< Atharva>
Hey
< sumedhghaisas_>
Have you fixed the input issue with SoftplusFunction::Deriv?
< Atharva>
Yes
< sumedhghaisas_>
Ohh okay. So the only remaining problem is adding the KL loss to the total loss?
< Atharva>
Yeah, but the gradient check is failing
< sumedhghaisas_>
As a preliminary test, try the gradient test without adding klBackward to Backward
< sumedhghaisas_>
The gradient test depends on the total loss
< Atharva>
I am trying that
< sumedhghaisas_>
Sure thing... let me know whether that works or not
< Atharva>
Oh, but that will only be true for a VAE network, right?
< Atharva>
I am just trying the gradient test with a simple network with a linear and repar layer
< sumedhghaisas_>
No... Remember we were discussing this before the project started?
< sumedhghaisas_>
we somehow need to be able to add the KL loss to the overall loss
< sumedhghaisas_>
And I think this will also be super useful in the future, as with this functionality we will be able to add regularization-style tricks to a layer
< sumedhghaisas_>
think about it... KL is just another regularization
< Atharva>
Yeah, but only if the network is a VAE, right? In this case, aren't we just using a sampling layer to train a simple neural network?
< sumedhghaisas_>
Use linear + repar + linear to test the gradients
< sumedhghaisas_>
Not just VAE, any kind of NN
< sumedhghaisas_>
we haven't made any VAE specific changes, have we?
< Atharva>
Okay, will do that
< sumedhghaisas_>
The Repar layer is just a layer with extra regularization
< sumedhghaisas_>
that is KL
< Atharva>
I will also add the KL loss to the backward function
< sumedhghaisas_>
Ah wait...
< sumedhghaisas_>
test first then add...
< sumedhghaisas_>
the gradient test has to pass before adding KL error
< sumedhghaisas_>
Is the procedure clear to you? If you have any doubts let me know
< Atharva>
Yeah. And after adding KL as well, right?
< sumedhghaisas_>
After adding KL error the gradient test should indeed fail
< sumedhghaisas_>
Have you looked at how gradient test works?
< Atharva>
Yes
< Atharva>
I am a little confused as to why it will fail
< sumedhghaisas_>
it's delta(Loss) / delta(parameters)
< sumedhghaisas_>
we estimate this numerically
< sumedhghaisas_>
there is one little flaw in our architecture... currently the loss is only 'reconstruction'
< sumedhghaisas_>
so delta(Loss) would be wrong numerically when we also consider the error signal from KL
< sumedhghaisas_>
if we add the error signal from KL, we need to add KL loss to the loss function
< sumedhghaisas_>
which we aren't able to do right now
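For reference, the check being discussed is essentially a central-difference approximation of delta(Loss) / delta(parameters); a minimal sketch (not mlpack's actual gradient test code, and the function names are hypothetical):

    #include <armadillo>
    #include <algorithm>
    #include <functional>

    // Compare an analytic gradient against a numerical one.  If Backward()
    // injects a KL error signal that EvaluateLoss() does not include, the
    // two gradients will disagree and this check fails.
    bool CheckGradient(
        const std::function<double(const arma::mat&)>& EvaluateLoss,
        const arma::mat& analyticGradient,
        arma::mat parameters,
        const double eps = 1e-6,
        const double tolerance = 1e-4)
    {
      arma::mat numericGradient(parameters.n_rows, parameters.n_cols);
      for (arma::uword i = 0; i < parameters.n_elem; ++i)
      {
        const double original = parameters(i);

        parameters(i) = original + eps;
        const double lossPlus = EvaluateLoss(parameters);
        parameters(i) = original - eps;
        const double lossMinus = EvaluateLoss(parameters);
        parameters(i) = original;

        // delta(Loss) / delta(parameter), estimated numerically.
        numericGradient(i) = (lossPlus - lossMinus) / (2.0 * eps);
      }

      return arma::norm(numericGradient - analyticGradient, 2) /
          std::max(arma::norm(numericGradient, 2), eps) < tolerance;
    }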
< Atharva>
Understood
< sumedhghaisas_>
So if the gradient test passes without the KL error, our gradients are correct except for KL
< sumedhghaisas_>
Let's see if that's the case
< sumedhghaisas_>
Then let's decide how to add the KL loss to the overall loss
< Atharva>
Another thing: do you think it would be better to add a boolean parameter to the repar layer's constructor indicating whether the user wants to use KL or not, just for that extra functionality?
< Atharva>
We could make two cases in the backward function then
< sumedhghaisas_>
umm I am not sure if that will be helpful
< sumedhghaisas_>
repar without KL is just like an autoencoder
< sumedhghaisas_>
the user can add a bottleneck layer to achieve the same performance
< Atharva>
But there is no random sampling in autoencoders; repar will still have the random sampling
< sumedhghaisas_>
yes... but there is no loss term to tell the layer to control the distribution
< sumedhghaisas_>
it can overfit to every point it sees
< sumedhghaisas_>
the problem observed in autoencoders
< Atharva>
Ohhh yes, it will overfit like crazy
< sumedhghaisas_>
indeed
< sumedhghaisas_>
is the test passing?
< Atharva>
Wait, I will give it a go
< Atharva>
To take a break from this, I started with the VAE class yesterday
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas>
Atharva: hmmm
< sumedhghaisas>
Let's check the gradients again then
< sumedhghaisas>
Is the code online the one you are trying?
< Atharva>
Yeah, with the latest changes you suggested
< sumedhghaisas>
Atharva: the PR code is the latest?
< sumedhghaisas>
but I still see the SoftplusFunction::Deriv issue in that code
< Atharva>
Yeah I made those changes
< Atharva>
haven't committed them yet
< sumedhghaisas>
Okay. Make sure the code you are running is the latest
< sumedhghaisas>
Have you tried debugging where the error is?
< Atharva>
I am trying to calculate the gradients again, do you think the error could be elsewhere?
< sumedhghaisas>
I am not sure. I need to look at the new code you are running
< Atharva>
Okay, I will commit it
< Atharva>
pushed
< sumedhghaisas>
Atharva: Yes I saw. I gave it a very quick look but I have to complete some other work.
< sumedhghaisas>
Try to isolate the error by only using mean or only using stddev
< sumedhghaisas>
this way you would know where the error lies
< Atharva>
Sure, I will try and debug it
< sumedhghaisas>
also check for size consistency in the network, make sure every layer is getting the correct size input
< Atharva>
Okay
< Atharva>
So, I think today's sync isn't necessary now
< Atharva>
I will try and debug the code
< sumedhghaisas>
Atharva: Ahh wait... I see the problem
< sumedhghaisas>
We need to approximate the gradient with a constant Gaussian sample
< sumedhghaisas>
or the stochasticity in the sample will disturb the computation
< Atharva>
Okay
< sumedhghaisas>
Atharva: Okay I suggest adding a boolean to the layer, stochastic=True
< sumedhghaisas>
if user passes false, always assign a constant value to the sample
< sumedhghaisas>
This will help only in testing I guess
< sumedhghaisas>
But it's important
< Atharva>
Yeah, but when we don't set a seed, it's always a constant sample
< sumedhghaisas>
umm... I don't think so.
< sumedhghaisas>
When the seed is the same, the chain of random numbers is the same, but each individual random number is not the same
< sumedhghaisas>
so if I run the program again and again, the same random numbers will be generated
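A rough sketch of the stochastic flag being proposed (not the actual Reparametrization layer; the class, member, and method names are assumptions):

    #include <armadillo>

    // When `stochastic` is false the same fixed "noise" is used on every
    // call, so the numerical gradient check is not disturbed by fresh
    // Gaussian sampling.
    class ReparametrizationSketch
    {
     public:
      ReparametrizationSketch(const size_t latentSize,
                              const bool stochastic = true) :
          latentSize(latentSize), stochastic(stochastic) { }

      arma::vec Sample(const arma::vec& mean, const arma::vec& stddev)
      {
        arma::vec noise;
        if (stochastic)
          noise = arma::randn<arma::vec>(latentSize); // fresh Gaussian noise
        else
          noise.ones(latentSize);                     // any constant value

        // Reparametrization trick: sample = mean + stddev (elementwise) noise.
        return mean + stddev % noise;
      }

     private:
      size_t latentSize;
      bool stochastic;
    };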
< Atharva>
Okay, I will do this
ImQ009 has joined #mlpack
< sumedhghaisas>
rcurtin: Hey Ryan
< sumedhghaisas>
Do you think we should have a NormalDistribution class to support a matrix of univariate Gaussian distributions?
< rcurtin>
sumedhghaisas: that would just be a GaussianDistribution with a diagonal covariance matrix, right?
< sumedhghaisas>
rcurtin: it can be done that way. But the problem occurs when there is a batch of distributions
< sumedhghaisas>
like in VAE
< rcurtin>
hmm, so I read some VAE papers but it was like 3 years ago and I think I forgot everything
< sumedhghaisas>
for example
< rcurtin>
so assume that I don't know much :)
< sumedhghaisas>
Okay, so let me describe the problem
< sumedhghaisas>
So we pass a batch of points through the encoder, which converts each point to a Gaussian distribution
< sumedhghaisas>
so we have a batch of gaussian distributions, where each distribution has a fixed size
< sumedhghaisas>
and the variables in each distribution are independent of each other
< sumedhghaisas>
This would be too hard to represent with our current setup. :(
< rcurtin>
hmm, it seems to me like you could use the current GaussianDistribution class, and you would initialize the mean with the vector of means, and the covariance with diag(variances)
< rcurtin>
however one problem with that is that the covariance will take d x d memory, but really since the covariance is diagonal, we should only need 'd' elements
< sumedhghaisas>
yes... and the batch will make it worse
< sumedhghaisas>
rather than b * d memory
< rcurtin>
right, I guess d will be equal to the batch size?
< sumedhghaisas>
it will take b * d * d
< rcurtin>
or wait would it be b*d*d
< rcurtin>
right
< rcurtin>
I see
< rcurtin>
hmm, so a couple of ideas spring to mind. you could write a new class that is made for multivariate gaussian distributions but is specific to diagonal covariances
< rcurtin>
you could also templatize the existing GaussianDistribution class so that it takes whether or not the covariance is diagonal as a parameter, but I think maybe that is a little bit confusing
< rcurtin>
or you could just work with the matrix of means and variances directly in the VAE classes
< rcurtin>
I think any of those could be fine, but I agree, the existing GaussianDistribution would not work for this
< sumedhghaisas>
huh... templatizing it would also work I guess
< rcurtin>
right, I guess it would be template<bool DiagonalCovariance> class GaussianDistribution
< rcurtin>
but I don't know if that makes it too complex
< rcurtin>
I guess you could use using declarations to make it simpler again...
< rcurtin>
template<bool DiagonalCovariance> class BaseGaussianDistribution;
< rcurtin>
using GaussianDistribution = BaseGaussianDistribution<false>;
< rcurtin>
using DiagonalGaussianDistribution = BaseGaussianDistribution<true>; // or some other name, I don't know if that is a good one
< rcurtin>
anyway, that is just one possible idea
< sumedhghaisas>
I agree... sounds confusing
< sumedhghaisas>
Will naming it NormalDistribution be confusing?
< rcurtin>
I think it may be confusing, but comments in the class description should be sufficient to clarify for users
< rcurtin>
I can't think of too many other names that are not way too long
< sumedhghaisas>
For this, shall we continue with NormalDistribution, with extensive documentation?
< sumedhghaisas>
The name is also consistent with other libraries
< sumedhghaisas>
I prefer treating it as a matrix of normal distributions rather than a batch of Gaussian distributions where each distribution has a diagonal covariance. What do you think?
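A sketch of that representation: each column is one distribution in the batch and each row one independent univariate Gaussian, so storage is b * d rather than b * d * d (the names and layout here are assumptions, not a final mlpack API):

    #include <armadillo>

    class NormalDistributionSketch
    {
     public:
      // mean and stdDev are both d x b: one column per distribution in the
      // batch, one row per independent univariate Gaussian.
      NormalDistributionSketch(const arma::mat& mean, const arma::mat& stdDev) :
          mean(mean), stdDev(stdDev) { }

      // Elementwise log-density of observations with the same d x b shape.
      arma::mat LogProbability(const arma::mat& observation) const
      {
        const arma::mat variance = arma::square(stdDev);
        return -0.5 * arma::log(2.0 * arma::datum::pi * variance)
            - arma::square(observation - mean) / (2.0 * variance);
      }

      // Reparametrized sample for the whole batch.
      arma::mat Sample() const
      {
        return mean + stdDev % arma::randn<arma::mat>(mean.n_rows, mean.n_cols);
      }

     private:
      arma::mat mean;
      arma::mat stdDev;
    };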
< sumedhghaisas>
The other issue is regarding the FFN and RNN architecture
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
sumedhghaisas2 has quit [Ping timeout: 260 seconds]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#5024 (master - 1917b1a : Ryan Curtin): The build has errored.
< rcurtin>
sorry, I had a meeting, but it is done now
< rcurtin>
I think NormalDistribution is fine if that's what you'd like to do
< rcurtin>
what's the FFN/RNN architecture issue?
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 276 seconds]
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
haritha1313 has joined #mlpack
vivekp has quit [Ping timeout: 245 seconds]
< haritha1313>
rcurtin: Hi, I had a question about Armadillo; I thought you might be able to help me out. Is there any function I can use to compare multiple values simultaneously? For example, if I want to get all rows which have value 3 in column 1 and value 4 in column 2, without using find() in a loop.
< haritha1313>
Something like checking for a pair
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 255 seconds]
< rcurtin>
haritha1313: I see what you mean, I can't think of an immediate function for that
< rcurtin>
but, I wonder if you could use a clever lambda in .transform() or something like this
< rcurtin>
I am not sure if you could change the size of the matrix during such a call, though
< rcurtin>
I don't think it would be efficient, but you could use something like sum(a == b) where a is the matrix you're interested in, and b is a matrix with 3s in column 1, 4s in column 2, and nans everywhere else
< haritha1313>
Right now I am using find() for the first value, and then using the returned indices with any() for the second value. It seemed to be a bit slow.
< rcurtin>
given the complexity it may be better to just write a for loop over each column
< rcurtin>
er, rather, loop over each row (although since Armadillo is column major it is faster to iterate over columns)
< rcurtin>
(so maybe it is worth transposing the matrix)
< haritha1313>
Will a nested loop have lower complexity than find() and any()?
< rcurtin>
possibly; find() will turn into a loop, and any() may turn into another loop
< rcurtin>
so if you can do it all as one loop, I don't know that there would be any faster way
sumedhghaisas has quit [Ping timeout: 260 seconds]
< haritha1313>
Actually I'm trying out stuff on the movielens-1m dataset, so I needed it to be fast enough for 1 million entries.
< haritha1313>
Thanks for helping :). I'll try it as a nested loop; the column-major point you mentioned will be helpful.
< rcurtin>
the matrix format is going to be Nx3, right? where each column is user id, item id, rating
< haritha1313>
yes
< rcurtin>
or are you representing it as a huge sparse matrix?
< rcurtin>
ah, ok
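A sketch of the single loop being suggested, assuming the Nx3 (user id, item id, rating) data is stored transposed as 3 x N so each entry is a contiguous column in Armadillo's column-major layout (the function name here is hypothetical):

    #include <armadillo>
    #include <vector>

    // Return the indices of all entries whose user id and item id match the
    // given pair, in a single pass over the data.
    std::vector<arma::uword> MatchingEntries(const arma::mat& data, // 3 x N
                                             const double userId,
                                             const double itemId)
    {
      std::vector<arma::uword> indices;
      for (arma::uword col = 0; col < data.n_cols; ++col)
      {
        if (data(0, col) == userId && data(1, col) == itemId)
          indices.push_back(col);
      }
      return indices;
    }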
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Read error: Connection reset by peer]
haritha1313 has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 260 seconds]
< ShikharJ>
zoq: It seems as though with a lower gradient multiplier, only the time to convergence increases, with no major visual change in the output.
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has joined #mlpack
< zoq>
ShikharJ: hmm, okay, I guess we could rerun the experiments with some other parameters, but since the results are just fine for the smaller dataset I would say let's go ahead and merge the code, so that we can continue with the next part. What do you think?