ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"
< Toshal> zoq: I am thinking of adding padding = 'SAME' and 'VALID' options to our convolution and pooling layers. Sometimes we know that in order to keep the dimensions the same we need to pad an odd number of rows or columns. That's why I was thinking of adding the above options.
< Toshal> If we enable the above options, then the odd-padding case would be handled by us by adding an extra row or column at the end.
< Toshal> I have referred to the above links. Let me know your thoughts on this.
< Toshal> In the Inception block, 'SAME' padding is used, so it would be indirectly helpful over there.
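(For reference, a minimal sketch of the kind of computation being discussed, assuming the TensorFlow-style meaning of 'SAME' and 'VALID'; the function and parameter names here are hypothetical and not existing mlpack API.)

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical helper (not mlpack API): total padding for one dimension under
// the TensorFlow-style convention, where 'SAME' keeps ceil(inSize / stride)
// outputs and 'VALID' uses no padding at all. When the total padding is odd,
// the extra row/column goes at the end (padAfter), as discussed above.
void ComputeSamePadding(const size_t inSize,
                        const size_t kernelSize,
                        const size_t stride,
                        size_t& padBefore,
                        size_t& padAfter)
{
  const size_t outSize = (inSize + stride - 1) / stride;  // ceil(inSize / stride)
  const size_t needed = (outSize - 1) * stride + kernelSize;
  const size_t totalPad = (needed > inSize) ? (needed - inSize) : 0;
  padBefore = totalPad / 2;         // smaller half in front
  padAfter = totalPad - padBefore;  // extra row/column at the end if odd
}

int main()
{
  size_t before = 0, after = 0;
  ComputeSamePadding(5 /* inSize */, 2 /* kernel */, 1 /* stride */, before, after);
  std::cout << before << " " << after << "\n";  // prints "0 1": the odd row goes at the end
}
```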
KimSangYeon-DGU has joined #mlpack
< sreenik[m]> Toshal: It would be great to have 'SAME' and 'VALID' padding; also, let me know if you actually implement it, because I will need to update that in my converter. Currently I am doing a workaround by manually calculating the 'SAME' and 'VALID' padding dimensions.
jeffin14333 has joined #mlpack
< jeffin14333> zoq , lozhnikov : Good Morning :)
< jeffin14333> I tried something - https://pastebin.com/kM9mSt6m
< jeffin14333> Can you take a look at lines 115-130?
< jeffin14333> Let me know if this approach works, and then we can declare two different functions depending upon the policy
< jeffin14333> Also, the definition is at lines 27-45
< jeffin14333> If this approach is feasible, I will go ahead with it :)
jeffin14333 has left #mlpack []
jeffin1433 has joined #mlpack
< jenkins-mlpack2> Project docker mlpack weekly build build #55: STILL UNSTABLE in 7 hr 13 min: http://ci.mlpack.org/job/docker%20mlpack%20weekly%20build/55/
< jeffin1433> Also, lozhnikov, you suggested moving line 124 inside the upper for loop rather than making a separate for loop, but I cannot do that, since I need the mappings to be created fully for the BOW class. That's not the case for the dictionary class, and hence, to support both, I have to keep it the other way.
jeffin1433 has quit [Ping timeout: 260 seconds]
KimSangYeon-DGU6 has joined #mlpack
< jenkins-mlpack2> Project docker mlpack nightly build build #377: STILL UNSTABLE in 3 hr 29 min: http://ci.mlpack.org/job/docker%20mlpack%20nightly%20build/377/
< zoq> Toshal: Sounds like a good idea to me, adding an extra row as well.
< zoq> jeffin14333: At first glance it looks good; it looks like the creatmat check isn't necessary since both policies implement the same interface.
< zoq> jeffin14333: Also, it would be a good idea to move the second check out of both loops.
< zoq> jeffin14333: The compiler should be able to optimize the check out.
< Toshal> sreenik: Sure I will let you know.
< Toshal> zoq: Thanks for the info
< lozhnikov> jeffin1433: Yes, I see. Agreed with Marcus.
KimSangYeon-DGU has quit [Remote host closed the connection]
KimSangYeon-DGU6 has quit [Remote host closed the connection]
KimSangYeon-DGU has joined #mlpack
sumedhghaisas has joined #mlpack
< sumedhghaisas> KimSangYeon-DGU: Hey Kim
< KimSangYeon-DGU> Hey Sumedh
< KimSangYeon-DGU> Hi
< sumedhghaisas> I was just trying to open your mail as well
< sumedhghaisas> I didn't quite understand your definition of 'S'
< KimSangYeon-DGU> Yeah, I changed the objective equation a bit, is it valid?
< KimSangYeon-DGU> Ah,
< KimSangYeon-DGU> Because QGMM has unnormalized gaussians
< KimSangYeon-DGU> So I changed the approximation constant accordingly
< KimSangYeon-DGU> Previously, I normalized the probabilities with the sum of probabilities by hand.
< KimSangYeon-DGU> However, the training doesn't work well..
< sumedhghaisas> ummm... Little confused still
< sumedhghaisas> so we have unnormalised probabilities
< KimSangYeon-DGU> Right
< sumedhghaisas> we evaluate them over the batch
< sumedhghaisas> and add that constraint
< sumedhghaisas> correct?
< KimSangYeon-DGU> Yeah
< sumedhghaisas> because we want the sum to be constrained
< sumedhghaisas> now you added something to this constraint?
< KimSangYeon-DGU> Yeah, previously, the objective equation to optimize was NLL + lambda * approx const.
< KimSangYeon-DGU> I set the approx const to the sum of probabilities - 1.
< KimSangYeon-DGU> Assuming the sum of probabilities is 1.
< sumedhghaisas> Ahh so you added -1?
< KimSangYeon-DGU> Previously
< KimSangYeon-DGU> But now
< sumedhghaisas> ahh okay
< KimSangYeon-DGU> I changed the 1 into the actual sum of probabilities.
< sumedhghaisas> wait, so it's the sum of probabilities - the sum of probabilities?
< KimSangYeon-DGU> No
< KimSangYeon-DGU> Can you see main.py,
< KimSangYeon-DGU> from lines 55 to 59?
< KimSangYeon-DGU> Because we assume the derivative of the objective function is zero when we calculate it
< KimSangYeon-DGU> Previously, I just added the approximation constant (the sum of probabilities - 1 = 0)
< KimSangYeon-DGU> However, when we have unnormalised Gaussians, the sum of probabilities is actually not 1
< KimSangYeon-DGU> So I edited the objective function accordingly.
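(For reference, the previous objective being described, written out; NLL is the negative log-likelihood and the sum runs over the batch and the mixture components, as I read the discussion above. The change being discussed replaces the 1 with the actual unnormalized sum of probabilities.)

```latex
J(\theta) \;=\; \mathrm{NLL}(\theta) \;+\; \lambda \Big( \sum_{x,k} P(x, k \mid \theta) \;-\; 1 \Big)
```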
< sumedhghaisas> Hey Kim. I need to do a quick 20 minute meeting. Can I come back to you in 20 minutes? Sorry but someone just needs urgent attention.
< sumedhghaisas> I will take a look at your new loss function as well.
< KimSangYeon-DGU> Okay
< sumedhghaisas> Sorry for this :(
< KimSangYeon-DGU> No worries :)
< sakshamB> ShikharJ: are you there?
< ShikharJ> sakshamB: Hey.
< sumedhghaisas> KimSangYeon-DGU: Just a quick question
< sumedhghaisas> so you added - tf.reduce_sum(G) this right?
< KimSangYeon-DGU> Yeah
< sumedhghaisas> But G is not the probabilities, it's the square root of the probabilities, right?
< ShikharJ> sakshamB: Sorry, yesterday was 4th of July, so I was out late, and woke up late.
< sakshamB> ShikharJ: no problem. I wanted to discuss spectral normalization.
< KimSangYeon-DGU> G is probabilities
< ShikharJ> sakshamB: Sure, I have a lot of time today :)
< ShikharJ> Toshal: Are you here?
< sakshamB> ShikharJ: I would require the dimensions of the weights for the normalization. It is used with the linear layer and the convolution layer.
< sakshamB> ShikharJ: and the bias is not normalized. This is similar to the weight normalization that Toshal was working on.
< KimSangYeon-DGU> sumedhghaisas: G is the unnormalised probabilities
< sumedhghaisas> umm... Could you look at the Paper Equation 9
< sumedhghaisas> thats the definition of G you are using right?
< KimSangYeon-DGU> Yeah
< sumedhghaisas> then its the square root of the probability
< KimSangYeon-DGU> Ahh, In the code, G is a probability
< KimSangYeon-DGU> Sorry for the confusion
< KimSangYeon-DGU> I'll change the name of variables
< ShikharJ> sakshamB: Okay, so are you concerned the two techniques would be pretty similar to each other?
< KimSangYeon-DGU> In the code, G is just probabilities
< sumedhghaisas> okay, but then you have used tf.reduce_sum(G[0] * G[1])
< KimSangYeon-DGU> sumedhghaisas: Oh, sorry...
< sumedhghaisas> but there the square roots of the probabilities are required
< KimSangYeon-DGU> I have some confusion
< sakshamB> ShikharJ no I need to get the dimension of the weight matrix in order to do the normalization.
< sakshamB> ShikharJ: and the weight matrix needs to be reshaped differently for convolutional layer and linear layer
< KimSangYeon-DGU> sumedhghaisas: I think I should edit the code a bit.
< KimSangYeon-DGU> I didn't use the quantum_gmm() function...
< ShikharJ> sakshamB: I see, having a size visitor wouldn't help.
< sakshamB> ShikharJ maybe we could directly pass the dimensions through the constructor? or create a weight height, width and depth visitor?
< ShikharJ> sakshamB: Can't you take the layer type into account?
< sakshamB> I could, but that would still not give me the dimensions of the weights. We could add more getters?
< ShikharJ> sakshamB: What do you mean by getters?
< ShikharJ> GetWeightHeight() and GetWeightWidth()?
< sakshamB> ShikharJ yes
< ShikharJ> sakshamB: That was gonna be my next suggestion :) Provided you can take into account the layer type. That would bypass the need to create visitors.
< ShikharJ> Though I guess visitors would be cleaner, but I'm not sure how much work would be required to implement them.
< sakshamB> ShikharJ: yes, that is why I was thinking about using visitors, because otherwise I would have to do a long list of if-else checks with Linear, LinearNoBias, and maybe others in the future.
< ShikharJ> sakshamB: I'm currently assessing that.
< sakshamB> ShikharJ and this could also solve the problem for Toshal since he also did not want to normalize the bias
< ShikharJ> sakshamB: Yes, I can see the upside for Toshal as well.
< ShikharJ> sakshamB: BTW, did you get a chance to implement the Inception Score script that we talked about on Monday?
< sakshamB> yes I have pushed a commented test
< ShikharJ> sakshamB: Great :)
< ShikharJ> Give me a second :)
< sakshamB> ShikharJ: although I am not sure about the layers of the GAN model. The example was using weight norm throughout and minibatchDiscrimination.
robertohueso has quit [Ping timeout: 246 seconds]
< ShikharJ> sakshamB: No worries there, we'll assess the script in the 3rd phase, along with the rest of the commented out code, before we push them to models repository. I have some GAN model scripts on multi-channel images to push there as well.
< sumedhghaisas> KimSangYeon-DGU: Yes something seems off
< ShikharJ> sakshamB: I think you should go ahead with the visitors, they don't seem that hard to implement, but feel free to ask questions.
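(A rough, self-contained sketch of the visitor idea being discussed, using std::variant and made-up layer structs for brevity; the actual mlpack implementation would use the existing boost::variant layer types and visitor machinery, and names like WeightShapeVisitor are hypothetical.)

```cpp
#include <cstddef>
#include <iostream>
#include <utility>
#include <variant>

// Hypothetical stand-ins for layer types, only to illustrate the visitor idea.
struct Linear { size_t inSize, outSize; };
struct Convolution { size_t inMaps, outMaps, kernelW, kernelH; };

// Visitor returning the (rows, cols) shape the weight matrix would be reshaped
// to before normalization; the bias is deliberately left out.
struct WeightShapeVisitor
{
  std::pair<size_t, size_t> operator()(const Linear& l) const
  {
    return { l.outSize, l.inSize };
  }

  std::pair<size_t, size_t> operator()(const Convolution& c) const
  {
    // Flatten the input maps and kernel spatial dimensions together.
    return { c.outMaps, c.inMaps * c.kernelW * c.kernelH };
  }
};

int main()
{
  std::variant<Linear, Convolution> layer = Convolution{ 3, 16, 5, 5 };
  const auto shape = std::visit(WeightShapeVisitor(), layer);
  std::cout << shape.first << " x " << shape.second << "\n";  // prints "16 x 75"
}
```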
< KimSangYeon-DGU> sumedhghaisas: Right, sorry for the confusion.
< sumedhghaisas> No worries :)
< KimSangYeon-DGU> I'll get back to you soon
< sakshamB> ShikharJ: alright cool
< KimSangYeon-DGU> For a quick test, the performance increases
< KimSangYeon-DGU> I edited it according to your correction
< ShikharJ> sakshamB: Okay, anything else you wished to discuss?
< sakshamB> ShikharJ zoq could you take a look here https://github.com/mlpack/mlpack/pull/1493#discussion_r299114822
< sumedhghaisas> Ahh nice
< KimSangYeon-DGU> Hmm... but that's not always the case, I'll test more
< sumedhghaisas> but wait... what is the definition of G in your code now?
< sumedhghaisas> is it the square root of the Gaussian?
< sumedhghaisas> like in the paper?
< ShikharJ> sakshamB: I like the idea, makes the convolution layers more concise, and I can see the reduction in redundancy.
< ShikharJ> zoq: What are your thoughts on that?
< sumedhghaisas> I recommend going by the way they did it in the paper, so we have some common ground to talk on
< sakshamB> ShikharJ: yes, we could abstract all the code associated with padding inside that layer, along with the VALID and SAME options Toshal was just talking about.
< KimSangYeon-DGU> sumedhghaisas: Right, it is the square root
< ShikharJ> sakshamB: Yeah totally, it would be easier to maintain the code as well from then on :)
< KimSangYeon-DGU> sumedhghaisas: I wrote the code according to the paper.
< sakshamB> ShikharJ: yup I think I have discussed everything for now. Thanks for your time. Have a good weekend 8)
< KimSangYeon-DGU> G is the square root of the Gaussians, sorry, it's my mistake..
< ShikharJ> sakshamB: I think you should take some time to implement the visitors and the padding layers.
< ShikharJ> sakshamB: Okay, since I have a long weekend (though I need to run to get my social security), I think I'll have time for reviews today :) Have a nice weekend :)
< sakshamB> ShikharJ: yes I can work on that.
< KimSangYeon-DGU> sumedhghaisas: I updated the code
< KimSangYeon-DGU> sumedhghaisas: Can you check it? sorry for the mistake...
< KimSangYeon-DGU> sumedhghaisas: You are right, G is the square root of the Gaussians; it's my confusion.
vivekp has joined #mlpack
< KimSangYeon-DGU> Can you push the F5 button?
< KimSangYeon-DGU> sumedhghaisas: G is the square root of the Gaussians, P is the mixture probabilities
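(Restating the convention just settled on, so the symbols stay straight; here g_k denotes the unnormalised Gaussian density of component k, following the discussion above.)

```latex
G_k(x) = \sqrt{g_k(x)}, \qquad G_0(x)\, G_1(x) = \sqrt{g_0(x)\, g_1(x)}
```

So products of G terms, such as tf.reduce_sum(G[0] * G[1]), are sums of square-root cross terms rather than sums of probabilities.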
< sumedhghaisas> ahh okay so you have shifted to -1 right?
< KimSangYeon-DGU> Yeah
< sumedhghaisas> okay, because the constraint is the unnormalized sum of probabilities, it doesn't make sense to subtract the same quantity from it
< sumedhghaisas> so your function 'unnormalized_gaussain' now computes the square root of the Gaussian, right?
< KimSangYeon-DGU> Yeah
< KimSangYeon-DGU> Right
< sumedhghaisas> Great :)
< sumedhghaisas> so the results are the same?
< KimSangYeon-DGU> I should change the -1 to the sum of P
< sumedhghaisas> ahh no -1 is correct I think
< KimSangYeon-DGU> AH
< KimSangYeon-DGU> yeah
< KimSangYeon-DGU> the results are a bit different
< sumedhghaisas> constraint is sum of P
< sumedhghaisas> constraint is basically sum of P(x, k | theta)
< sumedhghaisas> correct?
< KimSangYeon-DGU> Yeah
< KimSangYeon-DGU> But the sum of P isn't 1 actually...
< KimSangYeon-DGU> So we normalised the P with the sum of P by hand.
< sumedhghaisas> yes, that's why we are adding it in the constraint; basically, we are saying (sum of P - 1) should be zero
< KimSangYeon-DGU> Yeah
< sumedhghaisas> wait, I am confused again. When you say sum of P, what is P? P(x) or P(x, K | theta)?
< KimSangYeon-DGU> the sum of P(x, K | theta)
< sumedhghaisas> correct, then that sum is the constraint
< sumedhghaisas> so how the Lagrangian works is
< sumedhghaisas> we optimize g(x) + lambda * f(x)
< KimSangYeon-DGU> It doesn't train well...
< KimSangYeon-DGU> Yeah
< sumedhghaisas> where f(x) is constrained to zero
< sumedhghaisas> ahh yes, it can be rationalized that it doesn't train well
< sumedhghaisas> because we are using approximate constraints
< KimSangYeon-DGU> I'll check it
vivekp has quit [Ping timeout: 258 seconds]
< sumedhghaisas> Did you find some time for the paper? :)
< KimSangYeon-DGU> sumedhghaisas: I think the normalization doesn't work....
< sumedhghaisas> ummm... you mean sum of P?
< KimSangYeon-DGU> When we normalize it, it doesn't train
< KimSangYeon-DGU> Yeah
< KimSangYeon-DGU> Do you mean the Sliced Wasserstein Distance paper?
< sumedhghaisas> so you are training NLL + lambda * (sum of P - 1) correct?
< KimSangYeon-DGU> Yeah
< KimSangYeon-DGU> After normalizing with the sum of P
< sumedhghaisas> okay, what is the behavior in the training?
< sumedhghaisas> diverging?
< KimSangYeon-DGU> it goes towards the center
< sumedhghaisas> what do you do to normalize the sum of P?
< sumedhghaisas> sum of P / (sum of P) ?
< KimSangYeon-DGU> P / sum of P
< akhandait> sreenik[m]: Hey
< sumedhghaisas> wait... it was (sum of P - 1) correct? So for each P you do p / (sum of P)?
< KimSangYeon-DGU> Yeah, P / sum of P and I checked the sum of P is 1
< KimSangYeon-DGU> P = tf.div(P, tf.reduce_sum(P))
< KimSangYeon-DGU> print(sess.run(tf.reduce_sum(P)))
< sumedhghaisas> wait... I think you are understanding this all wrong.
< sumedhghaisas> We want the optimizer to normalize it
< KimSangYeon-DGU> Ah...
< sumedhghaisas> not us
< sumedhghaisas> so okay, let's go over the Lagrangian a bit
< sumedhghaisas> the task is to optimize g(x)
< KimSangYeon-DGU> Yeah
< sumedhghaisas> so optimize g(x)
< sumedhghaisas> which is our NLL
< KimSangYeon-DGU> Yeah
< sumedhghaisas> now we propose a new constraint saying that while the optimizer optimizes g(x), we want f(x) to remain zero
< sumedhghaisas> so the optimizer is changing x to optimize g(x); we are saying that the optimizer cannot change to just any x, but only to x that satisfies f(x) = 0
< sumedhghaisas> for us f(x) is (sum of P - 1)
< sumedhghaisas> we are restricting the optimizer to parameters that satisfy this equation
< sumedhghaisas> while optimizing NLL
< sumedhghaisas> This sum of P is unnormalized
< sumedhghaisas> we want that sum to be 1
< sumedhghaisas> but that's the constraint on the optimizer
< KimSangYeon-DGU> Got it.
< sumedhghaisas> So just NLL + (sum of P - 1)
< KimSangYeon-DGU> without lambda?
< sumedhghaisas> wait, it's NLL + lambda * (sum of unnormalized P - 1)
< KimSangYeon-DGU> Ah thanks
< sumedhghaisas> ahh yes lambda ... good catch
< sumedhghaisas> although lambda could be 1 :P
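(A compact restatement of the constrained-optimization setup described above; g is the objective, here the NLL, f is the constraint function, and theta are the model parameters.)

```latex
\min_{\theta}\; g(\theta) \quad \text{subject to} \quad f(\theta) = 0,
\qquad
\mathcal{L}(\theta, \lambda) = g(\theta) + \lambda\, f(\theta),
\qquad
f(\theta) = \sum_{x,k} P(x, k \mid \theta) - 1
```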
< KimSangYeon-DGU> :)
< KimSangYeon-DGU> Sumedh, but the optimizer cannot find a point where the sum of P - 1 is 0..
< KimSangYeon-DGU> The sum of P keeps getting larger
< sumedhghaisas> yeah... that has a lot of problems as well.
< sumedhghaisas> Lagrangian is hard to optimize
< sumedhghaisas> our optimizers are not good enough
< sumedhghaisas> It's like L2 regularization
< sumedhghaisas> remember L2?
< sumedhghaisas> L2 regularization is basically a Lagrangian saying that the sum of squared weights should be C
< KimSangYeon-DGU> Ah
< KimSangYeon-DGU> Can we try the L2 regularization?
< sumedhghaisas> not really... We don't have any priors on our parameters
< sumedhghaisas> L2 works because it comes from the Bayesian perspective of a Gaussian prior over the parameters
< sumedhghaisas> L1 is a Laplace prior
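(The standard correspondence being referenced: under a MAP view, a zero-mean Gaussian prior on the weights contributes an L2 penalty to the negative log-posterior, and a Laplace prior contributes an L1 penalty.)

```latex
-\log p(w) = \frac{\lVert w \rVert_2^2}{2\sigma^2} + \text{const} \;\;\text{(Gaussian prior, L2)},
\qquad
-\log p(w) = \frac{\lVert w \rVert_1}{b} + \text{const} \;\;\text{(Laplace prior, L1)}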
< KimSangYeon-DGU> Thanks for the information.
< sumedhghaisas> But okay, so this optimization is basically going to the centre, right?
< KimSangYeon-DGU> I sent a result image
< KimSangYeon-DGU> Wait a moment
< KimSangYeon-DGU> I sent it
< KimSangYeon-DGU> This time it is not going to the centre, but it's not trained well.
< sumedhghaisas> so its diverging?
< sumedhghaisas> okay then the approximation of the constraint is not good enough...
< KimSangYeon-DGU> I sent an email with result video
< KimSangYeon-DGU> Can you view the file?
< KimSangYeon-DGU> Right, the constraint is not good enough...
< sumedhghaisas> ahh wait...
< sumedhghaisas> the optimization looks almost good enough
< sumedhghaisas> I can see that the centres are going in the right direction at least, right?
KimSangYeon-DGU has quit [Remote host closed the connection]
KimSangYeon-DGU has joined #mlpack
< KimSangYeon-DGU> sumedhghaisas: Sorry my internet connection was broken
< KimSangYeon-DGU> Yeah the mean is right
< sumedhghaisas> This is with the NLL + lambda * (unnormalized sum of p - 1)?
< KimSangYeon-DGU> Yeah
< sumedhghaisas> I think these are good results
< sumedhghaisas> so at the end of the video the centre was still going in the correct direction
< sumedhghaisas> could you optimize it a little bit more?
< KimSangYeon-DGU> I'll test it longer
< KimSangYeon-DGU> Yeah
< sumedhghaisas> how stable is the training
< sumedhghaisas> in terms of random initialization?
< sumedhghaisas> try 10 different initializations and see how they behave
< sumedhghaisas> if for each one the centre is going in the correct direction
< sumedhghaisas> I definitely call it a good result
< sumedhghaisas> at least the training is stable
< KimSangYeon-DGU> Yeah
< KimSangYeon-DGU> I'm currently testing it
< sumedhghaisas> Great. Also I am free tomorrow. If you wanna sleep right now and ping me in the morning?
< KimSangYeon-DGU> sumedhghaisas: I have a question. The GMM uses Cholesky decomposition
< KimSangYeon-DGU> Ah yes
< sumedhghaisas> your call :) just don't want to keep you awake unnecessarily
< KimSangYeon-DGU> But the Cholesky decomposition isn't stable...
< sumedhghaisas> it's not stable, yes
< KimSangYeon-DGU> So, it is tricky when we set the parameters initially
< sumedhghaisas> but when you do the Cholesky, it's the Cholesky of the sigma, right?
< sumedhghaisas> ahh, I would just pick 10 random initializations with the mean in 0 to 1
< sumedhghaisas> and some reasonable variance
< KimSangYeon-DGU> But the covariance
< sumedhghaisas> I see.
< KimSangYeon-DGU> I used the lower covariance
< sumedhghaisas> Thats a valid point
< KimSangYeon-DGU> Ah
< KimSangYeon-DGU> But it is too sensitive to the initialization of the covariance
< sumedhghaisas> did you observe that the covariance affects the training a lot?
< sumedhghaisas> I see
< KimSangYeon-DGU> Not deeply
< KimSangYeon-DGU> I'll also check it
< sumedhghaisas> Yeah, I think a little experimentation on the covariance is required as well
< sumedhghaisas> I will do some checking on how the covariance affects it in theory
< KimSangYeon-DGU> Yeah thanks
< sumedhghaisas> But I suspect this is coming from the optimizer
< sumedhghaisas> and not from the model
< sumedhghaisas> okay, basically the next goal is to see how stable the training is w.r.t. the initial parameters
< KimSangYeon-DGU> Agreed. So I set a positive-definite constraint on the covariance
< KimSangYeon-DGU> Yeah
< sumedhghaisas> thats right
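(The property being relied on here: parameterizing the covariance through its Cholesky factor keeps it positive definite by construction.)

```latex
\Sigma = L L^{\mathsf{T}}, \quad L \text{ lower triangular},\; L_{ii} > 0
\;\Longrightarrow\;
x^{\mathsf{T}} \Sigma x = \lVert L^{\mathsf{T}} x \rVert_2^2 > 0 \;\; \text{for all } x \neq 0
```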
< sumedhghaisas> ehhh we have some good news :)
< KimSangYeon-DGU> Really great
< sumedhghaisas> lets hope the training is stable
< KimSangYeon-DGU> Yeah
< sumedhghaisas> and then we will focus on getting a better approximation
< sumedhghaisas> I have some ideas in that direction which I can tell you all about tomorrow
< KimSangYeon-DGU> Yeah, but I'm worried about the timeline of implementation
< sumedhghaisas> Me too, a little bit :( I will talk to Ryan and figure it out, don't worry :)
< KimSangYeon-DGU> Ah yes!
< sumedhghaisas> Let me take care of that :)
< KimSangYeon-DGU> Thanks, I'll continue to implement it
< KimSangYeon-DGU> :)
< sumedhghaisas> Great. Give that paper a go if you get some time :)
< KimSangYeon-DGU> Yeah
KimSangYeon-DGU has quit [Remote host closed the connection]
KimSangYeon-DGU has joined #mlpack
< KimSangYeon-DGU> sumedhghaisas: I'll get back to you tomorrow! Thanks for the meeting :)
< sumedhghaisas> See you tomorrow :)
KimSangYeon-DGU has quit [Remote host closed the connection]
sumedhghaisas has quit [Ping timeout: 260 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 258 seconds]
< sreenik[m]> zoq: I am thinking of introducing the momentum parameter in batchnorm; it will have a default value of 1 unless specified. Is it all right to proceed?
< zoq> ShikharJ: Agreed.
< zoq> sreenik[m]: To provide backward compatibility?
< zoq> sreenik[m]: Sounds like a good idea to me.
< sreenik[m]> Yes it won't hinder backward compatibility
< zoq> Great!
< sreenik[m]> I was having difficulty understanding a part of it, though. Let me explain...
< sreenik[m]> In line 93 in the file https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/ann/layer/batch_norm_impl.hpp "runningMean" is defined but I don't see it getting used anywhere else except in line 73 (which is executed only if "deterministic" is true)
< sreenik[m]> I am not sure if deterministic can be true after it is already false
< zoq> deterministic is set by the FFN or RNN class and defines whether we are in training mode (deterministic = false) or prediction mode (deterministic = true). But you are right, the default value is training:
< zoq> is the function that updates the deterministic parameter.
< sreenik[m]> Oh, I get it now. Actually, adding momentum functionality is just modifying the runningMean and runningVariance, so I was thinking of the consequences as I didn't see them getting used anywhere afterwards. But this solves it for me. Thanks :)
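(A minimal sketch of the running-statistics update that a momentum parameter usually controls, following the common exponential-moving-average convention; the exact formula, variable names, and default used in mlpack's BatchNorm may differ.)

```cpp
#include <armadillo>

// Common convention (e.g., Keras-style) for updating running statistics with a
// momentum parameter; this is a sketch, not mlpack's exact BatchNorm code.
void UpdateRunningStats(arma::vec& runningMean,
                        arma::vec& runningVariance,
                        const arma::vec& batchMean,
                        const arma::vec& batchVariance,
                        const double momentum)
{
  runningMean = momentum * runningMean + (1.0 - momentum) * batchMean;
  runningVariance = momentum * runningVariance + (1.0 - momentum) * batchVariance;
}
```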
< zoq> Nice :)