ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"
xiaohong has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
xiaohong has joined #mlpack
< Toshal>
zoq: I am thinking of adding padding = 'SAME' and 'VALID' options to our convolution and pooling layers. Sometimes, in order to keep the dimensions the same, we need to pad an odd number of rows or columns. That's why I was thinking of adding the above options.
< Toshal>
If we enable the above options, then the odd-padding case would be handled by us by adding an extra row or column at the end.
< Toshal>
I have referred to the above links. Let me know your thoughts regarding the same.
< Toshal>
In the Inception block, 'SAME' padding is used, so it would be helpful indirectly over there.
KimSangYeon-DGU has joined #mlpack
xiaohong has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
< sreenik[m]>
Toshal: It would be great to have 'SAME' and 'VALID' padding. Also, let me know if you actually implement it, because I will need to update that in my converter. Currently I am working around it by manually calculating the 'SAME' and 'VALID' padding dimensions.
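For reference, here is a minimal sketch of how the 'SAME' and 'VALID' output sizes and padding amounts are usually computed for one spatial dimension (a Python illustration of the rule only, not mlpack's actual API; the helper name is made up):

    import math

    def conv_output_and_padding(in_size, kernel, stride, mode):
        """Output size and (pad_begin, pad_end) for one spatial dimension,
        following the usual 'SAME'/'VALID' conventions."""
        if mode == 'VALID':
            # No padding; only positions where the kernel fully fits are used.
            out_size = (in_size - kernel) // stride + 1
            return out_size, (0, 0)
        elif mode == 'SAME':
            # Output has ceil(in_size / stride) positions; if the total padding
            # is odd, the extra row/column goes at the end.
            out_size = math.ceil(in_size / stride)
            total_pad = max((out_size - 1) * stride + kernel - in_size, 0)
            pad_begin = total_pad // 2
            pad_end = total_pad - pad_begin
            return out_size, (pad_begin, pad_end)
        raise ValueError("mode must be 'SAME' or 'VALID'")

    # Example: a 4-wide input with a 3-wide kernel and stride 2 needs
    # asymmetric padding (0, 1) for 'SAME'.
    print(conv_output_and_padding(4, 3, 2, 'SAME'))   # (2, (0, 1))
    print(conv_output_and_padding(4, 3, 2, 'VALID'))  # (1, (0, 0))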
< jeffin1433>
Also, lozhnikov, you suggested moving line 124 inside the upper for loop rather than making a separate for loop, but I cannot do that, since I need the Mappings to be created fully for the BOW class. That's not the case for the DictionaryClass, and hence, to support both, I have to do it the other way.
jeffin1433 has quit [Ping timeout: 260 seconds]
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
xiaohong has joined #mlpack
xiaohong has quit [Remote host closed the connection]
< sumedhghaisas>
wait, so it's sum of probabilities - sum of probabilities?
< KimSangYeon-DGU>
No
< KimSangYeon-DGU>
Can you see main.py,
< KimSangYeon-DGU>
from lines 55 to 59?
< KimSangYeon-DGU>
Because we assume the derivative of the objective function is zero when we calculate it
< KimSangYeon-DGU>
Previously, I just added the approximation constraint that the sum of probabilities - 1 = 0
< KimSangYeon-DGU>
However, when we have unnormalised gaussians, actually the sum of probabilities is not 1
< KimSangYeon-DGU>
So I edited the objective function accordingly.
< sumedhghaisas>
Hey Kim. I need to do a quick 20 minute meeting. Can I come back to you in 20 minutes? Sorry but someone just needs urgent attention.
< sumedhghaisas>
I will take a look at your new loss function as well.
< KimSangYeon-DGU>
Okay
< sumedhghaisas>
Sorry for this :(
< KimSangYeon-DGU>
No worries :)
< sakshamB>
ShikharJ: are you there?
< ShikharJ>
sakshamB: Hey.
< sumedhghaisas>
KimSangYeon-DGU: Just a quick question
< sumedhghaisas>
so you added this: -tf.reduce_sum(G), right?
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
But G is not the probabilities, it's the square root of the probabilities, right?
< ShikharJ>
sakshamB: Sorry, yesterday was 4th of July, so I was out late, and woke up late.
< sakshamB>
ShikharJ: no problem. I wanted to discuss spectral normalization.
< KimSangYeon-DGU>
G is probabilities
< ShikharJ>
sakshamB: Sure, I have a lot of time today :)
< ShikharJ>
Toshal: Are you here?
< sakshamB>
ShikharJ: I would require the dimensions of the weights for the normalization. It is used with the linear layer and the convolution layer.
< sakshamB>
ShikharJ and the bias is not normalized. This is similar to weight normalization that Toshal was working on.
< KimSangYeon-DGU>
sumedhghaisas: G is the unnormalised probabilities
< sumedhghaisas>
umm... could you look at Equation 9 in the paper?
< sumedhghaisas>
that's the definition of G you are using, right?
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
then it's the square root of the probability
< KimSangYeon-DGU>
Ahh, In the code, G is a probability
< KimSangYeon-DGU>
Sorry for the confusion
< KimSangYeon-DGU>
I'll change the name of variables
< ShikharJ>
sakshamB: Okay, so are you concerned the two techniques would be pretty similar to each other?
< KimSangYeon-DGU>
In the code, G is just probabilities
< sumedhghaisas>
okay, but then you used tf.reduce_sum(G[0] * G[1])
< KimSangYeon-DGU>
sumedhghaisas: Oh, sorry...
< sumedhghaisas>
but there the square roots of the probabilities are required
< KimSangYeon-DGU>
I got confused
< sakshamB>
ShikharJ: no, I need to get the dimensions of the weight matrix in order to do the normalization.
< sakshamB>
ShikharJ: and the weight matrix needs to be reshaped differently for the convolutional layer and the linear layer
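For context, spectral normalization divides a layer's weight by an estimate of its largest singular value, computed by power iteration on the weight viewed as a 2-D matrix; that is why the weight's dimensions (and a reshape for convolutional kernels) are needed, while the bias is left alone. A minimal NumPy sketch of the idea (illustrative only, not mlpack's API):

    import numpy as np

    def spectral_normalize(weight, n_power_iters=5, eps=1e-12):
        """Scale `weight` by 1 / sigma_max, where sigma_max is estimated by
        power iteration. A convolution kernel of shape
        (out_channels, in_channels, h, w) is first flattened to 2-D."""
        w = weight.reshape(weight.shape[0], -1)   # (out, in*h*w) for a conv kernel
        u = np.random.randn(w.shape[0])
        for _ in range(n_power_iters):
            v = w.T @ u
            v /= np.linalg.norm(v) + eps
            u = w @ v
            u /= np.linalg.norm(u) + eps
        sigma = u @ w @ v                         # approximate largest singular value
        return weight / sigma

    # Example: a 16x8x3x3 convolution kernel; the bias would not be touched.
    kernel = np.random.randn(16, 8, 3, 3)
    normalized = spectral_normalize(kernel)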
< KimSangYeon-DGU>
sumedhghaisas: I think I should edit the code a bit.
< KimSangYeon-DGU>
I didn't use the quantum_gmm() function...
< ShikharJ>
sakshamB: I see, having a size visitor wouldn't help.
< sakshamB>
ShikharJ maybe we could directly pass the dimensions through the constructor? or create a weight height, width and depth visitor?
< ShikharJ>
sakshamB: Can't you take the type of the layer into account?
< sakshamB>
I could, but that would still not give me the dimensions of the weights. We could add more getters?
< ShikharJ>
sakshamB: What do you mean by getters?
< ShikharJ>
GetWeightHeight() and GetWeightWidth()?
< sakshamB>
ShikharJ yes
< ShikharJ>
sakshamB: That was gonna be my next suggestion :) Provided you can take into account the layer type. That would bypass the need to create visitors.
< ShikharJ>
Though I guess visitors would be cleaner, but I'm not sure how much work would be required to implement them.
< sakshamB>
ShikharJ: yes, that is why I was thinking about using visitors, because otherwise I would have to write a long chain of if-else over Linear, LinearNoBias, and maybe others in the future.
< ShikharJ>
sakshamB: I'm currently assessing that.
< sakshamB>
ShikharJ and this could also solve the problem for Toshal since he also did not want to normalize the bias
< ShikharJ>
sakshamB: Yes, I can see the upside for Toshal as well.
< ShikharJ>
sakshamB: BTW, did you get a chance to implement the Inception Score script that we talked about on Monday?
< sakshamB>
yes, I have pushed a commented-out test
< ShikharJ>
sakshamB: Great :)
< ShikharJ>
Give me a second :)
< sakshamB>
ShikharJ: although I am not sure about the layers of the GAN model. The example was using weight norm throughout, along with minibatchDiscrimination.
robertohueso has quit [Ping timeout: 246 seconds]
< ShikharJ>
sakshamB: No worries there, we'll assess the script in the 3rd phase, along with the rest of the commented-out code, before we push them to the models repository. I have some GAN model scripts on multi-channel images to push there as well.
< sumedhghaisas>
KimSangYeon-DGU: Yes something seems off
< ShikharJ>
sakshamB: I think you should go ahead with the visitors, they don't seem that hard to implement, but feel free to ask questions.
< KimSangYeon-DGU>
sumedhghaisas: Right, sorry for the confusion.
< sumedhghaisas>
No worries :)
< KimSangYeon-DGU>
I'll get back to you soon
< sakshamB>
ShikharJ: alright cool
< KimSangYeon-DGU>
In a quick test, the performance increases
< KimSangYeon-DGU>
Edited it according to your correction
< ShikharJ>
sakshamB: Okay, anything else you wished to discuss?
< KimSangYeon-DGU>
Hmm... but it's not the case every time; I'll test more
< sumedhghaisas>
but wait... what is the definition of G in your code now?
< sumedhghaisas>
is it the square root of the Gaussian?
< sumedhghaisas>
like in the paper?
< ShikharJ>
sakshamB: I like the idea, makes the convolution layers more concise, and I can see the reduction in redundancy.
< ShikharJ>
zoq: What are your thoughts on that?
< sumedhghaisas>
I recommend going the way they did in the paper so we have some common ground to talk on
< sakshamB>
ShikharJ: yes, we could abstract all the code associated with padding inside that layer, along with the 'VALID' and 'SAME' options Toshal was just talking about.
< KimSangYeon-DGU>
sumedhghaisas: Right, it is the square root
< ShikharJ>
sakshamB: Yeah totally, it would be easier to maintain the code as well from then on :)
< KimSangYeon-DGU>
sumedhghaisas: I wrote the code according to the paper.
< sakshamB>
ShikharJ: yup I think I have discussed everything for now. Thanks for your time. Have a good weekend 8)
< KimSangYeon-DGU>
G is the square root of the Gaussians, sorry. It's my mistake...
< ShikharJ>
sakshamB: I think you should take some time to implement the visitors and the padding layers.
< ShikharJ>
sakshamB: Okay, since I have a long weekend (though I need to run to get my social security), I think I'll have time for reviews today :) Have a nice weekend :)
< sakshamB>
ShikharJ: yes I can work on that.
< KimSangYeon-DGU>
sumedhghaisas: I updated the code
< KimSangYeon-DGU>
sumedhghaisas: Can you check it? sorry for the mistake...
< KimSangYeon-DGU>
sumedhghaisas: You are right, G is the square root of the Gaussians. It's my confusion.
< sumedhghaisas>
wait... I think you are understanding this all wrong.
< sumedhghaisas>
We want the optimizer to normalize it
< KimSangYeon-DGU>
Ah...
< sumedhghaisas>
not us
< sumedhghaisas>
so okay, let's go over the Lagrangian a bit
< sumedhghaisas>
the task is to optimize g(x)
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
so optimize g(x)
< sumedhghaisas>
which is our NLL
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
now we propose a new constraint saying that while the optimizer optimizes g(x), we want f(x) to remain zero
< sumedhghaisas>
so the optimizer is changing x to optimize g(x); we are saying that the optimizer cannot change to just any x, but only to an x that satisfies f(x) = 0
< sumedhghaisas>
for us f(x) is (sum of P - 1)
< sumedhghaisas>
we are restricting the optimizer to parameters that satisfy this equation
< sumedhghaisas>
while optimizing NLL
< sumedhghaisas>
This sum of P is unnormalized
< sumedhghaisas>
we want that sum to be 1
< sumedhghaisas>
but thats the constraint on the optimizer
< KimSangYeon-DGU>
Got it.
< sumedhghaisas>
So just NLL + (sum of P - 1)
< KimSangYeon-DGU>
without lambda?
< sumedhghaisas>
wait, it's NLL + lambda * (sum of unnormalized P - 1)
< KimSangYeon-DGU>
Ah thanks
< sumedhghaisas>
ahh yes lambda ... good catch
< sumedhghaisas>
although lambda could be 1 :P
< KimSangYeon-DGU>
:)
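As a concrete sketch of the penalized objective being discussed, NLL + lambda * (sum of unnormalized P - 1), here is a TensorFlow-style illustration (assuming `nll` and the unnormalized probabilities are already computed elsewhere, e.g. in main.py; the function and variable names are made up):

    import tensorflow as tf

    def constrained_loss(nll, unnormalized_probs, lam=1.0):
        """NLL plus a Lagrangian-style penalty lambda * (sum of unnormalized
        probabilities - 1), nudging the optimizer toward parameters whose
        probabilities sum to 1."""
        constraint = tf.reduce_sum(unnormalized_probs) - 1.0
        return nll + lam * constraint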
< KimSangYeon-DGU>
Sumedh, but the optimizer cannot get the sum of P - 1 to 0...
< KimSangYeon-DGU>
The sum of P keeps getting larger
< sumedhghaisas>
yeah... that has a lot of problems as well.
< sumedhghaisas>
Lagrangian is hard to optimize
< sumedhghaisas>
our optimizers are not good enough
< sumedhghaisas>
It's like L2 regularization
< sumedhghaisas>
remember L2?
< sumedhghaisas>
L2 regularization is basically a Lagrangian saying that the sum of squared weights should be C
< KimSangYeon-DGU>
Ah
< KimSangYeon-DGU>
Can we try the L2 regularization?
< sumedhghaisas>
not really... We don't have any priors on our parameters
< sumedhghaisas>
L2 works because it comes from the Bayesian perspective of a Gaussian prior over the parameters
< sumedhghaisas>
L1 is a Laplace prior
< KimSangYeon-DGU>
Thanks for the information.
< sumedhghaisas>
But okay, so this optimization is basically going to the centre, right?
< KimSangYeon-DGU>
I sent a result image
< KimSangYeon-DGU>
Wait a moment
< KimSangYeon-DGU>
I sent it
< KimSangYeon-DGU>
This time it is not going to the centre, but it's not trained well.
< sumedhghaisas>
so it's diverging?
< sumedhghaisas>
okay then the approximation of the constraint is not good enough...
< KimSangYeon-DGU>
I sent an email with result video
< KimSangYeon-DGU>
Can you view the file?
< KimSangYeon-DGU>
Right, the constraint is not good enough...
< sumedhghaisas>
ahh wait...
< sumedhghaisas>
the optimization looks almost good enough
< sumedhghaisas>
I can see that the centres are going in the right direction at least, right?
KimSangYeon-DGU has quit [Remote host closed the connection]
KimSangYeon-DGU has joined #mlpack
< KimSangYeon-DGU>
sumedhghaisas: Sorry my internet connection was broken
< KimSangYeon-DGU>
Yeah the mean is right
< sumedhghaisas>
This is with NLL + lambda * (unnormalized sum of P - 1)?
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
I think these are good results
< sumedhghaisas>
so at the end of the video the centre was still going in the correct direction
< sumedhghaisas>
could you optimize it a little bit more?
< KimSangYeon-DGU>
I'll test it longer
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
how stable is the training
< sumedhghaisas>
in terms of random initialization?
< sumedhghaisas>
try 10 different initializations and see how they behave
< sumedhghaisas>
if for each one the centre is going in the correct direction
< sumedhghaisas>
I'd definitely call it a good result
< sumedhghaisas>
at least the training is stable
< KimSangYeon-DGU>
Yeah
< KimSangYeon-DGU>
I'm currently testing it
< sumedhghaisas>
Great. Also, I am free tomorrow, if you wanna sleep right now and ping me in the morning?
< KimSangYeon-DGU>
sumedhghaisas: I have a question. The GMM uses Cholesky decomposition
< KimSangYeon-DGU>
Ah yes
< sumedhghaisas>
your call :) I just don't want to keep you awake unnecessarily
< KimSangYeon-DGU>
But the Cholesky decomposition isn't stable...
< sumedhghaisas>
it's not stable, yes
< KimSangYeon-DGU>
So, it is tricky when we set the parameters initially
< sumedhghaisas>
but when you do the Cholesky, it's the Cholesky of the sigma, right?
< sumedhghaisas>
ahh, I would just pick 10 random initializations with means in 0 to 1
< sumedhghaisas>
and some reasonable variance
< KimSangYeon-DGU>
But the covariance
< sumedhghaisas>
I see.
< KimSangYeon-DGU>
I used the lower covariance
< sumedhghaisas>
That's a valid point
< KimSangYeon-DGU>
Ah
< KimSangYeon-DGU>
But it is too sensitive to the initialization of the covariance
< sumedhghaisas>
did you observe that the covariance affects the training a lot?
< sumedhghaisas>
I see
< KimSangYeon-DGU>
Not deeply
< KimSangYeon-DGU>
I'll also check it
< sumedhghaisas>
Yeah, I think a little experimentation on the covariance is required as well
< sumedhghaisas>
I will do some checking on how the covariance affects it in theory
< KimSangYeon-DGU>
Yeah thanks
< sumedhghaisas>
But I suspect this is coming from the optimizer
< sumedhghaisas>
and not from the model
< sumedhghaisas>
okay, basically the next goal is to see how stable the training is w.r.t. the initial parameters
< KimSangYeon-DGU>
Agreed. So I set a positive-definite constraint on the covariance
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
that's right
< sumedhghaisas>
ehhh we have some good news :)
< KimSangYeon-DGU>
Really great
< sumedhghaisas>
let's hope the training is stable
< KimSangYeon-DGU>
Yeah
< sumedhghaisas>
and then we will focus on getting a better approximation
< sumedhghaisas>
I have some ideas in that direction which I can tell you all about tomorrow
< KimSangYeon-DGU>
Yeah, but I'm worried about the implementation timeline
< sumedhghaisas>
Me too, a little bit :( I will talk to Ryan and figure it out, don't worry :)
< KimSangYeon-DGU>
Ah yes!
< sumedhghaisas>
Let me take care of that :)
< KimSangYeon-DGU>
Thanks, I'll continue to implement it
< KimSangYeon-DGU>
:)
< sumedhghaisas>
Great. Give that paper a go if you get some time :)
< KimSangYeon-DGU>
Yeah
KimSangYeon-DGU has quit [Remote host closed the connection]
KimSangYeon-DGU has joined #mlpack
< KimSangYeon-DGU>
sumedhghaisas: I'll get back to you tomorrow! Thanks for the meeting :)
< sumedhghaisas>
See you tomorrow :)
KimSangYeon-DGU has quit [Remote host closed the connection]
sumedhghaisas has quit [Ping timeout: 260 seconds]
vivekp has joined #mlpack
vivekp has quit [Ping timeout: 258 seconds]
< sreenik[m]>
zoq: I am thinking of introducing a momentum parameter in batchnorm; it will have a default value of 1 unless specified. Is it all right to proceed?
< zoq>
ShikharJ: Agreed.
< zoq>
sreenik[m]: To provide backward compatibility?
< zoq>
sreenik[m]: Sounds like a good idea to me.
< sreenik[m]>
Yes it won't hinder backward compatibility
< zoq>
Great!
< sreenik[m]>
I was having difficulty understanding a part of it, though. Let me explain...
< sreenik[m]>
I am not sure if deterministic can be true after it is already false
< zoq>
deterministic is set by the FFN or RNN class and defines whether we are in training mode (deterministic = false) or prediction mode (deterministic = true). But you are right, the default value is training:
< zoq>
is the function that updates the deterministic parameter.
< sreenik[m]>
Oh, I get it now. Actually, adding the momentum functionality is just a matter of modifying the runningMean and runningVariance, so I was thinking about the consequences, as I didn't see them being used anywhere afterwards. But this solves it for me. Thanks :)
< zoq>
Nice :)
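For reference, the running-statistics update that a momentum parameter typically controls looks like the sketch below. It assumes one common convention in which the batch statistics are weighted by momentum, so that momentum = 1 simply overwrites the running values with the batch values; mlpack's exact convention may differ, and the function name is made up:

    def update_running_stats(running_mean, running_var,
                             batch_mean, batch_var, momentum=1.0):
        """Blend the stored running statistics with the current batch
        statistics. With momentum = 1 the running values are replaced by the
        batch statistics at every step (assumed convention, see note above)."""
        new_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
        new_var = (1.0 - momentum) * running_var + momentum * batch_var
        return new_mean, new_var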
xiaohong has quit [Read error: Connection timed out]