verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
< kris1>
zoq: Parameters are the weights of a layer, right? So it's a matrix.
< kris1>
so passing model.Model()[1] should return a matrix, not a row vector.
< zoq>
kris1: right, the problem is that for some layers Parameters does not return the internal weight matrix; instead it returns all trainable parameters. I guess an easy solution would be to introduce a function that returns the input and output size. Another solution would be to run the network for a single iteration and use the output to figure out the input and output size, but that would be slower. I have to
< zoq>
think about it.
< zoq>
Until now there was no need to provide a function to get the input and output size.
< zoq>
What I don't like about the idea is that you have to implement the functions for each layer that implements the Parameters function.
< zoq>
Right now, a minimal trainable layer has to provide Forward, Backward, Gradient and Parameters, and a non-trainable layer just Forward and Backward.
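For reference, a minimal sketch of the layer contract zoq describes, in the style of mlpack's ANN layers (the class and member names here are illustrative, not actual mlpack code; a bias-free linear layer is used as the example):

    // Minimal sketch of a trainable mlpack-style layer. A trainable layer
    // provides Forward, Backward, Gradient and Parameters; a non-trainable
    // layer only needs Forward and Backward.
    #include <armadillo>

    class ExampleLinear
    {
     public:
      ExampleLinear(const size_t inSize, const size_t outSize) :
          weights(arma::randn<arma::mat>(outSize, inSize)) { }

      // Forward pass: output = W * input.
      void Forward(const arma::mat& input, arma::mat& output)
      {
        output = weights * input;
      }

      // Backward pass: propagate the error gy to the previous layer.
      void Backward(const arma::mat& /* input */, const arma::mat& gy,
                    arma::mat& g)
      {
        g = weights.t() * gy;
      }

      // Gradient of the error with respect to the weights.
      void Gradient(const arma::mat& input, const arma::mat& error,
                    arma::mat& gradient)
      {
        gradient = error * input.t();
      }

      // All trainable parameters in one matrix.
      const arma::mat& Parameters() const { return weights; }
      arma::mat& Parameters() { return weights; }

     private:
      arma::mat weights;
    };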
< kris1>
zoq: did you get a chance to look at the Nesterov accelerated gradient PR?
< kris1>
I think we should implement it as a separate method.
< zoq>
Not today, I'll take a look at it tomorrow. Instead of an extra policy?
< kris1>
My future work would be to add apply_momentum to it, which would provide the vanilla momentum update to every optimizer method.
< kris1>
Yes, I am not in favor of the extra policy, because I looked at the implementation in Caffe; they have a separate NAG method.
< kris1>
Vanilla NAG would take the gradients that update the parameters, for any optimizer given to it.
< zoq>
I think providing another policy would minimize code duplication, without a performance cost. But I'll take a look at the PR tomorrow.
< kris1>
That's what I want to do in the future. Though every optimizer can have a specialized template for their version of NAG as well.
< kris1>
Oh yes, no problem, I just wanted to discuss the design with you.
< kris1>
zoq: I will look into what you said about the parameters.
< zoq>
But if we provide NAG as a policy, wouldn't that allow any other optimizer to use it?
< zoq>
Isn't that what you meant, implementing NAG as a separate method?
< kris1>
Oh yes, you mean to say that every policy could implement NAG the way they want, and we could have a templated base class.
< zoq>
I think we mean the same thing :)
< kris1>
Yes ....:)
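For concreteness, a policy-style NAG update could look roughly like this. This is a hedged sketch in the spirit of mlpack's SGD update policies; the class name, the Initialize/Update signatures, and the look-ahead reformulation are assumptions, not the design of the PR under discussion:

    // Hypothetical NAG update policy: the optimizer stays generic and the
    // policy supplies the Nesterov momentum step (using the common
    // reformulation that evaluates the gradient at the current iterate).
    class NesterovMomentumUpdate
    {
     public:
      NesterovMomentumUpdate(const double momentum = 0.9) :
          momentum(momentum) { }

      void Initialize(const size_t rows, const size_t cols)
      {
        velocity = arma::zeros<arma::mat>(rows, cols);
      }

      void Update(arma::mat& iterate, const double stepSize,
                  const arma::mat& gradient)
      {
        const arma::mat velocityOld = velocity;
        velocity = momentum * velocity - stepSize * gradient;
        // Nesterov correction: step ahead along the momentum direction.
        iterate += -momentum * velocityOld + (1.0 + momentum) * velocity;
      }

     private:
      double momentum;
      arma::mat velocity;
    };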
brn_ has joined #mlpack
< zoq>
Let me take a look at the PR tomorrow and we can discuss over there, what do you think?
< kris1>
Great.
< kris1>
zoq: Okay, now I get your point regarding the parameters. So I would have to implement an input method in every layer, similar to the InputWidth() function that the conv layers have, right?
< zoq>
kris1: Yeah, it's like the InputWidth() function for the conv layer.
mikeling has joined #mlpack
< kris1>
zoq: Every layer has InputParameter(), but we don't really set these input parameters. I think we could use the arguments of the Forward() function to initialize these values.
brn_ has quit [Ping timeout: 240 seconds]
brn_ has joined #mlpack
< zoq>
kris1: I thought about that solution, but using the InputParameter to specify the input and output size is not really intuitive for a user. But I agree, that would reduce the minimal function set.
< kris1>
Maybe I could then add new variables like fanIn and fanOut, and FanIn() { return fanIn; }. But these would have to be initialized inside the Forward function in every layer.
< kris1>
But I would have to do this for every Forward function in every layer. Is there a workaround for that?
< kris1>
zoq:
< zoq>
I haven't really thought about a better solution, but adding InputSize() and OutputSize() to each layer that implements the Parameters() function is definitely an idea that works.
< zoq>
The input and output size is known at construction time, so we can use the constructor initialization list to set the parameters.
< zoq>
e.g. for the linear layer, InputSize returns inSize and OutputSize returns outSize.
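As a sketch, the accessors zoq describes could look like this for the linear layer (hypothetical at the time of this discussion; only the relevant members are shown):

    // Hypothetical InputSize()/OutputSize() accessors: both sizes are
    // known at construction time, so the constructor initialization list
    // sets them.
    class Linear
    {
     public:
      Linear(const size_t inSize, const size_t outSize) :
          inSize(inSize), outSize(outSize)
      {
        weights.set_size(outSize * inSize + outSize, 1);
      }

      size_t InputSize() const { return inSize; }
      size_t OutputSize() const { return outSize; }

     private:
      size_t inSize;
      size_t outSize;
      arma::mat weights;
    };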
< kris1>
zoq: but that is not true for all layers.
< kris1>
It would be better if we use the ones in the Forward function.
< kris1>
e.g. Sequential doesn't have a constructor that takes an input size and an output size.
< zoq>
Yeah, we implement the InputSize and OutputSize functions only for a layer where we know the size. The Sequential layer is special because it's a container, like the FFN class, that can hold different layers. It's meant to be used in a case where you have to split into two branches.
< kris1>
Oh okay, I see, makes sense. But I could give you another example, like the Add layer: how would we find out the fan-in for that?
aditya_ has joined #mlpack
drewtran has joined #mlpack
< zoq>
kris1: That is a good point, and somewhat tricky, I agree. So the inputSize of Add is the outputSize of the previous layer, right? If Add provides Parameters() but does not implement the InputSize function, we could assume that the InputSize is the outputSize of the previous layer. I haven't thought this through, but that could be a solution; what do you think?
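The fallback zoq sketches might look like this. Purely hypothetical: InputSizeVisitor and OutputSizeVisitor do not exist in mlpack and are assumed here, with 0 meaning "size not reported"; LayerTypes is mlpack's variant over all layer types:

    // Hypothetical: if layer i does not report an input size, assume it
    // equals the output size of the previous layer.
    size_t InferInputSize(std::vector<LayerTypes>& network, const size_t i)
    {
      const size_t own = boost::apply_visitor(InputSizeVisitor(), network[i]);
      if (own != 0 || i == 0)
        return own;

      // Fall back to the previous layer's output size.
      return boost::apply_visitor(OutputSizeVisitor(), network[i - 1]);
    }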
pretorium has quit [Read error: Connection reset by peer]
daivik has joined #mlpack
daivik has left #mlpack []
anubhavb1_ has joined #mlpack
govg has quit [Ping timeout: 240 seconds]
govg has joined #mlpack
kris3 has joined #mlpack
kris2 has quit [Ping timeout: 260 seconds]
kris3 has quit [Ping timeout: 240 seconds]
kris2 has joined #mlpack
govg has quit [Ping timeout: 260 seconds]
vinayakvivek has quit [Quit: Connection closed for inactivity]
kartik_ has joined #mlpack
thyrix has joined #mlpack
< kartik_>
hi zoq, I am stuck with the CMA-ES implementation. We are optimizing a black-box function for, let's say, the Super Mario game.
< kartik_>
Now we have an input: a JSON string of values in {0,1,2,3}, 169 of them, and we have to find its output, which is 5 button values, according to Bang's code.
< kartik_>
But in CMA-ES we have a black-box function, and given its dimensions we are able to find the covariance matrix.
< kartik_>
Then, first: how do we use the covariance matrix? And second: am I working on single-variate or multivariate CMA-ES for Super Mario?
< kartik_>
thanks
anubhavb1_ has quit [Quit: Page closed]
aditya_ has joined #mlpack
< zoq>
kartik_: Hello, first of all, CMA-ES works on topologically fixed neural networks, so for the sake of simplicity let's say we have a two-layer neural network with 169 inputs and 5 outputs.
< zoq>
kartik_: The parameters of the networks are sampled from a multivariate Gaussian distribution; the next step is to evaluate each network and calculate its fitness, using the input from the game screen; at each step, the network outputs one of the 5 actions.
< zoq>
kartik_: When all networks have been evaluated, the mean of the multivariate Gaussian distribution is recalculated as a weighted average of the networks with the highest fitness.
< zoq>
kartik_: At the same time, you update your covariance matrix (at time = 0 it's the identity matrix of size N x N, where N is the number of parameters/network weights).
< zoq>
kartik_: The covariance matrix acts as a bias to move in the direction of the most valuable network. Take a look at https://arxiv.org/pdf/1604.00772.pdf for the exact equation to update the covariance matrix.
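Compressed into code, the loop zoq describes looks roughly like this. A simplified sketch using Armadillo: N, sigma, maxIterations, EvaluateNetwork and recombinationWeights are assumptions, and the covariance/step-size updates are omitted (see Hansen's tutorial linked above for the exact equations):

    // Simplified CMA-ES-style iteration (illustrative, not the full
    // algorithm). N = number of network weights.
    arma::vec mean(N, arma::fill::zeros);
    arma::mat C = arma::eye(N, N);        // Covariance; identity at t = 0.
    const size_t lambda = 20;             // Population size.
    const size_t mu = lambda / 2;         // Number of selected offspring.

    for (size_t t = 0; t < maxIterations; ++t)
    {
      // Sample lambda candidate weight vectors from N(mean, sigma^2 * C).
      const arma::mat L = arma::chol(C, "lower");
      arma::mat population(N, lambda);
      for (size_t k = 0; k < lambda; ++k)
        population.col(k) = mean + sigma * (L * arma::randn<arma::vec>(N));

      // Evaluate each candidate network on the task (e.g. one Mario
      // episode) and rank by fitness.
      arma::vec fitness(lambda);
      for (size_t k = 0; k < lambda; ++k)
        fitness(k) = EvaluateNetwork(population.col(k));

      const arma::uvec order = arma::sort_index(fitness, "descend");

      // New mean: weighted average of the mu fittest candidates.
      mean.zeros();
      for (size_t k = 0; k < mu; ++k)
        mean += recombinationWeights(k) * population.col(order(k));

      // Covariance and step-size updates omitted; see the tutorial.
    }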
< kartik_>
So zoq, in Bang's NEAT and CNE this was achieved by perturbation, crossover, mutation and speciation, which here is done by moving the mean to a new location close to the optimum and recalculating the covariance?
< kartik_>
Also, at the new location, using the Gaussian distribution to sample out the new offspring, denoted by lambda?
< kartik_>
I've already read that wonderful tutorial; you suggested it to me in January, I guess. Now I have started working on this project again this month; apologies for the delay.
Vishal_ has joined #mlpack
< kartik_>
That cleared everything up for me except one thing: the covariance is a square matrix over the weights, and for the given neural network, what dimension should it be?
< kartik_>
Also, I would make this compatible with Bang Liu's neural net implementation. I was just thinking, when will his implementation be merged into the main repository? :D He has documented it superbly.
Vishal__ has joined #mlpack
< zoq>
kartik_: Yes; NEAT doesn't work on fixed-topology networks; instead it will find an optimal topology and parameter set using offspring, mutation and crossover. So you might start with a single-layer neural network and could end up with a 5-layer neural network, where some of the units from layer x are connected with units from another layer.
< zoq>
kartik_: If I remember right, lambda is the number of populations/networks sampled from the multivariate Gaussian distribution.
< zoq>
kartik_: The covariance matrix is of size N x N, where N = parameter size = number of network weights.
Vishal__ has left #mlpack []
< zoq>
kartik_: Making it compatible with Bang's code is a great idea; that way you could reuse part of his work. But we somehow introduced a bug, so the method wasn't able to solve the Mario task. It's kind of strange, because at an earlier stage the network was able to solve all implemented tasks, e.g. CartPole, MountainCar, XOR, etc. I already started to look into the issue and I guess solved some of them, but
< zoq>
got distracted by some other things. It's definitely on my list to finish this.
< kartik_>
Yes, it's for the number of populations.
< kartik_>
what kind of bug?
< kartik_>
I used the code myself to check the Mario implementation and it worked fine.
< kartik_>
And zoq, I'm still not completely clear on the dimension of the covariance matrix, but I guess I'll figure that out.
< zoq>
kartik_: At some point in the optimization process, the method gets stuck, until only one population is left. The method should be able to resolve the issue, but somehow the function that calculates the similarity between genomes isn't able to produce another population.
< zoq>
kartik_: oh, okay, do you remember which version you used for the test? I remember that we tested multiple instances at the same time, for like 7 days ...
< kartik_>
Oh no, it was in December; I reinstalled the latest Ubuntu after that and it got removed.
< zoq>
kartik_: oh okay, maybe I should give it another try; maybe it got solved on its own :)
< zoq>
kartik_: About the covariance matrix: let's say your network has one layer with two inputs, two outputs and two hidden units; then you have 8 weights, so N = 8.
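(Worked out: 2 inputs fully connected to 2 hidden units gives 2 x 2 = 4 weights, and 2 hidden units to 2 outputs gives another 4, so ignoring biases N = 4 + 4 = 8 and the covariance matrix is 8 x 8.)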
< kartik_>
Yup, oh okay, that was silly of me.
< kartik_>
thanks ..
< kartik_>
I'll ping you once I have some code on CMA-ES, and then I'll also try for the bug fix.
< kartik_>
Finally got it downloaded :D. That was cool.
kartik_ has quit [Quit: Page closed]
Vishal_ has quit [Quit: Page closed]
chvsp has quit [Quit: Page closed]
chvsp has joined #mlpack
thyrix has joined #mlpack
< chvsp>
zoq: I was writing the code for the BatchNorm layer. I couldn't understand what this line means in the Serialize function: ar & data::CreateNVP(). Could you please help me out with this? Thanks.
< rcurtin>
chvsp: that's boost::serialization (or a wrapper around it)
< rcurtin>
if you go read the boost serialization documentation, it should make sense
< rcurtin>
the only thing to keep in mind after that,
< rcurtin>
is that data::CreateNVP() is a special mlpack replacement for BOOST_SERIALIZATION_NVP(),
< rcurtin>
and mlpack uses a Serialize() function instead of serialize()
< rcurtin>
if you want more details on what is going on there, after you read the serialization docs, see src/mlpack/core/data/serialization_shim.hpp
< rcurtin>
(but be warned, that file is kind of crazy)
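Put together, the pattern rcurtin describes looks roughly like this for a layer (a sketch; member names such as weights, inSize, outSize are illustrative):

    // Sketch of mlpack's Serialize() convention: data::CreateNVP wraps a
    // member in a name-value pair, analogous to BOOST_SERIALIZATION_NVP.
    template<typename Archive>
    void Serialize(Archive& ar, const unsigned int /* version */)
    {
      ar & data::CreateNVP(weights, "weights");
      ar & data::CreateNVP(inSize, "inSize");
      ar & data::CreateNVP(outSize, "outSize");
    }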
< chvsp>
rcurtin: Boost serialization; will look into it, thanks.
< chvsp>
serialization_shim.hpp: will try; thanks for the warning though... :)
< rcurtin>
yeah, no huge need to understand these things in detail, just the basics should suffice to understand what it does
< chvsp>
rcurtin: Another thing I wanted to know: for BatchNorm there is a different forward pass for train and test runs. I couldn't figure out how to carry that out.
< rcurtin>
I'm not particularly familiar with that code, so unfortunately I can't say for that one
< rcurtin>
I know that sometimes your training passes will be different from your test passes
< rcurtin>
e.g. with dropout, where you perform dropout at training time but not at test time
thyrix has quit [Ping timeout: 260 seconds]
< chvsp>
Cool! I went through the dropout code. There is a parameter called 'deterministic', and we can have different passes conditioned on this variable. I will try to include this in my code.
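A sketch of that pattern applied to batch normalization follows. The member names (deterministic, runningMean, runningVariance, gamma, beta, eps) are assumptions, and the running-statistics update is omitted; the point is only the train/test branch on the deterministic flag:

    // Hypothetical BatchNorm forward pass using a Dropout-style
    // 'deterministic' flag: batch statistics during training, stored
    // running statistics at test time.
    template<typename eT>
    void Forward(const arma::Mat<eT>& input, arma::Mat<eT>& output)
    {
      arma::Col<eT> mean, variance;
      if (deterministic)
      {
        // Test mode: use the statistics estimated during training.
        mean = runningMean;
        variance = runningVariance;
      }
      else
      {
        // Train mode: use the statistics of the current batch
        // (columns are data points, following the mlpack convention).
        mean = arma::mean(input, 1);
        variance = arma::var(input, 1, 1);
        // Update of runningMean/runningVariance omitted here.
      }

      output = input.each_col() - mean;
      const arma::Col<eT> stdDev = arma::sqrt(variance + eps);
      output.each_col() /= stdDev;

      // Learned scale (gamma) and shift (beta).
      output.each_col() %= gamma;
      output.each_col() += beta;
    }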
govg has joined #mlpack
chvsp has quit [Ping timeout: 260 seconds]
shihao has joined #mlpack
bobby_ has joined #mlpack
vinayakvivek has joined #mlpack
bobby_ has quit [Quit: Page closed]
biswesh has joined #mlpack
biswesh has quit [Client Quit]
shihao has quit [Ping timeout: 260 seconds]
mikeling has quit [Quit: Connection closed for inactivity]
< kris2>
zoq: got it, thanks. I was misinterpreting the template<class LayerType, class... Args> void Add(Args... args) function.
tempname has quit [Quit: Page closed]
chvsp has joined #mlpack
light_ has quit [Ping timeout: 260 seconds]
daivik has joined #mlpack
daivik has left #mlpack []
s1998 has joined #mlpack
< chvsp>
Hi zoq, rcurtin: I read about the serialization and have understood what it does. But I couldn't figure out which variables to serialize. Are there any criteria for the selection of variables?
aditya_ has quit [Ping timeout: 240 seconds]
< zoq>
chvsp: Every parameter needed for reconstruction. You can ask yourself: which parameters do I have to save for the reconstruction? E.g. for the Dropout layer we need ratio and rescale; everything else is calculated at runtime. For the linear layer we have to save the weights and the input and output size.
< zoq>
chvsp: One scenario where serialization is used is to save the model to a file e.g. XML.
< chvsp>
zoq: So in the case of BatchNorm, the scale and shift vectors need to be stored. Will the mean and variance of the training set be stored also? Because at test time we use the mean and variance of the training set.
< zoq>
Yes, we also have to save the mean and variance parameters, since we don't know what someone will do once the model is loaded; they could continue to train the model or use it for prediction.
< zoq>
You could also say that you save the current state of the method/model.
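So for BatchNorm, following the same convention, Serialize() might list (hypothetical member names, matching the BatchNorm forward-pass sketch earlier):

    template<typename Archive>
    void Serialize(Archive& ar, const unsigned int /* version */)
    {
      ar & data::CreateNVP(gamma, "gamma");   // Learned scale.
      ar & data::CreateNVP(beta, "beta");     // Learned shift.
      ar & data::CreateNVP(runningMean, "runningMean");
      ar & data::CreateNVP(runningVariance, "runningVariance");
    }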
< kris2>
Any good way to get the values in the hidden layers at each iteration?
< kris2>
When I call model.Predict(x), can I get the values in the hidden layers also?
AL3x3d has joined #mlpack
AL3x3d has quit [Quit: Page closed]
< kris2>
Could we do boost::apply_visitor(getParameters, model.Model()[i])?
< zoq>
kris2: hm, that would mean a layer can access the Model, right?
< kris2>
zoq: I think we could use a ForwardVisitor(input, output).
< kris2>
I'm actually trying to implement vanilla policy gradients, so that's why I need the hidden values for the whole episode.
< zoq>
And the base container, e.g. FFN, calls the ForwardVisitor function?
< kris2>
Sorry, I don't understand... just give me a minute.
dineshraj01 has quit [Read error: Connection reset by peer]
< zoq>
What I did for the recurrent visual attention model was to split the network into two branches: one branch for the input and another branch for the actual computation, and merge them back where I needed the input and the output of the previous layer.
< zoq>
What you could also do is implement some function, e.g. Input, and use a visitor that is called in each iteration, which updates the input.
< chvsp>
Ok got it thanks
delfo_ has joined #mlpack
< s1998>
Since there is no pruning method in the current implementation, I would like to implement Reduced Error Pruning for the decision tree.
vinayakvivek has quit [Quit: Connection closed for inactivity]
< zoq>
s1998: It might take some time before rcurtin answers the question, you can always check the irc logs: www.mlpack.org/irc/
< s1998>
Sure :)
s1998 has quit [Quit: Page closed]
< rcurtin>
I'm in transit right now, it will be a few hours perhaps before I can respond
< rcurtin>
too much travel...
< zoq>
Hopefully some nice place, like Hawaii :)
< kris2>
Sorry for the late reply. 1) "And the base container e.g. FFN calls the ForwardVisitor function?" No, any function, let's say bookkeeping(), could call boost::apply_visitor(ForwardVisitor(input, output)). Yes, I agree we are doing double computation here; that's my only concern.
< kris2>
zoq: also one more thing: when we say parameters, do we mean W, or something like what the OutputParameterVisitor gives as output?
< kris2>
Regarding "when you implement some function ... that updates the input": this would also suffer from the same problem of double computation, as I previously described.
< zoq>
kris2: Parameters = all trainable parameters, in most cases the weights.
< kris2>
Ohhh okay, then even the outputParameters make sense.
< zoq>
We could save a reference instead of a copy, but yes there is a small overhead.
< kris2>
I think you misunderstood; I need the hidden state values at every forward-pass iteration.
< kris2>
Should I elaborate more?
< zoq>
ah, I did
< kris2>
For the input I am just pushing it to a std::vector<arma::vec>.
< kris2>
OK, that's why I was using the ForwardVisitor for getting the hidden state values.
< zoq>
Couldn't you use the OutputParameter for that?
< kris2>
Hmm, okay, so the OutputParameter gives w^T x for every layer, if I am not wrong.
< zoq>
for the linear layer yes
< zoq>
and you'd like x, right?
< zoq>
Which is either InputParameter or OutputParameter, depending on where you start: for (size_t i = 0; i < network.size(); ++i) { outputParameter = boost::apply_visitor(OutputParameterVisitor(), network[i]); }
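Expanded to the use case kris2 describes, collecting every layer's activations after a forward pass might look like this (a sketch; it assumes OutputParameterVisitor returns the layer's stored output and that model.Model() exposes the layer vector, as used earlier in this discussion):

    // After a forward pass, gather each layer's output (the hidden
    // activations) through the OutputParameterVisitor.
    std::vector<arma::mat> hiddenStates;
    for (size_t i = 0; i < model.Model().size(); ++i)
    {
      hiddenStates.push_back(boost::apply_visitor(
          OutputParameterVisitor(), model.Model()[i]));
    }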
delfo_ has quit [Quit: WeeChat 1.7]
< kris2>
Yes, exactly what I was thinking, thanks.
< kris2>
A side question: I was reading the reinforcement learning project idea again. Is the deliverable for the project just to implement those algorithms, or to provide a CLI interface so that users can tune the parameters? I think it's the latter.
< kris2>
zoq: also, I wanted to know by when we should submit the proposals.
< chvsp>
Hi kris2: could you please give me a brief introduction to what a visitor is? I think I will need one too, to extract the gradients from a layer.
< kris2>
I think you were implementing the BatchNorm, right? Where did you need a visitor? Just curious.
< chvsp>
I want to get the gradients flowing through the layer, to test the layer. I was wondering how to go about it. Your case seemed similar, hence I asked.
< zoq>
kris2: The mentioned algorithms are examples, if you have some other interesting methods in mind, I'm open for a discussion.
< zoq>
kris2: The plan is to implement some but not all of them, and to provide an interface so that someone could write a new task. E.g. you have some sensor that measures the room temperature and you'd like to know at which time you could open the window without wasting too much energy, or you have this game where you can't find a solution, so you'd like to use some machine learning to figure it out :)
< zoq>
kris2: The application phase opens March 20? and ends April 3? ... not sure what the exact dates are, you should check the timeline. Anyway, you can submit and update your proposal in that timeframe; that also gives us the chance to take a look at the proposal and give you some feedback. But if you submit your proposal like 3 days before the deadline, we can't guarantee that we have the chance to
< zoq>
give you some feedback.
< zoq>
chvsp: If you have e.g. Linear<> layer(10, 10); you can access the gradients with layer.Gradients().
< kris2>
Aaah, thanks... Then maybe I should start right now; shouldn't leave it till the last moment.
< zoq>
chvsp: If you have e.g. Linear<> layer(10, 10); you can access the gradients with layer.Gradients(). If you don't know the layer type, you have to use one of the visitors.
< zoq>
kris2: Good idea, as I said you can update your proposal as often as you like in that timeframe.
< chvsp>
zoq: Then why are visitors used in the first place? You declare the network architecture beforehand, hence you must know the layers and their types.
< chvsp>
Oh I didn't know that the proposals could be edited. I too will start preparing one.
< chvsp>
I mean edited after submission.
< zoq>
chvsp: You kind of lose that knowledge if you put multiple types (Linear, Sigmoid, Dropout, etc.) into one container. E.g. with Linear<> layer(10, 10); and template<typename T> void DoSomething(T& layer), T could be anything, so you need some abstraction that allows you to call a specific function for specific types, since not every layer implements the same function set.
< kris2>
chvsp: The idea is to separate the data structure from the algorithm. For different layer types we could implement, let's say, a way to get the weight parameters; that would give the weight parameters independent of the layer type.
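A toy illustration of the point (a sketch assuming mlpack-style Linear<> and Dropout<> layer types; mlpack's LayerTypes is essentially such a boost::variant over all layer types):

    #include <boost/variant.hpp>

    // Two unrelated layer types in one container: the static type is
    // erased, so a visitor is needed to call type-specific functions.
    using LayerVariant = boost::variant<Linear<>*, Dropout<>*>;

    struct WeightSizeVisitor : public boost::static_visitor<size_t>
    {
      // Linear has trainable parameters.
      size_t operator()(Linear<>* layer) const
      {
        return layer->Parameters().n_elem;
      }

      // Dropout has none.
      size_t operator()(Dropout<>* /* layer */) const { return 0; }
    };

    // Usage:
    //   size_t n = boost::apply_visitor(WeightSizeVisitor(), layers[i]);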
< zoq>
chvsp: Yeah, I think the rule that you can update your proposal was introduced two years ago.
< rcurtin>
zoq: nope, just a trip to the Symantec office in LA... not very exciting to sit in the cubicles here
< rcurtin>
but I will go racing this weekend so that will make the trip worth it :)
< rcurtin>
s1998: (hopefully you check the logs) I think that sounds like a good thing to implement, but consider implementing the 'PruningStrategy' as a template policy class, just like CategoricalSplitType and NumericSplitType
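The policy design rcurtin suggests might look roughly like this. Hypothetical: NoPruning, ReducedErrorPruning and the exact template parameter list are assumptions, sketched after mlpack's existing DecisionTree parameters:

    // Hypothetical: pruning as a policy template parameter, mirroring how
    // CategoricalSplitType and NumericSplitType already parameterize the
    // tree.
    template<typename FitnessFunction = GiniGain,
             template<typename> class NumericSplitType = BestBinaryNumericSplit,
             template<typename> class CategoricalSplitType = AllCategoricalSplit,
             typename PruningStrategy = NoPruning>
    class DecisionTree
    {
      // After training, the tree would call something like:
      //   PruningStrategy::Prune(*this, validationData, validationLabels);
    };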
< chvsp>
zoq: When we do FFN<> model; model.Add<Layer1>(); we don't have the Layer1 object with us. Hence, to access such objects, we use a visitor. Is that what you wanted to convey?
< rcurtin>
strange journal name for that paper... "international journal of man-machine studies"
kris2 has left #mlpack []
< chvsp>
rcurtin: "man studies machine" would have made sense... :D
< rcurtin>
I wondered if it was a Kraftwerk reference... 'Die Mensch-Maschine'
< zoq>
chvsp: I think we mean the same thing, yes :)
< zoq>
rcurtin: Now listen to some Kraftwerk songs :)
< chvsp>
Kraftwerk's Die Mensch-Maschine: they just seem to repeat the same thing over and over. It's good nonetheless. :)
< zoq>
I agree, it's kinda catchy
< rcurtin>
I like Kraftwerk, the repetitiveness does not bother me at all :)
< zoq>
Does anyone have other music recommendations?
chvsp has quit [Ping timeout: 260 seconds]
< rcurtin>
hehe, if we are thinking in the genre of Kraftwerk, I have also been listening to some Moebius & Plank
diehumblex has quit [Quit: Connection closed for inactivity]
chvsp has joined #mlpack
< zoq>
Another German electronic music band, but this time I can't say I know a single song.
< rcurtin>
yet another German electronic music band I found some time back that I liked was Deutsche-Amerikanische Freundschaft (D.A.F.); I think they mostly recorded in the late 80s/early 90s
< rcurtin>
very minimal electronica, kind of like some of the more quiet tracks from Kraftwerk