verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
chenzhe1 has joined #mlpack
chenzhe has quit [Ping timeout: 246 seconds]
chenzhe1 is now known as chenzhe
mikeling has joined #mlpack
chenzhe has quit [Ping timeout: 240 seconds]
aashay has joined #mlpack
aashay has quit [Quit: Connection closed for inactivity]
vpal has joined #mlpack
vivekp has quit [Ping timeout: 260 seconds]
vpal is now known as vivekp
shikhar has joined #mlpack
shikhar has quit [Read error: Connection reset by peer]
shikhar has joined #mlpack
shikhar has quit [Quit: WeeChat 1.4]
sumedhghaisas has joined #mlpack
mentekid has quit [Quit: Leaving.]
mentekid has joined #mlpack
sumedhghaisas has quit [Ping timeout: 240 seconds]
mikeling has quit [Quit: Connection closed for inactivity]
sumedhghaisas has joined #mlpack
< sumedhghaisas> zoq: Hey Marcus, had a couple of questions about the architecture.
< sumedhghaisas> I observed that the 'parameter' input to the Evaluate function is not used in FFN
< sumedhghaisas> So we are assuming that only gradient-descent-based optimizers will be used?
< zoq> sumedhghais: That's true, ignoring the input was an easy way to reuse the existing optimizer classes without writing a wrapper. But if you have something in mind that you think is worth a change, feel free.
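For context, a minimal sketch of the pattern being discussed, with assumed, simplified signatures (not the exact mlpack API): an FFN-like class can satisfy the interface the optimizers expect while ignoring the passed-in parameters and working on its own internal parameter matrix. The class and the stand-in loss here are hypothetical.

    #include <armadillo>

    class FFNLike
    {
     public:
      // The optimizer passes its iterate in, but the loss is evaluated on the
      // internally stored parameter matrix, so the argument can be ignored.
      double Evaluate(const arma::mat& /* parameters */) const
      {
        return arma::accu(arma::square(parameter));  // stand-in loss (sum of squares)
      }

      // Likewise, the gradient is computed from the internal parameters and
      // written into the matrix the optimizer provides.
      void Gradient(const arma::mat& /* parameters */, arma::mat& gradient) const
      {
        gradient = 2.0 * parameter;  // gradient of the stand-in loss
      }

      arma::mat& Parameters() { return parameter; }

     private:
      arma::mat parameter;
    };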
< sumedhghaisas> zoq: yeah I agree. But I am a bit confused about how it works. So Evaluate returns the loss based on the current network parameters
< sumedhghaisas> but the Gradient function builds the gradient as a single matrix
< sumedhghaisas> for the update
< sumedhghaisas> so where are the parameters updated? Usually they are updated in the optimizer, right?
mentekid has quit [Quit: Leaving.]
< sumedhghaisas> ahh okay... the 'iterate' matrix is passed as a reference to the 'parameters' object of the FFN
< sumedhghaisas> so it gets updated in the vanilla update...
< zoq> yeah, absolutely right
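A sketch of why that works, assuming a simplified SGD loop (not the actual mlpack SGD implementation): because the iterate is taken by reference and is the network's own parameter matrix, the vanilla update writes straight into the network.

    #include <armadillo>
    #include <cstddef>

    // Simplified SGD loop with a vanilla update; `iterate` is taken by reference.
    template<typename FunctionType>
    void SimpleSGD(FunctionType& function, arma::mat& iterate,
                   const double stepSize, const std::size_t maxIterations)
    {
      arma::mat gradient(arma::size(iterate));
      for (std::size_t i = 0; i < maxIterations; ++i)
      {
        function.Gradient(iterate, gradient);
        iterate -= stepSize * gradient;  // vanilla update, applied in place
      }
    }

    // Usage: SimpleSGD(network, network.Parameters(), 0.01, 1000);
    // Because `iterate` aliases the network's parameter matrix, every update is
    // immediately visible inside the network; no copy-back step is needed.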
< sumedhghaisas> but then maybe we can somehow parameterize the update policy to accept the actual update operation and bypass the entire gradient matrix creation?
< sumedhghaisas> what do you think?
< sumedhghaisas> that update operation will implement a forward pass through all the layers and update their individual parameters?
mentekid has joined #mlpack
< zoq> I mean you could do that, I guess the benefit is you would save memory, since you only have to hold the current gradient of layer x.
< sumedhghaisas> yeah... that's what I was thinking. And we can compute and update at the same time... without actually saving the gradient
< sumedhghaisas> So the update function would do the work of both the gradient computation and the update
< sumedhghaisas> but then we will need to change the Gradient function of all the layers... uffff
< zoq> I like the idea; not sure there is an easy way to achieve this. The idea was to avoid implementing a special optimizer for the ann code.
< zoq> modifying the Gradient function should be straightforward
< zoq> but it takes some time, yes
< sumedhghaisas> yeah... we would save a lot of memory accesses... and also the creation of the gradient matrix... which involves a lot of matrix reshaping
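A hypothetical sketch of the idea being discussed, with made-up names (Layer, LocalGradient, UpdateStep): compute each layer's gradient and apply the update immediately, so only one layer's gradient exists at a time and no full gradient matrix is assembled.

    #include <armadillo>
    #include <vector>

    // Hypothetical layer type; LocalGradient stands in for the real backward
    // computation of this layer's parameter gradient.
    struct Layer
    {
      arma::mat weights;

      arma::mat LocalGradient(const arma::mat& error) const
      {
        // Placeholder with the right shape; a real layer would backpropagate here.
        return arma::mat(arma::size(weights), arma::fill::ones) * arma::accu(error);
      }
    };

    // Compute and consume each layer's gradient in one step: nothing is stored
    // beyond the current layer, so no full gradient matrix is ever created.
    void UpdateStep(std::vector<Layer>& layers, const arma::mat& error,
                    const double stepSize)
    {
      for (Layer& layer : layers)
        layer.weights -= stepSize * layer.LocalGradient(error);
    }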
< sumedhghaisas> okay, I will create a GitHub issue for this and try to work it out
< sumedhghaisas> also... should I use the BatchNorm pull request and modify it... because except for some small changes and adding support for convolutional layers, the code looks good to me
< zoq> opening a new issue is a good idea
< zoq> yeah, the BatchNorm PR looks good to me too
< sumedhghaisas> okay... I will try to work on some architectural changes in place of the batch norm implementation... since most of my work there is already done by him :P
< zoq> if you like, sure :)
< sumedhghaisas> zoq: On a separate note, do you think building the network statically rather than dynamically would give a speedup?
< sumedhghaisas> just curiosity, I don't yet have an architecture to do that :)
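For illustration only (not mlpack code): a "static" network fixes the layer types at compile time, e.g. in a std::tuple, so the compiler can inline the whole forward pass, whereas a "dynamic" network dispatches through a runtime container of layers. All names below (StaticNet, Linear, ReLU) are made up for the sketch.

    #include <tuple>
    #include <cstddef>

    // Two toy layers with a compile-time known Forward() step.
    struct Linear { double Forward(const double x) const { return 2.0 * x; } };
    struct ReLU   { double Forward(const double x) const { return x > 0 ? x : 0; } };

    // "Static" network: layer types are template parameters, so the full forward
    // pass can be inlined by the compiler (no runtime dispatch between layers).
    template<typename... Layers>
    class StaticNet
    {
     public:
      double Forward(const double x) const { return ForwardImpl<0>(x); }

     private:
      template<std::size_t I>
      double ForwardImpl(const double x) const
      {
        if constexpr (I == sizeof...(Layers))
          return x;
        else
          return ForwardImpl<I + 1>(std::get<I>(layers).Forward(x));
      }

      std::tuple<Layers...> layers;
    };

    // Usage: StaticNet<Linear, ReLU> net;  const double y = net.Forward(-3.0);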