verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
alsc has joined #mlpack
alsc has quit [Quit: alsc]
alsc has joined #mlpack
vivekp has quit [Ping timeout: 248 seconds]
vivekp has joined #mlpack
alsc has quit [Quit: alsc]
alsc has joined #mlpack
alsc has quit [Quit: alsc]
akshit2296 has joined #mlpack
akshit2296 has quit [Ping timeout: 260 seconds]
akshit2296 has joined #mlpack
akshit2296 has quit [Client Quit]
alsc has joined #mlpack
< alsc>
so I have started thinking about the implementation of the TerminationPolicy for SGD: the classes are kind of nested, since for example I am using it as a member of RMSProp…
< alsc>
two questions
< alsc>
1. I was thinking of giving the full state of the Optimizer back to the user code that instantiates the termination policy, and allowing for a bool return value that decides whether computation should continue or stop… sort of bool shouldTerminate(arguments)
< alsc>
this makes me wonder whether the TerminationPolicy itself should be templated over the DecomposableFunction? or?
< alsc>
2. I was thinking of using the termination policy as an update policy too, checking the validation error each epoch for example, and possibly changing some parameters… so maybe instead of creating a new template parameter we could just rethink the UpdatePolicyType concept by introducing the bool return value and passing the whole model as an argument, so it can be used for cross-validation?
< alsc>
validation I meant, not cross-validation
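A minimal standalone sketch of the interface being proposed above; note that ValidationTermination, ShouldTerminate, and the constructor parameters are all hypothetical names invented for illustration and do not exist in mlpack. The idea is just that the policy is called once per epoch with the current state (here reduced to a single objective value, e.g. validation error) and returns true when optimization should stop:

```cpp
#include <cstddef>
#include <limits>

// Hypothetical TerminationPolicy sketch (not mlpack API): stop when the
// objective has not improved by more than `tolerance` for `patience` epochs.
class ValidationTermination
{
 public:
  ValidationTermination(const double tolerance, const std::size_t patience) :
      tolerance(tolerance), patience(patience),
      bestObjective(std::numeric_limits<double>::infinity()), badEpochs(0) { }

  // Called once per epoch; returns true when training should terminate.
  bool ShouldTerminate(const double objective)
  {
    if (objective < bestObjective - tolerance)
    {
      bestObjective = objective;  // Improvement: reset the patience counter.
      badEpochs = 0;
    }
    else
    {
      ++badEpochs;  // No meaningful improvement this epoch.
    }
    return badEpochs >= patience;
  }

 private:
  double tolerance;        // Minimum improvement that counts as progress.
  std::size_t patience;    // How many bad epochs to tolerate before stopping.
  double bestObjective;    // Best objective seen so far.
  std::size_t badEpochs;   // Consecutive epochs without improvement.
};
```

A templated variant over the DecomposableFunction could pass the whole function/model instead of a single value, which is what would make the validation-error use case in point 2 possible.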
alsc has quit [Ping timeout: 248 seconds]
alsc has joined #mlpack
< alsc>
zoq: I’ll also give you an overview of the changes done in the CMakeLists then..
< rcurtin>
this gives us the time step's predictors, which we have to run on each layer
< rcurtin>
but this particular code will actually do a non-contiguous copy for each time step
< rcurtin>
this is because the data is organized like this:
< rcurtin>
each row contains all time steps for all points sequentially; so if it's 10-dimensional data, the first 10 rows have the first time step, the second 10 rows have the second time step, and so forth ...
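To make the copy cost concrete, here is a small sketch of the index arithmetic for the layout just described, assuming column-major storage as in arma::mat (the helper name and signature are invented for illustration). Time step t occupies rows [t * D, (t + 1) * D), and in column-major order those rows are strided across the columns, which is why extracting a time step with .rows() forces a non-contiguous copy:

```cpp
#include <cstddef>

// Current layout sketch: a column-major matrix with D * T rows (D dimensions
// times T time steps) and N columns (one column per point). Returns the flat
// offset of element (dim, point, timeStep) in that buffer.
std::size_t OffsetOfCurrent(const std::size_t dim,
                            const std::size_t point,
                            const std::size_t timeStep,
                            const std::size_t numDims,
                            const std::size_t numTimeSteps)
{
  // Row holding this dimension of this time step.
  const std::size_t row = timeStep * numDims + dim;
  // Column-major: each column (point) is numDims * numTimeSteps elements long.
  return point * (numDims * numTimeSteps) + row;
}
```

With 10-dimensional data and 3 time steps, time step 1 of point 0 sits at offsets 10..19, but the same time step of point 1 starts at offset 40: the blocks are D * T = 30 apart, not adjacent, so gathering one time step touches every column.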
< rcurtin>
but, if we organized the data differently, we could avoid this call to .rows() and could avoid the copy
< rcurtin>
the data would need to be organized like this (if we kept it as an arma::mat):
< rcurtin>
each row contains one time step for a single point; the columns are organized such that if there are N points, the first N columns are time step 0 for each point; the second N columns are time step 1 for each point; and so forth ...
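The same index arithmetic for the proposed layout shows why the copy disappears, again assuming column-major arma::mat storage (helper names invented for illustration). With D rows and N * T columns, time step t is the column block [t * N, (t + 1) * N), which is one contiguous run of N * D elements:

```cpp
#include <cstddef>

// Proposed layout sketch: a column-major matrix with D rows (dimensions) and
// N * T columns, where columns [t * N, (t + 1) * N) hold time step t for all
// N points.

// Column holding (point, timeStep).
std::size_t ColumnOf(const std::size_t point,
                     const std::size_t timeStep,
                     const std::size_t numPoints)
{
  return timeStep * numPoints + point;
}

// Flat column-major offset of element (dim, point, timeStep).
std::size_t OffsetOf(const std::size_t dim,
                     const std::size_t point,
                     const std::size_t timeStep,
                     const std::size_t numDims,
                     const std::size_t numPoints)
{
  return ColumnOf(point, timeStep, numPoints) * numDims + dim;
}
```

With 10-dimensional data and 5 points, time step 1 occupies offsets 50..99 with no gaps, so extracting it (e.g. with .cols()) is a single contiguous block; the arma::cube alternative mentioned below makes the same block addressable as a slice.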
< rcurtin>
alternately you can represent this as an arma::cube where each slice is a time slice
< rcurtin>
what do you think? I am happy to make the change (I think it will be straightforward)
< rcurtin>
the only other question is how we might handle variable-length sequences cleanly (not many other toolkits have good support for that); in those cases a copy may be unavoidable, but maybe we can think about the variable-length problem some other day
< zoq>
I would go with arma::mat, since all of the other methods take arma::mat as input instead of arma::cube; so unless you see another benefit of arma::cube, I would go with arma::mat. If you can make the adjustments, I'm happy to take a look at it afterwards.
< zoq>
Also I agree, let's deal with variable-length sequences later.
< rcurtin>
ok, I will rearrange the data accordingly and add some documentation about the data format
< rcurtin>
it might take a little while to adjust all of the tests/etc. that use the RNN code