verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
vivekp has joined #mlpack
manish7294 has joined #mlpack
< manish7294>
rcurtin: Finally, results on covertype are out. K = 10, initial accuracy - 96.9761, final accuracy - 97.3618, total time - 13 hrs 23 mins 1.7 secs, optimizer - L-BFGS
manish7294 has quit [Ping timeout: 260 seconds]
< rcurtin>
manish7294: results seem decent, but that took a really long time. did you compile with -DDEBUG=OFF?
< rcurtin>
also do you know how many iterations of L-BFGS were used?
< rcurtin>
I was writing up my theory today for pruning but something was wrong, the results did not make sense
< rcurtin>
so I have some error I need to fix, I will try tomorrow
manish7294 has joined #mlpack
manish7294_ has joined #mlpack
< manish7294_>
rcurtin: I can only tell you the number of iterations if I can scroll back the log. I've tried many things to make that happen, but I can't figure out why I can't scroll back. :(
< manish7294_>
I used -DDEBUG=OFF here
< rcurtin>
are you using screen?
< rcurtin>
it should be ctrl+a then esc, then scroll
< manish7294_>
I'm using screen too, but it's still not working
< manish7294_>
let me try once more
< manish7294_>
No, it didn't work
< rcurtin>
ok
< rcurtin>
that's strange
< rcurtin>
the scrollback buffer should still be there
< rcurtin>
sorry for the slightly slow responses... I am playing mariokart online
< manish7294_>
and the total neighbor-computation time was 6 hrs 15 mins 58.1 secs
< rcurtin>
so I respond while waiting for the next race :)
< manish7294_>
great, you must be having fun
< rcurtin>
that sounds about right. if that was on a benchmarking system, each search should take roughly 30 sexs
< rcurtin>
secs
< rcurtin>
excuse me
< manish7294_>
:)
< rcurtin>
I had to correct the typo so I missed the start :)
< rcurtin>
another thing you could try is, e.g., only doing the NN search for impostors every 100 iterations or something like this
< manish7294_>
Now I am able to scroll back 1950 lines, but I still can't reach the last iteration's log since I used --verbose
< manish7294_>
How can we keep count of iterations inside the LMNNFunction to make the above happen?
< rcurtin>
ah, sorry, I forgot L-BFGS only prints the iteration number in debug mode
< rcurtin>
so there will be no output
< rcurtin>
maybe we should change the optimizer to print in verbose mode too
< manish7294_>
Temporarily we can just cout
< manish7294_>
Or should we be making a permanent change?
< rcurtin>
I don't have a particular preference, I think it is fine as is
< rcurtin>
but if you want to change it I am fine with that also
< manish7294_>
okay so I will make a temporary cout for myself
< manish7294_>
Please check this: 'How can we keep count of iterations inside the LMNNFunction to make the above happen?'
< rcurtin>
you could just have a size_t that you increment each time Evaluate() is called
< rcurtin>
that wouldn't be perfect since Evaluate() may be called more than once per iteration
< rcurtin>
but it can still be helpful for reducing the cost of the LMNNFunction evaluations
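A minimal sketch of the counter rcurtin describes, assuming a hypothetical recompute interval and helper name (this is not mlpack's actual LMNNFunction code):

    #include <mlpack/core.hpp>

    // Sketch only: count Evaluate() calls and redo the impostor search
    // periodically.  All names here are hypothetical.
    class LMNNFunctionSketch
    {
     public:
      LMNNFunctionSketch() : evaluateCount(0), recomputeInterval(100) { }

      double Evaluate(const arma::mat& transformation)
      {
        // L-BFGS may call Evaluate() more than once per iteration, so this
        // counter is only an approximation of the iteration number.
        ++evaluateCount;

        // Re-run the (expensive) neighbor search for impostors only every
        // `recomputeInterval` evaluations instead of on every call.
        if (evaluateCount % recomputeInterval == 0)
          RecomputeImpostors(transformation);

        return 0.0; // placeholder for the real LMNN objective value
      }

     private:
      void RecomputeImpostors(const arma::mat& /* transformation */)
      {
        // The kNN search on the transformed data would go here.
      }

      size_t evaluateCount;
      size_t recomputeInterval;
    };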
< manish7294_>
Sure, let's try it out.
< rcurtin>
ok, time for bed now---talk to you tomorrow!
< manish7294_>
have a good night :)
< ShikharJ>
zoq: It seems that our implementation converges within 10 epochs on the 10,000 image dataset. I found no practical difference (at least not visually) between the 10 and 20 epoch results.
< ShikharJ>
zoq: Might be because we're using a big generator multiplier (about 10). I'll investigate different multiplier steps as well.
< zoq>
ShikharJ: Do you think we could save the weights after every couple of iterations? It might be a neat way to visualize the optimization process afterwards.
< ShikharJ>
zoq: Sure, I'll see what I can do.
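A rough sketch of the periodic checkpointing zoq suggests; the epoch loop and the parameters matrix here are stand-ins (not the actual GAN test code), and only the data::Save() call reflects mlpack's API:

    #include <mlpack/core.hpp>
    #include <string>

    int main()
    {
      // Stand-in for whatever the GAN's Parameters() returns.
      arma::mat parameters(100, 1, arma::fill::randu);

      for (size_t epoch = 1; epoch <= 20; ++epoch)
      {
        // ... one epoch of training would go here ...

        // Save a snapshot every few epochs so the optimization process can
        // be visualized later by reloading each checkpoint.
        if (epoch % 5 == 0)
        {
          mlpack::data::Save("gan_weights_epoch_" + std::to_string(epoch) +
              ".bin", parameters);
        }
      }
    }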
sumedhghaisas has joined #mlpack
dasayan05 has joined #mlpack
sumedhghaisas2 has joined #mlpack
dasayan05 has left #mlpack []
sumedhghaisas_ has joined #mlpack
< sumedhghaisas_>
Atharva: Hi Atharva
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
< Atharva>
sumedhghaisas_: Hi Sumedh
< Atharva>
The numerical derivative test is failing; I think we have the derivatives wrong.
< sumedhghaisas_>
Hi Atharva
< sumedhghaisas_>
@Atharva
< Atharva>
Hey
< sumedhghaisas_>
Have you fixed the input issue with SoftplusFunction::Deriv?
< Atharva>
Yes
< sumedhghaisas_>
Ohh okay. So the only remaining problem is adding the KL loss to the total loss?
< Atharva>
Yeah, but the gradient check is failing
< sumedhghaisas_>
As a preliminary test, try the gradient test without adding klBackward to Backward
< sumedhghaisas_>
The gradient test depends on the total loss
< Atharva>
I am trying that
< sumedhghaisas_>
Sure thing... let me know whether that works or not
< Atharva>
Oh, but that will only be true for a VAE network, right?
< Atharva>
I am just trying the gradient test with a simple network with a linear and repar layer
< sumedhghaisas_>
No... Remember we were discussing this before the project started?
< sumedhghaisas_>
we somehow need to be able to add the KL loss to the overall loss
< sumedhghaisas_>
And I think this will also be super useful in the future, as with this functionality we will be able to add regularization-style tricks to a layer
< sumedhghaisas_>
think about it... KL is just another regularization
< Atharva>
Yeah, but only if the network is a VAE, right? In this case, aren't we just using a sampling layer to train a simple neural network?
< sumedhghaisas_>
Use linear + repar + linear to test the gradients
< sumedhghaisas_>
Not just VAE, any kind of NN
< sumedhghaisas_>
we haven't made any VAE specific changes, have we?
< Atharva>
Okay, will do that
< sumedhghaisas_>
The Repar layer is just a layer with extra regularization
< sumedhghaisas_>
that is KL
< Atharva>
I will also add the KL loss to the backward function
< sumedhghaisas_>
Ah wait...
< sumedhghaisas_>
test first then add...
< sumedhghaisas_>
the gradient test has to pass before adding KL error
< sumedhghaisas_>
Is the procedure clear to you? If you have any doubts let me know
< Atharva>
Yeah. And after adding KL as well, right?
< sumedhghaisas_>
After adding KL error the gradient test should indeed fail
< sumedhghaisas_>
Have you looked at how gradient test works?
< Atharva>
Yes
< Atharva>
I am a little confused as to why it will fail
< sumedhghaisas_>
it's delta(Loss) / delta(parameters)
< sumedhghaisas_>
we estimate this numerically
< sumedhghaisas_>
there is one little flaw in our architecture... currently the loss is only 'reconstruction'
< sumedhghaisas_>
so delta(Loss) would be wrong numerically when we also consider the error signal from KL
< sumedhghaisas_>
if we add the error signal from KL, we need to add KL loss to the loss function
< sumedhghaisas_>
which we aren't able to do right now
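For reference, the check being discussed is essentially a central-difference approximation of delta(Loss) / delta(parameters); a minimal sketch (not mlpack's actual gradient test code, and the function names are hypothetical):

    #include <armadillo>
    #include <algorithm>
    #include <functional>

    // Compare an analytic gradient against a numerical one.  If Backward()
    // injects a KL error signal that EvaluateLoss() does not include, the
    // two gradients will disagree and this check fails.
    bool CheckGradient(
        const std::function<double(const arma::mat&)>& EvaluateLoss,
        const arma::mat& analyticGradient,
        arma::mat parameters,
        const double eps = 1e-6,
        const double tolerance = 1e-4)
    {
      arma::mat numericGradient(parameters.n_rows, parameters.n_cols);
      for (arma::uword i = 0; i < parameters.n_elem; ++i)
      {
        const double original = parameters(i);

        parameters(i) = original + eps;
        const double lossPlus = EvaluateLoss(parameters);
        parameters(i) = original - eps;
        const double lossMinus = EvaluateLoss(parameters);
        parameters(i) = original;

        // delta(Loss) / delta(parameter), estimated numerically.
        numericGradient(i) = (lossPlus - lossMinus) / (2.0 * eps);
      }

      return arma::norm(numericGradient - analyticGradient, 2) /
          std::max(arma::norm(numericGradient, 2), eps) < tolerance;
    }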
< Atharva>
Understood
< sumedhghaisas_>
So if the gradient test passes without the KL error, our gradients are correct except for KL
< sumedhghaisas_>
Let's see if that's the case
< sumedhghaisas_>
Then let's decide how to add the KL loss to the overall loss
< Atharva>
Another thing: do you think it would be better to add a boolean parameter to the repar layer's constructor indicating whether the user wants to use KL or not, just for that extra functionality?
< Atharva>
We could make two cases in the backward function then
< sumedhghaisas_>
umm I am not sure if that will be helpful
< sumedhghaisas_>
repar without KL is just like an autoencoder
< sumedhghaisas_>
the user can add a bottleneck layer to achieve the same performance
< Atharva>
But there is no random sampling in autoencoders; repar will still have the random sampling
< sumedhghaisas_>
yes... but there is no loss term to tell the layer to control the distribution
< sumedhghaisas_>
it can overfit to every point it sees
< sumedhghaisas_>
the problem observed in autoencoders
< Atharva>
Ohhh yes, it will overfit like crazy
< sumedhghaisas_>
indeed
< sumedhghaisas_>
is the test passing?
< Atharva>
Wait, I will give it a go
< Atharva>
To take a break from this, I started with the VAE class yesterday
sumedhghaisas_ has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
< sumedhghaisas>
Atharva: hmmm
< sumedhghaisas>
Let's check the gradients again then
< sumedhghaisas>
Is the code online the one you are trying?
< Atharva>
Yeah, with the latest changes you suggested
< sumedhghaisas>
Atharva: the PR code is the latest?
< sumedhghaisas>
but I still see the SoftplusFunction::Deriv issue in that code
< Atharva>
Yeah I made those changes
< Atharva>
haven't committed them yet
< sumedhghaisas>
Okay. Make sure the code you are running is the latest
< sumedhghaisas>
Have you tried debugging where the error is?
< Atharva>
I am trying to calculate the gradients again, do you think the error could be elsewhere?
< sumedhghaisas>
I am not sure. I need to look at the new code you are running
< Atharva>
Okay, I will commit it
< Atharva>
pushed
< sumedhghaisas>
Atharva: Yes I saw. I gave it a very quick look but I have to complete some other work.
< sumedhghaisas>
Try to isolate the error by only using mean or only using stddev
< sumedhghaisas>
this way you would know where the error lies
< Atharva>
Sure, I will try and debug it
< sumedhghaisas>
also check for size consistency in the network, make sure every layer is getting the correct size input
< Atharva>
Okay
< Atharva>
So, I think today's sync isn't necessary now
< Atharva>
I will try and debug the code
< sumedhghaisas>
Atharva: Ahh wait... I see the problem
< sumedhghaisas>
We need to approximate the gradient with a constant Gaussian sample
< sumedhghaisas>
or the stochasticity in the sample will disturb the computation
< Atharva>
Okay
< sumedhghaisas>
Atharva: Okay I suggest adding a boolean to the layer, stochastic=True
< sumedhghaisas>
if user passes false, always assign a constant value to the sample
< sumedhghaisas>
This will help only in testing I guess
< sumedhghaisas>
But it's important
< Atharva>
Yeah, but when we don't set a seed, it's always a constant sample
< sumedhghaisas>
umm... I don't think so.
< sumedhghaisas>
When the seed is the same, the chain of random numbers is the same, but each individual random number is not the same
< sumedhghaisas>
so if I run the program again and again, the same random numbers will be generated
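A rough sketch of the stochastic flag being proposed (not the actual Reparametrization layer; the class, member, and method names are assumptions):

    #include <armadillo>

    // When `stochastic` is false the same fixed "noise" is used on every
    // call, so the numerical gradient check is not disturbed by fresh
    // Gaussian sampling.
    class ReparametrizationSketch
    {
     public:
      ReparametrizationSketch(const size_t latentSize,
                              const bool stochastic = true) :
          latentSize(latentSize), stochastic(stochastic) { }

      arma::vec Sample(const arma::vec& mean, const arma::vec& stddev)
      {
        arma::vec noise;
        if (stochastic)
          noise = arma::randn<arma::vec>(latentSize); // fresh Gaussian noise
        else
          noise.ones(latentSize);                     // any constant value

        // Reparametrization trick: sample = mean + stddev (elementwise) noise.
        return mean + stddev % noise;
      }

     private:
      size_t latentSize;
      bool stochastic;
    };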
< Atharva>
Okay, I will do this
ImQ009 has joined #mlpack
< sumedhghaisas>
rcurtin: Hey Ryan
< sumedhghaisas>
Do you think we should have a NormalDistribution class to support a matrix of univariate Gaussian distributions?
< rcurtin>
sumedhghaisas: that would just be a GaussianDistribution with a diagonal covariance matrix, right?
< sumedhghaisas>
rcurtin: it can be done that way. But the problem occurs when there is a batch of distributions
< sumedhghaisas>
like in VAE
< rcurtin>
hmm, so I read some VAE papers but it was like 3 years ago and I think I forgot everything
< sumedhghaisas>
for example
< rcurtin>
so assume that I don't know much :)
< sumedhghaisas>
Okay, so let me describe the problem
< sumedhghaisas>
So we pass a batch of points through the encoder, which converts each point to a Gaussian distribution
< sumedhghaisas>
so we have a batch of gaussian distributions, where each distribution has a fixed size
< sumedhghaisas>
and the variables in each distribution are independent of each other
< sumedhghaisas>
This would be too hard to represent with our current setup. :(
< rcurtin>
hmm, it seems to me like you could use the current GaussianDistribution class, and you would initialize the mean with the vector of means, and the covariance with diag(variances)
< rcurtin>
however one problem with that is that the covariance will take d x d memory, but really since the covariance is diagonal, we should only need 'd' elements
< sumedhghaisas>
yes... and the batch will make it worse
< sumedhghaisas>
rather than b * d memory
< rcurtin>
right, I guess d will be equal to the batch size?
< sumedhghaisas>
it will take b * d * d
< rcurtin>
or wait would it be b*d*d
< rcurtin>
right
< rcurtin>
I see
< rcurtin>
hmm, so a couple of ideas spring to mind. you could write a new class that is made for multivariate gaussian distributions but is specific to diagonal covariances
< rcurtin>
you could also templatize the existing GaussianDistribution class so that it takes whether or not the covariance is diagonal as a parameter, but I think maybe that is a little bit confusing
< rcurtin>
or you could just work with the matrix of means and variances directly in the VAE classes
< rcurtin>
I think any of those could be fine, but I agree, the existing GaussianDistribution would not work for this
< sumedhghaisas>
huh... templatizing it would also work I guess
< rcurtin>
right, I guess it would be template<bool DiagonalCovariance> class GaussianDistribution
< rcurtin>
but I don't know if that makes it too complex
< rcurtin>
I guess you could use using declarations to make it simpler again...
< rcurtin>
template<bool DiagonalCovariance> class BaseGaussianDistribution;
< rcurtin>
using GaussianDistribution = BaseGaussianDistribution<false>;
< rcurtin>
using DiagonalGaussianDistribution = BaseGaussianDistribution<true>; // or some other name, I don't know if that is a good one
< rcurtin>
anyway, that is just one possible idea
< sumedhghaisas>
I agree... sounds confusing
< sumedhghaisas>
Will naming it NormalDistribution be confusing?
< rcurtin>
I think it may be confusing, but comments in the class description should be sufficient to clarify for users
< rcurtin>
I can't think of too many other names that are not way too long
< sumedhghaisas>
For this, shall we continue with NormalDistribution, with extensive documentation?
< sumedhghaisas>
The name is also consistent with other libraries
< sumedhghaisas>
I prefer treating it as a matrix of normal distributions rather than a batch of Gaussian distributions where each distribution has a diagonal covariance. What do you think?
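A sketch of that representation: each column is one distribution in the batch and each row one independent univariate Gaussian, so storage is b * d rather than b * d * d (the names and layout here are assumptions, not a final mlpack API):

    #include <armadillo>

    class NormalDistributionSketch
    {
     public:
      // mean and stdDev are both d x b: one column per distribution in the
      // batch, one row per independent univariate Gaussian.
      NormalDistributionSketch(const arma::mat& mean, const arma::mat& stdDev) :
          mean(mean), stdDev(stdDev) { }

      // Elementwise log-density of observations with the same d x b shape.
      arma::mat LogProbability(const arma::mat& observation) const
      {
        const arma::mat variance = arma::square(stdDev);
        return -0.5 * arma::log(2.0 * arma::datum::pi * variance)
            - arma::square(observation - mean) / (2.0 * variance);
      }

      // Reparametrized sample for the whole batch.
      arma::mat Sample() const
      {
        return mean + stdDev % arma::randn<arma::mat>(mean.n_rows, mean.n_cols);
      }

     private:
      arma::mat mean;
      arma::mat stdDev;
    };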
< sumedhghaisas>
The other issue is regarding the FFN and RNN architecture
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
sumedhghaisas2 has quit [Ping timeout: 260 seconds]
travis-ci has joined #mlpack
< travis-ci>
mlpack/mlpack#5024 (master - 1917b1a : Ryan Curtin): The build has errored.
< rcurtin>
sorry, I had a meeting, but it is done now
< rcurtin>
I think NormalDistribution is fine if that's what you'd like to do
< rcurtin>
what's the FFN/RNN architecture issue?
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 276 seconds]
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has quit [Ping timeout: 240 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
haritha1313 has joined #mlpack
vivekp has quit [Ping timeout: 245 seconds]
< haritha1313>
rcurtin: Hi, I had a question about Armadillo; I thought you might be able to help me out. Is there any function I can use to compare multiple values simultaneously? For example, if I want to get all rows which have value 3 in column 1 and value 4 in column 2, without using find() in a loop.
< haritha1313>
Something like checking for a pair
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 255 seconds]
< rcurtin>
haritha1313: I see what you mean, I can't think of an immediate function for that
< rcurtin>
but, I wonder if you could use a clever lambda in .transform() or something like this
< rcurtin>
I am not sure if you could change the size of the matrix during such a call, though
< rcurtin>
I don't think it would be efficient, but you could use something like sum(a == b) where a is the matrix you're interested in, and b is a matrix with 3s in column 1, 4s in column 2, and nans everywhere else
< haritha1313>
Right now I am using find() for the first value, and then using the returned indices with any() for the second value. It seemed to be a bit slow.
< rcurtin>
given the complexity it may be better to just write a for loop over each column
< rcurtin>
er, rather, loop over each row (although since Armadillo is column major it is faster to iterate over columns)
< rcurtin>
(so maybe it is worth transposing the matrix)
< haritha1313>
Will a nested loop have lower complexity than find() and any()?
< rcurtin>
possibly; find() will turn into a loop, and any() may turn into another loop
< rcurtin>
so if you can do it all as one loop, I don't know that there would be any faster way
sumedhghaisas has quit [Ping timeout: 260 seconds]
< haritha1313>
Actually I'm trying out stuff on the movielens-1m dataset, so I needed it to be fast enough for 1 million entries.
< haritha1313>
Thanks for helping :). I'll try it as a nested loop; the column-major point you mentioned will be helpful.
< rcurtin>
the matrix format is going to be Nx3, right? where each column is user id, item id, rating
< haritha1313>
yes
< rcurtin>
or are you representing it as a huge sparse matrix?
< rcurtin>
ah, ok
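A sketch of the single loop being suggested, assuming the Nx3 (user id, item id, rating) data is stored transposed as 3 x N so each entry is a contiguous column in Armadillo's column-major layout (the function name here is hypothetical):

    #include <armadillo>
    #include <vector>

    // Return the indices of all entries whose user id and item id match the
    // given pair, in a single pass over the data.
    std::vector<arma::uword> MatchingEntries(const arma::mat& data, // 3 x N
                                             const double userId,
                                             const double itemId)
    {
      std::vector<arma::uword> indices;
      for (arma::uword col = 0; col < data.n_cols; ++col)
      {
        if (data(0, col) == userId && data(1, col) == itemId)
          indices.push_back(col);
      }
      return indices;
    }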
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Read error: Connection reset by peer]
haritha1313 has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has joined #mlpack
sumedhghaisas has quit [Ping timeout: 260 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas2 has quit [Ping timeout: 260 seconds]
< ShikharJ>
zoq: It seems as though with a lower gradient multiplier, only the time to convergence increases, with no major visual change in the output.
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas has joined #mlpack
sumedhghaisas has quit [Ping timeout: 276 seconds]
sumedhghaisas2 has joined #mlpack
< zoq>
ShikharJ: hmm, okay, I guess we could rerun the experiments with some other parameters, but since the results are just fine for the smaller dataset I would say let's go ahead and merge the code, so that we can continue with the next part. What do you think?