verne.freenode.net changed the topic of #mlpack to: http://www.mlpack.org/ -- We don't respond instantly... but we will respond. Give it a few minutes. Or hours. -- Channel logs: http://www.mlpack.org/irc/
robertohueso has left #mlpack []
vivekp has quit [Ping timeout: 248 seconds]
vivekp has joined #mlpack
caiojcarvalho has joined #mlpack
travis-ci has joined #mlpack
< travis-ci> manish7294/mlpack#2 (evalBounds - 457980e : Manish): The build passed.
travis-ci has left #mlpack []
cjlcarvalho has joined #mlpack
caiojcarvalho has quit [Ping timeout: 276 seconds]
caiojcarvalho has joined #mlpack
cjlcarvalho has quit [Ping timeout: 260 seconds]
vivekp has quit [Ping timeout: 240 seconds]
vivekp has joined #mlpack
lozhnikov has quit [Ping timeout: 240 seconds]
lozhnikov has joined #mlpack
lozhnikov has quit [Ping timeout: 276 seconds]
cjlcarvalho has joined #mlpack
caiojcarvalho has quit [Ping timeout: 256 seconds]
lozhnikov has joined #mlpack
cjlcarvalho has quit [Ping timeout: 240 seconds]
cjlcarvalho has joined #mlpack
lozhnikov has quit [Ping timeout: 260 seconds]
vivekp has quit [Ping timeout: 244 seconds]
manish7294 has joined #mlpack
vivekp has joined #mlpack
manish7294 has quit [Client Quit]
manish7294 has joined #mlpack
< manish7294> rcurtin: Got a copy of your mail from the mailing list, looks like you're already up. I want to discuss the structure of BoostMetric. Do you think it is a good time for that?
< rcurtin> manish7294: I am in talks all day, so I think maybe it would be best to use email for that
< manish7294> sure :)
< rcurtin> I won't really be able to devote much time at the moment, only quick responses, etc.
< manish7294> no problem
< rcurtin> I'm waking up roughly 7 UTC this week so our awake times overlap much more than usual :)
< manish7294> Just if you have a few seconds to spare: do you think we would be able to use the existing optimizers for BoostMetric? https://arxiv.org/pdf/0910.2279.pdf
< manish7294> Looking at the algorithm, I think we have to build it from scratch.
< rcurtin> I'll take a look when I have a second and respond then, thanks for the direct PDF link :)
< manish7294> sure
< rcurtin> do you mean for Algorithm 2?
manish72942 has joined #mlpack
< manish72942> Right, sorry for the delay --- lost the connection
manish7294 has quit [Ping timeout: 252 seconds]
< rcurtin> this will take me a little time to think about
< rcurtin> I will try and have an answer later today
< manish72942> no need to hurry :)
< rcurtin> manish72942: I haven't had time to really give it a good look, but my instinct is, the BoostMetric paper claims both speedup and accuracy boost over LMNN
< rcurtin> so, if you want to devote time now (if it will not take too long), you could implement it by itself and we could see if both of those claims are true
< rcurtin> I am not sure if speedup will still be obtained given our optimized implementation (which still has some further optimization to go)
< manish72942> like a rough implementation as given in the algorithm, without any optimizer or anything, right?
< manish72942> If that's so, I am on my way.
< rcurtin> right, I think that is reasonable, but the more important thing here will be trying to reproduce the results of their paper
< rcurtin> I want to make sure that in the time we have left, we get something interesting
< rcurtin> so really the situation I want to try to avoid is an incompletely optimized LMNN and then we find out that BoostMetric does not consistently give speedup or improved accuracy over LMNN
< rcurtin> fully optimized LMNN is interesting by itself, and also interesting is fast BoostMetric with some of the LMNN optimizations
< rcurtin> I need to read the BoostMetric paper in full (I have not had a chance to do that, sorry; I have been focused on LMNN)
< rcurtin> manish72942: more about the runtime results in BoostMetric:
< rcurtin> (1) the paper claims that for each iteration of LMNN, a projection of M back onto the PSD cone is needed, which costs an O(d^3) eigendecomposition
< rcurtin> however, in our implementation, since we are optimizing L (where M = L^T L) directly, M is always guaranteed to be PSD so we do not ever need to take that step
< rcurtin> (2) the paper points out that their implementation was in MATLAB, and that further speedup could be seen in C/C++
< rcurtin> to me, this almost guarantees they used the MATLAB implementation of LMNN, which we already know to be inefficient since it computes the full distance matrix
< rcurtin> so, an "efficient" implementation of BoostMetric may behave entirely differently than their results (with respect to speed at least)
< manish72942> Ya, they have even referenced it in their implementation
< rcurtin> so I don't mean to say BoostMetric is bad or anything, of course---I just mean that we can't be sure of exactly what we will encounter with respect to speed
< rcurtin> do you think you would rather implement BoostMetric or keep working on the LMNN optimizations? (or perhaps you feel that you can do both in parallel?)
< manish72942> from your comments above it looks like we have already done all these optimizations, so we shouldn't expect much from this. But still, let's give it a shot; maybe I will try to work it out today itself.
< rcurtin> I think it's still completely possible that all of our LMNN optimizations will apply to BoostMetric
< rcurtin> and if it doesn't fit exactly into the optimizer API, that's ok---after all, our existing AdaBoost implementation does not either
< rcurtin> but as far as any paper goes, we can say, e.g., "we have provided order-of-magnitude+ speedups to LMNN and expect that these would be applicable to LMNN derivatives such as BoostMetric, PSMetric, etc."
< manish72942> I will try to make a rough implementation today and will see whether it's worth continuing to work on BoostMetric.
< manish72942> agreed, we are at least in a position to claim that
< rcurtin> but I don't think we can persuasively say, e.g., "we got a little bit of speedup for LMNN and also implemented BoostMetric but roughly only see the same results as the BoostMetric paper" :)
< rcurtin> anyway, yeah, that sounds good, let's see what the rough implementation does
< manish72942> :)
lozhnikov has joined #mlpack
lozhnikov has quit [Ping timeout: 264 seconds]
lozhnikov has joined #mlpack
lozhnikov_ has joined #mlpack
lozhnikov has quit [Ping timeout: 240 seconds]
lozhnikov_ has quit [Client Quit]
< ShikharJ> lozhnikov: zoq : I have tried debugging the RBM PR to an extent, but I'm unable to get the test accuracy up. Could you guys please review the code?
lozhnikov has joined #mlpack
lozhnikov has quit [Ping timeout: 256 seconds]
< zoq> ShikharJ: Can you narrow down the issue to some part of the code? I'll take a look at the code later today, but I think it would be helpful to get some additional information, maybe you can tell us what you already tried?
cjlcarvalho has quit [Ping timeout: 248 seconds]
ImQ009 has joined #mlpack
< sumedhghaisas> Atharva: Hi Atharva
< sumedhghaisas> Hows it going?
< sumedhghaisas> Did the model work with MeanSquaredError?
< ShikharJ> zoq: I have added support for mini-batches, but I'm doubtful of the usefulness of the design (I'd refactor the entire PR to use SFINAE + enable_if<>). Plus I'm not sure where the FreeEnergy function of SSRBM originates from. That, and there are a number of issues while working with mini-batch inputs, since most of the code was designed with single inputs in mind, but the tests make use of mini-batches.
< ShikharJ> zoq: Most of the rest of the code is correct, but these issues are likely to be the cause of trouble. More specifically, the ones relating to updating the gradients.
cjlcarvalho has joined #mlpack
< zoq> ShikharJ: I guess it would make sense to switch back to the single-input case, if that might cause some issues; I don't think training over mini-batches is that important, at least at this point.
< zoq> Also, it sounds like we should start with the free energy function.
< ShikharJ> zoq: I tried augmenting the test-cases for single inputs, but even there the accuracy is not good, so there's probably a problem with our Evaluate-Gradient routines.
< zoq> ShikharJ: Okay, we should check the gradients for some steps, perhaps we see some strange values (inf, zeros).
caiojcarvalho has joined #mlpack
cjlcarvalho has quit [Ping timeout: 268 seconds]
jenkins-mlpack has quit [Ping timeout: 256 seconds]
manish7294 has joined #mlpack
< manish7294> rcurtin: Here's a rough implementation, but it seems the binary search part takes an indefinite amount of time (probably something is wrong); if you get some time please have a look at it - https://gist.github.com/manish7294/3d97be37919658b96bba0125f2f3de84
< manish7294> hmm, it looks like the terminating condition for bisection given in the paper and in the implementation differ by an extra condition (abs(lhs) < EPS); after adding that condition, it seems BoostMetric is superfast :)
< manish7294> The main reason I could think of is --- it doesn't recalculate impostors at every iteration.
vivekp has quit [Ping timeout: 264 seconds]
vivekp has joined #mlpack
xa0 has quit [Ping timeout: 256 seconds]
manish7294 has quit [Ping timeout: 252 seconds]
manish72942 has quit [Ping timeout: 265 seconds]
xa0 has joined #mlpack
xa0 has quit [Ping timeout: 244 seconds]
robertohueso has joined #mlpack
xa0 has joined #mlpack
< rcurtin> manish7294: great to hear it's fast, can you get some timings/accuracy reports on different datasets?
< rcurtin> if the issue is that it's not calculating impostors, we could also have a variant of LMNN where we don't recalculate impostors, and see what the performance there is
< rcurtin> it may also be implicit in their paper that impostors need to be recalculated, so maybe their implementation recalculated impostors but the paper didn't make it clear that needed to be done
manish7294 has joined #mlpack
< manish7294> rcurtin: Here's the original implementation https://gist.github.com/manish7294/123598515035fe5a37f0a049143e06ac , and I don't think they have ever recalculated impostors (they just have done it once for calculating knn_triplets)
< rcurtin> I don't really have time to look into the implementation, I am just offering possibilities for the speedup
< rcurtin> it will be interesting to see the accuracy results, and then we should also compare with LMNN where we never recalculate impostors
< manish7294> no worries, I will post them soon :)
< rcurtin> sure, sounds good
< manish7294> rcurtin: Here are some simulations : https://gist.github.com/manish7294/2388267666b1159ce261ce7b95dc923c
manish7294 has quit [Quit: Page closed]
< Atharva> zoq: You there?
< zoq> Atharva: I'm here now.
< Atharva> zoq: I have realised that serialising the parameters of the Sequential layer does not work. As the Sequential layer is just a container, its parameter object is empty.
< Atharva> Instead, I propose a different solution to access the encoder and decoder of a network separately, which I also think might be useful in other cases.
< Atharva> What do you think about a ForwardPartial() function in the FFN class which takes in input and output matrices, and the starting and ending indices of the layers within the network to forward pass through?
< Atharva> I implemented it locally and it saved a lot of effort when working with the decoder and encoder separately
< zoq> Atharva: You are right, but using 'ar & BOOST_SERIALIZATION_NVP(network);' should call the serialize function of each layer; at the end we still have to implement the reset function though.
< zoq> Atharva: hm, that is an interesting idea, do you think we could provide another Forward function that does the same?
< Atharva> zoq: Okay, so do you want this function to be called just Forward instead of ForwardPartial ?
< zoq> Atharva: If you think that is reasonable, I think it looks cleaner.
< Atharva> zoq: Yes, it will call the serialize function of each layer, but then no layer serializes its parameters in the serialize function. So the trained parameters never get saved individually.
< Atharva> zoq: Okay! I will create a new PR then.
< zoq> Atharva: Right, you still have to collect the parameters in the reset function; your solution sounds much simpler.
ImQ009 has quit [Quit: Leaving]
< zoq> ShikharJ: Do you use 'RBMNetworkTest/ssRBMClassificationTest' for testing?
lozhnikov has joined #mlpack
< Atharva> zoq: I just posted a blog post, but the website isn't getting updated.
< zoq> Atharva: hm, I wonder if changing the date from 2018-07-10 to 2018-07-17 will fix the issue.
< Atharva> zoq: Oh sorry, I didn't change the date when I copied it.
< zoq> okay, that doesn't fix the issue, I'll look into it tomorrow.
< ShikharJ> zoq: Yes.
< zoq> ShikharJ: I get the following error:
< zoq> error: as_scalar(): expression doesn't evaluate to exactly one element
< zoq> unknown location:0: fatal error: in "RBMNetworkTest/ssRBMClassificationTest": std::logic_error: as_scalar(): expression doesn't evaluate to exactly one element
< ShikharJ> zoq: Ah, the ssRBM needs to be changed a little for batch support I guess.
< ShikharJ> zoq: I'll push in a few changes in an hour or so, probably that would fix this as well.
< zoq> ShikharJ: Okay, thanks!
< ShikharJ> zoq: The bigger issue lies with BinaryRBM code.
< ShikharJ> zoq: In my system, SSRBM is still giving about 74% accuracy, while BinaryRBM is just a notch above 65%.
< zoq> For the binary test I get: error: addition: incompatible matrix dimensions: 100x10 and 100x1, sounds like some batch size issue.
< ShikharJ> zoq: Have you pulled in the latest code from the branch?
< ShikharJ> zoq: Because these issues were there in previous versions of the code, which at least builds and runs fine for now? What configuration of CMake are you using?
< zoq> cmake -DBUILD_CLI_EXECUTABLES=OFF -DDEBUG=ON -DBUILD_PYTHON_BINDINGS=OFF ..
< zoq> last commit is 98b5fc04d, which I think is the latest version
< zoq> travis ends up with the same error
< ShikharJ> zoq: Ok, the commit seems to be fine, I'll look into this as well. Thanks for letting me know.
lozhnikov has quit [Ping timeout: 240 seconds]