ChanServ changed the topic of #mlpack to: "mlpack: a fast, flexible machine learning library :: We don't always respond instantly, but we will respond; please be patient :: Logs at http://www.mlpack.org/irc/"
< rcurtin>
I don't yet know whether they are good or bad... I need to compare with other GPU toolkits :)
< zoq>
rcurtin: Wow, that is really cool, I guess logistic regression is not the best example for GPU acceleration?
thecodeeagleGitt has joined #mlpack
< thecodeeagleGitt>
Hey everyone!
< thecodeeagleGitt>
I'm Ashlesha Kumar, an undergraduate student at BITS Pilani, India, currently pursuing dual majors in Computer Science and Economics. I have decent experience in C, C++, and Python and would love to contribute to mlpack. I'm new to the community; any leads or guidance on a good starting point would be really helpful!
< zoq>
thecodeeagleGitt: Hello there, there is a get involved section on the community page - https://www.mlpack.org/community.html that should help you get started.
< zoq>
thecodeeagleGitt: Also feel free to ask any questions here.
< thecodeeagleGitt>
Sure, will check it out. Thank you @matrixbot!
< zoq>
thecodeeagleGitt: I'm not a bot :) we use a bridge between IRC, Slack, and Gitter that sometimes creates strange names.
< thecodeeagleGitt>
Oh I see xD
< RishabhGarg108Gi>
It's an achievement when a bot is compared to a human, but it sucks when someone calls a human a bot :p
< rcurtin>
zoq: yeah, I think it is not the best problem; at small batch sizes and dimensionalities, I think it might be too small to really show speedup
< rcurtin>
but, I'm not sure---I need to test with TensorFlow and/or PyTorch on the GPU and see if they somehow do way better :)
< zoq>
rcurtin: Btw, in the MR you mentioned you used SGD; in the branch I can see you worked on L-BFGS as well, but I guess L-BFGS doesn't work yet?
< rcurtin>
yeah, I haven't tried LBFGS but it *might* work... not sure :)
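The reason an untested L-BFGS run *might* just work: ensmallen optimizers share a single Optimize() interface, so swapping SGD for L-BFGS is mechanical. A minimal sketch, assuming a differentiable function object `f` with Evaluate()/Gradient() (this is illustrative, not the actual benchmark code):

```cpp
#include <ensmallen.hpp>

// `coordinates` holds the parameters being optimized; it could be an
// arma::mat on the CPU or, in principle, a coot::mat on the GPU.
template<typename FunctionType, typename MatType>
void OptimizeBothWays(FunctionType& f, MatType& coordinates)
{
  // SGD: step size 0.01, batch size 128, at most 100000 iterations.
  ens::SGD<> sgd(0.01, 128, 100000);
  sgd.Optimize(f, coordinates);

  // L-BFGS takes full-batch steps through the same interface, so it works
  // only if the matrix type supports every operation it needs.
  ens::L_BFGS lbfgs;
  lbfgs.Optimize(f, coordinates);
}
```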
< rcurtin>
I just updated the numbers with a TensorFlow comparison... TF (even on the GPU) is an order of magnitude slower than bandicoot
< rcurtin>
I don't think TF is a particularly "hard" competitor, but, nice to see that number nonetheless... now let's try PyTorch...
< zoq>
Wow, nice, do you have the CPU numbers for TF as well?
< rcurtin>
oh, I didn't try it, let me do that now
< zoq>
Wonder if at least the CPU implementation is comparable.
< rcurtin>
let's find out...
< rcurtin>
if the state of efficiency for machine learning libraries is really truly this bad, the sky is the limit for mlpack and bandicoot and ensmallen...
< rcurtin>
I know TF has a reputation for being slow, but I'm interested to see if PyTorch is similar
< zoq>
Yeah, unfortunately efficiency isn't top priority for the majority.
< rcurtin>
ha, wow, so I ran just with a batch size of 128 and 100 dimensions... TF takes 19.697s on the CPU, 32.283s on the GPU, and Armadillo takes 0.299s on the CPU...
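For scale, here is roughly how an Armadillo-side timing like that could be taken, using arma::wall_clock; the loop body is a logistic-regression-flavored stand-in, not the actual benchmark code:

```cpp
#include <iostream>
#include <armadillo>

int main()
{
  const arma::uword dims = 100, batchSize = 128, iters = 10000;
  arma::mat X(dims, batchSize, arma::fill::randu);  // one synthetic minibatch
  arma::vec w(dims, arma::fill::zeros);             // model parameters

  arma::wall_clock timer;
  timer.tic();
  for (arma::uword i = 0; i < iters; ++i)
  {
    // Sigmoid of w^T X, then a gradient-like update (0.5 stands in for labels).
    arma::rowvec p = 1.0 / (1.0 + arma::exp(-(w.t() * X)));
    w -= 0.01 * (X * (p.t() - 0.5)) / double(batchSize);
  }
  std::cout << "elapsed: " << timer.toc() << "s" << std::endl;
}
```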
< zoq>
Interesting
< rcurtin>
no idea what the bottleneck is
< zoq>
Didn't expect such a huge gap between TF and coot/arma
< rcurtin>
same, I knew it would be a gap, but I didn't know it would be so big
< zoq>
fingers crossed pytorch is the same :)
< rcurtin>
with 1000 dimensions and a batch size of 1024, TF takes 9.827s on the CPU, 12.706s on the GPU, while Armadillo takes 2.131s on the CPU (and bandicoot+CUDA takes 1.080s)
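The bandicoot side of that comparison is intended to be close to a type swap. A hedged sketch, assuming bandicoot's Armadillo-mirroring API (the library is pre-release, so names and details may differ):

```cpp
#include <bandicoot>

void GpuSteps()
{
  const coot::uword dims = 1000, batchSize = 1024;

  // These matrices are allocated in GPU memory; the expressions below are
  // meant to run as GPU kernels (CUDA or OpenCL, depending on the backend).
  coot::fmat X(dims, batchSize, coot::fill::randu);
  coot::fvec w(dims, coot::fill::zeros);

  for (coot::uword i = 0; i < 1000; ++i)
  {
    coot::frowvec p = 1.0f / (1.0f + coot::exp(-(w.t() * X)));
    w -= 0.01f * (X * (p.t() - 0.5f)) / float(batchSize);
  }
}
```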
< zoq>
insanely good numbers in comparison
< abernauer[m]>
Training on CPU with Tensorflow backend takes ages from my limited experience.
< rcurtin>
yeah, I remember when I was at Symantec we found that inference with mlpack's NN toolkit was a good bit more efficient than TensorFlow on the CPU (...and I think in that case on the GPU too, but that probably had to do with the fact that we had a very small batch size for inference)
< zoq>
the arma results are with OpenBLAS right?
< rcurtin>
yeah
< rcurtin>
I'd *assume* but I'm not sure that tensorflow-cpu is using OpenBLAS too
< rcurtin>
I have to say, both TF and PyTorch documentation make it really, really hard to understand what's going on under the hood---what I want to know is that the data is being stored on the GPU, instead of each minibatch being transferred back and forth between CPU and GPU
< rcurtin>
but it turns out to be relatively hard to check... the documentation is pretty byzantine in places
< abernauer[m]>
Does TF default to dense or sparse tensors? That would be one bottleneck that comes to mind.
< rcurtin>
abernauer[m]: I'd assume dense; that should be a safe bet
< rcurtin>
I think that, in the end, I'll have to redo the logistic regression example against the MNIST dataset, because both libraries provide examples for that situation (which are presumably tuned to work as well as possible)
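A CPU reference point for that MNIST experiment could come from mlpack's own implementation; a minimal sketch (the file name and label layout here are assumptions):

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/logistic_regression/logistic_regression.hpp>

int main()
{
  // Load a two-class MNIST subset; mlpack stores one point per column.
  arma::mat data;
  mlpack::data::Load("mnist_subset.csv", data, true /* fatal on failure */);

  // Assume the last row holds the 0/1 labels.
  arma::Row<size_t> labels =
      arma::conv_to<arma::Row<size_t>>::from(data.row(data.n_rows - 1));
  data.shed_row(data.n_rows - 1);

  // Training happens in the constructor (with L-BFGS by default).
  mlpack::regression::LogisticRegression<> lr(data, labels, 0.0 /* lambda */);
}
```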
< rcurtin>
but, even if that makes the TF implementation faster, it's hard for me to see that significantly changing the results
< rcurtin>
we'll find out, I guess :)
< rcurtin>
ok, numbers updated for PyTorch... bandicoot+ensmallen is 3x-10x faster with the code that I wrote
< rcurtin>
I'm not convinced that I've written "optimal" PyTorch code, but I think I did a good job with it
< rcurtin>
still a few other things to try though---I don't just want to know how bandicoot performs relative to TensorFlow and PyTorch; I want to know how it compares to an "optimal" implementation of logistic regression on the GPU (of course one of those does not exist, but maybe some toolkit out there gets close)
< zoq>
Interesting, they clearly do something else.
< rcurtin>
it's possible that every mini-batch is being copied back and forth from the GPU, but I tried to keep that from happening
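The pattern that avoids those copies is to upload the dataset once and slice each minibatch as a device-side column range. A sketch under the assumption that bandicoot supports Armadillo-style cols():

```cpp
#include <bandicoot>

// Sweep over minibatches without host round-trips: `data` already lives in
// GPU memory, and each batch is a device-to-device copy of a column range.
void SweepBatches(const coot::fmat& data, const coot::uword batchSize)
{
  for (coot::uword begin = 0; begin + batchSize <= data.n_cols;
       begin += batchSize)
  {
    coot::fmat batch = data.cols(begin, begin + batchSize - 1);
    // ... Evaluate()/Gradient() on `batch` here ...
  }
}
```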
< zoq>
I'll spin up NVIDIA Nsight and take a look at the results later.
< rcurtin>
awesome, thanks, I guess I should add some more directions...
< AlexNguyenGitter>
Hi, do you happen to have any resources that walk newcomers through the codebase overall? Thanks
< jjb[m]>
Ryan Curtin: is bandicoot out of alpha?
< abernauer[m]>
jjb: How's life? Working on issues to get the mlpack R bindings working on mac and ready for CRAN submission?
< jjb[m]>
Hectic. Sheltering-in-place implies having more time to work on projects, but that has rarely been the case. CRAN-wise, it's a slow and steady process.
< jjb[m]>
In the interim, there is a new `conda` package for `r-rcppensmallen`, which will be useful once CRAN gets `mlpack`, since `r-rcppensmallen` is a dependency.
< abernauer[m]>
Yeah, saw those contributions. I completed a more involved example for collaborative filtering on the MovieLens dataset for the mlpack/examples repo a week or so ago.
< rcurtin>
jjb[m]: nope, not out of alpha yet... mostly working on a 'proof of concept' for now to indicate (1) if the prototype is efficient (seems like yes?) (2) how much work we have yet to do before we can actually release it on an unsuspecting public :-D
< rcurtin>
I know the feeling about sheltering-in-place... it feels like there should be so much more time, but actually there isn't :)
< rcurtin>
awesome to hear about r-rcppensmallen; any update on the mlpack CRAN package or way I can help out? I know I backed out some time back, but my time has cleared up a bit since then so I can help see it through
< rcurtin>
AlexNguyenGitter: the codebase itself should be pretty reasonably documented, so hopefully it shouldn't be *too* hard to jump in :) have you seen https://www.mlpack.org/community.html ?