rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
birm[m] has quit [Quit: You have been kicked for being idle]
<rcurtin[m]> here's a nice bandicoot benchmark: randu() for 100M floats takes ~1.35s with Armadillo... using cuRAND via bandicoot, it takes ~0.001s. speedup of 1000x+ :)
<rcurtin[m]> that's a rate of roughly 360 GB/s for randu generation on my RTX 2080 Ti, which has a maximum memory bandwidth of 616 GB/s, so I guess the cuRAND developers did a pretty good job :)
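A minimal timing sketch of the comparison described above, under some assumptions: it assumes bandicoot mirrors Armadillo's fill::randu constructor and element access under the coot:: namespace, and it uses a single element read-back as an implied synchronization point before stopping the GPU timer. The throughput arithmetic matches the numbers quoted: 100M floats is about 0.4 GB, so a run of roughly 1.1 ms corresponds to the ~360 GB/s figure.

    #include <armadillo>
    #include <bandicoot>
    #include <chrono>
    #include <iostream>

    int main()
    {
      const size_t n = 100000000; // 100M single-precision values (~0.4 GB)

      // CPU: Armadillo's randu.
      auto c0 = std::chrono::steady_clock::now();
      arma::fvec cpu_v(n, arma::fill::randu);
      auto c1 = std::chrono::steady_clock::now();

      // GPU: bandicoot's randu (cuRAND under the CUDA backend).
      // Assumes coot::fvec supports the same fill::randu constructor as Armadillo.
      auto g0 = std::chrono::steady_clock::now();
      coot::fvec gpu_v(n, coot::fill::randu);
      volatile float force_sync = gpu_v(0); // read one element back so the kernel has finished
      auto g1 = std::chrono::steady_clock::now();
      (void) force_sync;

      const double cpu_s = std::chrono::duration<double>(c1 - c0).count();
      const double gpu_s = std::chrono::duration<double>(g1 - g0).count();
      const double bytes = double(n) * sizeof(float);

      std::cout << "Armadillo: " << cpu_s << " s\n"
                << "bandicoot: " << gpu_s << " s\n"
                << "GPU rate:  " << (bytes / gpu_s) / 1e9 << " GB/s\n";
    }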
<rcurtin[m]> however, I have to write the randu kernels by hand for OpenCL... so I don't know if I'll succeed at getting the same performance levels...
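For context, a hand-written OpenCL randu kernel could look roughly like the sketch below, kept as a C++ raw string the way OpenCL kernel sources are usually embedded in host code. This is not bandicoot's actual kernel; the kernel name, seeding scheme, and per-thread xorshift32 generator are made up for illustration, and a production kernel would want a stronger generator (e.g. xorwow or Philox) to approach cuRAND's quality and throughput.

    // Hypothetical sketch only: NOT bandicoot's kernel.
    static const char* randu_kernel_src = R"CLC(
    __kernel void randu_xorshift(__global float* out,
                                 const unsigned int n,
                                 const unsigned int seed)
    {
      const unsigned int i = get_global_id(0);
      if (i >= n) { return; }

      // Derive a distinct nonzero xorshift32 state for each work-item.
      unsigned int state = seed ^ (i * 2654435761u);
      if (state == 0) { state = 1; }

      // One round of xorshift32.
      state ^= state << 13;
      state ^= state >> 17;
      state ^= state << 5;

      // Map the 32-bit result to a float in [0, 1).
      out[i] = (float) state * (1.0f / 4294967296.0f);
    }
    )CLC";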