rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
jjb[m] has joined #mlpack
<jjb[m]> ryan nice! I saw a few items on the “Should require only C++” that I’ll aim to tackle.
<zoq[m]> Some really good numbers.
<rcurtin[m]> I still have some minor bugs in my OpenCL XORWOW implementation, but I have it within ~4x of the runtime of cuRand. I'll probably spend a couple more hours with it, but randu() performance is not the most important thing in the world so probably not much more time than that... for now 😃
<rcurtin[m]> The Philox generator you wrote will be what I use for randn() 👍️
<rcurtin[m]> It's... a lot 😃
<zoq[m]> So far the implementation is easy, so easy to review.
<rcurtin[m]> Those array operations are the easiest kernels to write and tune, they're very boilerplate. There might be some extra performance that one could squeeze out of each operation, but that's a task for another time...
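For readers unfamiliar with the XORWOW generator mentioned earlier in this log, here is a minimal CPU-side sketch in C++ of Marsaglia's xorwow recurrence, plus one common way to map its output to a uniform float in [0, 1). This is not the OpenCL code discussed in the channel. The names XorwowState, xorwow_next, and xorwow_uniform are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical state layout: five 32-bit xorshift words plus a Weyl counter d.
// The words x, y, z, w, v must not all be zero.
struct XorwowState {
  uint32_t x, y, z, w, v;
  uint32_t d;
};

// One step of Marsaglia's xorwow: a 160-bit xorshift combined with a
// Weyl sequence (d is incremented by the odd constant 362437 each call).
inline uint32_t xorwow_next(XorwowState& s) {
  uint32_t t = s.x ^ (s.x >> 2);
  s.x = s.y;
  s.y = s.z;
  s.z = s.w;
  s.w = s.v;
  s.v = (s.v ^ (s.v << 4)) ^ (t ^ (t << 1));
  s.d += 362437u;
  return s.d + s.v;
}

// Assumed mapping to [0, 1): keep the top 24 bits so the value is exactly
// representable in a float, then scale by 2^-24. Other mappings are possible.
inline float xorwow_uniform(XorwowState& s) {
  return static_cast<float>(xorwow_next(s) >> 8) * (1.0f / 16777216.0f);
}
```

A state can be initialized with any values where the five xorshift words are not all zero, for example XorwowState s{123456789u, 362436069u, 521288629u, 88675123u, 5783321u, 6615241u}; a GPU version would typically give each thread its own state.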