<rcurtin[m]>
lots to do for a second release, and lots of optimizations and improvements for the future... but version 1 is out!
<shrit[m]>
congrats
<shrit[m]>
now we can test
<shrit[m]>
indeed I only have an NVIDIA Quadro K420; I have no idea if it will be of any use or not
<shrit[m]>
and it is my only graphics card
<rcurtin[m]>
it probably won't give much speedup, but... it should at least work! (assuming the device supports CUDA)
<shrit[m]>
rcurtin[m]: it should, unless some NVIDIA graphics cards don't support CUDA
<shrit[m]>
it supports CUDA compute capability 3.0
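(Whether a device supports CUDA, and at what compute capability, can be checked with the standard CUDA runtime API. A minimal sketch, assuming the CUDA toolkit is installed; this code is not from the log itself:)

    // List CUDA devices and their compute capability.
    // A Quadro K420 reports compute capability 3.0.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
      int count = 0;
      if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
      {
        std::printf("no CUDA-capable device found\n");
        return 1;
      }

      for (int i = 0; i < count; ++i)
      {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::printf("device %d: %s (compute capability %d.%d)\n",
                    i, prop.name, prop.major, prop.minor);
      }
      return 0;
    }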
<jonpsy[m]>
<rcurtin[m]> "ta-da! https://coot.sourceforge..." <- looks amazing. For benchmark, was pytorch c++ API used?
<rcurtin[m]>
I just adapted some of their example Python code
<jonpsy[m]>
I see; is there any reason why we have an edge over PyTorch/TensorFlow?
<rcurtin[m]>
I believe, though I'm not 100% sure, that it's because we are not transferring data back and forth between the GPU and host; I think TF and PyTorch both do that to preserve some GPU memory
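(A minimal sketch of that idea, assuming bandicoot's Armadillo-style API: coot::fmat objects live in GPU memory, so chained operations run entirely on the device, and data only crosses back to the host on explicit element access:)

    #include <bandicoot>
    #include <iostream>

    int main()
    {
      // Matrices are allocated in GPU memory from creation.
      coot::fmat A(1000, 1000, coot::fill::randu);
      coot::fmat B(1000, 1000, coot::fill::randu);

      // The whole expression is evaluated on the device;
      // no host<->device copies happen between the steps.
      coot::fmat C = A * B + 0.5f * A;

      // An element access is what finally pulls a value back to the host.
      std::cout << "C(0, 0) = " << C(0, 0) << std::endl;
      return 0;
    }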
<jonpsy[m]>
Hm, but as per your benchmark, both Torch and TF reported OOM at large dimensions
<rcurtin[m]>
yeah, so maybe there is overhead too; I'm not sure of the reasons, and I didn't take the time to investigate. I did my best to confirm that the code I wrote was reasonable and wasn't using those libraries totally incorrectly