rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
GopiMTatiraju[m] has quit [Ping timeout: 240 seconds]
M082AABMEB has quit [Ping timeout: 240 seconds]
shrit[m] has quit [Ping timeout: 240 seconds]
jjb[m] has quit [Ping timeout: 240 seconds]
M082AABMEB has joined #mlpack
GopiMTatiraju[m] has joined #mlpack
shrit[m] has joined #mlpack
jjb[m] has joined #mlpack
<NabanitaDash[m]> In the section for the RL idea, the link for the deep learning reading list, i.e. deeplearning.net/reading-list, is mislinked (maybe)?
<zoq[m]> <NabanitaDash[m]> "In the section for RL idea, the..." <- Nice catch, updated.
manav71ManavSang has joined #mlpack
<manav71ManavSang> Hello mlpack community, Hope you are doing well... (full message at https://libera.ems.host/_matrix/media/r0/download/libera.chat/88191bf0a01cca3a968b1e5ec0700fa994ffec71)
<zoq[m]> Hello manav71 (Manav Sanghi), since you are interested in the examples idea, one next step is to make sure you have a working local Jupyter notebook instance running. One way is to use https://github.com/mlpack/examples/blob/master/scripts/jupyter-conda-setup.sh. After that you could contribute a simple example notebook; don't feel obligated, it's just one thing I would recommend to get familiar with the codebase.
<zoq[m]> Since you mentioned you have experience with GitHub Actions, I was actually wondering if we could set up a GitHub Actions job to automatically check that the notebooks run without any errors. We already have a job for the non-notebook code.
<ShubhamAgrawal[4> <zoq[m]> "Hello manav71 (Manav Sanghi..." <- One way to go about it is to convert the `.ipynb` to `.py` using nbconvert, then rename the output file to a `.cpp` extension and use it in the usual way.
<ShubhamAgrawal[4> But I think this is too short for a GSoC project. I may be wrong.
<zoq[m]> > <@shubhamag:matrix.org> One way to go about it is to convert the `.ipynb` to `.py` using nbconvert, then rename the output file to a `.cpp` extension and use it in the usual way.
<zoq[m]> > But I think this is too short for a GSoC project. I may be wrong.
<zoq[m]> Yes, it was meant as a contribution to get familiar with the codebase.
<ShubhamAgrawal[4> zoq[m]: Oh
<ShubhamAgrawal[4> <Aakash-kaushikAa> "Hey @shubhamag:matrix.org it would be great if you mail your idea or discussion to the mailing list." <- Sent just now :)
CaCode has joined #mlpack
CaCode has quit [Remote host closed the connection]
hitesh-anandhite has joined #mlpack
<hitesh-anandhite> Hello mlpack community,
<hitesh-anandhite> I am Hitesh Anand, a computer science undergrad in my 4th semester at IIT Kanpur. I have experience in Python, C++, ML, and deep learning, and I am currently working on an undergraduate project related to deep learning. I am interested in contributing to mlpack in GSoC’22. I have used mlpack and it was a really good experience as a user. Can someone kindly guide me on how to proceed further and whom to discuss my project ideas with?
<hitesh-anandhite> Thanks in advance.
<NabanitaDash[m]> ```
<NabanitaDash[m]> REQUIRE_THROWS_AS( mlpack_test_preprocess_split(params, timers), std::unexpected_handler )
<NabanitaDash[m]> due to unexpected exception with message:
<NabanitaDash[m]>   Mat::max(): object has no elements
<NabanitaDash[m]> ```
<NabanitaDash[m]> I am not able to figure out why `REQUIRE_THROWS_AS` doesn't catch unexpected errors.
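For reference, a minimal sketch of why that assertion can never pass (assuming Catch2 v2 and Armadillo; the test case below is invented for illustration): `REQUIRE_THROWS_AS` does catch the exception, but fails the assertion because `std::unexpected_handler` is a function-pointer typedef, not an exception type, so nothing thrown can ever match it. Armadillo raises "object has no elements" errors as `std::logic_error`, so naming that type (or a base class like `std::exception`) makes the assertion succeed:

```cpp
#define CATCH_CONFIG_MAIN
#include <catch2/catch.hpp>
#include <armadillo>

TEST_CASE("EmptyMatrixMaxThrows")
{
  arma::mat m;  // deliberately empty, so max() has nothing to reduce over

  // Armadillo throws std::logic_error("Mat::max(): object has no elements");
  // matching on the actual exception type makes the assertion pass.
  REQUIRE_THROWS_AS(m.max(), std::logic_error);
}
```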
<rcurtin[m]> zoq: is https://github.com/mlpack/mlpack/pull/3153 ready to merge or did it need any more changes?
<zoq[m]> <rcurtin[m]> "zoq: is https://github.com/..." <- One more change, I can do that later.
<rcurtin[m]> sounds good! I just wanted to know if I could merge it. in this case I will not 😄
<GopiMTatiraju[m]> Heyy, I had one doubt regarding bandicoot: I see that in the kernels directory in cuda there are folders named oneway, twoway, and threeway; what is the difference between those?
<rcurtin[m]> each kernel is written to take a different element type---so depending on settings of the preprocessor, we can compile a kernel for float, double, int, etc.
<rcurtin[m]> now some of these kernels, like `accu()`, only have one element type
<rcurtin[m]> (hence, "oneway", although I dunno if that is the best name)
<rcurtin[m]> some other kernels are "two-way", e.g., a dot-product `dot(A, B)` could be between a GPU matrix `A` that has element type `float` and `B` that has element type `double`
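Roughly illustrated (hypothetical CUDA code, not bandicoot's actual kernel source, which is generated with the element types substituted by the preprocessor): a one-way kernel needs one compiled variant per element type, while a two-way kernel needs one variant per ordered pair of element types.

```cpp
#include <cstddef>

// "One-way": a single element type, so one compiled variant per type
// (float, double, int, ...).
template<typename eT>
__global__ void fill_kernel(eT* mem, const eT val, const size_t n)
{
  const size_t i = size_t(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n)
    mem[i] = val;
}

// "Two-way": two element types, so one compiled variant per ordered pair
// (float -> double, int -> float, ...), as needed for conv_to<> or for a
// dot() between a float matrix and a double matrix.
template<typename eT1, typename eT2>
__global__ void convert_kernel(eT2* out, const eT1* in, const size_t n)
{
  const size_t i = size_t(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n)
    out[i] = eT2(in[i]);
}
```

With, say, 6 supported element types that is 6 one-way variants but 36 two-way variants per kernel, which is why the total kernel count matters.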
<GopiMTatiraju[m]> Ohh okay...
<GopiMTatiraju[m]> So will `join` fall under twoway?
<GopiMTatiraju[m]> We can join matrices of different types, right?
<rcurtin[m]> it would be possible to do that, but I would actually suggest implementing `join_cols()` and `join_rows()` as requiring the same types. I also think you can do it with cudaMemcpy() and related functions, instead of needing to write a custom kernel (at least for `join_cols()`, maybe `join_rows()` is more complicated)
<rcurtin[m]> the reason I suggest requiring the same types (at least for the kernels and low-level implementation) is that there is some overhead associated with every kernel we add, so ideally we want to keep the number of kernels low, or as low as we can (and even then there will be many, there is not much way around that)
<rcurtin[m]> so, the top-level `join_cols()` function can take different matrix types, but if they are different, you can use `conv_to<>` to convert the matrices to the right output type, and then join them
<rcurtin[m]> (it is true that my suggestion does not give the fastest possible code---but I'm trying to strike a balance between number of kernels and performance. tough to say if that is actually the right tradeoff in user applications... I am just making a somewhat educated guess)
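As a concrete sketch of the same-type `cudaMemcpy()` route (my illustration, not bandicoot code; assuming column-major storage as in Armadillo): in `join_cols(A, B)` each output column is a column of A followed by a column of B, which maps onto two strided `cudaMemcpy2D()` calls.

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Vertically concatenate two same-type, column-major device matrices
// A (rA x c) and B (rB x c) into a pre-allocated out ((rA + rB) x c).
// Each matrix column is one "row" of the 2D transfer; the pitch is the
// byte distance between the starts of consecutive columns.
template<typename eT>
cudaError_t join_cols_device(eT* out, const eT* A, const eT* B,
                             const std::size_t rA, const std::size_t rB,
                             const std::size_t c)
{
  const std::size_t rOut = rA + rB;

  // A fills the top rA rows of every output column.
  cudaError_t err = cudaMemcpy2D(out, rOut * sizeof(eT),
                                 A, rA * sizeof(eT),
                                 rA * sizeof(eT), c,
                                 cudaMemcpyDeviceToDevice);
  if (err != cudaSuccess) { return err; }

  // B fills the bottom rB rows of every output column.
  return cudaMemcpy2D(out + rA, rOut * sizeof(eT),
                      B, rB * sizeof(eT),
                      rB * sizeof(eT), c,
                      cudaMemcpyDeviceToDevice);
}
```

Note that with column-major storage `join_rows()` is the contiguous case: the output is simply A's elements followed by B's, so two flat `cudaMemcpy()` calls suffice there.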
<zoq[m]> I guess it depends on the use case; you could add another build step to remove all the unused kernels.
<rcurtin[m]> yeah. in fact, if we had a way to cache compiled kernels somewhere on the user's system, then the number of kernels doesn't really matter
<rcurtin[m]> but currently, every single time we start a program that includes bandicoot, it compiles all kernels at initialization. so, I think most of the test suite runtime is compiling those kernels at the start of runtime, and then the tests all run pretty quickly
Niketkumardheery has joined #mlpack
<Niketkumardheery> Hi everyone, I am Niket, a computer science engineer. I have a keen interest in data science and was looking to contribute to a data science community, and finally I found mlpack.
<zoq[m]> Yeah, but that is one reason why I'm more on the side of getting the kernels as fast as possible instead of keeping the number of kernels low, because I'm thinking about a long-running process.
<rcurtin[m]> hmm, in that case, maybe it is better to write `join_rows()` as a two-way kernel, and I suppose `join_cols()` too. In the case where the types are the same, `join_cols()` may be best done as a `cudaMemcpy()` though
<zoq[m]> But in case of `join_` I think it makes sense to be clever?
<Niketkumardheery> Now looking forward to working with such enthusiastic peeps.
<zoq[m]> Don't think we can get much performance out of it with an independent kernel?
<rcurtin[m]> I guess you are right, my intuition was that maybe `join_cols()` and `join_rows()` doesn't happen too much in inner loops, but on second thought I could be totally wrong
<zoq[m]> > <@niketkumardheeryan-62304a3d6da03739849255da:gitter.im> Hi everyone, I am Niket, a computer science engineer. I have a keen interest in data science and was looking to contribute to a data science community, and finally I found mlpack.
<zoq[m]> > Now looking forward to working with such enthusiastic peeps.
<zoq[m]> Hello, in case you haven't seen it already and are searching for a getting-started direction - https://www.mlpack.org/community.html should be helpful.
<zoq[m]> zoq[m]: Also, in case you are here for GSoC - https://www.mlpack.org/gsoc.html
<zoq[m]> rcurtin[m]: That is what I like about it, I just do both and do a quick benchmark.
<rcurtin[m]> I wonder if maybe we should open an issue for some kind of local compiled kernel caching support for Bandicoot
<rcurtin[m]> maybe we can just write out any compiled kernels to the user's home directory or something, and look for those
<rcurtin[m]> but there is some complexity: if we find and load kernels, we have to make sure they correspond to the device that is being used, and that they are compatible with, e.g., the version of OpenCL or CUDA being used
<rcurtin[m]> definitely not impossible but tedious
<rcurtin[m]> if it worked, it would mean that there's a big long bandicoot compilation process the first time the user ever runs a bandicoot program (and printing to stderr to tell them what is going on is probably a good idea), but then after that we never need to do it again unless they change their setup
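A hypothetical sketch of what such a cache key could look like (every name here is invented; this is not an existing bandicoot API): embed everything that can invalidate a compiled kernel into the cache path, so any mismatch is simply a cache miss followed by a fresh compile.

```cpp
#include <string>
#include <sstream>
#include <cstdlib>

// Build a per-configuration cache directory for compiled GPU kernels.  A real
// implementation would also sanitize the device name and handle XDG paths.
std::string kernel_cache_dir(const std::string& backend,      // "cuda" / "opencl"
                             const std::string& device_name,  // e.g. from cudaGetDeviceProperties()
                             const int cc_major,
                             const int cc_minor,              // compute capability
                             const long toolkit_version,      // e.g. CUDART_VERSION
                             const std::string& lib_version)  // bandicoot version
{
  const char* home = std::getenv("HOME");

  std::ostringstream path;
  path << (home ? home : ".") << "/.cache/bandicoot/"
       << backend << '-' << device_name
       << "-cc" << cc_major << '.' << cc_minor
       << "-toolkit" << toolkit_version
       << '-' << lib_version;
  return path.str();
}
```

On startup the library would look for compiled kernels under that directory and fall back to compiling (then writing them out) on a miss, matching the "compile once, reuse until the setup changes" behavior described above.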
<GopiMTatiraju[m]> I guess this might add a lot of overhead and future maintenance work as well....
<zoq[m]> rcurtin[m]: Right, pytorch is doing the same thing I think.
<rcurtin[m]> yeah, you are right Gopi M Tatiraju but I feel a bit forced into it... the alternative is that the user has to wait for like 10-30 seconds (or is it longer?) every single time they start a bandicoot program
<rcurtin[m]> we don't have the benefit of PyTorch's development team size, unfortunately 😄😄
<rcurtin[m]> which actually can be a good thing: the PyTorch core team is up in the hundreds now I believe (and Facebook is aggressively hiring more and more), which causes the complexity and maintainability of the system to explode since team members' priorities are generally to get promoted and produce code, not slowly write maintainable, small-footprint code
<rcurtin[m]> I see this at my company: there is a lot of pressure to produce code fast, which results in lower-quality code that can't be maintained, and then later refactoring costs spiral into huge, unwieldy efforts that are borderline impossible
<rcurtin[m]> (I guess the ANN refactoring is a little like this but it's nowhere near as bad as the refactorings I have to do or want to do at work)
<GopiMTatiraju[m]> Yea I agree...
<GopiMTatiraju[m]> Once the team size increases it's difficult to manage, and everyone wants to contribute more so that they can get promoted...
<GopiMTatiraju[m]> And the code suffers due to this
<Niketkumardheery> Agree
<GopiMTatiraju[m]> So the plan now is to add `join_` as a custom kernel?
<GopiMTatiraju[m]> Or should we first try to write out some compiled kernels and see how it goes?
<rcurtin[m]> it might be easiest just to start with an implementation of `join_cols()` using `cudaMemcpy()` (and the equivalent for OpenCL) for the same types, and then write a custom two-way kernel for `join_cols()` for two objects of different types
<rcurtin[m]> then as zoq pointed out it would be easy to do some quick benchmark comparisons
<GopiMTatiraju[m]> Okay, I will get on it then...
<GopiMTatiraju[m]> Using `cudaMemcpy` is fairly straightforward I guess...
<GopiMTatiraju[m]> Just have to copy the right elements and do some checks, I guess?
<rcurtin[m]> yeah, plus setting up the scaffolding such that `coot::join_cols()` calls into the right backend, etc.; you can take a look at the other functions like `dot()` to see how that works
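A rough sketch of that scaffolding (all names below are invented for illustration; bandicoot's real pattern can be seen in how `dot()` is wired up): the top-level function validates shapes and forwards to whichever backend is active.

```cpp
#include <cstddef>
#include <stdexcept>

enum class backend_t { cuda, opencl };

// Assume the runtime reports which backend is active.
backend_t active_backend();

// Per-backend implementations (e.g. the cudaMemcpy2D() version sketched
// earlier for CUDA, and a clEnqueueCopyBufferRect()-style version for OpenCL).
template<typename eT>
void cuda_join_cols(eT* out, const eT* A, const eT* B,
                    std::size_t rA, std::size_t rB, std::size_t c);
template<typename eT>
void opencl_join_cols(eT* out, const eT* A, const eT* B,
                      std::size_t rA, std::size_t rB, std::size_t c);

// Top-level entry point: check sizes, then dispatch to the active backend.
template<typename eT>
void join_cols(eT* out, const eT* A, const eT* B,
               std::size_t rA, std::size_t rB,
               std::size_t cA, std::size_t cB)
{
  if (cA != cB)
    throw std::invalid_argument("join_cols(): column counts must match");

  switch (active_backend())
  {
    case backend_t::cuda:   cuda_join_cols(out, A, B, rA, rB, cA);   break;
    case backend_t::opencl: opencl_join_cols(out, A, B, rA, rB, cA); break;
  }
}
```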
<GopiMTatiraju[m]> Yea, I will refer to the existing code for that...
<GopiMTatiraju[m]> We will have a meeting this Friday, right?
<GopiMTatiraju[m]> I hope I can finish this before then so that we can discuss more on it...
<TarekNasser[m]> Should I use xeus-cling and JupyterLab to run C++ notebooks?
<TarekNasser[m]> on Ubuntu
<GopiMTatiraju[m]> zoq: to compile the tests I first build `clBlas` and then `clBLAST`
<GopiMTatiraju[m]> I think this is just a linking issue?
<GopiMTatiraju[m]> Cause I have these files present in `/usr/local/cuda/lib64`
<GopiMTatiraju[m]> I tried this command `BACKEND=CUDA_BACKEND make -I/usr/local/cuda/lib64`
<zoq[m]> <TarekNasser[m]> "Should I use Xeus Cling and..." <- The easiest would be to use https://github.com/mlpack/examples/blob/master/scripts/jupyter-conda-setup.sh
<zoq[m]> zoq[m]: Which installs xeus-cling, Jupyter notebook, and the C++ kernel in a conda env.
<TarekNasser[m]> Ok, thank you, I got it to work.
<TarekNasser[m]> I plan to do a notebook to demonstrate how well several machine learning algorithms do on a simple dataset. Am I going in the right direction or should I do something else?
<zoq[m]> > <@tareknaser:matrix.org> Ok, thank you, I got it to work.
<zoq[m]> > I plan to do a notebook to demonstrate how well several machine learning algorithms do on a simple dataset. Am I going in the right direction or should I do something else?
<zoq[m]> Sounds good; keep in mind that not every mlpack method can do classification or regression.
<TarekNasser[m]> I will read the documentation for the functions I want to use and ask the community if I get stuck
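For anyone following along, a minimal sketch of the kind of cell such a notebook might contain (assuming mlpack 3.x; the synthetic data and threshold-based labels are made up here just to have something classifiable):

```cpp
#include <mlpack/methods/decision_tree/decision_tree.hpp>
#include <cstdio>

int main()
{
  // mlpack uses column-major Armadillo matrices: one column per data point.
  arma::mat X(4, 200, arma::fill::randu);

  // Synthetic two-class labels derived from the first feature.
  arma::Row<size_t> y(X.n_cols);
  for (size_t i = 0; i < X.n_cols; ++i)
    y[i] = (X(0, i) > 0.5) ? 1 : 0;

  // Train a decision tree and measure training-set accuracy.
  mlpack::tree::DecisionTree<> dt(X, y, 2 /* number of classes */);

  arma::Row<size_t> preds;
  dt.Classify(X, preds);

  const double acc = arma::accu(preds == y) / double(y.n_elem);
  std::printf("training accuracy: %.3f\n", acc);
}
```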
<shrit[m]> zoq rcurtin: This project is no longer valid, right? Improvisation and Implementation of ANN Modules
<shrit[m]> Frankly, after the refactoring we are doing, I do think we can remove this one.
<zoq[m]> shrit[m]: Depends, I think we still want to add new features and continue to refactor some layers, but more focused.
<zoq[m]> zoq[m]: But I guess we should rephrase it.
<shrit[m]> yeah of course, we need to refactor it