rcurtin_irc changed the topic of #mlpack to: mlpack: a scalable machine learning library (https://www.mlpack.org/) -- channel logs: https://libera.irclog.whitequark.org/mlpack -- NOTE: messages sent here might not be seen by bridged users on matrix, gitter, or slack
<rcurtin[m]> hey everyone, if you are interested in attending the NumFOCUS summit, it's open to "all project maintainers" (e.g. all of us!)---more info here: https://groups.google.com/a/numfocus.org/g/projects/c/jahXmxMLKO8/m/p5R6_4chBgAJ
<rcurtin[m]> it'll be next Monday through Wednesday, a handful of short sessions... some of the talks look interesting; I hope to attend all four
<jonpsy[m]> zoq: hey, joining?
<jonpsy[m]> say4n: hi, would you be joining?
<jonpsy[m]> our GSoC meet, that is.
<shrit[m]> I will try to join when I am free. Also, I would like to know their program so we can pick the sessions.
<jonpsy[m]> <rcurtin[m]> "hey everyone, if you are..." <- link seems to be broken.
<rcurtin[m]> what, I can't share it? hang on
<rcurtin[m]> hmm, well I need to wait for a response from NumFOCUS to see if they have a link that I can share
<rcurtin[m]> I guess if not I'll just screenshot the post
<zoq[m]1> <jonpsy[m]> "our gsoc meet that is." <- No, I thought you canceled the meeting, since you have course work to do?
<jonpsy[m]> Oh ok. I thought we could gloss over it real quick. No problem, by mid-December the coursework should be done.
<jonpsy[m]> if you're free now we can hop on a quick call btw, up to you
<zoq[m]1> unfortunately, I can't right now
<jonpsy[m]> no worries, mid-December then :)
<zoq[m]1> so next week is canceled as well?
<jonpsy[m]> so I have some final projects to submit which are quite hectic. But I guess I can make it; I'd be interested in the journals we talked about before.
<zoq[m]1> I haven't heard anything from James.
<jonpsy[m]> same
<heisenbuugGopiMT> Hey @jonpsy:matrix.org are you planning to apply as a student for the coming GSoC?
<jonpsy[m]> probably mentor, or a different org. doesn't make sense to me to apply to the same org as a student
<heisenbuugGopiMT> Yea, agreed...
<heisenbuugGopiMT> But I feel like maybe mentoring is nice; then we can keep contributing as well.
<heisenbuugGopiMT> This year it's open to all, not only students, so even I am thinking about what to do.
<rcurtin[m]> jonpsy: it's ok to reapply as a student again; we've had a handful of students in the past who built on their first year project in a second (or subsequent) year :)
<rcurtin[m]> hm, I guess the word isn't "student" anymore, but I forget what they said the preferred nomenclature was
<rcurtin[m]> "contributor" maybe it was? I can't remember 😄
<heisenbuugGopiMT> Yea, also we will have more contributors this year since we have two types of projects?
<rcurtin[m]> maybe? we'll see! I could see it either way
<rcurtin[m]> I think COVID had a big effect on applications last year (there were a lot fewer), but maybe this year we'll see higher numbers; not sure
<jonpsy[m]> rcurtin[m]: although true, my perspective is that it equates to "stealing" the spot of someone who's probably newer than me. They would benefit more. Besides, a two-time GSoC student at an org doesn't look good on a resume, but maybe my mind will change
<heisenbuugGopiMT> @ryan:ratml.org what should we work on if we are planning to apply as mentor?
<rcurtin[m]> jonpsy: I can understand your perspective on that :) I'm not sure 2x GSoC looks bad on a resume though---of course I am not spending all day long looking at resumes, but two summers at the same organization demonstrates commitment to me. after all, it shows the mentors didn't have a bad experience, since they were willing to work with you again :)
<rcurtin[m]> heisenbuug (Gopi M Tatiraju): mentoring is a lot of code review and understanding algorithms, etc., so maybe it's good to spend some time doing that? I dunno how helpful that advice is... at least when I am mentoring, I rarely get to write much code myself and instead spend many hours poring over PRs trying to figure out if they have any bugs or how to break them, as well as helping people past compiler issues, etc.
<heisenbuugGopiMT> I also want to apply as a student again and this time I want to work on algorithm implementation...
<heisenbuugGopiMT> I actually wanted to work on RL...
<rcurtin[m]> ha, that probably means we need to get our ann-vtable branch done soon 😄
<rcurtin[m]> I've been working on the convolutional layer recently and I am almost sure that it works equivalently to PyTorch's
<rcurtin[m]> I also noticed... if I train our `mnist_cnn` example for one epoch on the CPU... it takes something like 70-75s on my machine. if I do the same with PyTorch on the CPU... it takes 250+ seconds
<rcurtin[m]> really cool "accidental" benchmarking result :)
<heisenbuugGopiMT> Wow, that's like 1/3rd of the time...
<rcurtin[m]> yeah! I was surprised and really excited to see that. I don't know if mlpack is faster just for a forward pass (inference), but if it is, this is a really nice benchmarking result for deploying mlpack to low-resource devices
<rcurtin[m]> but first I have to verify that everything is correct in the implementation :)
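A minimal sketch of how a wall-clock measurement like the one above could be taken on the mlpack side, assuming a hypothetical `TrainOneEpoch()` stand-in for the training loop in the `mnist_cnn` example (the real loop lives in the mlpack examples repository):

```cpp
#include <armadillo>
#include <iostream>

// Hypothetical placeholder for one epoch of mnist_cnn training; in the real
// comparison this would be the call into the model's Train() method.
void TrainOneEpoch()
{
  // ... e.g. model.Train(trainX, trainY, optimizer); ...
}

int main()
{
  arma::wall_clock timer;  // simple wall-clock timer bundled with Armadillo
  timer.tic();
  TrainOneEpoch();
  std::cout << "one epoch took " << timer.toc() << " seconds" << std::endl;
  return 0;
}
```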
<heisenbuugGopiMT> Yea, and if it works out, I think we need to put out a really good blog post to show off our speed...
<jonpsy[m]> rcurtin[m]: what about GPU btw
<jonpsy[m]> would be nice if we could beat them at that as well
<rcurtin[m]> I don't think we have a prototype working with bandicoot yet
<jonpsy[m]> PyTorch GPU**
<jonpsy[m]> our CPU vs their GPU impl
<rcurtin[m]> oh, ok, yeah, I should check that but I really don't think we're going to win that one :)
<jonpsy[m]> probably not, but the margin would be interesting :)
<rcurtin[m]> agreed---I'll see if I can get it working
<jonpsy[m]> just to pat ourselves on the back
<jonpsy[m]> how's the boost with bandicoot on normal functions btw?
<rcurtin[m]> I was having some trouble when I was playing with PyTorch making the model run correctly on the GPU, but I didn't try to debug very hard, I just said "ok let's just run it on the CPU then..."
<rcurtin[m]> I did some simulations on the logistic regression objective function... let me find them
<rcurtin[m]> you can see that to get speedup, the problem needs to be high-dimensional (for logistic regression that allows a lot more parallelization on the GPU)
<rcurtin[m]> roughly 10x speedup over CPU when the data has 10k dimensions, with a batch size of 1024
<rcurtin[m]> it also outperforms PyTorch and TF for the same problem :)
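For concreteness, a minimal CPU sketch (with Armadillo) of the logistic regression negative log-likelihood being benchmarked here; the function name and signature are illustrative, not mlpack API, and a bandicoot variant would in principle swap the `arma` types for `coot` ones so the data stays on the GPU:

```cpp
#include <armadillo>

// Negative log-likelihood of logistic regression over a batch.
// X is d x n (one point per column), y holds labels in {0, 1}, theta is d x 1.
double LogisticRegressionLoss(const arma::mat& X,
                              const arma::rowvec& y,
                              const arma::vec& theta)
{
  const arma::rowvec z = theta.t() * X;                        // 1 x n scores
  const arma::rowvec p = arma::exp(z) / (1.0 + arma::exp(z));  // sigmoid(z)
  return -arma::accu(y % arma::log(p) + (1.0 - y) % arma::log(1.0 - p));
}
```

With 10k dimensions and a batch size of 1024, the cost is dominated by the `theta.t() * X` product, which is exactly the kind of work a GPU parallelizes well.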
<heisenbuugGopiMT> I am still working on writing that NN with ensmallen and armadillo; I hope I can get that done by the end of this week or maybe the middle of next week.
<jonpsy[m]> rcurtin[m]: VERY interesting
<rcurtin[m]> yeah! it kills me that I don't have more time to look into it. zoq has been working on it too
<jonpsy[m]> so it looks like our impl shines on large batches but arma is nice for small
<rcurtin[m]> yeah, and this makes sense---for small batch sizes and low dimensionality, the GPU can't adequately parallelize the problem
<jonpsy[m]> the parallelization benefit is outweighed by the CPU => GPU memory transfer
<jonpsy[m]> that's key
<rcurtin[m]> those numbers actually didn't count any time to move data from CPU to GPU (for Bandicoot) because bandicoot "starts" with the data in the right place already
<rcurtin[m]> note that this is a design advantage (in some cases) over PyTorch and TensorFlow, which will typically send batches of data to the GPU during training
<rcurtin[m]> with Bandicoot, we hold it all there already. this is of course a problem if your dataset is large, and we'll have to solve those issues later on, but basically PyTorch and TF are doing a huge amount of memory copying from CPU to GPU and back, and you can see the cost of that reflected in their runtime
<jonpsy[m]> yes, but for large-scale data, is there any better alternative?
<jonpsy[m]> other than what pytorch & tf are implementing
<rcurtin[m]> I would say that there is. the solution I just showed with bandicoot is one extreme (always hold all data on the GPU) and PyTorch/TF is the other extreme (send a data batch every time when making a forward pass)
<rcurtin[m]> I think the best solution here is adaptive---you want to store as much as you can on the GPU, given its limited memory resources. So in the best case, you can have all your training data and the model there. In the next best case, you ship "most" of the training data there all at once, then grab minibatches out of that
<rcurtin[m]> and in the worst case, where you have basically no free memory on the GPU, you send one batch at a time, just like PyTorch and TensorFlow do now
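A rough sketch of that adaptive placement idea; `PlacementPlan` and `PlanPlacement()` are made-up illustrative names, not mlpack or bandicoot API:

```cpp
#include <cstddef>

// Decide how much of a column-major dataset can stay resident on the GPU.
struct PlacementPlan
{
  std::size_t residentCols;  // columns held permanently in GPU memory
  bool streamRemainder;      // whether the rest must be streamed per batch
};

PlacementPlan PlanPlacement(const std::size_t totalCols,
                            const std::size_t bytesPerCol,   // assumed > 0
                            const std::size_t freeGpuBytes)
{
  PlacementPlan plan;
  // Best case: the whole dataset (model and workspace not counted here) fits.
  const std::size_t maxResident = freeGpuBytes / bytesPerCol;
  plan.residentCols = (maxResident >= totalCols) ? totalCols : maxResident;
  // Worst case (residentCols == 0): fall back to shipping one mini-batch at a
  // time, which is roughly what PyTorch/TF do by default.
  plan.streamRemainder = (plan.residentCols < totalCols);
  return plan;
}
```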
<zoq[m]1> pytorch allows you to pin your data to memory as well.
<jonpsy[m]> rcurtin[m]: hm, but for a multi-cluster computer this could get weird
<zoq[m]1> GPU memory.
<rcurtin[m]> jonpsy: you're right, the multi-GPU and multi-system cases get real weird real fast :)
<rcurtin[m]> zoq: yeah, I tried to do that, but I am not sure I was fully successful and I thought it was still passing data back and forth. maybe I did it wrong? for TF though I could not find an "easy" way to do it
<rcurtin[m]> also this stuff changes so fast... my knowledge of what I just said is about a year out of date, and that could be WAY wrong by now 😄
<zoq[m]1> I guess another solution is to not worry about it at all and rely on shared-memory.
<jonpsy[m]> zoq[m]1: curious about this
<jonpsy[m]> aren't there any cool research papers? pretty sure some smart people already thought of a way around this
<zoq[m]1> This is what AMD is pushing with the latest architecture.
<jonpsy[m]> so they share RAM?
<zoq[m]1> I think they call it RDNA.
<zoq[m]1> Also, we have a similar problem with armadillo: in almost all methods we just hope we can allocate enough memory, but I don't think we ever check this.
<zoq[m]1> I guess the only method that does some check is the streaming decision tree; not sure that is true.
<zoq[m]1> But that also means that once you are on a resource-constrained device, you will probably run into a bunch of strange issues.
<heisenbuugGopiMT> So we need to implement a method which will keep track of the available memory all the time?
<zoq[m]1> I'm not sure what happens if you request a matrix that doesn't fit into memory.
<zoq[m]1> Does armadillo raise an exception?
<rcurtin[m]> you'll get a `std::bad_alloc`
<rcurtin[m]> but we don't have any handling for situations like this in mlpack---in part because many of the algorithms we implement simply don't have a clean way to handle if the data does not fit into memory
<rcurtin[m]> for instance, how do you take the eigendecomposition of a matrix that doesn't fit in memory? it can be done! but the techniques are very non-trivial and not implemented by Armadillo directly
<zoq[m]1> I guess if you just cleanly fail it's good enough for the user.
<rcurtin[m]> yeah, maybe it might be nice to issue a "cleaner" error message, but getting a `std::bad_alloc` should give a user what they need to figure out that they ran out of RAM
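A minimal sketch of what that failure mode looks like with Armadillo: request a matrix far larger than available memory and catch the resulting `std::bad_alloc`. This assumes a 64-bit build; the exact point of failure can vary with the OS's overcommit policy.

```cpp
#include <armadillo>
#include <iostream>

// Try to allocate and zero a rows x cols matrix of doubles; report failure
// instead of letting std::bad_alloc propagate.
bool TryAllocate(const arma::uword rows, const arma::uword cols)
{
  try
  {
    arma::mat m(rows, cols, arma::fill::zeros);
    return true;
  }
  catch (const std::bad_alloc&)
  {
    std::cerr << "could not allocate a " << rows << " x " << cols
              << " matrix: out of memory" << std::endl;
    return false;
  }
}

int main()
{
  TryAllocate(1000, 1000);        // should succeed on any modern machine
  TryAllocate(2000000, 2000000);  // ~32 TB of doubles; expected to fail
  return 0;
}
```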
<zoq[m]1> Agreed, my point is that it's something you usually don't worry about, but on a GPU or a resource-constrained device this becomes something that you might run into.
<shrit[m]> Just to clarify, so PyTorch and TensorFlow never store the dataset in GPU memory?
<zoq[m]1> Not by default; I guess the most common case is that the dataset and the model don't fit in memory.
<rcurtin[m]> I am not sure about TF, but with PyTorch I think you can put the whole dataset on the GPU, but like zoq pointed out most use cases of the two frameworks seem to only send a batch at a time to the GPU
<shrit[m]> So they even transfer the network weights?
<zoq[m]1> shrit[m]: once
<shrit[m]> because usually the network weights are much smaller than the dataset
<zoq[m]1> We have to do the same thing; in the example rcurtin provided, the data was artificial and generated in memory.
<zoq[m]1> rcurtin: correct me if I'm wrong.
<zoq[m]1> shrit[m]: Correct, but depending on the model you can fill up the memory pretty fast as well.
<shrit[m]> how much memory do GPUs have these days? I think most of them have more than 2 GB, right?
<shrit[m]> especially those used for the training
<shrit[m]> I suppose part of it will be used for the graphical interface
<zoq[m]1> Ohh, yeah, but I use an NLP model that fills up my 24 GB.
<shrit[m]> is there a limit on how much GPU memory training can occupy?
<shrit[m]> 24 GB a model?
<shrit[m]> Gosh,
<shrit[m]> I have no idea about NLP these days, but how can the model possibly be 24 GB?
<shrit[m]> do you mean the dataset? I think you mean the model
<shrit[m]> I'm wondering about the dataset size used to train a 24 GB model
<zoq[m]1> The largest GPT-3 model has 175.0B parameters.
<zoq[m]1> So you can do the math.
<zoq[m]1> I'm not using such a large model.
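To make "do the math" concrete, a back-of-the-envelope calculation of the weight storage alone, assuming 4 bytes per parameter for fp32 and 2 bytes for fp16 (optimizer state and activations would add more on top):

```cpp
#include <cstdio>

int main()
{
  const double params = 175e9;  // parameter count of the largest GPT-3 model
  std::printf("fp32 weights: ~%.0f GB\n", params * 4.0 / 1e9);  // ~700 GB
  std::printf("fp16 weights: ~%.0f GB\n", params * 2.0 / 1e9);  // ~350 GB
  return 0;
}
```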
<heisenbuugGopiMT> While installing Arch on the Pi, I tried this command: `bsdtar -xpf ArchLinuxARM-rpi-latest.tar.gz -C root`
<heisenbuugGopiMT> but I'm getting this error: `bsdtar: Error exit delayed from previous errors.`
<heisenbuugGopiMT> I'm following this guide: https://archlinuxarm.org/platforms/armv6/raspberry-pi
<rcurtin[m]> what was the first error? that message means there is at least one error up higher
<heisenbuugGopiMT> `Can't unlink already-existing object`
<heisenbuugGopiMT> should I delete everything in the root folder and try again?
<rcurtin[m]> probably? to me that error suggests that maybe a file already exists
<heisenbuugGopiMT> now it says `Failed to create dir 'var'`
<heisenbuugGopiMT> `sudo bsdtar -xpf ArchLinuxARM-rpi-latest.tar.gz -C root`
<heisenbuugGopiMT> `sudo: bsdtar: command not found`
<heisenbuugGopiMT> trying with sudo is not working
<heisenbuugGopiMT> any idea?
<rcurtin[m]> Probably `bsdtar` is not on the `$PATH` for root?
<heisenbuugGopiMT> how can I set that?
<heisenbuugGopiMT> I opened `/etc/environment`
<rcurtin[m]> I don't know enough about your setup and system to effectively advise you here... probably best might be to see if you can find any information specific to your distribution on stack overflow or similar?
<heisenbuugGopiMT> yea, will do that...
<heisenbuugGopiMT> can't I use tar instead of bsdtar?
<rcurtin[m]> I would imagine so? give it a shot and see what happens :)
<heisenbuugGopiMT> `tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.security.capability'`
<heisenbuugGopiMT> This is just a warning
<heisenbuugGopiMT> I will insert the SD card in the Pi and check if it worked.
<heisenbuugGopiMT> thank you for the help
<rcurtin[m]> my help is only guesses based on the output, so it may be more helpful to investigate on your own than depend on my wild guesswork :)
<heisenbuugGopiMT> Yup, if this doesn't work then I guess I have to start over from the beginning
<rcurtin[m]> trial and error is the best way to learn 😄
<heisenbuugGopiMT> Agreed, trying to ssh now, let's see what happens