<rcurtin[m]>
it'll be next Monday through Wednesday, a handful of short sessions... some of the talks look interesting; I hope to attend all four
<jonpsy[m]>
zoq: hey joining?
<jonpsy[m]>
say4n: hi, would you be joining?
<jonpsy[m]>
our GSoC meet, that is.
<shrit[m]>
I will try to join when I am free. Also, I would like to know their program so we can pick the sessions
<jonpsy[m]>
<rcurtin[m]> "hey everyone, if you are..." <- link seems to be broken.
<rcurtin[m]>
what, I can't share it? hang on
<rcurtin[m]>
hmm, well I need to wait for a response from NumFOCUS to see if they have a link that I can share
<rcurtin[m]>
I guess if not I'll just screenshot the post
texasmusicinstru has quit [Remote host closed the connection]
texasmusicinstru has joined #mlpack
<zoq[m]1>
<jonpsy[m]> "our gsoc meet that is." <- No, I thought you canceled the meeting, since you have course work to do?
<jonpsy[m]>
Oh ok. I thought we could gloss over it real quick. No problem, by mid-December coursework should be done.
<jonpsy[m]>
if you're free now we can hop on a quick call btw, up to you
<zoq[m]1>
unfortunately, I can't right now
<jonpsy[m]>
no worries, mid december then :)
<zoq[m]1>
so next week is canceled as well?
<jonpsy[m]>
so I have some end-of-term projects to submit which are quite hectic. But I guess I can make it; I'd be interested to hear about the journals we talked about before.
<zoq[m]1>
I haven't heard anything from James.
<jonpsy[m]>
same
<heisenbuugGopiMT>
Hey @jonpsy:matrix.org, are you planning to apply as a student for the coming GSoC?
<jonpsy[m]>
probably mentor, or a different org. doesn't make sense to me to apply to the same org as a student
<heisenbuugGopiMT>
Yea, agreed...
<heisenbuugGopiMT>
But I feel like maybe mentoring is nice, then we can also keep contributing as well.
<heisenbuugGopiMT>
This year it's open to all, not only students, so even I am thinking about what to do.
<rcurtin[m]>
jonpsy: it's ok to reapply as a student again; we've had a handful of students in the past who built on their first year project in a second (or subsequent) year :)
<rcurtin[m]>
hm, I guess the word isn't "student" anymore, but I forget what they said the preferred nomenclature was
<rcurtin[m]>
"contributor" maybe it was? I can't remember 😄
<heisenbuugGopiMT>
Yea, also we will have more contributors this year since we have two types of projects?
<rcurtin[m]>
maybe? we'll see! I could see it either way
<rcurtin[m]>
I think COVID had a big effect on applications last year (there were a lot fewer), but maybe this year we'll see higher numbers; not sure
<jonpsy[m]>
rcurtin[m]: although true, my perspective is that it equates to "stealing" the spot of someone who's probably newer than me. They would benefit more. Besides, two-time GSoC student at the same org doesn't look good on a resume, but maybe my mind will change
<heisenbuugGopiMT>
@ryan:ratml.org what should we work on if we are planning to apply as mentor?
texasmusicinstru has quit [Remote host closed the connection]
texasmusicinstru has joined #mlpack
<rcurtin[m]>
jonpsy: I can understand your perspective on that :) I'm not sure 2x GSoC looks bad on a resume though---of course I am not spending all day long looking at resumes, but two summers at the same organization demonstrates commitment to me. after all, it shows the mentors didn't have a bad experience, since they were willing to work with you again :)
<rcurtin[m]>
heisenbuug (Gopi M Tatiraju): mentoring is a lot of code review and understanding algorithms, etc., so maybe it's good to spend some time doing that? I dunno how helpful that advice is... at least when I am mentoring, I rarely get to write much code myself and instead spend many hours poring over PRs trying to figure out if they have any bugs or how to break them, as well as helping people past compiler issues, etc.
<heisenbuugGopiMT>
I also want to apply as a student again and this time I want to work on algorithm implementation...
<heisenbuugGopiMT>
I actually wanted to work on RL...
<rcurtin[m]>
ha, that probably means we need to get our ann-vtable branch done soon 😄
<rcurtin[m]>
I've been working on the convolutional layer recently and I am almost sure that it works equivalently to PyTorch's
<rcurtin[m]>
I also noticed... if I train our `mnist_cnn` example for one epoch on the CPU... it takes something like 70-75s on my machine. if I do the same with PyTorch on the CPU... it takes 250+ seconds
<rcurtin[m]>
really cool "accidental" benchmarking result :)
<heisenbuugGopiMT>
Wow, that's like 1/3rd of the time...
<rcurtin[m]>
yeah! I was surprised and really excited to see that. I don't know if mlpack is faster just for a forward pass (inference), but if it is, this is a really nice benchmarking result for deploying mlpack to low-resource devices
<rcurtin[m]>
but first I have to verify that everything is correct in the implementation :)
<heisenbuugGopiMT>
Yea, and if it works out, I think we need to put out a really good blog post to show off our speed...
<jonpsy[m]>
rcurtin[m]: what about GPU btw
<jonpsy[m]>
would be nice if we could beat them at that as well
<rcurtin[m]>
I don't think we have a prototype working with bandicoot yet
<jonpsy[m]>
Pytorch GPU**
<jonpsy[m]>
our cpu vs their gpu impl
<rcurtin[m]>
oh, ok, yeah, I should check that but I really don't think we're going to win that one :)
<jonpsy[m]>
probably not, but the margin would be interesting :)
<rcurtin[m]>
agreed---I'll see if I can get it working
<jonpsy[m]>
just to pat ourselves on the back
<jonpsy[m]>
how's the boost with bandicoot on normal functions, btw?
<rcurtin[m]>
I was having some trouble when I was playing with PyTorch making the model run correctly on the GPU, but I didn't try to debug very hard, I just said "ok let's just run it on the CPU then..."
<rcurtin[m]>
I did some simulations on the logistic regression objective function... let me find them
<rcurtin[m]>
you can see that to get speedup, the problem needs to be high-dimensional (for logistic regression that allows a lot more parallelization on the GPU)
<rcurtin[m]>
roughly 10x speedup over CPU when the data has 10k dimensions, with a batch size of 1024
<rcurtin[m]>
it also outperforms PyTorch and TF for the same problem :)
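What makes it possible to run the same objective on either backend is that mlpack/ensmallen write objective functions against a generic matrix type. A minimal sketch of a logistic regression objective in that style (not the actual mlpack implementation; it assumes bandicoot mirrors the Armadillo functions used here, i.e. `exp()`, `log()`, and `accu()`):

```cpp
#include <armadillo>
// #include <bandicoot>   // for the GPU backend, if available

// Negative log-likelihood of logistic regression over a batch.
//   X: d x n data matrix, y: 1 x n labels in {0, 1}, w: d x 1 parameters.
// MatType can be arma::mat (CPU) or, assuming API parity, coot::mat (GPU).
template<typename MatType>
typename MatType::elem_type LogisticLoss(const MatType& X,
                                         const MatType& y,
                                         const MatType& w)
{
  // Predicted probabilities p_i = sigmoid(w^T x_i), as a 1 x n row.
  const MatType p = 1.0 / (1.0 + exp(-(w.t() * X)));

  // Sum of -[y_i log p_i + (1 - y_i) log(1 - p_i)] over the batch.
  return -accu(y % log(p) + (1.0 - y) % log(1.0 - p));
}
```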
<heisenbuugGopiMT>
I am still working on writing that NN with ensmallen and armadillo, I hope I can get that done by the end of this week or maybe mid next week.
<jonpsy[m]>
rcurtin[m]: VERY interesting
<rcurtin[m]>
yeah! it kills me that I don't have more time to look into it. zoq has been working on it too
<jonpsy[m]>
so it looks like our impl shines on large batches but arma is nice for small
<rcurtin[m]>
yeah, and this makes sense---for small batch sizes and low dimensionality, the GPU can't adequately parallelize the problem
<jonpsy[m]>
the parallelization benefit is outweighed by the CPU => GPU memory transfer
<jonpsy[m]>
that's key
<rcurtin[m]>
those numbers actually didn't count any time to move data from CPU to GPU (for Bandicoot) because bandicoot "starts" with the data in the right place already
<rcurtin[m]>
note that this is a design advantage (in some cases) over PyTorch and TensorFlow, which will typically send batches of data to the GPU during training
<rcurtin[m]>
with Bandicoot, we hold it all there already. this is of course a problem if your dataset is large, and we'll have to solve those issues later on, but basically PyTorch and TF are doing a huge amount of memory copying from CPU to GPU and back, and you can see the cost of that reflected in their runtime
<jonpsy[m]>
yes, but for large-scale data, is there any better alternative?
<jonpsy[m]>
other than what pytorch & tf are implementing
<rcurtin[m]>
I would say that there is. the solution I just showed with bandicoot is one extreme (always hold all data on the GPU) and PyTorch/TF is the other extreme (send a data batch every time when making a forward pass)
<rcurtin[m]>
I think the best solution here is adaptive---you want to store as much as you can on the GPU, given its limited memory resources. So in the best case, you can have all your training data and the model there. In the next best case, you ship "most" of the training data there all at once, then grab minibatches out of that
<rcurtin[m]>
and in the worst case, where you have basically no free memory on the GPU, you send one batch at a time, just like PyTorch and TensorFlow do now
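A rough sketch of that adaptive idea (none of this is existing mlpack or bandicoot code; all names are hypothetical and the thresholds are arbitrary):

```cpp
#include <cstddef>

enum class Placement { AllOnGpu, MostOnGpu, BatchAtATime };

// Pick a data-placement strategy from the sizes involved.  The caller would
// obtain `gpuFreeBytes` from the GPU runtime (e.g. a device memory query).
Placement ChoosePlacement(const std::size_t dataBytes,
                          const std::size_t modelBytes,
                          const std::size_t gpuFreeBytes)
{
  // Best case: the model and the entire dataset fit on the GPU, so no
  // per-batch transfers are needed at all.
  if (modelBytes + dataBytes <= gpuFreeBytes)
    return Placement::AllOnGpu;

  // Next best: keep the model plus a large resident chunk of the data on the
  // GPU, and draw minibatches from that chunk, refreshing it occasionally.
  if (modelBytes + dataBytes / 4 <= gpuFreeBytes)
    return Placement::MostOnGpu;

  // Worst case: ship one minibatch at a time, as PyTorch/TF do by default.
  return Placement::BatchAtATime;
}
```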
<zoq[m]1>
pytorch allows you to pin your data to memory as well.
<jonpsy[m]>
rcurtin[m]: hm, but for a multi-node cluster this could get weird
<zoq[m]1>
GPU memory.
<rcurtin[m]>
jonpsy: you're right, the multi-GPU and multi-system cases get real weird real fast :)
<rcurtin[m]>
zoq: yeah, I tried to do that, but I am not sure I was fully successful and I thought it was still passing data back and forth. maybe I did it wrong? for TF though I could not find an "easy" way to do it
<rcurtin[m]>
also this stuff changes so fast... my knowledge of what I just said is about a year out of date, and that could be WAY wrong by now 😄
<zoq[m]1>
I guess another solution is to not worry about it at all and rely on shared-memory.
<jonpsy[m]>
zoq[m]1: curious about this
<jonpsy[m]>
aren't there any cool research papers? pretty sure some smart people already thought of a way around this
<zoq[m]1>
This is what AMD is pushing with the latest architecture.
<jonpsy[m]>
so they share RAM?
<zoq[m]1>
I think they call it RDNA.
<zoq[m]1>
Also, we have a similar problem with Armadillo: in almost all methods we just hope we can allocate enough memory, but I don't think we ever check this.
<zoq[m]1>
I guess the only method that does some checking is the streaming decision tree, but I'm not sure that's true.
<zoq[m]1>
But that also means that once you are on a resource-constrained device, you'll probably run into a bunch of strange issues.
<heisenbuugGopiMT>
So we need to implement a method which will keep track of the available memory all the time?
<zoq[m]1>
I'm not sure what happens if you request a matrix that doesn't fit into memory.
<zoq[m]1>
Does armadillo raise an exception?
<rcurtin[m]>
you'll get a `std::bad_alloc`
<rcurtin[m]>
but we don't have any handling for situations like this in mlpack---in part because many of the algorithms we implement simply don't have a clean way to handle if the data does not fit into memory
<rcurtin[m]>
for instance, how do you take the eigendecomposition of a matrix that doesn't fit in memory? it can be done! but the techniques are very non-trivial and not implemented by Armadillo directly
texasmusicinstru has quit [Remote host closed the connection]
<zoq[m]1>
I guess if you just fail cleanly, it's good enough for the user.
<rcurtin[m]>
yeah, maybe it might be nice to issue a "cleaner" error message, but getting a `std::bad_alloc` should give a user what they need to figure out that they ran out of RAM
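A small sketch of what that looks like in practice, assuming a 64-bit Armadillo build (with 32-bit indices the same request fails earlier with a different error):

```cpp
#include <armadillo>
#include <iostream>

int main()
{
  try
  {
    // Roughly 8 TB of doubles: the allocation cannot succeed, and
    // Armadillo reports the failure as std::bad_alloc.
    arma::mat huge(1000000, 1000000);
    huge.fill(0.0);  // never reached; the allocation above throws
  }
  catch (const std::bad_alloc& /* e */)
  {
    // The "cleaner" message discussed above: translate the low-level
    // allocation failure into something the user can act on.
    std::cerr << "error: a 1000000 x 1000000 matrix does not fit in RAM; "
              << "try a smaller dataset or a machine with more memory."
              << std::endl;
    return 1;
  }
  return 0;
}
```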
_slack_mlpack_25 has quit [Ping timeout: 265 seconds]
jonpsy[m] has quit [Ping timeout: 265 seconds]
abernauer[m] has quit [Ping timeout: 265 seconds]
FranchisNSaikia[ has quit [Ping timeout: 265 seconds]
ArunavShandeelya has quit [Ping timeout: 265 seconds]
_slack_mlpack_25 has joined #mlpack
SoumyadipSarkar[ has quit [Ping timeout: 265 seconds]
_slack_mlpack_U0 has quit [Ping timeout: 265 seconds]
Kaushalc64[m] has quit [Ping timeout: 265 seconds]
ShahAnwaarKhalid has quit [Ping timeout: 265 seconds]
ChaithanyaNaik[m has quit [Ping timeout: 265 seconds]
_slack_mlpack_U0 has joined #mlpack
texasmusicinstru has joined #mlpack
<zoq[m]1>
Agreed, my point is that it's something you usually don't worry about, but on a GPU or a resource-constrained device it becomes something you might run into.
<shrit[m]>
Just to clarify: so PyTorch and TensorFlow never store the dataset in GPU memory?
<zoq[m]1>
Not by default; I guess the most common case is that the dataset and the model don't fit in the memory.
<rcurtin[m]>
I am not sure about TF, but with PyTorch I think you can put the whole dataset on the GPU, but like zoq pointed out most use cases of the two frameworks seem to only send a batch at a time to the GPU
<shrit[m]>
So they even transfer the network weights?
<zoq[m]1>
shrit[m]: once
<shrit[m]>
because usually the network weights are much smaller than the dataset
<zoq[m]1>
We have to do the same thing; in the example rcurtin provided, the data was artificial and generated in memory.
<zoq[m]1>
rcurtin: correct me if I'm wrong.
<zoq[m]1>
shrit[m]: Correct, but depending on the model you can fill up the memory pretty fast as well.
<shrit[m]>
how much memory do GPUs have these days? I think most of them have more than 2 GB, right?
<shrit[m]>
especially those used for training
<shrit[m]>
I suppose part of it will be used for the graphical interface
<zoq[m]1>
Ohh, yeah, but I use an NLP model that fills up my 24 GB.
<shrit[m]>
is there a limit on GPU memory usage for training?
<shrit[m]>
24 GB a model?
<shrit[m]>
Gosh,
ArunavShandeelya has joined #mlpack
FranchisNSaikia[ has joined #mlpack
abernauer[m] has joined #mlpack
<shrit[m]>
I have no idea about NLP these days, but how can it be possible that the model is 24 GB?
<shrit[m]>
do you mean the dataset? I think you mean the model
<shrit[m]>
I'm wondering about the dataset size used to train a 24 GB model
jonpsy[m] has joined #mlpack
_slack_mlpack_27 has joined #mlpack
SoumyadipSarkar[ has joined #mlpack
ChaithanyaNaik[m has joined #mlpack
Kaushalc64[m] has joined #mlpack
ShahAnwaarKhalid has joined #mlpack
<zoq[m]1>
The largest GPT-3 model has 175.0B parameters.
<zoq[m]1>
So you can do the math.
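Working that math out as a rough back-of-the-envelope figure, for the weights alone: 175 × 10⁹ parameters × 4 bytes each (fp32) ≈ 700 GB, or ≈ 350 GB in fp16, before counting activations, gradients, or optimizer state. So even models a couple of orders of magnitude smaller can fill a 24 GB card during training.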
<zoq[m]1>
I'm not using such a large model.
texasmusicinstru has quit [Remote host closed the connection]
texasmusicinstru has joined #mlpack
<heisenbuugGopiMT>
While installing Arch on the Pi, I tried this command `bsdtar -xpf ArchLinuxARM-rpi-latest.tar.gz -C root`
<heisenbuugGopiMT>
but I'm getting this error: `bsdtar: Error exit delayed from previous errors.`
<heisenbuugGopiMT>
`sudo: bsdtar: command not found`
<heisenbuugGopiMT>
trying with sudo is not working
<heisenbuugGopiMT>
any idea?
<rcurtin[m]>
Probably `bsdtar` is not on the `$PATH` for root?
<heisenbuugGopiMT>
how can I set that?
<heisenbuugGopiMT>
I opened `/etc/environment`
<rcurtin[m]>
I don't know enough about your setup and system to effectively advise you here... probably best might be to see if you can find any information specific to your distribution on stack overflow or similar?
<heisenbuugGopiMT>
yea, will do that...
<heisenbuugGopiMT>
can't I use tar instead of bsdtar?
<rcurtin[m]>
I would imagine so? give it a shot and see what happens :)