mattip has joined #hpy
GianlucaRizzo has joined #hpy
GianlucaRizzo has quit [Ping timeout: 272 seconds]
<fijal> antocuni: pinning so many objects would be *very* problematic
<fijal> I think we are running into very similar issues with various buffers etc.
<fijal> what you want instead is to have a free list, maybe
<fijal> ... but maybe that's a sweet spot for malloc anyway?
<antocuni> yes, we need to tweak our GC to work better in presence of more pinning
<antocuni> IIRC armin at some point convinced me that it was possible, but I don't remember his ideas
<antocuni> another easy (maybe temporary) solution would be to allocate W_HPyObjects directly outside the nursery, so that we no longer need pinning
<fijal> I don't think you can
marvin_ has quit [Remote host closed the connection]
marvin_ has joined #hpy
<fijal> why can't we use malloc for this? it seems you are really winning here by having allocate/release in a tight loop (hits the sweet spot of malloc)
<fijal> so you allocate the normal part in the nursery (with normal things) and the extra part outside, with malloc/free pair
<antocuni> that's how it works now
<fijal> ok, then why is it slow?
<fijal> do you do free() when calling Release or do you do free from __del__?
<antocuni> because of malloc(), IIRC
<fijal> if you do malloc/free/malloc/free/malloc/free it's *really* fast
<antocuni> we call free() inside the __del__, which is a light finalizer
<fijal> ok, but that's why it's slow
<fijal> you need to call free() from Release
<antocuni> what is Release?
<fijal> HPy_release?
<fijal> how do you get rid of a handle?
<antocuni> HPy_Close
<fijal> yes, that one
<antocuni> but this closes the handle, not the object
<antocuni> the lifetime of an object is determined by the GC
<fijal> ok, but that's where the problem lies
<antocuni> well yes, it's because we don't have reference counting
<fijal> you have a long set of malloc() before collect and then a long set of free() when you collect
<fijal> that's pretty bad, don't do that
<antocuni> which is the whole point of the exercise :)
<fijal> can you not free the extra space when doing HPy_Close?
<antocuni> I think there is a misunderstanding here
<fijal> yeah?
<antocuni> the extra space is the space which contains the data of the object
<fijal> but you can't use it without a handle, right?
<antocuni> you cannot free() it until you are sure that the object is no longer reachable
<fijal> I've done this exercise before - there is no way (that I know of) to allocate small chunks of non-movable memory that's freed by nursery collect that does not completely trash your caches
<antocuni> you can reach it even without handles; e.g. if the object is stored inside a list
<antocuni> the object is still alive even if there are no longer HPy handles for it
<fijal> and it has to be nonmovable, even if there are no handles?
<antocuni> there might be one in the future
<antocuni> so basically, you need to pin the memory only if there are handles around
<antocuni> and this is something which you can know precisely
<fijal> pinning is really not supposed to work like that
<fijal> if you have more than a couple pinned objects that survive for a long time, your GC is kinda unusable
<antocuni> but they won't survive for long
<antocuni> the typical cycle is "you don't have any handle -> you create a handle to call a C function -> you call the C func -> you close the handle and unpin the memory"
<fijal> ok, that's better
<fijal> you are running into the issue of not having a refcount on pinning though here
<fijal> no?
<antocuni> yes
<antocuni> we need to add support for that
<antocuni> I am aware that there is a lot of work to do on this side, and probably it's not even the first priority
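A minimal sketch of the cycle discussed above (create a handle, call the C function, close the handle), with a per-object pin count so that nested handles do not unpin too early. This is a toy model in plain Python, not PyPy or HPy code: ToyObject, ToyHandleManager and call_c_function are hypothetical names, and "pinning" is just a counter.

```python
# Toy model only: every name here is hypothetical, nothing is PyPy's real API.
class ToyObject:
    def __init__(self, payload):
        self.payload = payload
        self.pin_count = 0  # number of open handles currently pinning the object

    @property
    def movable(self):
        # A moving GC could relocate the object only while nothing pins it.
        return self.pin_count == 0


class ToyHandleManager:
    def __init__(self):
        self._handles = {}  # handle id -> object
        self._next_id = 1

    def new_handle(self, obj):
        h = self._next_id
        self._next_id += 1
        self._handles[h] = obj
        obj.pin_count += 1  # pin when a handle is created
        return h

    def close(self, h):
        obj = self._handles.pop(h)
        obj.pin_count -= 1  # unpin only when the last handle goes away


def call_c_function(mgr, obj, func):
    """Create a short-lived handle, call func with it, then close the handle."""
    h = mgr.new_handle(obj)
    try:
        return func(mgr, h)
    finally:
        mgr.close(h)
```

With a plain boolean "pinned" flag instead of a counter, closing a second, nested handle would unpin the object while the first handle is still open. That is the missing "refcount on pinning" mentioned above.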
<fijal> I mean... I'm not so stoked to have pinning refcounting
<fijal> but I guess I'm not going to stop anyone :)
<fijal> there might be some hack, I think
<fijal> like have a refcount saying "0 1 many" or something along those lines
<antocuni> ok but how do you go from many to 1?
<fijal> you don't, you never unpin it until it dies
<fijal> (or more likely you don't pin it in the first place, you allocate it somewhere else)
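The "0 1 many" idea above can be sketched as a saturating counter: it tracks zero and one exactly, and once it reaches "many" it never goes back down. This is an illustration in plain Python under that assumption; SaturatingCount and its method names are hypothetical.

```python
ZERO, ONE, MANY = "0", "1", "many"


class SaturatingCount:
    """Counter that is exact for 0 and 1 and sticky once it reaches 'many'."""

    def __init__(self):
        self.state = ZERO

    def incref(self):
        # 0 -> 1 -> many; 'many' stays 'many'.
        self.state = ONE if self.state == ZERO else MANY

    def decref(self):
        if self.state == ONE:
            self.state = ZERO
        elif self.state == ZERO:
            raise ValueError("decref on a count of zero")
        # MANY: the real count is unknown, so it can never drop back to 1 or 0.
```

The sticky "many" state is what "you never unpin it until it dies" amounts to: two bits of state are enough, at the cost of giving up on objects that ever had two references at once.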
<antocuni> when you arrive at "many" it's already too late, because there is already someone who is relying on this address
<fijal> is there a field you can change? or is the memory just directly behind
<antocuni> currently W_HPyObject has a field hpy_data which contains the malloc()ed memory. But the point is to "inline" hpy_data inside W_HPyObject
<antocuni> but also, the problem is on the C side
<fijal> yes?
<fijal> you can have some tricks for one-ref-only if you don't inline
<antocuni> as soon as you pass a handle to C, someone might call HPy_AsStruct and get a C pointer to this memory
<fijal> but that's only valid as long as the handle is, right?
<antocuni> so if the GC runs when this pointer is still around, we must ensure that the memory doesn't move
<antocuni> yes
<fijal> ok, I think you can do a trick like this (maybe)
<antocuni> the problem is when you call the GC indirectly from C
<fijal> you have the hpy_data that's inlined (with maybe some bit magic somewhere), where each dereference has to be a bit special, checking the mask or whatnot
<fijal> then when you get two handles at the same time, you make it a normal pointer and malloc memory in a normal way
<fijal> but as long as you don't, you pin the HPy_Object itself
<antocuni> what if I get the second handle when someone already has a C pointer to it?
<fijal> you can't have macros for accessing all of that in C?
<fijal> like, does HPy_AsStruct always return a C pointer and that's it? or can it be more special?
<antocuni> we discussed it. Using macros is really too awkward and ugly when writing C code, and moreover it's too easy to make mistakes
<antocuni> like, if you want to keep a nice C-like syntax, you need to write something like:
<antocuni> HPy_STRUCT(my_handle)->my_field
<fijal> yeah something like that
<fijal> isn't HPy not supposed to be used by hand anyway?
<antocuni> but then it's too tempting to store the result of HPy_STRUCT inside a local variable
<antocuni> HPy is supposed to be used also by hand, e.g. to write numpy
<fijal> either way, maybe another trick that you can use is something like shadows when taking an address
<fijal> you return the address into old generation, but that does not survive minor collection as a valid address
<fijal> you still are going to run into some issues, notably trashing your caches, possibly
<antocuni> but then it means that you need to use a macro every time you want to access a field
<fijal> yeah, I would write
<fijal> HPy_STRUCT_GET(my_handle, "field")
<fijal> HPy_STRUCT_SET(my_handle, "field", value)
<antocuni> yes, that's exactly what we wanted to avoid
<fijal> haha, ok
<fijal> well, I don't think there is a good strategy then
<antocuni> well, the good strategy is to improve PyPy's GC :)
<fijal> I don't think you can
<fijal> at least not in ways you described, you are running into the same issues
<fijal> what is a realistic chance of numpy using HPy?
<antocuni> very realistic, from what I understand
<antocuni> the numpy devs seem very positive towards hpy
<fijal> what about a C static checker that checks if you are not storing HPy_Struct somewhere?
<antocuni> uhm
<antocuni> maaaybe
<fijal> maybe there is a way to design a hackish macro that would crash if you don't follow it up with ->
<fijal> ?
<antocuni> I don't think such a macro exists, because to be able to handle the -> the expression must be of type e.g. PointObject*, and if it's of type PointObject* you can always store it in a local
<antocuni> if we really want to go along that route, maybe it's better to use the debug mode; like, in debug mode every call to HPy_Struct moves the memory somewhere else, so if you store it in a local var you get crashes
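A rough sketch of the debug-mode idea above: every call that hands out a struct pointer "moves" the data, so a pointer stored in a local variable goes stale as soon as the next one is requested. This is a toy model in plain Python, not HPy's actual debug mode; ToyDebugObject, ToyPointer and StalePointerError are hypothetical names, and a generation number stands in for the memory address.

```python
class StalePointerError(RuntimeError):
    """Raised where real C code would crash on a dangling pointer."""


class ToyPointer:
    def __init__(self, obj, generation):
        self._obj = obj
        self._generation = generation

    def _check(self):
        # The pointer is valid only if the data has not "moved" since it was taken.
        if self._generation != self._obj.generation:
            raise StalePointerError("struct pointer was kept across a move")

    def get(self, field):
        self._check()
        return self._obj.fields[field]

    def set(self, field, value):
        self._check()
        self._obj.fields[field] = value


class ToyDebugObject:
    def __init__(self, **fields):
        self.fields = dict(fields)
        self.generation = 0

    def as_struct(self):
        # Debug mode: pretend the memory is relocated on every call,
        # which invalidates every pointer handed out earlier.
        self.generation += 1
        return ToyPointer(self, self.generation)
```

In this model, `obj.as_struct().get("x")` always works, while a pointer saved before another `as_struct()` call raises on its next use.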
<antocuni> but all of this looks very obscure to me
<fijal> yes, I struggle to think of a non-obscure version
<antocuni> you can see that all the methods start by doing ArrayObject *self = ArrayObject_AsStruct
<antocuni> and the code still looks reasonable
<antocuni> if you start to add macros, the code quickly becomes unreadable, I fear
<fijal> right, C just sucks I'm afraid
<antocuni> yes
<antocuni> probably it would be possible to do some hack with C++
<antocuni> maybe another simpler alternative is to write our own malloc wrapper/replacement in such a way that it plays well with alloc/free patterns generated by HPy
<fijal> I'm not sure it would help much
<fijal> ok, how about the following
<fijal> you allocate one HPy_Handle, you malloc the data (and put it in hpy_data), you set the refcount to "1"
<fijal> if you HPy_Close that one, you free the data
<fijal> if you call another HPy_Handle, you set the refcount to "many" and then you free it in light finalizer (maybe you only register the finalizer now)
<fijal> if HPy_Close sees the refcount at "many" you don't do anything
<antocuni> I don't think it's possible to register the light finalizer at runtime, since it's an RPython-level __del__
<antocuni> but it might be possible to do that with an applevel finalizer
<fijal> I think there is a major win not registering the finalizer when not needed
<fijal> I don't see a fundamental reason why not?
<fijal> there might be a technical one
<antocuni> this might actually work, apart from the fact that you need to do a bit of additional work for every call of HPy_Dup and HPy_Close
<fijal> in C++ you can overload "operator->" so yes, if you can do C++ a lot of things are possible
<antocuni> I agree about the major win on not registering the finalizer
<fijal> antocuni: thank you for providing a cool puzzle :)
marvin_ has quit [Remote host closed the connection]
<antocuni> so basically, what you are proposing is: 1. kill W_HPyObject.__del__, and re-implement the functionality with self.register_finalizer
marvin has joined #hpy
<antocuni> 2. add a fast path which avoids the register_finalizer entirely if you happen to have only a single handle to the object
<antocuni> it might work :)
<fijal> yes, we need some hacks to have the 0-1-many refcount
<antocuni> we need to measure how often the fast path actually happens, though
<fijal> but most importantly 3. free() the data if you close the only handle
<fijal> that's the biggest win, I think
<fijal> (you can extend the refcount to X, it does not have to be only one bit)
<antocuni> the 0-1-many refcount can be placed directly on W_HPyObject now, there is no longer a need to play with pinning
<antocuni> ah no wait
<antocuni> what happens in this case:
<antocuni> 1. I create a W_HPyObject (with refcnt==1) and malloc hpy_data
<antocuni> 2. I store it inside a python list
<antocuni> 3. I close the handle
<fijal> is hpy_data valid still?
<antocuni> of course
<antocuni> the object is still alive because it's in a list
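The scenario just listed can be replayed against a toy model of the proposed scheme (free the data when the only handle is closed, fall back to a finalizer once there are "many"). This is only a sketch in plain Python: ToyHPyObject, open_handle and close_handle are hypothetical names, a bytearray stands in for the malloc()ed hpy_data, and a boolean stands in for registering a finalizer.

```python
ZERO, ONE, MANY = "0", "1", "many"


class ToyHPyObject:
    def __init__(self):
        self.hpy_data = bytearray(16)  # stand-in for malloc()ed memory
        self.handles = ZERO
        self.finalizer_registered = False

    def open_handle(self):
        if self.handles == ZERO:
            self.handles = ONE
        else:
            # Slow path: we lose track of the handles, so the data can only
            # be freed by a finalizer when the object itself dies.
            self.handles = MANY
            self.finalizer_registered = True

    def close_handle(self):
        if self.handles == ONE:
            # Fast path of the proposal: free() right away.
            self.hpy_data = None
            self.handles = ZERO
        # MANY: nothing to do here, the finalizer frees the data later.


# Replaying the three steps:
obj = ToyHPyObject()
obj.open_handle()   # 1. object created, one handle, data allocated
container = [obj]   # 2. stored inside a Python list
obj.close_handle()  # 3. the handle is closed

# The object is still reachable through the list, but its data is gone.
data_was_freed_too_early = container[0].hpy_data is None
```

In the model, the fast path frees memory that a reachable object still needs, because the data belongs to the object rather than to the handle.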
<fijal> then you up the refcnt to 2 or many if you store it in the list, of course
<fijal> wait
<fijal> "object still alive" != hpy_data valid, to me
<antocuni> this starts to look suspiciously similar to cpyext 😅
<antocuni> object still alive == hpy_data valid
<fijal> ok, so I don't need to create a new handle when I want to access it?
<antocuni> I think this is the point that you are missing
<antocuni> of course you do
<fijal> why does the new handle have to come with the same C pointer then?
<antocuni> if you want to pass this object to C, you need to create a new handle
<antocuni> handles are references to arbitrary python objects (either normal objects or HPy objects)
<fijal> can I do
<antocuni> HPy objects own a piece of C memory
<antocuni> so the memory is tied to the object, not to the handle
<fijal> so once HPy object gets created and a single handle created, then the C pointer to that struct has to be valid for the whole duration of the existence of this object?
<antocuni> yes
<fijal> ok, what is the exact difference with cpyext then?
<antocuni> no sorry
<antocuni> hpy_data can in theory move around
<antocuni> the invariant is that if you have a valid handle and you call HPy_AsStruct, the resulting pointer is valid until you close the handle
<fijal> yes
<fijal> so my scheme still works I think
<antocuni> so if the object doesn't have any handles to it, it can be moved around
<fijal> when is hpy_data populated?
<antocuni> when the object is created
<fijal> when you create a handle or when you create the W_HPyHandle?
<antocuni> you are still confusing HPy objects and handles
<fijal> you are giving me a bunch of contradictory data, it's hard to have a coherent picture :)
<antocuni> you should read the code, that one is not contradictory :)
<fijal> hah, one would hope, but also thanks
<antocuni> when a W_HPyObject is instantiated, it doesn't have any handle attached to it, it's a normal W_Root object
<fijal> what happens to hpy_data?
<antocuni> which mallocs() an hpy_data
<fijal> so it happens when W_HPyObject is allocated?
<antocuni> yes
<antocuni> then, we need to call the tp_init, which is written in C
<antocuni> so we create a handle, call tp_init, destroy the handle
<fijal> right
<fijal> no, I don't have a good solution then
<fijal> it seems like it should maybe be designed slightly differently?
<antocuni> how?
<fijal> I don't know! but I can try to think about it
<antocuni> ok :)
<fijal> you might actually have some hack that might work, or not
<fijal> like you allocate the contents of hpy_data inside the GC, you pin it if you have one handle and unpin it if you are done
<fijal> but if you have more than one (so you lose track of the amount of raw pointers), you unpin it, do a raw malloc, copy contents and then update hpy_data (and register the finalizer)
<fijal> so that basic scheme does not work because maybe the pointer already escaped, right?
<antocuni> yes, I think it's what we said before. But the problem is that when I realize that I have more than one, there might be already a reference to the memory which you would like to move
<antocuni> yes
<fijal> but something like this might work
<fijal> you always malloc memory *and* space behind the object
<fijal> if you close the one handle, you copy the contents behind the object (to the space we have) that's GC-managed
<fijal> and if no one ever needs it, it just dies
<fijal> but if you create a second handle, you copy it out *again* and have a finalizer
<fijal> that seems very clunky though, it's hard to believe there isn't a better strategy
<fijal> macros or C++ sound like a better idea at this stage (or a debug mode)
<antocuni> I don't understand this last proposal. What is the "space" you are talking about? What does it mean to malloc "behind" the object?
<fijal> so you have HPy_Object in nursery and enough space behind it for the contents of hpy_data, but hpy_data points out to some malloc() memory somewhere else
<fijal> when you HPy_Close your only handle, the malloc() memory is freed and its contents copied to GC managed memory
<antocuni> and when I create a new handle to it?
<fijal> you malloc() stuff again, move the memory there and forget about it (and register a finalizer)
<fijal> I really don't like it, there must be a better design
<antocuni> but this would happen all the time
<antocuni> think of a numpy array
<antocuni> we have a W_HPyObject, and hpy_data contains the numpy array
<antocuni> we create it from Python, so the object exists in Python land but we don't have any handle to it
<antocuni> every time we call a method, we create a handle, call the C function, close the handle
<antocuni> there is no such thing as "the one handle" for the object
<antocuni> handles are always temporary and short lived
<antocuni> that's also why I think that pinning should not be a problem; the pinning is very short lived
marvin has quit [Remote host closed the connection]
073AAEFOT has joined #hpy
GianlucaRizzo has joined #hpy
GianlucaRizzo has quit [Ping timeout: 250 seconds]
<cfbolz> fijal: what happens if we allocate all the W_HPyObject's as "young but non-movable" objects?
<cfbolz> they tend to not die extremely quickly, I assume
073AAEFOT has quit [Remote host closed the connection]
marvin_ has joined #hpy
marvin_ has quit [Remote host closed the connection]
marvin_ has joined #hpy
<antocuni> cfbolz: there are cases in which they actually die quickly. E.g. if you read items out of a numpy array in a for loop
<antocuni> because each item is a numpy scalar
<antocuni> and I think that this particular case probably hits a kind of sweet spot on CPython, thanks to the free lists
<cfbolz> antocuni: then I don't know either, except writing a new GC
<antocuni> well, the good part is that even if the current strategy is not optimal, it's still orders of magnitude better than cpyext
<cfbolz> antocuni: that's aiming relatively low though ;-)
<antocuni> 😂
<fijal> I think that's an incredibly tight set of constraints
<fijal> not truly much better than cpyext
squeaky_pl has joined #hpy
<squeaky_pl> HPy was just mentioned during PyCon US Steering Council Panel
<squeaky_pl> Somebody submitted a question that said something along the lines of "What happened to HPy"
<squeaky_pl> The answer from the council was that nothing really happened to it and it's still going on, but it will take a lot of time to port extensions to HPy.
squeaky_pl has quit [Ping timeout: 240 seconds]