GianlucaRizzo has quit [Ping timeout: 272 seconds]
<fijal>
antocuni: pinning so many objects would be *very* problematic
<fijal>
I think we are running into very similar issues with various buffers etc.
<fijal>
what you want instead is to have a free list, maybe
<fijal>
... but maybe that's a sweet spot for malloc anyway?
<antocuni>
yes, we need to tweak our GC to work better in presence of more pinning
<antocuni>
IIRC armin at some point convinced me that it was possible, but I don't remember his ideas
<antocuni>
another easy (maybe temporary) solution would be to allocate W_HPyObjects directly outside the nursery, so that we no longer need pinning
<fijal>
I don't think you can
marvin_ has quit [Remote host closed the connection]
marvin_ has joined #hpy
<fijal>
why can't we use malloc for this? it seems you are really winning here by having allocate/release in a tight loop (hits the sweet spot of malloc)
<fijal>
so you allocate the normal part in the nursery (with normal things) and the extra part outside, with malloc/free pair
<antocuni>
that's how it works now
<fijal>
ok, then why is it slow?
<fijal>
do you do free() when calling Release or do you do free from __del__?
<antocuni>
because of malloc(), IIRC
<fijal>
if you do malloc/free/malloc/free/malloc/free it's *really* fast
<antocuni>
we call free() inside the __del__, which is a light finalizer
<fijal>
ok, but that's why it's slow
<fijal>
you need to call free() from Release
<antocuni>
what is Release?
<fijal>
HPy_release?
<fijal>
how do you get rid of a handle?
<antocuni>
HPy_Close
<fijal>
yes, that one
<antocuni>
but this closes the handle, not the object
<antocuni>
the lifetime of an object is determined by the GC
<fijal>
ok, but that's where the problem lies
<antocuni>
well yes, it's because we don't have reference counting
<fijal>
you have a long set of malloc() before collect and then a long set of free() when you collect
<fijal>
that's pretty bad, don't do that
<antocuni>
which is the whole point of the exercise :)
<fijal>
can you not free the extra space when doing HPy_Close?
<antocuni>
I think there is a misunderstanding here
<fijal>
yeah?
<antocuni>
the extra space is the space which contains the data of the object
<fijal>
but you can't use it without a handle, right?
<antocuni>
you cannot free() it until you are sure that the object is no longer reachable
<fijal>
I've done this exercise before - there is no way (that I know of) to allocate small chunks of non-movable memory that's freed by nursery collect that does not completely thrash your caches
<antocuni>
you can reach it even without handles; e.g. if the object is stored inside a list
<antocuni>
the object is still alive even if there are no longer HPy handles for it
<fijal>
and it has to be nonmovable, even if there are no handles?
<antocuni>
there might be one in the future
<antocuni>
so basically, you need to pin the memory only if there are handles around
<antocuni>
and this is something which you can know precisely
<fijal>
pinning is really not supposed to work like that
<fijal>
if you have more than a couple pinned objects that survive for a long time, your GC is kinda unusable
<antocuni>
but they won't survive for long
<antocuni>
the typical cycle is "you don't have any handle -> you create a handle to call a C function -> you call the C func -> you close the handle and unpin the memory"
<fijal>
ok, that's better
<fijal>
you are running into the issue of not having a refcount on pinning though here
<fijal>
no?
<antocuni>
yes
<antocuni>
we need to add support for that
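The short-lived cycle described above (create a handle, call into C, close the handle) and the counted pinning that fijal asks about can be modelled in a few lines. This is only a toy sketch: `PinnedObject`, `open_handle`, `close_handle` and `minor_collect` are hypothetical names invented for illustration, the addresses are made up, and nothing here is PyPy's actual GC API. The log itself says counted pinning is not supported yet.

```python
class PinnedObject:
    """Toy model: an object that may move only while no handle is open."""

    def __init__(self):
        self.address = 0x1000   # made-up address, for illustration only
        self.pin_count = 0      # hypothetical pin refcount (not in PyPy today)

    def open_handle(self):
        # Opening a handle pins the object and exposes its current address,
        # as a C caller would see it after HPy_AsStruct.
        self.pin_count += 1
        return self.address

    def close_handle(self):
        # Closing the last handle makes the object movable again.
        assert self.pin_count > 0, "no open handle to close"
        self.pin_count -= 1

    def minor_collect(self):
        # A moving collector relocates the object only when it is unpinned.
        if self.pin_count == 0:
            self.address += 0x100
        return self.address


obj = PinnedObject()
ptr = obj.open_handle()            # handle created to call a C function
still_there = obj.minor_collect()  # GC runs while the C pointer is in use
obj.close_handle()                 # handle closed, memory unpinned
moved_to = obj.minor_collect()     # now the object is free to move
```

With a plain boolean instead of a counter, closing the first of two handles would unpin the object while the second handle still relies on the address, which is the issue raised in the exchange above.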
<antocuni>
I am aware that there is a lot of work to do on this side, and probably it's not even the first priority
<fijal>
I mean... I'm not so stoked to have pinning refcounting
<fijal>
but I guess I'm not going to stop anyone :)
<fijal>
there might be some hack, I think
<fijal>
like have a refcount saying "0 1 many" or something along those lines
<antocuni>
ok but how do you go from many to 1?
<fijal>
you don't, you never unpin it until it dies
<fijal>
(or more likely you don't pin it in the first place, you allocate it somewhere else)
<antocuni>
when you arrive at "many" it's already too late, because there is already someone who is relying on this address
<fijal>
is there a field you can change? or is the memory just directly behind
<antocuni>
currently W_HPyObject has a field hpy_data which contains the malloc()ed memory. But the point is to "inline" hpy_data inside W_HPyObject
<antocuni>
but also, the problem is on the C side
<fijal>
yes?
<fijal>
you can have some tricks for one-ref-only if you don't inline
<antocuni>
as soon as you pass a handle to C, someone might call HPy_AsStruct and get a C pointer to this memory
<fijal>
but that's only valid as long as the handle is, right?
<antocuni>
so if the GC runs when this pointer is still around, we must ensure that the memory doesn't move
<antocuni>
yes
<fijal>
ok, I think you can do a trick like this (maybe)
<antocuni>
the problem is when you call the GC indirectly from C
<fijal>
you have the hpy_data that's inlined (with maybe some bit magic somewhere), where each dereference has to be a bit special, checking the mask or whatnot
<fijal>
then when you get two handles at the same time, you make it a normal pointer and malloc memory in a normal way
<fijal>
but as long as you don't, you pin the HPy_Object itself
<antocuni>
what if I get the second handle when someone already has a C pointer to it?
<fijal>
you can't have macros for accessing all of that in C?
<fijal>
like, does HPy_AsStruct always return a C pointer and that's it? or can it be more special?
<antocuni>
we discussed it. Using macros is really too awkward and ugly when writing C code, and moreover it's too easy to make mistakes
<antocuni>
like, if you want to keep a nice C-like syntax, you need to write something like:
<antocuni>
HPy_STRUCT(my_handle)->my_field
<fijal>
yeah something like that
<fijal>
isn't HPy not supposed to be used by hand anyway?
<antocuni>
but then it's too tempting to store the result of HPy_STRUCT inside a local variable
<antocuni>
HPy is supposed to be used also by hand, e.g. to write numpy
<fijal>
either way, maybe another trick that you can use is something like shadows when taking an address
<fijal>
you return the address into old generation, but that does not survive minor collection as a valid address
<fijal>
you are still going to run into some issues, notably thrashing your caches, possibly
<antocuni>
but then it means that you need to use a macro every time you want to access a field
<fijal>
yeah, I would write
<fijal>
HPy_STRUCT_GET(my_handle, "field")
<fijal>
HPy_STRUCT_SRT(my_handle, "field", value)
<fijal>
s/SRT/SET/
<antocuni>
yes, that's exactly what we wanted to avoid
<fijal>
haha, ok
<fijal>
well, I don't think there is a good strategy then
<antocuni>
well, the good strategy is to improve PyPy's GC :)
<fijal>
I don't think you can
<fijal>
at least not in ways you described, you are running into the same issues
<fijal>
what is a realistic chance of numpy using HPy?
<antocuni>
very realistic, from what I understand
<antocuni>
the numpy devs seem very positive towards hpy
<fijal>
what about a C static checker that checks if you are not storing HPy_Struct somewhere?
<antocuni>
uhm
<antocuni>
maaaybe
<fijal>
maybe there is a way to design a hackish macro that would crash if you don't follow it up with ->
<fijal>
?
<antocuni>
I don't think such a macro exists, because to be able to handle the -> the expression must be of type e.g. PointObject*, and if it's of type PointObject* you can always store it in a local
<antocuni>
if we really want to go along that route, maybe it's better to use the debug mode; like, in debug mode every call to HPy_Struct moves the memory somewhere else, so if you store it in a local var you get crashes
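The debug-mode idea just described can be illustrated with a small simulation. It is a sketch under loud assumptions: `DebugObject`, `DebugStruct` and `as_struct` are hypothetical names, and "moving the memory" is modelled by invalidating the previously returned view rather than by relocating real memory. It is not how HPy's debug mode is implemented.

```python
class DebugStruct:
    """Toy view of an object's fields that fails once it has gone stale."""

    def __init__(self, fields):
        self._fields = fields
        self._valid = True

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. the struct fields.
        if not self._valid:
            raise RuntimeError("stale struct pointer used after a later as_struct()")
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)


class DebugObject:
    """Toy object whose as_struct() 'moves' the data on every call."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._last_view = None

    def as_struct(self):
        # Simulate the move: the view handed out earlier stops working.
        if self._last_view is not None:
            self._last_view._valid = False
        self._last_view = DebugStruct(self._fields)
        return self._last_view


point = DebugObject(x=1, y=2)
stored = point.as_struct()      # the tempting "store it in a local" pattern
fresh_y = point.as_struct().y   # a later call invalidates `stored`
```

After the second `as_struct()` call, reading `stored.x` raises `RuntimeError`, which is the kind of loud failure the debug mode is meant to provoke.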
<antocuni>
but all of this looks very obscure to me
<fijal>
yes, I struggle to think of a non-obscure version
<antocuni>
you can see that all the methods start by doing ArrayObject *self = ArrayObject_AsStruct
<antocuni>
and the code still looks reasonable
<antocuni>
if you start to add macros, the code quickly becomes unreadable, I fear
<fijal>
right, C just sucks I'm afraid
<antocuni>
yes
<antocuni>
probably it would be possible to do some hack with C++
<antocuni>
maybe another simpler alternative is to write our own malloc wrapper/replacement in such a way that it plays well with alloc/free patterns generated by HPy
<fijal>
I'm not sure it would help much
<fijal>
ok, how about the following
<fijal>
you allocate one HPy_Handle, you malloc the data (and put it in hpy_data), you set the refcount to "1"
<fijal>
if you HPy_Close that one, you free the data
<fijal>
if you create another HPy_Handle, you set the refcount to "many" and then you free it in light finalizer (maybe you only register the finalizer now)
<fijal>
if HPy_Close has the refcount to "many" you don't do anything
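The "0 1 many" scheme proposed in the last few messages can be written down as a small state machine. This is a sketch of the proposal as stated, not of any existing code: `HPyObjectModel`, `new_handle`, `close_handle` and `finalize` are hypothetical names, a `bytearray` stands in for the malloc()ed `hpy_data`, and going back to state "0" after closing the single handle is an assumption the log does not spell out. The discussion that follows in the log questions whether freeing on close is safe at all, since the data must stay valid while the object is alive.

```python
ZERO, ONE, MANY = "0", "1", "many"


class HPyObjectModel:
    """Toy model of the proposed saturating 0/1/many handle count."""

    def __init__(self):
        self.state = ZERO
        self.data = None                   # stands in for malloc()ed hpy_data
        self.finalizer_registered = False

    def new_handle(self):
        if self.state == ZERO:
            self.data = bytearray(16)      # "malloc" on the first handle
            self.state = ONE
        else:
            # Second or later handle: saturate, and defer the free
            # to a finalizer that is registered only now.
            self.state = MANY
            self.finalizer_registered = True

    def close_handle(self):
        if self.state == ONE:
            self.data = None               # "free" right away: the fast path
            self.state = ZERO              # assumption: count drops back to 0
        # In state MANY nothing happens; the count never goes back down.

    def finalize(self):
        # Runs when the GC finds the object dead; only needed for MANY.
        if self.finalizer_registered:
            self.data = None


fast = HPyObjectModel()
fast.new_handle()
fast.close_handle()     # freed immediately, no finalizer involved

slow = HPyObjectModel()
slow.new_handle()
slow.new_handle()       # saturates to "many"
slow.close_handle()     # no-op
slow.finalize()         # data released here instead
```

The appeal of the fast path is that malloc and free stay paired in a tight loop and no finalizer is registered; the cost is a little extra bookkeeping on every handle creation and close.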
<antocuni>
I don't think it's possible to register the light finalizer at runtime, since it's an RPython-level __del__
<antocuni>
but it might be possible to do that with an applevel finalizer
<fijal>
I think there is a major win not registering the finalizer when not needed
<fijal>
I don't see a fundamental reason why not?
<fijal>
there might be a technical one
<antocuni>
this might actually work, apart from the fact that you need to do a bit of additional work for every call of HPy_Dup and HPy_Close
<fijal>
in C++ you can overload "operator->" so yes, if you can do C++ a lot of things are possible
<antocuni>
I agree about the major win on not registering the finalizer
<fijal>
antocuni: thank you for providing a cool puzzle :)
marvin_ has quit [Remote host closed the connection]
<antocuni>
so basically, what you are proposing is: 1. kill W_HPyObject.__del__, and re-implement the functionality with self.register_finalizer
marvin has joined #hpy
<antocuni>
2. add a fast path which avoids the register_finalizer entirely if you happen to have only a single handle to the object
<antocuni>
it might work :)
<fijal>
yes, we need some hacks to have the 0-1-many refcount
<antocuni>
we need to measure how often the fast path actually happens, though
<fijal>
but most importantly 3. free() the data if you close the only finalizer
<fijal>
that's the biggest win, I think
<fijal>
er, one handle
<fijal>
(you can extend the refcount to X, it does not have to be only one bit)
<antocuni>
the 0-1-many refcount can be placed directly on W_HPyObject now, there is no longer a need to play with pinning
<antocuni>
ah no wait
<antocuni>
what happens in this case:
<antocuni>
1. I create a W_HPyObject (with refcnt==1) and malloc hpy_data
<antocuni>
2. I store it inside a python list
<antocuni>
3. I close the handle
<fijal>
is hpy_data valid still?
<antocuni>
of course
<antocuni>
the object is still alive because it's in a list
<fijal>
then you up the refcnt to 2 or many if you store it in the list, of course
<fijal>
wait
<fijal>
"object still alive" != hpy_data valid, to me
<antocuni>
this starts to look suspiciously similar to cpyext 😅
<antocuni>
object still alive == hpy_data valid
<fijal>
ok, so I don't need to create a new handle when I want to access it?
<antocuni>
I think this is the point that you are missing
<antocuni>
of course you do
<fijal>
why does the new handle have to come with the same C pointer then?
<antocuni>
if you want to pass this object to C, you need to create a new handle
<antocuni>
handles are references to arbitrary python objects (either normal objects or HPy objects)
<fijal>
can I do
<antocuni>
HPy objects own a piece of C memory
<antocuni>
so the memory is tied to the object, not to the handle
<fijal>
so once HPy object gets created and a single handle created, then the C pointer to that struct has to be valid for the whole duration of the existence of this object?
<antocuni>
yes
<fijal>
ok, what is the exact difference with cpyext then?
<antocuni>
no sorry
<antocuni>
hpy_data can in theory move around
<antocuni>
the invariant is that if you have a valid handle and you call HPy_AsStruct, the resulting pointer is valid until you close the handle
<fijal>
yes
<fijal>
so my scheme still works I think
<antocuni>
so if the object doesn't have any handles to it, it can be moved around
<fijal>
when is hpy_data populated?
<antocuni>
when the object is created
<fijal>
when you create a handle or when you create the W_HPyHandle?
<antocuni>
you are still confusing HPy objects and handles
<fijal>
you are giving me a bunch of contradictory data, it's hard to have a coherent picture :)
<antocuni>
you should read the code, that one is not contradictory :)
<fijal>
hah, one would hope, but also thanks
<antocuni>
when a W_HPyObject is instantiated, it doesn't have any handle attached to it, it's a normal W_Root object
<fijal>
what happens to hpy_data?
<antocuni>
which mallocs() an hpy_data
<fijal>
so it happens when W_HPyObject is allocated?
<antocuni>
yes
<antocuni>
then, we need to call the tp_init, which is written in C
<antocuni>
so we create a handle, call tp_init, destroy the handle
<fijal>
right
<fijal>
no, I don't have a good solution then
<fijal>
it seems like it should maybe be designed slightly differently?
<antocuni>
how?
<fijal>
I don't know! but I can try to think about it
<antocuni>
ok :)
<fijal>
you might actually have some hack that might work, or not
<fijal>
like you allocate the contents of hpy_data inside the GC, you pin it if you have one handle and unpin it if you are done
<fijal>
but if you have more than one (so you lose track of the amount of raw pointers), you unpin it, do a raw malloc, copy contents and then update hpy_data (and register the finalizer)
<fijal>
so that basic scheme does not work because maybe the pointer already escaped, right?
<antocuni>
yes, I think it's what we said before. But the problem is that when I realize that I have more than one, there might be already a reference to the memory which you would like to move
<antocuni>
yes
<fijal>
but something like this might work
<fijal>
you always malloc memory *and* space behind the object
<fijal>
if you close the one handle, you copy the contents behind the object (to the space we have) that's GC-managed
<fijal>
and if no one ever needs it, it just dies
<fijal>
but if you create a second handle, you copy it out *again* and have a finalizer
<fijal>
that seems very clunky though, it's hard to believe there isn't a better strategy
<fijal>
macros or C++ sound like a better idea at this stage (or a debug mode)
<antocuni>
I don't understand this last proposal. What is the "space" you are talking about? What does it mean to malloc "behind" the object?
<fijal>
so you have HPy_Object in nursery and enough space behind it for the contents of hpy_data, but hpy_data points out to some malloc() memory somewhere else
<fijal>
when you HPy_Close your only handle, the malloc() memory is freed and its contents copied to GC managed memory
<antocuni>
and when I create a new handle to it?
<fijal>
you malloc() stuff again, move the memory there and forget about it (and register a finalizer)
<fijal>
I really don't like it, there must be a better design
<antocuni>
but this would happen all the time
<antocuni>
think of a numpy array
<antocuni>
we have a W_HPyObject, and hpy_data contains the numpy array
<antocuni>
we create it from Python, so the object exists in Python land but we don't have any handle to it
<antocuni>
every time we call a method, we create a handle, call the C function, close the handle
<antocuni>
there is no such thing as "the one handle" for the object
<antocuni>
handles are always temporary and short lived
<antocuni>
that's also why I think that pinning should not be a problem; the pinning is very short lived
marvin has quit [Remote host closed the connection]
073AAEFOT has joined #hpy
GianlucaRizzo has joined #hpy
GianlucaRizzo has quit [Ping timeout: 250 seconds]
<cfbolz>
fijal: what happens if we allocate all the W_HPyObject's as "young but non-movable" objects?
<cfbolz>
they tend to not die extremely quickly, I assume
073AAEFOT has quit [Remote host closed the connection]
marvin_ has joined #hpy
marvin_ has quit [Remote host closed the connection]
marvin_ has joined #hpy
<antocuni>
cfbolz: there are cases in which they actually die quickly. E.g. if you read items out of a numpy array in a for loop
<antocuni>
because each item is a numpy scalar
<antocuni>
and I think that this particular case probably hits a kind of sweet spot on CPython, thanks to the free lists
<cfbolz>
antocuni: then I don't know either, except writing a new GC
<antocuni>
well, the good part is that even if the current strategy is not optimal, it's still orders of magnitude better than cpyext
<cfbolz>
antocuni: that's aiming relatively low though ;-)
<antocuni>
😂
<fijal>
I think that's an incredibly tight set of constraints
<fijal>
not truly much better than cpyext
squeaky_pl has joined #hpy
<squeaky_pl>
HPy was just mentioned during PyCon US Steering Council Panel
<squeaky_pl>
Somebody submitted a question that said something along the lines of "What happened to HPy"
<squeaky_pl>
The answer from the council was that nothing really happened to it and it's still going on, but it will take a lot of time to port extensions to HPy.