<mattip> the monthly call is in 10 minutes?
<antocuni_> yes
<fangerer> yes
<mattip> tp_traverse https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse
<Hodgestar> For HPy we need tp_traverse to be called for every HPyField.
<Hodgestar> Sorry, the visit function needs to be called within tp_traverse for every HPyField.
<Hodgestar> CPython seems to be lax in ways that are specific to its own garbage collection (and to the cyclic garbage collector having been bolted on later) and HPy will likely have to have more consistent rules to cater to a wider class of garbage collectors (i.e. to not expose the inner workings of the current CPython garbage collector).
<Hodgestar> I don't think this is a bad thing -- hiding the inner workings of the interpreter is one of the core goals of HPy.
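For reference, a minimal CPython-style traverse function of the kind described in the tp_traverse documentation linked above might look like this (the type and field names are illustrative):

    #include <Python.h>

    typedef struct {
        PyObject_HEAD
        PyObject *first;   /* owned reference */
        PyObject *second;  /* owned reference */
    } MyObject;

    static int
    MyObject_traverse(MyObject *self, visitproc visit, void *arg)
    {
        Py_VISIT(self->first);   /* Py_VISIT handles NULL and error propagation */
        Py_VISIT(self->second);
        return 0;
    }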
<Hodgestar> From our discussion, I liked Antonio's suggestion to store the information needed in tp_traverse for the more complex numpy dtypes on the array instance, so that tp_traverse can get at it easily.
<mattip> so we need to get some kind of buy in from CPython to change the requirements of tp_traverse, something like
<mattip> "tp_traverse should not require C-API calls to find the owned objects to visit"
<Hodgestar> Well, first we need CPython to require tp_traverse to visit all objects and not just the ones that could form part of reference cycles.
<Hodgestar> I guess first we should see what real tp_traverse implementations in the wild do -- i.e. are there a lot of complex cases like numpy, or only a few?
<Hodgestar> And how common is it to just leave out tp_traverse in custom extension types that extend the PyObject struct with references to other Python objects that happen to be only integers, strings, etc., which one would not have to visit in CPython.
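As a sketch of the laxness being discussed (a hypothetical type, not from any real project): the only owned reference is a str, which can never participate in a reference cycle, so CPython extensions in the wild often omit tp_traverse for such a type entirely, whereas the stricter rule proposed here would still require the equivalent HPyField to be visited.

    #include <Python.h>

    /* Hypothetical example: the only Python reference is always a str,
     * which cannot be part of a reference cycle, so in practice many
     * CPython extensions define no tp_traverse (and no Py_TPFLAGS_HAVE_GC)
     * for a type like this. */
    typedef struct {
        PyObject_HEAD
        PyObject *name;   /* always a str */
    } NamedThing;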
<mattip> numpy only actually uses tp_traverse in dtype metaclasses
<mattip> and in ufuncs,
<mattip> and in both cases I don't think these will break any cycles
<mattip> OTOH cython uses it
mattip has quit [Ping timeout: 256 seconds]
mattip has joined #hpy
<antocuni_> mattip: I don't understand why we need CPython to change the requirements of tp_traverse. There are already many things that you can do in CPython but not in HPy, and calling C-API functions from within tp_traverse is just another of those
<antocuni_> of course this means that for some use cases it will be harder to rewrite tp_traverse to be HPy-compatible, but I don't think there is much that we can do for that
<antocuni_> also, sorry for having been silent in my last 15 minutes of the call, my mic was turned off and you didn't hear anything of what I said (maybe it's not that bad 😅)
<antocuni_> did I miss anything important after I left?
<mattip> I also left before the end
<mattip> well, when we come to ask projects to rewrite for HPy, it would be nice to have at least a "please don't do that" in the CPython documentation
<Hodgestar> And definitely the HPy documentation. :)
<fangerer> > did I miss anything important after I left?
<fangerer> I don't want to judge that :slightly_smiling_face: . We basically continued the discussion of problems when mixing HPyFields with other (unmanaged) C data. Ronan also pointed out that NumPy uses memcpy to copy data that may contain Python objects and C primitives. We currently do not explicitly state in the docs that you may not do that (in general), although it will most certainly work if the copied fields are again owned by the same object (useful in case you are resizing a C array). I think we need to extend our docs a bit.
<steve_s> > if the copied fields are again owned by the same object
<steve_s> that assumption may not work for all types of GCs, so imo it would be better to avoid this too. In general moving memory managed by GC behind GC's back is not great.
antocuni_ is now known as antocuni
<antocuni> indeed, that's a tricky case
<steve_s> There was a question whether mixing native and managed memory is a good idea after all, but anything else would make the migration more complicated...
<antocuni> for example, what if you want to implement a list-like object in C? You surely need to resize the underlying array, and you want to use memcpy or something similar to copy the contents to the new memory
<antocuni> you surely don't want to be forced to copy the items one by one
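A sketch of the list-like layout under discussion, assuming the items are kept in a plain C array of HPyFields that has to be reallocated when the list grows (all names are illustrative):

    /* Illustrative layout for a list-like extension object. */
    typedef struct {
        HPyField *items;        /* array of HPyFields owned by this object */
        HPy_ssize_t nitems;     /* number of items currently stored        */
        HPy_ssize_t allocated;  /* capacity of the allocated array         */
    } MyListObject;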
Techcable has quit [Ping timeout: 240 seconds]
<steve_s> would it be hard to decouple the native parts from the HPyFields? Certainly not great for cache locality, I know... Ok, we may have to allow moving memory potentially containing HPyFields within one owning object.
<antocuni> what do you mean exactly?
<steve_s> Have a region of memory that contains only HPyFields and memory that contains everything else. You could memcpy/move at least the other region
<antocuni> and what do I do if I want a struct Point {int x; int y; HPyField name;} ?
<steve_s> if you have that struct in an array you would have one array of struct Point {int x; int y; } and one array of HPyField name
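Sketching the split that steve_s describes: instead of one array of mixed structs, the plain C data and the HPyFields would live in two parallel arrays (names are illustrative):

    /* Original mixed layout */
    struct Point { int x; int y; HPyField name; };

    /* Split layout: plain C data that can be memcpy'd/memmove'd freely,
     * plus a parallel array of HPyFields, one entry per point. */
    struct PointData  { int x; int y; };
    struct PointArray {
        struct PointData *data;
        HPyField *names;
    };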
<antocuni> and how would it help?
<antocuni> I mean, I'm not sure I understand what problem you are trying to solve
<steve_s> there would be HPyField_ArrayMove or something like that, which would allow the GC to see that things are moving around. The problem, in general, is the question: if you memmove some memory that contains HPyFields, should you then call HPyField_Store? I think that would defeat the purpose of memmove. So we may have to specify that moving memory around within one object is fine, which seems a bit dangerous to me, but maybe I am too cautious
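A purely speculative shape for the HPyField_ArrayMove idea mentioned above; nothing like this exists in the HPy API:

    /* Speculative, not part of the HPy API: a call that would let the GC
     * observe that n HPyFields owned by `owner` are being moved from `src`
     * to `dst` within that object's own storage. */
    void HPyField_ArrayMove(HPyContext *ctx, HPy owner,
                            HPyField *dst, HPyField *src, HPy_ssize_t n);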
<antocuni> you cannot use HPyField_Store because that one receives an HPy, not an HPyField
<antocuni> I think the core of your question is:
<steve_s> ah right, so you'd have to load them also
<antocuni> how does the GC know which pointers are GC-managed? Does it know by keeping track of HPyField_Store, or by calling tp_traverse?
<antocuni> I think that so far the answer is "by calling tp_traverse"
<antocuni> i.e., memcpy-ing memory around is fine as long as tp_traverse does the right thing (and you must be extra careful not to invoke the GC while the copy is in progress)
<antocuni> i.e., the following will lead to problems:
<antocuni> HPyField *tmp = mylist->items;
<steve_s> in graalpython we need to keep them separate, because we cannot store them into unmanaged memory and have the GC understand them. Another way of looking at this: when I call HPyField_Store(ctx, owner, &owner->foo->bar, value), should the HPyField data stay at the address &owner->foo->bar?
<antocuni> HPyField *new = malloc(...);
<antocuni> memcpy(new, tmp, ...);
<antocuni> foo();
<antocuni> mylist->items = new;
<antocuni> if foo() invokes the GC, then things might explode, because tp_traverse still returns the old array
<antocuni> but apart from that, if foo() does "nothing", the code above should be fine
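Putting antocuni's fragment together, a sketch of the resize pattern being discussed (new_allocated and foo() are placeholders):

    HPyField *tmp = mylist->items;
    HPyField *new_items = malloc(new_allocated * sizeof(HPyField));
    memcpy(new_items, tmp, mylist->nitems * sizeof(HPyField));
    foo();                      /* DANGER: if foo() triggers a collection here,
                                 * tp_traverse still reports the old array */
    mylist->items = new_items;  /* only from this point does tp_traverse
                                 * see the new copy */
    free(tmp);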
<steve_s> yes, atomicity is another issue
<antocuni> steve_s: ah, so from the graalpython point of view, tp_traverse() is just useless?
<steve_s> still, with this example you suddenly have the HPyField in two places and the Python engine doesn't know about that. Could that be an issue for future optimizations, new GCs, etc.? I suspect it could.
<antocuni> "in two places"? You mean tmp and new?
<steve_s> yes. The best would be if the specs said that tp_traverse must visit all HPyFields and it is up to the Python implementation whether it keeps track of them internally, or if it takes advantage of tp_traverse
<steve_s> > ah, so from the graalpython point of view, tp_traverse() is just useless?
<antocuni> ok, I understand the problem now
<antocuni> and I see why memcpy is a problem
<antocuni> one possible solution could be that after the memcpy (or equivalent) the C code must call "HPy_UpdateGC(ctx, mylist)" or something similar
<antocuni> which would do nothing on e.g. pypy
<antocuni> on graalpython, could you use it as a hook from which to call tp_traverse and update your internal tracking?
<antocuni> I don't know whether it would work
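antocuni's idea, sketched: HPy_UpdateGC is hypothetical and not part of the HPy API, and h_mylist stands for an HPy handle to the owning object:

    /* Hypothetical hook, as proposed above: after moving memory that
     * contains HPyFields, the extension notifies the runtime so that
     * implementations like GraalPython can refresh their internal
     * tracking; PyPy or CPython could implement it as a no-op. */
    memcpy(new_items, mylist->items, mylist->nitems * sizeof(HPyField));
    mylist->items = new_items;
    HPy_UpdateGC(ctx, h_mylist);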
<steve_s> I think it can, but I'm not sure how efficient it would be. To be clear, I think that memcpy/memmove is ok right now if it does not change the owner of the field (also for GraalPython). Looking forward, I wonder if this isn't still dangerous w.r.t. enabling future improvements. Also, the property that after HPyField_Store(ctx, owner, &owner->foo->bar, value) I can assume the field data stays at the address &owner->foo->bar does not hold. Again, not useful today
<antocuni> yes, I see
<steve_s> Another property that may change behind Python's back, if you can copy fields around, is that there is only one copy of each field. Not useful today, but could it be useful in the future?
<antocuni> the problem with the two-arrays solution that you propose is that it's e.g. fundamentally incompatible with numpy structured arrays, where it's the user who decides what the content of the struct is
<steve_s> right, ok, then that's a no-go
<antocuni> but I think that finding an API which works well with both the "pypy approach" and the "graalpython approach" is important. Abstracting over the implementation details is the core goal of hpy, after all
<antocuni> I don't know how, though :)
Techcable has joined #hpy
<arigato> just giving my 2 cents here, in my own opinion: debating basic issues like that after more than 2 years?? sorry, but hpy is not going anywhere IMHO
<antocuni> arigato: not a very helpful comment, admittedly