dalley_ has quit [Quit: Leaving]
dalley has joined #hpy
FFY00 has quit [Read error: Connection reset by peer]
FFY00 has joined #hpy
FFY00 has quit [Remote host closed the connection]
FFY00 has joined #hpy
mattip has quit [*.net *.split]
mattip has joined #hpy
<fangerer> Good morning! We will have our dev call in about 25 minutes.
<Hodgestar> Unfortunately I can't make it this morning. I have a meeting at 11:30 that I thought would be online, but I have to drive there instead. :/
<antocuni> mattip, ronan: are you joining?
<mattip> yes
Techcable has quit [Remote host closed the connection]
Techcable has joined #hpy
<mattip> here is a blog post about it
<Hodgestar> How was the meeting?
<Hodgestar> If the data API is specifically only covering the Python API (and not any C API) how relevant is it to HPy?
<steve_s> I haven't read up on it yet, but @mattip mentioned that numpy will/does have a mode which implements this and which supports only dtypes without Python objects in them
<steve_s> The idea was that we'll support numpy in this mode only, because it removes all the issues with how to traverse the memory in tp_traverse when we do not have a context and cannot query the dtype object about the shape of the memory, i.e., find out where the HPyFields are in the C struct
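For illustration, a minimal sketch of the contrast steve_s is drawing: when HPyFields sit at fixed offsets in a C struct, tp_traverse can visit them without any context. PointObject and its fields are made-up examples (not numpy code), assuming HPy's documented traverse slot signature and the HPy_VISIT macro.

    #include "hpy.h"

    typedef struct {
        double x;
        double y;
        HPyField unit;   /* a Python object stored as an HPyField */
    } PointObject;

    /* The GC can call this without an HPyContext: the field offsets are static. */
    static int Point_traverse_impl(void *self, HPyFunc_visitproc visit, void *arg)
    {
        PointObject *p = (PointObject *)self;
        HPy_VISIT(&p->unit);
        return 0;
    }

The hard case steve_s describes is dtype=object arrays, where the fields live in an untyped data buffer whose layout only the dtype knows.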
<Hodgestar> Ah, so this would provide a subset of numpy that HPy could support more easily.
<Hodgestar> mattip: Any idea if the people being funded by AMD would be interested in collaborating on porting that subset of numpy? It would be amazing to have some input / conversations about how things are done by both sides so that HPy doesn't have to rely on continually dodging "mistakes" made in the API design because of other considerations or just by accident.
<steve_s> I plan to write up more details in a GitHub issue where we can discuss it further. The main thing is that it would not be something like piconumpy; it would be full numpy, with the intention to upstream the code. But you would be allowed to use objects in arrays only in CPython ABI mode; in universal mode, creating an array with objects in it will raise an error. The parts of numpy that are necessary to support arrays with objects in
<steve_s> them can keep on using the CPython API, since they will never be called in universal mode.
<steve_s> The idea is that this mode should be the preferred numpy mode, and there will be more reasons for using it than just HPy
<Hodgestar> steve_s: Re m_traverse: I think I still don't understand why HPy could not just register a different tp_traverse when it creates a module object (that then calls any tp_traverse passed to HPy)?
<mattip> Hodgestar: discussions are welcome at https://github.com/data-apis/consortium-feedback
<Hodgestar> mattip: I was thinking more of collaborating (i.e. working together), not just high-level discussions.
<steve_s> Re m_traverse: the trampoline that is generated for normal tp_traverse does basically this: bounce_to_user_tp_traverse(HPy_AsStruct(object), visit, arg), but for a module we need bounce_to_user_tp_traverse(HPyModule_GetState(object), visit, arg). The generated trampoline could check whether the object is a module or a type, but I think the m_traverse solution is better.
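A rough sketch of the two trampoline variants under discussion, mirroring steve_s's pseudo-calls; bounce_to_user_tp_traverse is not a real HPy symbol, and the real HPy_AsStruct / HPyModule_GetState also take an HPyContext, so this is illustration only:

    /* pseudo-code: type case, the user impl receives the object's C struct */
    static int type_traverse_trampoline(PyObject *object, visitproc visit, void *arg)
    {
        return bounce_to_user_tp_traverse(HPy_AsStruct(object), visit, arg);
    }

    /* pseudo-code: module (m_traverse) case, the user impl receives the module state */
    static int module_traverse_trampoline(PyObject *object, visitproc visit, void *arg)
    {
        return bounce_to_user_tp_traverse(HPyModule_GetState(object), visit, arg);
    }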
<ronan> steve_s: Actually, if we don't implement item traversal, everything should work with the universal ABI. The only issue is that cycles won't be detected, but that's numpy's current status anyway.
<Hodgestar> steve_s: The trampoline wouldn't necessarily have to check though -- it could just be a different trampoline.
<Hodgestar> steve_s: Btw, I am not necessarily advocating for a particular solution here, just working through the options to understand them.
<steve_s> > it could just be a different trampoline
<steve_s> yes, that's what I suggest. But from the function itself you don't know which one to generate. However, we could just generate both trampolines for every tp_traverse; is that what you mean? That also sounds good, maybe even better; it's just more generated code, but that's not a big issue?
<steve_s> but this struct:
<steve_s> typedef struct {
<steve_s>     HPySlot_Slot slot;     // The slot to fill
<steve_s>     void *impl;            // Function pointer to the implementation
<steve_s>     void *cpy_trampoline;  // Used by CPython to call impl
<steve_s> } HPySlot;
<steve_s> would need one more field for the other trampoline...
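In other words, something like this hypothetical extension (the extra field name is made up, not an actual HPy change):

    typedef struct {
        HPySlot_Slot slot;            // The slot to fill
        void *impl;                   // Function pointer to the implementation
        void *cpy_trampoline;         // Used by CPython to call impl (type case)
        void *cpy_module_trampoline;  // Hypothetical second trampoline for the module case
    } HPySlot;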
<antocuni> ronan: "if we don't implement item traversal, everything should work". What do you mean? If you don't implement a proper tp_traverse, everything will explode pretty quickly on PyPy because the GC would have only a partial view of the memory
<antocuni> which means objects get collected even though they are still alive, and others get moved without their pointers being updated
<steve_s> ronan: yes, the use case for tp_traverse in HPy is not only cycle detection
<steve_s> but also to know which objects to clear, right? I.e. the autogenerated tp_clear
<ronan> antocuni: no, it'll be handled by cpyext through refcounting - just like on upstream numpy + pypy
<antocuni> ok, then I'm confused and I don't know what we are talking about
<steve_s> who will decrement the refcount though?
<ronan> numpy does it already
<steve_s> where?
<antocuni> in array_dealloc, I suppose
<ronan> yes, or array_finalize in hpy-numpy
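For illustration, the kind of cleanup antocuni and ronan refer to: as long as the items stay plain PyObject*, the dealloc/finalize path can release them with ordinary refcounting and no GC traversal is needed. This is a made-up helper, not actual numpy code:

    #include <Python.h>

    /* Drop the references held by a dtype=object item buffer. */
    static void release_object_items(PyObject **items, Py_ssize_t n)
    {
        for (Py_ssize_t i = 0; i < n; i++) {
            Py_XDECREF(items[i]);
            items[i] = NULL;
        }
    }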
<steve_s> right, well, then that sounds good actually. If numpy ignores cycle detection, hpy-numpy may as well, and being able to access the context in tp_traverse is not an issue at all
<steve_s> then the question is whether we still want to support only the array API standard, because that would save us from all the HPyField stuff: storing/loading unaligned fields, memcpy and all that
<steve_s> so maybe there are still reasons to support only the array API standard and not dtypes with objects
<antocuni> well, it is still an issue because if I understand correctly this means that the plan is to implement the object dtype using legacy PyObject
<antocuni> which prevents compiling in pure universal mode
<antocuni> but I guess there is still a long way before we can kill the legacy code so maybe it's not an immediate problem
<steve_s> > which prevents compiling in pure universal mode
<steve_s> hm, yes :/ but converting that to use HPyFields can be the next step. It is already a massive undertaking, so the more we can partition it into smaller, self-contained, useful steps, the better...
<ronan> in any case, all the object dtype code will need either to be removed or ported to compile in pure universal mode. tp_traverse is just a small part of it
<antocuni> anyway, if we can support dtype=object via legacy PyObject* it's still a good thing for now, because it means that we can move forward
<steve_s> ronan: but I assume you've ported some of the dtype support already? I was wondering if it could stay halfway: some parts of the code that are irrelevant to the array API standard could stay on the CPython API, and you'd convert the handle to a PyObject*, pass it to that code and be done (for now, for this porting iteration).
<steve_s> Re universal mode: as long as the code doesn't actually call into the CPython API, PyPy/GraalPython could still run it in universal mode, right? We'd need to provide the CPython API functions so the linker is happy, but they could just be stubs; or not stubs: if there are PyObject* <-> HPy conversion functions on the context doing the right thing, they can just work.
<steve_s> but yes, it would not be a pure universal binary
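A minimal sketch of that halfway approach, assuming HPy's PyObject* <-> HPy conversions (HPy_AsPyObject / HPy_FromPyObject); legacy_numpy_helper is a hypothetical stand-in for unported CPython-API code:

    #include <Python.h>
    #include "hpy.h"

    PyObject *legacy_numpy_helper(PyObject *arg);    /* hypothetical unported code */

    static HPy call_legacy_part(HPyContext *ctx, HPy h_arg)
    {
        PyObject *arg = HPy_AsPyObject(ctx, h_arg);  /* new reference */
        PyObject *res = legacy_numpy_helper(arg);
        Py_DECREF(arg);
        if (res == NULL)
            return HPy_NULL;                         /* propagate the error */
        HPy h_res = HPy_FromPyObject(ctx, res);
        Py_DECREF(res);
        return h_res;
    }

Such code only links and runs where the CPython API is available (CPython itself, or cpyext-style emulation), which is why the result is not a pure universal binary.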
<steve_s> Re: dtype support, what is more of an issue is the PyObjects stored in the arrays themselves, right? Those I would not convert to HPyFields; I'd just let them stay PyObject* and not port all the code that handles them.
<ronan> yes, I have ported quite a lot of the object dtype support, but there are lots of places that can probably only work on CPython, e.g. where I used HPyField_Load(ctx, HPy_NULL, some_field)
<ronan> The problem for the universal mode is that as soon as you start converting PyObject* to HPyField, you need to convert everything, and you can't run the tests until you do
<ronan> stopping before that gives a point where PyPy can run all of numpy (with all the object dtype stuff running in cpyext)
<antocuni> ronan: re "you need to convert everything": is this because you cannot mix HPyField and PyObject* in the same struct? Or is there some other hidden problem which prevents the fields from being ported one by one?
<ronan> antocuni: it's about the items inside the array, they're stored in a void*, so you need to consistently use either (HPyField *)data_ptr or (PyObject **)data_ptr. In universal mode, you can't mix and match. It's much less of a problem for fields that are directly on the PyArrayObject struct, partly because there are accessor macros and because the compiler can help you.
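For illustration, the kind of item access a full HPyField port would need (the "storing/loading unaligned fields, memcpy" part steve_s mentioned earlier): the item buffer is untyped and possibly unaligned, so the field is copied in and out rather than accessed through a typed pointer. Names and structure are illustrative, not numpy code:

    #include <string.h>
    #include "hpy.h"

    /* Store `value` as the dtype=object item at `dataptr` inside the array `owner`. */
    static void store_object_item(HPyContext *ctx, HPy owner, char *dataptr, HPy value)
    {
        HPyField f;
        memcpy(&f, dataptr, sizeof(HPyField));   /* load the old, possibly unaligned field */
        HPyField_Store(ctx, owner, &f, value);   /* overwrite it, informing the GC via owner */
        memcpy(dataptr, &f, sizeof(HPyField));   /* write the updated field back */
    }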