<fijal> antocuni: ping
<antocuni> pong
<fijal> ok, so if you have a sequence like this
<fijal> obj = malloc(obj), h1 = malloc(handle, obj), free(h1), stuff, h2 = malloc(handle, obj), free(h2)
<fijal> that's pretty normal right
<fijal> ?
<antocuni> I'm not sure I understand the syntax; handles are not malloc()ed, they are indexes into an RPython list. You can open a new handle, which means putting the w_obj into the list (and closing the handle means removing it from the list)
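A minimal sketch of the handle model antocuni describes, with handles as indexes into a list rather than malloc()ed memory (all names here are hypothetical, not actual HPy/PyPy internals):

```python
# Hypothetical sketch: a handle is just an index into a table of objects.
# Opening a handle stores the object; closing it clears the slot so the
# GC can reclaim the object, and the slot can be reused later.
class HandleTable:
    def __init__(self):
        self._objects = []   # slot i holds the object for handle i
        self._free = []      # indexes of closed slots, available for reuse

    def open(self, w_obj):
        """Open a handle: put w_obj into the list, return its index."""
        if self._free:
            h = self._free.pop()
            self._objects[h] = w_obj
        else:
            h = len(self._objects)
            self._objects.append(w_obj)
        return h

    def close(self, h):
        """Close a handle: remove the object from the list."""
        self._objects[h] = None
        self._free.append(h)

    def deref(self, h):
        return self._objects[h]
```

In this model, the sequence fijal sketched becomes: `h1 = table.open(obj); table.close(h1); ...; h2 = table.open(obj); table.close(h2)` — two independently opened handles to the same object.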
<fijal> "creation of handle"
<antocuni> ok
<fijal> ok, so the raw data that can be referenced has to survive between h1 = .... and free(h2)
<fijal> right?
<antocuni> not completely. The raw data has to survive for the whole lifetime of the object (obviously, because that's where the object lives!), but the ADDRESS of this data can change inside "stuff"
<fijal> I think you would need to have reference counting, because it seems like "gc.pin has to succeed" is a requirement
<fijal> and you would need to completely redesign our GC for the case of having too many pinned objects
<antocuni> yes, gc.pin has to succeed and yes, we need to improve our GC to do that
<fijal> I mean really, "redesign"
<fijal> it's not a simple improvement
<antocuni> that's not the impression which I got when talking to Armin, but maybe I misunderstood
<cfbolz> we should have a call
<cfbolz> this is not working
<antocuni> yes
<cfbolz> as a medium
<antocuni> but also, there is really no hurry
<fijal> cfbolz: why is it not working?
<antocuni> this is for improving the performance of hpy on pypy, but it's not the top priority
<antocuni> currently, there is no need for pinning because the C data is malloc()ed anyway
<antocuni> the pinning is needed if we decide to gc-allocate the hpy_data inside W_HPyObject, which is something which we cannot do right now anyway
<cfbolz> fijal: because you and anto are talking past each other
<cfbolz> antocuni: is it correct that for a purely python defined instance that you pass to C with HPy, everything is very efficient and you don't need pinning?
<fijal> no, I disagree
<antocuni> cfbolz: yes
<fijal> I maintain I perfectly understood what anto says
<fijal> the difference is "this is a small improvement to our GC" vs "this is a massive undertaking which requires quite careful redesign"
<fijal> I would not call it "talking past each other"
<antocuni> fijal: ok, but you are still talking about a hypothetical optimization which I'm not going to do anytime soon anyway :)
<fijal> yes yes, I'm only interested in the intellectual pursuit of what's feasible too :)
<antocuni> so, this is a more precise outline of what's happening when you create an instance of a C-defined type: https://paste.openstack.org/show/bEBl0H9kQWYDHKTMlVxG/
<cfbolz> antocuni: something I was wondering: in the current (non-optimized) model for C data, you find and move the object pointers in there in the GC how?
<antocuni> the C address of hpy_data must be fixed only between points 6 and 9, and only if the code calls HPy_AsStruct, and that's where pinning would be necessary
<cfbolz> 6 and 9?
<cfbolz> I want to know: a C object points to a Python list. how is the C data updated when the list moves?
<antocuni> cfbolz: not sure if I understand the question. In the current model, W_HPyObject is GC-managed and can move, while W_HPyObject.hpy_data is malloc()ed and has a fixed address for the whole lifetime
<antocuni> cfbolz: in that case I use an HPyField
<antocuni> which on PyPy are implemented as gcrefs
<antocuni> and there is a custom GC tracer which calls the user-provided tp_traverse to know where they are
<antocuni> (that's what I'm doing these days)
<cfbolz> so every collect calls all these tp_traverse?
<antocuni> yes
<cfbolz> what does the api for the traverse C function look like, if the content of a field moves?
<antocuni> I don't understand the question
<cfbolz> so an HPyField in some hpy_data can point to a gcref, right?
<cfbolz> the gcref can move
<antocuni> yes
<cfbolz> who updates the hpy_data
<antocuni> the GC
<cfbolz> how does it know the offsets of the hpy_fields?
<antocuni> because it has the custom tracer (which is implemented by calling tp_traverse)
<antocuni> this is what a tp_traverse looks like, FWIW: https://github.com/hpyproject/hpy/blob/master/test/test_hpyfield.py#L42
<antocuni> in that example, the void* self points to hpy_data
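A toy model of the mechanism being described: the GC asks a user-provided traverse function where the field slots live inside the raw data, then rewrites any slot whose referent has moved. All names here are illustrative; the real implementation is the C-level tp_traverse over hpy_data with HPyField slots.

```python
# Hypothetical sketch of "custom tracer calls user-provided traverse".
# A struct is modeled as a dict; gcrefs are modeled as opaque strings.
def make_traverse(field_names):
    """User side: a traverse function that visits every field slot."""
    def traverse(struct, visit):
        for name in field_names:
            visit(struct, name)
    return traverse

def gc_update_fields(struct, traverse, forwarding):
    """GC side: use traverse to locate the field slots, then update any
    gcref whose object has moved (forwarding maps old ref -> new ref)."""
    def visit(s, name):
        old = s[name]
        if old in forwarding:      # object moved: rewrite the slot in place
            s[name] = forwarding[old]
    traverse(struct, visit)
```

The point is that the GC never needs to know the field offsets itself: traverse (i.e. tp_traverse) is the only party that knows the struct layout.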
<fijal> cfbolz: we have the same thing for jitframe?
<fijal> it's an extra root that needs updating
<antocuni> yes, and we also have it for micronumpy's arrays of objects, although I don't know if it's still working nowadays
<cfbolz> I just was on the phone with phlebas, I understand what graalpython is doing significantly better now ;-)
<cfbolz> they have a ton of cool optimizations that we don't yet
<cfbolz> antocuni: what's the most up-to-date branch to look at for hpy in pypy
<antocuni> there is hpy-0.0.4, and I started a sub-branch hpy-0.0.4-hpyfield
<antocuni> but the latter does not contain much
<cfbolz> thanks
<cfbolz> phlebas: I know you are in a meeting, but of course CPython should do the same handle optimizations for ints etc
<cfbolz> antocuni: we already came to the conclusion that hpy can be used to introduce pointer tagging, right?
<antocuni> I think I already talked about that with someone, yes
<antocuni> I don't remember what the conclusion was :)
<antocuni> is graalpython doing it?
<cfbolz> yes
<antocuni> cool
<antocuni> only for ints or also for e.g. floats?
<cfbolz> both
<cfbolz> and they encode various other things in the bits of the handle
<cfbolz> like the length if it is small
<antocuni> uhm wait. How can you squeeze a 64 bit float into a 64 bit handle?
<antocuni> the length of what?
<cfbolz> you hide everything that's not a float in NaNs
<antocuni> ah right
<cfbolz> antocuni: anything ;-)
<cfbolz> probably only tuples and stuff
<antocuni> ah, so that HPy_Length is very fast
<antocuni> brilliant
<cfbolz> yes
<cfbolz> the NaN trick is fairly "standard", all the JS VMs do it
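A minimal sketch of the NaN trick for 64-bit handles, answering antocuni's question above: every non-float value is hidden in the payload bits of a quiet NaN, so real doubles fit unchanged. The tag layout here is purely illustrative, not GraalPython's actual encoding:

```python
import struct

# Toy NaN-boxing: a 64-bit "handle" is either the bit pattern of a real
# double, or a small int hidden inside the quiet-NaN space.
QNAN = 0x7ff8000000000000      # quiet-NaN bit pattern
TAG_INT = 0x0001000000000000   # hypothetical tag bit for small ints

def box_float(x):
    """Store a float as its own bit pattern (x must not itself be a NaN)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def box_int(n):
    """Hide a small non-negative int in the NaN payload bits."""
    assert 0 <= n < 2**32
    return QNAN | TAG_INT | n

def unbox(bits):
    if (bits & QNAN) == QNAN and bits & TAG_INT:
        return bits & 0xffffffff       # tagged int: payload is the value
    return struct.unpack("<d", struct.pack("<Q", bits))[0]  # plain double
```

With spare payload bits left over, one can also stash extra metadata such as a small sequence length, which is how an operation like HPy_Length can become very fast for tagged handles.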
<antocuni> phlebas, fangerer: you should really write a blog post about this stuff
<cfbolz> yep
<antocuni> yes, I think we use the NaN trick also in PyPy for lists of int-float-none, don't we?
<cfbolz> yes
<cfbolz> but it's much cooler in handles
<antocuni> so, if we port the handle tagging thing to CPython, HPy might end up being *faster* than standard C exts? :)
<cfbolz> yes, I was wondering
<cfbolz> it would allow doing it cleanly in CPython
<cfbolz> because you would cleanly know which code supports this: everything using hpy
<antocuni> how is it different than using tagged pointers in PyObject*, though?
<cfbolz> antocuni: you can't upgrade to that easily
<cfbolz> you would have to fix too many things
<antocuni> right, like all the existing extensions
<cfbolz> yes
<cfbolz> and hpy would give you a way to know which extensions are safe
<antocuni> yes
<antocuni> and the super nice thing is that the extensions don't even need to know
<cfbolz> exactly
<cfbolz> they "just" need to use hpy correctly
<antocuni> and that's where the debug mode helps
<cfbolz> and pypy as a debugging tool (its main use anyway)
<antocuni> "pypy as a debugging tool"? What do you mean?
<cfbolz> antocuni: you find problems in your module on pypy-hpy that you wouldn't see on CPython if you still use PyObject* in some corner, no?
<antocuni> hopefully these problems should be caught by the debug mode
<antocuni> but yes, pypy as a secondary debug mode is helpful as well