cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and staying out of design decisions
glyph has quit [Remote host closed the connection]
glyph has joined #pypy
jinsun has quit [Ping timeout: 260 seconds]
<cfbolz> korvo: hm, unclear I think
<cfbolz> korvo: eg we had some networkx benchmarks in some paper, I think, and the performance was fine
slav0nic has joined #pypy
vstinner has joined #pypy
<vstinner> hi. i'm working on two C APIs to import/export strings and integers: PEP 756 and PEP 757. I would like to know if PyPy would benefit from these PEPs, and whether it would be easy to implement these APIs on PyPy?
<vstinner> is it the right place (IRC) to ask such a question?
<cfbolz> vstinner: I think none of these APIs are efficiently implementable in pypy :-(
<cfbolz> our unicode strings are stored in none of the three CPython kinds; it's always UTF-8
<nikolar> wait what are the 3 cpython-kinds
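(For context on nikolar's question: since PEP 393, CPython stores each str in one of three fixed-width kinds — UCS1, UCS2, or UCS4 — chosen by the widest code point in the string. A quick CPython-only sketch of the effect; the exact byte counts are implementation details, but the ordering is not:)

```python
import sys

# Three strings of equal length whose widest code point forces a
# different storage kind in CPython (PEP 393):
ucs1 = "\u00ff" * 100      # latin-1 range -> 1 byte per code point
ucs2 = "\u1234" * 100      # BMP           -> 2 bytes per code point
ucs4 = "\U0001f600" * 100  # astral plane  -> 4 bytes per code point

sizes = [sys.getsizeof(s) for s in (ucs1, ucs2, ucs4)]
# Wider kinds need strictly more memory for the same length.
assert sizes[0] < sizes[1] < sizes[2]
```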
<vstinner> cfbolz: for PEP 756, I added PyUnicode_FORMAT_UTF8 for PyPy. PyPy should only implement this format and ignore/reject other formats
<cfbolz> vstinner: do we expect C extensions to request utf8 if that will always fail or be inefficient on cpython?
<vstinner> cfbolz: the PEP recommends requesting all four formats: (UCS1 | UCS2 | UCS4 | UTF8)
<cfbolz> in any case, there's always a memory copy needed for us, we cannot hand out internal pointers
<vstinner> cfbolz: i expect PyUnicode_Export(PyUnicode_FORMAT_UTF8) to behave as PyUnicode_AsUTF8(). how is PyUnicode_AsUTF8() implemented in PyPy? does it pin memory?
<vstinner> cfbolz: is it possible to "temporarily" pin a Python str object in memory while the string is exported, and then unpin it when PyBuffer_Release() is called?
<nikolar> they really don't do utf8
<vstinner> nikolar: would you mind elaborating?
<nikolar> python doesn't use utf8 internally
<nikolar> but one of ucs{1,2,4}
<vstinner> nikolar: CPython? CPython has a cache for PyUnicode_AsUTF8(): the string is only encoded once
<cfbolz> nikolar: pypy does it differently
<cfbolz> vstinner: no, there's no way to pin GC memory for a potentially indefinite amount of time, so PyUnicode_AsUTF8 copies
<cfbolz> (and there isn't even a cache so far :-( )
<vstinner> cfbolz: how does PyPy release the PyUnicode_AsUTF8() memory? how do you know when the caller is done with the copy?
<cfbolz> vstinner: ah, I see, I was wrong
<cfbolz> when we hand out a unicode object, we allocate a small C struct with various fields
<cfbolz> and that includes a buffer that is of one of the ucs1, ucs2, ucs4 kinds, and also a pointer to a C-allocated utf8 string, maybe? I don't understand the code completely yet
<vstinner> cfbolz: "in any case, there's always a memory copy needed for us, we cannot hand out internal pointers" that's because Python str objects can be moved in memory?
<cfbolz> yep. we can pin things under some circumstances, but not for an essentially indefinite length of time
<vstinner> cfbolz: the difference with PyUnicode_AsUTF8() is that PyUnicode_Export() requires calling PyBuffer_Release(): you can unpin or free memory in PyBuffer_Release() — basically, do whatever you want
<cfbolz> so the lifetime would be much more clearly defined from the way the api is used, that's good
<vstinner> cfbolz: it's unfortunate that a memory copy is needed, but it's not so bad; the API is fine with a copy. anyway, thanks for all the details
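(A Python model of the lifetime contract just discussed, with hypothetical names — this is not the PEP 756 C API: the export hands out a copy, and the paired release call frees it, so an implementation with a moving GC and no pinning, like PyPy, can still comply:)

```python
class StrExport:
    """Hypothetical model of PyUnicode_Export/PyBuffer_Release
    semantics: the exporter may hand out a copy; the caller must
    call release() when done."""

    def __init__(self, s):
        self._buf = s.encode("utf-8")  # a copy, as PyPy would make
        self._released = False

    @property
    def buf(self):
        assert not self._released, "use after release"
        return self._buf

    def release(self):
        # In C, this is where pinned or copied memory would be freed.
        self._released = True
        self._buf = None

exp = StrExport("héllo")
assert exp.buf == "héllo".encode("utf-8")
exp.release()
```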
<vstinner> by the way, a copy is also needed on CPython in some cases, for example if only UCS-4 is requested, and the string uses UCS-1 internally
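(That widening case can be seen from pure Python: exporting a latin-1-range string as UCS-4 necessarily materializes a copy at 4 bytes per code point. UTF-32-LE code units are exactly UCS-4, so it serves as a model here:)

```python
s = "abc"  # stored as UCS-1 internally in CPython
# Requesting a UCS-4 export forces a wider copy: 4 bytes per code point.
ucs4_bytes = s.encode("utf-32-le")
assert len(ucs4_bytes) == 4 * len(s)
```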
<vstinner> i always forget that PyPy objects can move in memory, it prevents some "optimizations" (like giving a direct access to object contents)
<vstinner> cfbolz, nikolar : to export integers, https://peps.python.org/pep-0757/, is it the same issue? Python int objects can be moved in memory, and so memory cannot be pinned?
<cfbolz> yeah
<vstinner> again, in this API, PyLong_FreeDigitArray() must be called when the caller is done with the "export". if memory is copied, that's the place to release it
<cfbolz> vstinner: the expected use of that api is if you're dealing with very big ints I expect, right?
<vstinner> cfbolz: from what i understood, the most common case is small integers that fit into a C long
<vstinner> cfbolz: that's why the PEP suggests adding a fast path using PyLong_AsLong() or a function like that
<vstinner> (don't call PyLong_AsDigitArray() for small numbers, only for "large" numbers)
<cfbolz> yeah, I would hope that most people call PyLong_AsLong first
<vstinner> cfbolz: so again, in PyPy, PyLong_AsDigitArray() will need to copy digits?
<cfbolz> or maybe even PyLong_AsSsize_t to work better on win64
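(The fast-path pattern being discussed, modeled in Python — `export_int` is a hypothetical name; a real C consumer would try PyLong_AsLong first and fall back to the digit export only for big ints:)

```python
def export_int(n):
    """Sketch of the recommended consumer pattern: try the cheap
    fixed-width path first, fall back to an arbitrary-precision
    export (here via to_bytes) only when the value doesn't fit."""
    if -2**63 <= n < 2**63:           # fits a C int64_t: fast path
        return ("fast", n)
    nbytes = (n.bit_length() + 8) // 8  # +8 leaves room for the sign bit
    return ("slow", n.to_bytes(nbytes, "little", signed=True))

assert export_int(42) == ("fast", 42)
kind, payload = export_int(10**30)
assert kind == "slow"
assert int.from_bytes(payload, "little", signed=True) == 10**30
```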
<vstinner> it's not possible to pin a Python int object in memory until PyLong_FreeDigitArray() is called?
<vstinner> cfbolz: by the way, i added PyLong_AsUInt64() to Python 3.14 :-) it might be more convenient for such usage
<cfbolz> yeah, maybe
<cfbolz> vstinner: I suppose it's possible to pin a digit array, but it would really expose all the details of the internal representation
<cfbolz> and I understand the usecases even less than for unicode
<vstinner> cfbolz: the usage is converting a Python int to a GMP integer. the API is an attempt to minimize the overhead of this conversion, avoiding a temporary buffer if possible
<vstinner> cfbolz: for CPython, it's an abstraction on top of PyLongObject, since the PyLongObject structure changed multiple times (Python 3.9, 3.12) and it may change again (for small integers)
<cfbolz> wouldn't an alternative be to have an API that gives you the bitlength and then have an api that gives you 64bits at a time?
<cfbolz> that sounds much less dependent on the internal representation
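(cfbolz's representation-independent alternative, sketched in Python with made-up names: query the bit length, then read the magnitude 64 bits at a time with shifts and masks, least significant word first:)

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def export_words(n):
    """Read a non-negative int 64 bits at a time; only needs
    bit_length, shifts and masks, not the internal digit layout."""
    assert n >= 0
    nwords = max(1, (n.bit_length() + 63) // 64)
    return [(n >> (64 * i)) & MASK64 for i in range(nwords)]

def import_words(words):
    n = 0
    for i, w in enumerate(words):
        n |= w << (64 * i)
    return n

n = 2**200 + 12345
assert import_words(export_words(n)) == n
assert len(export_words(2**200)) == 4   # 201 bits -> 4 words
```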
<vstinner> there is also the PyLongWriter_Create() API to create a Python integer from an array of digits. it's better than the private undocumented _PyLong_New(), which creates an incomplete/undefined Python int object and expects the caller to fill in the digits. again, it's a thin abstraction
<vstinner> cfbolz: in CPython, PyLong_AsDigitArray() doesn't copy memory. so it's efficient :-)
<vstinner> cfbolz: for PyPy, you're free to define your own layout with 64-bit digits and use this layout in PyLong_AsDigitArray(). it doesn't have to be the same layout as the one used internally in PyPy
<cfbolz> ok, but then you get the digit array and need to interpret it in just the right way
<vstinner> each Python implementation exposes its own "native layout" for these APIs: https://peps.python.org/pep-0757/#layout-api
<vstinner> for CPython, it's just the PyLongObject implementation: 15-bit or 30-bit digits
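(A sketch of that 30-bit digit layout — least-significant digit first, which is the "native layout" CPython would expose through the PEP; the helper names are made up:)

```python
DIGIT_BITS = 30
DIGIT_MASK = (1 << DIGIT_BITS) - 1  # 0x3FFFFFFF

def to_digits30(n):
    """Decompose a non-negative int into CPython-style 30-bit
    digits, least significant digit first."""
    assert n >= 0
    digits = [n & DIGIT_MASK]
    n >>= DIGIT_BITS
    while n:
        digits.append(n & DIGIT_MASK)
        n >>= DIGIT_BITS
    return digits

def from_digits30(digits):
    return sum(d << (DIGIT_BITS * i) for i, d in enumerate(digits))

n = 123456789012345678901234567890
assert from_digits30(to_digits30(n)) == n
assert to_digits30(0) == [0]
```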
<cfbolz> 👍
<cfbolz> vstinner: what does the current code of gmpy2 look like?
<vstinner> cfbolz: there are code examples in the Benchmarks section: https://peps.python.org/pep-0757/#export-pylong-asdigitarray-with-gmpy2
<vstinner> cfbolz: that's the code patched for PEP 757 API
<cfbolz> ah, I see, right now they are poking at the internals
<vstinner> cfbolz: right now, gmpy2 uses PyLongObject.ob_digit. so it requires having a PyObject, which is inefficient for PyPy, no?
<vstinner> PEP 757 should make it possible to avoid creating a PyObject (and tracking it) in cpyext
<cfbolz> we don't have ob_digits, I think
<vstinner> where GET_OB_DIGIT() gets PyLongObject.ob_digit
<vstinner> PyPy: typedef struct { PyObject_HEAD } PyLongObject;
<vstinner> oh, right. there is no .ob_digit member. so i guess that gmpy2 doesn't support PyPy currently
<cfbolz> yep
<vstinner> PEP 757 might make it easier for gmpy2, SAGE and python-FLINT to support PyPy (no idea if it would be enough :-p)
<cfbolz> but given that we don't have pinning (and very unlikely to get it) a much better api to use from pypy's pov is simply calling int.to_bytes ;-)
<cfbolz> and from_bytes
<vstinner> cfbolz: ah right, int.to_bytes/from_bytes should work
<cfbolz> should be mentioned in the pep, maybe
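(The to_bytes/from_bytes route works unchanged on CPython and PyPy and needs no pinning, at the cost of an unconditional copy; a minimal round trip:)

```python
# Round-trip an arbitrary-precision int through bytes.
n = -(10**40)
nbytes = (abs(n).bit_length() + 8) // 8  # +8 leaves room for the sign bit
buf = n.to_bytes(nbytes, "little", signed=True)
assert int.from_bytes(buf, "little", signed=True) == n
```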
ruth2345345 has joined #pypy
<vstinner> my quick summary: https://discuss.python.org/t/pep-757-c-api-to-import-export-python-integers/63895/12 -- feel free to elaborate if i'm wrong :)
<vstinner> cfbolz: thanks again
jcea has joined #pypy
<korvo> cfbolz: Ah, sure. I'm thinking interp-level instead of user-level, implementing languages like Nix, Haskell, or Unlambda. Right now I *assume* that the G-machine approach is fastest: don't build a graph, instead emit bytecode that would build the graph.
<korvo> But there's always a possibility that the JIT prefers the graph actions to be direct, and I haven't looked at this in the better part of a decade.
<cfbolz> g-machine means it's a bytecode, right?
<cfbolz> haskell is tricky to implement; the last time I did it was like 15 years ago
jcea has quit [Ping timeout: 246 seconds]
<korvo> Yeah, this is tricky enough that I'd want to implement it once as a library and reuse it in multiple interpreters.
Dejan has joined #pypy
vstinner has left #pypy [#pypy]
jinsun has joined #pypy
Dejan has quit [Quit: Leaving]
krono_ has joined #pypy
idnar_ has joined #pypy
atomizer has joined #pypy
glyph_ has joined #pypy
glyph has quit [*.net *.split]
atomizer_ has quit [*.net *.split]
krono has quit [*.net *.split]
idnar has quit [*.net *.split]
dbohdan[phone] has quit [*.net *.split]
glyph_ is now known as glyph
krono_ is now known as krono
idnar_ is now known as idnar
dbohdan[phone] has joined #pypy
slav0nic has quit [Ping timeout: 245 seconds]