cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and staying out of design decisions
glyph has quit [Remote host closed the connection]
glyph has joined #pypy
jinsun has quit [Ping timeout: 260 seconds]
<cfbolz> korvo: hm, unclear I think
<cfbolz> korvo: eg we had some networkx benchmarks in some paper, I think, and the performance was fine
slav0nic has joined #pypy
vstinner has joined #pypy
<vstinner> hi. i'm working on two C APIs to import/export strings and integers: PEP 756 and PEP 757. I would like to know if PyPy would benefit from these PEPs, and whether it would be easy to implement these APIs on PyPy?
<vstinner> is it the right place (IRC) to ask such a question?
<cfbolz> vstinner: I think none of these APIs are efficiently implementable in pypy :-(
<cfbolz> our unicode strings are stored in none of the three CPython kinds; it's always UTF-8
<nikolar> wait what are the 3 cpython-kinds
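(For context on nikolar's question: since PEP 393, CPython stores each str in one of three fixed-width kinds — UCS1, UCS2, or UCS4 — chosen by the widest code point in the string. A quick CPython-only sketch of the effect; the exact byte counts are implementation details, but the ordering is not:)

```python
import sys

# Three strings of equal length whose widest code point forces a
# different storage kind in CPython (PEP 393):
ucs1 = "\u00ff" * 100      # latin-1 range -> 1 byte per code point
ucs2 = "\u1234" * 100      # BMP           -> 2 bytes per code point
ucs4 = "\U0001f600" * 100  # astral plane  -> 4 bytes per code point

sizes = [sys.getsizeof(s) for s in (ucs1, ucs2, ucs4)]
# Wider kinds need strictly more memory for the same length.
assert sizes[0] < sizes[1] < sizes[2]
```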
<vstinner> cfbolz: for PEP 756, I added PyUnicode_FORMAT_UTF8 for PyPy. PyPy should only implement this format and ignore/reject other formats
<cfbolz> vstinner: do we expect C extensions to request utf8 if that will always fail or be inefficient on cpython?
<vstinner> cfbolz: the PEP recommends requesting all four formats: (UCS1 | UCS2 | UCS4 | UTF8)
<cfbolz> in any case, there's always a memory copy needed for us, we cannot hand out internal pointers
<vstinner> cfbolz: i expect PyUnicode_Export(PyUnicode_FORMAT_UTF8) to behave as PyUnicode_AsUTF8(). how is PyUnicode_AsUTF8() implemented in PyPy? does it pin memory?
<vstinner> cfbolz: is it possible to "temporarily" pin a Python str object in memory while the string is exported, and then unpin it when PyBuffer_Release() is called?
<nikolar> they really don't do utf8
<vstinner> nikolar: would you mind elaborating?
<nikolar> python doesn't use utf8 internally
<nikolar> but one of ucs{1,2,4}
<vstinner> nikolar: CPython? CPython has a cache for PyUnicode_AsUTF8(): the string is only encoded once
<cfbolz> nikolar: pypy does it differently
<cfbolz> vstinner: no, there's no way to pin GC memory for a potentially indefinite amount of time, so PyUnicode_AsUTF8 copies
<cfbolz> (and there isn't even a cache so far :-( )
<vstinner> cfbolz: how does PyPy release the PyUnicode_AsUTF8() memory? how do you know when the caller is done with the copy?
<cfbolz> vstinner: ah, I see, I was wrong
<cfbolz> when we hand out a unicode object, we allocate a small C struct with various fields
<cfbolz> and that includes a buffer that is of one of the ucs1, ucs2, ucs4 kinds, and also a pointer to a C-allocated utf8 string, maybe? I don't understand the code completely yet
<vstinner> cfbolz: "in any case, there's always a memory copy needed for us, we cannot hand out internal pointers" that's because Python str objects can be moved in memory?
<cfbolz> yep. we can pin things under some circumstances, but not for an essentially indefinite length of time
<vstinner> cfbolz: the difference with PyUnicode_AsUTF8() is that PyUnicode_Export() requires calling PyBuffer_Release(): you can unpin or free memory in PyBuffer_Release() — basically, do whatever you want
<cfbolz> so the lifetime would be much more clearly defined from the way the api is used, that's good
<vstinner> cfbolz: it's unfortunate that a memory copy is needed, but it's not so bad; the API is fine with a copy. anyway, thanks for all the details
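(A Python model of the lifetime contract just discussed, with hypothetical names — this is not the PEP 756 C API: the export hands out a copy, and the paired release call frees it, so an implementation with a moving GC and no pinning, like PyPy, can still comply:)

```python
class StrExport:
    """Hypothetical model of PyUnicode_Export/PyBuffer_Release
    semantics: the exporter may hand out a copy; the caller must
    call release() when done."""

    def __init__(self, s):
        self._buf = s.encode("utf-8")  # a copy, as PyPy would make
        self._released = False

    @property
    def buf(self):
        assert not self._released, "use after release"
        return self._buf

    def release(self):
        # In C, this is where pinned or copied memory would be freed.
        self._released = True
        self._buf = None

exp = StrExport("héllo")
assert exp.buf == "héllo".encode("utf-8")
exp.release()
```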
<vstinner> by the way, a copy is also needed on CPython in some cases, for example if only UCS-4 is requested, and the string uses UCS-1 internally
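(That widening case can be seen from pure Python: exporting a latin-1-range string as UCS-4 necessarily materializes a copy at 4 bytes per code point. UTF-32-LE code units are exactly UCS-4, so it serves as a model here:)

```python
s = "abc"  # stored as UCS-1 internally in CPython
# Requesting a UCS-4 export forces a wider copy: 4 bytes per code point.
ucs4_bytes = s.encode("utf-32-le")
assert len(ucs4_bytes) == 4 * len(s)
```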
<vstinner> i always forget that PyPy objects can move in memory, it prevents some "optimizations" (like giving a direct access to object contents)
<vstinner> cfbolz, nikolar : to export integers, https://peps.python.org/pep-0757/, is it the same issue? Python int objects can be moved in memory, and so memory cannot be pinned?
<cfbolz> yeah
<vstinner> again, in this API, PyLong_FreeDigitArray() must be called when the caller is done with the "export". if memory is copied, that's the place to release it
<cfbolz> vstinner: the expected use of that api is if you're dealing with very big ints I expect, right?
<vstinner> cfbolz: from what i understood, the most common case is small integers that fit into a C long
<vstinner> cfbolz: that's why the PEP suggests adding a fast path using PyLong_AsLong() or a function like that
<vstinner> (don't call PyLong_AsDigitArray() for small numbers, only for "large" numbers)
<cfbolz> yeah, I would hope that most people call PyLong_AsLong first
<vstinner> cfbolz: so again, in PyPy, PyLong_AsDigitArray() will need to copy digits?
<cfbolz> or maybe even PyLong_AsSsize_t to work better on win64
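(The fast-path pattern being discussed, modeled in Python — `export_int` is a hypothetical name; a real C consumer would try PyLong_AsLong first and fall back to the digit export only for big ints:)

```python
def export_int(n):
    """Sketch of the recommended consumer pattern: try the cheap
    fixed-width path first, fall back to an arbitrary-precision
    export (here via to_bytes) only when the value doesn't fit."""
    if -2**63 <= n < 2**63:           # fits a C int64_t: fast path
        return ("fast", n)
    nbytes = (n.bit_length() + 8) // 8  # +8 leaves room for the sign bit
    return ("slow", n.to_bytes(nbytes, "little", signed=True))

assert export_int(42) == ("fast", 42)
kind, payload = export_int(10**30)
assert kind == "slow"
assert int.from_bytes(payload, "little", signed=True) == 10**30
```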
<vstinner> it's not possible to pin a Python int object in memory until PyLong_FreeDigitArray() is called?
<vstinner> cfbolz: by the way, i added PyLong_AsUInt64() to Python 3.14 :-) it might be more convenient for such usage
<cfbolz> yeah, maybe
<cfbolz> vstinner: I suppose it's possible to pin a digit array, but it would really expose all the details of the internal representation
<cfbolz> and I understand the usecases even less than for unicode
<vstinner> cfbolz: the usage is converting a Python int to a GMP integer. the API is an attempt to minimize the overhead of this conversion, avoiding a temporary buffer if possible
<vstinner> cfbolz: for CPython, it's an abstraction on top of PyLongObject, since the PyLongObject structure changed multiple times (Python 3.9, 3.12) and it may change again (for small integers)
<cfbolz> wouldn't an alternative be to have an API that gives you the bitlength and then have an api that gives you 64bits at a time?
<cfbolz> that sounds much less dependent on the internal representation
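(cfbolz's representation-independent alternative, sketched in Python with made-up names: query the bit length, then read the magnitude 64 bits at a time with shifts and masks, least significant word first:)

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def export_words(n):
    """Read a non-negative int 64 bits at a time; only needs
    bit_length, shifts and masks, not the internal digit layout."""
    assert n >= 0
    nwords = max(1, (n.bit_length() + 63) // 64)
    return [(n >> (64 * i)) & MASK64 for i in range(nwords)]

def import_words(words):
    n = 0
    for i, w in enumerate(words):
        n |= w << (64 * i)
    return n

n = 2**200 + 12345
assert import_words(export_words(n)) == n
assert len(export_words(2**200)) == 4   # 201 bits -> 4 words
```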
<vstinner> there is also the PyLongWriter_Create() API to create a Python integer from an array of digits. it's better than the private undocumented _PyLong_New(), which creates an incomplete/undefined Python int object and expects the caller to fill in the digits. again, it's a thin abstraction
<vstinner> cfbolz: in CPython, PyLong_AsDigitArray() doesn't copy memory. so it's efficient :-)
<vstinner> cfbolz: for PyPy, you're free to define your own layout with 64-bit digits and use this layout in PyLong_AsDigitArray(). it doesn't have to be the same layout as the one used internally in PyPy
<cfbolz> ok, but then you get the digit array and need to interpret it in just the right way
<vstinner> each Python implementation exposes its own "native layout" for these APIs: https://peps.python.org/pep-0757/#layout-api
<vstinner> for CPython, it's just the PyLongObject implementation: 15-bit or 30-bit digits
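(A sketch of that 30-bit digit layout — least-significant digit first, which is the "native layout" CPython would expose through the PEP; the helper names are made up:)

```python
DIGIT_BITS = 30
DIGIT_MASK = (1 << DIGIT_BITS) - 1  # 0x3FFFFFFF

def to_digits30(n):
    """Decompose a non-negative int into CPython-style 30-bit
    digits, least significant digit first."""
    assert n >= 0
    digits = [n & DIGIT_MASK]
    n >>= DIGIT_BITS
    while n:
        digits.append(n & DIGIT_MASK)
        n >>= DIGIT_BITS
    return digits

def from_digits30(digits):
    return sum(d << (DIGIT_BITS * i) for i, d in enumerate(digits))

n = 123456789012345678901234567890
assert from_digits30(to_digits30(n)) == n
assert to_digits30(0) == [0]
```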
<cfbolz> 👍
<cfbolz> vstinner: what does the current code of gmpy2 look like?
<vstinner> cfbolz: there are code examples in the Benchmarks section: https://peps.python.org/pep-0757/#export-pylong-asdigitarray-with-gmpy2
<vstinner> cfbolz: that's the code patched for PEP 757 API
<cfbolz> ah, I see, right now they are poking at the internals
<vstinner> cfbolz: right now, gmpy2 uses PyLongObject.ob_digit. so it requires having a PyObject, which is inefficient for PyPy, no?
<vstinner> PEP 757 should make it possible to avoid creating a PyObject (and tracking it) in cpyext
<cfbolz> we don't have ob_digits, I think
<vstinner> where GET_OB_DIGIT() gets PyLongObject.ob_digit
<vstinner> PyPy: typedef struct { PyObject_HEAD } PyLongObject;
<vstinner> oh, right. there is no .ob_digit member. so i guess that gmpy2 doesn't support PyPy currently
<cfbolz> yep
<vstinner> PEP 757 might make it easier for gmpy2, SAGE and python-FLINT to support PyPy (no idea if it would be enough :-p)
<cfbolz> but given that we don't have pinning (and very unlikely to get it) a much better api to use from pypy's pov is simply calling int.to_bytes ;-)
<cfbolz> and from_bytes
<vstinner> cfbolz: ah right, int.to_bytes/from_bytes should work
<cfbolz> should be mentioned in the pep, maybe
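(The to_bytes/from_bytes route works unchanged on CPython and PyPy and needs no pinning, at the cost of an unconditional copy; a minimal round trip:)

```python
# Round-trip an arbitrary-precision int through bytes.
n = -(10**40)
nbytes = (abs(n).bit_length() + 8) // 8  # +8 leaves room for the sign bit
buf = n.to_bytes(nbytes, "little", signed=True)
assert int.from_bytes(buf, "little", signed=True) == n
```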
ruth2345345 has joined #pypy
<vstinner> my quick summary: https://discuss.python.org/t/pep-757-c-api-to-import-export-python-integers/63895/12 -- feel free to elaborate if i'm wrong :)
<vstinner> cfbolz: thanks again
jcea has joined #pypy
<korvo> cfbolz: Ah, sure. I'm thinking interp-level instead of user-level, implementing languages like Nix, Haskell, or Unlambda. Right now I *assume* that the G-machine approach is fastest: don't build a graph, instead emit bytecode that would build the graph.
<korvo> But there's always a possibility that the JIT prefers the graph actions to be direct, and I haven't looked at this in the better part of a decade.
<cfbolz> g-machine means it's a bytecode, right?
<cfbolz> haskell is tricky to implement; the last time I did it was like 15 years ago
jcea has quit [Ping timeout: 246 seconds]
<korvo> Yeah, this is tricky enough that I'd want to implement it once as a library and reuse it in multiple interpreters.
Dejan has joined #pypy
vstinner has left #pypy [#pypy]
jinsun has joined #pypy
Dejan has quit [Quit: Leaving]
krono_ has joined #pypy
idnar_ has joined #pypy
atomizer has joined #pypy
glyph_ has joined #pypy
glyph has quit [*.net *.split]
atomizer_ has quit [*.net *.split]
krono has quit [*.net *.split]
idnar has quit [*.net *.split]
dbohdan[phone] has quit [*.net *.split]
glyph_ is now known as glyph
krono_ is now known as krono
idnar_ is now known as idnar
dbohdan[phone] has joined #pypy
slav0nic has quit [Ping timeout: 245 seconds]