#hpy on 2021-06-11 — irc logs at libera.irclog.whitequark.org

2021-05-27 19:57 antocuni changed the topic of #hpy to: https://hpyproject.org - https://github.com/hpyproject/hpy - IRC logs: https://libera.irclog.whitequark.org/hpy

03:22 jevinskie[m] has quit [Ping timeout: 244 seconds]

03:22 jboi has quit [Ping timeout: 244 seconds]

06:43 <fangerer> antocuni: thanks, I'll have a look today. Should we then close https://github.com/hpyproject/hpy/issues/181 ?

07:23 jboi has joined #hpy

07:24 jevinskie[m] has joined #hpy

07:46 <antocuni> fangerer: thank you, I linked the new issue from the old

07:47 <antocuni> I think we can leave the issue open. This way if someone wants to submit a PR for just BytesBuilder (once we decide the API to use), it will be able to close the issue, which feels good :)

16:15 <antocuni> FWIW, there is some interesting discussion about the string builder API going on on the capi-sig mailing list: https://mail.python.org/archives/list/capi-sig@python.org/thread/XHRT5DZZTMWOPWRYFLSILO2PFGLDX5ML/

16:23 <cfbolz> antocuni: nice, actually useful

16:24 <cfbolz> Indeed, the import ancestors of utf-8 is a great point, also from PyPy's pov

16:24 <cfbolz> importance

17:49 <antocuni> cfbolz: what do you mean by "import ancestors"?

17:59 <cfbolz> 'importance'

18:01 <cfbolz> antocuni: The typos on phones are different than keyboard typos 🤣

18:10 <antocuni> the funny thing is that it was not obvious it was a typo, "import ancestors" could almost make sense in this context 😅

18:11 <antocuni> yes, I considered the UTF-8 case in my study: I couldn't find any existing code in which a native UTF-8 builder would be preferable to the existing PyUnicode_FromString or PyUnicode_DecodeUTF8

18:12 <antocuni> and indeed, this fact alone is worth mentioning, but I stupidly didn't think of that

18:12 <antocuni> also, the fact that I couldn't find this kind of code does not mean that it doesn't exist, of course

18:13 <antocuni> I'll try to summarize my findings and reply to the ML

18:13 <cfbolz> antocuni: ok, but note that in pypy's rpython code we use the utf8-builder a lot

18:15 <antocuni> uhm, that's also a valid point

18:15 <antocuni> maybe such C code does not exist because it's not possible/efficient on CPython

18:16 <cfbolz> yes

18:16 <antocuni> also, the only reasonable use case I can think of is when you know in advance the total length of the utf-8 builder: this way you can pre-allocate the buffer, read() bytes into it and build the string

18:17 <antocuni> if you don't know the exact length, it's likely that you want to read() it into a temporary buffer and copy/compact it later. In such a case, PyUnicode_FromString is more than enough

18:17 <antocuni> but indeed, I can think of two very important use cases in which you DO know the length in advance: if you want to read a whole file and if you want to read a whole HTTP request
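
The two cases described above (total length known in advance versus unknown) can be sketched in Python as a rough analogue of the C-level pattern. This is only an illustration under stated assumptions: the function names `read_known_length` and `read_unknown_length` are hypothetical, an in-memory `io.BytesIO` stands in for a file or HTTP body, and Python's `bytearray.decode` still copies into a new `str`, so it does not show the zero-copy benefit a native UTF-8 builder might offer.

```python
import io


def read_known_length(stream, total_len):
    """Length known up front: allocate once, fill in place, decode once."""
    buf = bytearray(total_len)            # single pre-allocation
    view = memoryview(buf)
    filled = 0
    while filled < total_len:
        n = stream.readinto(view[filled:])
        if not n:                         # 0 at EOF, None if no data is available
            raise EOFError("stream ended before total_len bytes were read")
        filled += n
    return buf.decode("utf-8")


def read_unknown_length(stream, chunk_size=4096):
    """Length unknown: gather temporary chunks, then compact and decode."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    # Join before decoding so multi-byte sequences split across chunks stay intact.
    return b"".join(chunks).decode("utf-8")


# Hypothetical payload standing in for "a whole file" or "a whole HTTP request".
payload = "héllo wörld €".encode("utf-8")
text_a = read_known_length(io.BytesIO(payload), len(payload))
text_b = read_unknown_length(io.BytesIO(payload), chunk_size=4)
```

In the unknown-length case the data is copied at least once during compaction anyway, which is consistent with the remark that PyUnicode_FromString is enough there.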

19:33 computerfarmer has joined #hpy

19:55 computerfarmer has quit [Quit: Konversation terminated!]

20:14 <Hodgestar> antocuni: I think there aren't any UTF-8 builders because it would make very little sense to do that in CPython as it is now, not because it's not a good idea.

20:16 <Hodgestar> antocuni: My view is roughly that UCS-1,2,4 is an awkward implementation detail that the CPython API exposed. So we should make it easy to port such code to HPy, but we should consider it legacy support and not the direction that most implementations will take in the future.
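
For readers unfamiliar with the UCS-1,2,4 detail mentioned above: on CPython 3.3 and later, a string is stored with 1, 2, or 4 bytes per character depending on its widest code point. A minimal sketch that makes this visible, assuming CPython (other implementations such as PyPy may report different sizes) and using a hypothetical helper named `bytes_per_char`:

```python
import sys


def bytes_per_char(ch):
    # Compare two freshly built strings of the same kind so the fixed
    # object header cancels out and only the per-character cost remains.
    return (sys.getsizeof(ch * 2000) - sys.getsizeof(ch * 1000)) // 1000


samples = [("ascii", "a"), ("latin-1", "é"), ("bmp", "€"), ("astral", "😀")]
widths = {name: bytes_per_char(ch) for name, ch in samples}
# On CPython this gives 1, 1, 2 and 4 bytes per character respectively.
```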

20:17 <Hodgestar> Although there is a lot of weird stuff in this space -- e.g. I have no idea what Windows does (Unixes seem to treat many things as bytes and leave encoding up to applications, but my impression is that the story is more complicated on Windows).

21:13 <antocuni> Hodgestar: I agree with all you say

21:14 <antocuni> but we need to keep in mind that the primary goal of HPy is not to fix all the quirks of the CPython API

21:14 <antocuni> the primary goal is to be adopted by as many extensions as possible

21:14 <antocuni> so, we HAVE to provide an easy migration path for all the extensions which are using the current UCS-x API

21:15 <energizer> it would be nice to note "this is provided for backward compatibility, but new users are encouraged to do xyz instead"

21:16 <antocuni> yes, this is doable and it looks like a good idea