<antocuni>
fangerer: thank you, I linked the new issue from the old
<antocuni>
I think we can leave the issue open. This way, if someone wants to submit a PR for just BytesBuilder (once we decide which API to use), that PR will be able to close the issue, which feels good :)
<cfbolz>
Indeed, the import ancestors of utf-8 is a great point, also from PyPy's pov
<cfbolz>
importance
<antocuni>
cfbolz: what do you mean by "import ancestors"?
<cfbolz>
'importance'
<cfbolz>
antocuni: The typos on phones are different than keyboard typos 🤣
<antocuni>
the funny thing is that it was not obvious it was a typo, "import ancestors" could kind of make sense in this context 😅
<antocuni>
yes, I considered the UTF-8 case in my study: I couldn't find any existing code in which a native UTF-8 builder would be preferable to the existing PyUnicode_FromString or PyUnicode_DecodeUTF8
<antocuni>
and indeed, this fact alone is worth mentioning, but I stupidly didn't think of that
<antocuni>
also, the fact that I couldn't find this kind of code does not mean that it doesn't exist, of course
<antocuni>
I'll try to summarize my findings and reply to the ML
<cfbolz>
antocuni: ok, but note that in pypy's rpython code we use the utf8-builder a lot
<antocuni>
uhm, that's also a valid point
<antocuni>
maybe such C code does not exist because it's not possible/efficient on CPython
<cfbolz>
yes
<antocuni>
also, the only reasonable use case I can think of is when you know in advance the total length of the utf-8 builder: this way you can pre-allocate the buffer, read() bytes into it and build the string
<antocuni>
if you don't know the exact length, it's likely that you want to read() it into a temporary buffer and copy/compact it later. In such a case, PyUnicode_FromString is more than enough
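[Editor's sketch] The unknown-length pattern described above could look roughly like this in plain C. The helper name `read_all_unknown_length` and the 4096-byte chunk size are assumptions made for illustration; nothing here is a CPython or HPy API. The NUL-terminated buffer it returns is the kind of input one would then hand to PyUnicode_FromString.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of CPython or HPy): read everything from fd
   when the total length is NOT known in advance. Data goes through a
   temporary chunk and is copied into a growing heap buffer.
   Returns a malloc()ed, NUL-terminated buffer (caller frees), or NULL. */
char *read_all_unknown_length(int fd, size_t *out_len)
{
    char chunk[4096];                 /* temporary buffer for each read() */
    size_t cap = 4096, used = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted, retry */
            free(buf);
            return NULL;
        }
        if (n == 0)
            break;                    /* EOF */

        /* Grow geometrically so the new bytes plus a trailing NUL fit. */
        if (used + (size_t)n + 1 > cap) {
            size_t new_cap = cap;
            while (used + (size_t)n + 1 > new_cap)
                new_cap *= 2;
            char *grown = realloc(buf, new_cap);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap = new_cap;
        }
        memcpy(buf + used, chunk, (size_t)n);   /* the extra copy */
        used += (size_t)n;
    }

    buf[used] = '\0';
    *out_len = used;
    return buf;
}
```

Since the data is already copied once out of the temporary chunk, one more copy inside PyUnicode_FromString changes little, which is the point being made above.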
<antocuni>
but indeed, I can think of two very important use cases in which you DO know the length in advance: if you want to read a whole file and if you want to read a whole HTTP request
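[Editor's sketch] For the known-length case (a whole file, a whole HTTP request), the idea of pre-allocating once and calling read() straight into the final storage might be sketched as follows. The `utf8_builder` type and its three functions are hypothetical names invented for this example; they are not an existing or proposed HPy API, and UTF-8 validation is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-capacity UTF-8 builder; NOT an HPy or CPython API. */
typedef struct {
    char  *data;      /* pre-allocated storage, capacity + 1 bytes */
    size_t capacity;  /* total length in bytes, known in advance */
    size_t used;      /* bytes filled so far */
} utf8_builder;

/* Allocate the whole buffer once. Returns 0 on success, -1 on failure. */
int utf8_builder_init(utf8_builder *b, size_t total_len)
{
    b->data = malloc(total_len + 1);          /* +1 for a trailing NUL */
    if (b->data == NULL)
        return -1;
    b->capacity = total_len;
    b->used = 0;
    return 0;
}

/* read() directly into the builder, with no temporary buffer, until it is
   full or EOF is reached. Returns 0 on success, -1 on a read error. */
int utf8_builder_fill_from_fd(utf8_builder *b, int fd)
{
    while (b->used < b->capacity) {
        ssize_t n = read(fd, b->data + b->used, b->capacity - b->used);
        if (n < 0) {
            if (errno == EINTR)
                continue;                     /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;                            /* EOF before expected length */
        b->used += (size_t)n;
    }
    return 0;
}

/* Hand the bytes over without copying them. The caller owns the returned
   buffer and must free() it; the builder is left empty. */
char *utf8_builder_build(utf8_builder *b, size_t *out_len)
{
    assert(b->data != NULL);                  /* must be initialized */
    char *result = b->data;
    result[b->used] = '\0';
    *out_len = b->used;
    b->data = NULL;
    b->capacity = 0;
    b->used = 0;
    return result;
}
```

In this sketch the "build" step only transfers ownership of the buffer. An implementation that stores strings as UTF-8 internally could in principle adopt the bytes as they are, whereas on CPython today the buffer would still have to go through PyUnicode_DecodeUTF8, which creates a separate string object.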
<Hodgestar>
antocuni: I think there aren't any UTF-8 builders because it would make very little sense to do that in CPython as it is now, not because it's not a good idea.
<Hodgestar>
antocuni: I think my view is a bit that UCS-1,2,4 is an awkward implementation detail that the CPython API exposed. So we should make it easy to port such code to HPy, but we should consider it legacy support and not the direction that most implementations will take in the future.
<Hodgestar>
Although there is a lot of weird stuff in this space -- e.g. I have no idea what Windows does (Unixes seem to treat many things as bytes and leave encoding up to applications, but my impression is that the story is more complicated on Windows).
<antocuni>
Hodgestar: I agree with all you say
<antocuni>
but we need to keep in mind that the primary goal of HPy is not to fix all the quirks of the CPython API
<antocuni>
the primary goal is to be adopted by as many extensions as possible
<antocuni>
so, we HAVE to provide an easy migration path for all the extensions which are using the current UCS-x API
<energizer>
it would be nice to note "this is provided for backward compatibility, but new users are advised to do xyz instead"
<antocuni>
yes, this is doable and it looks like a good idea