cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | hacking on TLS is fun, way more fun than arguing over petty shit, turns out
<ctismer> mattip: Yes I will test it. But it will not be what I need. I am changing builtin types, like `PyCFunction_Type` or `PyType_Type`, adding some attribute. I think `PyType_Ready` would already have been called before I have a chance.
<ctismer> mattip: If that really should be supported, I think a special version of `PyDict_SetItem` that does not check would be easier. But I'm fine with re-implementing it all using heaptypes.
otisolsen70 has joined #pypy
<mattip> ctismer: It sounds strange to me to try to change attributes of built-in types, on app level if I try
<mattip> type.abc = 3
<mattip> I get a TypeError
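The app-level behaviour mattip describes can be reproduced in a few lines. This is a minimal sketch; the helper name `try_set_builtin_attr` is made up for illustration, and the exact error message varies between interpreters and versions.

```python
def try_set_builtin_attr():
    """Try to set a new attribute on the built-in `type` from Python code.

    Returns the TypeError that was raised, or None if the assignment
    unexpectedly succeeded (in which case the attribute is removed again).
    """
    try:
        type.abc = 3
    except TypeError as exc:
        return exc
    del type.abc  # only reached if the interpreter allowed the mutation
    return None


err = try_set_builtin_attr()
print(type(err).__name__, err)
```

The C-level route ctismer mentions (writing into the type's dict with `PyDict_SetItem`) bypasses this check on CPython and is not shown here.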
<ctismer> mattip: It works on CPython since there is a difference between interpreter and C API. Funny that PyPy does not have that.
<ctismer> mattip: `PyDict_SetItem` does the trick. In Python, there is a layer in between that prevents it.
<mgorny> do you have any idea how to handle packages that refuse to support pypy? i'm talking of 'regex' here whose upstream basically said they won't ever support pypy because they rely on constant-width character encodings
<LarstiQ> mgorny: that looks like an implementation detail that CPython might also change?
<cfbolz> It's a bit of a weird reason
<cfbolz> We will happily give them a ucs-4 string if they ask from C
<cfbolz> Using the standard APIs
<mgorny> LarstiQ: yes, sounds like it
<mgorny> they could also have a pure python fallback or sth
<mgorny> i don't really know what to do at this point
<mgorny> going all over the place and telling people 'please don't use regex because it refuses to support pypy' doesn't feel right
<cfbolz> It's kind of the truth though?
<mgorny> and i don't know enough about pypy to try to convince him to support it
<cfbolz> mgorny: basically the cost is one extra copy
<cfbolz> Which you might or might not be prepared to pay
<mgorny> heh, _regex.c is 26.5k lines of code
<mgorny> i'm not surprised he doesn't want to maintain that
<mgorny> well, commented, let's see what happens
<cfbolz> mgorny: I don't think there's much more to do if even a pr doesn't help
<fijal> sorry to hear that though :/
otisolsen70 has quit [Ping timeout: 268 seconds]
otisolsen70 has joined #pypy
<mattip> there are projects that find the burden to support PyPy in CI too onerous
slav0nic has joined #pypy
<cfbolz> We could easily support mutating builtin classes from C
<cfbolz> Need to decide whether we want to
<cfbolz> Usually our rule of thumb is 'if cpython supports it, we do too'
<cfbolz> mattip: ^^
<mattip> we would allow modifying builtins from C? I think that is a bug in CPython not to be emulated
<cfbolz> mattip: people that do it get what they deserve ;-)
<mattip> right, including finding out it is not supported on PyPy
<cfbolz> (what ctismer is doing is quite a safe thing, btw, he adds new attributes only)
<mattip> :shrug: who knows what is safe. Today ctismer sets an attribute, tomorrow someone else sets the same attribute differently
<mattip> because of some bug/feature/problem
<mattip> now behaviour is inconsistent
<ctismer> mattip: cfbolz actually, this approach had very little impact. I could patch the new attr in without knowing much of the type. Creating a heaptype correctly and then using that is more involved (and sure, I was lazy...).
<mattip> I guess my argument is a bit of a reach, since the same could be said of any object in python
<fijal> cfbolz: pfff pfff pfff, may I moan a bit?
<mattip> but it strikes me as unexpected to modify builtins
<cfbolz> mattip: ruby for example just allows mutation of all builtin types
<ctismer> mattip: Actually, I did not expect that it would take four years and CPython still does not have `__signature__` for PyCFunction objects. I thought this was just a quick dive into Guido's time machine.
<cfbolz> fijal: always
<cfbolz> ctismer: yeah
<fijal> essentially http.py is full of that :/
<cfbolz> Haaaaah
<fijal> I think the whole file needs a review from performance perspective
<fijal> cfbolz: do you think you can review my branch?
<cfbolz> Can take a look in a bit
<cfbolz> fijal: does it help numbers?
<fijal> I think it's more or less ready
<fijal> yes, quite a bit
<fijal> (but not all the way to the non-cffi version; that gap might be unrelated, it's hard to check)
<fijal> like more cpython hacks might be the culprit here
<cfbolz> fijal: doesn't the JIT remove memoryview(x)?
<fijal> surely not
<fijal> you need to get an object that has a real address of something
<fijal> I'm pretty sure it hits some dont_look_inside with crazy JIT operations
* fijal checks
<fijal> _io module is newer than pypy2 6.0 right?
<fijal> maybe it does, actually
<fijal> b = bytearray(amt)
<fijal> n = self.readinto(b)
<fijal> return memoryview(b)[:n].tobytes()
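The pattern fijal is asking about (read into a preallocated `bytearray`, then copy out the filled prefix through a `memoryview`) can be tried in isolation. This is a sketch that uses `io.BytesIO` as a stand-in for the socket-backed reader; the function name `read_amt` is hypothetical. It only shows what the code computes, not whether PyPy's JIT can remove the `memoryview`, which is the open question in the chat.

```python
import io


def read_amt(stream, amt):
    """Read up to `amt` bytes from `stream` and return them as bytes."""
    b = bytearray(amt)                  # preallocated, zero-filled buffer
    n = stream.readinto(b)              # number of bytes actually read
    return memoryview(b)[:n].tobytes()  # copy only the filled prefix


print(read_amt(io.BytesIO(b"hello world"), 5))
print(read_amt(io.BytesIO(b"hi"), 5))
```

The `memoryview` slice avoids the intermediate `bytearray` copy that `bytes(b[:n])` would create, which is presumably why the standard library spells it this way.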
<fijal> cfbolz: do you think this is a no-op?
<cfbolz> No idea
<fijal> I kinda doubt, but who knows
<cfbolz> fijal: I'm going to take a look in a bit
<fijal> ok, so I think next benchmark is really urlopen() - we should probably make it fast
<fijal> it's part of the standard lib
<fijal> cfbolz: it probably makes sense for the buildbot run to finish
<cfbolz> fijal: ok, ping me?
<fijal> sure
* cfbolz goes back to hacking the new parser
<fijal> cfbolz: I wonder if the following hack would not work
<fijal> memoryview(bytes) does pretty much nothing (does not create a raw address) unless asked for a raw address (in which case it fakes it)
<cfbolz> fijal: just for the weird usages in http?
<cfbolz> Wouldn't it be better to use some __pypy__ api?
<fijal> I don't know maybe
<fijal> do we have the *right* kind of buffer there somewhere?
<fijal> I don't know if just for the weird usages in http, it seems memoryview is used a lot in python
<fijal> there are 670 instances of the word "memoryview" in lib-python
<fijal> are they all interesting? probably not
<cfbolz> fijal: I don't know, it's all a complicated mess :-(
<fijal> yep
<cfbolz> (like the parser)
Julian has joined #pypy
<fijal> I wonder if this is me, does not look like it
<fijal> cfbolz: I think it's good for a review? it's missing what's new
otisolsen70 has quit [Quit: Leaving]
<cfbolz> fijal: no, that test is flaky :-(
<fijal> "hey armin, do you want to talk about buffers" is probably not a good way to lure him from holiday ;-)
<cfbolz> haha
<cfbolz> where is he atm?
* fijal tries to think how to do a small test for urlopen
<fijal> he's at home, but Olivier is visiting
<fijal> so I presume they're running around
<fijal> well, "at home" = Sweden
<cfbolz> ok
<cfbolz> fijal: this is a branch off default?
<cfbolz> or py3.x?
<fijal> off 3.7
<cfbolz> (just for my reading ability)
<fijal> cool!
<cfbolz> fijal: I suspect that you saw in benchmarks that the chunking into smaller bytes is happening regularly, right?
<fijal> I would go with "no", but I'm not sure what you mean?
<cfbolz> fijal: I mean, the loops in ssl.py sendall really run several times and call send, right?
<cfbolz> not just a single send call
<cfbolz> fijal: anyway, I added a few comments, I think it looks like quite a reasonable approach
<cfbolz> fijal: I wonder whether the _cffi_backend changes need to go to default? I don't know what the cffi policy re python2 is
<cfbolz> (or whether this is a change that will never go to mainstream cffi)
<mattip> we could probably turn off cffi on default and no-one would notice
<mattip> gahh. OSError(int, str=None) can sometimes make subclasses of errors, but only if str is used
<mattip> and deep inside _ssl I called it on windows without a str, so the subclass was not used
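The `OSError` behaviour mattip ran into can be seen directly, as observed on CPython: the errno-to-subclass mapping only applies when a message is passed along with the errno. A minimal sketch, using `errno.ENOENT` as an arbitrary example code:

```python
import errno

# Two arguments (errno, strerror): the constructor picks the matching subclass.
with_msg = OSError(errno.ENOENT, "no such file")

# A single argument: no mapping happens, a plain OSError is returned.
without_msg = OSError(errno.ENOENT)

print(type(with_msg).__name__)
print(type(without_msg).__name__)
```

Code that catches `FileNotFoundError` (or any other subclass) will therefore miss an error constructed the second way.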
<fijal> cfbolz: I think we need armin for that
<cfbolz> fijal: right
<cfbolz> fijal: it's independent of the PR anyway, I think
<fijal> yeah
<fijal> thanks!
<cfbolz> fijal: nice work!
<cfbolz> fijal: do you have some actual numbers how much it helps?
<fijal> yes-ish
<fijal> yes-ish because it's all measured on an aws instance that's not a reliable benchmark machine
<fijal> "it helps"
<fijal> I think overall it bridges the gap between rpython ssl and this by more than half
<fijal> what changes to rpython did I make?
* fijal checks
<cfbolz> fijal: look at the pr
<cfbolz> a newlines
<fijal> oh yeah
<cfbolz> and some logparser format thing
<fijal> and improved logparser formatting
<fijal> that one should go to default
<fijal> I think
<cfbolz> 👍
<fijal> (but I can make it a separate commit on default, it's not a big deal either way)
<fijal> ok
<fijal> k, so I'll make those changes and then merge it
<fijal> (probably tomorrow)
<fijal> and I'll try to write a benchmark for urlopen()
<cfbolz> fijal: doesn't need to be a blog post, but a tweet would be cool
<fijal> cool, I think this is going to continue to be honest?
<fijal> I'm writing another benchmark, but SSL benchmarks are a complete headache
<fijal> (like, setting SSL ready http server for tests is a problem)
<fijal> and on top of that you *probably* want an actual server, with all the headers etc, not a fake one
<cfbolz> fijal: and eg pypy.org files aren't big enough?
<fijal> pypy.org is fine, I guess
<cfbolz> fijal: or it's about writing not reading?
<fijal> how reliable is running benchmarks over internet though?
* fijal tries
<cfbolz> no idea :-/
<fijal> cfbolz: this is ~2x faster on cpython
<cfbolz> fijal: and your change helps?
<fijal> no, this is yet another problem
<cfbolz> ok
<fijal> let me write a proper benchmark
<fijal> but pypy2 6.0 is *a lot* faster
<fijal> like 20% slower than cpython
<fijal> more than 2x
<fijal> let's do maths....
<cfbolz> for me they are the same speed 😅
<cfbolz> I tried 3.8 though
<fijal> I mean look at POSIX time and see the user
<fijal> not the wall time, wall time makes no sense at all
<cfbolz> 'ah'
<cfbolz> Why not?
<fijal> because it largely depends how far python.org is from you?
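fijal's point about user time versus wall time can be illustrated without a network. In this sketch `fake_fetch` is a made-up stand-in: the sleep plays the role of network latency and the loop plays the role of interpreter work. The helper `measure` is also hypothetical.

```python
import os
import time


def measure(func):
    """Run func once and return user CPU, system CPU and wall-clock seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return {
        "user": cpu_after.user - cpu_before.user,
        "system": cpu_after.system - cpu_before.system,
        "wall": wall_after - wall_before,
    }


def fake_fetch():
    time.sleep(0.2)                     # waiting: counts toward wall time only
    sum(i * i for i in range(200_000))  # computing: counts toward user time


result = measure(fake_fetch)
print(result)
```

Only the `user` figure says something about how fast the interpreter is; `wall` is dominated by the waiting, which is the same for every implementation.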
<fijal> do you want the real program?
<fijal> that does the right thing
<cfbolz> Ok
<fijal> man I love coding for different python versions
<fijal> so for me - cpython 0.25, pypy2 6.0 - 0.34, pypy3.7 - 1.09
<cfbolz> right, I see
<fijal> ok, no wonder people think pypy is slow
<fijal> this is kind of a thing that people do all the time
<fijal> I would not be surprised if a median python program is "fetch s3 bucket, do some simple string operations, put it back in s3"
<fijal> I'm being dramatic, but only a tiny bit
<cfbolz> depends very much on the programmer
<cfbolz> the median program is "open pandas" :-P
<fijal> I think more on the task
<cfbolz> (which is even worse on pypy)
<fijal> yeah
<fijal> meh, we failed, time to go home
<cfbolz> "well"
<fijal> :-)
<fijal> I'll take a break, then create a bug report and probably just play games to be honest
<fijal> I will write some summary, so at least we have a bug report (and try to close the branch tomorrow)
<cfbolz> fijal: your branch doesn't help because it's only about writing, right?
<fijal> my branch does not help because it's yet-another-something-something
<fijal> I mean, pypy3.7 v7.3.5 is even worse, so it does help a bit
<fijal> (it's like 10% slower)
<cfbolz> fijal: pfff, for profiling a local server would be really useful
<fijal> cfbolz: yes, of course
<fijal> but I think you kinda need to set up nginx or something with a real certificate and that's a major nightmare, I think
<fijal> I find the SSL/TLS abstractions incredibly leaky
<cfbolz> fijal: I managed
<cfbolz> and now the wall clock time is 3x slower
<fijal> so my first suspicion would be that it does not have all the headers etc.
<fijal> but probably worth fixing anyway
<cfbolz> finding really weird code
<fijal> yep
<cfbolz> fijal: eg urllib/request.py look at add_handler
<cfbolz> it's called 10 times for every new connection
<fijal> pffff
<fijal> "what kind of dispatch do you want?"
<fijal> "all of them, in one function"
<fijal> bisect.insort?
<fijal> wtf
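For readers unfamiliar with `bisect.insort`: it inserts an item into an already sorted list at the right position, using the items' `__lt__`. The sketch below is not the `urllib.request` code; the `Handler` class, its `order` attribute and this `add_handler` are simplified stand-ins that show only the sorted-insert part.

```python
import bisect


class Handler:
    """Hypothetical handler, ordered by a numeric priority."""

    def __init__(self, name, order=500):
        self.name = name
        self.order = order

    def __lt__(self, other):
        return self.order < other.order


def add_handler(handlers, handler):
    # Keeps `handlers` sorted by `order`; ties go after existing entries.
    bisect.insort(handlers, handler)


handlers = []
for name, order in [("http", 500), ("proxy", 100), ("default", 1000), ("https", 500)]:
    add_handler(handlers, Handler(name, order))

names = [h.name for h in handlers]
print(names)
```

Each call does a binary search plus a list insert, so repeating it for every handler on every new connection adds up.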
Julian has quit [Quit: leaving]
<cfbolz> fijal: seems to be a big source of difference
<fijal> I can imagine
slav0nic has quit [Ping timeout: 268 seconds]