#pypy on 2021-11-02 — irc logs at libera.irclog.whitequark.org

2021-09-24 06:13 cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | hacking on TLS is fun, way more fun than arguing over petty shit, turns out

01:05 <ctismer> mattip: Yes I will test it. But it will not be what I need. I am changing builtin types, like `PyCFunction_Type` or `PyType_Type`, adding some attribute. I think `PyType_Ready` would already have been called before I have a chance.

01:08 <ctismer> mattip: If that really should be supported, I think a special version of `PyDict_SetItem` that does not check would be easier. But I'm fine with re-implementing it all using heaptypes.

06:48 otisolsen70 has joined #pypy

07:30 <mattip> ctismer: It sounds strange to me to try to change attributes of built-in types, on app level if I try

07:30 <mattip> type.abc = 3

07:30 <mattip> I get a TypeError
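
A minimal sketch of the app-level behaviour mattip describes, assuming a stock interpreter where `type` is an immutable built-in type. The helper name `try_set_builtin_attr` is made up for illustration; the exact error message varies between versions, so only the exception type is relied on here.

```python
def try_set_builtin_attr():
    """Attempt `type.abc = 3` and return the exception raised, if any."""
    try:
        type.abc = 3
    except TypeError as exc:
        # Application-level code is refused: built-in types are immutable here.
        return exc
    return None


err = try_set_builtin_attr()
print(type(err).__name__, err)
```

The C API path ctismer mentions bypasses this check by writing into the type's dict directly, which is why the two levels can disagree.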

08:03 <ctismer> mattip: It works on CPython since there is a difference between interpreter and C API. Funny that PyPy does not have that.

08:05 <ctismer> mattip: `PyDict_SetItem` does the trick. In Python, there is a layer in between that prevents it.

08:13 <mgorny> do you have any idea how to handle packages that refuse to support pypy? i'm talking of 'regex' here whose upstream basically said they won't ever support pypy because they rely on constant-width character encodings

08:20 <mgorny> https://bitbucket.org/mrabarnett/mrab-regex/src/hg/#rst-header-pypy

08:22 <LarstiQ> mgorny: that looks like an implementation detail that CPython might also change?

08:23 <cfbolz> It's a bit of a weird reason

08:23 <cfbolz> We will happily give them a ucs-4 string if they ask from C

08:24 <cfbolz> Using the standard APIs

08:24 <mgorny> LarstiQ: yes, sounds like it

08:24 <mgorny> they could also have a pure python fallback or sth

08:25 <mgorny> https://bitbucket.org/mrabarnett/mrab-regex/pull-requests/5#comment-252158870

08:25 <mgorny> i don't really know what to do at this point

08:26 <mgorny> going all over the place and telling people 'please don't use regex because it refuses to support pypy' doesn't feel right

08:26 <cfbolz> It's kind of the truth though?

08:26 <mgorny> and i don't know enough about pypy to try to convince him to support it

08:27 <cfbolz> mgorny: basically the cost is one extra copy

08:27 <cfbolz> Which you might or might not be prepared to pay

08:28 <mgorny> heh, _regex.c is 26.5k lines of code

08:28 <mgorny> i'm not surprised he doesn't want to maintain that

08:30 <mgorny> well, commented, let's see what happens

08:43 <cfbolz> mgorny: I don't think there's much more to do if even a pr doesn't help

08:49 <fijal> sorry to hear that though :/

09:21 otisolsen70 has quit [Ping timeout: 268 seconds]

09:25 otisolsen70 has joined #pypy

09:40 <mattip> there are projects that find the burden to support PyPy in CI too onerous

09:40 <mattip> https://foss.heptapod.net/pypy/pypy/-/issues/3337

09:48 <ctismer> mattip: I updated https://foss.heptapod.net/pypy/pypy/-/issues/3588

09:59 slav0nic has joined #pypy

10:02 <cfbolz> We could easily support mutating builtin classes from C

10:02 <cfbolz> Need to decide whether we want to

10:03 <cfbolz> Usually our rule of thumb is 'if cpython supports it, we do too'

10:06 <cfbolz> mattip: ^^

10:06 <mattip> we would allow modifying builtins from C? I think that is a bug in CPython not to be emulated

10:09 <cfbolz> mattip: people that do it get what they deserve ;-)

10:10 <mattip> right, including finding out it is not supported on PyPy

10:12 <cfbolz> (what ctismer is doing is quite a safe thing, btw, he adds new attributes only)

10:13 <mattip> :shrug: who knows what is safe. Today ctismer sets an attribute, tomorrow someone else sets the same attribute differently

10:14 <mattip> because of some bug/feature/problem

10:14 <mattip> now behaviour is inconsistent

10:15 <ctismer> mattip: cfbolz actually, this approach had very little impact. I could patch the new attr in without knowing much of the type. Creating a heaptype correctly and then using that is more involved (and sure, I was lazy...).

10:16 <mattip> I guess my argument is a bit of a reach, since the same could be said of any object in python

10:17 <fijal> cfbolz: pfff pfff pfff, may I moan a bit?

10:17 <mattip> but it strikes me as unexpected to modify builtins

10:18 <cfbolz> mattip: ruby for example just allows mutation of all builtin types

10:18 <ctismer> mattip: Actually, I did not expect that it would take four years and CPython still does not have `__signature__` for PyCFunction objects. I thought this was just a quick dive into Guido's time machine.

10:18 <cfbolz> fijal: always

10:18 <cfbolz> ctismer: yeah

10:18 <fijal> https://www.irccloud.com/pastebin/DCTs5J0s/

10:18 <fijal> essentially http.py is full of that :/

10:19 <cfbolz> Haaaaah

10:27 <fijal> I think the whole file needs a review from performance perspective

10:27 <fijal> cfbolz: do you think you can review my branch?

10:27 <cfbolz> Can take a look in a bit

10:27 <cfbolz> fijal: does it help numbers?

10:27 <fijal> I think it's more or less ready

10:27 <fijal> yes, quite a bit

10:27 <fijal> (but not all the way to the non-cffi version; that might be unrelated, hard to check)

10:28 <fijal> like more cpython hacks might be the culprit here

10:32 <cfbolz> fijal: doesn't the JIT remove memoryview(x)?

10:32 <fijal> surely not

10:32 <fijal> you need to get an object that has a real address of something

10:32 <fijal> I'm pretty sure it hits some dont_look_inside with crazy JIT operations

10:32 * fijal checks

10:33 <fijal> _io module is newer than pypy2 6.0 right?

10:34 <fijal> maybe it does, actually

10:36 <fijal> b = bytearray(amt)

10:36 <fijal> n = self.readinto(b)

10:36 <fijal> return memoryview(b)[:n].tobytes()

10:36 <fijal> cfbolz: do you think this is a no-op?
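
For reference, the pattern fijal pasted can be run in isolation roughly like this. It is a sketch only: `read_up_to` is a hypothetical wrapper, and `io.BytesIO` stands in for the real HTTP response object.

```python
import io


def read_up_to(stream, amt):
    """Read at most `amt` bytes using the readinto + memoryview pattern."""
    b = bytearray(amt)          # scratch buffer of the requested size
    n = stream.readinto(b)      # number of bytes actually read
    # Slice the view to the filled part, then copy it out as immutable bytes.
    return memoryview(b)[:n].tobytes()


stream = io.BytesIO(b"hello world")
print(read_up_to(stream, 5))
print(read_up_to(stream, 100))
```

The memoryview avoids one intermediate copy on CPython (slicing the bytearray directly would copy before converting to bytes); whether PyPy's JIT can optimize the view away is exactly the open question in the discussion.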

10:37 <cfbolz> No idea

10:37 <fijal> I kinda doubt, but who knows

10:46 <cfbolz> fijal: I'm going to take a look in a bit

10:47 <fijal> ok, so I think next benchmark is really urlopen() - we should probably make it fast

10:47 <fijal> it's part of the standard lib

10:49 <fijal> cfbolz: it probably makes sense for the buildbot run to finish

11:04 <cfbolz> fijal: ok, ping me?

11:07 <fijal> sure

11:07 * cfbolz goes back to hacking the new parser

11:29 <fijal> cfbolz: I wonder if the following hack would not work

11:30 <fijal> memoryview(bytes) does pretty much nothing (does not create a raw address) unless asked for a raw address (in which case it fakes it)

11:31 <cfbolz> fijal: just for the weird usages in http?

11:32 <cfbolz> Wouldn't it be better to use some __pypy__ api?

11:34 <fijal> I don't know maybe

11:34 <fijal> do we have the *right* kind of buffer there somewhere?

11:34 <fijal> I don't know if just for the weird usages in http, it seems memoryview is used a lot in python

11:35 <fijal> there are 670 instances of the word "memoryview" in lib-python

11:35 <fijal> are they all interesting? probably not

11:45 <cfbolz> fijal: I don't know, it's all a complicated mess :-(

11:56 <fijal> yep

12:15 <cfbolz> (like the parser)

12:59 Julian has joined #pypy

13:47 <fijal> https://buildbot.pypy.org/summary/longrepr?testname=unmodified&builder=pypy-c-jit-linux-x86-64&build=7997&mod=lib-python.3.test.test_datetime

13:47 <fijal> I wonder if this is me, does not look like it

13:49 <fijal> cfbolz: I think it's good for a review? it's missing what's new

13:49 otisolsen70 has quit [Quit: Leaving]

13:49 <cfbolz> fijal: no, that test is flaky :-(

13:50 <fijal> "hey armin, do you want to talk about buffers" is probably not a good way to lure him from holiday ;-)

13:50 <cfbolz> haha

13:50 <cfbolz> where is he atm?

13:50 * fijal tries to think how to do a small test for urlopen

13:50 <fijal> he's at home, but Olivier is visiting

13:50 <fijal> so I presume they're running around

13:50 <fijal> well, "at home" = Sweden

13:50 <cfbolz> ok

13:50 <cfbolz> fijal: this is a branch off default?

13:50 <cfbolz> or py3.x?

13:50 <fijal> off 3.7

13:52 <cfbolz> fijal: https://foss.heptapod.net/pypy/pypy/-/merge_requests/842

13:52 <cfbolz> (just for my reading ability)

13:58 <fijal> cool!

14:01 <cfbolz> fijal: I suspect that you saw in benchmarks that the chunking into smaller bytes is happening regularly, right?

14:01 <fijal> I would go with "no", but I'm not sure what you mean?

14:03 <cfbolz> fijal: I mean, the loops in ssl.py sendall really run several times and call send, right?

14:03 <cfbolz> not just a single send call
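
The kind of loop cfbolz is asking about looks roughly like the sketch below. This is not the code from the branch under review: `ChunkySender` is a hypothetical stand-in for a socket whose `send()` accepts only a few bytes per call, and the free function `sendall` only mirrors the general shape of a send-until-done loop.

```python
class ChunkySender:
    """Hypothetical socket stand-in: send() accepts at most `limit` bytes."""

    def __init__(self, limit):
        self.limit = limit
        self.received = bytearray()
        self.calls = 0

    def send(self, data):
        chunk = bytes(data[:self.limit])
        self.received += chunk
        self.calls += 1
        return len(chunk)


def sendall(sock, data):
    """Call sock.send() repeatedly until every byte has been accepted."""
    count = 0
    with memoryview(data) as view, view.cast("B") as byte_view:
        amount = len(byte_view)
        while count < amount:
            # Each iteration slices the view; no bytes are copied here.
            count += sock.send(byte_view[count:])
    return count


sock = ChunkySender(limit=4)
sendall(sock, b"0123456789")
print(sock.calls, bytes(sock.received))
```

If `send()` usually takes everything in one call, the loop body runs once and the per-iteration memoryview slicing matters little; if it chunks, that cost is paid on every pass.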

14:09 <cfbolz> fijal: anyway, I added a few comments, I think it looks like quite a reasonable approach

14:11 <cfbolz> fijal: I wonder whether the _cffi_backend changes need to go to default? I don't know what the cffi policy re python2 is

14:11 <cfbolz> (or whether this is a change that will never go to mainstream cffi)

14:12 <mattip> we could probably turn off cffi on default and no-one would notice

14:12 <mattip> gahh. OSError(int, str=None) can sometimes make subclasses of errors, but only if str is used

14:13 <mattip> and deep inside _ssl I called it on windows without a str, so the subclass was not used
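
The behaviour mattip ran into can be reproduced on CPython with a short sketch; the variable names are illustrative only. The errno-to-subclass mapping is applied only when both an errno and a message are passed.

```python
import errno

# Two arguments: errno is recognised and mapped to the matching subclass.
with_msg = OSError(errno.ENOENT, "no such file")

# One argument: treated as a plain args tuple, no errno, no subclass.
without_msg = OSError(errno.ENOENT)

print(type(with_msg).__name__, with_msg.errno)
print(type(without_msg).__name__, without_msg.errno)
```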

14:15 <fijal> cfbolz: I think we need armin for that

14:15 <cfbolz> fijal: right

14:15 <cfbolz> fijal: it's independent of the PR anyway, I think

14:15 <fijal> yeah

14:15 <fijal> thanks!

14:16 <cfbolz> fijal: nice work!

14:16 <cfbolz> fijal: do you have some actual numbers how much it helps?

14:17 <fijal> yes-ish

14:17 <fijal> yes-ish because it's all measured on an aws instance that's not a reliable benchmark machine

14:17 <fijal> "it helps"

14:17 <fijal> I think overall it bridges the gap between rpython ssl and this by more than half

14:17 <fijal> what changes to rpython did I make?

14:18 * fijal checks

14:18 <cfbolz> fijal: look at the pr

14:18 <cfbolz> a newline

14:18 <fijal> oh yeah

14:18 <cfbolz> and some logparser format thing

14:18 <fijal> and improved logparser formatting

14:18 <fijal> that one should go to default

14:18 <fijal> I think

14:18 <cfbolz> 👍

14:18 <fijal> (but I can make it a separate commit on default, it's not a big deal either way)

14:18 <fijal> ok

14:19 <fijal> k, so I'll make those changes and then merge it

14:19 <fijal> (probably tomorrow)

14:19 <fijal> and I'll try to write a benchmark for urlopen()

14:22 <cfbolz> fijal: doesn't need to be a blog post, but a tweet would be cool

14:22 <fijal> cool, I think this is going to continue to be honest?

14:22 <fijal> I'm writing another benchmark, but SSL benchmarks are a complete headache

14:22 <fijal> (like, setting SSL ready http server for tests is a problem)

14:23 <fijal> and on top of that you *probably* want an actual server, with all the headers etc, not a fake one

14:23 <cfbolz> fijal: and eg pypy.org files aren't big enough?

14:24 <fijal> pypy.org is fine, I guess

14:24 <cfbolz> fijal: or it's about writing not reading?

14:24 <fijal> how reliable is running benchmarks over internet though?

14:24 * fijal tries

14:25 <cfbolz> no idea :-/

14:36 <fijal> https://www.irccloud.com/pastebin/JinScZ30/

14:36 <fijal> cfbolz: this is ~2x faster on cpython

14:36 <cfbolz> fijal: and your change helps?

14:37 <fijal> no, this is yet another problem

14:37 <cfbolz> ok

14:38 <fijal> let me write a proper benchmark

14:38 <fijal> but pypy2 6.0 is *a lot* faster

14:38 <fijal> like 20% slower than cpython

14:39 <fijal> more than 2x

14:39 <fijal> let's do maths....

14:40 <cfbolz> for me they are the same speed 😅

14:45 <cfbolz> I tried 3.8 though

14:47 <fijal> I mean look at POSIX time and see the user

14:47 <fijal> not the wall time, wall time makes no sense at all

14:49 <cfbolz> 'ah'

14:49 <cfbolz> Why not?

14:51 <fijal> because it largely depends how far python.org is from you?
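
A small sketch of the distinction fijal is drawing, using only the standard library. The `measure` helper is hypothetical, and `time.process_time()` reports user plus system CPU time of the current process, so it is close to but not identical to the "user" figure printed by the shell's `time`.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Sleeping stands in for waiting on the network: wall time passes,
# but almost no CPU time is consumed.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

For a network fetch the wall time is dominated by latency to the server, so the CPU figure is the one that says something about the interpreter.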

14:51 <fijal> do you want the real program?

14:52 <fijal> that does the right thing

14:52 <cfbolz> Ok

14:52 <fijal> https://www.irccloud.com/pastebin/b52IAkyV/

14:52 <fijal> man I love coding for different python versions

14:53 <fijal> so for me - cpython 0.25, pypy2 6.0 - 0.34, pypy3.7 - 1.09

14:54 <cfbolz> right, I see

14:55 <fijal> ok, no wonder people think pypy is slow

14:55 <fijal> this is kind of a thing that people do all the time

14:56 <fijal> I would not be surprised if a median python program is "fetch s3 bucket, do some simple string operations, put it back in s3"

14:56 <fijal> I'm being dramatic, but only a tiny bit

14:56 <cfbolz> depends very much on the programmer

14:56 <cfbolz> the median program is "open pandas" :-P

14:56 <fijal> I think more on the task

14:56 <cfbolz> (which is even worse on pypy)

14:56 <fijal> yeah

14:57 <fijal> meh, we failed, time to go home

14:57 <cfbolz> "well"

14:58 <fijal> :-)

14:58 <fijal> I'll take a break, then create a bug report and probably just play games to be honest

14:58 <fijal> I will write some summary, so at least we have a bug report (and try to close the branch tomorrow)

14:58 <cfbolz> fijal: your branch doesn't help because it's only about writing, right?

15:00 <fijal> my branch does not help because it's yet-another-something-something

15:01 <fijal> I mean, pypy3.7 v7.3.5 is even worse, so it does help a bit

15:01 <fijal> (it's like 10% slower)

15:22 <cfbolz> fijal: pfff, for profiling a local server would be really useful

15:38 <fijal> cfbolz: yes, of course

15:38 <fijal> but I think you kinda need to set up nginx or something with a real certificate and that's a major nightmare, I think

15:39 <fijal> I find the SSL/TLS abstractions incredibly leaky

16:08 <cfbolz> fijal: I managed

16:08 <cfbolz> https://gist.githubusercontent.com/dergachev/7028596/raw/abb8bd2b53501ff7125b93e8d975e77ffd756bf1/simple-https-server.py

16:09 <cfbolz> and now the wall clock time is 3x slower

16:09 <fijal> so my first suspicion would be that it does not have all the headers etc.

16:09 <fijal> but probably worth fixing anyway

16:36 <cfbolz> finding really weird code

16:37 <fijal> yep

16:39 <cfbolz> fijal: eg urllib/request.py look at add_handler

16:39 <cfbolz> it's called 10 times for every new connection

16:42 <fijal> pffff

16:42 <fijal> "what kind of dispatch do you want?"

16:42 <fijal> "all of them, in one function"

16:43 <fijal> bisect.insort?

16:43 <fijal> wtf

16:43 Julian has quit [Quit: leaving]

16:44 <cfbolz> fijal: seems to be a big source of the difference

16:44 <fijal> I can imagine
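
For context on the `bisect.insort` remark: it keeps a list sorted on every insertion by comparing the new item against existing ones. The sketch below is not the urllib code; the `Handler` class, its names, and the order values are hypothetical, chosen only to show ordering by a `handler_order` attribute.

```python
import bisect


class Handler:
    """Hypothetical handler ordered by its handler_order attribute."""

    def __init__(self, name, order):
        self.name = name
        self.handler_order = order

    def __lt__(self, other):
        # insort only needs "<" to find the insertion point.
        return self.handler_order < other.handler_order


handlers = []
for name, order in [("proxy", 100), ("default", 500), ("redirect", 500),
                    ("error", 1000), ("unknown", 0)]:
    # Binary search for the position, then an O(n) list insert.
    bisect.insort(handlers, Handler(name, order))

print([(h.name, h.handler_order) for h in handlers])
```

Each call runs a Python-level `__lt__` several times, so doing this repeatedly for every new connection adds up.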

18:04 <fijal> https://www.irccloud.com/pastebin/hp0tJ1rd/

23:16 slav0nic has quit [Ping timeout: 268 seconds]