01:05
<
ctismer >
mattip: Yes I will test it. But it will not be what I need. I am changing builtin types, like `PyCFunction_Type` or `PyType_Type`, adding some attribute. I think `PyType_Ready` would already have been called before I have a chance.
01:08
<
ctismer >
mattip: If that really should be supported, I think a special version of `PyDict_SetItem` that does not check would be easier. But I'm fine with re-implementing it all using heaptypes.
06:48
otisolsen70 has joined #pypy
07:30
<
mattip >
ctismer: It sounds strange to me to try to change attributes of built-in types, on app level if I try
07:30
<
mattip >
type.abc = 3
07:30
<
mattip >
I get a TypeError
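A minimal sketch of the app-level behaviour mattip describes, assuming a current CPython or PyPy. The attribute name `abc` comes from the log; the helper name `try_set_builtin_attr` is made up for illustration.

```python
def try_set_builtin_attr():
    """Try to add an attribute to the builtin `type` from Python code.

    Returns the exception that was raised, or None if the assignment worked.
    """
    try:
        type.abc = 3
    except TypeError as exc:
        return exc
    return None


err = try_set_builtin_attr()
# The interpreter refuses the assignment at this level, so nothing is stored.
print(type(err).__name__, "-", err)
```

The exact error message differs between interpreters and versions, so only the exception type is worth relying on.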
08:03
<
ctismer >
mattip: It works on CPython since there is a difference between interpreter and C API. Funny that PyPy does not have that.
08:05
<
ctismer >
mattip: `PyDict_SetItem` does the trick. In Python, there is a layer in between that prevents it.
08:13
<
mgorny >
do you have any idea how to handle packages that refuse to support pypy? i'm talking of 'regex' here whose upstream basically said they won't ever support pypy because they rely on constant-width character encodings
08:22
<
LarstiQ >
mgorny: that looks like an implementation detail that CPython might also change?
08:23
<
cfbolz >
It's a bit of a weird reason
08:23
<
cfbolz >
We will happily give them a ucs-4 string if they ask from C
08:24
<
cfbolz >
Using the standard APIs
08:24
<
mgorny >
LarstiQ: yes, sounds like it
08:24
<
mgorny >
they could also have a pure python fallback or sth
08:25
<
mgorny >
i don't really know what to do at this point
08:26
<
mgorny >
going all over the place and telling people 'please don't use regex because it refuses to support pypy' doesn't feel right
08:26
<
cfbolz >
It's kind of the truth though?
08:26
<
mgorny >
and i don't know enough about pypy to try to convince him to support it
08:27
<
cfbolz >
mgorny: basically the cost is one extra copy
08:27
<
cfbolz >
Which you might or might not be prepared to pay
08:28
<
mgorny >
heh, _regex.c is 26.5k lines of code
08:28
<
mgorny >
i'm not surprised he doesn't want to maintain that
08:30
<
mgorny >
well, commented, let's see what happens
08:43
<
cfbolz >
mgorny: I don't think there's much more to do if even a pr doesn't help
08:49
<
fijal >
sorry to hear that though :/
09:21
otisolsen70 has quit [Ping timeout: 268 seconds]
09:25
otisolsen70 has joined #pypy
09:40
<
mattip >
there are projects that find the burden to support PyPy in CI too onerous
09:59
slav0nic has joined #pypy
10:02
<
cfbolz >
We could easily support mutating builtin classes from C
10:02
<
cfbolz >
Need to decide whether we want to
10:03
<
cfbolz >
Usually our rule of thumb is 'if cpython supports it, we do too'
10:06
<
cfbolz >
mattip: ^^
10:06
<
mattip >
we would allow modifying builtins from C? I think that is a bug in CPython not to be emulated
10:09
<
cfbolz >
mattip: people that do it get what they deserve ;-)
10:10
<
mattip >
right, including finding out it is not supported on PyPy
10:12
<
cfbolz >
(what ctismer is doing is quite a safe thing, btw, he adds new attributes only)
10:13
<
mattip >
:shrug: who knows what is safe. Today ctismer sets an attribute, tomorrow someone else sets the same attribute differently
10:14
<
mattip >
because of some bug/feature/problem
10:14
<
mattip >
now behaviour is inconsistent
10:15
<
ctismer >
mattip: cfbolz actually, this approach had very little impact. I could patch the new attr in without knowing much of the type. Creating a heaptype correctly and then using that is more involved (and sure, I was lazy...).
10:16
<
mattip >
I guess my argument is a bit of a reach, since the same could be said of any object in python
10:17
<
fijal >
cfbolz: pfff pfff pfff, may I moan a bit?
10:17
<
mattip >
but it strikes me as unexpected to modify builtins
10:18
<
cfbolz >
mattip: ruby for example just allows mutation of all builtin types
10:18
<
ctismer >
mattip: Actually, I did not expect that it would take four years and CPython still does not have `__signature__` for PyCFunction objects. I thought this was just a quick dive into Guido's time machine.
10:18
<
cfbolz >
fijal: always
10:18
<
cfbolz >
ctismer: yeah
10:18
<
fijal >
essentially http.py is full of that :/
10:27
<
fijal >
I think the whole file needs a review from performance perspective
10:27
<
fijal >
cfbolz: do you think you can review my branch?
10:27
<
cfbolz >
Can take a look in a bit
10:27
<
cfbolz >
fijal: does it help numbers?
10:27
<
fijal >
I think it's more or less ready
10:27
<
fijal >
yes, quite a bit
10:27
<
fijal >
(but not all the way to the non-cffi version, though that might be unrelated, hard to check)
10:28
<
fijal >
like more cpython hacks might be the culprit here
10:32
<
cfbolz >
fijal: doesn't the JIT remove memoryview(x)?
10:32
<
fijal >
you need to get an object that has a real address of something
10:32
<
fijal >
I'm pretty sure it hits some dont_look_inside with crazy JIT operations
10:33
<
fijal >
_io module is newer than pypy2 6.0 right?
10:34
<
fijal >
maybe it does, actually
10:36
<
fijal >
b = bytearray(amt)
10:36
<
fijal >
n = self.readinto(b)
10:36
<
fijal >
return memoryview(b)[:n].tobytes()
10:36
<
fijal >
cfbolz: do you think this is a no-op?
10:37
<
fijal >
I kinda doubt, but who knows
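For reference, a self-contained sketch of the read pattern fijal quotes, run against an in-memory stream. The function names `read_amt` and `read_amt_plain` and the use of `io.BytesIO` are assumptions for illustration; this says nothing about what the JIT actually does with the `memoryview`.

```python
import io


def read_amt(raw, amt):
    """Read up to `amt` bytes using the bytearray/readinto/memoryview pattern."""
    b = bytearray(amt)                  # preallocated, zero-filled buffer
    n = raw.readinto(b)                 # fills b in place, returns bytes read
    return memoryview(b)[:n].tobytes()  # slice is a view; tobytes() copies once


def read_amt_plain(raw, amt):
    """Same result without memoryview: slicing the bytearray copies first."""
    b = bytearray(amt)
    n = raw.readinto(b)
    return bytes(b[:n])                 # b[:n] copies, bytes() copies again


data = b"hello world"
via_view = read_amt(io.BytesIO(data), 64)
via_slice = read_amt_plain(io.BytesIO(data), 64)
print(via_view == via_slice == data)
```

Both variants return the same bytes; the difference is only in how many intermediate copies the Python-level code asks for.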
10:46
<
cfbolz >
fijal: I'm going to take a look in a bit
10:47
<
fijal >
ok, so I think next benchmark is really urlopen() - we should probably make it fast
10:47
<
fijal >
it's part of the standard lib
10:49
<
fijal >
cfbolz: it probably makes sense for the buildbot run to finish
11:04
<
cfbolz >
fijal: ok, ping me?
11:07
* cfbolz
goes back to hacking the new parser
11:29
<
fijal >
cfbolz: I wonder if the following hack would not work
11:30
<
fijal >
memoryview(bytes) does pretty much nothing (does not create a raw address) unless asked for a raw address (in which case it fakes it)
11:31
<
cfbolz >
fijal: just for the weird usages in http?
11:32
<
cfbolz >
Wouldn't it be better to use some __pypy__ api?
11:34
<
fijal >
I don't know maybe
11:34
<
fijal >
do we have the *right* kind of buffer there somewhere?
11:34
<
fijal >
I don't know if just for the weird usages in http, it seems memoryview is used a lot in python
11:35
<
fijal >
there are 670 instances of the word "memoryview" in lib-python
11:35
<
fijal >
are they all interesting? probably not
11:45
<
cfbolz >
fijal: I don't know, it's all a complicated mess :-(
12:15
<
cfbolz >
(like the parser)
12:59
Julian has joined #pypy
13:47
<
fijal >
I wonder if this is me, does not look like it
13:49
<
fijal >
cfbolz: I think it's good for a review? it's missing what's new
13:49
otisolsen70 has quit [Quit: Leaving]
13:49
<
cfbolz >
fijal: no, that test is flaky :-(
13:50
<
fijal >
"hey armin, do you want to talk about buffers" is probably not a good way to lure him from holiday ;-)
13:50
<
cfbolz >
where is he atm?
13:50
* fijal
tries to think how to do a small test for urlopen
13:50
<
fijal >
he's at home, but Olivier is visiting
13:50
<
fijal >
so I presume they're running around
13:50
<
fijal >
well, "at home" = Sweden
13:50
<
cfbolz >
fijal: this is a branch off default?
13:52
<
cfbolz >
(just for my reading ability)
14:01
<
cfbolz >
fijal: I suspect that you saw in benchmarks that the chunking into smaller bytes is happening regularly, right?
14:01
<
fijal >
I would go with "no", but I'm not sure what you mean?
14:03
<
cfbolz >
fijal: I mean, the loops in ssl.py sendall really run several times and call send, right?
14:03
<
cfbolz >
not just a single send call
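A rough sketch of the kind of loop cfbolz is asking about: a `sendall` that keeps calling `send` until everything is written. `ChunkySocket` is a hypothetical stand-in for a socket that accepts only a few bytes per call; this is not the code in ssl.py.

```python
class ChunkySocket:
    """Hypothetical socket stand-in: send() accepts at most `limit` bytes."""

    def __init__(self, limit):
        self.limit = limit
        self.received = bytearray()
        self.calls = 0

    def send(self, data):
        self.calls += 1
        chunk = data[:self.limit]
        self.received += chunk
        return len(chunk)


def sendall(sock, data):
    """Call sock.send() repeatedly until all of `data` has been accepted."""
    view = memoryview(data)   # lets us slice the remainder without copying
    sent = 0
    while sent < len(view):
        sent += sock.send(view[sent:])
    return sent


sock = ChunkySocket(limit=4)
total = sendall(sock, b"0123456789")
print(total, sock.calls, bytes(sock.received))
```

Whether the loop runs once or many times depends on how much each `send` call accepts, which is the question being asked about the benchmark.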
14:09
<
cfbolz >
fijal: anyway, I added a few comments, I think it looks like quite a reasonable approach
14:11
<
cfbolz >
fijal: I wonder whether the _cffi_backend changes need to go to default? I don't know what the cffi policy re python2 is
14:11
<
cfbolz >
(or whether this is a change that will never go to mainstream cffi)
14:12
<
mattip >
we could probably turn off cffi on default and no-one would notice
14:12
<
mattip >
gahh. OSError(int, str=None) can sometimes make subclasses of errors, but only if str is used
14:13
<
mattip >
and deep inside _ssl I called it on windows without a str, so the subclass was not used
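A small illustration of the `OSError` behaviour mattip ran into, using `errno.ENOENT` as an arbitrary example code (the log does not say which errno was involved in `_ssl`).

```python
import errno

# With both errno and a message, the constructor picks the matching subclass.
with_message = OSError(errno.ENOENT, "No such file or directory")

# With only the number, it stays a plain OSError and errno is not filled in.
without_message = OSError(errno.ENOENT)

print(type(with_message).__name__, with_message.errno)
print(type(without_message).__name__, without_message.errno)
```

So code that does `except FileNotFoundError:` catches the first object but not the second.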
14:15
<
fijal >
cfbolz: I think we need armin for that
14:15
<
cfbolz >
fijal: right
14:15
<
cfbolz >
fijal: it's independent of the PR anyway, I think
14:16
<
cfbolz >
fijal: nice work!
14:16
<
cfbolz >
fijal: do you have some actual numbers how much it helps?
14:17
<
fijal >
yes-ish because it's all measured on an aws instance that's not a reliable benchmark machine
14:17
<
fijal >
I think overall it bridges the gap between rpython ssl and this by more than half
14:17
<
fijal >
what changes to rpython did I make?
14:18
<
cfbolz >
fijal: look at the pr
14:18
<
cfbolz >
a newline
14:18
<
cfbolz >
and some logparser format thing
14:18
<
fijal >
and improved logparser formatting
14:18
<
fijal >
that one should go to default
14:18
<
fijal >
(but I can make it a separate commit on default, it's not a big deal either way)
14:19
<
fijal >
k, so I'll make those changes and then merge it
14:19
<
fijal >
(probably tomorrow)
14:19
<
fijal >
and I'll try to write a benchmark for urlopen()
14:22
<
cfbolz >
fijal: doesn't need to be a blog post, but a tweet would be cool
14:22
<
fijal >
cool, I think this is going to continue to be honest?
14:22
<
fijal >
I'm writing another benchmark, but SSL benchmarks are a complete headache
14:22
<
fijal >
(like, setting SSL ready http server for tests is a problem)
14:23
<
fijal >
and on top of that you *probably* want an actual server, with all the headers etc, not a fake one
14:23
<
cfbolz >
fijal: and eg pypy.org files aren't big enough?
14:24
<
fijal >
pypy.org is fine, I guess
14:24
<
cfbolz >
fijal: or it's about writing not reading?
14:24
<
fijal >
how reliable is running benchmarks over internet though?
14:25
<
cfbolz >
no idea :-/
14:36
<
fijal >
cfbolz: this is ~2x faster on cpython
14:36
<
cfbolz >
fijal: and your change helps?
14:37
<
fijal >
no, this is yet another problem
14:38
<
fijal >
let me write a proper benchmark
14:38
<
fijal >
but pypy2 6.0 is *a lot* faster
14:38
<
fijal >
like 20% slower than cpython
14:39
<
fijal >
more than 2x
14:39
<
fijal >
let's do maths....
14:40
<
cfbolz >
for me they are the same speed 😅
14:45
<
cfbolz >
I tried 3.8 though
14:47
<
fijal >
I mean look at POSIX time and see the user
14:47
<
fijal >
not the wall time, wall time makes no sense at all
14:51
<
fijal >
because it largely depends how far python.org is from you?
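A sketch of measuring user CPU time separately from wall time inside Python, along the lines of what fijal reads off `time`. The `fake_fetch` function is hypothetical: the sleep stands in for network latency and the loop for local processing.

```python
import os
import time


def measure(func):
    """Run func() and return (wall_seconds, user_cpu_seconds)."""
    wall_start = time.perf_counter()
    user_start = os.times().user
    func()
    user = os.times().user - user_start
    wall = time.perf_counter() - wall_start
    return wall, user


def fake_fetch():
    time.sleep(0.2)                     # waiting on the network: no CPU used
    sum(i * i for i in range(100_000))  # local work: this is what costs CPU


wall, user = measure(fake_fetch)
print(f"wall={wall:.3f}s user={user:.3f}s")
```

The wall figure includes the sleep, so it moves with latency; the user figure only reflects work done by the interpreter, which is the part that differs between CPython and PyPy.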
14:51
<
fijal >
do you want the real program?
14:52
<
fijal >
that does the right thing
14:52
<
fijal >
man I love coding for different python versions
14:53
<
fijal >
so for me - cpython 0.25, pypy2 6.0 - 0.34, pypy3.7 - 1.09
14:54
<
cfbolz >
right, I see
14:55
<
fijal >
ok, no wonder people think pypy is slow
14:55
<
fijal >
this is kind of a thing that people do all the time
14:56
<
fijal >
I would not be surprised if a median python program is "fetch s3 bucket, do some simple string operations, put it back in s3"
14:56
<
fijal >
I'm being dramatic, but only a tiny bit
14:56
<
cfbolz >
depends very much on the programmer
14:56
<
cfbolz >
the median program is "open pandas" :-P
14:56
<
fijal >
I think more on the task
14:56
<
cfbolz >
(which is even worse on pypy)
14:57
<
fijal >
meh, we failed, time to go home
14:58
<
fijal >
I'll take a break, then create a bug report and probably just play games to be honest
14:58
<
fijal >
I will write some summary, so at least we have a bug report (and try to close the branch tomorrow)
14:58
<
cfbolz >
fijal: your branch doesn't help because it's only about writing, right?
15:00
<
fijal >
my branch does not help because it's yet-another-something-something
15:01
<
fijal >
I mean, pypy3.7 v7.3.5 is even worse, so it does help a bit
15:01
<
fijal >
(it's like 10% slower)
15:22
<
cfbolz >
fijal: pfff, for profiling a local server would be really useful
15:38
<
fijal >
cfbolz: yes, of course
15:38
<
fijal >
but I think you kinda need to set up nginx or something with a real certificate and that's a major nightmare, I think
15:39
<
fijal >
I find the SSL/TLS abstractions incredibly leaky
16:08
<
cfbolz >
fijal: I managed
16:09
<
cfbolz >
and now the wall clock time is 3x slower
16:09
<
fijal >
so my first suspicion would be that it does not have all the headers etc.
16:09
<
fijal >
but probably worth fixing anyway
16:36
<
cfbolz >
finding really weird code
16:39
<
cfbolz >
fijal: eg urllib/request.py look at add_handler
16:39
<
cfbolz >
it's called 10 times for every new connection
16:42
<
fijal >
"what kind of dispatch do you want?"
16:42
<
fijal >
"all of them, in one function"
16:43
<
fijal >
bisect.insort?
16:43
Julian has quit [Quit: leaving]
16:44
<
cfbolz >
fijal: seems to be a big source of difference
16:44
<
fijal >
I can imagine
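A minimal sketch of keeping a handler list sorted with `bisect.insort`, the call fijal mentions. The `Handler` class, its `order` attribute and the handler names are made up for illustration; this is not the code from urllib/request.py.

```python
import bisect


class Handler:
    """Hypothetical handler ordered by a numeric priority."""

    def __init__(self, name, order=500):
        self.name = name
        self.order = order

    def __lt__(self, other):
        # bisect only needs "<" to find the insertion point.
        return self.order < other.order


handlers = []
for h in (Handler("http"), Handler("proxy", 100),
          Handler("fallback", 900), Handler("https")):
    bisect.insort(handlers, h)  # binary search for the slot, then list insert

names = [h.name for h in handlers]
print(names)
```

Each insertion calls `__lt__` several times from Python code, so doing this repeatedly for every new connection adds up.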
23:16
slav0nic has quit [Ping timeout: 268 seconds]