<belm0>
I'm trying to understand poor pypy performance on my app. What does the "quasi-immut" abort case represent?
<belm0>
> abort: force quasi-immut:1035
<Alex_Gaynor>
belm0: that's when something that pypy treats as "usually immutable" (e.g. the definition of a method on a class) is mutated after all
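A minimal illustration of the kind of mutation being described (my own example, not from belm0's app):

```python
# My own minimal example of mutating something PyPy treats as quasi-immutable
# (the set of methods on a class).  If the mutation happens inside code that
# is currently being traced, the trace is aborted ("abort: force quasi-immut").

class Greeter:
    def greet(self):
        return "hello"

def hot_loop(obj, n):
    total = 0
    for i in range(n):
        total += len(obj.greet())  # the JIT assumes Greeter.greet stays fixed
        if i == n // 2:
            # Rebinding the method invalidates that assumption.
            Greeter.greet = lambda self: "hi there"
    return total

print(hot_loop(Greeter(), 10_000))
```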
<belm0>
I see. I wonder what in practice might be causing that.
<belm0>
From the log summary I can't really tell if these quasi-immut are a high percentage of cases and affecting performance significantly. Perhaps something else here looks obviously bad, please let me know.
<belm0>
[2bb32a45a91] {jit-summary
Tracing:                6440    7.893718
Backend:                 924    0.464798
TOTAL:                        205.807830
ops:                11062633
heapcached ops:      8882906
recorded ops:        3573779
calls:                491797
guards:               730499
opt ops:              156517
opt guards:            36837
opt guards shared:     19847
forcings:                  0
abort: trace too long:     3
abort: compiling:          0
abort: vable escape:       0
<belm0>
I'm a long way off from pypy being viable, and doubtful there is some magic flag or reasonable app change that would get me there. (At least 100% slower, where I need something like 50% faster.)
<Alex_Gaynor>
Unfortunately I'm mostly a pypy alumnus these days, I don't remember if/how to enable logging for quasi-immut
<belm0>
I wonder if a highly-async app is just not going to do well on PyPy, since there is no regular looping of code blocks. (E.g. if there is a loop with `await`, then code of other running coroutines will be intermixed arbitrarily with each iteration.)
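A tiny trio sketch (hypothetical program, just to make "intermixed arbitrarily" concrete): at every `await`, the scheduler may run any other ready task, so the hot loop a tracing JIT would follow is constantly interrupted.

```python
# Hypothetical trio program, only to illustrate the concern above: many tasks
# run the same loop, and at every await the scheduler is free to run any
# other ready task before this one resumes.
import trio

async def worker(delay):
    for _ in range(5):
        # Between iterations, arbitrary other coroutines get to run.
        await trio.sleep(delay)

async def main():
    async with trio.open_nursery() as nursery:
        for _ in range(100):
            nursery.start_soon(worker, 0.001)

trio.run(main)
```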
<belm0>
(I probably won't stay connected to the channel, but I'll check the log in the next day or so to see if there's some comment on my ponderings.)
<fijal>
this looks a bit bad? like 1/6th of all the traces are aborted because quasi-immut, that might be really bad
<fijal>
I think the idea is to run PYPYLOG=jit:log and then grep for quasi immut maybe there will be something
<cfbolz>
Agreed, something wrong here
<belm0>
I don't know if there is info here to identify the source of quasi-immut, but:
<cfbolz>
belm0: so the "problem" is an `from mod import *` somewhere
<cfbolz>
of course it should not break that way and become super slow, that's a bug on our end. but maybe this hint helps to find out what exactly your code is doing
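For readers unfamiliar with the term: import_all_from is the helper that implements a star import, and a star import amounts to a burst of writes into the importing module's globals. A rough, self-contained illustration (using `math` purely as an example, not PyPy internals):

```python
# Rough illustration: a star import copies every public name from one module
# into the importing module's globals, i.e. a burst of mutations of a dict
# that PyPy otherwise treats as quasi-immutable.
import math

# What `from math import *` roughly amounts to:
for name in dir(math):
    if not name.startswith("_"):
        globals()[name] = getattr(math, name)

print(sqrt(2))  # noqa: F821 -- sqrt arrived via the copied bindings
```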
<cfbolz>
belm0: do they all look exactly like this, more or less?
<belm0>
cfbolz: just looking at a short log capture, there were 6 ABORT_FORCE_QUASIIMMUT. The first 4 were this import_all_from (our app doesn't have any such imports, but many dependencies of the app do). The final 2 were coming from Trio (async library that we use heavily):
<fijal>
I can probably have a look into that, i was looking a bunch at trio
<belm0>
those Trio lines are using `locals()`
<belm0>
`current_time()` is one of the most commonly called functions in the API...
<cfbolz>
yeah, locals() potentially sucks :-(
<fijal>
didn't we make a hack at some point?
<fijal>
like locals()[CONSTANT] being fine?
<fijal>
or is it doing things like enumerating over locals?
<fijal>
belm0: overall, if your library does stuff like this, it's impossible to optimize
<fijal>
trio is waaaaaay too magic for me
<belm0>
just this: `locals()[LOCALS_KEY_KI_PROTECTION_ENABLED] = True`
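For context, a simplified sketch of the pattern under discussion (my own reconstruction; the real code lives in trio._core._ki and differs in detail, and the key's exact value here is illustrative): the marker is written into the frame's locals so that signal-handling code can later find it by walking frames.

```python
# My own simplified reconstruction of the pattern.  The key is deliberately
# not a valid identifier, so it can never collide with a real local variable.
import sys

LOCALS_KEY_KI_PROTECTION_ENABLED = "@TRIO_KI_PROTECTION_ENABLED"

def enable_ki_protection(fn):
    def wrapper(*args, **kwargs):
        # Writing into locals() forces the frame's locals dict to be
        # materialized, which is exactly what PyPy handles badly here.
        locals()[LOCALS_KEY_KI_PROTECTION_ENABLED] = True
        return fn(*args, **kwargs)
    return wrapper

def ki_protection_enabled():
    # Signal-handling code later walks the stack looking for the marker.
    frame = sys._getframe(1)
    while frame is not None:
        if LOCALS_KEY_KI_PROTECTION_ENABLED in frame.f_locals:
            return frame.f_locals[LOCALS_KEY_KI_PROTECTION_ENABLED]
        frame = frame.f_back
    return False
```

The write into locals() inside the wrapper is the line belm0 quoted above.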
<fijal>
that's already worse than locals()['foo']
<fijal>
but maybe
<fijal>
but likely not
<fijal>
this is a horrendous way to write production software, I'm sorry :/
<fijal>
like it's trying to avoid an if right?
<fijal>
belm0: try to write an if?
<fijal>
if LOCALS_KEY_KI_PROTECTION_ENABLED == 'something': stuff = True else: stuff = False, or whatever
<belm0>
I don't know-- it's not my library (though I contribute to Trio). I'll try removing the locals() as an experiment and report how it affects my app.
<belm0>
Mutations aside, what about the concern I raised about a tracing JIT not being well suited for async code? Heavily async code (especially where things are not homogeneous) is going to be context switching between arbitrary coroutines. I wonder if the async http server that pypy devs benchmarked was highly homogeneous, so it happened to work out-- but async apps aren't necessarily that way in general.
<belm0>
So perhaps method-at-a-time JIT will be better for async.
<cfbolz>
belm0: we cannot answer this question right now, basically
<cfbolz>
right now there are more fundamental problems
<belm0>
but intuitively, jumping around to different code in arbitrary order is going to be the worst case situation for tracing JIT, correct?
<cfbolz>
that depends on how much happens between the jumps basically
<cfbolz>
if the answer is "almost nothing" then yes
<belm0>
I mean in our app, we have 1000+ concurrent tasks, with context switches usually less than 1 ms apart, so it's perhaps futile for a tracing JIT.
<cfbolz>
belm0: sorry, you really cannot say that without more data 🤷♀️
<cfbolz>
also, there is no python implementation with a method jit, is there?
<belm0>
my only data is that pypy is running > 100% slower than CPython. I'll check again how things look tomorrow after removing locals().
<belm0>
yes, Cinder and Pyston are method-at-a-time.
<cfbolz>
belm0: have you tried those?
<belm0>
I tried Cinder a few times, but haven't been able to get any benefit yet (it really needs full type hints and following strict rules). Pyston I just learned about and will try soon, but it may not be much faster than Python 3.11.
<belm0>
(also Cinder is x86_64 only I think, and we'll be on Arm eventually)
<cfbolz>
pypy often has these step functions in performance: you solve something small, and then performance jumps up
<belm0>
I hope that's the case here. Thank you.
<cfbolz>
belm0: is there any trio app we could try to reproduce the problem?
<fijal>
belm0: generally you are not only not getting any jitting, you are also actively tracing and then throwing away the code
<fijal>
so in this particular case, you can really say nothing about the performance
<fijal>
other than "consistently compiling assembler and then throwing it away without running is probably bad"
<fijal>
certainly nothing about tracing jits and async code working together, there are far deeper problems
<belm0>
cfbolz: do you mean reproduce the quasi-immut problem?
<belm0>
But I'll be confirming things tomorrow. I'll try changing the locals() key to a const, or else removing the locals() calls altogether, and confirm whether the remaining quasi-immut aborts are only the import_all ones.
<fijal>
you need to remove the calls to locals() entirely
<fijal>
I mean, locals()['foo'] = 3 is really foo = 3
<fijal>
so do that and not the locals()[...]
<belm0>
I hope we can work something out with the Trio maintainer, but I'll just be making a local change for this experiment anyway.
<cfbolz>
I mean there must be a reason why it's done that way
<belm0>
Yes, probably a reason, but we'll review it. When I have some evidence like "yeah, this made my trio app X% faster", then I'll open a Trio bug and report.
<cfbolz>
ah, I see, the point is that the key is not a valid identifier
<cfbolz>
but yes, does not look very pypy friendly
<belm0>
so it's intending never to collide with the app's locals
<cfbolz>
I wonder why trio doesn't use contextvars for this, it seems kind of like that's a good use case for that
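A hypothetical contextvars-based version of the same flag, sketched only to illustrate the suggestion (this is not how trio works, and the per-frame semantics of the locals() trick are not fully preserved):

```python
# Hypothetical contextvars-based alternative: the flag becomes a task-scoped
# ContextVar instead of a per-frame locals() marker, so reading it never
# touches frame objects.  An async variant would need an async wrapper; this
# sync sketch just shows the shape.
import contextvars
from functools import wraps

ki_protection = contextvars.ContextVar("ki_protection_enabled", default=False)

def enable_ki_protection(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        token = ki_protection.set(True)
        try:
            return fn(*args, **kwargs)
        finally:
            ki_protection.reset(token)
    return wrapper

def ki_protection_enabled():
    return ki_protection.get()
```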
<nimaje>
iirc trio does locals()[<non-identifier>] = …, so that other code can't accidentally break it and iirc it was implemented before contextvars, but not sure
<cfbolz>
nimaje: yeah, that's how I understood it too
<cfbolz>
so the tutorial echo client example is indeed 2x slower
<cfbolz>
and I see this: virtualizables forced:543447
<cfbolz>
ok, this is at least partly our fault :-(
<cfbolz>
fijal: we make a moduledict for locals() :-(((((( on 3.x
<cfbolz>
which makes no sense of course
<cfbolz>
belm0: so I'm working on a fix that should improve the situation, you could try a nightly build of tonight
<fijal>
is there a reason why it does not use just a dict?
<cfbolz>
fijal: an instance dict will probably work well? it's a predictable set of keys per code object
<fijal>
cfbolz: any dict would work better than locals, arguably
<cfbolz>
yes yes
<cfbolz>
fijal: like a factor of 20x, it gets worse than without jit of course
<fijal>
in a sense it's too easy and too socially acceptable to do crazy shit like that in pypy
<cfbolz>
like what?
<cfbolz>
turning this into a module dict?
<cfbolz>
fijal: I actually think that's kind of acceptable, but it should come with a test_pypy_c test
<fijal>
to use locals() casually
<fijal>
to create class at runtime
<fijal>
to write things like len([x for x in l if something]) to count items
<fijal>
etc.
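For what it's worth, the counting idiom fijal mentions can be written without materializing a throwaway list; a small illustration (my own):

```python
# My own illustration of the counting idiom: building a throwaway list just
# to take its length, versus counting directly.
data = list(range(1_000_000))

count_via_list = len([x for x in data if x % 7 == 0])  # allocates a temporary list
count_via_sum = sum(1 for x in data if x % 7 == 0)     # counts without the list

assert count_via_list == count_via_sum
```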
<cfbolz>
that's a python property though, right? you wrote "pypy" above
<fijal>
yes, python, sorry
<cfbolz>
anyway, while I agree I fear we very much won't change python programmers :-(
<fijal>
no, that's not the point
<fijal>
the point is a bit to point out *why* python is slow
<fijal>
it's far more the culture than the fact that you can't optimize certain things
<fijal>
and I like to understand things like how culture shapes performance of the language
<fijal>
seems to be a much bigger factor than "how fast can you add integers"
<fijal>
like, would putting all this crap into a special "reflection" module help for example?
<cfbolz>
right
<fijal>
I think this is very underexplored
<fijal>
or at least I don't know of any exploration
<fijal>
the general intersection of coding/psychology seems to make a difference - like how to design APIs so that they make sense?