cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | Matti: I made a bit of progress, the tests now only segfault towards the end
Guest45 has joined #pypy
Guest45 is now known as belm0
<belm0> I'm trying to understand poor pypy performance on my app.  What does the "quasi-immut" abort case represent?
<belm0> > abort: force quasi-immut:1035
<Alex_Gaynor> belm0: that's when something that pypy treats as "usually immutable" (e.g. the definition of a method on a class) is mutated after all
<belm0> I see.  I wonder what in practice might be causing that.
<belm0> From the log summary I can't really tell if these quasi-immut are a high percentage of cases and affecting performance significantly.  Perhaps something else here looks obviously bad, please let me know.
<belm0> [2bb32a45a91] {jit-summary
<belm0> Tracing: 64407.893718
<belm0> Backend: 9240.464798
<belm0> TOTAL: 205.807830
<belm0> ops: 11062633
<belm0> heapcached ops: 8882906
<belm0> recorded ops: 3573779
<belm0>   calls: 491797
<belm0> guards: 730499
<belm0> opt ops: 156517
<belm0> opt guards: 36837
<belm0> opt guards shared:19847
<belm0> forcings: 0
<belm0> abort: trace too long:3
<belm0> abort: compiling:0
<belm0> abort: vable escape:0
<belm0> I'm a long way off from pypy being viable, and doubtful there is some magic flag or reasonable app change that would get me there.  (At least 100% slower, where I need something like 50% faster.)
<Alex_Gaynor> Unfortunately I'm mostly an alumni pypy dev, I don't remember if/how to enable logging for quasi-immut
derpydoo has quit [Quit: derpydoo]
xcm has quit [K-Lined]
marvin_ has quit [K-Lined]
luckydonald has quit [K-Lined]
hexology has quit [K-Lined]
dustinm has quit [K-Lined]
lazka has quit [K-Lined]
hexology has joined #pypy
luckydonald has joined #pypy
xcm has joined #pypy
dustinm has joined #pypy
derpydoo has joined #pypy
Julian has joined #pypy
<belm0> I wonder if a highly-async app is just not going to do well on PyPy, since there is no regular looping of code blocks.  (E.g. if there is a loop with `await`, then code of other running coroutines will be intermixed arbitrarily with each iteration.)
Julian has quit [Quit: leaving]
jcea has quit [Ping timeout: 276 seconds]
derpydoo has quit [Quit: derpydoo]
belm0 has quit [Quit: Client closed]
Guest45 has joined #pypy
Guest45 is now known as belm0
<belm0> (I probably won't stay connected to the channel, but I'll check the log in the next day or so to see if there's some comment on my ponderings.)
belm0 has quit [Quit: Client closed]
[Arfrever] has quit [Killed (NickServ (GHOST command used by [Arfreve1]))]
[Arfrever] has joined #pypy
otisolsen70 has joined #pypy
otisolsen70 has quit [Remote host closed the connection]
otisolsen70 has joined #pypy
<fijal> this looks a bit bad? like 1/6th of all the traces are aborted because quasi-immut, that might be really bad
<fijal> I think the idea is to run PYPYLOG=jit:log and then grep for quasi immut maybe there will be something
<cfbolz> Agreed, something wrong here
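A quick check of the "1/6th" estimate above. This is only a sketch and rests on one assumption: that the pasted `Tracing: 64407.893718` line lost its column separator and originally showed a count of 6440 followed by 7.893718 seconds. The paste itself does not confirm that split.

```python
# Assumption: "Tracing: 64407.893718" is really 6440 traces and 7.893718 s;
# the separator between the two columns seems to have been lost in the paste.
traces_started = 6440
quasi_immut_aborts = 1035  # from "abort: force quasi-immut:1035"

abort_fraction = quasi_immut_aborts / traces_started
print(f"{abort_fraction:.3f}")  # roughly one trace in six
```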
Guest45 has joined #pypy
Guest45 has quit [Quit: Client closed]
belm0 has joined #pypy
belm0 has quit [Client Quit]
belm0 has joined #pypy
<belm0> I don't know if there is info here to identify the source of quasi-immut, but:
<belm0> ```
<belm0> [9174c68305] {jit-tracing
<belm0>   import_all_from;<builtin>/interpreter/pyopcode.py:2-20~#124 FOR_ITER
<belm0>   ...
<belm0>   import_all_from;<builtin>/interpreter/pyopcode.py:2-39~#220 STORE_SUBSCR
<belm0> [9174dce13f] {jit-invalidate-quasi-immutable
<belm0> fieldname <FieldP pypy.objspace.std.celldict.ModuleDictStrategy.mutate_version 16> invalidated 0
<belm0> d362b] jit-invalidate-quasi-immutable}
<belm0> ~~~ ABORTING TRACING ABORT_FORCE_QUASIIMMUT
<belm0> ec009] jit-tracing}
<belm0> ```
belm0 has quit [Quit: Client closed]
<cfbolz> yes, that helps a lot
<cfbolz> belm0: so the "problem" is an `from mod import *` somewhere
<cfbolz> of course it should not break that way and become super slow, that's a bug on our end. but maybe this hint helps to find out what exactly your code is doing
<cfbolz> belm0: do they all look exactly like this, more or less?
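One way to answer the "do they all look like this" question is to tally the aborts in the log. The following is a minimal sketch, assuming the log lines have the shape shown in the paste above (a `name;path~#offset OPCODE` line for each traced bytecode, then `~~~ ABORTING TRACING <reason>`); the helper name `summarize_aborts` is made up for illustration.

```python
import re
from collections import Counter

ABORT_RE = re.compile(r"~~~ ABORTING TRACING (\w+)")

def summarize_aborts(lines):
    """Count (abort reason, last traced function) pairs in a PYPYLOG dump."""
    counts = Counter()
    last_function = None
    for raw in lines:
        line = raw.strip()
        match = ABORT_RE.search(line)
        if match:
            counts[(match.group(1), last_function)] += 1
            last_function = None
        elif ";" in line and "~#" in line:
            # Looks like "funcname;path:lines~#offset OPCODE"
            last_function = line.split(";", 1)[0]
    return counts

# Sample taken from the paste above (without the elided "..." line).
sample = """\
[9174c68305] {jit-tracing
  import_all_from;<builtin>/interpreter/pyopcode.py:2-20~#124 FOR_ITER
  import_all_from;<builtin>/interpreter/pyopcode.py:2-39~#220 STORE_SUBSCR
[9174dce13f] {jit-invalidate-quasi-immutable
fieldname <FieldP pypy.objspace.std.celldict.ModuleDictStrategy.mutate_version 16> invalidated 0
d362b] jit-invalidate-quasi-immutable}
~~~ ABORTING TRACING ABORT_FORCE_QUASIIMMUT
ec009] jit-tracing}
"""

result = summarize_aborts(sample.splitlines())
for (reason, function), n in result.most_common():
    print(reason, function, n)
```

On a real capture, the same function could be fed the lines of whatever file was named in `PYPYLOG`.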
jcea has joined #pypy
[Arfrever] has quit [Killed (NickServ (GHOST command used by [Arfreve1]))]
[Arfrever] has joined #pypy
mattip_ has joined #pypy
mattip__ has joined #pypy
mattip_ has quit [Ping timeout: 260 seconds]
derpydoo has joined #pypy
mattip__ has quit [Ping timeout: 272 seconds]
mattip__ has joined #pypy
belm0 has joined #pypy
<belm0> cfbolz: just looking at a short log capture, there were 6 ABORT_FORCE_QUASIIMMUT.  The first 4 were this import_all_from (our app doesn't have any such imports, but many dependencies of the app do).  The final 2 were coming from Trio (async library that we use heavily):
<belm0> current_time;/venv_pypy3.8/lib/pypy3.8/site-packages/trio/_core/_generated_run.py:41-51~#8 STORE_SUBSCR
<belm0> wrapper;/venv_pypy3.8/lib/pypy3.8/site-packages/trio/_core/_ki.py:156-158~#4 CALL_FUNCTION
<belm0> I'll try a longer capture tomorrow.
<cfbolz> belm0: thanks, that's super helpful. could you maybe open an issue to track this?
<fijal> I can probably have a look into that, i was looking a bunch at trio
<belm0> those Trio lines are using `locals()`
<belm0> `current_time()` is one of the most commonly called functions in the API...
<cfbolz> yeah, locals() potentially sucks :-(
<fijal> didn't we make a hack at some point?
<fijal> like locals()[CONSTANT] being fine?
<fijal> or is it doing things like enumerating over locals?
<fijal> belm0: overall, if your library does stuff like this, it's impossible to optimize
<fijal> trio is waaaaaay too magic for me
<belm0> just this:  `locals()[LOCALS_KEY_KI_PROTECTION_ENABLED] = True`
<fijal> that's already worse than locals()['foo']
<fijal> but maybe
<fijal> but likely not
<fijal> this is a horrendous way to write production software, I'm sorry :/
<fijal> like it's trying to avoid an if right?
<fijal> belm0: try to write an if?
<fijal> if LOCALS_KEY_KI_PROTECTION_ENABLED == 'something': stuff = True else: stuff = False or whatever
<belm0> I don't know-- it's not my library (though I contribute to Trio).   I'll try removing the locals() as an experiment and report how it affects my app.
<belm0> Mutations aside, what about the concern I raised about tracing JIT not being well suited for async code?  Because heavily async code (especially where things are not homogeneous) is going to be context switching between arbitrary coroutines.  I wonder if the async http server that pypy devs benchmarked was highly homogeneous, so it happened to work out-- but async apps aren't necessarily that way in general.
<belm0> So perhaps method-at-a-time JIT will be better for async.
<cfbolz> belm0: we cannot answer this question right now, basically
<cfbolz> right now there are more fundamental problems
<belm0> but intuitively, jumping around to different code in arbitrary order is going to be the worst case situation for tracing JIT, correct?
<cfbolz> that depends on how much happens between the jumps basically
<cfbolz> if the answer is "almost nothing" then yes
<belm0> I mean in our app, we have 1000+ concurrent tasks, and context switching usually less than every 1 ms, so perhaps futile for tracing JIT.
<cfbolz> belm0: sorry, you really cannot say that without more data 🤷‍♀️
<cfbolz> also, there is no python implementation with a method jit, is there?
<belm0> my only data is that pypy is running > 100% slower than CPython.  I'll check again how things look tomorrow after removing locals().
<belm0> yes, Cinder and Pyston are method-at-a-time.
<cfbolz> belm0: have you tried those?
<belm0> I tried Cinder a few times, but haven't been able to get any benefit yet (it really needs full type hints and following strict rules).  Pyston I just learned about and will try soon, but it may not be much faster than Python 3.11.
<belm0> (also Cinder is x86_64 only I think, we'll be on Arm eventually)
<cfbolz> pypy often has these step functions in performance: you solve something small, and then performance jumps up
<belm0> I hope that's the case here.  Thank you.
<cfbolz> belm0: is there any trio app we could try to reproduce the problem?
<fijal> belm0: generally you are not only not doing jitting, you are also actively trying and throwing away the code
<fijal> so in this particular case, you can really say nothing about the performance
<fijal> other than "consistently compiling assembler and then throwing it away without running is probably bad"
<fijal> certainly nothing about tracing jits and async code working together, there are far deeper problems
<belm0> cfbolz: do you mean reproduce the quasi-immut problem?
<cfbolz> yep
<belm0> I guess any trio app will be using current_time().  Perhaps an example client or server of trio-websocket (https://github.com/HyperionGray/trio-websocket).
<belm0> But I'll be confirming things tomorrow.  I'll try changing the locals() key to a const, or else removing them altogether, and confirm if the remaining quasi-immut are only the import_all.
<fijal> you need to remove the entire calls to locals()
<fijal> I mean
<fijal> locals()['foo'] = 3
<fijal> is really foo = 3
<fijal> so do that and not the locals()[...]
derpydoo has quit [Ping timeout: 272 seconds]
<belm0> I hope we can work something out with the Trio maintainer, but I'll just be making a local change for this experiment anyway.
<cfbolz> I mean there must be a reason why it's done that way
<belm0> Yes, probably a reason, but we'll review it.  When I have some evidence like "yeah, this made my trio app X% faster", then I'll open a Trio bug and report.
<cfbolz> ah, I see, the point is that the key is not a valid identifier
<cfbolz> but yes, does not look very pypy friendly
<belm0> so it's intending never to collide with the app's locals
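A small illustration of the "not a valid identifier" point. The key below is a hypothetical stand-in, not necessarily the constant Trio uses; the sketch only shows that a string starting with `@` can be a dict key but can never be produced by an ordinary assignment statement, so application code cannot collide with it by accident.

```python
# Hypothetical key, for illustration only.
KEY = "@EXAMPLE_PROTECTION_ENABLED"

print(KEY.isidentifier())

# A plain dict accepts any string as a key.
namespace = {}
namespace[KEY] = True

# An ordinary assignment with that "name" does not even compile.
try:
    compile(f"{KEY} = True", "<example>", "exec")
except SyntaxError:
    assignable = False
else:
    assignable = True
print(assignable)
```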
<belm0> issue for the import_all:  https://foss.heptapod.net/pypy/pypy/-/issues/3835
mattip__ has quit [Ping timeout: 240 seconds]
<cfbolz> I wonder why trio doesn't use contextvars for this, it seems kind of like that's a good use case for that
belm0 has quit [Quit: Client closed]
<nimaje> iirc trio does locals()[<non-identifier>] = …, so that other code can't accidentally break it and iirc it was implemented before contextvars, but not sure
mattip__ has joined #pypy
<cfbolz> nimaje: yeah, that's how I understood it too
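A rough sketch of what the contextvars idea could look like. All names here are hypothetical and this is not Trio's API. It also does not claim to be a drop-in replacement, since a flag stored in a `ContextVar` is looked up through the current context and not through the frames of the call stack.

```python
import contextvars

# Hypothetical flag; the name and default are assumptions for this sketch.
protection_enabled = contextvars.ContextVar("protection_enabled", default=False)

def run_protected(fn, *args):
    """Call fn with the flag set, restoring the previous value afterwards."""
    token = protection_enabled.set(True)
    try:
        return fn(*args)
    finally:
        protection_enabled.reset(token)

def current_state():
    return protection_enabled.get()

inside = run_protected(current_state)
outside = current_state()
print(inside, outside)
```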
mattip_ has joined #pypy
mattip__ has quit [Ping timeout: 240 seconds]
mattip_ has quit [Ping timeout: 250 seconds]
<cfbolz> so the tutorial echo client example is indeed 2x slower
<cfbolz> and I see this: virtualizables forced:543447
<cfbolz> ok, this is at least partly our fault :-(
<cfbolz> fijal: we make a moduledict for locals() :-(((((( on 3.x
<cfbolz> which makes no sense of course
mattip_ has joined #pypy
mattip_ has quit [Ping timeout: 240 seconds]
<cfbolz> belm0: so I'm working on a fix that should improve the situation, you could try tonight's nightly build
mattip_ has joined #pypy
<fijal> is there a reason why it does not use just a dict?
<cfbolz> fijal: an instance dict will probably work well? it's a predictable set of keys per code object
Techcable has quit [Ping timeout: 252 seconds]
epony has quit [Ping timeout: 252 seconds]
epony has joined #pypy
Techcable has joined #pypy
<fijal> cfbolz: any dict would work better than locals, arguably
<cfbolz> yes yes
mattip_ has quit [Ping timeout: 272 seconds]
<cfbolz> fijal: like a factor of 20x, it gets worse than without jit of course
<fijal> in a sense it's too easy and too socially acceptable to do crazy shit like that in pypy
<cfbolz> like what?
<cfbolz> turning this into a module dict?
<cfbolz> fijal: I actually think that's kind of acceptable, but it should come with a test_pypy_c test
<fijal> to use locals() casually
<fijal> to create class at runtime
<fijal> to write things like len([x for x in l if something]) to count items
<fijal> etc.
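The counting idiom mentioned above, sketched with made-up data. A generator has no `len()`, so the `len` form has to build a throwaway list first; summing over a generator counts the same items without the intermediate list. Whether that matters for speed on a given interpreter is not measured here.

```python
items = [3, 8, 1, 9, 4]  # example data, not from the discussion

def is_big(x):
    return x > 3

# Builds a temporary list only to take its length.
count_via_list = len([x for x in items if is_big(x)])

# Counts the matches directly.
count_via_sum = sum(1 for x in items if is_big(x))

print(count_via_list, count_via_sum)
```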
<cfbolz> that's a python property though, right? you wrote "pypy" above
<fijal> yes, python, sorry
<cfbolz> anyway, while I agree I fear we very much won't change python programmers :-(
<fijal> no, that's not the point
<fijal> the point is a bit to point out *why* python is slow
<fijal> it's far more the culture than the fact that you can't optimize certain things
<fijal> and I like to understand things like how culture shapes performance of the language
<fijal> seems to be a much bigger factor than "how fast can you add integers"
<fijal> like, would putting all this crap into a special "reflection" module help for example?
<cfbolz> right
<fijal> I think this is very underexplored
<fijal> or at least I don't know of any exploration
<fijal> the general intersection of coding/psychology seems to make a difference - like how to design APIs so that they make sense?
derpydoo has joined #pypy
Dejan has quit [Read error: Connection reset by peer]
mattip_ has joined #pypy
mattip_ has quit [Read error: Connection reset by peer]
epony has quit [Ping timeout: 252 seconds]
epony has joined #pypy
Julian has joined #pypy
otisolsen70 has quit [Quit: Leaving]
derpydoo has quit [Ping timeout: 272 seconds]
Julian has quit [Quit: leaving]
epony has quit [Ping timeout: 252 seconds]