cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and stay out of design decisions
jcea has quit [Ping timeout: 268 seconds]
mgorny has quit [Quit: No Ping reply in 60 seconds.]
mgorny has joined #pypy
lritter has joined #pypy
[Arfrever] has quit [Ping timeout: 256 seconds]
[Arfrever] has joined #pypy
dmalcolm has quit [Ping timeout: 264 seconds]
Dejan has joined #pypy
johnny_nick has joined #pypy
johnny_nick has quit [Client Quit]
dmalcolm has joined #pypy
otisolsen70 has joined #pypy
jcea has joined #pypy
dmalcolm has quit [Ping timeout: 240 seconds]
otisolsen70 has quit [Ping timeout: 256 seconds]
dmalcolm has joined #pypy
itamarst has joined #pypy
lritter has quit [Quit: Leaving]
<itamarst> hello! I was curious about how the JIT interacted with benchmarking
<cfbolz> itamarst: hey!
<cfbolz> what exactly do you mean?
<itamarst> e.g. in Rust benchmarking frameworks there's typically a blackbox() function you feed variables into, so that the compiler doesn't optimize the code into just a constant
<itamarst> "oh hey you're doing repetitive math with constant arguments" kinda thing
<cfbolz> itamarst: does that mean you're mainly talking about microbenchmarks?
<cfbolz> for microbenchmarks this is definitely a problem with pypy, yes. if you measure a loop invariant computation in a big loop, the computation will be moved out of the loop and the times are just for an empty loop
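A minimal sketch of the pitfall cfbolz describes above: the timed loop body does not depend on the loop variable and its result is unused, so PyPy's JIT may hoist or constant-fold it and the measurement degenerates into timing an empty loop. The function names are illustrative, not from the log.

    import time

    def compute():
        # pure arithmetic on constants: nothing here depends on the loop below
        return 123456789 * 987 + 42

    def naive_microbenchmark(iterations=1_000_000):
        start = time.perf_counter()
        for _ in range(iterations):
            compute()  # result unused and loop-invariant: the JIT may reduce this to an empty loop
        return time.perf_counter() - start

    if __name__ == "__main__":
        print(naive_microbenchmark())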
<itamarst> yeah microbenchmarks I guess
<itamarst> is there an equivalent blackbox function?
<cfbolz> yesish, but using it introduces other timing distortions
<cfbolz> because then you need a non-inlinable function call, which is itself quite expensive
<cfbolz> you can achieve it with the help of `pypyjit.residual_call(callable, *args, **kwargs)`
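A sketch of how `pypyjit.residual_call` could serve as a Rust-style black box; the `black_box` wrapper and the CPython fallback are my own framing for illustration, not something posted in the log.

    try:
        import pypyjit  # available on PyPy only

        def black_box(value):
            # residual_call forces a genuine, non-inlined call, so the JIT
            # cannot carry constants or purity facts across this boundary
            return pypyjit.residual_call(lambda x: x, value)
    except ImportError:
        def black_box(value):
            # plain CPython: no optimizer aggressive enough to need the barrier
            return value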
<cfbolz> what I do instead is to make sure that the computation I'm trying to measure is not constant-foldable (eg depends on the loop counter somehow)
<cfbolz> then I check that the JIT didn't manage to cheat by looking at the JIT compiler IR (but that is not really something I can recommend in general, of course)
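A sketch of that preferred approach: feed the loop counter into the measured computation so it cannot be constant-folded, and keep the accumulated result alive. The `PYPYLOG` invocation at the end is the usual way to dump the JIT's optimized traces for inspection; the exact flag syntax is from memory, not from the log.

    def work(i):
        # the result depends on the argument, so feeding the loop counter in
        # keeps the computation from being loop-invariant or constant-foldable
        return (i * 31 + 7) % 1000

    def benchmark(iterations=100_000):
        total = 0
        for i in range(iterations):
            total += work(i)  # input varies with the loop counter
        return total          # using the sum keeps the body from being dead code

    # To check that the JIT did not cheat anyway, the optimized traces can be
    # dumped and read, e.g.:
    #   PYPYLOG=jit-log-opt:jit.log pypy benchmark.py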
<cfbolz> itamarst: do you have a concrete use case in mind right now? then we could try it
<itamarst> at the moment it's benchmarks for twisted networking framework
<itamarst> so it's not really arithmetic-heavy code, lots of method calls
<itamarst> (sorry, in and out dealing with a plumber)
<itamarst> I'm doing benchmark integration with codspeed.io, which uses Cachegrind to get CPU instruction counts
<itamarst> so by default they only run benchmarked code once
<itamarst> and that's fine for CPython, but with PyPy that means you won't get a JITed version?
<itamarst> so I was thinking of adding a warmup phase of running the code in a loop
<itamarst> but then there's the worry about the computation being moved out
<itamarst> and... sounds like residual_call() is exactly what I want for that?
<itamarst> since the function is specifically a _benchmark_, not library code
<itamarst> I guess I should also investigate if this is a problem with pytest-benchmark, and file an issue there if so
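A rough sketch of the warmup idea for an instruction-counting setup like the one itamarst describes: run the benchmarked callable enough times for PyPy's tracing JIT to kick in, then take the measured run afterwards. The function name, the parameters, and the warmup count are assumptions for illustration, not part of pytest-codspeed.

    def run_benchmark(func, *args, warmup=2000, rounds=1):
        # warmup phase: run the code enough times for PyPy's tracing JIT to
        # compile it (the default thresholds are on the order of a thousand runs)
        for _ in range(warmup):
            func(*args)
        # measured phase: on a Cachegrind-style harness this is the part whose
        # instruction counts get recorded
        result = None
        for _ in range(rounds):
            result = func(*args)
        return result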
<cfbolz> itamarst: that sounds more like a real program than super small synthetic code
<cfbolz> so it's a bit unlikely that the JIT will be able to const-fold it completely
<itamarst> well
<cfbolz> residual_call is a bit subtle. it only prevents passing information across that call boundary
<itamarst> I want to get this as a feature into pytest-codspeed and I guess pytest-benchmark
<itamarst> and they're used for a wide variety of code
<itamarst> ah but _that_ call could still be optimized into a constant
<cfbolz> I don't think there can be a one-size-fits-all solution, I fear
<itamarst> stupid compilers being too smart
<cfbolz> yeah, sorry ;-)
* cfbolz hard at work making it smarter
<cfbolz> itamarst: do you have a link to the Rust mechanism?
<cfbolz> it's definitely good to think about it, the issue will only become more relevant for cpython in the future
<itamarst> yeah I'm probably going to write an article about it
<itamarst> thank you, this is very helpful
<cfbolz> so yes, you can express something like the rust black box directly
<cfbolz> sec
<cfbolz> itamarst: I'm happy to read a draft of the post, if that would be in any way helpful
<LarstiQ> looking at that rust blackbox I thought another trick might be passing in values from the commandline so they're not known constants, but that doesn't really work for a tracing jit
<cfbolz> LarstiQ: no, that does work
<LarstiQ> cfbolz: ah, pypy doesn't at some point specialize based on the data seen?
<cfbolz> not in most cases
<LarstiQ> well, maybe I did contribute something useful then ;)
<cfbolz> I'm currently trying to mimic the example from the rust docs, but can't get pypy to cheat yet ;-)
<itamarst> so one thought is... if the benchmark framework calls the function-being-benchmarked with a different argument each time (0, 1, 2, ...)
<itamarst> and uses residual_call
<itamarst> and the benchmark author can use that to ensure it's not a const calculation
<itamarst> and then benchmark authors don't need to use residual_call() or a CPython equivalent?
<cfbolz> itamarst: yes, I think that is actually enough
<cfbolz> you must use the different arguments
<cfbolz> and that's a much better guard against distortion than anything else
<cfbolz> cpython doesn't need anything like blackbox atm
<cfbolz> itamarst: then the benchmarking framework could use the result for something (xor the hash into an accumulator or something like that). that would prevent the jit from thinking the result is unneeded
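A sketch of the scheme described in the last few messages: the harness supplies a different argument on every call and folds the result's hash into an accumulator, so the JIT can neither treat the input as a constant nor discard the output as dead. Names are made up for illustration, and the sketch assumes the benchmarked function returns something hashable.

    def measure(benchmarked, rounds=10_000):
        sink = 0
        for i in range(rounds):
            # a different argument every round, so the JIT cannot treat the
            # input as a constant...
            result = benchmarked(i)
            # ...and folding the result's hash into an accumulator keeps the
            # JIT from proving the result is unneeded
            sink ^= hash(result)
        return sink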
<itamarst> wouldn't residual_call() suffice?
<cfbolz> itamarst: the problem is that residual_call has a huge overhead
<itamarst> mmm
<itamarst> I'm not sure that's a problem? if the benchmark framework is smart enough to remove that overhead
<cfbolz> I'm not sure that's possible
<itamarst> ah
<cfbolz> basically my philosophical point is: you fundamentally cannot really automate any of this in a robust way that will prevent people from making benchmarking errors
<itamarst> :(
<cfbolz> (and I'm including myself :-P)
<cfbolz> itamarst: sorry for being pessimistic :-(
<itamarst> but maybe you can cover 90% of cases
<itamarst> and I guess in some cases this isn't an issue at all
<itamarst> so let's say... 1% badness vs 0.1% badness
<itamarst> I can imagine arguments that you actually want benchmarks to be broken 100% of the time
<itamarst> if you do them wrong
<itamarst> so it's easier to notice?
<itamarst> or I guess that's maybe another question
<cfbolz> heh, no, I am definitely not against benchmark harm reduction
<itamarst> maybe you can't _prevent_ it with automation, can you _detect_ it with automation?
<cfbolz> itamarst: basically a positive way to make my point would be: communicating the pitfalls is necessary and ultimately more important than purely technical solutions
* itamarst nods
<itamarst> harm reduction (or detection?) is probably important too though, given (a) people don't often read docs and (b) LLMs are making this worse
<itamarst> but I guess if it ends up in pytest-benchmark docs that'll help people use the correct template
<itamarst> thank you for all the feedback, this is very helpful
<itamarst> I will report back as I come up with suggestions for the relevant frameworks (docs / code changes)