cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | Matti: I made a bit of progress, the tests now only segfault towards the end
<Corbin> https://arxiv.org/abs/2011.13127 is on the front page of lobste.rs today. I don't recall it being discussed at the time. How does this compare to RPython's generated JITs?
<Corbin> I was particularly thinking of p25. "None of [the template JITs mentioned] above supported patching the binary code to burn in literals, stack offsets, and jump addresses, so their technique only works if the binary code can be concatenated without modification."
<Corbin> "This implies that all jumps and calls are indirect, and that all constants must be retrieved from memory, resulting in inferior execution performance."
<LarstiQ> cfbolz tweeted about it recently too
<cfbolz> Great paper
<cfbolz> It's a technique for quite good baseline jits
<cfbolz> It's not a meta jit though and making it one would be an interesting research project
<cfbolz> It's also an interesting question how well this would work for a language like python
<arigato> re the copy-and-patch JIT paper, yes, it's interesting and I think the approach would work for RPython JITs too
<cfbolz> arigato: I am still thinking about the latter part
<arigato> I'm thinking, for the backend only, to make somewhat optimized machine code
<arigato> (really I just started reading)
<cfbolz> ok
<cfbolz> arigato: I am wondering whether you could use the approach as a first stage instead, to "just" get the ~2x speedup that getting rid of bytecode dispatch brings
<arigato> I haven't read far enough to be sure
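As a reminder of what that first stage would eliminate, here is a classic switch-based dispatch loop in C (opcodes and layout illustrative); copy-and-patch concatenates the handler bodies directly, so the decode and dispatch below disappear:

    #include <stdint.h>

    enum { OP_PUSH, OP_ADD, OP_RET };

    int64_t run(const uint8_t *pc, int64_t *sp) {
        for (;;) {
            switch (*pc++) {   /* the decode+dispatch copy-and-patch removes */
            case OP_PUSH: *++sp = *pc++; break;
            case OP_ADD:  sp[-1] += sp[0]; --sp; break;
            case OP_RET:  return *sp;
            }
        }
    }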
<arigato> I think someone motivated enough could use it manually to get that effect on CPython
<arigato> not sure how well it would work on PyPy without tons of hints to guide it
<arigato> it would be interesting to try to use it with RPython's JIT, as a replacement for the backend and possibly the rewrite step that occurs just before too
<cfbolz> arigato: right
<arigato> but of course the usual problems remain: (1) GC integration, (2) guard failures
<cfbolz> arigato: for GC you would need to have an indirection for the GC constants, plus a way to find roots?
<arigato> the indirection for GC constants might not be needed, because there is already an indirection for all constants (their offset in the machine code)
<arigato> I think that across calls, *all* values are spilled in this model, so it might be easier to find the roots
<arigato> (doing an indirection for GC constants might still be easier, it's what we do already for all backends anyway)
<cfbolz> ok
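A sketch of that indirection: the stencil hole is patched with a slot index rather than an object address, so the GC can move the object by updating the table. All names here are illustrative:

    #include <stdint.h>

    typedef struct Obj Obj;

    /* Table of constant references, scanned and updated by the GC as
       ordinary roots; the machine code never holds a raw object pointer. */
    static Obj *gc_const_table[1024];

    extern char SLOT_HOLE;   /* patched with a small integer: the slot index */

    Obj *load_gc_const(void) {
        return gc_const_table[(uintptr_t)&SLOT_HOLE];
    }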
<arigato> it's interesting in the paper how they can extract all templates they need from running the LLVM compiler, on any number of architectures
<cfbolz> arigato: yes
<cfbolz> arigato: I also really like the approach of using tail calls
<arigato> yes, together with the modes inside LLVM made for compiling GHC, I think? (unsure)
<cfbolz> not sure it's 'for' ghc, but ghc uses it
<cfbolz> arigato: this is the hacky non-C++ code I played with yesterday, btw: https://gist.github.com/cfbolz/3ffa8746fc44f5d1192c02028a0ce058
<cfbolz> (it's a lot more hacky and low-tech than the paper)
<arigato> "This calling convention has been implemented specifically for use by the Glasgow
<arigato> Haskell Compiler (GHC)."
<cfbolz> ah, cool
<cfbolz> fair enough
<arigato> but also "at the moment only X86 supports this convention"
<cfbolz> yes, just found that too
<arigato> they didn't talk about that point in the paper, or I missed it
<arigato> unsure I understand, because elsewhere their wording implies the technique works on x86-64, ARM and SPARC
<cfbolz> arigato: yeah, I wonder whether it actually works on say ARM
<cfbolz> yes, but the formulation is vague
<arigato> personally I will stay far away from that whole approach, because it looks like another rabbit hole of unfinished LLVM features
<cfbolz> heh, got burned once by that already with stm?
<arigato> hah, if only that was only once
<cfbolz> oh no, what else?
<arigato> I don't remember exactly, just that every time we tried to use LLVM we eventually failed for that reason
<cfbolz> :-(
<cfbolz> arigato: ok, fine, but I am not sure the paper really needs much of llvm. it mostly uses clang I think
<arigato> it more or less depends on a special calling convention, and this one is only implemented on x86 according to the docs, so... roadblock again?
<arigato> "implementing it inside llvm looks like it should not be too much work" (famouns last words that will not be mine)
<cfbolz> arigato: no, I think the calling convention "just" makes it more efficient, it works with the normal one too, I suspect
<cfbolz> but yes, I get your skepticism
<arigato> right, I think it wouldn't work with the plain calling convention because of the risk that it produces real calls and blows up the stack, but maybe there are other more portable conventions that guarantee tail calls
<arigato> yes, it seems they have a "tailcc" for precisely that purpose
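A C-level approximation of that guarantee is clang's musttail statement attribute, which forces the tail call even at -O0 (the paper itself works at the LLVM IR level with the GHC convention; the names here are illustrative):

    #include <stdint.h>

    typedef int64_t cont_fn(int64_t acc, int64_t *vstack);

    extern char NEXT_HOLE;   /* patched with the next stencil's address */

    /* One stencil in continuation-passing style: do the work, then
       tail-call the next stencil with all live values as arguments. */
    int64_t int_add_stencil(int64_t acc, int64_t *vstack) {
        acc += *vstack++;
        cont_fn *next = (cont_fn *)&NEXT_HOLE;
        __attribute__((musttail)) return next(acc, vstack);
    }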
<arigato> I like how they represent all live values as parameters and arguments, but I'm not sure I see how they handle that in practice
<arigato> e.g. if two pieces of code are both INT_ADD+INT_MUL, but one uses the result of the addition as argument to the multiplication while the other just passes the addition as live variable, how is the difference represented?
<cfbolz> arigato: I think that would be different variants of these two, probably?
<arigato> do they need to insert special instructions to move registers around so that they end up where we need them, or do they instead generate many variants of INT_ADD and/or INT_MUL to pass/get various arguments in various positions?
<arigato> OK
<cfbolz> arigato: but yes, I don't quite get whether they have lots and lots of variants for "I have 8 other live things that I pass along"
<arigato> and are there many variants like "INT_ADD(a,b,c,d,e,f) which adds b to e"?
<cfbolz> indeed
<cfbolz> arigato: the paper says this: "We cannot naively enumerate all possible combinations of function prototypes for the different types of values that may be passed through, since the total number of combinations grows exponentially. The crucial observation is that each stencil only cares about its own inputs. The contents stored in the other registers do not matter, as long as they are not clobbered by the stencil. Therefore, for those
<cfbolz> registers, it is sufficient to always represent it by the longest type (uint64_t or double), and pass it from the argument to the continuation verbatim."
<arigato> somewhat doubtful, because otherwise they wouldn't end up with just 35kb of templates (in the simple case without too many superinstructions)
<cfbolz> ah no
<cfbolz> that's about types
<arigato> yes
<arigato> or maybe they do, and really have only a small number of live variables at most?
<cfbolz> "In our current implementation, we only use registers to store temporary values while evaluating expression trees." (but those can be deep, of course)
<Corbin> Hm. Could we compare copy-and-patch to compiling-to-closures for JIT code?
<cfbolz> I don't know what compiling-to-closures is
<Corbin> Old Lisp technique. Each AST node or bytecode is turned into a call to a runtime function taking two arguments, an environment (the "closure") and the input value. It's amenable to CPS, just like in the copy-and-patch paper.
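A minimal C rendering of that technique, with the "closure" reduced to a node carrying its own eval function pointer (illustrative; Lisp versions would capture real closures):

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct Node Node;
    typedef int64_t eval_fn(Node *self, int64_t *env);
    struct Node { eval_fn *eval; Node *left, *right; int64_t datum; };

    static int64_t eval_const(Node *n, int64_t *env) { (void)env; return n->datum; }
    static int64_t eval_var(Node *n, int64_t *env)   { return env[n->datum]; }
    static int64_t eval_add(Node *n, int64_t *env) {
        return n->left->eval(n->left, env) + n->right->eval(n->right, env);
    }

    /* "Compiling" builds the node graph once; running it is one indirect
       call per node, with no bytecode decoding at all.  For x + 3:
       mk(eval_add, mk(eval_var, NULL, NULL, 0),
                    mk(eval_const, NULL, NULL, 3), 0)  */
    static Node *mk(eval_fn *f, Node *l, Node *r, int64_t d) {
        Node *n = malloc(sizeof *n);
        n->eval = f; n->left = l; n->right = r; n->datum = d;
        return n;
    }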
<krono> CPS, CEK, compile-to-closures sound like they're all in the same ballpark…