#pypy on 2025-04-10 — irc logs at libera.irclog.whitequark.org

2022-11-09 10:48 cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and staying out of design decisions

01:55 <korvo> cfbolz, arigato: While we're on GPU stuff, I can sometimes answer questions about GPU ISAs, even if the answer is vague and can't cite anybody directly for ~the reasons~

01:56 <korvo> On GPUs where I can go find a PDF of the ISA, like any ATI/AMD Radeon in the past couple decades, the main issues are all conceptual. There's a cool Dyalog APL compiler, co-dfns, whose design is like 20% APL on GPU and 80% compiler on GPU.

01:58 <korvo> "computing the full expression at every pixel" is often the best choice, yeah. I'm a big believer in Inigo Quilez's approach, even if I don't use their "ShaderToy" product; evaluating a signed distance function is usually not that slow and the cost is really in the size of the shader and textures.
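
As a rough illustration of "computing the full expression at every pixel", here is a minimal Python sketch that evaluates a signed distance function once per pixel. The circle SDF and the names `circle_sdf` and `render` are assumptions made for the example; they do not come from the Prospero challenge or from Shadertoy.

```python
import math

def circle_sdf(x, y, cx=0.0, cy=0.0, r=0.5):
    """Signed distance to a circle: negative inside, positive outside."""
    return math.hypot(x - cx, y - cy) - r

def render(sdf, size=16):
    """Evaluate sdf at the centre of every pixel of a size x size image
    covering the square [-1, 1] x [-1, 1]. Returns a list of text rows."""
    rows = []
    for j in range(size):
        y = 1.0 - (2.0 * j + 1.0) / size      # top row is y close to +1
        row = ""
        for i in range(size):
            x = -1.0 + (2.0 * i + 1.0) / size
            row += "#" if sdf(x, y) < 0.0 else "."
        rows.append(row)
    return rows

if __name__ == "__main__":
    print("\n".join(render(circle_sdf)))
```

Every pixel does the same amount of work regardless of its neighbours, which is the property that maps well onto a GPU.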

02:03 <korvo> ...At risk of sounding like a crank, the issue of needing a driver for a GPU is tangential to an #esolangs conversation from a while ago. The different parts of a modern motherboard are effectively message-passing actors.

02:04 <korvo> Not every actor is going to be capable of driving itself. The CPU is only Turing-complete when paired with a nice memory controller, even! So it shouldn't be surprising that some actors effectively require a driving actor next to them in order to be Turing-complete.

02:16 <korvo> ...Actually, at risk of being a bold crank who over-promises, I've been thinking about this for a while. In a serious way, there's nothing technically preventing an RPython process from opening up a DRI/DRM node on Linux or BSD and asking for a compute handle.

02:18 <korvo> I know libdrm and can help anybody with the basic ioctls. But there's a bunch of device-specific initialization that has to be done, and we really only have the docs to do it on Radeon cards.

02:20 <korvo> I have no idea how guards would work, and that was enough for me to stop thinking about it.

02:56 derpydoo has joined #pypy

03:01 jcea has quit [Ping timeout: 248 seconds]

03:48 derpydoo has quit [Quit: derpydoo]

04:52 itamarst has quit [Quit: Connection closed for inactivity]

06:51 auk has quit [Quit: Leaving]

07:44 [Arfrever] has quit [Ping timeout: 252 seconds]

07:54 <arigato> cfbolz: fwiw, I'm pretty sure we could get massive speedups on the Prospero challenge because it's just made of individual letter parts which repeat

07:55 <arigato> in a very gamedev-oriented solution (instead of CUDA), I would extract the 665 independent bits, then identify which ones are actually the same ones that only differ by a translation, further reducing the number

07:56 <arigato> then compute the bounding box of each of these parts, by doing some range analysis

07:57 <arigato> then we get 665 quads on the screen, which we can implement as a mesh of 665*2 triangles, with UV coordinates that encode the coordinates of the vertices plus translations, and a 3rd UV coordinate that selects which of the faces to use

07:58 <arigato> we end up with a single mesh draw call, with some overlap, where every mesh uses only one block of 10 or 20 instructions

07:58 [Arfrever] has joined #pypy

07:59 <arigato> with a single switch which is constant for the whole quad

07:59 <arigato> that's pretty much the best possible case for GPUs drawing triangles

08:02 <arigato> I'd expect this to be maybe 20 times faster than what I did yesterday

08:03 <arigato> s/every mesh/every quad (there is a single mesh)
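
The single-mesh layout described above could be assembled roughly as follows. This is only a sketch under assumptions: the input tuple layout, the vertex layout `(x, y, u, v, piece_id)` and the function name `build_quad_mesh` are hypothetical, and a real renderer would upload these arrays to the GPU rather than keep Python lists.

```python
def build_quad_mesh(boxes):
    """Build one mesh containing one quad (two triangles) per bounding box.

    boxes: iterable of (xmin, ymin, xmax, ymax, tx, ty, piece_id), where
    (tx, ty) is the translation of this instance of the piece (assumed layout).
    Returns (vertices, indices); each vertex is (x, y, u, v, piece_id), with
    (u, v) the untranslated coordinates the per-piece formula is evaluated at
    and piece_id the third coordinate selecting which formula to run.
    """
    vertices, indices = [], []
    for xmin, ymin, xmax, ymax, tx, ty, piece_id in boxes:
        base = len(vertices)
        corners = ((xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax))
        for x, y in corners:
            vertices.append((x, y, x - tx, y - ty, float(piece_id)))
        # Two triangles sharing the diagonal from corner 0 to corner 2.
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices
```

Because `piece_id` is the same at all four corners of a quad, the interpolated value is constant across it, so the switch in the shader never diverges within a quad.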

08:08 <arigato> (also, if some more advanced range analysis says the points are inside some other tighter polygonal shape than an axis-aligned bounding box, then there is no runtime cost in replacing the quad with this other shape, further increasing performance)
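
One way the range analysis could work is interval arithmetic: evaluate the formula on a whole cell of the image at once and discard cells where the result is provably positive. The sketch below is an assumption about the technique, not the method actually used in the discussion; the `Interval` class, the circle standing in for one letter part, and the grid size are all hypothetical. It also ignores floating-point rounding, which a careful implementation would round outwards.

```python
import math

class Interval:
    """Closed interval [lo, hi] with just enough arithmetic for a circle SDF."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def square(self):
        a, b = self.lo * self.lo, self.hi * self.hi
        lo = 0.0 if self.lo <= 0.0 <= self.hi else min(a, b)
        return Interval(lo, max(a, b))

    def sqrt(self):
        return Interval(math.sqrt(max(self.lo, 0.0)), math.sqrt(max(self.hi, 0.0)))

def circle_range(x, y, cx, cy, r):
    """Range of the circle SDF over the cell x by y (both Intervals)."""
    dx = x - Interval(cx, cx)
    dy = y - Interval(cy, cy)
    return (dx.square() + dy.square()).sqrt() - Interval(r, r)

def bounding_box(sdf_range, n=32):
    """Conservative bounding box of the shape inside [-1, 1] x [-1, 1]:
    the union of the n*n grid cells that cannot be proven empty."""
    step = 2.0 / n
    hits = []
    for j in range(n):
        for i in range(n):
            x = Interval(-1.0 + i * step, -1.0 + (i + 1) * step)
            y = Interval(-1.0 + j * step, -1.0 + (j + 1) * step)
            if sdf_range(x, y).lo <= 0.0:   # the shape may touch this cell
                hits.append((x, y))
    if not hits:
        return None
    return (min(x.lo for x, _ in hits), min(y.lo for _, y in hits),
            max(x.hi for x, _ in hits), max(y.hi for _, y in hits))
```

The box is never too small, only possibly a little too large, which is the safe direction for choosing the quad that covers a piece.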

08:09 <cfbolz> arigato: right

08:09 <cfbolz> if the application is some kind of 3d CAD it becomes a bit trickier though, I assume

08:10 derpydoo has joined #pypy

08:10 <arigato> if the 3d model is the union of a large number of small pieces, the same approach should work

08:11 <arigato> but maybe we could treat large pieces differently? like, stick them all inside the same union-based formula and use a single quad covering the whole window for it

08:13 <arigato> ...I actually did exactly that (with lots of small pieces) in a VR game I made: it renders many many bubbles, which are small spheres, but rendered as octagons---so only 4 triangles per sphere, and a custom shader that makes the triangle appear exactly like its spherical subset

08:14 <arigato> (4 visible triangles, plus 4 triangles on the other side that the GPU discards early)
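
The shader itself is not shown in the log. A common way to get this effect is a "sphere impostor": for each fragment of the flat stand-in geometry, work out where the viewing ray would hit the sphere and shade with that normal, discarding fragments that miss. The Python below is a simplified sketch of that per-fragment logic, assuming an orthographic view along the z axis and local coordinates where the sphere has radius 1; the function name is hypothetical.

```python
import math

def sphere_impostor_fragment(u, v):
    """Per-fragment logic for a unit sphere drawn on flat geometry.

    (u, v) are the fragment's coordinates in the sphere's local frame.
    Returns the unit surface normal facing the viewer, or None when the
    fragment lies outside the sphere's silhouette and should be discarded.
    """
    d2 = u * u + v * v
    if d2 > 1.0:
        return None                  # outside the disc: discard
    z = math.sqrt(1.0 - d2)          # height of the sphere above this point
    return (u, v, z)                 # on a unit sphere, position == normal
```

The cost per sphere is then a handful of triangles plus a square root per covered pixel, instead of hundreds of faces.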

08:17 <arigato> I don't know why but it is apparently an uncommon solution (within games). Normally spheres are rendered as high-poly surfaces, but hundreds of faces times thousands of spheres is too much for the Quest VR device

08:18 <arigato> (a good PC GPU would be fine with 10 times more spheres, but not 100 times more)

11:01 derpydoo has quit [Quit: derpydoo]

11:10 otisolsen70 has joined #pypy

11:11 otisolsen70 has quit [Remote host closed the connection]

11:12 otisolsen70 has joined #pypy

11:45 itamarst has joined #pypy

12:20 jcea has joined #pypy

12:40 otisolsen70 has quit [Quit: Leaving]

13:32 nimaje1 has joined #pypy

13:32 solremn has joined #pypy

13:32 krono_ has joined #pypy

13:36 infernixx has joined #pypy

13:40 remn has quit [*.net *.split]

13:40 mjacob has quit [*.net *.split]

13:40 infernix has quit [*.net *.split]

13:40 nimaje has quit [*.net *.split]

13:40 krono has quit [*.net *.split]

13:40 krono_ is now known as krono

13:40 mjacob has joined #pypy

13:40 infernixx is now known as infernix

13:43 nimaje1 is now known as nimaje

15:04 <korvo> It's not just gamedev; X11 and other windowing systems capture the "glyphs" of a font and re-use each glyph as a texture or stencil.

15:13 Diggsey has joined #pypy

16:16 jcea has quit [Ping timeout: 248 seconds]

16:44 <arigato> korvo: right, and we can also compare that with signed distance fields, which give a nicer-looking result than just rendering a texture, for fonts

16:44 <korvo> arigato: Yep. I haven't dug into Prospero yet, but I hear that it was designed by generating bytecode which approximates an SDF? So we come full circle.

16:45 <arigato> yes, it is based on an SDF defined entirely by computations, and no texture sampling at all

16:47 <arigato> but of course it makes sense to think about font-rendering systems for Prospero, because in that case the image we want to render is actually some text

16:49 <korvo> 2D SDFs on a 2D projection can usually be pre-composed with that projection, too, so if the SDF has *any* repeating structure then it can be isolated and cached. This is in stark contrast to 3D SDFs, where the projection isn't just a linear mapping but requires a loop and numerical methods.

16:52 <arigato> yet I believe we can still do the same in 3D in the general case, e.g. splitting it into a bunch of 3D bounding boxes isolated from each other, and then rendering each as one 3D cube

16:53 <korvo> I guess that I should point out the other reason that glyphs make sense for windowing systems: at a fixed (small) point size, fonts need subpixel information to get legible rendering. So they have to be rendered in software on the CPU and transferred to the GPU as stencilling textures anyway.

16:55 Dejan has joined #pypy

16:57 <korvo> ...Wait, I see how to encode guards. Just pick a sentinel value that indicates that the computation failed and designate one channel to carry error information showing which guard failed. That would even be Vulkan-compatible.
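
A CPU-side emulation of that encoding, to make the idea concrete. Everything here is an assumption for illustration: NaN as the sentinel, the guard numbering, the two-channel output and the names `kernel` and `run_on_host`. It is plain Python, not Vulkan or DRI/DRM code, and says nothing about how PyPy's JIT would actually resume after a failed guard.

```python
import math

NO_GUARD_FAILED = 0.0   # value of the error channel when all guards passed

def kernel(x, y):
    """Straight-line float computation sqrt(x / y) with two guards.

    Returns (value, error): value is the NaN sentinel when a guard failed,
    and error carries the number of the guard that failed.
    """
    if y == 0.0:                     # guard 1: divisor is non-zero
        return (math.nan, 1.0)
    q = x / y
    if q < 0.0:                      # guard 2: sqrt argument is non-negative
        return (math.nan, 2.0)
    return (math.sqrt(q), NO_GUARD_FAILED)

def run_on_host(inputs):
    """Run the kernel over all inputs, then scan the error channel.

    Returns (values, failures) where failures lists (index, guard_number)
    for every element the host would have to handle on a fallback path.
    """
    values, failures = [], []
    for idx, (x, y) in enumerate(inputs):
        value, error = kernel(x, y)
        values.append(value)
        if error != NO_GUARD_FAILED:
            failures.append((idx, int(error)))
    return values, failures
```

The device never branches back into the host; it only records which guard failed, and the host decides what to do after the whole batch has run.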

16:57 [Arfrever] has quit [Ping timeout: 276 seconds]

16:57 <korvo> Okay, so the main problems are (1) not all cards have Vulkan yet, (2) I feel like it'd be worthwhile to just write our own drivers, (3) this is an enormous amount of work regardless of whether the API is DRI/DRM or Vulkan.

16:59 <arigato> korvo: just to check, you're talking about PyPy's JIT-produced guards, right?

16:59 <korvo> arigato: Yes.

17:00 <arigato> OK

17:00 <korvo> Oh! I was sleep-deprived and didn't explain myself. So right now my Cammy implementation in RPython can ray-trace basic scenes at 2-3 megapixels/second. Not bad for a CPU renderer. I have multiple tiers of optimization and think I've gotten down to an assembly-like language that could be compiled to machine code.

17:01 <korvo> It wouldn't be *that* hard to put Cammy on a GPU, or at least straight-line Cammy which only manipulates floats. The corresponding shader is just bytecode, really; Mesa3D is millions of lines of code because it's C/C++, not because the task is tricky.

17:09 [Arfrever] has joined #pypy

17:54 jcea has joined #pypy

18:41 jcea has quit [Ping timeout: 276 seconds]

19:01 dmalcolm_ has joined #pypy

19:02 dmalcolm has quit [Remote host closed the connection]

20:03 Dejan has quit [Quit: Leaving]

21:40 jcea has joined #pypy