cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and staying out of design decisions
Dagger has quit [Ping timeout: 265 seconds]
Dagger has joined #pypy
jcea has quit [Ping timeout: 264 seconds]
zii has joined #pypy
[Arfrever] has quit [Quit: leaving]
[Arfrever] has joined #pypy
<zii> I don't know if this is worth creating an issue about, but I'm disappointed that PyPy doesn't let me have my cake and eat it too. 🥲 I have some 2-dimensional vectors that I want to find L1 distances between. I model these vectors using a class that supports vectors of arbitrary dimensions. Using the class's methods that support vectors of
<zii> arbitrary dimensions to compute L1 distances is much slower (~6x) than directly using the formula for 2-dimensional vectors. I had hoped that PyPy's JIT could figure out that the vectors are 2-dimensional at runtime and produce code with similar performance to the handwritten formula through inlining, unrolling and allocation removal optimisations.
<nimaje> I think you want Point.c be a tuple, but that is just a general comment, as I like immutable data, no idea if it would change anything for your benchmark
<zii> It makes the manual version a bit faster. But I do want the coordinates to be mutable.  😅
<nimaje> do you ever use the fact that you can throw a generic Iterable into __sub__? if not, how about assuming Point there and using zip(s.c, o.c, …)?
<zii> Yep. The change also doesn't improve the performance.
<nimaje> if you implement manh_dist with a for loop instead of sum and map?
<zii> I tried that too. It doesn't help, unfortunately. 😢
<zii> Even `def __sub__(s, o: Point) -> Point: return Point([a - o.c[i] for i, a in enumerate(s.c)])` isn't enough.
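For readers without the original paste, the two variants being compared can be sketched roughly as follows. This is an illustrative reconstruction, not zii's actual code: only the names `Point`, `c`, `__sub__` and `manh_dist` appear in the log, while `manh_dist_2d` and `__iter__` are assumptions added to make the sketch self-contained.

```python
from typing import Iterable, Iterator


class Point:
    """Hypothetical N-dimensional point; the real class was not pasted in the log."""

    def __init__(self, c: Iterable[int]) -> None:
        self.c = list(c)  # mutable list of coordinates

    def __iter__(self) -> Iterator[int]:
        # Assumed helper so that a Point can itself be passed as an Iterable.
        return iter(self.c)

    def __sub__(self, o: Iterable[int]) -> "Point":
        # Generic path: accepts any iterable of coordinates.
        return Point(a - b for a, b in zip(self.c, o))

    def manh_dist(self, o: "Point") -> int:
        # "methods" version: builds a temporary Point, then loops over it.
        return sum(map(abs, (self - o).c))


def manh_dist_2d(p: Point, q: Point) -> int:
    # "manual" version: the L1 formula written out for exactly two dimensions.
    return abs(p.c[0] - q.c[0]) + abs(p.c[1] - q.c[1])
```

Both functions return the same result for 2-dimensional points; the difference under discussion is that the generic version contains loops whose length is only known at runtime.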
sugarbeet has quit [Ping timeout: 260 seconds]
sugarbeet has joined #pypy
[Arfrever] has quit [Ping timeout: 252 seconds]
[Arfrever] has joined #pypy
zii has quit [Quit: Client closed]
<cfbolz> nimaje: a tuple would be worse, performance-wise actually
<cfbolz> zii: unfortunately this would require automatic unrolling for the loops in `__sub__` and `manh_dist`, which is something the pypy jit can't really do :-(
[Arfrever] has quit [Ping timeout: 248 seconds]
<korvo> I'd wonder whether a class is the wrong tool, since it enforces an extra allocation and indirection compared to a function that takes the vectors as arguments.
* korvo still on that Stop Writing Classes
[Arfrever] has joined #pypy
zii has joined #pypy
<zii> Oh, I actually thought PyPy did unrolling, but maybe it's only for the RPython interpreter code? A potential solution could be to override __new__ and return some specialised classes for common numbers of dimensions. These could potentially be automatically generated...
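The `__new__` idea could look roughly like the sketch below. Everything here is an assumption for illustration: the `_specialised` registry, the `Point2` subclass, and the requirement that `c` is a sized sequence are not from the log, and whether this recovers the manual version's speed on PyPy would need to be measured.

```python
class Point:
    """Hypothetical generic point that dispatches to specialised subclasses."""

    _specialised = {}  # assumed registry: number of dimensions -> subclass

    def __new__(cls, c):
        target = cls
        # Only dispatch when the generic class itself is being instantiated.
        if cls is Point:
            target = Point._specialised.get(len(c), Point)
        return super().__new__(target)

    def __init__(self, c):
        self.c = list(c)  # coordinates stay mutable

    def manh_dist(self, o):
        # Generic fallback with a runtime-length loop.
        return sum(abs(a - b) for a, b in zip(self.c, o.c))


class Point2(Point):
    """Specialisation for exactly two dimensions, with the loop written out."""

    def manh_dist(self, o):
        return abs(self.c[0] - o.c[0]) + abs(self.c[1] - o.c[1])


Point._specialised[2] = Point2
```

With this in place, `Point([1, 2])` yields a `Point2` instance while `Point([1, 2, 3])` stays a plain `Point`. One caveat of the sketch: nothing stops code from later changing the length of `c`, which would invalidate the specialisation.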
zii has quit [Quit: Client closed]
<cfbolz> zii: yeah, unrolling is only for rpython loops, and even there it's not automatic
<cfbolz> I'm actually less concerned about the method version being slow, but would instead like the manual version to be even faster ;-)
<korvo> Since nobody's said it yet, a correct automatic-unrolling algorithm would be undecidable, or at least that's the lore I was told years ago. It seemed plausible to me; I know that perfect inlining is undecidable too.
<korvo> zii: Have you looked at the JIT traces yet? That's usually where I start when trying to understand why hot code is slow.
zii has joined #pypy
<cfbolz> korvo: static analysis routinely solves theoretically undecidable problems in practice, though ;-)
<korvo> Hm, at user level, is it possible to assert that the lengths of the lists going into zip() are constant?
zii has quit [Client Quit]
zii has joined #pypy
<korvo> cfbolz: Of course. Trying to remember who it was that said, "A heuristic is an algorithm which is not correct." I'm just thinking about how to manage user-level expectations.
<cfbolz> one simple way, which we picked, is by never unrolling 🤷‍♀️
zii has quit [Client Quit]
zii has joined #pypy
cfbolz has quit [Quit: Updating details, brb]
cfbolz has joined #pypy
<zii> korvo I'm unfortunately not deep enough in PyPy to retrieve useful information from JIT traces. Maybe one day. 🙂
<cfbolz> I already looked at them. they aren't too unreasonable, apart from the fact that the JIT has no way of knowing that all these lists are exactly length two in practice, always
<korvo> Oh, no worries! There's an environment variable that can get you started, PYPYLOG. For example, one of my .envrc starts with `export PYPYLOG=jit-summary:-`, which prints a little summary at the end. Useful for batch jobs like raytracing.
<zii> cfbolz Thanks for clearing up why the method version cannot be faster!
<korvo> Is there a known reason why the immutable tuple does worse? Maybe something like storage strategies? I agree with nimaje that it would be an intuitive next step, but I'm eager to learn why intuition is wrong here.
<cfbolz> zii: I'll now go and make abs(int) faster though, which would help both versions
<zii> korvo I'll give it a try!
<cfbolz> also I have some longer-term work to make all such small lists slightly more efficient
<cfbolz> (not unrolling them though)
<zii> Sounds great 😃
<cfbolz> I'd also want to point out that all versions are much faster than cpython ;-)
zii has quit [Quit: Client closed]
zii has joined #pypy
<cfbolz> zii: heh, it helps a lot! the manual method becomes more than 3x faster (the manual method unfortunately only 5%)
<zii> cfbolz Did the methods function become 5% faster? 😅
<cfbolz> ah yes, I wrote 'manual' twice
<cfbolz> yeah, the methods version only became a bit faster
<cfbolz> so the distance is much bigger now
<cfbolz> manual is ~40x faster for me now 😅
<zii> Right. It's good motivation for implementing generation of specialised vector classes.
<zii> Was the 3x speedup just from specialising `abs` for integers?
<cfbolz> no, it's specialized for abs already
<cfbolz> but it's written in a branchy way
<cfbolz> and that's always bad for the tracing JIT
<cfbolz> so now I wrote it like this: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs (being very careful not to mess up abs(MININT))
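The trick on the linked page can be modelled in plain Python as below. This is only a sketch of the bit hack, not PyPy's actual RPython code; `WORD_BITS`, `MININT` and `branchless_abs` are names made up here, and a 64-bit word is assumed.

```python
WORD_BITS = 64  # assumed machine word size
MININT = -(2 ** (WORD_BITS - 1))


def branchless_abs(v: int) -> int:
    """abs() for a value in signed 64-bit range, computed without if/else."""
    # Arithmetic right shift: 0 for v >= 0, -1 (all bits set) for v < 0.
    mask = v >> (WORD_BITS - 1)
    # For negative v this computes ~(v - 1) == -v; for v >= 0 it is a no-op.
    return (v + mask) ^ mask
```

Python integers are arbitrary precision, so `branchless_abs(MININT)` simply returns `2 ** 63` here. In fixed-width machine arithmetic the same expression wraps around, because `-MININT` does not fit in a signed word, which is why that case needs separate care.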
<zii> Ah, nice!
<zii> 👍
<zii> Very neat for such a speedup. Unfortunately most programs are probably not bottlenecked by calls to abs. 😅
zii has quit [Ping timeout: 240 seconds]
[Arfrever] has quit [Ping timeout: 265 seconds]
[Arfrever] has joined #pypy
jcea has joined #pypy