#pypy on 2022-04-19 — irc logs at libera.irclog.whitequark.org

2022-04-07 20:04 cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | Matti: I made a bit of progress, the tests now only segfault towards the end

01:11 epony has quit [Ping timeout: 260 seconds]

01:46 nimaje has quit [Ping timeout: 260 seconds]

07:13 otisolsen70 has joined #pypy

07:41 nimaje has joined #pypy

07:48 slav0nic has joined #pypy

08:06 otisolsen70 has quit [Quit: Leaving]

09:14 epony has joined #pypy

10:01 Dejan has joined #pypy

11:20 lritter has joined #pypy

11:40 otisolsen70 has joined #pypy

14:35 komasa2 has joined #pypy

14:56 otisolsen70_ has joined #pypy

14:59 otisolsen70 has quit [Ping timeout: 260 seconds]

15:14 otisolsen70_ has quit [Quit: Leaving]

15:16 <komasa2> Somewhat conceptual question about pypy: Does pypy optimize based on the assumption that if `x == y` then the objects are interchangeable? Specifically, if there is an object variable that isn't checked in the `__eq__` implementation?

15:17 <komasa2> I'm chasing down some confusing failure in my test case here, that seems to be influenced by e.g. a _different_ testcase

15:36 <cfbolz> komasa2: this can happen for ints, floats and stuff like that, but it should not be observable

15:38 komasa2 is now known as komasa

15:40 <mattip> is there some OS resource (socket, file, mmap) involved? Perhaps you are depending on refcount semantics to close or release the resource.

15:40 <mattip> does changing the order of the tests change anything?
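
A minimal sketch of the refcount point mattip raises, assuming a plain text file; the function names and the temporary file are hypothetical and only serve the illustration. On CPython an unreferenced file object is closed right away, while on PyPy it stays open until the garbage collector gets to it, so a `with` block is the portable form:

```python
import os
import tempfile


def read_with_context_manager(path):
    # The file is closed when the with-block exits, on any implementation.
    with open(path) as handle:
        return handle.read()


def read_relying_on_gc(path):
    # On CPython the file object is closed as soon as its refcount hits zero.
    # On PyPy it stays open until the garbage collector finalizes it later.
    return open(path).read()


# Hypothetical demo file, created only for this sketch.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "w") as out:
    out.write("hello")

content = read_with_context_manager(demo_path)
os.remove(demo_path)
```

`read_relying_on_gc` is shown for contrast only and is deliberately not called.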

15:43 <komasa> Shouldn't involve any OS resource directly

15:45 <komasa> The order of the test does change things, but currently I can also get it to fail with just one test

15:47 <mattip> can you share the general workflow of the test?

15:47 <komasa> The core issue is: There is some kind of class that consists of basically three values, let's call them "x" "y" and "tags". `tags` is a set of other objects

15:47 <komasa> The __hash__ and __eq__ method of this object only take x and y into account

15:47 <cfbolz> komasa: is there some way we can run this ourselves?

15:48 <komasa> cfbolz: not really, if there was I'd just be bugging the maintainer of the underlying library code directly :/

15:48 <cfbolz> komasa: ok

15:49 <komasa> this is some interaction between code for work I can't share, and library code (github.com/angr/angr)

15:49 <cfbolz> komasa: did you try to run pypy with '--jit off'?

15:50 <komasa> Can I somehow pass that to pytest? Because one of the baffling behaviors is that directly running the test with the interpreter also doesn't fail the test

15:51 <cfbolz> komasa: pypy --jit off -m pytest

15:52 <komasa> Still fails

15:52 <cfbolz> Good

15:53 <cfbolz> A jit bug would be much more disconcerting

15:54 <cfbolz> komasa So I fear now it's down to trying to minimize the problem :-(

15:54 <komasa> That's okay, I have been doing that for hours at this point, my initial question was already part of that process :)

15:55 <komasa> I'm still not 100% sure this is just a pypy issue, it might just need a different setup to trigger with CPython

15:55 <cfbolz> komasa: let us know if we can help in some way

15:55 <cfbolz> komasa: out of curiosity, what are you using PyPy for?

15:55 <komasa> Uh, "make Python code go fast"

15:55 <komasa> We are doing large scale program analysis with the angr Framework

15:55 <cfbolz> That's extremely generic 😂

15:56 <cfbolz> komasa: cool

15:56 <cfbolz> If you ever want to write a guest post on the PyPy blog on that, we would be very interested

15:56 <komasa> About the general thing we do?

15:57 <komasa> The angr framework specifically recommends pypy as a tool for speeding it up btw: https://github.com/angr/angr-doc/blob/master/docs/speed.md

15:57 <cfbolz> komasa: yep, and how PyPy is involved

15:58 <mattip> a quick search for "open(" in angr turns up things like this line, that never closes fd

15:58 <mattip> https://github.com/angr/angr/blob/bcd8f3a95c9858297b2fdc2697cbb5c2ef256f23/angr/procedures/libc/fopen.py#L43

15:58 <komasa> Ah, that is more complicated than that though

15:59 <komasa> That code basically is a python abstraction that simulates the actual "fopen" library call

15:59 <komasa> Which is only ever called during a specific kind of "emulation" of assembly code

16:01 <cfbolz> And I suspect you don't know when to call close, since that's up to the binary

16:01 <komasa> This shouldn't actually ever call the real "open" function from python

16:02 <mattip> ok, just something easy to grep for. There are more like it

16:02 <komasa> Fair enough. Though I am not sure how that could cause this issue

16:03 <komasa> I have narrowed down the offending code to pretty much this already: https://github.com/fmagin/angr/blob/23582b5970bb85801da1f44e84ce3c3e102087ad/angr/analyses/reaching_definitions/rd_state.py#L260-L268

16:03 <komasa> (the asserts are added by me)

16:03 <mattip> anyhow, my point is that if you are getting the error with "--jit off", then that should move the spotlight to the garbage collector differences between PyPy and CPython

16:04 <komasa> Would the GC somehow reuse objects with the same hash?

16:05 <komasa> My strong suspicion is that this is related to the __hash__ implementation somehow, because my problem is that I am basically getting the same object (according to the __hash__) but the tags are empty.

16:06 <komasa> And this object gets moved into sets/dicts a lot

16:07 <komasa> so maybe something like { Foo(1,2, {}) } | { Foo(1,2, {"bar"}) } where the __hash__ method only uses the first two parameters

16:09 <komasa> Both the set containing just the `Foo(1,2, {})` or the set containing just `Foo(1,2, {"bar"})` would seem like valid results in that case
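
The situation komasa describes can be sketched as follows. `Foo` here is a hypothetical stand-in, not the angr class, and `merge_keeping_tags` is an assumed helper showing one way to avoid losing tags; neither comes from the code under discussion.

```python
class Foo:
    """Hypothetical stand-in: equality and hash ignore `tags`."""

    def __init__(self, x, y, tags):
        self.x = x
        self.y = y
        self.tags = set(tags)

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


empty = Foo(1, 2, set())
tagged = Foo(1, 2, {"bar"})

# The two objects compare equal, so the union holds exactly one of them.
# Which one survives is not something the set contract promises.
merged = {empty} | {tagged}
survivor = next(iter(merged))


def merge_keeping_tags(*groups):
    """Union sets of Foo, combining the tags of equal elements."""
    by_key = {}
    for group in groups:
        for item in group:
            # Equal items share one fresh Foo that accumulates all tags.
            kept = by_key.setdefault(item, Foo(item.x, item.y, ()))
            kept.tags |= item.tags
    return set(by_key.values())


combined = merge_keeping_tags({empty}, {tagged})
```

If the tags matter to later code, merging them explicitly as above removes the dependence on which of the equal objects a set or dict happens to keep.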

16:23 <cfbolz> no, the GC doesn't care about the __hash__

16:46 <cfbolz> komasa: hm, is it specifically dicts that are involved?

16:46 <cfbolz> Sets, I mean?

16:47 <cfbolz> There's one behavior difference between sets in cpython and PyPy, which is that sets are insertion ordered in PyPy, like dicts. In cpython the order is arbitrary (and based on the hashes)
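
A small sketch of why that ordering difference can matter, using hypothetical helper names: any code that takes "the first" element of a set depends on iteration order, so it may behave differently between interpreters or even between runs.

```python
def first_seen(items):
    """Return whichever element iteration yields first (order-dependent)."""
    return next(iter(items))


def smallest(items):
    """Return a representative that does not depend on iteration order."""
    return min(items)


names = {"gamma", "alpha", "beta"}

# May differ between runs and interpreters: CPython string hashes are
# randomized per process by default, while PyPy (per the message above)
# keeps insertion order.
order_dependent = first_seen(names)

# Same answer everywhere.
stable = smallest(names)
stable_order = sorted(names)
```

Combined with objects whose `__eq__` ignores some attribute, the order in which equal objects are encountered can decide which one ends up stored in a set or dict.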

20:09 lritter has quit [Ping timeout: 240 seconds]

20:34 Dejan has quit [Quit: Leaving]

21:43 slav0nic has quit [Ping timeout: 260 seconds]