cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and staying out of design decisions
jcea has quit [Ping timeout: 256 seconds]
jcea has joined #pypy
jcea has quit [Ping timeout: 272 seconds]
andrewsmedina has quit [Ping timeout: 255 seconds]
andrewsmedina has joined #pypy
<arigo_> cfbolz: let me just say, aaaaaaaa what?
<arigo_> this bug seems easy to reproduce and was there since the introduction of incminimark, but wasn't discovered until now?
<arigo_> (easy once you know it of course)
<cfbolz> arigo_: yes :-(
<cfbolz> arigo_: in practice you need to be extremely unlucky though, things need to happen in *just* the wrong order
<arigo_> and writing references into prebuilt objects is kinda rare
<cfbolz> yes
<cfbolz> it's a hypothesis of mine, but maybe it started happening more because I added some more caching mechanisms into pycode (of which there are quite a few prebuilt ones)
<cfbolz> arigo_: it's definitely an example where a model checker would be cool :-)
<cfbolz> or maybe i should write a hypothesis stateful test
<arigo_> I thought we already had some random tests
<cfbolz> for some things like the minimarkpages, yes
<cfbolz> arigo_: I wonder whether the four lines starting here: https://github.com/pypy/pypy/blob/main/rpython/memory/gc/incminimark.py#L2410 were trying to address the same problem (the commit has no tests though)
<arigo_> maybe the GC is already correct except for the assert
<arigo_> but it would require proof
<cfbolz> arigo_: yes, that's entirely plausible, except for the fact that I find crashes that are exactly freeing objects that are still reachable (in the same programs that fail with the assert earlier)
<cfbolz> but I didn't manage to reproduce the crash in a small test
<cfbolz> so I'm not entirely satisfied with the current situation yet
<cfbolz> and so far it looks like the crashes go away with my changes too. but I would still like to find a test :-(
<arigo_> the line that says "black -> white-but-prebuilt-so-dont-care", did you add that just now?
<arigo_> ah I see, the flag NO_HEAP_PTRS is removed, so you get a "black -> white-but-prebuit-but-no-longer-NO_HEAP_PTRS"
<arigo_> what did you change?
<cfbolz> arigo_: what I changed is that if NO_HEAP_PTRS is removed, I also add the object to the to-be-marked list (objects_to_trace) if we're in the MARKING phase
<cfbolz> in the write barrier
<cfbolz> this turns it from white to gray
<arigo_> OK, sounds reasonable to me
<cfbolz> arigo_: yes, I'm pretty sure it's not "wrong" conceptually
<arigo_> maybe a different fix would be to record the length of self.prebuilt_root_objects at the start of a collection, and at the end, if it grew, trace the extra objects
<cfbolz> what I'm doing now is to write a random test that would have found the same assertion error
<cfbolz> arigo_: because right now it's tracing them all?
<arigo_> ah, good question. maybe or maybe not, but what I said would still fix it, no?
<cfbolz> yeah
<arigo_> at worst it's looking a second time unnecessarily into some prebuilt objects, but at most once per prebuilt object for the lifetime of the process
<cfbolz> yeah. I think this is essentially what my fix does too, at the cost of adding it to objects_to_trace (instead of saving the index)
<arigo_> but you need to fix some _debug_objects_to_trace_dict1/2 too in DEBUG mode, no?
<cfbolz> no, why?
<cfbolz> _debug_objects_to_trace_dict1 is the dict version of objects_to_trace
<arigo_> I'm just confused because you're adding stuff to objects_to_trace but not _debug_objects_to_trace_dict1
<cfbolz> the debug mode is quite inefficient, it just recreates these dicts at every collection
<cfbolz> it does this: self._debug_objects_to_trace_dict1 = self.objects_to_trace.stack2dict()
<arigo_> ah, and the write barrier is outside collections, of course
<cfbolz> yes
<cfbolz> I also had to swap back in a lot 😅
<arigo_> "XXX do we need to add the dest_addr to objects_to_trace too?" I would say yes
<cfbolz> yeah, I think so too
<arigo_> it's a really rare case, and it's safe to do so anyway
<cfbolz> I wonder who copies things into prebuilt arrays ;-)
<cfbolz> ok, a random test finds the assert immediately, good
<cfbolz> if I undo the change
<cfbolz> but so far I cannot find a crash if I turn debugging off
jcea has joined #pypy
<arigo_> cfbolz: actually... I think that prebuilt_root_objects is going to be completely walked anyway at the end of marking?
<arigo_> a bit confused
<arigo_> doesn't this mean that there was never any problem in the GC? apart from the debug-mode asserts triggering when they shouldn't?
<arigo_> you said that https://github.com/pypy/pypy/commit/497818528f7b6992afdf028d03e39a4646619305 appears to fix something, but I don't get why right now, because whatever you add to 'objects_to_trace' is also added to 'prebuilt_root_objects' and that's going to be seen in https://github.com/pypy/pypy/blob/789f964fff59c722b0872abcdc56d2b1373a9f3b/rpython/memory/gc/incminimark.py#L2636, which essentially copies 'prebuilt_root_objects' into 'objects_to_trace'
<arigo_> I think it should have no effect, apart from adding the object to _debug_objects_to_trace_dict1 earlier, which silences the assert
luckydonald has quit [Quit: Uh oh, bouncer going down! (Survived for 2y 15w 1h 30m 42s, this time.)]
luckydonald has joined #pypy
<cfbolz> arigo_: there is definitely a problem in the GC, because the nanobind tests crash with asserts turned off
<cfbolz> so even if it "just" fixes the asserts, that sounds useful to me, because then I can try to re-run nanobind and get an *actual* assertion error
<cfbolz> I agree with your argument, fwiw
luckydonald has quit [Quit: Uh oh, bouncer going down! (Survived for 23m 4s, this time.)]
luckydonald has joined #pypy
<cfbolz> (the crash is unfortunately somewhat rare, so it will take a while to repeatedly run it and see whether it reoccurs)
<cfbolz> arigo_: or would you instead simply change the assert to also count the objects in prebuilt_root_objects as grey?
derpydoo has joined #pypy
LuisSyst-M has joined #pypy
LuisSyst-M has quit [Ping timeout: 272 seconds]
LuisSyst-M has joined #pypy
LuisSyst-M has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
LuisSyst-M has joined #pypy
derpydoo has quit [Ping timeout: 260 seconds]
LuisSyst-M has quit [Ping timeout: 256 seconds]
LuisSyst-M has joined #pypy