<antocuni>
pff, I also tried applying my gc-custom-trace-memo branch to default, and there too it's consistently ~5% slower than vanilla default :(
<antocuni>
arigato: do you know which gc.trace() calls are likely to have the biggest impact on performance?
<antocuni>
with --jit off, I don't see any difference
<antocuni>
which is weird: if the problem is that gc callbacks are slower due to my branch, I'd expect the slowdown to show up even without JIT
<antocuni>
unless maybe the reason is that with --jit off everything is slower and so the GC-related slowdown is lost in the noise?
<cfbolz>
antocuni: the jit uses a trace function for the jit frames, right?
<antocuni>
ah, yes
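As a rough illustration of why a custom trace function matters here: a JIT frame mixes GC references with raw machine words, so the GC cannot scan it like an ordinary object and has to ask a type-specific trace function which slots hold references. The toy below is a plain-Python sketch. It assumes a per-frame bitmap (called `gcmap` here) marks the reference slots. `ToyJitFrame`, `toy_jitframe_trace` and the callback signature are hypothetical and are not RPython's actual API. If frames are walked this way during every minor collection, any extra call added per traced slot is paid many times, which would fit a slowdown that only shows up with the JIT on.

```python
from typing import Callable, List


class Obj:
    """Hypothetical GC-managed object."""

    def __init__(self, name: str) -> None:
        self.name = name


class ToyJitFrame:
    """Toy stand-in for a JIT frame: a flat array of slots where some hold
    GC references and others hold raw machine words. A bitmap (called
    gcmap here) records which slots are references."""

    def __init__(self, slots: List[object], gcmap: int) -> None:
        self.slots = slots
        self.gcmap = gcmap


def toy_jitframe_trace(frame: ToyJitFrame,
                       callback: Callable[[ToyJitFrame, int], None]) -> None:
    """Custom trace: call `callback` once for every slot index that holds a
    GC reference, so the GC can read (or update) that slot."""
    for i in range(len(frame.slots)):
        if frame.gcmap & (1 << i):
            callback(frame, i)


# Usage: slots 0 and 2 are references, slots 1 and 3 are raw integers.
frame = ToyJitFrame([Obj("a"), 12345, Obj("b"), 0], gcmap=0b0101)
seen: List[str] = []
toy_jitframe_trace(frame, lambda f, i: seen.append(f.slots[i].name))
print(seen)  # ['a', 'b']
```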
<antocuni>
update on my benchmarks: targetgcbench also shows a reproducible ~8% slowdown without the JIT. So it's likely a real slowdown in the GC, and since it's a small target it's probably easier to investigate
<antocuni>
this is the output I get from analyzing the targetgcbench PYPYLOG (using a modified version of logparser that prints actual milliseconds instead of CPU ticks)
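The kind of per-section summary described here can be sketched as follows. This is not logparser's code; it is a minimal stand-alone sketch assuming PYPYLOG section markers look like `[<hex ticks>] {section-name` and `[<hex ticks>] section-name}`, and that you supply the tick rate yourself (the value used below is made up purely so the arithmetic is easy to follow). It reports inclusive time, so a nested section is also counted inside its parent. `section_totals_ms` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed PYPYLOG marker shapes:
#   [<hex ticks>] {section-name      (section start)
#   [<hex ticks>] section-name}      (section end)
# Any other line (debug output inside a section) is ignored.
_START = re.compile(r"^\[([0-9a-fA-F]+)\] \{([\w-]+)$")
_END = re.compile(r"^\[([0-9a-fA-F]+)\] ([\w-]+)\}$")


def section_totals_ms(lines: Iterable[str],
                      ticks_per_second: float) -> Dict[str, float]:
    """Sum the inclusive time spent in each section name, in milliseconds."""
    totals: Dict[str, float] = defaultdict(float)
    stack: List[Tuple[str, int]] = []  # (name, start ticks) of open sections
    for raw in lines:
        line = raw.rstrip()
        m = _START.match(line)
        if m:
            stack.append((m.group(2), int(m.group(1), 16)))
            continue
        m = _END.match(line)
        if m:
            name, start = stack.pop()
            if name != m.group(2):
                raise ValueError(f"{m.group(2)!r} closes open section {name!r}")
            ticks = int(m.group(1), 16) - start
            totals[name] += ticks * 1000.0 / ticks_per_second
    return dict(totals)


sample = [
    "[1000] {gc-minor",
    "[1800] gc-minor}",
    "[2000] {gc-hardware",
    "some debug text inside the section",
    "[2400] gc-hardware}",
    "[3000] {gc-minor",
    "[3200] gc-minor}",
]
# Made-up tick rate for the example; use your machine's real one.
print(section_totals_ms(sample, ticks_per_second=1000.0))
# {'gc-minor': 2560.0, 'gc-hardware': 1024.0}
```

Comparing this dictionary between the branch and vanilla default is enough to see which section (here, gc-minor) accounts for the difference.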
<antocuni>
it is very clear that the difference is in gc-minor, now I "just" need to investigate why
<antocuni>
(also, I didn't expect us to spend so much time in gc-hardware! But looking at the implementation, it's reasonable, because we open and parse /proc/meminfo)
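To give a feel for why that section is comparatively expensive, here is a plain-Python sketch of the kind of work involved: opening a file, reading it, and parsing it line by line. It is not PyPy's implementation. It assumes only that the interesting line is `MemTotal`, whose value `/proc/meminfo` reports in kB (1024-byte units). `parse_memtotal_bytes` and `read_total_memory` are hypothetical names.

```python
from typing import Optional


def parse_memtotal_bytes(meminfo_text: str) -> Optional[int]:
    """Return MemTotal from /proc/meminfo-style text, in bytes, or None."""
    for line in meminfo_text.splitlines():
        # Lines look like: "MemTotal:       16318480 kB"
        key, sep, rest = line.partition(":")
        if sep and key.strip() == "MemTotal":
            fields = rest.split()
            if fields and fields[0].isdigit():
                value = int(fields[0])
                unit = fields[1] if len(fields) > 1 else ""
                return value * 1024 if unit == "kB" else value
    return None


def read_total_memory() -> Optional[int]:
    """Open and parse /proc/meminfo; None if unavailable (e.g. not Linux)."""
    try:
        with open("/proc/meminfo") as f:
            return parse_memtotal_bytes(f.read())
    except OSError:
        return None
```

A syscall-backed open plus string parsing easily dwarfs the cost of a single small GC step, so a handful of such calls can still show up in the totals.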
<antocuni>
I might have found the problem: the branch introduces an indirection when calling _trace_drag_out, so that gc.trace calls the generated callback, which in turn calls gc._trace_drag_out
<antocuni>
by marking the generated callback as _always_inline_, the extra call is removed and performance seems to be good again
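The shape of the problem and the fix can be sketched in plain Python. `ToyGC` and `make_drag_out_callback` are hypothetical stand-ins, not the branch's code. The only real piece is the `_always_inline_` function attribute, which asks RPython's translator to inline the function at its call sites. Under CPython it is just an ordinary attribute, so this sketch only shows the call chain (gc.trace, then callback, then `_trace_drag_out`) and that both paths do the same work; the per-reference call it removes only disappears after translation.

```python
from typing import Callable, List


class ToyGC:
    """Hypothetical toy GC, only to show the shape of the call chain."""

    def __init__(self) -> None:
        self.dragged: List[object] = []

    def _trace_drag_out(self, obj: object, arg: object) -> None:
        # Stand-in for the real per-reference work (roughly: moving a young
        # object out of the nursery); here it only records the reference.
        self.dragged.append(obj)

    def trace(self, refs: List[object],
              callback: Callable[[object, object], None], arg: object) -> None:
        # Invoke the callback once per reference held by the object.
        for ref in refs:
            callback(ref, arg)


def make_drag_out_callback(gc: ToyGC) -> Callable[[object, object], None]:
    """Build the extra layer: gc.trace -> callback -> gc._trace_drag_out."""
    def callback(obj: object, arg: object) -> None:
        gc._trace_drag_out(obj, arg)
    # Hint for RPython's inliner: inline this function where it is called,
    # removing one call per traced reference. CPython ignores it.
    callback._always_inline_ = True
    return callback


gc = ToyGC()
gc.trace(["x", "y"], make_drag_out_callback(gc), None)  # indirect path
gc.trace(["z"], gc._trace_drag_out, None)                # direct path
print(gc.dragged)  # ['x', 'y', 'z']
```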