<mattip>
I released 7.3.13. Hope there are no problems necessitating a 7.3.14 fix
<tumbleweed>
mattip: thanks!
otisolsen70 has quit [Quit: Leaving]
<komasa>
Another conceptual GC question: is it possible to set PYPY_GC_MAX after the interpreter has already started, or even change it later at runtime?
<mattip>
no, I think not.
<mattip>
There is a call rgc.set_heap_size() we could theoretically expose to pypy app-level, but it would require work and testing
<mattip>
that is, if you are talking about using that setting from pypy python
<mattip>
in your own interpreter, you can do more
<komasa>
What do you mean by "your own interpreter" ?
<mattip>
you can use rpython to build any language interpreter you want
Guest60 has joined #pypy
<mattip>
python (i.e. pypy) is not the only interpreter built on rpython, and you didn't say what your use case is,
<mattip>
so I didn't want to make assumptions
Guest60 has quit [Client Quit]
<komasa>
Ah, I thought me asking about the PYPY_GC_* variables implied PyPy already
<komasa>
I'm not sure if anyone answered while I was disconnected, but I'm still unsure about what PYPY_GC_MAX_DELTA does, and that's somewhat more important
<komasa>
The documentation refers to total RAM, which makes me suspicious that this will lead to weird behavior when I have ~30 processes that together consume nearly the total RAM. Each program will stay far below 1/8 of total RAM size, so maybe my issue is that the GC just runs too rarely and this is why the programs just request a lot more memory from the OS than needed, which then isn't released again
<cfbolz>
mattip: yay, thank you matti!
<mattip>
cfbolz, tumbleweed: np
<cfbolz>
komasa: I kind of doubt that it's about your physical RAM
<cfbolz>
no, I'm wrong! it really reads /proc/meminfo on linux to find out the total RAM
<cfbolz>
note that it's the *MAX* delta. that means the delta will start out *much* smaller, it will just not go above that size
<komasa>
Yeah, I suspect the problem is that the initial allocations for loading the CFG take an absurd amount of RAM (on the order of dozens of GB), so by the time the later analyses run the delta is probably way too high
<komasa>
Are major collections also triggered before the process requests more memory from the OS?
<cfbolz>
not every time
<komasa>
Naively that feels sensible to do, but if we just use malloc, Python can't know when malloc will actually call brk to get more memory
<cfbolz>
there has been a recent paper that is supposed to have a better algorithm for setting dynamic heap limits to solve this kind of problem: https://arxiv.org/abs/2204.10455
<cfbolz>
it's even a relatively understandable heuristic, but somebody would still have to go and implement it
<cfbolz>
komasa: did you try to call "collect" a bunch of times yourself? does the memory ever go down if you do that?
<cfbolz>
because if not, it's much more likely that you really have some kind of leak on the python level
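(A minimal sketch of what that check could look like on Linux; the /proc/self/status parsing and the number of collect() calls are just illustrative:)

    import gc

    def current_rss_kb():
        # Linux-only: read the current resident set size from /proc/self/status
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # reported in kB
        return -1

    print("before:", current_rss_kb(), "kB")
    for _ in range(3):
        gc.collect()
        print("after collect:", current_rss_kb(), "kB")
    # if RSS never goes down, the data is probably still reachable
    # (a leak at the Python level) rather than the GC just being lazy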
<komasa>
I tried tracking down a potential leak already, but there aren't obvious ones left
<cfbolz>
we cannot easily reproduce this ourselves, right?
<komasa>
I used the PYPY_GC_MAX variable and I could reduce the RSS of some workflow from 3.5GB to 2GB
<komasa>
The 2GB are mostly the initial CFG that is loaded for the later analysis
<komasa>
Which also roughly matches the 1.82 factor that triggers a major collection
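(Rough arithmetic behind that observation, assuming the 1.82 major-collection factor applies to the ~2GB of live CFG data:)

    2 GB live after loading the CFG
    x 1.82 (threshold factor for the next major collection)
    ~= 3.6 GB, which is in the same ballpark as the observed 3.5 GB RSS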
<komasa>
the later analysis step is basically one specific kind of analysis run on thousands of targets
<komasa>
Each target might require up to a few hundred MBs to analyze, but the result of each target is tiny (summing to less than a hundred ints and short strings)
<komasa>
So my suspicion is that the garbage collector doesn't collect until the intermediate data from all the targets has accumulated to a fairly large amount
<komasa>
Which means that the process is using roughly 1.5x the memory it actually needs
<komasa>
My naive attempt of calling gc.collect() after each target completely tanked the performance, so I need something smarter than that. The variance in intermediate RAM needed between targets is also huge (some I'd estimate at a few KB, some up to hundreds of MB)
<komasa>
The thing is, if I set PYPY_GC_MAX_DELTA, then I might tank the performance of the initial CFG loading
<cfbolz>
komasa: sounds all tricky indeed :-(
<komasa>
If my line of thought seems plausible/solid to you I have a few ideas how to work around this (if you don't have any)
<mattip>
maybe try injecting some __pypy__.add_memory_pressure(x) calls? That will instruct the GC to collect sooner than it would otherwise.
<mattip>
typically that happens when you malloc() without the GC knowing about it, in a C call
<mattip>
cffi/ctypes/C-API are all supposed to be well-behaved and tell the GC about allocations, but
<mattip>
user-written code that allocates needs to let the GC know
<mattip>
s/user-written/third-party/
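(A minimal sketch of that suggestion; lib.make_buffer is a hypothetical C call standing in for any allocation the GC cannot see, and the byte estimate does not have to be exact:)

    try:
        from __pypy__ import add_memory_pressure  # PyPy built-in module
    except ImportError:
        def add_memory_pressure(estimate):  # no-op fallback on other interpreters
            pass

    def wrap_big_c_allocation(lib, nbytes):
        buf = lib.make_buffer(nbytes)  # hypothetical: mallocs outside the GC's view
        # tell the GC roughly how much invisible memory this object keeps alive,
        # so major collections get scheduled sooner
        add_memory_pressure(nbytes)
        return buf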
<cfbolz>
komasa: so a full gc.collect call tanks performance. how about at least forcing gc.collect_step after every target?
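(A minimal sketch of that, with targets and analyze() as placeholders for the real per-target loop; gc.collect_step is PyPy-specific:)

    import gc

    def analyze_all(targets, analyze):
        # targets and analyze() stand in for the real analysis workload
        results = []
        for target in targets:
            results.append(analyze(target))
            # one incremental GC step per target instead of a full gc.collect();
            # much cheaper, but each call only makes partial progress
            if hasattr(gc, "collect_step"):
                gc.collect_step()
        return results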
<komasa>
gc.collect_step helps slightly with maximum RSS (peaks at 2.8GB instead of 3GB)
<komasa>
For reference, if I constrain the process with PYPY_GC_MAX=1.8GB (found via trial and error to be the lower limit before it aborts), the maximum RSS is around 2.4GB
<komasa>
I'm going to experiment with PYPY_GC_MAX_DELTA, I think there should be a value where the initial CFG import still isn't impacted much, but the later memory usage doesn't run off
otisolsen70 has joined #pypy
<komasa>
Okay, the solution is most likely properly tweaking PYPY_GC_MAX_DELTA. If I set it to 200MB the process only has a max RSS of 2.1GB
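(For reference, a sketch of wiring that in when spawning the workers from Python, since the PYPY_GC_* variables have to be set before the interpreter starts; "analysis.py" is a placeholder for the real entry point, and 200MB is the value found above:)

    import os
    import subprocess

    # cap how much the GC lets the heap grow between major collections
    env = dict(os.environ, PYPY_GC_MAX_DELTA="200MB")
    subprocess.run(["pypy", "analysis.py"], env=env, check=True)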