cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and staying out of design decisions
otisolsen70 has joined #pypy
<mattip> I released 7.3.13. Hope there are no problems necessitating a 7.3.14 fix
<tumbleweed> mattip: thanks!
otisolsen70 has quit [Quit: Leaving]
<komasa> Another conceptual GC question: is it possible to set PYPY_GC_MAX after the interpreter has already started, or even later during runtime?
<mattip> no, I think not.
<mattip> There is a call rgc.set_heap_size() we could theoretically expose to pypy app-level, but it would require work and testing
<mattip> that is, if you are talking about using that setting from pypy python
<mattip> in your own interpreter, you can do more
<komasa> What do you mean by "your own interpreter" ?
<mattip> you can use rpython to build any language interpreter you want
Guest60 has joined #pypy
<mattip> python (i.e. pypy) is not the only interpreter built on rpython, and you didn't say what your use case is,
<mattip> so I didn't want to make assumptions
Guest60 has quit [Client Quit]
<komasa> Ah, I thought me asking about the PYPY_GC_* variables implied PyPy already
<komasa> I'm not sure if anyone answered while I was disconnected, but I'm still unsure about what PYPY_GC_MAX_DELTA does, and that's somewhat more important
<komasa> The documentation refers to total RAM, which makes me suspicious that this will lead to weird behavior when I have ~30 processes that together consume nearly the total RAM. Each program will stay far below 1/8 of total RAM size, so maybe my issue is that the GC just runs too rarely and this is why the programs just request a lot more memory from the OS than needed, which then isn't released again
<cfbolz> mattip: yay, thank you matti!
<mattip> cfbolz, tumbleweed: np
<cfbolz> komasa: I kind of doubt that it's about your physical RAM
<cfbolz> no, I'm wrong! it really reads /proc/meminfo on linux to find out the total RAM
<cfbolz> note that it's the *MAX* delta. that means the delta will start out *much* smaller, it will just not go above that size
<komasa> Yeah, I'm suspecting that the problem is that the initial allocations for loading the CFG take an absurd amount of RAM (on the order of dozens of GB), so by the time the later analyses run the delta is probably way too high
<komasa> Are major collections also triggered before the process requests more memory from the OS?
<cfbolz> not every time
<komasa> Naively that feels sensible to do, but since we just go through malloc, Python can't know when malloc will actually call brk to get more memory
<cfbolz> there has been a recent paper that is supposed to have a better algorithm for setting dynamic heap limits to solve this kind of problem: https://arxiv.org/abs/2204.10455
<cfbolz> it's even a relatively understandable heuristic, but somebody would still have to go and implement it
<cfbolz> komasa: did you try to call "collect" a bunch of times yourself? does the memory ever go down if you do that?
<cfbolz> because if not, it's much more likely that you really have some kind of leak on the python level
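A minimal sketch of the check cfbolz suggests: force a handful of full collections and see whether resident memory ever drops. Reading VmRSS from /proc/self/status is just one Linux-specific way to observe RSS and is an assumption, not something from the discussion above.

    import gc

    def rss_mb():
        # Linux-only: read the resident set size from /proc/self/status.
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024.0
        return -1.0

    print("RSS before: %.1f MB" % rss_mb())
    for _ in range(5):      # a handful of full major collections
        gc.collect()
    print("RSS after:  %.1f MB" % rss_mb())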
<komasa> I tried tracking down a potential leak already, but there aren't obvious ones left
<cfbolz> we cannot easily reproduce this ourselves, right?
<komasa> I used the PYPY_GC_MAX variable and I could reduce the RSS of some workflow from 3.5GB to 2GB
<komasa> The 2GB are mostly the initial CFG that is loaded for the later analysis
<komasa> Which also roughly looks like the 1.82 factor that triggers a major collection
<komasa> the later analysis step is basically one specific kind of analysis run on thousands of targets
<komasa> Each target might require up to a few hundred MBs to analyze, but the result of each target is tiny (summing to less than a hundred ints and short strings)
<komasa> So my suspicion is that the garbage collector doesn't collect until the intermediate data from all the targets has accumulated to a fairly large amount
<komasa> Which means that the process is taking roughly 1.5x more memory than actually needed
<komasa> My naive attempt of calling gc.collect() after each target completely tanked the performance, so I need something smarter than that. The variance in intermediate RAM needed between targets is also huge (some I'd estimate at a few KB, some are up to hundreds of MB)
<komasa> The thing is, if I set PYPY_GC_MAX_DELTA, then I might tank the performance of the initial CFG loading
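One possible middle ground between collecting after every target and never collecting explicitly is to batch the collections. This is only a sketch: the analyze/targets names and the batch size are placeholders, not anything from the log.

    import gc

    COLLECT_EVERY = 50  # hypothetical batch size, to be tuned empirically

    def analyze_all(targets, analyze):
        results = []
        for i, target in enumerate(targets):
            results.append(analyze(target))
            # Trade throughput against peak RSS: run a full collection
            # only every COLLECT_EVERY targets instead of after each one.
            if (i + 1) % COLLECT_EVERY == 0:
                gc.collect()
        return results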
<cfbolz> komasa: sounds all tricky indeed :-(
<komasa> If my line of thought seems plausible/solid to you I have a few ideas how to work around this (if you don't have any)
<mattip> maybe try injecting some __pypy__.add_memory_pressure(x) calls? That instructs the GC to garbage collect sooner than it would otherwise.
<mattip> typically that happens when you malloc() without the GC knowing about it, in a C call
<mattip> cffi/ctypes/C-API are all supposed to be well-behaved and tell the GC about allocations, but
<mattip> user-written code that allocates needs to let the GC know
<mattip> s/user-written/third-party/
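A rough illustration of the add_memory_pressure hint, assuming an allocation made behind the GC's back via ctypes and libc malloc; the helper name is made up, and the ImportError fallback only keeps the snippet importable off PyPy.

    import ctypes, ctypes.util

    try:
        from __pypy__ import add_memory_pressure
    except ImportError:                        # not running on PyPy: no-op
        def add_memory_pressure(estimate):
            pass

    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    libc.malloc.restype = ctypes.c_void_p
    libc.free.argtypes = [ctypes.c_void_p]

    def alloc_untracked(nbytes):
        # malloc() done behind the GC's back: the GC cannot see these bytes,
        # so report them as extra pressure to make it collect sooner.
        ptr = libc.malloc(nbytes)
        add_memory_pressure(nbytes)
        return ptr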
<cfbolz> komasa: so a full gc.collect call tanks performance. how about at least forcing gc.collect_step after every target?
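A tiny sketch of what "forcing gc.collect_step after every target" could look like; gc.collect_step is PyPy-specific, so the hasattr guard is only there to keep the snippet harmless elsewhere.

    import gc

    def gc_nudge():
        # One step of PyPy's incremental major collection: far cheaper
        # than a full gc.collect(), but it keeps the collection moving.
        if hasattr(gc, "collect_step"):
            gc.collect_step()

    # e.g. call gc_nudge() once after each analyzed target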
<komasa> gc.collect_step helps slightly with maximum RSS (peaks at 2.8GB instead of 3GB)
<komasa> For reference, if I constrain the process with PYPY_GC_MAX=1.8GB (found via trial and error to be the lower limit before it aborts out), the maximum RSS is around 2.4GB
<komasa> I'm going to experiment with PYPY_GC_MAX_DELTA, I think there should be a value where the initial CFG import still isn't impacted much, but the later memory usage doesn't run off
otisolsen70 has joined #pypy
<komasa> Okay, the solution is most likely properly tweaking PYPY_GC_MAX_DELTA. If I set it to 200MB the process only has a max RSS of 2.1GB
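For completeness, a hedged sketch of applying this tuning from a launcher script. Only the 200MB delta comes from the log; the pypy3 invocation and the "analysis.py" script name are placeholders.

    import os, subprocess

    env = dict(os.environ)
    # Cap how far the GC's heap limit may grow between major collections.
    env["PYPY_GC_MAX_DELTA"] = "200MB"

    # "analysis.py" stands in for the actual workload script.
    subprocess.check_call(["pypy3", "analysis.py"], env=env)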
ruth2345345 has quit [Ping timeout: 255 seconds]
ruth2345345 has joined #pypy
krono has quit [Ping timeout: 252 seconds]
krono has joined #pypy
Dejan has quit [Quit: Leaving]
Dejan has joined #pypy
Guest8416 has quit [Quit: Artafath!]
ruth2345345 has quit [Ping timeout: 260 seconds]
otisolsen70 has quit [Quit: Leaving]