cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | the pypy angle is to shrug and copy the implementation of CPython as closely as possible, and staying out of design decisions
cahoots_ has quit [*.net *.split]
[Arfrever] has quit [*.net *.split]
mwhudson has quit [*.net *.split]
mwhudson has joined #pypy
cahoots_ has joined #pypy
[Arfrever] has joined #pypy
cahoots_ has quit [Ping timeout: 258 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 264 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 264 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 248 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 246 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 264 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 246 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 248 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
otisolsen70 has joined #pypy
<komasa> What is the state of memory profiling tools for pypy? I just tried memray with CPython which worked pretty well, but memray doesn't work with pypy.
cahoots_ has joined #pypy
<komasa> It would also work for me to profile with CPython, if the results are reasonably representative for memory usage under PyPy
<cfbolz> komasa: there aren't really any :-(
<cfbolz> komasa: I'm kind of planning to work on this problem "any year now"
<cfbolz> what kind of information would be useful for you?
<komasa> My problem is that some large and complicated analysis functions take a fairly high overall amount of RAM, and I want to know where this is actually needed
<komasa> (analysis functions in the sense that I am doing some automated program analysis)
cahoots_ has quit [Ping timeout: 240 seconds]
<cfbolz> the cpython results should be helpful directionally, but in details the memory usage is different, yes
<komasa> I have managed to set up my code in a way that the problematic function is responsible for 90% of the total RAM usage, so even just some code that walks the heap afterwards to track the objects that were allocated (and haven't been released) would be nice
<komasa> The low tech version of that is some combination of `Counter(map(type(gc.foo)))`, which I'll try in a minute
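A minimal sketch of that low-tech approach, assuming `gc.get_objects()` is the heap-walking call meant by `gc.foo`. On CPython it only returns GC-tracked containers and instances (not ints or strings); PyPy also provides it, though walking the whole heap there is likely slow. `type_census` and `census_diff` are hypothetical helper names.

```python
import gc
from collections import Counter


def type_census():
    """Count live, GC-visible objects by type name."""
    gc.collect()  # drop unreachable garbage first so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def census_diff(before, after, top=10):
    """Return the types whose live-object count grew the most between two censuses."""
    grown = after - before  # Counter subtraction keeps only positive differences
    return grown.most_common(top)
```

Call `type_census()` before and after the suspect analysis function and print `census_diff(before, after)`; the types at the top are the ones that function left behind.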
<komasa> One thing I am seeing with memray on CPython is that allocating an empty set (one I know will never even be filled) seems to take 1MB already, which is surprisingly large
<komasa> But that also seems like the kind of thing pypy would do better
<cfbolz> komasa: there are some heap dump facilities in pypy's gc module
<cfbolz> and yes, empty sets should be 3 words in pypy (24 bytes)
<komasa> And they are really 1MB in CPython? I want to properly calibrate my appreciation for pypy :)
<nimaje> komasa: automated program analysis sounds interesting, can you share more about what you do?
<komasa> nimaje: The very short version is that we (the team in a research institute I work at) are building a larger analysis framework on top of github.com/angr/angr
<komasa> For that kind of workload pypy also makes a huge difference, I don't remember the exact numbers, but I think it was on the order of 3-7 times speedup?
<nimaje> hm, on cpython it is at least the object header (?B) + some sizes and pointers (7 * 8B?) + a preallocated smalltable (8 * 16B?) (+ is anything allocated for the weakref list of a new object?), so it needs at least 184B plus the object header overhead and maybe padding
<komasa> mattip: you had a post on HN recently about wanting input from people using "pypy for real work", I think what we are doing counts, and I'm up for answering questions and giving input
<cfbolz> komasa: that sounds exciting, which institute is that?
<komasa> Fraunhofer SIT in Darmstadt
<cfbolz> very cool :-)
<komasa> The team I am in is a small subpart of the institute though, the specific thing we are doing is https://www.sit.fraunhofer.de/en/appicaptor/
cahoots_ has joined #pypy
<cfbolz> komasa: sys.getsizeof(set()) returns 216 on cpy 3.10 for me
<cfbolz> so that's way less than 1MB
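Assuming a 64-bit CPython 3.10 build, nimaje's estimate lines up with the measured 216 bytes once two 16-byte headers are counted: the object header itself, and the GC header that `sys.getsizeof()` adds for GC-tracked objects. This takes the seven 8-byte fields to include the weakref-list pointer; exact layouts vary between CPython versions.

$$
\underbrace{16}_{\text{object header}} + \underbrace{7 \times 8}_{\text{sizes and pointers}} + \underbrace{8 \times 16}_{\text{smalltable}} + \underbrace{16}_{\text{GC header}} = 16 + 56 + 128 + 16 = 216\ \text{bytes}
$$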
<komasa> yeah, that feels a lot more plausible
<cfbolz> but quite a bit more than 24 bytes ;-)
<cfbolz> of course pypy sets get much bigger too once you add the first element
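To see how a set grows on CPython, one can record `sys.getsizeof()` as elements are added; the jumps mark hash-table resizes. PyPy's documentation says `sys.getsizeof()` raises `TypeError` there by design, which this sketch handles by returning `None`. The helper name is hypothetical.

```python
import sys


def set_size_profile(max_elems=20):
    """Return (len, sys.getsizeof) pairs as a set grows, or None where unsupported."""
    s = set()
    profile = []
    try:
        for i in range(max_elems + 1):
            profile.append((len(s), sys.getsizeof(s)))
            s.add(i)
    except TypeError:
        # PyPy's sys.getsizeof() raises TypeError on purpose
        return None
    return profile
```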
<komasa> Might be memray just rounding up to 1MB for display reasons?
<cfbolz> that would be kind of weird too?
<komasa> yeah
<komasa> What I am seeing in the memray flame graph is that some lines that are just `self.foo = set()` are marked with `359MiB total, 359 allocations`
<komasa> But that line should have been run 764_000 times
<komasa> It's part of the constructor of an object that has ~764k instances at runtime
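One hedged way to cross-check memray's per-line numbers on CPython is the standard-library `tracemalloc` module, which records individual Python allocations per source line (as far as I know it is not available on PyPy). One possible reading of `359MiB total, 359 allocations` (an assumption, not established in this discussion) is that memray only saw the 1 MiB arenas that CPython's small-object allocator requests from the system; memray documents a `--trace-python-allocators` option for recording the individual allocations instead. `Node` and `top_allocation_sites` are hypothetical names.

```python
import tracemalloc


class Node:
    """Hypothetical stand-in for the class whose constructor does `self.foo = set()`."""

    def __init__(self):
        self.foo = set()


def top_allocation_sites(fn, top=10):
    """Run fn() under tracemalloc and return (result, per-line statistics)."""
    tracemalloc.start()
    try:
        result = fn()  # keep the result alive so its memory shows up in the snapshot
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Group surviving allocations by source line, largest total size first
    return result, snapshot.statistics("lineno")[:top]
```

For example, `nodes, stats = top_allocation_sites(lambda: [Node() for _ in range(10_000)])` followed by printing each entry of `stats` shows the total size and the number of blocks attributed to each line, so a constructor line run 10,000 times should report roughly 10,000 blocks.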
cahoots_ has quit [Ping timeout: 264 seconds]
<komasa> Okay, part of the issue also seems to be that the OS can't reclaim the memory that pypy allocated, even after pypy doesn't need it anymore? gc.get_stats() on pypy reports `rawmalloced: 9360.6MB` under `Total memory allocated`, after the code that needed that RAM is finished already
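To check whether the `rawmalloced` figure actually drops once the analysis is finished, one option is to print PyPy's stats before and after forcing a full collection. A minimal sketch, assuming that `str()` of the object returned by PyPy's `gc.get_stats()` gives the same table quoted above; the helper name is hypothetical. CPython's own `gc.get_stats()` returns per-generation dictionaries instead, hence the interpreter check.

```python
import gc
import platform


def pypy_gc_report(collect_first=True):
    """Return PyPy's GC statistics as text, or None on other interpreters."""
    if platform.python_implementation() != "PyPy":
        return None  # CPython's gc.get_stats() reports something different
    if collect_first:
        gc.collect()  # force a full collection so already-dead objects are freed first
    return str(gc.get_stats())
```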
<cfbolz> that is weird. what kind of raw malloc objects is that? are you making heavy use of C extensions?
<komasa> Not as far as I know, that's what confuses me
<cfbolz> hm
<cfbolz> some of the JIT memory might be reported as rawmalloced, I'm not sure. but 1GB is excessive
<komasa> It's _10_ GB though?
<cfbolz> even worse ;-)
<komasa> exactly
<cfbolz> maybe huge strings? I'm not sure whether they get rawmalloced above a certain size
<cfbolz> or array.array instances?
<komasa> I have the memray info where this is supposedly coming from, 10GB is the largest part of the overall memory usage
<komasa> actually, should "GC Allocated" be the sum of "in arenas, rawmalloc, nursery"?
<cfbolz> I don't know that, sorry
* cfbolz afk
<mattip> komasa: hi
<komasa> hey!
cahoots_ has joined #pypy
<mattip> I am looking at the links you posted, appicaptor looks nice
cahoots_ has quit [Ping timeout: 240 seconds]
<mattip> do you know why angr + pypy + appicaptor does so well? Is there something in the data flow?
<komasa> In what sense "does so well"?
<komasa> The performance?
<mattip> yes
<komasa> It's basically "angr + pypy", appicaptor is mostly in JVM languages, it's just that the component for static analysis of native code is in angr
<mattip> got it
<mattip> about "the OS can't reclaim the memory that pypy allocated, even after pypy doesn't need it anymore"
<komasa> I think the main reason is that many analyses rely on the general idea of "abstract interpretation"
<mattip> I think that is a general malloc() limitation?
<komasa> "Abstract Interpretation" is basically a different kind of interpreter/emulator for some language
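To illustrate the idea, here is a toy abstract interpreter over the textbook sign domain; it is not angr's actual machinery, which tracks far richer state. Instead of computing numbers it computes which sign a result can have, and for a branch it evaluates both arms and merges them with `join`. All names are hypothetical.

```python
from enum import Enum


class Sign(Enum):
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "?"  # sign unknown


def alpha(n):
    """Abstraction function: map a concrete int to its sign."""
    if n < 0:
        return Sign.NEG
    return Sign.ZERO if n == 0 else Sign.POS


def join(a, b):
    """Least upper bound: equal signs stay, anything else becomes unknown."""
    return a if a == b else Sign.TOP


def abs_add(a, b):
    if a is Sign.ZERO:
        return b
    if b is Sign.ZERO:
        return a
    # pos+pos and neg+neg keep their sign; mixed or unknown inputs give TOP
    return a if a == b else Sign.TOP


def abs_mul(a, b):
    if Sign.ZERO in (a, b):
        return Sign.ZERO
    if Sign.TOP in (a, b):
        return Sign.TOP
    return Sign.POS if a == b else Sign.NEG


def abstract_eval(expr, env):
    """Evaluate a tiny expression tree over signs instead of numbers."""
    op = expr[0]
    if op == "const":
        return alpha(expr[1])
    if op == "var":
        return env[expr[1]]
    left = abstract_eval(expr[1], env)
    right = abstract_eval(expr[2], env)
    if op == "add":
        return abs_add(left, right)
    if op == "mul":
        return abs_mul(left, right)
    if op == "either":
        # Both branches are analysed and their results merged
        return join(left, right)
    raise ValueError(f"unknown op {op!r}")
```

For example, with `env = {"x": Sign.POS, "y": Sign.NEG}`, evaluating `y * y + 1` yields `Sign.POS` without knowing any concrete value. Like an emulator, this is a Python-level interpreter loop.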
<mattip> ahh, and PyPy does well with emulators, that we know
<komasa> exactly
<mattip> cool
<mattip> do you know where in angr memory is being allocated? Maybe we are missing some gc hints
<mattip> that would tell the GC "hey, you may think this object is small, but actually it is quite large"
<mattip> that can happen with c-extensions
<komasa> I'm currently tracking this down in the actual angr code
<komasa> I'm also talking to the main dev of the relevant code, and the actual memory usage is also just way too high
<komasa> How do those GC hints work?
<mattip> I see this cpp code for instance
<komasa> That is for using the unicorn framework for quick faithful native code emulation
<komasa> This is _not_ used in the current workloads I run
<komasa> The other culprit I _could_ imagine is the pyvex component
<mattip> if you know how much a c call is allocating, you can use __pypy__.add_memory_pressure(bytes)
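A minimal sketch of where such a hint would go, assuming a hypothetical Python wrapper around memory that some C library allocated with `malloc()`. The fallback keeps it importable on CPython, where `__pypy__` does not exist. `NativeHandle` and its fields are illustrative only.

```python
try:
    from __pypy__ import add_memory_pressure
except ImportError:
    def add_memory_pressure(estimate):
        """No-op fallback outside PyPy."""
        return None


class NativeHandle:
    """Hypothetical Python-side handle for a buffer malloc()ed by a C library."""

    def __init__(self, ptr, nbytes):
        self.ptr = ptr
        self.nbytes = nbytes
        # Without this hint the GC only sees a tiny Python object and may
        # postpone collecting it, keeping the large native buffer alive longer.
        add_memory_pressure(nbytes)
```

In cffi-based code, recent cffi versions accept a `size` hint on `ffi.gc()` for the same purpose.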
<komasa> But pyvex is only responsible for lifting the native code to the IR, so the overall data should not be much larger than the amount of code being analyzed
<komasa> According to memray, the specific line of code that causes ~half the allocations is https://github.com/angr/angr/blob/7d9bffb13d93c2860dc08a6e70299fa4aad03ed5/angr/knowledge_plugins/key_definitions/liveness.py#L44
<komasa> (half the allocations in terms of overall GB of memory)
<komasa> but this is on CPython
<mattip> weird, maybe the line before?
<mattip> ahh, it is a defaultdict, so that line could be adding a new entry to self.loc_to_defs
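To illustrate mattip's point: indexing a `defaultdict(set)` with `[]` creates and stores a new empty set for every missing key, so read-only lookups can quietly grow the dict. The class below is a hypothetical stand-in, not the actual code in angr's `liveness.py`; only the attribute name `loc_to_defs` comes from the discussion.

```python
from collections import defaultdict

_EMPTY = frozenset()  # shared immutable "no definitions" result


class LivenessSketch:
    """Hypothetical stand-in; not the actual angr liveness.py code."""

    def __init__(self):
        self.loc_to_defs = defaultdict(set)

    def defs_at(self, loc):
        # [] on a defaultdict inserts a fresh empty set for every unseen loc
        return self.loc_to_defs[loc]

    def defs_at_readonly(self, loc):
        # .get() never inserts, so lookups of unseen locs allocate nothing new
        return self.loc_to_defs.get(loc, _EMPTY)
```

If most lookups are misses, returning one shared empty `frozenset` avoids allocating a fresh set per miss, at the cost of callers no longer being able to mutate the returned value.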
<komasa> "weird, maybe the line before?" good idea, but _that_ line is responsible for the remaining ~40% of the allocations :P
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 245 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 264 seconds]
cahoots_ has joined #pypy
<mattip> komasa: I guess my debugging-via-IRC skills are not up to the task
<mattip> happy to continue the discussion on an issue or via mail, I am mattip on github and https://foss.heptapod.net/pypy/pypy
cahoots_ has quit [Ping timeout: 264 seconds]
ruth2345345 has quit [Ping timeout: 246 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 264 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 246 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
ruth2345345 has joined #pypy
dmalcolm_ has joined #pypy
dmalcolm__ has quit [Ping timeout: 252 seconds]
lazka has quit [Quit: bye]
marvin_ has quit [Remote host closed the connection]
marvin_ has joined #pypy
lazka has joined #pypy
lazka has quit [Client Quit]
marvin_ has quit [Remote host closed the connection]
marvin_ has joined #pypy
lazka has joined #pypy
lazka has quit [Quit: bye]
marvin_ has quit [Remote host closed the connection]
marvin has joined #pypy
lazka has joined #pypy
dustinm- has quit [Server closed connection]
dustinm has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
otisolsen70 has quit [Read error: Connection reset by peer]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
cahoots_ has quit [Ping timeout: 240 seconds]
cahoots_ has joined #pypy
sugarbeet has quit [Ping timeout: 246 seconds]
sugarbeet has joined #pypy