<LarstiQ>
`refs = gc.get_referrers(old)` seems to be the slow bit
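(For context: a minimal sketch of why that call is slow, with made-up names. `gc.get_referrers()` scans every object the collector tracks, so its cost grows with total heap size rather than with the number of actual referrers.)
```python
import gc
import time

old = object()
holders = [[old] for _ in range(3)]          # the only real referrers
ballast = [[i] for i in range(1_000_000)]    # unrelated objects, but still scanned

start = time.perf_counter()
refs = gc.get_referrers(old)                 # walks the whole tracked heap
print(len(refs), "referrers found in", time.perf_counter() - start, "s")
```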
otisolsen70 has joined #pypy
Guest96 has joined #pypy
otisolsen70 has quit [Ping timeout: 260 seconds]
Guest96 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
otisolsen70 has joined #pypy
jacob22 has quit [Quit: Konversation terminated!]
Guest96 has joined #pypy
arkanoid has joined #pypy
<arkanoid>
hello! I'm trying pypy again after a looong time, back when numpy and pandas were not compatible. I've downloaded the official pypy binary release, created a virtualenv for it, and run pip install for my dependencies. It's taking a very long time!
<arkanoid>
on pandas and numpy in particular, it took more than 5 minutes to get through them
<arkanoid>
I see it's spending time on compiling stuff
<arkanoid>
is there a way to handle this with binary distributions, and recompile only what actually needs recompiling?
<mattip>
the wheels there are for pyarrow-0.15.0a0-pp36, but the methodology should work for building your own pypy3.7/pypy3.8 wheels
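(One way to get the binary-distribution behaviour arkanoid is after, assuming a plain pip setup: build wheels once with `pip wheel -w ./wheelhouse -r requirements.txt`, then install from that directory in later environments with `pip install --no-index --find-links=./wheelhouse -r requirements.txt`. pip also caches wheels it builds, so repeat installs into new virtualenvs of the same interpreter shouldn't recompile.)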
<arkanoid>
mattip: I've tried downloading the wheel there, but I'm getting ERROR: pyarrow-0.15.0a0-pp36-pypy36_pp73-linux_x86_64.whl is not a supported wheel on this platform.
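(That error usually means the wheel's tag doesn't match the interpreter: a pp36 wheel won't install on pypy3.7/3.8. A quick check, assuming the `packaging` library is installed; `pip debug --verbose` prints the same list:)
```python
# List the wheel tags the running interpreter accepts; the
# pp36-pypy36_pp73 tag must appear here for that wheel to install.
from packaging import tags

for tag in tags.sys_tags():
    print(tag)   # e.g. pp38-pypy38_pp73-manylinux_2_17_x86_64
```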
<arkanoid>
have to drop pypy for now, maybe I can optimize just a portion of code with numba?
<mattip>
to be honest, if your workflow uses pandas/pyarrow/other c-extension libraries, you are likely to see poor performance under pypy
<arkanoid>
mattip: no, what I need to optimize is networkx, which is pure python
<arkanoid>
but I need it inside the usual scientific stack + streamlit, and streamlit wants pyarrow
<mattip>
we used to have a way to open a cpython process and a pypy process and have them hand off objects to each other
<mattip>
but I don't know if anyone used it in earnest
<Corbin>
arkanoid: What are your actual speed targets or performance goals? You could estimate that there's maybe a 20x speedup from moving away from "the usual scientific stack" and using array.array with PyPy directly. If that's worth it, then maybe ditch Numpy entirely?
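(A hedged sketch of what that could look like, with made-up data: PyPy's JIT compiles plain loops over array.array, which stores unboxed doubles much like a NumPy vector.)
```python
from array import array

xs = array("d", range(1_000_000))
ys = array("d", range(1_000_000))

out = array("d", bytes(8 * len(xs)))   # preallocated buffer of doubles
for i in range(len(xs)):               # a plain loop the JIT can compile
    out[i] = xs[i] * ys[i]
```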
<Corbin>
I asked Nix to build `pypy3.withPackages (ps: [ ps.pyarrow ])` and it asked me to manually mark several packages as unbroken on PyPy. The current blocking broken package is XlsxWriter, but I'm sure that there are others in the way.
<arkanoid>
I'm ok with numpy speed with cpython too, the slow guy in the room is networkx algorithms
<arkanoid>
I run algorithms on large graphs and it takes quite some time, and I feel there's room for optimization there
<Corbin>
mattip's microservice suggestion might be a good idea. I've had situations where the graph algorithms need to live in their own process.
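(A rough sketch of that hand-off, assuming a pickle-over-socket setup via multiprocessing.connection; the port, authkey, and choice of betweenness_centrality are illustrative, not an existing PyPy feature:)
```python
# server.py -- run under PyPy, owns the slow networkx work
from multiprocessing.connection import Listener
import networkx as nx

with Listener(("localhost", 6000), authkey=b"pypy") as listener:
    with listener.accept() as conn:
        edges = conn.recv()                      # plain list of (u, v) tuples
        conn.send(nx.betweenness_centrality(nx.Graph(edges)))

# client.py -- run under CPython, next to pandas/pyarrow/streamlit
from multiprocessing.connection import Client

with Client(("localhost", 6000), authkey=b"pypy") as conn:
    conn.send([(0, 1), (1, 2), (2, 0)])
    centrality = conn.recv()
```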
<arkanoid>
sure, I could do RPC and all that, but if I have to go that far I'd rather switch to a C-based graph lib
<arkanoid>
well, also numba fails to optimize the lib calls in both nopython and object modes
jacob22 has joined #pypy
<mattip>
arkanoid: can you somehow benchmark pypy performance for your slow networkx code
<mattip>
by serializing the inputs to the functions and comparing runs of cpython to pypy
<mattip>
to make sure this is worth the effort
<mattip>
if pypy is not significantly faster, we would definitely want to know
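(A rough sketch of such a benchmark, with a hypothetical pickle file and betweenness_centrality standing in for the real slow algorithm; run the same script under both interpreters and compare:)
```python
# bench.py -- run as `python3 bench.py` and `pypy3 bench.py`
import pickle
import sys
import time

import networkx as nx

with open("graph_input.pickle", "rb") as f:   # hypothetical dump of the real inputs
    graph = pickle.load(f)

start = time.perf_counter()
result = nx.betweenness_centrality(graph)     # stand-in for the slow algorithm
print(sys.implementation.name, time.perf_counter() - start, "s")
```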
<cfbolz>
(so far our experiments with networkx were always quite positive)