cfbolz changed the topic of #pypy to: #pypy PyPy, the flexible snake https://pypy.org | IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end and https://libera.irclog.whitequark.org/pypy | so many corner cases, so little time
Atque has quit [Quit: ...]
Atque has joined #pypy
Atque has quit [Quit: ...]
Atque has joined #pypy
Atque has quit [Quit: ...]
Guest96 has joined #pypy
otisolsen70 has joined #pypy
otisolsen70 has quit [Remote host closed the connection]
otisolsen70 has joined #pypy
danchr_ is now known as danchr
slav0nic has joined #pypy
<cfbolz> mattip: should we mention in the release notes that this will be the last release of 3.7?
<mattip> good point
<cfbolz> mattip: I can add it
<cfbolz> I am doing minor things in there anyway
<mattip> cool
<mattip> 64a22a19910f is on py3.9, probably belongs on default
<cfbolz> oh ouch, thanks
<cfbolz> mattip: both done
<cfbolz> mattip: if we release with the missing dict copy stuff (and it looks like we probably will) we should maybe mention that too in the announcement?
<fijal> cfbolz: hi
<mattip> ok
<mattip> thanks for the fixes
<mgorny> it seems that parallel (-j 12) compileall for pypy3.9 stdlib hangs for me (-j1 is fine)
<mgorny> i wonder if it could mean that pypy3.9 has some generic concurrency issues
<cfbolz> mgorny: interesting, what is the precise command line?
<mgorny> /tmp/portage/dev-python/pypy3-7.3.8_rc1/image/usr/bin/pypy3-c-7.3.8_rc1 -O -m compileall -j 12 -q -f -d /usr/lib/pypy3.9 /tmp/portage/dev-python/pypy3-7.3.8_rc1/image/usr/lib/pypy3.9
<mgorny> (this still using python3.8-ish code, we first run without -O, then with -O, then with -OO)
<mgorny> if i ^c it and run it again, it finishes successfully
<mgorny> but if i start over, i get the same hang consistently
<mgorny> hmm, on 5th attempt it didn't hang
<cfbolz> mgorny: iirc there is some new feature to start subprocesses more efficiently in 3.9, that could be a candidate for the bug
<mgorny> maybe it's related to failures from trying to compile badsyntax etc. -- i see cpython excludes them from compileall explicitly
<cfbolz> that shouldn't hang though, I hope
<cfbolz> (but yes, also possible)
<mgorny> hmm, i recall we had to carry some concurrency-related patch for a while in cpython 3.9
<mgorny> i'll see if it applies to pypy3.9 and changes anything
<cfbolz> mgorny: you could leave off -q to see whether it's always a specific file?
<cfbolz> (I can't reproduce the hang yet)
<mgorny> oh, right -- i was looking for some --verbose option and missed -q ;-D
<mgorny> it doesn't seem to help with this but you probably want to grab https://github.com/python/cpython/commit/3b9d886567c4fc6279c2198b6711f0590dbf3336 anyway -- i recall it caused us quite the pain
<cfbolz> mgorny: we ship that
<cfbolz> it was released as part of cpy 3.9.9
<cfbolz> oops
<cfbolz> 3.9.10
<cfbolz> and we use that stdlib
<mgorny> are we talking of 7.3.8rc1?
<mgorny> ah, sorry, i'm blind
<mgorny> our patch had .copy(), so i didn't notice the list()
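(For context on the `list()` vs `.copy()` distinction: both solve the same underlying problem, iterating over a dict such as a pool's process table while another thread may be mutating it. The snippet below is a generic illustration of that pattern, not the actual code from the CPython commit.)

```python
# Illustrative only: mimics iterating a pool's process table while it
# is being mutated -- the race that snapshotting the dict avoids.
d = {i: object() for i in range(3)}

# Mutating a dict while iterating it directly raises RuntimeError
# (in a real pool the mutation would come from another thread).
try:
    for key in d:
        d[len(d)] = object()
except RuntimeError as exc:
    print("direct iteration failed:", exc)

# Taking a snapshot first (list(d) and d.copy() both work) is safe:
for key in list(d):
    d[len(d)] = object()
print("snapshot iteration finished, dict now has", len(d), "entries")
```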
<mgorny> well, first hang was after Compiling '/tmp/portage/dev-python/pypy3-7.3.8_rc1/image/usr/lib/pypy3.9/zoneinfo/_common.py'...
<mgorny> so prolly last file
<mgorny> i've added more debug to check if it hangs after last -O file or before first -O file
<cfbolz> ok, thank you
<cfbolz> still no local hang
<mgorny> er, definitely after last file, as i didn't remove -q from other runs
<mattip> maybe some file/resource being held open?
<mgorny> ok, happens without -O too
<mgorny> (but more commonly with -O...)
<cfbolz> I am getting it now too
<cfbolz> progress
<mattip> mgorny: the 3.9 binary is supposed to be called pypy3.9-c and the shared library libpypy3.9-c.so
<mattip> so it can be put next to the other shared libraries
<cfbolz> mgorny: when I strace it, it doesn't hang of course ;-)
<mgorny> mattip: but i don't think renaming it matters here? i'm trying minimal changes from 3.8
<mgorny> cfbolz: according to gdb, they're all waiting on some futex
<mgorny> i can try rebuilding pypy3 with debug symbols if that helps
<mattip> ok, just curious if the new naming feature is useful - but let's find this error first
<mgorny> mattip: for end users, certainly. however, on gentoo i don't really have the resources to support more than one pypy3 version
<mattip> +1
Atque has joined #pypy
<cfbolz> mgorny: when it hangs, there are 12 subprocesses still open. that means the shutdown at the end is definitely not working right
<cfbolz> anyway, need to leave for now
otisolsen70 has quit [Ping timeout: 256 seconds]
<mattip> I think using lower case locale solves the failing own test/test_app_main.py tests:
<mattip> sudo docker run -e LANG=en_us.utf-8 -e LC_ALL=en_us.utf-8 ...
<cfbolz> mattip: "cool"
<mattip> let's see next time a x86_64 own tests run
Atque has quit [Quit: ...]
<mgorny> i've added initial pypy3.9 ebuilds to gentoo (masked)
<mgorny> i've confirmed that the layout works for building pillow
<mgorny> in the next hour, i'm going to start rebuilding everything
Atque has joined #pypy
greedom has joined #pypy
Julian has joined #pypy
greedom has quit [Remote host closed the connection]
greedom has joined #pypy
greedom has quit [Remote host closed the connection]
greedom has joined #pypy
Atque has quit [Quit: ...]
otisolsen70 has joined #pypy
otisolsen70 has quit [Remote host closed the connection]
otisolsen70 has joined #pypy
Julian has quit [Quit: leaving]
Guest96 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Guest96 has joined #pypy
Guest96 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<mgorny> cfbolz: bad news: turns out compileall also hangs on various other packages
<mgorny> including very small packages :-(
<cfbolz> mgorny: yeah, we think it's a more general problem with concurrent futures
<cfbolz> So definitely a release blocker :-(
<mgorny> cfbolz: do you have some suggestions how i could help? i suppose it's much easier to repro on my hardware
<mattip> what disk are you running? I have a nvme ssd and cannot reproduce the hang
<mattip> does it help if you do --jit off ?
<mgorny> nvme as well, though the builds are happening on tmpfs
* tumbleweed hasn't started trying to do the new layout with 3.8 yet, but if I get time I'll poke at it in the next days
<tumbleweed> fwiw, make_ssl_data.py isn't valid Python3, that could be breaking your compileall
* tumbleweed uses his own py_compile wrapper, without the benefit of compileall's parallelism
<mgorny> tumbleweed: it happens on random py3 packages as well, e.g. tomli
<mgorny> (and tomli is 4 .py files...)
<tumbleweed> err I meant isn't valid python2.7, it's python3, so it breaks py_compile for pypy2.7
<tumbleweed> right
<tumbleweed> sounds like crappy concurrent futures
<mgorny> actually, it might be even easier to reproduce on small packages, i guess
<cfbolz> mgorny: yes, thanks for the hint with tomli
<mgorny> cfbolz: actually, much harder to reproduce with it
<mgorny> it seems that was one-time lucky streak ;-)
<mgorny> mattip: --jit off doesn't help
<cfbolz> seems trying to compileall multiprocessing hangs for me, and is a lot less files than the whole stdlib
<mgorny> i managed with just one file ;-)
<mgorny> but man, the logic is so complex :-(
<mgorny> that said, i know that there were some serious multiprocessing bugs in cpython
<mgorny> i had a tool (gemato) that i've made parallel at some point but i had to revert that because some users had weird hangs i couldn't ever reproduce
<mgorny> and that made very little sense -- i suspect it was some bug in cpython that might still be there
<mattip> does lsof help show which files are open?
<cfbolz> yep
<mgorny> a bunch of /dev/shm/sem*, a bunch of pipes and /dev/null
<mgorny> interestingly, it seems that the number of pipes grows with pid
<cfbolz> mgorny: that's the parent process, I suspect there's a pipe for all the subprocesses in the pool
<mgorny> yeah but check child processes
<mgorny> for some reason child+1 has more pipes than child
<mgorny> maybe they're not closing something?
<mgorny> i mean, i don't see why compileall would have different pipe counts in children
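(The growing pipe counts mgorny observes are consistent with plain fd inheritance: a fork()ed child inherits every pipe the parent had open at fork time, so the pool's N-th worker starts life holding the pipe ends created for workers 0..N-1 unless the library closes them explicitly. A minimal, Linux-only sketch of the effect -- it reads /proc/self/fd, and is not the actual pool code:)

```python
import os

# Create a pipe, fork a child, repeat: each later child inherits all
# pipes created before it, so its open-fd count keeps growing.
pids = []
kept_pipes = []
for i in range(3):
    kept_pipes.append(os.pipe())
    pid = os.fork()
    if pid == 0:
        # Child: report how many fds we inherited (Linux-specific).
        n = len(os.listdir("/proc/self/fd"))
        os._exit(n)
    pids.append(pid)

counts = [os.waitstatus_to_exitcode(os.waitpid(p, 0)[1]) for p in pids]
print(counts)  # strictly increasing: two extra fds per generation
```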
* tumbleweed has the multiprocessing test suite disabled, because it leaves stray processes on the build daemons
<cfbolz> I bet they are not closing anything
<cfbolz> multiprocessing has a history of relying on refcounting
<cfbolz> it's some kind of weird deadlock. one work item is not done, but both the manager process and the worker processes are waiting for each other
<mattip> more than one lock and a race condition?
<mattip> can gdb tell you where the processes are waiting?
<cfbolz> not helpfully
<cfbolz> because I'd like to know where in *python* we are waiting
<mattip> right
<cfbolz> I am fairly sure it's not really a bug on either side. I suspect multiprocessing or concurrent futures are making assumptions
<mgorny> cfbolz: maybe you could try using pdb? i've seen pdb-attach package too, maybe that could help
<cfbolz> thanks, good idea
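(Besides pdb, the stdlib's faulthandler module can answer cfbolz's question -- where in *Python* the process is waiting -- without attaching a debugger: register a signal handler, then signal the hung process from outside. A self-contained sketch; here the process signals itself, whereas in real debugging you would run `kill -USR1 <pid>` from another terminal:)

```python
import faulthandler
import os
import signal
import tempfile
import time

# Dump every thread's Python-level traceback when SIGUSR1 arrives.
log = tempfile.NamedTemporaryFile(mode="w+", delete=False)
faulthandler.register(signal.SIGUSR1, file=log, all_threads=True)

# Simulate poking a hung process: normally this signal would come
# from `kill -USR1 <pid>` in another shell.
os.kill(os.getpid(), signal.SIGUSR1)
time.sleep(0.2)  # give the handler a moment to finish writing

log.seek(0)
print(log.read())  # "Current thread 0x... (most recent call first): ..."
```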
<cfbolz> (of course all the problems go away if I ask multiprocessing to use spawn for processes instead of fork. why is that not the default yet?)
<cfbolz> which means that something somewhere is not fork safe or does not have the right fork hooks registered in the right order :-(
sam_ has joined #pypy
<mgorny> i think cpython defaults to spawn these days
<mattip> we could decide spawn should be the default for PyPy, right?
<mgorny> i'm looking where to swap that ;-)
<mgorny> if anything, that sounds like a good interim solution
<mgorny> k, found it
<cfbolz> mgorny: does it? in 3.9? we would have picked that up, no?
<mgorny> no, i was wrong
<mgorny> i was probably thinking of subprocess
<cfbolz> yes
<mgorny> trying with spawn context now
<mgorny> no hangs so far
<cfbolz> right
<cfbolz> are we comfortable changing that default? I am not sure how to make that decision
<cfbolz> (my print debugging makes this look weirder and weirder, fwiw)
<mattip> well, if cpython is moving in that direction we can accelerate the process
<mattip> (I found the os problem: os.scandir(fp) does not call rewinddir() before closing)
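(For reference, the contract mattip is pointing at: when os.scandir() is given a directory file descriptor, CPython dup()s it, and because the duplicate shares its seek position with the caller's fd, it rewinds the directory stream before closing so the caller can reuse the fd. A sketch of the observable behavior on an implementation with the fix:)

```python
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, "mod.py"), "w").close()

fd = os.open(d, os.O_RDONLY)
first = sorted(e.name for e in os.scandir(fd))   # consumes the shared position
second = sorted(e.name for e in os.scandir(fd))  # works only if scandir rewound
os.close(fd)
print(first, second)  # both listings contain mod.py
```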
<cfbolz> mattip: cool, that's a good outcome of today ;-)
<cfbolz> gdb time! need to build a debug version
<cfbolz> mattip: I am not sure cpython is moving towards spawn as the default
<sam_> spawn is the default on windows/macos
<cfbolz> sam_: thanks
<sam_> np!
<cfbolz> that sounds like a good enough reason ;-)
<sam_> (I know this because I've struggled with bugs on macOS caused by the default changing)
<mgorny> well, it'd be weird if fork() was the default on windows ;-)
<cfbolz> "Note that safely forking a multithreaded process is problematic." - I love this, because that's what concurrent.futures is doing, no?
greedom has quit [Remote host closed the connection]
<mgorny> cfbolz, mattip: are you ok with me switching the default for Gentoo?
<cfbolz> mgorny: we should document it then
<mgorny> (at least for rc1, I can switch it for the final release)
<mgorny> switch it back*
<cfbolz> sounds like a plan
Guest96 has joined #pypy
<cfbolz> mgorny: ok, I will try to add a comment
<mgorny> thanks
<mattip> mgorny: if we can't fix 3650, we should switch for the release
<mgorny> and i'm going to try rebuilding all python packages in my test environment
<mattip> the benefit of fork (lower memory requirements) comes at great cost (don't fork multithreaded processes)
<cfbolz> just typing stuff and using irc as my rubber duck: it seems from gdb that some of the forked processes don't ever start doing any useful work, because they immediately deadlock blocking on some semaphore
<cfbolz> and it seems that the lock that we deadlock on could be the sys.stdout lock??!
<mgorny> maybe cpython uses different kind of underlying locks or sth?
<mattip> maybe the debug printing?
<cfbolz> mattip: possible
* cfbolz switches to os.write
lazka has quit [Quit: Ping timeout (120 seconds)]
lazka has joined #pypy
<cfbolz> mattip: thanks, that makes sense, no it deadlocks somewhere else
<mattip> os.scandir left the fd in a bad state, I don't know whether that is part of the compileall code?
<cfbolz> mattip: not totally impossible, but unlikely I think
* cfbolz goes to bed
<mattip> gnite
<mgorny> next problem: Py_GenericAlias is needed by regex, filed https://foss.heptapod.net/pypy/pypy/-/issues/3651
Guest96 has quit [Quit: Textual IRC Client: www.textualapp.com]
otisolsen70 has quit [Quit: Leaving]
slav0nic has quit [Ping timeout: 268 seconds]
<mgorny> ok, just two failures from build-time package testing
<mgorny> i'll run some test suites tomorrow
<mgorny> (both reported)