Summary: | dev-lang/python-3.8.2-r2 : hangs at install phase (related to multiprocessing, maybe related to qemu?) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Alexander Tsoy <alexander> |
Component: | Current packages | Assignee: | Python Gentoo Team <python> |
Status: | CONFIRMED --- | ||
Severity: | normal | CC: | daniel, dev-portage, dilfridge, gentoo, jlee, jstein, mgorny, sam, tamiko, virtualization |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
python-3.8.2-r2:20200429-201324.log.gz
Patch Makefile.pre.in to avoid ProcessPoolExecutor with compilall.py -j1 |
Description
Alexander Tsoy
2020-04-29 23:11:06 UTC
Forgot to add: according to strace, all processes are waiting for a mutex.

I wonder if this is related to a different problem I've been having with Portage on Python 3.8. I've been seeing hangs upon completion of rsyncing the main Gentoo repo (with 'emaint sync -a' or 'emerge --sync'). The process goes idle, blocked in a FUTEX_WAIT_PRIVATE syscall. It might be related to https://bugs.python.org/issue39360 , whose minimal reproducer in the first comment hangs on my system (with some non-zero probability).

I can confirm this, with qemu-user chroots both for riscv64 and arm. It happens in about 50% of emerges, and (out of all system packages, including python-37) *only* with python-3.8. (Which makes building stages with catalyst somewhat painful.) Given that no one else has reported it yet, it might also be qemu-specific.

(In reply to Matt Whitlock from comment #2)
> I wonder if this is related to a different problem I've been having with
> Portage on Python 3.8. I've been seeing hangs upon completion of rsyncing
> the main Gentoo repo (with 'emaint sync -a' or 'emerge --sync'). The process
> goes idle, blocked in a FUTEX_WAIT_PRIVATE syscall. It might be related to
> https://bugs.python.org/issue39360 , whose minimal reproducer in the first
> comment hangs on my system (with some non-zero probability).

That looks interesting, indeed.

I've seen a somewhat similar problem recently. I've been building multiple Python versions in parallel, and the install phases of all of them suddenly hung. It turned out that it was caused by a parallel emerge process that I'd stopped via ^Z (i.e. a process that shouldn't affect this emerge at all). Might have something to do with ptys.

(In reply to Matt Whitlock from comment #2)
> I wonder if this is related to a different problem I've been having with
> Portage on Python 3.8. I've been seeing hangs upon completion of rsyncing
> the main Gentoo repo (with 'emaint sync -a' or 'emerge --sync'). The process
> goes idle, blocked in a FUTEX_WAIT_PRIVATE syscall. It might be related to
> https://bugs.python.org/issue39360 , whose minimal reproducer in the first
> comment hangs on my system (with some non-zero probability).

I hope this patch for bug 730192 solves that:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=bde44b75407dfe0a390033636894a136af4e7533

(In reply to Alexander Tsoy from comment #0)
> Created attachment 635252 [details]
> python-3.8.2-r2:20200429-201324.log.gz
>
> python-3.8.2-r2 hangs at install phase. Looks like a deadlock related to
> multiprocessing (again?). This is an armv7 container on a x86_64 host (via
> qemu-arm). See the result of double Ctrl+C in attached build log.
>
> Also note that Makefile is unconditionally passing -j0 to compileall.py
>
> $ ps auxww | grep python3.8
> root 216855 0.1 0.1 4441276 35792 pts/0 SNl+ апр29 0:07 /usr/bin/qemu-arm ./python -E -Wi /var/tmp/portage/dev-lang/python-3.8.2-r2/image/usr/lib/python3.8/compileall.py -j0 -d /usr/lib/python3.8 -f -x bad_coding|badsyntax|site-packages|lib2to3/tests/data /var/tmp/portage/dev-lang/python-3.8.2-r2/image/usr/lib/python3.8

With -j0, compileall.py uses concurrent.futures.ProcessPoolExecutor, and this looks very similar to the gemato deadlock from bug 647964. If we patch the Makefile to pass -j1 instead, then it won't use ProcessPoolExecutor.

Created attachment 650130 [details, diff]
Patch Makefile.pre.in to avoid ProcessPoolExecutor with compilall.py -j1
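
To illustrate the -j0 versus -j1 distinction discussed above, here is a minimal sketch using the library equivalent of the command-line flag: `compileall.compile_dir()` takes a `workers` argument, where `workers=1` compiles in the calling process and `workers=0` starts one worker per CPU through ProcessPoolExecutor. The helper name `compile_tree_serially` and the throwaway directory are assumptions made for the example; this is not the content of the attached patch.

```python
import compileall
import pathlib
import tempfile


def compile_tree_serially(root: str) -> bool:
    """Byte-compile every module under ``root`` in the current process.

    workers=1 keeps compileall on its single-process code path, so
    concurrent.futures.ProcessPoolExecutor is never instantiated.
    workers=0 (what -j0 maps to) would start one worker per CPU instead.
    """
    # force=True recompiles even if timestamps look current (like -f);
    # quiet=1 prints only error messages.
    return bool(compileall.compile_dir(root, force=True, quiet=1, workers=1))


if __name__ == "__main__":
    # Hypothetical throwaway tree, only here to make the sketch runnable.
    with tempfile.TemporaryDirectory() as tmp:
        (pathlib.Path(tmp) / "example.py").write_text("VALUE = 1\n")
        print("compiled without errors:", compile_tree_serially(tmp))
```

The trade-off is slower byte-compilation on multi-core hosts in exchange for not touching the multiprocessing machinery at all during the install phase.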
(In reply to Michał Górny from comment #6)
> Might have to do something with ptys.

If the underlying cause is the same as my 'emaint sync -a' hangs, then it's not related to PTYs, as I see the hangs even when running emaint-sync from a cronjob.

By the way, the hang at the end of repo syncing is very reproducible for me. I don't know enough about Python debugging to get a Python stacktrace, but I can get a native stacktrace, which is how I know the process is blocked in a futex syscall.

(In reply to Matt Whitlock from comment #10)
> (In reply to Michał Górny from comment #6)
> > Might have to do something with ptys.
>
> If the underlying cause is the same as my 'emaint sync -a' hangs, then it's
> not related to PTYs, as I see the hangs even when running emaint-sync from a
> cronjob.
>
> By the way, the hang at the end of repo syncing is very reproducible for me.
> I don't know enough about Python debugging to get a Python stacktrace, but I
> can get a native stacktrace, which is how I know the process is blocked in a
> futex syscall.

Please try https://github.com/gentoo/portage/pull/565.patch for emaint sync and/or emerge --sync deadlocks.

(In reply to Zac Medico from comment #11)
> Please try https://github.com/gentoo/portage/pull/565.patch for emaint sync
> and/or emerge --sync deadlocks.

Would I need PR 580 also? I see both PRs linked at https://bugs.gentoo.org/730192, and it looks like it's still a work in progress.

(In reply to Matt Whitlock from comment #12)
> (In reply to Zac Medico from comment #11)
> > Please try https://github.com/gentoo/portage/pull/565.patch for emaint sync
> > and/or emerge --sync deadlocks.
>
> Would I need PR 580 also? I see both PRs linked at
> https://bugs.gentoo.org/730192, and it looks like it's still a work in
> progress.

For emaint sync and emerge --sync, PR 565 is enough (it's also included in sys-apps/portage-3.0.0-r1). PR 580 applies the same fix to the merge / unmerge code.
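
On the question of getting a Python-level stacktrace from a process blocked in a futex syscall: one possible approach, assuming the hanging script can be modified or started through a small wrapper, is the standard library's `faulthandler` module. The sketch below is not something Portage is known to do; `enable_hang_diagnostics` is a hypothetical helper name. It installs a SIGUSR1 handler that dumps the traceback of every thread, and optionally a watchdog that writes the same dump after a timeout.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(timeout_seconds: float = 0.0, file=None) -> None:
    """Let a possibly-hanging script report where each thread is stuck.

    - Sending SIGUSR1 to the process writes the Python traceback of all
      threads to ``file`` (stderr by default).
    - If timeout_seconds > 0, the same dump is also written every
      timeout_seconds seconds without terminating the process.

    ``file`` must have a real file descriptor (a working fileno()).
    """
    target = sys.stderr if file is None else file
    faulthandler.register(signal.SIGUSR1, file=target, all_threads=True)
    if timeout_seconds > 0:
        faulthandler.dump_traceback_later(
            timeout_seconds, repeat=True, file=target
        )


if __name__ == "__main__":
    enable_hang_diagnostics()
    # From another shell: kill -USR1 <pid of this process>
    signal.pause()
```

Because the dump is produced from a C-level signal handler, it can still be written while the interpreter itself is blocked on a lock, which is the situation described in this bug.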
Since compileall.py deadlocks in concurrent.futures.ProcessPoolExecutor, I've searched for issues mentioning ProcessPoolExecutor and found at least these:

https://bugs.python.org/issue35809
https://bugs.python.org/issue35866

(In reply to Zac Medico from comment #9)
> Created attachment 650130 [details, diff]
> Patch Makefile.pre.in to avoid ProcessPoolExecutor with compilall.py -j1

Over the last few days I've used a patch equivalent to this one for stage builds. I haven't seen a new hang yet, so it looks good.

(In reply to Andreas K. Hüttel from comment #15)
> (In reply to Zac Medico from comment #9)
> > Created attachment 650130 [details, diff]
> > Patch Makefile.pre.in to avoid ProcessPoolExecutor with compilall.py -j1
>
> Over the last few days I've used a patch equivalent to this one for stage builds.
> I haven't seen a new hang yet, so it looks good.

For reference: https://gitweb.gentoo.org/proj/releng.git/tree/releases/portage/stages-qemu/patches/dev-lang/python:3.8/compileall-singlethreaded.patch

FYI, I've still seen this with Python 3.10 at least.