Summary: | sys-app/portage: forkserver multiprocessing start method triggers OSError: AF_UNIX path too long with python 3.14 | ||
---|---|---|---|
Product: | Portage Development | Reporter: | Zac Medico <zmedico> |
Component: | Unclassified | Assignee: | Portage team <dev-portage> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | sam, service, zmedico |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=941955 https://github.com/python/cpython/issues/132124 https://bugs.gentoo.org/show_bug.cgi?id=956004 https://github.com/python/cpython/pull/134085 https://bugs.gentoo.org/show_bug.cgi?id=956378 https://bugs.gentoo.org/show_bug.cgi?id=957070 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Zac Medico
I've found a simple workaround using spawn:

diff --git a/lib/portage/locks.py b/lib/portage/locks.py
index ee40451b12..377df22a16 100644
--- a/lib/portage/locks.py
+++ b/lib/portage/locks.py
@@ -29,0 +30,3 @@ import warnings
+if multiprocessing.get_start_method() == "forkserver":
+    multiprocessing = multiprocessing.get_context("spawn")
+
diff --git a/lib/portage/util/_async/ForkProcess.py b/lib/portage/util/_async/ForkProcess.py
index 946978b301..9c138b63d9 100644
--- a/lib/portage/util/_async/ForkProcess.py
+++ b/lib/portage/util/_async/ForkProcess.py
@@ -5 +5 @@ import fcntl
-import multiprocessing
+import multiprocessing as _multiprocessing
@@ -19,0 +20,4 @@ _registered_run_exitfuncs = None
+if _multiprocessing.get_start_method() == "forkserver":
+    multiprocessing = _multiprocessing.get_context("spawn")
+else:
+    multiprocessing = _multiprocessing
@@ -36 +40 @@ class ForkProcess(SpawnProcess):
-    _HAVE_SEND_HANDLE = getattr(multiprocessing.reduction, "HAVE_SEND_HANDLE", False)
+    _HAVE_SEND_HANDLE = getattr(_multiprocessing.reduction, "HAVE_SEND_HANDLE", False)
@@ -160 +164 @@ class ForkProcess(SpawnProcess):
-        multiprocessing.reduction.send_handle(
+        _multiprocessing.reduction.send_handle(
@@ -300 +304 @@ class ForkProcess(SpawnProcess):
-        fd_pipes_map[fd] = multiprocessing.reduction.recv_handle(
+        fd_pipes_map[fd] = _multiprocessing.reduction.recv_handle(

There are some additional test failures even after the partial spawn patch, which is interesting because these failures do not appear in the test run that entirely uses spawn:

FAILED ../../home/runner/work/portage/portage/lib/portage/tests/ebuild/test_ipc_daemon.py::IpcDaemonTestCase::testIpcDaemon
FAILED ../../home/runner/work/portage/portage/lib/portage/tests/locks/test_asynchronous_lock.py::AsynchronousLockTestCase::testAsynchronousLockWaitKillHardlink
FAILED ../../home/runner/work/portage/portage/lib/portage/tests/locks/test_lock_nonblock.py::LockNonblockTestCase::testLockNonblockHardlink - AssertionError: 1 != 0
FAILED ../../home/runner/work/portage/portage/lib/portage/tests/process/test_spawn_returnproc.py::SpawnReturnProcTestCase::testSpawnReturnProcTerminate

The mysterious test failures seem to be resolved by pytest-rerunfailures like this:

--reruns 5 --only-rerun "node down: Not properly terminated"

It seems that this issue is fixed in 3.14.0a3. There are these other test failures with 3.14.0a3 though:

=================================== FAILURES ===================================
_________________ lib/portage/tests/util/futures/test_retry.py _________________
[gw8] linux -- Python 3.14.0 /opt/hostedtoolcache/Python/3.14.0-alpha.3/x64/bin/python
worker 'gw8' crashed while running 'lib/portage/tests/util/futures/test_retry.py::RetryForkExecutorTestCase::testCancelRetry'
_________________ lib/portage/tests/util/futures/test_retry.py _________________
[gw14] linux -- Python 3.14.0 /opt/hostedtoolcache/Python/3.14.0-alpha.3/x64/bin/python
worker 'gw14' crashed while running 'lib/portage/tests/util/futures/test_retry.py::RetryForkExecutorTestCase::testHangForever'
______________ lib/portage/tests/locks/test_asynchronous_lock.py _______________
[gw1] linux -- Python 3.14.0 /opt/hostedtoolcache/Python/3.14.0-alpha.3/x64/bin/python
worker 'gw1' crashed while running 'lib/portage/tests/locks/test_asynchronous_lock.py::AsynchronousLockTestCase::testAsynchronousLockWaitCancelHardlink'

(In reply to Zac Medico from comment #4)
> It seems that this issue is fixed in 3.14.0a3.

Actually it is still present.

*** Bug 955884 has been marked as a duplicate of this bug. ***

There's https://github.com/python/cpython/issues/132124 to avoid bombing out on too-long paths but it looks stalled.

With https://github.com/python/cpython/pull/134085, I get 8 failures instead of 25.
The remaining failures are:
```
FAILED lib/portage/tests/util/file_copy/test_copyfile.py::CopyFileSparseTestCase::testCopyFileSparse - Failed: sparse copy failed with _fastcopy
FAILED lib/portage/tests/locks/test_asynchronous_lock.py::AsynchronousLockTestCase::testAsynchronousLockWaitKill
FAILED lib/portage/tests/locks/test_asynchronous_lock.py::AsynchronousLockTestCase::testAsynchronousLockWaitKillHardlink
FAILED lib/portage/tests/process/test_spawn_returnproc.py::SpawnReturnProcTestCase::testSpawnReturnProcTerminate
FAILED lib/portage/tests/emerge/test_baseline.py::test_portage_baseline[dispatch-conf-xpak] - AssertionError: 'dispatch-conf' failed with args '()' assert 0 == -15 + where 0 = os.EX_OK + and -15 = <Process 13913>.returncode
FAILED lib/portage/tests/emerge/test_baseline.py::test_portage_baseline[dispatch-conf-gpkg] - AssertionError: 'dispatch-conf' failed with args '()' assert 0 == -15 + where 0 = os.EX_OK + and -15 = <Process 26155>.returncode
FAILED lib/portage/tests/locks/test_lock_nonblock.py::LockNonblockTestCase::testLockNonblockHardlink - AssertionError: 1 != 0
FAILED lib/portage/tests/emerge/test_config_protect.py::ConfigProtectTestCase::testConfigProtect - AssertionError: 0 != -15 : emerge failed with args ('/usr/bin/python3.14', '-b', '-Wd', '/var/tmp/portage.notmp/portage/sys-apps/portage-9999/work/portage-9999/bin/dispatch-conf')
```

The first one is one I've had for a while and is unrelated to py3.14.

*** Bug 956378 has been marked as a duplicate of this bug. ***

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=e5b7b8a3ac0a710272e174851673e5eb74250a69

commit e5b7b8a3ac0a710272e174851673e5eb74250a69
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2025-06-03 01:56:06 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2025-06-03 01:57:07 +0000

    bin: env-update, dispatch-conf: add missing __name__ main guards for Python 3.14

    The remaining test failures I see with Python 3.14 look like:
    ```
    RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

        To fix this issue, refer to the "Safe importing of main module"
        section in https://docs.python.org/3/library/multiprocessing.html
    ```

    We're indeed missing some of those guards, so let's add them in.

    Bug: https://bugs.gentoo.org/941956
    Signed-off-by: Sam James <sam@gentoo.org>

 bin/dispatch-conf | 29 +++++++++++++++--------------
 bin/env-update    | 44 +++++++++++++++++++++++---------------------
 2 files changed, 38 insertions(+), 35 deletions(-)

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=c387e1ec61dede826dca25b831905384b83fbd56

commit c387e1ec61dede826dca25b831905384b83fbd56
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2025-06-03 02:04:09 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2025-06-03 02:06:28 +0000

    bin: lock-helper.py: move more code into __name__ main guard for Python 3.14

    I was still seeing (flaky!) test failures after e5b7b8a3ac0a710272e174851673e5eb74250a69 with:
    ```
    FAILED lib/portage/tests/locks/test_asynchronous_lock.py::AsynchronousLockTestCase::testAsynchronousLockWaitKill
    FAILED lib/portage/tests/process/test_spawn_returnproc.py::SpawnReturnProcTestCase::testSpawnReturnProcTerminate
    FAILED lib/portage/tests/locks/test_asynchronous_lock.py::AsynchronousLockTestCase::testAsynchronousLockWaitKillHardlink
    FAILED lib/portage/tests/locks/test_lock_nonblock.py::LockNonblockTestCase::testLockNonblockHardlink - AssertionError: 1 != 0
    ```

    Move more code into the guard in bin/lock-helper.py. It seems to help
    a bit but not sure if that's completely cured it.

    In other bin/*, we mostly call disable_legacy_globals in the guard, so
    this seems right on that end too.

    Bug: https://bugs.gentoo.org/941956
    Signed-off-by: Sam James <sam@gentoo.org>

 bin/lock-helper.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

I'll close this one, as the root issue is fixed in Python (it now handles too-long paths correctly), and file another for remaining test failures.
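
For readers following the thread, here is a minimal Python sketch of the two pieces discussed in this bug: the `OSError: AF_UNIX path too long` failure from the summary, and the forkserver-to-spawn fallback from the workaround patch in the description. The helper names `unix_path_too_long` and `get_mp_context` are hypothetical and not part of Portage. That the forkserver runs into this error because it listens on a Unix socket under a long temporary directory is an assumption inferred from the summary, not something the thread states explicitly.

```python
import multiprocessing
import os
import socket


def unix_path_too_long(path):
    """Return True if binding an AF_UNIX socket to `path` is rejected as too long.

    Hypothetical helper for illustration only (assumes a platform with AF_UNIX).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError as exc:
        # CPython reports over-long socket paths as
        # OSError("AF_UNIX path too long"); re-raise anything else.
        if "path too long" in str(exc):
            return True
        raise
    else:
        # bind() succeeded and created a socket file; remove it again.
        os.unlink(path)
        return False
    finally:
        sock.close()


def get_mp_context():
    """Return a multiprocessing context, swapping forkserver for spawn.

    Same check as the workaround patch, but returning a context object
    instead of rebinding the module name. Note that get_start_method()
    fixes the default start method as a side effect if none was set yet.
    """
    if multiprocessing.get_start_method() == "forkserver":
        return multiprocessing.get_context("spawn")
    return multiprocessing.get_context()
```

A caller would then create processes via `ctx = get_mp_context()` and `ctx.Process(...)`, which is roughly what rebinding the `multiprocessing` name achieves in the patch. As the later commits in this bug show, code that runs under spawn also needs its entry point wrapped in an `if __name__ == "__main__":` guard.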