Created attachment 885694 [details] emerge --info after updating to sys-apps/portage-3.0.62 emerge does not return to the Shell after merging a(ny) package. Interestingly enough, it only happens on one out of five of my ~amd64 boxes… pressing ^C reveals the attached trace
Created attachment 885695 [details] output after pressing Ctrl-C
just noticed: also happens on an ~x86 and an ~arm box…
This is triggered when atexit hooks try to call into the event loop, because process.spawn uses the event loop since bug 916566.
This commit might help because it changed the atexit stop hook to use asyncio.run, but we'll have to verify that asyncio.run actually works in the atexit context: https://gitweb.gentoo.org/proj/portage.git/commit/?id=fbaaa4a733aaadc2744b656527756ac4e2b7ab58 commit fbaaa4a733aaadc2744b656527756ac4e2b7ab58 Author: Zac Medico <zmedico@gentoo.org> Date: Wed Feb 21 22:47:33 2024 -0800 socks5: Use real asyncio.run Use real asyncio.run to demonstrate that it is compatible with portage internals. Since the socks5 ProxyManager uses the process.spawn function, the internal _running_loop function needs to return the correct loop for use in the wait method of MultiprocessingProcess, or else it will lead to Future "attached to a different loop" errors. Bug: https://bugs.gentoo.org/761538 Signed-off-by: Zac Medico <zmedico@gentoo.org>
no - issue persists even with the proposed patch I've placed the patch below /etc/portage/patches/sys-apps/portage-3.0.62/… and re-merged =sys-apps/portage-3.0.62 (twice, of course :))
(any additional idea why this is not an issue on *every* system…?)
It would be easier to determine if we had emerge --info from a system of yours where it doesn't happen, but it's probably because of distcc use here.
This little test case shows that asyncio.run works in atexit hooks: > #!/usr/bin/env python > import asyncio > import atexit > > > async def async_func(): > print("hello world", flush=True) > > > def atexit_hook(): > asyncio.run(async_func()) > > > if __name__ == "__main__": > atexit.register(atexit_hook) (In reply to Christian Bricart from comment #5) > no - issue persists even with the proposed patch > > I've placed the patch below /etc/portage/patches/sys-apps/portage-3.0.62/… > and re-merged =sys-apps/portage-3.0.62 (twice, of course :)) Great! This other patch was intended to be a dependency so I think you'll need it to avoid possible "attached to a different loop" errors: https://gitweb.gentoo.org/proj/portage.git/commit/?id=d718cea94a180042b2285698b2c19113c5d25987 commit d718cea94a180042b2285698b2c19113c5d25987 Author: Zac Medico <zmedico@gentoo.org> Date: Wed Feb 21 22:41:49 2024 -0800 _get_running_loop: Support real asyncio.run When called via the real asyncio.run implementation, wrap the running asyncio loop. Otherwise, it's not possible to call portage libraries via the real asyncio.run without triggering Future "attached to a different loop" errors. Bug: https://bugs.gentoo.org/761538 Signed-off-by: Zac Medico <zmedico@gentoo.org> (In reply to Christian Bricart from comment #6) > (any additional idea why this is not an issue on *every* system…?) It's triggered when the get_socks5_proxy function is called, and that only happens for FEATURES=network-sandbox-proxy or FEATURES=distcc combined with FEATURES=network-sandbox.
Created attachment 885727 [details] emerge --info of a working ~amd64 box (In reply to Sam James from comment #7) > It would be easier to determine if we had emerge --info from a system of > yours where it doesn't happen sure - attached > but it's probably because of distcc use here. YES! good catch: # FEATURES="-distcc" emerge -1v =sys-apps/portage-3.0.62 worked and ended up in the Shell
(In reply to Zac Medico from comment #4) commit fbaaa4a733aaadc2744b656527756ac4e2b7ab58 (In reply to Zac Medico from comment #8) > commit d718cea94a180042b2285698b2c19113c5d25987 applied both as user-patches - issue persists
FYI: (In reply to Christian Bricart from comment #9) > # FEATURES="-distcc" emerge -1v =sys-apps/portage-3.0.62 > worked and ended up in the Shell FEATURES="-network-sandbox" emerge -1v =sys-apps/portage-3.0.62 also worked fine. So Zac's analysis is fully correct here
When I tested locally with the current master branch and FEATURES=network-sandbox-proxy and had to press ^C twice for it to exit like this: > >>> Jobs: 1 of 1 complete Load avg: 0.68, 0.23, 0.08 > ^CException ignored in atexit callback: <function _exit_function at 0x7f1f01a913a0> > Traceback (most recent call last): > File "/usr/lib/python3.12/multiprocessing/util.py", line 360, in _exit_function > Traceback (most recent call last): > File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run > return self._loop.run_until_complete(task) > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > File "/usr/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete > p.join() > File "/usr/lib/python3.12/multiprocessing/process.py", line 149, in join > res = self._popen.wait(timeout) > ^^^^^^^^^^^^^^^^^^^^^^^^^ > File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 43, in wait > return future.result() > ^^^^^^^^^^^^^^^ > return self.poll(os.WNOHANG if timeout == 0.0 else 0) > File "portage/bin/socks5-server.py", line 226, in run_socks5_server > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 27, in poll > pid, sts = os.waitpid(self.pid, flag) > await asyncio.get_running_loop().create_future() > ^^^^^^^^^^^^^^^^^^^^^^^^^^ > KeyboardInterrupt: > asyncio.exceptions.CancelledError > > During handling of the above exception, another exception occurred: > > ^C > ==================================== > Error in portage.process.run_exitfuncs > > File "portage/lib/portage/process.py", line 217, in run_exitfuncs > func(*targs, **kargs) > File "portage/lib/portage/util/socks5.py", line 77, in stop > asyncio.run(self._proc.wait()) > File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run > return runner.run(main) > ^^^^^^^^^^^^^^^^ > File "/usr/lib/python3.12/asyncio/runners.py", line 123, in run > raise KeyboardInterrupt() > ==================================== > > Exception ignored in atexit callback: <function run_exitfuncs at 0x7f1f01e85bc0> > Traceback (most recent call last): > File "portage/lib/portage/process.py", line 225, in run_exitfuncs > raise exc_info[0](exc_info[1]).with_traceback(exc_info[2]) > File "portage/lib/portage/process.py", line 217, in run_exitfuncs > func(*targs, **kargs) > File "portage/lib/portage/util/socks5.py", line 77, in stop > asyncio.run(self._proc.wait()) > File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run > return runner.run(main) > ^^^^^^^^^^^^^^^^ > File "/usr/lib/python3.12/asyncio/runners.py", line 123, in run > raise KeyboardInterrupt() > KeyboardInterrupt: > [ERROR] Task was destroyed but it is pending! > task: <Task pending name='Task-74' coro=<MultiprocessingProcess._proc_join() running at portage/lib/portage/process.py:423 > > wait_for=<Future pending cb=[sleep.<locals>.cancel_callback() at portage/lib/portage/util/futures/_asyncio/__init__.py:2 > 42, Task.task_wakeup()]> cb=[MultiprocessingProcess._proc_join_done()]>
It seems the multiprocess.Process join method never completes in atexit hooks, even though the corresponding process is in a defunct state.
If I patch it to call the atexit hooks in post_emerge like this then it hangs in asyncio.run: --- a/lib/_emerge/post_emerge.py +++ b/lib/_emerge/post_emerge.py @@ -109,4 +109,6 @@ def post_emerge(myaction, myopts, myfiles, target_root, trees, mtimedb, retval): _flush_elog_mod_echo() + portage.process.run_exitfuncs() + if not vardbapi._pkgs_changed: # GLEP 42 says to display news *after* an emerge --pretend I sent it a SIGUSR1 and found that the ProxyManager's _proc (instance of MultiprocessingProcess) has waiters attached to two different loops, and the first loop is no longer running: (Pdb) p self._proc._exit_waiters[0]._loop <_UnixSelectorEventLoop running=False closed=False debug=False> (Pdb) p self._proc._exit_waiters[1]._loop <_UnixSelectorEventLoop running=True closed=False debug=False>
It exits normally with this patch and emits "[ERROR] Task was destroyed but it is pending!" errors for the waiters that were attached to the old loop: diff --git a/lib/_emerge/post_emerge.py b/lib/_emerge/post_emerge.py index 37e2c3cc80..5ca9176397 100644 --- a/lib/_emerge/post_emerge.py +++ b/lib/_emerge/post_emerge.py @@ -109,4 +109,6 @@ def post_emerge(myaction, myopts, myfiles, target_root, trees, mtimedb, retval): _flush_elog_mod_echo() + portage.process.run_exitfuncs() + if not vardbapi._pkgs_changed: # GLEP 42 says to display news *after* an emerge --pretend diff --git a/lib/portage/process.py b/lib/portage/process.py index f4758c824c..6cf2e9025c 100644 --- a/lib/portage/process.py +++ b/lib/portage/process.py @@ -384,5 +384,7 @@ class MultiprocessingProcess(AbstractProcess): loop = global_event_loop() - if not self._exit_waiters: + if not self._exit_waiters or not any( + loop._loop is waiter._loop for waiter in self._exit_waiters + ): asyncio.ensure_future(self._proc_join(), loop=loop).add_done_callback( self._proc_join_done @@ -434,4 +436,6 @@ class MultiprocessingProcess(AbstractProcess): for waiter in self._exit_waiters: + if waiter._loop is not future._loop: + continue if not waiter.cancelled(): waiter.set_result(self.returncode)
(In reply to Zac Medico from comment #14) > I sent it a SIGUSR1 and found that the ProxyManager's _proc (instance of > MultiprocessingProcess) has waiters attached to two different loops, and the > first loop is no longer running: > > (Pdb) p self._proc._exit_waiters[0]._loop > <_UnixSelectorEventLoop running=False closed=False debug=False> > (Pdb) p self._proc._exit_waiters[1]._loop > <_UnixSelectorEventLoop running=True closed=False debug=False> In https://github.com/gentoo/portage/pull/1276 I've fixed it to use portage's global event loop when necessary to prevent this issue of having waiters attached to different loops, like this: --- a/lib/portage/util/socks5.py +++ b/lib/portage/util/socks5.py @@ -10,0 +11,7 @@ from typing import Union +import portage + +portage.proxy.lazyimport.lazyimport( + globals(), + "portage.util._eventloop.global_event_loop:global_event_loop", +) + @@ -77 +84,4 @@ class ProxyManager: - asyncio.run(self._proc.wait()) + # In this case spawn internals would have used + # portage's global loop when attaching a waiter to + # self._proc, so we are obligated to use that. + global_event_loop().run_until_complete(self._proc.wait())
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=3cc986f87ddda86ee93770e03cca06346aee54c5 commit 3cc986f87ddda86ee93770e03cca06346aee54c5 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2024-02-23 06:06:14 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2024-02-23 06:48:29 +0000 AsyncioEventLoop: Call process.run_exitfuncs() before close For the event loop running in the main thread, call process.run_exitfuncs() before close with the event loop running so that anything attached can clean itself up (like the socks5 ProxyManager for bug 925240). This is necessary because process.spawn uses the event loop to implement the new returnproc parameter related to bug 916566. Bug: https://bugs.gentoo.org/916566 Bug: https://bugs.gentoo.org/925240 Signed-off-by: Zac Medico <zmedico@gentoo.org> lib/portage/tests/util/test_socks5.py | 51 ++++++++++++++++++++++- lib/portage/util/_eventloop/asyncio_event_loop.py | 44 +++++++++++++++---- lib/portage/util/socks5.py | 16 ++++++- 3 files changed, 101 insertions(+), 10 deletions(-)
confirming, using =sys-apps/portage-9999 from HEAD 3cc986f87ddda86ee93770e03cca06346aee54c5 does not show the issue anymore and otherwise seems to work fine - thanks Zac!
The bug has been closed via the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6ad9d3103abc02f60d9e123ae21fa4a2e69b7e38 commit 6ad9d3103abc02f60d9e123ae21fa4a2e69b7e38 Author: Sam James <sam@gentoo.org> AuthorDate: 2024-02-25 08:32:40 +0000 Commit: Sam James <sam@gentoo.org> CommitDate: 2024-02-25 08:32:47 +0000 sys-apps/portage: add 3.0.63 Closes: https://bugs.gentoo.org/925214 Closes: https://bugs.gentoo.org/651018 Closes: https://bugs.gentoo.org/922935 Closes: https://bugs.gentoo.org/925240 Closes: https://bugs.gentoo.org/925311 Closes: https://bugs.gentoo.org/925333 Closes: https://bugs.gentoo.org/925350 Closes: https://bugs.gentoo.org/925456 Closes: https://bugs.gentoo.org/925460 Signed-off-by: Sam James <sam@gentoo.org> sys-apps/portage/Manifest | 1 + sys-apps/portage/portage-3.0.63.ebuild | 246 +++++++++++++++++++++++++++++++++ 2 files changed, 247 insertions(+)
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=11c65496bd951c3b7778a3c1ea240e347b1f4c38 commit 11c65496bd951c3b7778a3c1ea240e347b1f4c38 Author: Zac Medico <zmedico@gentoo.org> AuthorDate: 2024-03-03 21:06:13 +0000 Commit: Zac Medico <zmedico@gentoo.org> CommitDate: 2024-03-03 21:10:49 +0000 socks5: Use run_coroutine_exitfuncs() Since commit c3ebdbb42e72 the atexit_register(proxy.stop) call in the get_socks5_proxy function can accept a coroutine function to execute in run_coroutine_exitfuncs(), so convert the ProxyManager stop method to a coroutine and use run_coroutine_exitfuncs(). Bug: https://bugs.gentoo.org/925240 Signed-off-by: Zac Medico <zmedico@gentoo.org> lib/portage/tests/util/test_socks5.py | 5 ++- lib/portage/util/_eventloop/asyncio_event_loop.py | 12 ------- lib/portage/util/socks5.py | 39 +++-------------------- 3 files changed, 7 insertions(+), 49 deletions(-)