Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 925240 - sys-apps/portage-3.0.62 does not return to shell after merging packages with network-sandbox-proxy or distcc FEATURES combined with network-sandbox
Summary: sys-apps/portage-3.0.62 does not return to shell after merging packages with ...
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS, REGRESSION
Depends on: 925214
Blocks:
  Show dependency tree
 
Reported: 2024-02-22 09:38 UTC by Christian Bricart
Modified: 2024-03-03 22:50 UTC (History)
4 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (emerge-info.txt,10.85 KB, text/plain)
2024-02-22 09:38 UTC, Christian Bricart
Details
output after pressing Ctrl-C (portage-trace.txt,3.53 KB, text/plain)
2024-02-22 09:38 UTC, Christian Bricart
Details
emerge --info of a working ~amd64 box (emerge-info.txt,10.38 KB, text/plain)
2024-02-22 17:25 UTC, Christian Bricart
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Christian Bricart 2024-02-22 09:38:02 UTC
Created attachment 885694 [details]
emerge --info

after updating to sys-apps/portage-3.0.62 emerge does not return to the Shell after merging a(ny) package.

Interestingly enough, it only happens on one out of five of my ~amd64 boxes…

pressing ^C reveals the attached trace
Comment 1 Christian Bricart 2024-02-22 09:38:49 UTC
Created attachment 885695 [details]
output after pressing Ctrl-C
Comment 2 Christian Bricart 2024-02-22 10:55:15 UTC
just noticed: also happens on an ~x86 and an ~arm box…
Comment 3 Zac Medico gentoo-dev 2024-02-22 15:44:06 UTC
This is triggered when atexit hooks try to call into the event loop, because process.spawn uses the event loop since bug 916566.
Comment 4 Zac Medico gentoo-dev 2024-02-22 16:31:11 UTC
This commit might help because it changed the atexit stop hook to use asyncio.run, but we'll have to verify that asyncio.run actually works in the atexit context:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=fbaaa4a733aaadc2744b656527756ac4e2b7ab58

commit fbaaa4a733aaadc2744b656527756ac4e2b7ab58
Author: Zac Medico <zmedico@gentoo.org>
Date:   Wed Feb 21 22:47:33 2024 -0800

    socks5: Use real asyncio.run
    
    Use real asyncio.run to demonstrate that it is compatible with
    portage internals. Since the socks5 ProxyManager uses the
    process.spawn function, the internal _running_loop function
    needs to return the correct loop for use in the wait method of
    MultiprocessingProcess, or else it will lead to Future
    "attached to a different loop" errors.
    
    Bug: https://bugs.gentoo.org/761538
    Signed-off-by: Zac Medico <zmedico@gentoo.org>
Comment 5 Christian Bricart 2024-02-22 16:53:14 UTC
no - issue persists even with the proposed patch

I've placed the patch below /etc/portage/patches/sys-apps/portage-3.0.62/… and re-merged =sys-apps/portage-3.0.62 (twice, of course :))
Comment 6 Christian Bricart 2024-02-22 16:55:01 UTC
(any additional idea why this is not an issue on *every* system…?)
Comment 7 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-22 16:59:57 UTC
It would be easier to determine if we had emerge --info from a system of yours where it doesn't happen, but it's probably because of distcc use here.
Comment 8 Zac Medico gentoo-dev 2024-02-22 17:10:13 UTC
This little test case shows that asyncio.run works in atexit hooks:

> #!/usr/bin/env python
> import asyncio
> import atexit
> 
> 
> async def async_func():
>     print("hello world", flush=True)
> 
> 
> def atexit_hook():
>     asyncio.run(async_func())
> 
> 
> if __name__ == "__main__":
>     atexit.register(atexit_hook)

(In reply to Christian Bricart from comment #5)
> no - issue persists even with the proposed patch
> 
> I've placed the patch below /etc/portage/patches/sys-apps/portage-3.0.62/…
> and re-merged =sys-apps/portage-3.0.62 (twice, of course :))

Great! This other patch was intended to be a dependency so I think you'll need it to avoid possible "attached to a different loop" errors:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=d718cea94a180042b2285698b2c19113c5d25987

commit d718cea94a180042b2285698b2c19113c5d25987
Author: Zac Medico <zmedico@gentoo.org>
Date:   Wed Feb 21 22:41:49 2024 -0800

    _get_running_loop: Support real asyncio.run
    
    When called via the real asyncio.run implementation, wrap
    the running asyncio loop. Otherwise, it's not possible to
    call portage libraries via the real asyncio.run without
    triggering Future "attached to a different loop" errors.
    
    Bug: https://bugs.gentoo.org/761538
    Signed-off-by: Zac Medico <zmedico@gentoo.org>


(In reply to Christian Bricart from comment #6)
> (any additional idea why this is not an issue on *every* system…?)

It's triggered when the get_socks5_proxy function is called, and that only happens for FEATURES=network-sandbox-proxy or FEATURES=distcc combined with FEATURES=network-sandbox.
Comment 9 Christian Bricart 2024-02-22 17:25:07 UTC
Created attachment 885727 [details]
emerge --info of a working ~amd64 box

(In reply to Sam James from comment #7)
> It would be easier to determine if we had emerge --info from a system of
> yours where it doesn't happen

sure - attached

> but it's probably because of distcc use here.

YES! good catch:
# FEATURES="-distcc" emerge -1v =sys-apps/portage-3.0.62
worked and ended up in the Shell
Comment 10 Christian Bricart 2024-02-22 17:30:55 UTC
(In reply to Zac Medico from comment #4)
commit fbaaa4a733aaadc2744b656527756ac4e2b7ab58
(In reply to Zac Medico from comment #8)
> commit d718cea94a180042b2285698b2c19113c5d25987

applied both as user-patches - issue persists
Comment 11 Christian Bricart 2024-02-22 17:36:57 UTC
FYI:

(In reply to Christian Bricart from comment #9)
> # FEATURES="-distcc" emerge -1v =sys-apps/portage-3.0.62
> worked and ended up in the Shell

FEATURES="-network-sandbox" emerge -1v =sys-apps/portage-3.0.62

also worked fine. So Zac's analysis is fully correct here
Comment 12 Zac Medico gentoo-dev 2024-02-22 17:47:38 UTC
When I tested locally with the current master branch and FEATURES=network-sandbox-proxy and had to press ^C twice for it to exit like this:

> >>> Jobs: 1 of 1 complete                           Load avg: 0.68, 0.23, 0.08
> ^CException ignored in atexit callback: <function _exit_function at 0x7f1f01a913a0>
> Traceback (most recent call last):
>   File "/usr/lib/python3.12/multiprocessing/util.py", line 360, in _exit_function
> Traceback (most recent call last):                                                                       
>   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
>     return self._loop.run_until_complete(task)
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete                                                                                                                               
>     p.join()                                                                                                                                                                                                       
>   File "/usr/lib/python3.12/multiprocessing/process.py", line 149, in join
>     res = self._popen.wait(timeout)                                                                                                                                                                                
>           ^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 43, in wait
>     return future.result()
>            ^^^^^^^^^^^^^^^
>     return self.poll(os.WNOHANG if timeout == 0.0 else 0)
>   File "portage/bin/socks5-server.py", line 226, in run_socks5_server
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3.12/multiprocessing/popen_fork.py", line 27, in poll
>     pid, sts = os.waitpid(self.pid, flag)
>     await asyncio.get_running_loop().create_future() 
>                ^^^^^^^^^^^^^^^^^^^^^^^^^^
> KeyboardInterrupt: 
> asyncio.exceptions.CancelledError
> 
> During handling of the above exception, another exception occurred:
> 
> ^C                                                                                       
> ====================================
> Error in portage.process.run_exitfuncs
> 
>   File "portage/lib/portage/process.py", line 217, in run_exitfuncs
>     func(*targs, **kargs)
>   File "portage/lib/portage/util/socks5.py", line 77, in stop
>     asyncio.run(self._proc.wait())
>   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
>     return runner.run(main)
>            ^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3.12/asyncio/runners.py", line 123, in run
>     raise KeyboardInterrupt()
> ====================================
> 
> Exception ignored in atexit callback: <function run_exitfuncs at 0x7f1f01e85bc0>
> Traceback (most recent call last):
>   File "portage/lib/portage/process.py", line 225, in run_exitfuncs
>     raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])
>   File "portage/lib/portage/process.py", line 217, in run_exitfuncs
>     func(*targs, **kargs)
>   File "portage/lib/portage/util/socks5.py", line 77, in stop
>     asyncio.run(self._proc.wait())
>   File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
>     return runner.run(main)
>            ^^^^^^^^^^^^^^^^
>   File "/usr/lib/python3.12/asyncio/runners.py", line 123, in run
>     raise KeyboardInterrupt()
> KeyboardInterrupt: 
> [ERROR] Task was destroyed but it is pending!
> task: <Task pending name='Task-74' coro=<MultiprocessingProcess._proc_join() running at portage/lib/portage/process.py:423
> > wait_for=<Future pending cb=[sleep.<locals>.cancel_callback() at portage/lib/portage/util/futures/_asyncio/__init__.py:2
> 42, Task.task_wakeup()]> cb=[MultiprocessingProcess._proc_join_done()]>
Comment 13 Zac Medico gentoo-dev 2024-02-22 19:09:00 UTC
It seems the multiprocess.Process join method never completes in atexit hooks, even though the corresponding process is in a defunct state.
Comment 14 Zac Medico gentoo-dev 2024-02-22 21:29:52 UTC
If I patch it to call the atexit hooks in post_emerge like this then it hangs in asyncio.run:

--- a/lib/_emerge/post_emerge.py
+++ b/lib/_emerge/post_emerge.py
@@ -109,4 +109,6 @@ def post_emerge(myaction, myopts, myfiles, target_root, trees, mtimedb, retval):
     _flush_elog_mod_echo()
 
+    portage.process.run_exitfuncs()
+
     if not vardbapi._pkgs_changed:
         # GLEP 42 says to display news *after* an emerge --pretend


I sent it a SIGUSR1 and found that the ProxyManager's _proc (instance of MultiprocessingProcess) has waiters attached to two different loops, and the first loop is no longer running:

(Pdb) p self._proc._exit_waiters[0]._loop
<_UnixSelectorEventLoop running=False closed=False debug=False>
(Pdb) p self._proc._exit_waiters[1]._loop
<_UnixSelectorEventLoop running=True closed=False debug=False>
Comment 15 Zac Medico gentoo-dev 2024-02-22 22:02:34 UTC
It exits normally with this patch and emits "[ERROR] Task was destroyed but it is pending!" errors for the waiters that were attached to the old loop:

diff --git a/lib/_emerge/post_emerge.py b/lib/_emerge/post_emerge.py
index 37e2c3cc80..5ca9176397 100644
--- a/lib/_emerge/post_emerge.py
+++ b/lib/_emerge/post_emerge.py
@@ -109,4 +109,6 @@ def post_emerge(myaction, myopts, myfiles, target_root, trees, mtimedb, retval):
     _flush_elog_mod_echo()
 
+    portage.process.run_exitfuncs()
+
     if not vardbapi._pkgs_changed:
         # GLEP 42 says to display news *after* an emerge --pretend
diff --git a/lib/portage/process.py b/lib/portage/process.py
index f4758c824c..6cf2e9025c 100644
--- a/lib/portage/process.py
+++ b/lib/portage/process.py
@@ -384,5 +384,7 @@ class MultiprocessingProcess(AbstractProcess):
 
         loop = global_event_loop()
-        if not self._exit_waiters:
+        if not self._exit_waiters or not any(
+            loop._loop is waiter._loop for waiter in self._exit_waiters
+        ):
             asyncio.ensure_future(self._proc_join(), loop=loop).add_done_callback(
                 self._proc_join_done
@@ -434,4 +436,6 @@ class MultiprocessingProcess(AbstractProcess):
 
         for waiter in self._exit_waiters:
+            if waiter._loop is not future._loop:
+                continue
             if not waiter.cancelled():
                 waiter.set_result(self.returncode)
Comment 16 Zac Medico gentoo-dev 2024-02-23 06:14:12 UTC
(In reply to Zac Medico from comment #14)
> I sent it a SIGUSR1 and found that the ProxyManager's _proc (instance of
> MultiprocessingProcess) has waiters attached to two different loops, and the
> first loop is no longer running:
> 
> (Pdb) p self._proc._exit_waiters[0]._loop
> <_UnixSelectorEventLoop running=False closed=False debug=False>
> (Pdb) p self._proc._exit_waiters[1]._loop
> <_UnixSelectorEventLoop running=True closed=False debug=False>

In https://github.com/gentoo/portage/pull/1276 I've fixed it to use portage's global event loop when necessary to prevent this issue of having waiters attached to different loops, like this:

--- a/lib/portage/util/socks5.py
+++ b/lib/portage/util/socks5.py
@@ -10,0 +11,7 @@ from typing import Union
+import portage
+
+portage.proxy.lazyimport.lazyimport(
+    globals(),
+    "portage.util._eventloop.global_event_loop:global_event_loop",
+)
+
@@ -77 +84,4 @@ class ProxyManager:
-                asyncio.run(self._proc.wait())
+                # In this case spawn internals would have used
+                # portage's global loop when attaching a waiter to
+                # self._proc, so we are obligated to use that.
+                global_event_loop().run_until_complete(self._proc.wait())
Comment 17 Larry the Git Cow gentoo-dev 2024-02-23 08:02:51 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=3cc986f87ddda86ee93770e03cca06346aee54c5

commit 3cc986f87ddda86ee93770e03cca06346aee54c5
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-02-23 06:06:14 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-02-23 06:48:29 +0000

    AsyncioEventLoop: Call process.run_exitfuncs() before close
    
    For the event loop running in the main thread, call
    process.run_exitfuncs() before close with the event loop
    running so that anything attached can clean itself up (like
    the socks5 ProxyManager for bug 925240). This is necessary
    because process.spawn uses the event loop to implement the
    new returnproc parameter related to bug 916566.
    
    Bug: https://bugs.gentoo.org/916566
    Bug: https://bugs.gentoo.org/925240
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/tests/util/test_socks5.py             | 51 ++++++++++++++++++++++-
 lib/portage/util/_eventloop/asyncio_event_loop.py | 44 +++++++++++++++----
 lib/portage/util/socks5.py                        | 16 ++++++-
 3 files changed, 101 insertions(+), 10 deletions(-)
Comment 18 Christian Bricart 2024-02-23 09:19:38 UTC
confirming, using =sys-apps/portage-9999 from HEAD 3cc986f87ddda86ee93770e03cca06346aee54c5 does not show the issue anymore and otherwise seems to work fine - thanks Zac!
Comment 19 Larry the Git Cow gentoo-dev 2024-02-25 08:32:59 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6ad9d3103abc02f60d9e123ae21fa4a2e69b7e38

commit 6ad9d3103abc02f60d9e123ae21fa4a2e69b7e38
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-02-25 08:32:40 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-02-25 08:32:47 +0000

    sys-apps/portage: add 3.0.63
    
    Closes: https://bugs.gentoo.org/925214
    Closes: https://bugs.gentoo.org/651018
    Closes: https://bugs.gentoo.org/922935
    Closes: https://bugs.gentoo.org/925240
    Closes: https://bugs.gentoo.org/925311
    Closes: https://bugs.gentoo.org/925333
    Closes: https://bugs.gentoo.org/925350
    Closes: https://bugs.gentoo.org/925456
    Closes: https://bugs.gentoo.org/925460
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-3.0.63.ebuild | 246 +++++++++++++++++++++++++++++++++
 2 files changed, 247 insertions(+)
Comment 20 Larry the Git Cow gentoo-dev 2024-03-03 22:50:08 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=11c65496bd951c3b7778a3c1ea240e347b1f4c38

commit 11c65496bd951c3b7778a3c1ea240e347b1f4c38
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-03 21:06:13 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-03-03 21:10:49 +0000

    socks5: Use run_coroutine_exitfuncs()
    
    Since commit c3ebdbb42e72 the atexit_register(proxy.stop) call in
    the get_socks5_proxy function can accept a coroutine function to
    execute in run_coroutine_exitfuncs(), so convert the ProxyManager
    stop method to a coroutine and use run_coroutine_exitfuncs().
    
    Bug: https://bugs.gentoo.org/925240
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/tests/util/test_socks5.py             |  5 ++-
 lib/portage/util/_eventloop/asyncio_event_loop.py | 12 -------
 lib/portage/util/socks5.py                        | 39 +++--------------------
 3 files changed, 7 insertions(+), 49 deletions(-)