$ emerge -eav --jobs=5 --exclude=portage @system @world
[...]
>>> Installing (144 of 2065) acct-user/ntp-0::gentoo
>>> Installing (143 of 2065) acct-user/tor-0::gentoo
>>> Installing (146 of 2065) sys-fs/ddrescue-1.24::gentoo
>>> Jobs: 111 of 2065 complete                      Load avg: 0.16, 0.59, 1.50

No progress at this point for several minutes. Sending SIGUSR1, I get:

--Return--
> /usr/lib/python-exec/python3.6/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) bt
  /usr/lib/python-exec/python3.6/emerge(53)<module>()
-> retval = emerge_main()
  /usr/lib64/python3.6/site-packages/_emerge/main.py(1309)emerge_main()
-> return run_action(emerge_config)
  /usr/lib64/python3.6/site-packages/_emerge/actions.py(3374)run_action()
-> retval = action_build(emerge_config, spinner=spinner)
  /usr/lib64/python3.6/site-packages/_emerge/actions.py(564)action_build()
-> retval = mergetask.merge()
  /usr/lib64/python3.6/site-packages/_emerge/Scheduler.py(1020)merge()
-> rval = self._merge()
  /usr/lib64/python3.6/site-packages/_emerge/Scheduler.py(1414)_merge()
-> self._main_loop()
  /usr/lib64/python3.6/site-packages/_emerge/Scheduler.py(1390)_main_loop()
-> self._event_loop.run_until_complete(self._main_exit)
  /usr/lib64/python3.6/site-packages/portage/util/_eventloop/asyncio_event_loop.py(127)_run_until_complete()
-> return self._loop.run_until_complete(future)
  /usr/lib64/python3.6/asyncio/base_events.py(475)run_until_complete()
-> self.run_forever()
  /usr/lib64/python3.6/asyncio/base_events.py(442)run_forever()
-> self._run_once()
  /usr/lib64/python3.6/asyncio/base_events.py(1426)_run_once()
-> event_list = self._selector.select(timeout)
  /usr/lib64/python3.6/selectors.py(445)select()
-> fd_event_list = self._epoll.poll(timeout, max_ev)
> /usr/lib/python-exec/python3.6/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb)
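For readers who want to inspect a stuck Python process the same way, here is a minimal sketch of a SIGUSR1 debug hook. The trace above shows that emerge's own handler calls pdb.set_trace(); this sketch instead records the interrupted stack as text so that it can run non-interactively. The names record_stack, install_debug_signal and captured_stacks are hypothetical and not part of Portage, and SIGUSR1 is assumed to be available (POSIX only).

```python
import os
import signal
import traceback

# Hypothetical storage for captured stacks; not part of Portage.
captured_stacks = []


def record_stack(signum, frame):
    """Signal handler: save the stack of the interrupted code as text."""
    captured_stacks.append("".join(traceback.format_stack(frame)))


def install_debug_signal(handler=record_stack):
    """Register `handler` for SIGUSR1 and return the previous handler."""
    return signal.signal(signal.SIGUSR1, handler)
```

After install_debug_signal() has run, sending the signal from another terminal with `kill -USR1 <pid>` appends one stack dump to captured_stacks. Passing a handler that calls pdb.set_trace() instead should give an interactive prompt like the one in this report.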
Created attachment 616772
emerge --info sys-apps/portage
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=01d088c2b38fa2ab599ca09ee08a988148bf6827

commit 01d088c2b38fa2ab599ca09ee08a988148bf6827
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-02 16:28:10 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-02 16:31:09 +0000

    sys-apps/portage: Drop 2.3.91 keywords (bug 711322)

    With portage-2.3.91 the emerge process can hang indefinitely.

    Bug: https://bugs.gentoo.org/711322
    Package-Manager: Portage-2.3.91, Repoman-2.3.20
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/portage-2.3.91.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
I already saw the problem yesterday on another machine, and I've just verified that it was running portage-2.3.90. So the regression happened between 2.3.89 and 2.3.90.
When installing a binpkg of portage-2.3.89 while running 2.3.90, build.log hangs at:

>>> Extracting sys-apps/portage-2.3.89
When you observe the hung state, please use a command like `ps axf` to check the process tree. If the emerge process has any children (possibly defunct) then that could provide a useful clue about the state. I'm currently running emerge -e @world in a stage3, and it has not hung yet with 184 of 216 complete.
(In reply to Zac Medico from comment #5)
> When you observe the hung state, please use a command like `ps axf` to check
> the process tree. If the emerge process has any children (possibly defunct)
> then that could provide a useful clue about the state.

I had checked with "pstree <pid of emerge process>" and it showed only a single process without any children.
(In reply to Ulrich Müller from comment #6)
> (In reply to Zac Medico from comment #5)
> > When you observe the hung state, please use a command like `ps axf` to check
> > the process tree. If the emerge process has any children (possibly defunct)
> > then that could provide a useful clue about the state.
>
> I had checked with "pstree <pid of emerge process>" and it showed only a
> single process without any children.

We can also collect some useful state information by using lsof to see if any .ipc_in pipes are open in ${PORTAGE_TMPDIR}. Also, we can look at the state of ${PORTAGE_TMPDIR}/portage/*/*/.* files to see which phase(s) any packages may have hung at.
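The second check, listing the ${PORTAGE_TMPDIR}/portage/*/*/.* files, could be scripted roughly as follows. This is only a sketch: list_phase_markers is a hypothetical helper, not a Portage function, and it assumes the build directories follow the category/package-version layout implied by the glob above. It does not interpret the file names; it just groups them per package so the last completed phase can be read off by eye.

```python
import glob
import os


def list_phase_markers(portage_tmpdir):
    """Group dotfiles found under <tmpdir>/portage/<category>/<pkg>/ by package.

    Returns a dict mapping "category/pkg" to a sorted list of dotfile names.
    """
    pattern = os.path.join(portage_tmpdir, "portage", "*", "*", ".*")
    found = {}
    for path in sorted(glob.glob(pattern)):
        build_dir, name = os.path.split(path)
        # The last two path components are the category and the package dir.
        category, pkg = build_dir.split(os.sep)[-2:]
        found.setdefault(f"{category}/{pkg}", []).append(name)
    return found
```

Calling list_phase_markers with the value of PORTAGE_TMPDIR (commonly /var/tmp, but check `emerge --info`) returns one entry per build directory that is still present.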
If they're hung after src_install, it leads me to wonder if the hang is related to special scheduling for @system packages and their deps, which can be disabled by emerge --implicit-system-deps=n.
I'm in the habit of using --load-average, and as soon as I stopped using it I was able to reproduce the issue. This means that the --load-average option can serve as a workaround. I'll go ahead and apply a simple workaround so that it won't hang in any case.
The problem started with this commit:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=c7e52d0466211907d20cdbc04f1e90e7da626694

That commit changed the order in which SequentialTaskQueue exit callbacks are scheduled, which broke assumptions made by Scheduler._schedule calls. The emerge --load-average option suppresses the problem by triggering an extra Scheduler._schedule call every 30 seconds.
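To illustrate why a periodic extra scheduling call masks a missed wakeup, here is a minimal asyncio sketch of a repeating timer. It is not Portage's implementation; start_periodic_wakeup and its parameters are hypothetical names, and the interval is left to the caller (the comment above mentions 30 seconds for --load-average).

```python
import asyncio


def start_periodic_wakeup(loop, schedule, interval):
    """Call schedule() every `interval` seconds on `loop`.

    Returns a cancel() function that stops further calls.
    """
    state = {"handle": None}

    def tick():
        schedule()
        # Re-arm the timer so that a missed event is retried later.
        state["handle"] = loop.call_later(interval, tick)

    state["handle"] = loop.call_later(interval, tick)

    def cancel():
        state["handle"].cancel()

    return cancel
```

With such a timer in place, a scheduler that missed the event telling it to start the next job is stalled for at most one interval, which matches the observation that the hang only showed up once --load-average was dropped.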
Patch: https://github.com/gentoo/portage/pull/520
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2fe510215d12499f0f4f955adfd0f22e761a015e

commit 2fe510215d12499f0f4f955adfd0f22e761a015e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-04 10:16:59 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-04 10:25:03 +0000

    sys-apps/portage: Bump to version 2.3.92

    #601252 emerge --pretend --fetchonly event loop recursion
    #709334 socks5-server.py async and await coroutine syntax
    #709746 Rename PORTAGE_LOG_FILTER_FILE_CMD from PORTAGE_LOG_FILTER_FILE
    #711322 emerge hang after src_install
    #711362 egencache AttributeError: 'NoneType' object has no attribute 'ebuild'
    #711400 AttributeError: 'NoneType' object has no attribute 'depth'

    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/709334
    Bug: https://bugs.gentoo.org/709746
    Closes: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/711362
    Bug: https://bugs.gentoo.org/711400
    Package-Manager: Portage-2.3.92, Repoman-2.3.20
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest                                         | 2 +-
 sys-apps/portage/{portage-2.3.91.ebuild => portage-2.3.92.ebuild} | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=459b3535baa416888b546cd1635ae28324259a70

commit 459b3535baa416888b546cd1635ae28324259a70
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-04 08:17:28 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-04 09:53:27 +0000

    SequentialTaskQueue: update bool(self) sooner (bug 711322)

    Use addExitListener to add a _task_exit callback that will be
    invoked as soon as the task exits (before the future's done
    callback is called). This is required in order for bool(self) to
    have an updated value for Scheduler._schedule to base assumptions
    upon. Delayed updates to bool(self) are what caused Scheduler to
    hang as in bug 711322.

    This reverts changes in SequentialTaskQueue task queue exit
    listener behavior from commit c7e52d046621, so that only the
    changes necessary to support async_start remain.

    Fixes: c7e52d046621 ("EbuildPhase: add _async_start coroutine")
    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/SequentialTaskQueue.py | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
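The stale bool(self) described in this commit message can be reproduced with a toy model. The sketch below is not Portage code: ToyTask, ToyQueue and observe are hypothetical names, and the model only assumes what the commit message states, namely that exit listeners run as soon as the task exits while a future's done callbacks run later. A second exit listener stands in for Scheduler._schedule and records what bool(queue) looks like at that moment.

```python
import asyncio


class ToyTask:
    """Stand-in for a task with both a future and immediate exit listeners."""

    def __init__(self, loop):
        self.future = loop.create_future()
        self.exit_listeners = []

    def exit(self):
        # Done callbacks on the future are only *scheduled* here; asyncio
        # runs them on a later pass of the event loop.
        self.future.set_result(None)
        # Exit listeners run immediately, in registration order.
        for listener in self.exit_listeners:
            listener(self)


class ToyQueue:
    """Stand-in for a task queue whose truth value means 'tasks running'."""

    def __init__(self, update_in_done_callback):
        self.running = set()
        self.update_in_done_callback = update_in_done_callback

    def __bool__(self):
        return bool(self.running)

    def add(self, task):
        self.running.add(task)
        if self.update_in_done_callback:
            # Late update: bool(self) stays True until the loop gets here.
            task.future.add_done_callback(lambda _f: self.running.discard(task))
        else:
            # Early update: bool(self) changes as soon as the task exits.
            task.exit_listeners.append(self.running.discard)


def observe(update_in_done_callback):
    """Return bool(queue) as seen by a scheduler-like exit listener."""
    loop = asyncio.new_event_loop()
    try:
        queue = ToyQueue(update_in_done_callback)
        task = ToyTask(loop)
        queue.add(task)
        seen = []
        # Registered after queue.add(), so the queue is notified first.
        task.exit_listeners.append(lambda _t: seen.append(bool(queue)))
        loop.call_soon(task.exit)
        loop.run_until_complete(task.future)
        return seen[0]
    finally:
        loop.close()
```

In this model observe(True) reports the queue as still busy even though its only task has exited, while observe(False) reports it as empty. A scheduler that decides whether to start the next job from the first answer never starts it, which is the hang in miniature.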
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=d389b3b378d88b8c41dfaba2a90bc9643a9ba99e

commit d389b3b378d88b8c41dfaba2a90bc9643a9ba99e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-05 06:46:26 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-05 08:06:19 +0000

    Scheduler: use add_done_callback (bug 711322)

    Use add_done_callback instead of addExitListener, in order to avoid
    callback races like the SequentialTaskQueue exit listener race that
    triggered bug 711322. The addExitListener method is prone to races
    because its listeners are executed in quick succession. In contrast,
    callbacks scheduled via add_done_callback are placed in a fifo
    queue, ensuring that they execute in an order that is unsurprising
    relative to other callbacks.

    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=62b0dc613d7d8eb099231bc4cb7303a0abdaf480

commit 62b0dc613d7d8eb099231bc4cb7303a0abdaf480
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-05 16:25:45 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-05 16:31:04 +0000

    AsynchronousTask: schedule start listeners via call_soon (bug 711322)

    Schedule start listeners via call_soon, in order to avoid callback
    races like the SequentialTaskQueue exit listener race that triggered
    bug 711322. Callbacks scheduled via call_soon are placed in a fifo
    queue, ensuring that they execute in an order that is unsurprising
    relative to other callbacks.

    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/AsynchronousTask.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=46903f3e5622bc479d4687c76c0e9fada8eb53db

commit 46903f3e5622bc479d4687c76c0e9fada8eb53db
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-05 16:45:25 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-05 23:30:03 +0000

    AsynchronousTask: schedule exit listeners via call_soon (bug 711322)

    Schedule exit listeners via call_soon, in order to avoid callback
    races like the SequentialTaskQueue exit listener race that triggered
    bug 711322. Callbacks scheduled via call_soon are placed in a fifo
    queue, ensuring that they execute in an order that is unsurprising
    relative to other callbacks.

    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/AsynchronousTask.py | 53 ++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 32 deletions(-)
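The fifo behaviour that these commit messages rely on can be seen with plain asyncio. The sketch below is illustrative only and uses hypothetical names (callback_order, first, nested): it compares a callback that invokes a follow-up directly with one that hands the follow-up to call_soon.

```python
import asyncio


def callback_order(direct):
    """Return the order in which three callbacks ran.

    direct=True:  first() invokes nested() itself (listener-style call).
    direct=False: first() schedules nested() via loop.call_soon().
    """
    loop = asyncio.new_event_loop()
    order = []

    def nested():
        order.append("nested")
        loop.stop()

    def first():
        order.append("first")
        if direct:
            # Runs immediately, ahead of callbacks that were queued earlier.
            nested()
        else:
            # Joins the back of the loop's ready queue.
            loop.call_soon(nested)

    loop.call_soon(first)
    loop.call_soon(order.append, "second")
    try:
        loop.run_forever()
    finally:
        loop.close()
    return order
```

With direct=False the result is first, second, nested: the follow-up waits its turn behind "second", which was queued before it. With direct=True the follow-up jumps the queue and the result is first, nested, second.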
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=ca053dec87ea593596f83e8d20c63b40678bf03a

commit ca053dec87ea593596f83e8d20c63b40678bf03a
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-06 03:15:40 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-06 03:29:07 +0000

    Scheduler: replace add_done_callback with addExitListener

    For simplicity, use addExitListener instead of add_done_callback,
    since addExitListener has been fixed to use call_soon in commit
    46903f3e5622. Note that each addExitListener call occurs *after* a
    call to the SequentialTaskQueue add method, since the
    SequentialTaskQueue needs to be notified of task exit *first* (see
    commit 459b3535baa4).

    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=8cc84cea654238676f7edc04b9c75c001535c0b4

commit 8cc84cea654238676f7edc04b9c75c001535c0b4
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-07 21:52:53 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-07 22:01:22 +0000

    SequentialTaskQueue: cancel unstarted tasks when appropriate (bug 711322)

    When the clear method is called, cancel any tasks which have not
    started yet, in order to ensure that their start/exit listeners are
    called. This fixes a case where emerge would hang after SIGINT.

    Also fix the CompositeTask _cancel method to react appropriately to
    the cancel event when the task has not started yet.

    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/CompositeTask.py       | 4 ++++
 lib/_emerge/SequentialTaskQueue.py | 3 +++
 2 files changed, 7 insertions(+)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3330b6df6278675073b2f25cb1a743005ad7cc98

commit 3330b6df6278675073b2f25cb1a743005ad7cc98
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-07 22:55:37 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-07 22:59:57 +0000

    sys-apps/portage: Bump to version 2.3.93

    #711322 schedule exit listeners via call_soon
    #711688 BinpkgFetcher sync_timestamp KeyError regression

    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/711322
    Closes: https://bugs.gentoo.org/711688
    Package-Manager: Portage-2.3.93, Repoman-2.3.20
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-2.3.93.ebuild | 271 +++++++++++++++++++++++++++++++++
 2 files changed, 272 insertions(+)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=389429d798a186bdbeb11354d7f1299f628851fd

commit 389429d798a186bdbeb11354d7f1299f628851fd
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 04:45:16 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 06:01:27 +0000

    Scheduler: wakeup for empty merge queue (bug 711322)

    Add a wakeup callback to schedule a new merge when the merge queue
    becomes empty. This prevents the scheduler from hanging in cases
    where the order of _merge_exit callback invocation may cause the
    merge queue to appear non-empty when it is about to become empty.

    Bug: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/716636
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py           | 23 +++++++++++++++++++++++
 lib/_emerge/SequentialTaskQueue.py | 22 ++++++++++++----------
 2 files changed, 35 insertions(+), 10 deletions(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=273eac55223836285c42697268c22c925672835e

commit 273eac55223836285c42697268c22c925672835e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 07:00:57 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 07:05:14 +0000

    sys-apps/portage: Bump to version 2.3.98

    #711322 always wakeup for empty merge queue

    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/711322
    Closes: https://bugs.gentoo.org/716636
    Package-Manager: Portage-2.3.98, Repoman-2.3.22
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-2.3.98.ebuild | 263 +++++++++++++++++++++++++++++++++
 2 files changed, 264 insertions(+)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=ad325eb10bc6e8ec2a8248f8e9173911f957c0da

commit ad325eb10bc6e8ec2a8248f8e9173911f957c0da
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 20:27:36 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 20:47:08 +0000

    _schedule_merge_wakeup: handle main loop exit

    Detect main loop exit and do not attempt to schedule in this case.

    Fixes: 389429d798a1 ("Scheduler: wakeup for empty merge queue (bug 711322)")
    Reported-by: Rick Farina <zerochaos@gentoo.org>
    Bug: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/716636
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
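The two commits above (the wakeup for an empty merge queue, and the guard for main loop exit) can be pictured with a toy scheduler. This is a sketch under stated assumptions, not the Portage implementation: ToyMergeQueue, ToyScheduler, wait_empty and run_toy are hypothetical names. The toy deliberately calls schedule() before the queue has been told about the task exit, to mimic the ordering in which the queue appears non-empty when it is about to become empty.

```python
import asyncio


class ToyMergeQueue:
    def __init__(self, loop):
        self._loop = loop
        self._running = set()
        self._empty_waiters = []

    def __bool__(self):
        return bool(self._running)

    def add(self, name):
        self._running.add(name)

    def task_exit(self, name):
        self._running.discard(name)
        if not self._running:
            waiters, self._empty_waiters = self._empty_waiters, []
            for waiter in waiters:
                waiter.set_result(None)

    def wait_empty(self):
        """Return a future that completes once the queue is empty."""
        future = self._loop.create_future()
        if self._running:
            self._empty_waiters.append(future)
        else:
            future.set_result(None)
        return future


class ToyScheduler:
    def __init__(self, loop, pending):
        self.loop = loop
        self.queue = ToyMergeQueue(loop)
        self.pending = list(pending)
        self.merged = []
        self.main_exit = loop.create_future()

    def schedule(self):
        if self.queue:
            # Looks busy: ask to be woken when it drains instead of giving up.
            self.queue.wait_empty().add_done_callback(self._wakeup)
            return
        if not self.pending:
            if not self.main_exit.done():
                self.main_exit.set_result(None)
            return
        name = self.pending.pop(0)
        self.queue.add(name)
        self.loop.call_soon(self._finish, name)

    def _finish(self, name):
        self.merged.append(name)
        # Problematic ordering on purpose: the scheduler runs before the
        # queue has been told that the task exited.
        self.schedule()
        self.queue.task_exit(name)

    def _wakeup(self, _future):
        # Guard for main loop exit: nothing left to schedule in that case.
        if self.main_exit is None or self.main_exit.done():
            return
        self.schedule()


def run_toy(pending):
    loop = asyncio.new_event_loop()
    try:
        scheduler = ToyScheduler(loop, pending)
        loop.call_soon(scheduler.schedule)
        loop.run_until_complete(scheduler.main_exit)
        return scheduler.merged
    finally:
        loop.close()
```

Without the wait_empty() wakeup in schedule(), the call made from _finish() would return with nothing started and nothing would ever call schedule() again, so run_toy would block after the first item. With it, every pending item is merged in order.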
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7bbc98c13ec8eecf7ea291752cdbb72b60240fcf

commit 7bbc98c13ec8eecf7ea291752cdbb72b60240fcf
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 20:57:43 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 21:05:13 +0000

    sys-apps/portage: 2.3.98-r1 revbump

    Fix this error:

    Exception in callback Scheduler._schedule_merge_wakeup(<Future finished result=None>)
    handle: <Handle Scheduler._schedule_merge_wakeup(<Future finished result=None>)>
    Traceback (most recent call last):
      File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
        self._context.run(self._callback, *self._args)
      File "/usr/lib/python3.7/site-packages/_emerge/Scheduler.py", line 1638, in _schedule_merge_wakeup
        self._schedule()
      File "/usr/lib/python3.7/site-packages/_emerge/PollScheduler.py", line 154, in _schedule
        self._schedule_tasks()
      File "/usr/lib/python3.7/site-packages/_emerge/Scheduler.py", line 1615, in _schedule_tasks
        self._keep_scheduling() or self._main_exit.done()):
    AttributeError: 'NoneType' object has no attribute 'done'

    Reported-by: Rick Farina <zerochaos@gentoo.org>
    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/716636
    Package-Manager: Portage-2.3.98, Repoman-2.3.22
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/{portage-2.3.98.ebuild => portage-2.3.98-r1.ebuild} | 3 +++
 1 file changed, 3 insertions(+)