Bug 711322 - sys-apps/portage-2.3.91 hangs in emerge --jobs (without --load-average)
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core
Hardware: All Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: PATCH, REGRESSION
Depends on:
Blocks: 711148
Reported: 2020-03-02 14:49 UTC by Ulrich Müller
Modified: 2020-04-09 21:05 UTC

Attachments
emerge --info sys-apps/portage (emerge.info, 7.25 KB, text/plain)
2020-03-02 14:50 UTC, Ulrich Müller

Description Ulrich Müller gentoo-dev 2020-03-02 14:49:54 UTC
$ emerge -eav --jobs=5 --exclude=portage @system @world
[...]
>>> Installing (144 of 2065) acct-user/ntp-0::gentoo
>>> Installing (143 of 2065) acct-user/tor-0::gentoo
>>> Installing (146 of 2065) sys-fs/ddrescue-1.24::gentoo
>>> Jobs: 111 of 2065 complete                      Load avg: 0.16, 0.59, 1.50

No progress at this point for several minutes. Sending SIGUSR1, I get:

--Return--
> /usr/lib/python-exec/python3.6/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb) bt
  /usr/lib/python-exec/python3.6/emerge(53)<module>()
-> retval = emerge_main()
  /usr/lib64/python3.6/site-packages/_emerge/main.py(1309)emerge_main()
-> return run_action(emerge_config)
  /usr/lib64/python3.6/site-packages/_emerge/actions.py(3374)run_action()
-> retval = action_build(emerge_config, spinner=spinner)
  /usr/lib64/python3.6/site-packages/_emerge/actions.py(564)action_build()
-> retval = mergetask.merge()
  /usr/lib64/python3.6/site-packages/_emerge/Scheduler.py(1020)merge()
-> rval = self._merge()
  /usr/lib64/python3.6/site-packages/_emerge/Scheduler.py(1414)_merge()
-> self._main_loop()
  /usr/lib64/python3.6/site-packages/_emerge/Scheduler.py(1390)_main_loop()
-> self._event_loop.run_until_complete(self._main_exit)
  /usr/lib64/python3.6/site-packages/portage/util/_eventloop/asyncio_event_loop.py(127)_run_until_complete()
-> return self._loop.run_until_complete(future)
  /usr/lib64/python3.6/asyncio/base_events.py(475)run_until_complete()
-> self.run_forever()
  /usr/lib64/python3.6/asyncio/base_events.py(442)run_forever()
-> self._run_once()
  /usr/lib64/python3.6/asyncio/base_events.py(1426)_run_once()
-> event_list = self._selector.select(timeout)
  /usr/lib64/python3.6/selectors.py(445)select()
-> fd_event_list = self._epoll.poll(timeout, max_ev)
> /usr/lib/python-exec/python3.6/emerge(30)debug_signal()->None
-> pdb.set_trace()
(Pdb)
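
The trace above was obtained from a SIGUSR1 handler that calls pdb.set_trace(). The same idea can be sketched for any long-running Python process. This is only an illustration, not emerge's handler: the names dump_stack, captured and busy_wait are hypothetical, and the handler records the stack instead of opening an interactive debugger.

```python
import os
import signal
import time
import traceback

captured = []  # stack dumps collected by the handler


def dump_stack(signum, frame):
    # Record where the main thread was when the signal arrived.
    captured.append("".join(traceback.format_stack(frame)))


signal.signal(signal.SIGUSR1, dump_stack)


def busy_wait():
    # Stand-in for a stuck main loop: signal ourselves, then wait
    # until the handler has run.
    os.kill(os.getpid(), signal.SIGUSR1)
    while not captured:
        time.sleep(0.01)


busy_wait()
print(captured[0])
```

The recorded stack ends in busy_wait, in the same way that the trace above ends in the epoll call where emerge was waiting.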
Comment 1 Ulrich Müller gentoo-dev 2020-03-02 14:50:33 UTC
Created attachment 616772
emerge --info sys-apps/portage
Comment 2 Larry the Git Cow gentoo-dev 2020-03-02 16:31:32 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=01d088c2b38fa2ab599ca09ee08a988148bf6827

commit 01d088c2b38fa2ab599ca09ee08a988148bf6827
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-02 16:28:10 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-02 16:31:09 +0000

    sys-apps/portage: Drop 2.3.91 keywords (bug 711322)
    
    With portage-2.3.91 the emerge process can hang indefinitely.
    
    Bug: https://bugs.gentoo.org/711322
    Package-Manager: Portage-2.3.91, Repoman-2.3.20
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/portage-2.3.91.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 3 Ulrich Müller gentoo-dev 2020-03-02 22:02:13 UTC
I already saw the problem yesterday on another machine, and I've just verified that it was running portage-2.3.90. So the regression was introduced between .89 and .90.
Comment 4 Alex Xu (Hello71) 2020-03-02 23:39:06 UTC
While downgrading from portage-2.3.90 by installing a binpkg of portage-2.3.89, build.log hangs at:

>>> Extracting sys-apps/portage-2.3.89
Comment 5 Zac Medico gentoo-dev 2020-03-03 07:34:56 UTC
When you observe the hung state, please use a command like `ps axf` to check the process tree. If the emerge process has any children (possibly defunct) then that could provide a useful clue about the state.

I'm currently running emerge -e @world in a stage3, and it has not hung yet with 184 of 216 complete.
Comment 6 Ulrich Müller gentoo-dev 2020-03-03 07:40:57 UTC
(In reply to Zac Medico from comment #5)
> When you observe the hung state, please use a command like `ps axf` to check
> the process tree. If the emerge process has any children (possibly defunct)
> then that could provide a useful clue about the state.

I had checked with "pstree <pid of emerge process>" and it showed only a single process without any children.
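
A check like the one discussed here can also be scripted. The following is a rough, Linux-only sketch that reads /proc directly; the function name child_pids is hypothetical, and the spawned Python child merely stands in for a build process.

```python
import os
import subprocess
import sys


def child_pids(parent_pid):
    """Return sorted (pid, state) pairs for direct children, read from /proc."""
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fields = {}
        try:
            with open(f"/proc/{entry}/status") as status:
                for line in status:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # the process exited while we were scanning
        if fields.get("PPid") == str(parent_pid):
            # State looks like "S (sleeping)" or, for defunct children, "Z (zombie)".
            children.append((int(entry), fields.get("State", "?")))
    return sorted(children)


# Demo: start one child, list our children, then clean up.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
found = child_pids(os.getpid())
child.kill()
child.wait()
print(found)
```

An empty result for the emerge PID would match the pstree observation reported in this comment.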
Comment 7 Zac Medico gentoo-dev 2020-03-03 16:49:54 UTC
(In reply to Ulrich Müller from comment #6)
> (In reply to Zac Medico from comment #5)
> > When you observe the hung state, please use a command like `ps axf` to check
> > the process tree. If the emerge process has any children (possibly defunct)
> > then that could provide a useful clue about the state.
> 
> I had checked with "pstree <pid of emerge process>" and it showed only a
> single process without any children.

We can also collect some useful state information by using lsof to see if any .ipc_in pipes are open in ${PORTAGE_TMPDIR}. Also, we can look at the state of ${PORTAGE_TMPDIR}/portage/*/*/.* files to see which phase(s) any packages may have hung at.
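
The second check can be sketched in Python as a glob over the same pattern. This is an illustration only: phase_markers is a hypothetical helper, and the marker names used in the demo (.unpacked, .compiled) are example file names chosen for the temporary test tree, not output from a real system.

```python
import glob
import os
import tempfile


def phase_markers(portage_tmpdir):
    """Map 'category/package' to the hidden files in its build directory."""
    result = {}
    pattern = os.path.join(portage_tmpdir, "portage", "*", "*", ".*")
    for path in glob.glob(pattern):
        build_dir, name = os.path.split(path)
        category_dir, package = os.path.split(build_dir)
        key = os.path.basename(category_dir) + "/" + package
        result.setdefault(key, []).append(name)
    return {key: sorted(names) for key, names in result.items()}


# Demo against a throwaway tree that mimics the directory layout.
with tempfile.TemporaryDirectory() as tmpdir:
    build_dir = os.path.join(tmpdir, "portage", "sys-fs", "ddrescue-1.24")
    os.makedirs(os.path.join(build_dir, "work"))  # non-hidden, so ignored
    for marker in (".unpacked", ".compiled"):     # illustrative names only
        open(os.path.join(build_dir, marker), "w").close()
    markers = phase_markers(tmpdir)

print(markers)
```

On a real system the function would be pointed at the value of PORTAGE_TMPDIR rather than a temporary directory.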
Comment 8 Zac Medico gentoo-dev 2020-03-03 18:51:45 UTC
If they're hung after src_install, it leads me to wonder if the hang is related to special scheduling for @system packages and their deps, which can be disabled by emerge --implicit-system-deps=n.
Comment 9 Zac Medico gentoo-dev 2020-03-04 06:28:27 UTC
I'm in the habit of using --load-average, and as soon as I stopped using it I was able to reproduce the issue. This means that the --load-average option can serve as a workaround. I'll go ahead and apply a simple workaround so that it won't hang in any case.
Comment 10 Zac Medico gentoo-dev 2020-03-04 08:07:49 UTC
The problem started with this commit:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=c7e52d0466211907d20cdbc04f1e90e7da626694

The problem is that this commit changed the order in which SequentialTaskQueue exit callbacks are scheduled, and that broke assumptions made within Scheduler._schedule calls. The emerge --load-average option suppresses the problem by triggering an extra Scheduler._schedule call every 30 seconds.
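
The failure mode can be modelled in a few lines of asyncio. This is a simplified sketch, not Portage code: simulate, queue_task_exit and scheduler_check are hypothetical stand-ins for the queue's exit callback and a Scheduler._schedule call.

```python
import asyncio


def simulate(deferred_update):
    """Return the order of events when one task exits."""
    events = []
    running = {"task"}  # stand-in for the queue's set of running tasks

    def queue_task_exit():
        running.discard("task")  # the queue's truth value changes here
        events.append("queue updated")

    def scheduler_check():
        # Stand-in for a _schedule call: only an idle queue gets new work.
        events.append("scheduler saw idle" if not running else "scheduler saw busy")

    async def task_exits():
        loop = asyncio.get_running_loop()
        if deferred_update:
            loop.call_soon(queue_task_exit)  # update waits for the next loop iteration
        else:
            queue_task_exit()                # update happens immediately on exit
        scheduler_check()
        await asyncio.sleep(0)               # let pending callbacks run

    asyncio.run(task_exits())
    return events


print(simulate(deferred_update=True))
print(simulate(deferred_update=False))
```

With the deferred update, the check runs against stale state and nothing prompts a second check, which in this model corresponds to the hang. A periodic extra check, like the one --load-average triggers, would eventually see the updated state.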
Comment 11 Zac Medico gentoo-dev 2020-03-04 08:43:48 UTC
Patch: https://github.com/gentoo/portage/pull/520
Comment 12 Larry the Git Cow gentoo-dev 2020-03-04 10:27:29 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2fe510215d12499f0f4f955adfd0f22e761a015e

commit 2fe510215d12499f0f4f955adfd0f22e761a015e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-04 10:16:59 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-04 10:25:03 +0000

    sys-apps/portage: Bump to version 2.3.92
    
     #601252 emerge --pretend --fetchonly event loop recursion
     #709334 socks5-server.py async and await coroutine syntax
     #709746 Rename PORTAGE_LOG_FILTER_FILE_CMD from PORTAGE_LOG_FILTER_FILE
     #711322 emerge hang after src_install
     #711362 egencache AttributeError: 'NoneType' object has no attribute
             'ebuild'
     #711400 AttributeError: 'NoneType' object has no attribute 'depth'
    
    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/709334
    Bug: https://bugs.gentoo.org/709746
    Closes: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/711362
    Bug: https://bugs.gentoo.org/711400
    Package-Manager: Portage-2.3.92, Repoman-2.3.20
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest                                         | 2 +-
 sys-apps/portage/{portage-2.3.91.ebuild => portage-2.3.92.ebuild} | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
Comment 13 Larry the Git Cow gentoo-dev 2020-03-04 10:28:19 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=459b3535baa416888b546cd1635ae28324259a70

commit 459b3535baa416888b546cd1635ae28324259a70
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-04 08:17:28 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-04 09:53:27 +0000

    SequentialTaskQueue: update bool(self) sooner (bug 711322)
    
    Use addExitListener to add a _task_exit callback that will be invoked
    as soon as the task exits (before the future's done callback is called).
    This is required in order for bool(self) to have an updated value for
    Scheduler._schedule to base assumptions upon. Delayed updates to
    bool(self) are what caused Scheduler to hang, as in bug 711322.
    
    This reverts changes in SequentialTaskQueue task queue exit listener
    behavior from commit c7e52d046621, so that only the changes necessary
    to support async_start remain.
    
    Fixes: c7e52d046621 ("EbuildPhase: add _async_start coroutine")
    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/SequentialTaskQueue.py | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)
Comment 14 Larry the Git Cow gentoo-dev 2020-03-05 08:26:07 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=d389b3b378d88b8c41dfaba2a90bc9643a9ba99e

commit d389b3b378d88b8c41dfaba2a90bc9643a9ba99e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-05 06:46:26 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-05 08:06:19 +0000

    Scheduler: use add_done_callback (bug 711322)
    
    Use add_done_callback instead of addExitListener, in order to avoid
    callback races like the SequentialTaskQueue exit listener race that
    triggered bug 711322. The addExitListener method is prone to races
    because its listeners are executed in quick succession. In contrast,
    callbacks scheduled via add_done_callback are placed in a fifo
    queue, ensuring that they execute in an order that is unsurprising
    relative to other callbacks.
    
    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)
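
The FIFO behaviour that the commit message relies on can be observed with plain asyncio futures. A minimal sketch, independent of Portage (the labels are arbitrary):

```python
import asyncio


async def demo():
    loop = asyncio.get_running_loop()
    order = []
    future = loop.create_future()
    future.add_done_callback(lambda f: order.append("done callback 1"))
    future.add_done_callback(lambda f: order.append("done callback 2"))

    loop.call_soon(order.append, "earlier call_soon")
    future.set_result(None)  # queues both done callbacks behind the earlier one
    loop.call_soon(order.append, "later call_soon")

    await asyncio.sleep(0)   # yield so the queued callbacks run
    return order


order = asyncio.run(demo())
print(order)
```

Nothing runs at the moment set_result is called; the done callbacks take their place in the queue between the two call_soon callbacks.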
Comment 15 Larry the Git Cow gentoo-dev 2020-03-05 17:39:37 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=62b0dc613d7d8eb099231bc4cb7303a0abdaf480

commit 62b0dc613d7d8eb099231bc4cb7303a0abdaf480
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-05 16:25:45 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-05 16:31:04 +0000

    AsynchronousTask: schedule start listeners via call_soon (bug 711322)
    
    Schedule start listeners via call_soon, in order to avoid callback races
    like the SequentialTaskQueue exit listener race that triggered bug
    711322. Callbacks scheduled via call_soon are placed in a fifo queue,
    ensuring that they execute in an order that is unsurprising relative to
    other callbacks.
    
    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/AsynchronousTask.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 16 Larry the Git Cow gentoo-dev 2020-03-06 03:04:33 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=46903f3e5622bc479d4687c76c0e9fada8eb53db

commit 46903f3e5622bc479d4687c76c0e9fada8eb53db
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-05 16:45:25 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-05 23:30:03 +0000

    AsynchronousTask: schedule exit listeners via call_soon (bug 711322)
    
    Schedule exit listeners via call_soon, in order to avoid callback races
    like the SequentialTaskQueue exit listener race that triggered bug
    711322. Callbacks scheduled via call_soon are placed in a fifo queue,
    ensuring that they execute in an order that is unsurprising relative to
    other callbacks.
    
    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/AsynchronousTask.py | 53 ++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 32 deletions(-)
Comment 17 Larry the Git Cow gentoo-dev 2020-03-06 03:36:42 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=ca053dec87ea593596f83e8d20c63b40678bf03a

commit ca053dec87ea593596f83e8d20c63b40678bf03a
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-06 03:15:40 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-06 03:29:07 +0000

    Scheduler: replace add_done_callback with addExitListener
    
    For simplicity, use addExitListener instead of add_done_callback, since
    addExitListener has been fixed to use call_soon in commit 46903f3e5622.
    Note that each addExitListener call occurs *after* a call to the
    SequentialTaskQueue add method, since the SequentialTaskQueue needs to
    be notified of task exit *first* (see commit 459b3535baa4).
    
    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)
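
Why registration order matters once listeners go through call_soon can be sketched as follows. TinyTask is a hypothetical stand-in, not Portage's AsynchronousTask, and the running set stands in for the queue's bookkeeping.

```python
import asyncio


class TinyTask:
    """Hypothetical task with exit listeners (not Portage's class)."""

    def __init__(self, loop):
        self._loop = loop
        self._exit_listeners = []

    def add_exit_listener(self, callback):
        self._exit_listeners.append(callback)

    def exit(self):
        # Queue every listener on the loop instead of calling it inline.
        for callback in self._exit_listeners:
            self._loop.call_soon(callback, self)
        self._exit_listeners = []


async def demo():
    loop = asyncio.get_running_loop()
    running = set()
    seen = []
    task = TinyTask(loop)

    # The queue registers first, when the task is added to it ...
    running.add(task)
    task.add_exit_listener(running.discard)
    # ... and the scheduler registers afterwards.
    task.add_exit_listener(lambda t: seen.append(len(running)))

    task.exit()
    await asyncio.sleep(0)  # yield so both listeners run, in FIFO order
    return seen


seen = asyncio.run(demo())
print(seen)
```

Because the queue's listener was registered first, the scheduler's listener observes zero running tasks. Swapping the two registrations would make it observe one.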
Comment 18 Larry the Git Cow gentoo-dev 2020-03-07 22:18:08 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=8cc84cea654238676f7edc04b9c75c001535c0b4

commit 8cc84cea654238676f7edc04b9c75c001535c0b4
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-07 21:52:53 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-07 22:01:22 +0000

    SequentialTaskQueue: cancel unstarted tasks when appropriate (bug 711322)
    
    When the clear method is called, cancel any tasks which have not
    started yet, in order to ensure that their start/exit listeners are
    called. This fixes a case where emerge would hang after SIGINT.
    
    Also fix the CompositeTask _cancel method to react appropriately to
    the cancel event when the task has not started yet.
    
    Bug: https://bugs.gentoo.org/711322
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/CompositeTask.py       | 4 ++++
 lib/_emerge/SequentialTaskQueue.py | 3 +++
 2 files changed, 7 insertions(+)
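
The idea of cancelling rather than silently dropping unstarted work can be sketched with asyncio futures. TinyQueue is a hypothetical stand-in for the queue, and plain futures stand in for tasks that have not started yet.

```python
import asyncio
from collections import deque


class TinyQueue:
    """Hypothetical queue of not-yet-started work (not Portage's class)."""

    def __init__(self):
        self._pending = deque()

    def add(self, future):
        self._pending.append(future)

    def clear(self):
        # Cancel each entry so that its done callbacks still fire.
        while self._pending:
            self._pending.popleft().cancel()


async def demo():
    loop = asyncio.get_running_loop()
    queue = TinyQueue()
    notified = []
    for name in ("a", "b"):
        future = loop.create_future()
        future.add_done_callback(
            lambda f, name=name: notified.append((name, f.cancelled()))
        )
        queue.add(future)
    queue.clear()
    await asyncio.sleep(0)  # yield so the done callbacks run
    return notified


notified = asyncio.run(demo())
print(notified)
```

If clear simply emptied the deque, nothing waiting on those entries would ever be notified, which is the kind of situation that can leave a caller waiting indefinitely.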
Comment 19 Larry the Git Cow gentoo-dev 2020-03-07 23:00:12 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3330b6df6278675073b2f25cb1a743005ad7cc98

commit 3330b6df6278675073b2f25cb1a743005ad7cc98
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-07 22:55:37 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-07 22:59:57 +0000

    sys-apps/portage: Bump to version 2.3.93
    
     #711322 schedule exit listeners via call_soon
     #711688 BinpkgFetcher sync_timestamp KeyError regression
    
    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/711322
    Closes: https://bugs.gentoo.org/711688
    Package-Manager: Portage-2.3.93, Repoman-2.3.20
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-2.3.93.ebuild | 271 +++++++++++++++++++++++++++++++++
 2 files changed, 272 insertions(+)
Comment 20 Larry the Git Cow gentoo-dev 2020-04-09 06:48:33 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=389429d798a186bdbeb11354d7f1299f628851fd

commit 389429d798a186bdbeb11354d7f1299f628851fd
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 04:45:16 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 06:01:27 +0000

    Scheduler: wakeup for empty merge queue (bug 711322)
    
    Add a wakeup callback to schedule a new merge when the merge queue
    becomes empty. This prevents the scheduler from hanging in cases
    where the order of _merge_exit callback invocation may cause the
    merge queue to appear non-empty when it is about to become
    empty.
    
    Bug: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/716636
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py           | 23 +++++++++++++++++++++++
 lib/_emerge/SequentialTaskQueue.py | 22 ++++++++++++----------
 2 files changed, 35 insertions(+), 10 deletions(-)
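
The wakeup idea can be sketched as a queue that requests another scheduling pass whenever it becomes empty. MergeQueue, schedule and the package names below are hypothetical and only model the behaviour described in the commit message.

```python
import asyncio


class MergeQueue:
    """Hypothetical queue that asks for a wakeup when it becomes empty."""

    def __init__(self, loop, on_empty):
        self._loop = loop
        self._on_empty = on_empty
        self._running = set()

    def __bool__(self):
        return bool(self._running)

    def start(self, name):
        self._running.add(name)

    def task_exit(self, name):
        self._running.discard(name)
        if not self._running:
            self._loop.call_soon(self._on_empty)  # wake the scheduler


async def demo():
    loop = asyncio.get_running_loop()
    pending = ["pkg-b"]
    started = []

    def schedule():
        # Start the next merge only when the queue is idle.
        if not queue and pending:
            name = pending.pop(0)
            queue.start(name)
            started.append(name)

    queue = MergeQueue(loop, on_empty=schedule)
    queue.start("pkg-a")
    schedule()                # stale view: queue still busy, nothing started
    queue.task_exit("pkg-a")  # queue empties and requests a wakeup
    await asyncio.sleep(0)    # yield so the wakeup callback runs
    return started


started = asyncio.run(demo())
print(started)
```

Even though the first schedule call saw a busy queue, the wakeup guarantees one more pass after the queue empties, so the remaining package is still started.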
Comment 21 Larry the Git Cow gentoo-dev 2020-04-09 07:05:42 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=273eac55223836285c42697268c22c925672835e

commit 273eac55223836285c42697268c22c925672835e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 07:00:57 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 07:05:14 +0000

    sys-apps/portage: Bump to version 2.3.98
    
     #711322 always wakeup for empty merge queue
    
    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/711322
    Closes: https://bugs.gentoo.org/716636
    Package-Manager: Portage-2.3.98, Repoman-2.3.22
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-2.3.98.ebuild | 263 +++++++++++++++++++++++++++++++++
 2 files changed, 264 insertions(+)
Comment 22 Larry the Git Cow gentoo-dev 2020-04-09 20:48:48 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=ad325eb10bc6e8ec2a8248f8e9173911f957c0da

commit ad325eb10bc6e8ec2a8248f8e9173911f957c0da
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 20:27:36 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 20:47:08 +0000

    _schedule_merge_wakeup: handle main loop exit
    
    Detect main loop exit and do not attempt to schedule in this case.
    
    Fixes: 389429d798a1 ("Scheduler: wakeup for empty merge queue (bug 711322)")
    Reported-by: Rick Farina <zerochaos@gentoo.org>
    Bug: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/716636
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/_emerge/Scheduler.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
Comment 23 Larry the Git Cow gentoo-dev 2020-04-09 21:05:33 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7bbc98c13ec8eecf7ea291752cdbb72b60240fcf

commit 7bbc98c13ec8eecf7ea291752cdbb72b60240fcf
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-04-09 20:57:43 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-04-09 21:05:13 +0000

    sys-apps/portage: 2.3.98-r1 revbump
    
    Fix this error:
    
    Exception in callback Scheduler._schedule_merge_wakeup(<Future finished result=None>)
    handle: <Handle Scheduler._schedule_merge_wakeup(<Future finished result=None>)>
    Traceback (most recent call last):
      File "/usr/lib/python3.7/asyncio/events.py", line 88, in _run
        self._context.run(self._callback, *self._args)
      File "/usr/lib/python3.7/site-packages/_emerge/Scheduler.py", line 1638, in _schedule_merge_wakeup
        self._schedule()
      File "/usr/lib/python3.7/site-packages/_emerge/PollScheduler.py", line 154, in _schedule
        self._schedule_tasks()
      File "/usr/lib/python3.7/site-packages/_emerge/Scheduler.py", line 1615, in _schedule_tasks
        self._keep_scheduling() or self._main_exit.done()):
    AttributeError: 'NoneType' object has no attribute 'done'
    
    Reported-by: Rick Farina <zerochaos@gentoo.org>
    Bug: https://bugs.gentoo.org/711148
    Bug: https://bugs.gentoo.org/711322
    Bug: https://bugs.gentoo.org/716636
    Package-Manager: Portage-2.3.98, Repoman-2.3.22
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/{portage-2.3.98.ebuild => portage-2.3.98-r1.ebuild} | 3 +++
 1 file changed, 3 insertions(+)
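
The AttributeError in the traceback above indicates that _main_exit was None when the wakeup callback ran. A guard of roughly the following shape avoids it. This is only a sketch under that assumption, not the actual patch: TinyScheduler is hypothetical, and a concurrent.futures.Future stands in for the main loop's exit future.

```python
from concurrent.futures import Future


class TinyScheduler:
    """Hypothetical scheduler showing the guard (not Portage's class)."""

    def __init__(self):
        self._main_exit = None  # assumed to be set only while the main loop runs
        self.schedule_calls = 0

    def _schedule(self):
        self.schedule_calls += 1

    def _schedule_merge_wakeup(self, future):
        # Do nothing if the main loop has not started or has already exited.
        if self._main_exit is None or self._main_exit.done():
            return
        self._schedule()


scheduler = TinyScheduler()
scheduler._schedule_merge_wakeup(None)  # no main loop: ignored
scheduler._main_exit = Future()
scheduler._schedule_merge_wakeup(None)  # main loop running: schedules
scheduler._main_exit.set_result(None)
scheduler._schedule_merge_wakeup(None)  # main loop finished: ignored
print(scheduler.schedule_calls)
```

Only the call made while the stand-in main loop is active reaches _schedule.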