Bug 404123 - sys-apps/portage-2.2.01.20153: portage enters an infinite loop when using --jobs
Summary: sys-apps/portage-2.2.01.20153: portage enters an infinite loop when using --jobs
Status: RESOLVED FIXED
Alias: None
Product: Gentoo/Alt
Classification: Unclassified
Component: Prefix Support
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Prefix
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-16 14:22 UTC by Richard Yao (RETIRED)
Modified: 2012-02-21 05:43 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Description Richard Yao (RETIRED) gentoo-dev 2012-02-16 14:22:21 UTC
I am not really sure how to provide useful information regarding this issue. I encounter it on a 64-thread system that has 128GB of RAM after only a few packages have been compiled. One core stays at full utilization in htop, which is attributed to emerge. This occurred with a fresh install a few days ago, but installing more software and doing updates has not changed anything.

Stopping emerge does not provide a useful backtrace, and using SIGUSR1 to drop into the debugger does not provide useful information either, although maybe someone else would find it more useful than I did:

Total: 314 packages (1 new, 313 reinstalls), Size of downloads: 140 kB

!!! Ebuilds for the following packages are either all
!!! masked or don't exist:
>=sys-apps/baselayout-2

Would you like to merge these packages? [Yes/No]
>>> Verifying ebuild manifests
>>> Running pre-merge checks for sci-libs/mkl-10.3.4.191
 * QA: Package does not specify unit for the size check
 * QA: Assuming mebibytes.
 * QA: File bug against the package. It should specify the unit.
 * Checking for at least 1536 mebibytes disk space at "/home/stonybrook/gentoo/var/tmp/portage/sci-libs/mkl-10.3.4.191/temp" ...                                                                                                                                        [ ok ]
>>> Starting parallel fetch
>>> Emerging (1 of 314) sys-libs/ncurses-5.9-r1
>>> Emerging (2 of 314) app-arch/xz-utils-5.0.3
>>> Emerging (3 of 314) virtual/libintl-0
>>> Emerging (4 of 314) app-arch/bzip2-1.0.6-r3
>>> Emerging (5 of 314) virtual/libiconv-0
>>> Emerging (6 of 314) sys-devel/gnuconfig-20110814
>>> Emerging (7 of 314) sys-devel/binutils-config-3-r1
>>> Emerging (8 of 314) app-misc/mime-types-8
>>> Emerging (9 of 314) app-misc/pax-utils-0.2.1
>>> Emerging (10 of 314) sys-apps/tcp-wrappers-7.6-r8
>>> Emerging (11 of 314) sys-apps/which-2.20
>>> Emerging (12 of 314) app-arch/cpio-2.11
>>> Emerging (13 of 314) sys-process/numactl-2.0.7-r1
>>> Emerging (14 of 314) virtual/os-headers-0
>>> Emerging (15 of 314) sys-devel/autoconf-wrapper-12
>>> Emerging (16 of 314) sys-devel/automake-wrapper-6
>>> Emerging (17 of 314) dev-util/gperf-3.0.4
>>> Emerging (18 of 314) dev-util/byacc-20120115
>>> Emerging (19 of 314) mail-client/mailx-support-20060102-r1
>>> Emerging (20 of 314) net-mail/mailbase-1
>>> Emerging (21 of 314) sys-process/lsof-4.84
>>> Emerging (22 of 314) app-portage/portage-utils-0.9
>>> Installing (14 of 314) virtual/os-headers-0
>>> Installing (16 of 314) sys-devel/automake-wrapper-6
>>> Installing (15 of 314) sys-devel/autoconf-wrapper-12
>>> Installing (20 of 314) net-mail/mailbase-1
>>> Installing (19 of 314) mail-client/mailx-support-20060102-r1
>>> Installing (21 of 314) sys-process/lsof-4.84
>>> Installing (13 of 314) sys-process/numactl-2.0.7-r1
>>> Installing (18 of 314) dev-util/byacc-20120115
>>> Installing (17 of 314) dev-util/gperf-3.0.4
>>> Installing (22 of 314) app-portage/portage-utils-0.9
>>> Installing (12 of 314) app-arch/cpio-2.11
>>> Installing (3 of 314) virtual/libintl-0
>>> Jobs: 12 of 314 complete                        Load avg: 1.42, 0.78, 0.82--Return--
> /home/stonybrook/gentoo/usr/bin/emerge(28)debug_signal()->None
-> pdb.set_trace()
(Pdb) bt
  /home/stonybrook/gentoo/usr/bin/emerge(44)<module>()
-> retval = emerge_main()
  /home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/main.py(2031)emerge_main()
-> myopts, myaction, myfiles, spinner)
  /home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/actions.py(442)action_build()
-> retval = mergetask.merge()
  /home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/Scheduler.py(1009)merge()
-> rval = self._merge()
  /home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/Scheduler.py(1354)_merge()
-> self._main_loop()
  /home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/Scheduler.py(1495)_main_loop()
-> while self._keep_scheduling():
> /home/stonybrook/gentoo/usr/bin/emerge(28)debug_signal()->None
-> pdb.set_trace()
(Pdb) quit


 * Messages for package sci-libs/mkl-10.3.4.191:

 * QA: Package does not specify unit for the size check
 * QA: Assuming mebibytes.
 * QA: File bug against the package. It should specify the unit.

 * Messages for package net-mail/mailbase-1:

 * 'enewgroup()' disabled in Prefixed Portage with non root user
 * 'enewuser()' disabled in Prefixed Portage with non-root user
 * 'enewuser()' disabled in Prefixed Portage with non-root user
 * fowners ignored in Prefix with non-privileged user

 * Messages for package app-portage/portage-utils-0.9:

 * ERROR: app-portage/portage-utils-0.9 failed (test phase):
 *   Make check failed. See above for details.
 * 
 * Call stack:
 *          ebuild.sh, line   85:  Called call-ebuildshell 'src_test'
 *        environment, line  490:  Called src_test
 *        environment, line 2565:  Called _eapi0_src_test
 *   phase-helpers.sh, line  550:  Called die
 * The specific snippet of code:
 *              $emake_cmd -j1 check || \
 *                      die "Make check failed. See above for details."
 * 
 * If you need support, post the output of 'emerge --info =app-portage/portage-utils-0.9',
 * the complete build log and the output of 'emerge -pqv =app-portage/portage-utils-0.9'.
 * The complete build log is located at '/home/stonybrook/gentoo/var/tmp/portage/app-portage/portage-utils-0.9/temp/build.log'.
 * The ebuild environment file is located at '/home/stonybrook/gentoo/var/tmp/portage/app-portage/portage-utils-0.9/temp/environment'.
 * S: '/home/stonybrook/gentoo/var/tmp/portage/app-portage/portage-utils-0.9/work/portage-utils-0.9'
 * /home/stonybrook/gentoo/etc/portage/postsync.d/q-reinitialize has been installed for convenience
 * If you wish for it to be automatically run at the end of every --sync:
 *    # chmod +x /home/stonybrook/gentoo/etc/portage/postsync.d/q-reinitialize
 * Normally this should only take a few seconds to run but file systems
 * such as ext3 can take a lot longer.  To disable, simply do:
 *    # chmod -x /home/stonybrook/gentoo/etc/portage/postsync.d/q-reinitialize
Traceback (most recent call last):
  File "/home/stonybrook/gentoo/usr/bin/emerge", line 44, in <module>
    retval = emerge_main()
  File "/home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/main.py", line 2031, in emerge_main
    myopts, myaction, myfiles, spinner)
  File "/home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/actions.py", line 442, in action_build
    retval = mergetask.merge()
  File "/home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/Scheduler.py", line 1009, in merge
    rval = self._merge()
  File "/home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/Scheduler.py", line 1354, in _merge
    self._main_loop()
  File "/home/stonybrook/gentoo/usr/lib/portage/pym/_emerge/Scheduler.py", line 1495, in _main_loop
    while self._keep_scheduling():
  File "/home/stonybrook/gentoo/usr/bin/emerge", line 28, in debug_signal
    pdb.set_trace()
  File "/home/stonybrook/gentoo/usr/lib/python2.7/bdb.py", line 52, in trace_dispatch
    return self.dispatch_return(frame, arg)
  File "/home/stonybrook/gentoo/usr/lib/python2.7/bdb.py", line 86, in dispatch_return
    if self.quitting: raise BdbQuit
BdbQuit

I briefly spoke with zmedico in #gentoo-portage about this and he suggested that the recent recursion patches from bug #402335 might help.
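
For readers unfamiliar with the SIGUSR1 technique used above: the backtrace shows emerge's debug_signal() handler calling pdb.set_trace(). The following is a minimal sketch of the same idea, not Portage's actual code. It assumes a Unix system, and it swaps the interactive debugger for a non-interactive stack dump so that it can run unattended. The names dump_stack, captured and busy_loop are made up for illustration.

```python
import os
import signal
import traceback

# Stack dumps collected by the handler (hypothetical name, for illustration).
captured = []

def dump_stack(signum, frame):
    """Record the stack of whatever code the signal interrupted.

    The emerge handler in the report calls pdb.set_trace() here instead;
    this sketch only formats the stack so it needs no terminal.
    """
    captured.append("".join(traceback.format_stack(frame)))

# Install the handler; afterwards `kill -USR1 <pid>` from another shell
# would trigger it in a long-running process.
signal.signal(signal.SIGUSR1, dump_stack)

def busy_loop(limit):
    """Stand-in for a loop that spins, such as a scheduler main loop."""
    n = 0
    while n < limit:
        n += 1
        if n == 1000:
            # Send the signal to ourselves to simulate an external kill.
            os.kill(os.getpid(), signal.SIGUSR1)
    return n

busy_loop(100000)
print(captured[0])
```

A dump like this only shows where the interpreter happened to be when the signal arrived, which is why a single sample from a spinning loop (as in the trace above, stopped at `while self._keep_scheduling():`) may not reveal why the loop never exits.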
Comment 1 Richard Yao (RETIRED) gentoo-dev 2012-02-16 21:12:05 UTC
While this is grasping at straws, this might be related to bug #403287. I imagine that a deadlock would turn into an infinite loop if a thread is polling a thread that deadlocked.
Comment 2 Zac Medico gentoo-dev 2012-02-16 21:39:15 UTC
(In reply to comment #1)
> While this is grasping at straws, this might be related to bug #403287. I
> imagine that a deadlock would turn into an infinite loop if a thread is polling
> a thread that deadlocked.

That doesn't affect portage, since portage doesn't use threads.
Comment 4 Zac Medico gentoo-dev 2012-02-18 05:38:07 UTC
There are lots of EventLoop changes in v2.2.0_alpha87, so it's worth merging those into the prefix branch to see if that helps.
Comment 5 Fabian Groffen gentoo-dev 2012-02-19 13:26:40 UTC
Please try sys-apps/portage-2.2.01.20239, which is the Prefix equivalent of v2.2.0_alpha87.
Comment 6 Richard Yao (RETIRED) gentoo-dev 2012-02-21 05:43:00 UTC
I was able to reproduce this issue in a single-core Ubuntu 10.04 VM, where I can confirm that the newer ebuild fixes it.

I also tested the fix on the original 64-thread system and it appears to have worked, although I think one ebuild's build system deadlocked, judging from:

>>> Jobs: 216 of 282 complete, 1 running            Load avg: 0.06, 0.05, 0.06

I am closing this as fixed.
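
The two status lines quoted in this report suggest a rough way to tell the symptoms apart: the original busy loop kept the load average above 1 with no job listed as running, while the suspected deadlock shows a running job with a load average near zero. A hedged sketch of that heuristic follows. It is not part of Portage, it handles only the two line formats that appear in this report, and the names parse_status and looks_stalled as well as the 0.1 threshold are assumptions.

```python
import re

# Matches only the "Jobs: ... Load avg: ..." formats quoted in this report.
STATUS_RE = re.compile(
    r"Jobs:\s+(?P<done>\d+) of (?P<total>\d+) complete"
    r"(?:,\s+(?P<running>\d+) running)?"
    r"\s+Load avg:\s+(?P<l1>[\d.]+),\s+(?P<l5>[\d.]+),\s+(?P<l15>[\d.]+)"
)

def parse_status(line):
    """Return the job counts and load averages from a status line, or None."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {
        "done": int(m.group("done")),
        "total": int(m.group("total")),
        # The "N running" part is absent when no job is reported as running.
        "running": int(m.group("running") or 0),
        "load": (
            float(m.group("l1")),
            float(m.group("l5")),
            float(m.group("l15")),
        ),
    }

def looks_stalled(status, idle_load=0.1):
    """Heuristic: a job is listed as running, yet the machine is idle."""
    return status["running"] > 0 and max(status["load"]) < idle_load

line = ">>> Jobs: 216 of 282 complete, 1 running            Load avg: 0.06, 0.05, 0.06"
print(looks_stalled(parse_status(line)))
```

This is only a hint: a job that is waiting on a slow network fetch would look the same, so the build log of the running package is still the place to confirm a deadlock.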