After starting an emerge to update a couple of packages, I discovered that blas-atlas-3.9.1 was taking over 15 hours to emerge (normally, it takes 3.5 hours). After looking at the build log, it looks like it's stuck in an infinite loop while running drottest. Normally, I would have attached a complete build log, but it's 500MB long :)

Note 1: the build infinite loop is repeatable.
Note 2: blas-atlas-3.8.0, 3.8.1 and 3.8.2 had emerged fine on this machine, taking ~3.5 hours to compile.
Note 3: blas-atlas-3.9.1 had emerged correctly on a different machine (~amd64, Q6600).

# emerge --info
Portage 2.2_rc5 (default/linux/x86/2008.0/desktop, gcc-4.3.1, glibc-2.8_p20080602-r0, 2.6.25-gentoo-r6 i686)
=================================================================
System uname: Linux-2.6.25-gentoo-r6-i686-Intel-R-_Pentium-R-_M_processor_1.60GHz-with-glibc2.0
Timestamp of tree: Thu, 31 Jul 2008 04:30:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [enabled]
app-shells/bash: 3.2_p39
dev-java/java-config: 1.3.7, 2.1.6-r1
dev-lang/python: 2.4.4-r13, 2.5.2-r5
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache: 2.4-r7
dev-util/confcache: 0.4.2-r1
sys-apps/baselayout: 2.0.0
sys-apps/openrc: 0.2.5
sys-apps/sandbox: 1.2.18.1-r3
sys-devel/autoconf: 2.13, 2.62-r1
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils: 2.16.1-r3, 2.17-r2, 2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 2.2.4
virtual/os-headers: 2.6.25-r4
ACCEPT_KEYWORDS="x86 ~x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=pentium-m -O2 -pipe -frename-registers"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/NX/etc /usr/NX/home /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/eselect/postgresql /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=pentium-m -O2 -pipe -frename-registers"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks parallel-fetch preserve-libs sandbox sfperms strict unmerge-orphans userfetch userpriv"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LANG="en_US.utf8"
LDFLAGS="-Wl,--as-needed -Wl,-O1"
LINGUAS="C en POSIX ru"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
Created attachment 161918 [details]
Small piece of the build log

A small piece of the 500MB build log, showing the infinite loop.
I think I'm seeing something similar. When I try to emerge blas-atlas 3.9.1 the emerge runs for an hour or so, and then starts to eat RAM. I've got 1.5GB of RAM and 1.5GB of swap. At some point during the blas-atlas emerge, RAM usage skyrockets. Soon RAM and swap are both 100% full and the machine becomes unresponsive to the point where I have to press the reset button. Not good...
(In reply to comment #2)
> I think I'm seeing something similar. When I try to emerge
> blas-atlas 3.9.1 the emerge runs for an hour or so, and then
> starts to eat RAM.

I should add that I've had blas-atlas installed on this machine for years. Emerging previous versions didn't turn into DoS attacks. ;)
It's far worse than an infinite loop: it's an infinite recursion. An infinite loop just wastes CPU time. Infinite recursion will kill a machine.

Based on what I could see from the logs, it's doing a seemingly infinite number of "make drottest" calls, but the real problem is the number of processes. When swap usage started to rise, there were over 2800 processes running and the count was climbing steadily. All except about 90 of them were children of the "emerge". All the ones I could see were shells. When I killed the emerge, the total number of processes dropped back to 90, and my RAM usage returned to normal.

Having an ebuild fail to install a package is one thing. Having it kill a machine is pretty bad. Not sure how to troubleshoot this further...
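For anyone who wants to watch the process count without waiting for swap to fill up, here is a minimal sketch that counts the descendants of a given PID from a "PID PPID" table. The function name count_descendants is made up for this example, and feeding it from "ps -eo pid=,ppid=" assumes a procps-style ps.

```shell
# count_descendants ROOT_PID
# Reads "PID PPID" pairs on stdin and prints how many of those
# processes have ROOT_PID somewhere in their ancestry.
# (Hypothetical helper, not part of Portage or procps.)
count_descendants() {
    awk -v root="$1" '
        { parent[$1] = $2 }
        END {
            n = 0
            for (pid in parent) {
                p = pid
                # Walk up the parent chain; the depth cap guards against
                # malformed input that contains a cycle.
                for (depth = 0; depth < 10000 && (p in parent); depth++) {
                    p = parent[p]
                    if (p == root) { n++; break }
                }
            }
            print n
        }'
}
```

Assuming the PID of the running emerge is in a variable such as EMERGE_PID (also hypothetical), something like `ps -eo pid=,ppid= | count_descendants "$EMERGE_PID"` run every few seconds should show whether the tree keeps growing.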
*** Bug 233674 has been marked as a duplicate of this bug. ***
Thanks much for your bug report! I'll have a look at it. Best, Markus
Created attachment 161995 [details, diff]
proposed patch for infinite compile loop

Folks,

Please give the above patch a spin and let me know if it fixes your issues.

Thanks,
Markus
The recursive make calls happen for me too, on ~amd64 Q9450, so it's not an x86-only problem.
(In reply to comment #8)
> The recursive make calls happens for me too, on ~amd64 Q9450,
> so it's no x86-only problem.
>

With or without the patch?
Without the patch it went into the recursion; with it, it compiles and installs fine.
I tried with the patch, and it still recurses infinitely. I'm going to attempt to attach:

* a shellscript I used to unpack, patch, compile.
* the output from that shellscript, containing output from ebuild unpack, patch, and ebuild compile.
* snapshots of the ebuild process tree taken every 10 seconds or so until the ebuild had created a few hundred processes.
Created attachment 162025 [details] Shell script to unpack/patch/compile using the proposed patch from above.
Created attachment 162026 [details] Gzipped text output from ebuild unpack, patch, ebuild compile
Created attachment 162028 [details] Gzipped text output showing "ps axf" format process tree snapshots at 10s intervals
(In reply to comment #11)
> I tried with the patch, and it still recurses infinitely. I'm
> going to attempt to attach:
>
> * a shellscript I used to unpack, patch, compile.

Just to be paranoid, I added an "ebuild <...> clean" before the unpack, and I still got the unending recursion when the make got to the point where it was trying to do a "make drottest".

This time I killed the ebuild once it had about 120 processes running, and grepped the ebuild output for 'make drottest':

# grep 'make drottest' blastest.out2
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2"
TST: make drottest urout=rot1_x1y1.c opt=""
TST: make drottest urout=rot4_x1y1.c opt=""
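Since the same three invocations keep coming back, a tally per distinct line makes the repetition easier to see than raw grep output. This is only a sketch: count_repeats is a made-up helper name, and it relies on nothing beyond standard grep, sort and uniq.

```shell
# count_repeats PATTERN
# Reads a build log on stdin, keeps the lines containing PATTERN
# (fixed string, not a regex), and prints each distinct line with
# the number of times it occurred, most frequent first.
count_repeats() {
    grep -F "$1" | sort | uniq -c | sort -rn
}
```

Usage would be along the lines of `count_repeats 'make drottest' < blastest.out2`. In a healthy build each test invocation should presumably show up a small, fixed number of times, so counts that keep growing between runs of the command point at the recursion.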
(In reply to comment #15)
> (In reply to comment #11)
> > I tried with the patch, and it still recurses infinitely. I'm
> > going to attempt to attach:
> >
> > * a shellscript I used to unpack, patch, compile.
>
> Just to be paranoid, I added an "ebuild <...> clean" before the
> unpack, and I still got the unending recusion when the make got
> to the point where it was trying to do a "make drotest".
>
> This time I killed the ebuild once it had about 120 process
> running, and grepped the ebuild output for 'make drottest':

Grant,

The way you apply the patch (via your script) won't work properly. The ebuild's "unpack" stage does more than just unpack the tarball: it also runs atlas' configure stage. Instead, in the 3.9.1 ebuild add the line

    epatch "${FILESDIR}"/${P}-timing.patch

right after the other patch lines, move the timing patch itself to the files directory, re-digest and then try re-emerging.

Best,
Markus
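To illustrate the ordering issue, here is a rough sketch of the shape such a src_unpack might have. It is not the actual blas-atlas ebuild: the tarball name and run_atlas_configure are placeholders invented for this example, and unpack/epatch are replaced by echo stand-ins so the snippet can be run outside Portage (in a real ebuild they are provided by Portage and eutils.eclass).

```shell
# Stand-ins so the sketch runs outside Portage (assumption: in a real
# ebuild, unpack comes from Portage and epatch from eutils.eclass).
unpack() { echo "unpack $*"; }
epatch() { echo "epatch $*"; }
run_atlas_configure() { echo "atlas configure step"; }  # hypothetical name

# Values Portage would normally set; the tarball name is a placeholder.
P="blas-atlas-3.9.1"
FILESDIR="files"
A="atlas.tar.bz2"

src_unpack() {
    unpack ${A}
    # The timing patch has to land here, before configure runs.
    epatch "${FILESDIR}"/${P}-timing.patch
    # Configure happens inside the unpack stage, so patching the
    # sources after "ebuild ... unpack" has finished is too late.
    run_atlas_configure
}

src_unpack
```

Running it only prints the three steps in order, which is the whole point: a patch applied between "ebuild unpack" and "ebuild compile" comes after the configure step has already run.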
(In reply to comment #16)
> > > * a shellscript I used to unpack, patch, compile.
> They way you apply the patch (via your script) won't work
> properly. The ebuild's "unpack" stage does more than just
> unpack the tarball and also runs atlas' configure stage.

Of course. I should have realized that. I modified the ebuild so that the patch is applied before the configure operation, and it installed fine.

Thanks!
(In reply to comment #17)
> (In reply to comment #16)
> >
> > > > * a shellscript I used to unpack, patch, compile.
> >
> > They way you apply the patch (via your script) won't work
> > properly. The ebuild's "unpack" stage does more than just
> > unpack the tarball and also runs atlas' configure stage.
>
> Of course. I should have realized that. I modified the ebuild
> so that the patch is applied before the configure operation,
> and it installed fine.
>
> Thanks!
>

That's good news and thanks a lot for testing! I'll add the patch to portage then. I believe these fixes will be in the next atlas (3.9.2) release.

Best,
Markus
These patches are now in portage cvs. Thanks to everybody for testing them. Best, Markus
I know it is a little bit off topic, but where can I find a description of the meaning of the ${MY_PN}, ${PATCH_V}, ${DISTDIR} and ${P} macros?
Some are here: http://devmanual.gentoo.org/ebuild-writing/variables/index.html others are defined as needed.
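As a quick illustration for the Portage-provided ones: ${P} is the package name plus version (without any revision suffix), and ${DISTDIR} is the directory holding the downloaded source files. Names like MY_PN and PATCH_V look like variables an ebuild defines for itself, which fits the "defined as needed" remark. The sketch below only mimics how ${P} relates to the name and version parts using plain shell parameter expansion; split_p is a made-up helper and this is not how Portage computes the values internally.

```shell
# split_p P
# Illustration only: splits a "name-version" string (no -rN revision
# suffix) into the parts Portage exposes as PN and PV.
split_p() {
    p=$1
    pn=${p%-*}     # drop the last "-..." component: package name
    pv=${p##*-}    # keep what follows the last "-": version
    printf 'PN=%s PV=%s\n' "$pn" "$pv"
}

split_p blas-atlas-3.9.1
```

With P set to blas-atlas-3.9.1, a line such as `epatch "${FILESDIR}"/${P}-timing.patch` therefore refers to a file named blas-atlas-3.9.1-timing.patch in the package's files directory.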