Bug 233614 - sci-libs/blas-atlas-3.9.1 loops semi-infinitely during `drottest'
Summary: sci-libs/blas-atlas-3.9.1 loops semi-infinitely during `drottest'
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Library
Hardware: All Linux
Importance: High major
Assignee: Gentoo Science Related Packages
URL:
Whiteboard:
Keywords:
Duplicates: 233674
Depends on:
Blocks:
 
Reported: 2008-08-01 15:43 UTC by Alexandre Rostovtsev (RETIRED)
Modified: 2008-08-07 12:35 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Small piece of the build log (blas-atlas-3.9.1-partial-build.log, 646.72 KB, text/plain)
2008-08-01 15:48 UTC, Alexandre Rostovtsev (RETIRED)

proposed patch for infinite compile loop (blas-atlas-3.9.1-timing.patch, 2.40 KB, patch)
2008-08-02 15:33 UTC, Markus Dittrich (RETIRED)

Shell script to unpack/patch/compile using the proposed patch from above. (blastest.sh, 563 bytes, text/plain)
2008-08-02 20:25 UTC, Grant Edwards

Gzipped text output from ebuild unpack, patch, ebuild compile (blastest.out.gz, 48.41 KB, application/octet-stream)
2008-08-02 20:28 UTC, Grant Edwards

Gzipped text output showing "ps axf" format process tree snapshots at 10s intervals (ebuild-blas_ps.out.gz, 26.00 KB, text/plain)
2008-08-02 20:29 UTC, Grant Edwards

Description Alexandre Rostovtsev (RETIRED) gentoo-dev 2008-08-01 15:43:27 UTC
After starting an emerge to update a couple of packages, I discovered that blas-atlas-3.9.1 was taking over 15 hours to emerge (normally, it takes 3.5 hours). After looking at the build log, it looks like it's stuck in an infinite loop while running drottest.

Normally, I would have attached the complete build log, but it's 500 MB long :)

Note 1: the infinite build loop is reproducible.
Note 2: blas-atlas-3.8.0, 3.8.1 and 3.8.2 had emerged fine on this machine, taking ~3.5 hours to compile.
Note 3: blas-atlas-3.9.1 had emerged correctly on a different machine (~amd64, Q6600).

# emerge --info
Portage 2.2_rc5 (default/linux/x86/2008.0/desktop, gcc-4.3.1, glibc-2.8_p20080602-r0, 2.6.25-gentoo-r6 i686)
=================================================================
System uname: Linux-2.6.25-gentoo-r6-i686-Intel-R-_Pentium-R-_M_processor_1.60GHz-with-glibc2.0
Timestamp of tree: Thu, 31 Jul 2008 04:30:01 +0000
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p39
dev-java/java-config: 1.3.7, 2.1.6-r1
dev-lang/python:     2.4.4-r13, 2.5.2-r5
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.4-r7
dev-util/confcache:  0.4.2-r1
sys-apps/baselayout: 2.0.0
sys-apps/openrc:     0.2.5
sys-apps/sandbox:    1.2.18.1-r3
sys-devel/autoconf:  2.13, 2.62-r1
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1-r1
sys-devel/binutils:  2.16.1-r3, 2.17-r2, 2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   2.2.4
virtual/os-headers:  2.6.25-r4
ACCEPT_KEYWORDS="x86 ~x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=pentium-m -O2 -pipe -frename-registers"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/NX/etc /usr/NX/home /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/eselect/postgresql /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=pentium-m -O2 -pipe -frename-registers"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks parallel-fetch preserve-libs sandbox sfperms strict unmerge-orphans userfetch userpriv"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LANG="en_US.utf8"
LDFLAGS="-Wl,--as-needed -Wl,-O1"
LINGUAS="C en POSIX ru"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
Comment 1 Alexandre Rostovtsev (RETIRED) gentoo-dev 2008-08-01 15:48:39 UTC
Created attachment 161918
Small piece of the build log

A small piece of the 500MB build log, showing the infinite loop.
Comment 2 Grant Edwards 2008-08-01 23:51:10 UTC
I think I'm seeing something similar.  When I try to emerge
blas-atlas 3.9.1 the emerge runs for an hour or so, and then
starts to eat RAM.  I've got 1.5GB of RAM and 1.5GB of swap. At
some point during the blas-atlas emerge, RAM usage
skyrockets.  Soon RAM and swap are both 100% full and the
machine becomes unresponsive to the point where I have to press
the reset button.

Not good...
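As a precaution against this kind of fork storm, the number of processes a build may spawn can be capped with bash's `ulimit -u` (RLIMIT_NPROC), so that new forks fail instead of filling RAM and swap. This is only an illustrative sketch: `run_capped` is a hypothetical helper, the limit value is arbitrary, and the limit counts every process owned by the user, not just the build's. Root is normally exempt from this limit, so whether it takes effect depends on which user the build processes run as (with `FEATURES=userpriv`, as in the emerge --info output above, the compile runs as the portage user).

```shell
#!/bin/bash
# Hypothetical helper: run a command with a cap on the number of
# processes its user may own (RLIMIT_NPROC), so that a runaway fork
# storm gets "Resource temporarily unavailable" errors instead of
# exhausting RAM and swap. The cap applies only inside the subshell.
run_capped() {
	local max_procs=$1
	shift
	(
		# Without -S or -H, bash sets both the soft and the hard
		# limit, so a non-root process inside cannot raise it again.
		ulimit -u "$max_procs" || exit 1
		"$@"
	)
}
```

For example, `run_capped 1024 emerge --oneshot =sci-libs/blas-atlas-3.9.1` would run the emerge under a 1024-process cap; the value should sit comfortably above what a normal build needs.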

Comment 3 Grant Edwards 2008-08-01 23:57:56 UTC
(In reply to comment #2)
> I think I'm seeing something similar.  When I try to emerge
> blas-atlas 3.9.1 the emerge runs for an hour or so, and then
> starts to eat RAM.

I should add that I've had blas-atlas installed on this machine
for years. Emerging previous versions didn't turn into DoS attacks.  ;)
Comment 4 Grant Edwards 2008-08-02 00:18:28 UTC
It's far worse than an infinite loop: it's an infinite
recursion.

An infinite loop just wastes CPU time.  Infinite recursion will
kill a machine.

Based on what I could see from the logs it's doing a seemingly
infinite number of "make drottest" runs, but the real problem is
the number of processes.  When swap usage started to rise, there
were over 2800 processes running and it was climbing steadily.
All except about 90 of them were children of the "emerge".  All
the ones I could see were shells.

When I killed the emerge, the total number of processes dropped
back to 90, and my RAM usage returned to normal. Having an
ebuild fail to install a package is one thing. Having it kill a
machine is pretty bad.

Not sure how to troubleshoot this further...
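One way to put numbers on a runaway like this is to count how many live processes descend from the emerge process. A minimal sketch, assuming a procps-style `ps` that accepts `-eo pid=,ppid=`; `count_descendants` is a hypothetical helper that reads `pid ppid` pairs on standard input, so it can also be tried on saved snapshots.

```shell
#!/bin/bash
# Hypothetical helper: count all descendants (children, grandchildren,
# and so on) of a root PID, given "pid ppid" pairs on stdin, e.g. the
# output of: ps -eo pid=,ppid=
count_descendants() {
	awk -v root="$1" '
		{ parent[$1] = $2 }    # map pid -> ppid
		END {
			n = 0
			for (p in parent) {
				# Walk up the ancestry until we reach root or
				# a pid whose parent is not in the list.
				q = parent[p]
				while (q != root && (q in parent))
					q = parent[q]
				if (q == root) n++
			}
			print n
		}'
}
```

On a live system, something like `ps -eo pid=,ppid= | count_descendants "$(pgrep -o -f /usr/bin/emerge)"` would print the count; the `pgrep` pattern is an assumption about how emerge shows up in the process list.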


Comment 5 Wormo (RETIRED) gentoo-dev 2008-08-02 07:19:45 UTC
*** Bug 233674 has been marked as a duplicate of this bug. ***
Comment 6 Markus Dittrich (RETIRED) gentoo-dev 2008-08-02 11:21:28 UTC
Thanks much for your bug report! I'll have a look at it.

Best,
Markus
Comment 7 Markus Dittrich (RETIRED) gentoo-dev 2008-08-02 15:33:23 UTC
Created attachment 161995
proposed patch for infinite compile loop

Folks,

Please give the above patch a spin and let me know if it
fixes your issues.

Thanks,
Markus
Comment 8 emerald 2008-08-02 18:06:44 UTC
The recursive make calls happen for me too, on ~amd64 Q9450,
so it's not an x86-only problem.
Comment 9 Markus Dittrich (RETIRED) gentoo-dev 2008-08-02 18:44:43 UTC
(In reply to comment #8)
> The recursive make calls happen for me too, on ~amd64 Q9450,
> so it's not an x86-only problem.
> 

With or without the patch?

Comment 10 emerald 2008-08-02 19:19:57 UTC
Without the patch it went into the recursion; with it, it compiles and installs fine.
Comment 11 Grant Edwards 2008-08-02 20:23:54 UTC
I tried with the patch, and it still recurses infinitely. I'm
going to attempt to attach:

 * a shellscript I used to unpack, patch, compile.

 * the output from that shellscript containing output from
     ebuild unpack
     patch
     ebuild compile

 * snapshots of the ebuild process tree taken every 10 seconds
   or so until the ebuild had created a few hundred processes.
Comment 12 Grant Edwards 2008-08-02 20:25:08 UTC
Created attachment 162025
Shell script to unpack/patch/compile using the proposed patch from above.
Comment 13 Grant Edwards 2008-08-02 20:28:51 UTC
Created attachment 162026
Gzipped text output from ebuild unpack, patch, ebuild compile
Comment 14 Grant Edwards 2008-08-02 20:29:50 UTC
Created attachment 162028
Gzipped text output showing "ps axf" format process tree snapshots at 10s intervals
Comment 15 Grant Edwards 2008-08-02 20:55:24 UTC
(In reply to comment #11)
> I tried with the patch, and it still recurses infinitely. I'm
> going to attempt to attach:
> 
>  * a shellscript I used to unpack, patch, compile.

Just to be paranoid, I added an "ebuild <...> clean" before the
unpack, and I still got the unending recursion when the make got
to the point where it was trying to do a "make drottest".

This time I killed the ebuild once it had about 120 processes
running, and grepped the ebuild output for 'make drottest':

 # grep 'make drottest' blastest.out2
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
TST: make drottest urout=rot1_x0y0.c opt=" -X 4 1 -1 2 -3 -Y 4 1 -1 3 -2" 
TST: make drottest urout=rot1_x1y1.c opt="" 
TST: make drottest urout=rot4_x1y1.c opt="" 
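To confirm that these lines are a repeating cycle rather than one slow test, the grep output can be tallied per distinct command. A minimal sketch using standard `grep`, `sort` and `uniq`; `summarize_tests` is a hypothetical helper, and its argument is whatever file the build output was saved to (here `blastest.out2`). In the excerpt above, each of the three `make drottest` variants appears eight times; in a runaway build those counts keep growing.

```shell
#!/bin/bash
# Hypothetical helper: count how often each distinct "make drottest"
# line appears in a saved build log, most frequent first.
summarize_tests() {
	grep -F 'make drottest' "$1" | sort | uniq -c | sort -rn
}
```

Usage: `summarize_tests blastest.out2`.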

Comment 16 Markus Dittrich (RETIRED) gentoo-dev 2008-08-02 22:54:36 UTC
(In reply to comment #15)
> (In reply to comment #11)
> > I tried with the patch, and it still recurses infinitely. I'm
> > going to attempt to attach:
> > 
> >  * a shellscript I used to unpack, patch, compile.
> 
> Just to be paranoid, I added an "ebuild <...> clean" before the
> unpack, and I still got the unending recursion when the make got
> to the point where it was trying to do a "make drottest".
> 
> This time I killed the ebuild once it had about 120 processes
> running, and grepped the ebuild output for 'make drottest':

Grant,

The way you apply the patch (via your script) won't work
properly. The ebuild's "unpack" stage does more than just
unpack the tarball; it also runs atlas' configure stage.
Rather, in the 3.9.1 ebuild add the line

epatch "${FILESDIR}"/${P}-timing.patch

right after the other patch lines, move the timing patch itself
to the files directory, re-digest and then try re-emerging.

Best,
Markus
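Comment #16's instructions amount to moving the patch into the ebuild's own unpack phase, ahead of the configure step that phase already runs. A minimal sketch, assuming an EAPI-0 ebuild whose `src_unpack` unpacks the tarball, applies its existing patches and then configures ATLAS, as described above; the placeholder comments stand for lines of the real ebuild that are not shown in this bug.

```shell
# Sketch only: the relevant part of an EAPI-0 blas-atlas-3.9.1.ebuild.
# The commented placeholders stand in for the ebuild's existing code,
# which is not reproduced here.
inherit eutils   # eclass that provides epatch

src_unpack() {
	unpack ${A}
	cd "${S}"
	# (existing epatch lines from the ebuild go here, unchanged)
	# New line: apply the timing fix before ATLAS is configured,
	# since a patch applied after "ebuild ... unpack" comes too late.
	epatch "${FILESDIR}"/${P}-timing.patch
	# (ATLAS configure step from the ebuild follows here, unchanged)
}
```

After copying blas-atlas-3.9.1-timing.patch into the files/ directory next to the ebuild, the Manifest has to be regenerated (for example with `ebuild <path-to>/blas-atlas-3.9.1.ebuild digest`) before re-emerging.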

Comment 17 Grant Edwards 2008-08-03 14:07:05 UTC
(In reply to comment #16)

> > >  * a shellscript I used to unpack, patch, compile.

> The way you apply the patch (via your script) won't work
> properly. The ebuild's "unpack" stage does more than just
> unpack the tarball; it also runs atlas' configure stage.

Of course.  I should have realized that.  I modified the ebuild
so that the patch is applied before the configure operation,
and it installed fine.

Thanks!

Comment 18 Markus Dittrich (RETIRED) gentoo-dev 2008-08-03 14:43:00 UTC
(In reply to comment #17)
> (In reply to comment #16)
> 
> > > >  * a shellscript I used to unpack, patch, compile.
> 
> > The way you apply the patch (via your script) won't work
> > properly. The ebuild's "unpack" stage does more than just
> > unpack the tarball; it also runs atlas' configure stage.
> 
> Of course.  I should have realized that.  I modified the ebuild
> so that the patch is applied before the configure operation,
> and it installed fine.
> 
> Thanks!
> 

That's good news and thanks a lot for testing! I'll add the patch
to portage then. I believe these fixes will be in the next atlas 
(3.9.2) release.

Best,
Markus
Comment 19 Markus Dittrich (RETIRED) gentoo-dev 2008-08-03 19:11:11 UTC
These patches are now in portage cvs.
Thanks to everybody for testing them.

Best,
Markus
Comment 20 Juergen Rose 2008-08-06 13:22:09 UTC
I know it is a little off topic, but where can I find a description of the meaning of the ${MY_PN}, ${PATCH_V}, ${DISTDIR} and ${P} variables?
Comment 21 Jeffrey Gardner (RETIRED) gentoo-dev 2008-08-07 12:35:18 UTC
Some are described here: http://devmanual.gentoo.org/ebuild-writing/variables/index.html; others are defined by individual ebuilds as needed.
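As a rough illustration of how the standard variables relate for this package: ${P} is the package name plus version (blas-atlas-3.9.1), ${PN} the name alone, ${PV} the version, ${FILESDIR} the files/ directory next to the ebuild, and ${DISTDIR} the distfiles directory (/usr/portage/distfiles in the emerge --info output above). The sketch below imitates the first few with bash parameter expansion. It assumes the tree is at /usr/portage (PORTDIR above) and that the version has no -rN revision suffix; Portage computes these values itself and handles many more cases, and `describe_pkg` is a hypothetical helper. ${MY_PN} and ${PATCH_V} are not standard variables; the individual ebuild defines them.

```shell
#!/bin/bash
# Hypothetical illustration of how standard ebuild variables relate
# for sci-libs/blas-atlas-3.9.1. Assumes no "-rN" revision suffix;
# Portage computes these itself and handles far more cases.
describe_pkg() {
	local portdir=$1 category=$2 p=$3
	local pn=${p%-*}     # drop the shortest "-..." suffix: blas-atlas
	local pv=${p##*-}    # keep text after the last "-": 3.9.1
	local filesdir="${portdir}/${category}/${pn}/files"
	echo "P=${p}"
	echo "PN=${pn}"
	echo "PV=${pv}"
	echo "FILESDIR=${filesdir}"
	# The path that epatch "${FILESDIR}"/${P}-timing.patch refers to
	echo "PATCH=${filesdir}/${p}-timing.patch"
}
```

Usage: `describe_pkg /usr/portage sci-libs blas-atlas-3.9.1`.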