Bug 351663 - =dev-vcs/git-1.7.3.4-r1 intermittent failure to emerge (make exiting with jobserver tokens available)
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Development
Hardware: All Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
 
Reported: 2011-01-14 15:14 UTC by Israel G. Lugo
Modified: 2011-06-27 19:42 UTC (History)

Attachments
emerge --info =dev-vcs/git-1.7.3.4-r1 (info.txt,4.37 KB, text/plain)
2011-01-14 15:17 UTC, Israel G. Lugo
build.log (build.log,6.94 KB, text/plain)
2011-01-14 15:20 UTC, Israel G. Lugo
failed build.log with FEATURES=-sandbox (build.log,14.24 KB, text/plain)
2011-01-25 10:46 UTC, Israel G. Lugo

Description Israel G. Lugo 2011-01-14 15:14:31 UTC
I was doing emerge -ve system && emerge -ve world on a small server after performing all upgrades from the past months (including an upgrade from gcc-4.3.4 to gcc-4.4.4-r2), when I received the following error:

    [...]
    CC builtin/tag.o
builtin/tag.c: In function 'show_reference':
builtin/tag.c:79: warning: ignoring return value of 'fwrite', declared with attribute warn_unused_result
    CC builtin/tar-tree.o
    CC builtin/unpack-file.o
    CC builtin/unpack-objects.o
    CC builtin/update-index.o
    CC builtin/update-ref.o
make: *** [builtin/update-ref.o] Terminated
make: *** Deleting file `builtin/update-ref.o'
make: INTERNAL: Exiting with 3 jobserver tokens available; should be 2!
emake failed
 * ERROR: dev-vcs/git-1.7.3.4-r1 failed:
 *   emake failed
 * 
 * Call stack:
 *     ebuild.sh, line  56:  Called src_compile
 *   environment, line 2967:  Called die
 * The specific snippet of code:
 *       git_emake || die "emake failed";
 *

I'm using MAKEOPTS="--jobs=2 --load-average=2" and calling emerge itself with --jobs=1 --load-average=2. The problem is intermittent; I have since emerged the package successfully by hand without changing anything else (using "emerge -va1 =dev-vcs/git-1.7.3.4-r1"). Also, the package had been successfully emerged before, during the "emerge -vaNuD world" phase (albeit with an older gcc at the time).

I would suggest it may be a parallelism issue, similar for example to what was found in bug #337715 -- I haven't looked at the code, though.
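
One way to test the parallelism theory is to rebuild with make restricted to a single job and see whether the failure still shows up. The sketch below assumes that a MAKEOPTS value set in the environment takes precedence over the one in make.conf for that single invocation; `serial_rebuild` is a hypothetical helper name, not something Portage provides.

```shell
# Hypothetical helper: run the given command with make limited to one job.
# Assumption: MAKEOPTS from the environment overrides make.conf for this run.
serial_rebuild() {
    MAKEOPTS="-j1" "$@"
}

# Intended use (not run here):
#   serial_rebuild emerge -v1 =dev-vcs/git-1.7.3.4-r1
```

If the package never fails with a single job but does fail with --jobs=2, that points towards a race in the build rather than a problem with the toolchain.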

As an added note, this is a 32-bit Gentoo Hardened installation; I'm using the hardened toolchain and kernel. Details will follow in attachment.
Comment 1 Israel G. Lugo 2011-01-14 15:17:10 UTC
Created attachment 259826 [details]
emerge --info =dev-vcs/git-1.7.3.4-r1
Comment 2 Israel G. Lugo 2011-01-14 15:20:23 UTC
Created attachment 259827 [details]
build.log

Also, the output of emerge -pqv =dev-vcs/git-1.7.3.4-r1 is:
[ebuild   R   ] dev-vcs/git-1.7.3.4-r1  USE="bash-completion blksha1 curl iconv perl threads webdav -cgi -cvs -doc -emacs -gtk (-ppcsha1) -subversion -tk -xinetd"
Comment 3 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2011-01-15 13:38:20 UTC
Looks like an internal issue in make, assigning to base-system@.
Comment 4 Israel G. Lugo 2011-01-15 14:11:36 UTC
(In reply to comment #3)
> Looks like an internal issue in make, assigning to base-system@.
> 

This is just a guess, since I haven't looked at the code, but I have seen this before caused by compile-time dependencies being incorrectly specified on the Makefiles -- bug 337715 for example was manifesting itself in a very similar way, and it turned out the code was including a few .h files without declaring the dependencies on the respective targets (see bug 337715, comment 8).

Even if that does happen to be the case though, it seems like make should be able to detect this and provide a useful error, instead of just "omgwtfbye".
Comment 5 SpanKY gentoo-dev 2011-01-15 23:57:35 UTC
while i have seen the error "make: INTERNAL: Exiting with 3 jobserver tokens available; should be 2!" before, it was never relevant to the failure at hand

does it fail if you do FEATURES=-sandbox ?
Comment 6 Israel G. Lugo 2011-01-16 01:50:43 UTC
(In reply to comment #5)
> does it fail if you do FEATURES=-sandbox ?
> 

I'm entirely sure I can replicate it at will; the package built successfully upon upgrade, then it failed on emerge -ve world, then it succeeded again manually.

I can try to rebuild the package several times with sandbox, then several times without, to try and catch a pattern -- I won't be able to do that for a couple of days, though, since this is a production server and I need to do other unrelated stuff on it first.
Comment 7 Israel G. Lugo 2011-01-16 01:51:23 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > does it fail if you do FEATURES=-sandbox ?
> > 
> 
> I'm entirely sure I can replicate it at will; the package built successfully
> upon upgrade, then it failed on emerge -ve world, then it succeeded again
> manually.

Sorry, I should've written "I'm *not* entirely sure I can replicate it at will".
Comment 8 Israel G. Lugo 2011-01-20 03:25:42 UTC
I am seeing similar intermittent failure behavior on another server (amd64 hardened), with a different ebuild: =sys-process/audit-1.7.3

The immediate cause for failure in that case seems to be a missing .h file, as per the gcc error, followed by a similar "make: INTERNAL: Exiting with 4 jobserver tokens available; should be 3!".

I have reported that problem as bug 352198, with a detailed explanation there. Please take a look, as the two may be related. They could be if it's an internal make issue, but it could also be a problem with the Makefiles in sys-process/audit or something.

As for this bug, I shall try to recreate the problem (emerging git) with and without sandbox as requested. Haven't had much time to look at this since I'm quite busy performing seasonal upgrades to a lot of servers at work.
Comment 9 Israel G. Lugo 2011-01-21 17:44:07 UTC
(In reply to comment #5)
> while i have seen the error "make: INTERNAL: Exiting with 3 jobserver tokens
> available; should be 2!" before, it was never relevant to the failure at hand
> 
> does it fail if you do FEATURES=-sandbox ?
> 

I have been unable to reproduce the error while emerging the package by itself, without changing anything else. I ran "emerge -va1 =dev-vcs/git-1.7.3.4-r1" 20 times and there was no failure. The intermittent nature and the fact that so far it has only failed while building alongside other packages would seem to suggest some kind of race being triggered.

Regarding bug 352198, it turned out to be a missing dependency on a Makefile. It caused a race condition where a given header was not guaranteed to be present when a certain source code unit needed it for compilation.

I will try emerging several other ebuilds simultaneously to see if I can reproduce this problem more accurately. If so, I will then try with FEATURES=-sandbox.
Comment 10 Israel G. Lugo 2011-01-25 10:46:11 UTC
Created attachment 260653 [details]
failed build.log with FEATURES=-sandbox

(In reply to comment #9)
> 
> I will try emerging several other ebuilds simultaneously to see if I can
> reproduce this problem more accurately. If so, I will then try with
> FEATURES=-sandbox.
> 

I've been able to reproduce this fairly accurately by doing "emerge -v1 =dev-vcs/git-1.7.3.4-r1" in a loop, while emerging perl in the background on another tty to add CPU load. It will usually fail around the 4th try. Emerging with FEATURES=-sandbox fails as well.
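
The loop described above could be sketched roughly as follows. `first_failure` is a hypothetical helper name; it runs a command up to a given number of times and reports which attempt failed first. The command's own output is sent to stderr so that only the attempt number lands on stdout.

```shell
# Hypothetical helper: run a command until it fails or max_tries is reached.
# Prints the number of the first failing attempt, or 0 if every attempt passed.
first_failure() {
    max_tries=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        # Keep the command's stdout off our stdout so callers can capture
        # just the attempt number.
        if ! "$@" >&2; then
            echo "$attempt"
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo 0
    return 0
}

# Intended use (not run here), with some other load running on another tty:
#   first_failure 20 emerge -v1 =dev-vcs/git-1.7.3.4-r1
```

Stopping at the first failure, instead of carrying on to the next attempt, makes it easier to look at the failed build before anything else touches it.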

The intermittent nature would seem at first glance to suggest a parallel build issue causing some kind of race...

Curiously, the errors which I have been getting are slightly different from the initial report. I am attaching the build.log of a failed compilation, with FEATURES=-sandbox. The relevant final part follows inline:

ws.c:236: warning: ignoring return value of 'fwrite', declared with attribute warn_unused_result
    CC wt-status.o
    CC xdiff-interface.o
    CC block-sha1/sha1.o
    CC thread-utils.o
    CC compat/strlcpy.o
    CC xdiff/xdiffi.o
    CC xdiff/xprepare.o
    CC xdiff/xutils.o
    CC xdiff/xemit.o
    CC xdiff/xmerge.o
    CC xdiff/xpatience.o
    CC imap-send.o
    CC shell.o
    CC show-index.o
    CC upload-pack.o
    CC http-backend.o
    CC http.o
    CC http-walker.o
    CC http-fetch.o
    CC http-push.o
    CC daemon.o
    CC remote-curl.o
    GEN git-add--interactive
    GEN git-difftool
Writing perl.mak for Git
Writing perl.mak for Git
    GEN git-archimport
make[2]: *** [perl.mak] Error 1
make[1]: *** [instlibdir] Error 2
make: *** [git-add--interactive] Error 2
make: *** Waiting for unfinished jobs....
emake failed
 * ERROR: dev-vcs/git-1.7.3.4-r1 failed:
 *   emake failed
 * 
 * Call stack:
 *     ebuild.sh, line  56:  Called src_compile
 *   environment, line 2964:  Called die
 * The specific snippet of code:
 *       git_emake || die "emake failed";
Comment 11 Israel G. Lugo 2011-01-25 10:48:52 UTC
(In reply to comment #10)
> Writing perl.mak for Git
> Writing perl.mak for Git
>     GEN git-archimport
> make[2]: *** [perl.mak] Error 1
> make[1]: *** [instlibdir] Error 2
> make: *** [git-add--interactive] Error 2
> make: *** Waiting for unfinished jobs....
> emake failed
>  * ERROR: dev-vcs/git-1.7.3.4-r1 failed:
>  *   emake failed

Unless I'm triggering this particular problem inadvertently by emerging perl in the background (i.e. a race caused by merging perl back into the filesystem while this build needs it). I shall try again, doing something completely different like compiling the kernel.
Comment 12 SpanKY gentoo-dev 2011-02-12 22:57:44 UTC
your perl test case doesn't make much sense i don't think, as you'll be updating files which you're also trying to execute.  that failure case is unrelated to the original issue.

the only case i'm interested in is the one where it fails like so:
make: *** [builtin/update-ref.o] Terminated

we need to figure out why that is being Terminated.  is there anything in `dmesg` to indicate the kernel is killing it due to OOM ?
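
a rough way to check for that is to filter the kernel log for the lines that usually accompany an OOM kill. the patterns below are an assumption about typical kernel wording, which varies between kernel versions, and `oom_lines` is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log text on stdin and keep only lines
# that commonly appear when the OOM killer has acted.
# Assumption: the three patterns cover the usual wording; kernels differ.
oom_lines() {
    grep -i -E 'out of memory|oom-killer|killed process'
}

# Intended use (not run here):
#   dmesg | oom_lines
```

no output from the filter does not prove the kernel was uninvolved, since the ring buffer may have wrapped since the failed build.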
Comment 13 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2011-06-27 19:42:20 UTC
No response from user, closing.