Bug 523392 - sys-devel/llvm-3.5.0 - armv6j-hardfloat-linux-gnueabi-g++: internal compiler error: Killed (program cc1plus)
Status: RESOLVED WONTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: ARM Linux
Importance: Normal normal
Assignee: Bernard Cafarelli
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-21 17:06 UTC by John Bowler
Modified: 2014-09-22 18:30 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
complete build.log (plus extra after ebuilds) (build.log,350.96 KB, text/plain)
2014-09-21 17:06 UTC, John Bowler
emerge --info output (emerge.info,4.85 KB, text/plain)
2014-09-21 17:07 UTC, John Bowler
environment (including non-working DISTCC_IO_TIMEOUT work round) (environment,162.81 KB, text/plain)
2014-09-21 17:08 UTC, John Bowler
Command that times out on the distcc server (cc1plus.command,645 bytes, text/plain)
2014-09-21 17:09 UTC, John Bowler
Preprocessed file being compiled (distccd_4404fc4d.ii.bz2,341.75 KB, application/x-bzip)
2014-09-21 17:11 UTC, John Bowler
Test program to validate how long the compile actually takes (cc1plus.sh,723 bytes, application/x-shellscript)
2014-09-21 17:11 UTC, John Bowler
emerge -pqv '=sys-devel/llvm-3.5.0::gentoo' (emerge.pqv,805 bytes, text/plain)
2014-09-21 17:13 UTC, John Bowler

Description John Bowler 2014-09-21 17:06:47 UTC
Created attachment 385246 [details]
complete build.log (plus extra after ebuilds)

The bug is specific to distcc builds, which are necessary to build llvm on low-memory ARM systems (e.g. a Raspberry Pi B+, as in this case).  Unfortunately, since the upgrade to 3.5.0 the compilation of Function.cpp takes more than 5 minutes on my powerful 64-bit x86 system, and this exceeds the default DISTCC_IO_TIMEOUT of 300s.  The remote compile fails, a local compile is attempted, and some time after cc1plus has grown to 250MByte of RSS (on a machine with 512MByte of RAM) the aggressive Linux OOM killer kills it.

The problem won't arise on larger systems because the local compile will succeed there (hiding the timeout).  It probably won't arise on Linux kernels other than 3.12.x either, because the OOM killer action is totally unnecessary (there is more than enough swap space to do the compile).

It's not clear how to change the distcc IO timeout - adding declare -x DISTCC_IO_TIMEOUT=600 to environment (see attachments) does not help.  I'm still investigating a work-round.

Note that the attached build.log has a lot of history in it - an emerge followed by several attempts to continue the compile step with "ebuild install".  Marked the bug as minor because it is pretty obvious what is going on; just difficult to find a work-round.
Comment 1 John Bowler 2014-09-21 17:07:25 UTC
Created attachment 385248 [details]
emerge --info output
Comment 2 John Bowler 2014-09-21 17:08:14 UTC
Created attachment 385250 [details]
environment (including non-working DISTCC_IO_TIMEOUT work round)

The only change to the environment from the original emerge generated one was to add DISTCC_IO_TIMEOUT
Comment 3 John Bowler 2014-09-21 17:09:01 UTC
Created attachment 385252 [details]
Command that times out on the distcc server

This is the command copied from ps ww
Comment 4 John Bowler 2014-09-21 17:11:16 UTC
Created attachment 385254 [details]
Preprocessed file being compiled

This is the file the client machine (where the emerge is running) sent to the compile server.  (Compressed because it is so large)
Comment 5 John Bowler 2014-09-21 17:11:59 UTC
Created attachment 385256 [details]
Test program to validate how long the compile actually takes
Comment 6 John Bowler 2014-09-21 17:12:59 UTC
The attachment cc1plus.sh validates that the compile does actually run to completion and verifies how long it takes - just over the 300s limit.  Output is:

+ cp distccd_4404fc4d.ii /tmp/distccd_4404fc4d.ii
+ env -i /usr/libexec/gcc/armv6j-hardfloat-linux-gnueabi/4.8.3/cc1plus -fpreprocessed /tmp/distccd_4404fc4d.ii -quiet -dumpbase distccd_4404fc4d.ii -march=armv6zk -mfpu=vfp -mfloat-abi=hard -march=armv6zk -mfpu=vfp -mfloat-abi=hard -march=armv6zk -mfpu=vfp -mfloat-abi=hard -mtls-dialect=gnu -auxbase-strip /tmp/distccd_4439fc4d.o -g -g -g -Os -Os -Os -Woverloaded-virtual -Wcast-qual -Wpedantic -Wno-long-long -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wno-maybe-uninitialized -Wno-missing-field-initializers -std=c++11 -fvisibility-inlines-hidden -fno-exceptions -fPIC -ffunction-sections -fdata-sections -fstack-protector -o /tmp/ccAiCu7t.s

real    5m52.762s
user    5m52.694s
sys     0m0.226s
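
To make the comparison with the timeout explicit, here is a minimal sketch that converts a "real" time as printed above into whole seconds and checks it against a limit. The helper names to_seconds and exceeds_limit are made up for illustration, and 300 is the default timeout quoted in the description.

```shell
#!/bin/sh
# Convert a wall-clock time in the "real" format shown above
# (for example 5m52.762s) to whole seconds, rounded down.
to_seconds() {
    minutes=${1%%m*}          # text before the "m"
    rest=${1#*m}              # text after the "m", e.g. 52.762s
    seconds=${rest%%.*}       # drop the fractional part
    seconds=${seconds%s}      # drop a trailing "s" when there was no fraction
    echo $(( minutes * 60 + seconds ))
}

# Succeed when the measured time ($1) is longer than the limit in seconds ($2).
exceeds_limit() {
    [ "$(to_seconds "$1")" -gt "$2" ]
}

# 300 is the default distcc I/O timeout quoted in the bug description.
if exceeds_limit 5m52.762s 300; then
    echo "compile took $(to_seconds 5m52.762s)s, over the 300s limit"
fi
```

For the run above this gives 352 seconds, so the remote job overshoots the default limit by a little under a minute.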
Comment 7 John Bowler 2014-09-21 17:13:40 UTC
Created attachment 385258 [details]
emerge -pqv '=sys-devel/llvm-3.5.0::gentoo'
Comment 8 John Bowler 2014-09-21 18:37:08 UTC
It looks like this is a distcc bug: although some distcc documentation describes 'DISTCC_IO_TIMEOUT', the distcc installed by Gentoo (sys-devel/distcc-3.1-r9) does not seem to support it.

This is why my declaration had no effect; I can demonstrate that it was correct by defining DISTCC_FALLBACK=0 in the same way and this does prevent the certain-to-fail local compile.

I increased the severity of the bug to normal because this is preventing me from building llvm 3.5.0.  If DISTCC_IO_TIMEOUT worked I could make a work-round, and one could be incorporated into the llvm ebuild.
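
For reference, a minimal sketch of passing both settings to a single command. The wrapper name with_distcc_settings is hypothetical, and 600 is simply the value tried in this report; with distcc 3.1-r9 only DISTCC_FALLBACK appears to have any effect.

```shell
#!/bin/sh
# Run any command with the two distcc client settings from this report.
# with_distcc_settings is a hypothetical helper name, not part of distcc.
with_distcc_settings() {
    # DISTCC_FALLBACK=0: do not retry a failed remote job locally.
    # DISTCC_IO_TIMEOUT=600: the value tried in this report, in seconds;
    # per the report it is not honoured by sys-devel/distcc-3.1-r9.
    env DISTCC_FALLBACK=0 DISTCC_IO_TIMEOUT=600 "$@"
}

# Show what a child process would see, without invoking distcc itself.
with_distcc_settings env | grep '^DISTCC_' | sort
```

Any build command can be given in place of env, so the settings apply to that one invocation only and nothing is exported into the calling shell.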
Comment 9 Jeroen Roovers (RETIRED) gentoo-dev 2014-09-22 08:06:30 UTC
Comment on attachment 385246 [details]
complete build.log (plus extra after ebuilds)

distcc[8293] (dcc_select_for_read) ERROR: IO timeout
distcc[8293] (dcc_r_token_int) ERROR: read failed while waiting for token "DONE"
distcc[8293] (dcc_r_result_header) ERROR: server provided no answer. Is the server configured to allow access from your IP address?
Does the server have the compiler installed? Is the server configured to access the compiler?
distcc[8293] Warning: failed to distribute /var/tmp/portage/sys-devel/llvm-3.5.0/work/llvm-3.5.0.src/lib/IR/Function.cpp to hippopopus.jbowler.com, running locally instead
armv6j-hardfloat-linux-gnueabi-g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.gentoo.org/> for instructions.
distcc[8293] ERROR: compile /var/tmp/portage/sys-devel/llvm-3.5.0/work/llvm-3.5.0.src/lib/IR/Function.cpp on localhost failed with exit code 4
/var/tmp/portage/sys-devel/llvm-3.5.0/work/llvm-3.5.0.src/Makefile.rules:1519: recipe for target '/var/tmp/portage/sys-devel/llvm-3.5.0/work/llvm-3.5.0.src-.arm/lib/IR/Release/Function.o' failed
make[1]: *** [/var/tmp/portage/sys-devel/llvm-3.5.0/work/llvm-3.5.0.src-.arm/lib/IR/Release/Function.o] Error 1


distcc is working fine. You ran out of memory.
Comment 10 Jeroen Roovers (RETIRED) gentoo-dev 2014-09-22 08:10:52 UTC
If a remote distcc job fails, distcc simply abandons that job and tries again locally. It is there that it failed, using too much memory and getting killed.

You have five make jobs on a system with less than half a gigabyte of memory. Each compiler job (preprocessing, compiling, assembling, linking) might easily take up half a gigabyte.
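
As a rough illustration of that arithmetic, the sketch below estimates how many compiler jobs fit in RAM at once. The helper name max_local_jobs is made up, and the per-job figure is an assumption: 250 MiB is the RSS the description reports for cc1plus before it was killed, so the real peak may well be higher.

```shell
#!/bin/sh
# Estimate how many compiler jobs fit in RAM at once.
# Arguments: total memory in KiB, assumed peak memory per job in KiB.
# Never reports less than 1, since one job has to run somewhere.
max_local_jobs() {
    n=$(( $1 / $2 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Figures from this report: 512 MiB of RAM and an assumed 250 MiB per job.
max_local_jobs $(( 512 * 1024 )) $(( 250 * 1024 ))

# On a live Linux system, MemTotal can be read from /proc/meminfo instead:
#   total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
```

With those figures the estimate is 2 jobs, and with half a gigabyte per job it drops to 1, either way well short of five.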

The best thing to do on such a system is to compile nothing locally, and instead prepare packages on a proper workstation that cross-compiles everything.
Comment 11 John Bowler 2014-09-22 15:16:06 UTC
My fix works with distcc 3.2_rc1 on the *client* side (my server was still 3.1).

The issue is that DISTCC_IO_TIMEOUT isn't supported in 3.1 and 3.2 has been out-for-testing for (apparently) 3 years (might be wrong about that; it's based on the 2011 date in the portage testing mask).
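
A small sketch of that version boundary, assuming (from the observations above, not from the distcc changelog) that support starts with the 3.2 series. The function supports_io_timeout is a hypothetical helper, not a distcc command.

```shell
#!/bin/sh
# Succeed when a distcc version string is 3.2 or later, the boundary
# assumed from this report for DISTCC_IO_TIMEOUT support.
# supports_io_timeout is a hypothetical helper, not a distcc command.
supports_io_timeout() {
    version=${1%%[_-]*}         # strip suffixes such as _rc1 or -r9
    major=${version%%.*}        # digits before the first dot
    minor=${version#*.}         # everything after the first dot
    minor=${minor%%.*}          # keep only the second component
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }
}

for v in 3.1-r9 3.2_rc1; do
    if supports_io_timeout "$v"; then
        echo "$v: DISTCC_IO_TIMEOUT expected to work"
    else
        echo "$v: DISTCC_IO_TIMEOUT expected to be ignored"
    fi
done
```

The check only looks at the major and minor numbers, so release-candidate and Gentoo revision suffixes are deliberately ignored.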

"distcc works fine for me" isn't really true, is it?  It doesn't work - it falls back to the local compile and *that* works (and it takes 5 minutes to find out then probably a lot longer on the local system).
Comment 12 John Bowler 2014-09-22 15:46:15 UTC
Incidentally, the subject line on the bug is wrong.  This bug isn't about cc1plus failing because a rabid Linux OOM killer kills it (that's a separate bug); it's a bug about distcc timing out a compile that takes 5m52s.

I listed llvm originally because it happens in the llvm build and, given the built in default timeout of 5m in distcc, it can be fixed in the project ebuild, but I guess you could say the work-round is client (emerge machine) specific and it can be fixed in make.conf.  (So it's nothing to do with llvm, just a distcc issue.)
Comment 13 John Bowler 2014-09-22 18:30:57 UTC
See:

https://bugs.gentoo.org/show_bug.cgi?id=518884

A fix for this bug is blocked by that.

BTW, the OOM killer issue seems to be a bug in Linux 3.12.y; the process tree above cc1plus (or the later ld which suffers from it if this bug is fixed) has plenty of swappable processes waiting on the cc/ld; Linux chooses to kill the child rather than swap (or page) the parent.

I have put the status back to 'wontfix', although it clearly *can* be fixed (whatever you think the bug is: either the too-short distcc timeout or the OOM problem).