Bug 632670 - dev-lisp/sbcl-1.4.0 fails to build: CORRUPTION WARNING in SBCL: Memory fault - The integrity of this image is possibly compromised
Summary: dev-lisp/sbcl-1.4.0 fails to build: CORRUPTION WARNING in SBCL: Memory fault ...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Common Lisp Bugs
URL:
Whiteboard:
Keywords:
Duplicates: 632646 633860
Depends on:
Blocks:
Reported: 2017-10-02 04:32 UTC by Lee Starnes
Modified: 2017-11-13 22:17 UTC
CC List: 21 users

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log (with control codes stripped) (build.log, 205.18 KB, text/x-log)
2017-10-02 04:32 UTC, Lee Starnes
emerge --info '=dev-lisp/sbcl-1.4.0::gentoo' (emerge-info.txt, 6.07 KB, text/plain)
2017-10-02 04:33 UTC, Lee Starnes
emerge -pqv '=dev-lisp/sbcl-1.4.0::gentoo' (emerge-pqv.txt, 465 bytes, text/plain)
2017-10-02 04:33 UTC, Lee Starnes
Respect default {C,LD}FLAGS values, fixing broken binaries. (sbcl-1.4.0.patch, 1.42 KB, patch)
2017-10-04 22:38 UTC, Mihai Moldovan
[v2] Respect default {C,LD}FLAGS values, fixing broken binaries. (sbcl-1.4.0.patch, 2.10 KB, patch)
2017-10-06 00:10 UTC, Mihai Moldovan
build.log and emerge --info (file_632670.txt, 41.68 KB, text/plain)
2017-10-06 19:56 UTC, Dmitry Derevyanko
script building logs (file_632670.txt, 25.12 KB, text/plain)
2017-10-06 22:44 UTC, Dmitry Derevyanko

Description Lee Starnes 2017-10-02 04:32:44 UTC
Created attachment 497390 [details]
build.log (with control codes stripped)

Emerging =dev-lisp/sbcl-1.4.0 on amd64 fails with the following error:

+ ./src/runtime/sbcl --core output/cold-sbcl.core --lose-on-corruption --no-sysinit --no-userinit
This is SBCL 1.4.0, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.
CORRUPTION WARNING in SBCL pid 21897(tid 0x7ffff7fa1700):
Memory fault at (nil) (pc=(nil), sp=0x7ffff6c9feb0)
The integrity of this image is possibly compromised.
Exiting.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>  * ERROR: dev-lisp/sbcl-1.4.0::gentoo failed (compile phase):
 *   make failed
 * 
 * Call stack:
 *     ebuild.sh, line 115:  Called src_compile
 *   environment, line 2466:  Called die
 * The specific snippet of code:
 *       env - HOME="${T}" PATH="${PATH}" CC="$(tc-getCC)" AS="$(tc-getAS)" LD="$(tc-getLD)" CPPFLAGS="${CPPFLAGS}" CFLAGS="${CFLAGS}" ASFLAGS="${ASFLAGS}" LDFLAGS="${LDFLAGS}" GNUMAKE=make ./make.sh "sh ${bindir}/run-sbcl.sh --no-sysinit --no-userinit --disable-debugger" || die "make failed";
 * 
 * If you need support, post the output of `emerge --info '=dev-lisp/sbcl-1.4.0::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=dev-lisp/sbcl-1.4.0::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/dev-lisp/sbcl-1.4.0/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/dev-lisp/sbcl-1.4.0/temp/environment'.
 * Working directory: '/var/tmp/portage/dev-lisp/sbcl-1.4.0/work/sbcl-1.4.0'
 * S: '/var/tmp/portage/dev-lisp/sbcl-1.4.0/work/sbcl-1.4.0'
Comment 1 Lee Starnes 2017-10-02 04:33:20 UTC
Created attachment 497392 [details]
emerge --info '=dev-lisp/sbcl-1.4.0::gentoo'
Comment 2 Lee Starnes 2017-10-02 04:33:39 UTC
Created attachment 497394 [details]
emerge -pqv '=dev-lisp/sbcl-1.4.0::gentoo'
Comment 3 Andrey Grozin gentoo-dev 2017-10-02 13:12:46 UTC
I had this problem on ~x86 starting from sbcl-1.3.18, see https://bugs.launchpad.net/sbcl/+bug/1710375
On ~amd64 everything was fine until now, but sbcl-1.4.0 fails on ~amd64 in the same way. git bisect on my ~x86 box showed that the patch which introduced the problem was

commit f4e758d44524b25ae23cafadb2f5eda76323f4dd
Author: Douglas Katzman <dougk@google.com>
Date: Tue May 2 19:15:33 2017 -0400

    Give user more control over C compiler flags in src/runtime.

    And though make-target-contrib.sh clears EXTRA_CFLAGS, append them
    in contrib/asdf-module.mk instead of assigning,
    to allow driving the contribs build step differently.

So, the problem is with the user's $CFLAGS. With upstream's default CFLAGS sbcl builds successfully; Gentoo propagates the user's (arbitrary) CFLAGS into the build process, and it fails. If we find out which specific flag[s] cause the failure, we'll be able to filter it [them] out.
Comment 4 Jouni Kosonen 2017-10-02 14:30:13 UTC
(In reply to Andrey Grozin from comment #3)
> So, the problem is with user's $CFLAGS. With their default CFLAGS sbcl
> builds successfully; Gentoo propagates user's (arbitrary) CFLAGS into the
> build process, and it fails. If we'll find out what specific flag[s] cause
> the failure, we'll be able to filter it [them] out.

My build fails with this error too. 
For a data point, my CFLAGS as portageq sees them are
-march=amdfam10 -O2 -pipe
and what gets recorded in the build log is 
-march=amdfam10 -O2 -pipe -Wall -Wsign-compare -Wpointer-arith

Building without -pipe in CFLAGS fails the same way.
Building without -march in CFLAGS fails the same way.
Building without -O2 in CFLAGS fails the same way.
Building with empty CFLAGS fails the same way.

CORRUPTION WARNING in SBCL pid 28799(tid 0x7ffff7f65700):
Memory fault at (nil) (pc=(nil), sp=0x7ffff6c8feb0)
The integrity of this image is possibly compromised.
Exiting.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>  * ERROR: dev-lisp/sbcl-1.4.0::gentoo failed (compile phase):
 *   make failed

dev-lisp/sbcl-1.3.21 still builds successfully here.
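
The elimination test described in this comment (rebuild with one flag removed at a time) can be sketched in shell. This is a minimal sketch, not something Portage provides: `drop_one_flag` is a hypothetical helper name, and the script only prints the candidate flag sets rather than starting any build.

```shell
#!/bin/sh
# Print one line per variant of the given flag string, each variant with a
# single flag removed, so that every variant can be tried in its own build.
# drop_one_flag is a hypothetical helper, not part of Portage.
drop_one_flag() {
    flags=$1
    for skip in $flags; do
        variant=
        for f in $flags; do
            # Leave out the flag under test, keep all others in order.
            [ "$f" = "$skip" ] && continue
            variant="${variant:+$variant }$f"
        done
        printf '%s\n' "$variant"
    done
}

# The CFLAGS reported in this comment.
drop_one_flag "-march=amdfam10 -O2 -pipe"
```

Each printed line is one candidate CFLAGS value to pass to a separate build attempt. As reported above, every variant, including empty CFLAGS, failed in the same way, which is what pointed away from any single user flag.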
Comment 5 Small_Penguin 2017-10-02 21:22:17 UTC
I can confirm this issue, although I only use -march=native -O2 -pipe -fno-ident -ggdb
LDFLAGS="${LDFLAGS} -Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -Wl,--sort-common -Wl,-z,now"

This happens on sandy bridge and haswell.
Comment 6 Juergen Rose 2017-10-03 15:10:13 UTC
It fails here with gcc-6.4 on 
Skylake systems with:
   CFLAGS="-march=native -O2 -pipe"
   CFLAGS="-march=broadwell -O2 -pipe"
   CFLAGS="-march=skylake-avx512 -O2 -pipe" ,
on Ivy Bridge systems with:
   CFLAGS="-march=native -O2 -pipe"
   CFLAGS="-march=ivybridge -O2 -pipe"
and on AMD Phenom systems with:
   CFLAGS="-march=amdfam10 -O2 -pipe" .
Comment 7 Mihai Moldovan 2017-10-04 20:14:12 UTC
(In reply to Andrey Grozin from comment #3)
> I had this problem on ~x86 starting from sbcl-1.3.18, see
> https://bugs.launchpad.net/sbcl/+bug/1710375
> On ~amd64 everything was fine until now, but sbcl-1.4.0 fails on ~amd64 in
> the same way. git bisect on my ~x86 box showed that the patch which
> introduced the problem was
> 
> commit f4e758d44524b25ae23cafadb2f5eda76323f4dd
> Author: Douglas Katzman <dougk@google.com>
> Date: Tue May 2 19:15:33 2017 -0400
> 
>     Give user more control over C compiler flags in src/runtime.
> 
>     And though make-target-contrib.sh clears EXTRA_CFLAGS, append them
>     in contrib/asdf-module.mk instead of assigning,
>     to allow driving the contribs build step differently.
> 
> So, the problem is with user's $CFLAGS. With their default CFLAGS sbcl
> builds successfully; Gentoo propagates user's (arbitrary) CFLAGS into the
> build process, and it fails. If we'll find out what specific flag[s] cause
> the failure, we'll be able to filter it [them] out.

I can't confirm this on my ~amd64 machine.

Overriding CFLAGS, CXXFLAGS and LDFLAGS to an empty value with GCC 6.3.0 and 6.3.0 leads to the same result: the build still fails.

Compiling the package with GCC 5.4.0 and my default {C{,XX},LD}FLAGS as well as an empty value for these works fine.

My current assessment is that filtering out *FLAGS won't help us. This problem seems to be triggered by... code generation changes in GCC 6+? There seems to be some incompatibility.
Comment 8 Mihai Moldovan 2017-10-04 22:33:12 UTC
Thanks to a very helpful upstream developer, it was determined that the CFLAGS and LINKFLAGS overrides from #526194 and #620532 break no-pie builds. For non-PIE builds to work, we need to respect the default value of LINKFLAGS, which includes -Wl,-export-dynamic, and additionally use -fno-pie when compiling the code into object files.

Both things were overridden by the overzealous sed replacements.

Instead of overriding that in a hard way, let's take a saner approach and just append our values.
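
The difference between overriding and appending can be illustrated with a small shell sketch. The sample line is a hypothetical stand-in for a Makefile line carrying upstream's default; the real SBCL makefiles and the actual ebuild patch are not reproduced here, and `override_line` and `append_line` are illustrative names only.

```shell
#!/bin/sh
# Hypothetical stand-in for a Makefile line that carries an upstream default.
sample='LINKFLAGS += -Wl,-export-dynamic'
# Example user flags, taken from the build log excerpts in this report.
user_ldflags='-Wl,-O1 -Wl,--as-needed'

# Overriding: everything after "+=" is replaced, so the default is lost.
override_line() {
    printf '%s\n' "$1" | sed "s@^LINKFLAGS += .*@LINKFLAGS += $2@"
}

# Appending: the existing line is kept and the user's flags follow it.
append_line() {
    printf '%s\n' "$1" | sed "s@^\(LINKFLAGS += .*\)@\1 $2@"
}

override_line "$sample" "$user_ldflags"
append_line "$sample" "$user_ldflags"
```

The first call prints a line without -Wl,-export-dynamic, while the second keeps it in front of the user's flags. This mirrors the reasoning in the comment, under the assumption that the default is set on a single `LINKFLAGS +=` line.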
Comment 9 Mihai Moldovan 2017-10-04 22:38:30 UTC
Created attachment 497704 [details, diff]
Respect default {C,LD}FLAGS values, fixing broken binaries.

Please test this patch - especially on x86 as well, since I have no such machine.
Comment 10 Mihai Moldovan 2017-10-04 22:46:24 UTC
One drawback of this approach is that user CFLAGS will be passed to GCC twice per invocation - once normally as the default value and later on via appending to the first affected line in the Makefile. This isn't pretty, but necessary to let users override some of the default CFLAGS values, particularly -O3, as the last switch on the command line takes precedence.

We could clear CFLAGS in the environment when calling make.sh, but that might have unforeseen consequences I'm not keen on exploring.

Passing the CFLAGS twice thus sounds like something I can live with.


Another change that this introduces is that debugging symbols will be enabled by default, since -g is part of the default CFLAGS values that gets appended in the Makefile.

If that's a problem for anyone, we can easily change the replacement line to something like "@CFLAGS += -g \(.*\)@CFLAGS += \1 ${CFLAGS}@" instead. In my opinion, debug symbols for sbcl won't be a huge problem though, so I'm keeping it simple for now.
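
To make the two variants concrete, here is a minimal sketch that applies both replacements to a sample line. The sample line and the CFLAGS value are assumptions for illustration, and the first expression is only a guess at the general shape of an appending replacement; the second expression is the one quoted in this comment. `keep_debug` and `drop_debug` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sample of the default line; the actual Makefile may differ.
sample='CFLAGS += -g -Wall -O3'
# Example user flags.
CFLAGS='-march=native -O2 -pipe'

# Variant that keeps -g: append the user's flags to the existing line.
keep_debug() {
    printf '%s\n' "$1" | sed "s@^\(CFLAGS += .*\)@\1 ${CFLAGS}@"
}

# Variant quoted in the comment: drop the leading -g, then append.
drop_debug() {
    printf '%s\n' "$1" | sed "s@CFLAGS += -g \(.*\)@CFLAGS += \1 ${CFLAGS}@"
}

keep_debug "$sample"
drop_debug "$sample"
```

In both results the user's -O2 comes after the default -O3, so the user's optimization level is the one that takes effect, since the last such switch on the command line wins.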
Comment 11 Jouni Kosonen 2017-10-04 23:53:06 UTC
The ebuild patched as in comment #9 does complete the build on my amd64 with gcc-6.4.0.
Comment 12 Andrey Grozin gentoo-dev 2017-10-05 15:30:20 UTC
(In reply to Mihai Moldovan from comment #9)
> Created attachment 497704 [details, diff]
> Respect default {C,LD}FLAGS values, fixing broken binaries.
> 
> Please test this patch - especially on x86 as well, since I have no such
> machine.
The patched sbcl-1.4.0.ebuild successfully builds on both my ~amd64 and ~x86 boxes. The resulting sbcl successfully compiles maxima (and this is a strong check!). Thanks.
Comment 13 Michael J Coss 2017-10-05 23:30:51 UTC
I have a similar problem with the 1.4.0 ebuild, caused by the rather aggressive replacement of both CFLAGS and LINKFLAGS. However, in my case it wasn't the CFLAGS but rather the replacement of LINKFLAGS with my LDFLAGS that caused an issue with the build. My LDFLAGS adds -s. The resulting stripped sbcl binary fails in the build process because the build uses nm to look for symbols and, finding none, fails. Note that the proposed patch still fails on my current setup. I just patched it locally to remove propagation of LDFLAGS, and it builds fine.
Comment 14 Mihai Moldovan 2017-10-05 23:55:53 UTC
(In reply to Michael J Coss from comment #13)
> I have a similar problem with 1.4.0 ebuild caused by the rather aggressive
> replacement of both CFLAGS and LINKFLAGS.  However in my case, it wasn't the
> CFLAGS but rather the replacement of LINKFLAGS with my LDFLAGS that caused
> an issue with the build,  My LDFLAGS adds -s. The resultant stripped sbcl
> binary fails in the build process because it uses nm to look for symbols,
> and finding none, fails the build.  Note that the proposed patch, still
> fails on my current setup.  I just patched it locally to remove propagation
> of LDFLAGS, and it builds fine.

That makes sense. Especially given the fact that sbcl even needs symbols to be exported that by default are not exported (hence its usage of -export-dynamic.)

I'll probably change the patch to strip out -s and -Wl,-s, but seriously speaking, there are a lot of ways for users to break builds. The easiest would be to just pass an unknown linker option. It's unreasonable to add safeguards for such situations to build systems or package managers.

I really would advise not to use the -s linker flag globally. Rather, let build systems take care of stripping binaries via the strip binary and instruct your package manager to do that by default. This is likely to give you what you want in most cases and make sure that software that explicitly expects symbols to be available won't break (since it won't allow stripping in the first place.)
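
The flag stripping mentioned above could look roughly like the following plain-shell sketch. `strip_flags_filter` is a hypothetical helper written for illustration and is not the code from the attached patch. It only removes the exact words -s and -Wl,-s, so combined forms such as -Wl,-O1,-s would need extra handling.

```shell
#!/bin/sh
# Remove the strip-requesting options -s and -Wl,-s from a flag string.
# strip_flags_filter is a hypothetical helper for illustration only.
strip_flags_filter() {
    out=
    for f in $1; do
        case $f in
            # Exact matches only, so flags like -static are left alone.
            -s|-Wl,-s) continue ;;
        esac
        out="${out:+$out }$f"
    done
    printf '%s\n' "$out"
}

strip_flags_filter "-Wl,-O1 -s -Wl,--as-needed -Wl,-s"
```

For the sample input, this prints `-Wl,-O1 -Wl,--as-needed`.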
Comment 15 Mihai Moldovan 2017-10-06 00:10:19 UTC
Created attachment 497812 [details, diff]
[v2] Respect default {C,LD}FLAGS values, fixing broken binaries.

Now with flags stripping in the right place as well. And stripping of -Wl,-s globally and -s in LDFLAGS.
Comment 16 Chema Alonso Josa (RETIRED) gentoo-dev 2017-10-06 08:13:15 UTC
Thanks guys.

Builds and works fine here too applying Mihai's patch with both gcc-5.4.0 and gcc-6.4.0 on amd64.

I will apply this patch in the coming days if no one objects.
Comment 17 Dmitry Derevyanko 2017-10-06 19:56:16 UTC
Created attachment 497922 [details]
build.log and emerge --info

I still get the corruption warning.
Emerge fails, but running the build script by hand succeeds.
This is the difference in CFLAGS:
Emerge: i686-pc-linux-gnu-gcc -O2 -march=i686 -pipe -O2 -pipe -march=silvermont -m32 -fno-omit-frame-pointer  -I../src/runtime -Wl,-O1 -Wl,--as-needed endianness.c  -ldl -o determine-endianness
Manual: cc -m32 -fno-omit-frame-pointer -I../src/runtime   determine-endianness.c  -ldl -o determine-endianness
Comment 18 Mihai Moldovan 2017-10-06 21:48:27 UTC
(In reply to Dmitry Derevyanko from comment #17)
> Created attachment 497922 [details]
> build.log and emerge --info
> 
> I still have corruption.
> Emerge fails but running building script by hands passes.
> This is difference in CFLAGS:
> Emerge: i686-pc-linux-gnu-gcc -O2 -march=i686 -pipe -O2 -pipe
> -march=silvermont -m32 -fno-omit-frame-pointer  -I../src/runtime -Wl,-O1
> -Wl,--as-needed endianness.c  -ldl -o determine-endianness
> Manual: cc -m32 -fno-omit-frame-pointer -I../src/runtime  
> determine-endianness.c  -ldl -o determine-endianness

Not interesting. This is not what fails in your case.

What fails is the bundled pre-compiled sbcl binary that is being used to bootstrap sbcl.

For a test, please try extracting /usr/portage/distfiles/sbcl-1.2.7-x86-linux-binary.tar.bz2 to a temporary directory and execute the script in the top directory via ./run-sbcl.sh --no-sysinit --no-userinit --disable-debugger (which will call the precompiled binary src/runtime/sbcl.) If that fails, we're in a bit of trouble, since it means that the precompiled binary doesn't work on your system, which is crucial for bootstrapping sbcl. Sadly, 1.2.7 is the latest version as published on http://www.sbcl.org/platform-table.html

The weird part here is that the precompiled x86 binary obviously worked on Andrey's machine, so something about your machine must be different and provoking the failure? I'm afraid there isn't much that could be done as part of the ebuild to fix that, since it's a problem in the upstream binary bootstrap package.
Comment 19 Dmitry Derevyanko 2017-10-06 22:44:10 UTC
Created attachment 497936 [details]
script building logs

This is clearly not a binary bootstrap package error. I have built sbcl previously (1.3.17 at least), and I can also build 1.4.0 manually by running the script. Maybe I should open another bug for this? I should also mention that I use glibc-2.26 and gcc-7.2.0.
Comment 20 Mihai Moldovan 2017-10-06 23:14:15 UTC
(In reply to Dmitry Derevyanko from comment #19)
> This clearly not binary bootstrap package error.
Well, it's the bootstrapping sbcl binary that's failing, hence bootstrap package error. :)


> I built sbcl previously (1.3.17 at least) and
> also I can built 1.4.0 manually by running script.
At least good to know, thanks.


> May be I should open another bug for this?
I guess so, since it's something completely different from the original failure described here, even if it looks familiar.


> Also I have to mention that I use
> glibc-2.26 and gcc-7.2.0.
The only binary that is being compiled on your system before the bootstrapping interpreter is called is the determine-endianness tool, which, when executed, should print either ' :little-endian' or ' :big-endian'. If that's not the case, maybe the generated lisp configuration script is incomplete, leading to such a failure.


Please try rebuilding sbcl and wait for it to fail. Then, execute '/var/tmp/portage/dev-lisp/sbcl-1.4.0/work/sbcl-1.4.0/tools-for-build/determine-endianness' - what does it output when compiled via portage?
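
A small wrapper can make the check explicit. This is a minimal sketch, assuming the helper prints exactly one of the two keywords described above, possibly surrounded by whitespace; `check_endianness` is a hypothetical name.

```shell
#!/bin/sh
# Classify the text printed by the determine-endianness helper.
# check_endianness is a hypothetical wrapper for illustration.
check_endianness() {
    # Remove all whitespace so the leading space in the output is ignored.
    cleaned=$(printf '%s' "$1" | tr -d '[:space:]')
    case $cleaned in
        :little-endian|:big-endian) echo "ok: $cleaned" ;;
        *) echo "unexpected: '$1'" ;;
    esac
}

# Example with the output expected on a little-endian machine.
check_endianness " :little-endian"
```

On a real system, the argument would be the captured output of the determine-endianness binary in the tools-for-build directory named above.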
Comment 21 Dmitry Derevyanko 2017-10-06 23:46:37 UTC
determine-endianness returns ":little-endian". Also, since I can build sbcl itself and it runs (though it cannot build maxima), gcc and glibc are not to blame. It is something in the environment, I believe.
Comment 22 Mihai Moldovan 2017-10-09 02:13:54 UTC
So... if the endianness test binary works correctly, I'm afraid I can't explain what's going on on your system. That's the only binary that has been built so far in your attempts - and given that it's only a helper and actually works fine, it looks like a completely different issue.

I'd spin that off as a different bug report.

Sadly, I won't be able to help you, since I have no way to reproduce your issue.

My best guess is that this failure might be related to the sandbox feature, which injects a library into binaries. This doesn't explain why Andrey has not experienced this failure on his ~x86 box, since I strongly believe that he didn't disable the sandboxing feature that is enabled by default (and is very rightfully enabled.)

It's also alarming that your self-compiled sbcl version is not able to successfully build maxima. Something definitely is odd, but I have no idea what that might be.
Comment 23 Chema Alonso Josa (RETIRED) gentoo-dev 2017-10-09 18:56:30 UTC
*** Bug 633860 has been marked as a duplicate of this bug. ***
Comment 24 Chema Alonso Josa (RETIRED) gentoo-dev 2017-10-11 13:29:04 UTC
Patch applied. Closing.

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=87a3c45d345f336b0b766f84dc18e085e2649b8d
Comment 25 Chema Alonso Josa (RETIRED) gentoo-dev 2017-10-11 13:38:30 UTC
*** Bug 632646 has been marked as a duplicate of this bug. ***
Comment 26 Dmitry Derevyanko 2017-11-12 09:24:33 UTC
dev-lisp/sbcl-1.4.1 uses old ebuild and won't compile.
Comment 27 Larry the Git Cow gentoo-dev 2017-11-13 22:15:43 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ba66c87fafb87b82c38b36575e0e1c764dcd4792

commit ba66c87fafb87b82c38b36575e0e1c764dcd4792
Author:     Chema Alonso Josa <nimiux@gentoo.org>
AuthorDate: 2017-11-13 22:12:00 +0000
Commit:     Chema Alonso Josa <nimiux@gentoo.org>
CommitDate: 2017-11-13 22:12:00 +0000

    dev-lisp/sbcl: Fix CFLAGS and LINKFLAGS to let users override the default values
    
    Bug: https://bugs.gentoo.org/632670
    Package-Manager: Portage-2.3.8, Repoman-2.3.3

 dev-lisp/sbcl/sbcl-1.4.1.ebuild | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)
Comment 28 Chema Alonso Josa (RETIRED) gentoo-dev 2017-11-13 22:17:11 UTC
(In reply to Dmitry Derevyanko from comment #26)
> dev-lisp/sbcl-1.4.1 uses old ebuild and won't compile.

Thanks for the heads up. Fixed now.