Bug 673560 - gcc-7.3.0-r3 segfault in __add_to_environ at setenv.c:143
Summary: gcc-7.3.0-r3 segfault in __add_to_environ at setenv.c:143
Status: RESOLVED DUPLICATE of bug 669702
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-12-22 07:17 UTC by Walther
Modified: 2019-01-07 00:50 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge openssl log (log.txt, 28.84 KB, text/plain)
2018-12-22 07:17 UTC, Walther
emerge --info (emerge-info.txt, 14.25 KB, text/plain)
2018-12-22 07:18 UTC, Walther
-march=native expansion (cflags.txt, 1.85 KB, patch)
2018-12-22 18:29 UTC, Walther
environment of the failed emerge (environment, 110.22 KB, text/plain)
2018-12-22 18:30 UTC, Walther
Valgrind output, reduced to fit the 1000KB limit. (valgrind.out, 823.37 KB, text/plain)
2018-12-28 20:43 UTC, Walther
Valgrind output (valgrind.tbz, 95.30 KB, application/x-bzip-compressed-tar)
2018-12-29 00:26 UTC, Walther
sandbox-2.14-sigchld-init.patch (sandbox-2.14-sigchld-init.patch, 1.42 KB, patch)
2018-12-29 12:04 UTC, Sergei Trofimovich (RETIRED)
valgrind output after patching sandbox (valgrind.tbz, 285.27 KB, application/x-bzip-compressed-tar)
2018-12-30 15:26 UTC, Walther

Description Walther 2018-12-22 07:17:29 UTC
Created attachment 558334 [details]
emerge openssl log

For a few weeks I've had difficulties updating my world because some packages fail with a segmentation fault. Since the problem seems to lie in gcc-7.3.0-r3 + glibc-2.27-r6, I decided to build both of them using -ggdb and FEATURES=nosplit. I selected openssl-1.0.2p-r1 as the package to debug. After setting ulimit -c unlimited and emerging openssl I get a segmentation fault with the core-dump backtrace shown below.

I then tried to get further information on the bug (see the backtrace below). The problem seems to be in setting an environment variable, but I don't know enough of the glibc/gcc code to figure out on my own what went wrong. Help?

[code]
> cd /var/tmp/portage/dev-libs/openssl-1.0.2p-r1/work/openssl-1.0.2p-abi_x86_64.amd64/crypto/objects
> file core
core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'x86_64-pc-linux-gnu-gcc -Werror -D OPENSSL_DOING_MAKEDEPEND -M -fPIC -DOPENSSL_', real uid: 250, effective uid: 250, real gid: 250, effective gid: 250, execfn: '/usr/bin/x86_64-pc-linux-gnu-gcc', platform: 'x86_64'
> gdb /usr/bin/x86_64-pc-linux-gnu-gcc -core core --directory=/home/wallex/scripts/glibc-2.27/stdlib/
GNU gdb (Gentoo 8.1 p1) 8.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/x86_64-pc-linux-gnu-gcc...done.

warning: core file may not match specified executable file.
[New LWP 16940]
Core was generated by `x86_64-pc-linux-gnu-gcc -Werror -D OPENSSL_DOING_MAKEDEPEND -M -fPIC -DOPENSSL_'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __strncmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:198
198		movdqu	(%rdi), %xmm1
(gdb) bt full
#0  __strncmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:198
No locals.
#1  0x00007f8ad5a7afc2 in __add_to_environ (name=0x7ffcb3d4b220 "COLLECT_GCC_OPTIONS", value=value@entry=0x0, 
    combined=combined@entry=0x1c198a0 "COLLECT_GCC_OPTIONS='-Werror' '-D' 'OPENSSL_DOING_MAKEDEPEND' '-M' '-fPIC' '-D' 'OPENSSL_PIC' '-D' 'ZLIB' '-D' 'OPENSSL_THREADS' '-D' '_REENTRANT' '-D' 'DSO_DLFCN' '-D' 'HAVE_DLFCN_H' '-D' 'L_ENDIAN' "..., replace=replace@entry=1) at setenv.c:143
        ep = 0x1c1a430
        size = 0
        namelen = 19
        vallen = 32
#2  0x00007f8ad5a7ae91 in putenv (
    string=0x1c198a0 "COLLECT_GCC_OPTIONS='-Werror' '-D' 'OPENSSL_DOING_MAKEDEPEND' '-M' '-fPIC' '-D' 'OPENSSL_PIC' '-D' 'ZLIB' '-D' 'OPENSSL_THREADS' '-D' '_REENTRANT' '-D' 'DSO_DLFCN' '-D' 'HAVE_DLFCN_H' '-D' 'L_ENDIAN' "...) at putenv.c:77
        name = <optimized out>
        use_malloc = <optimized out>
        result = <optimized out>
        name_end = <optimized out>
#3  0x0000000000407bd1 in env_manager::xput (this=this@entry=0x6f1540 <env>, string=<optimized out>)
    at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc.c:135
        __FUNCTION__ = "xput"
#4  0x000000000040821b in xputenv (string=<optimized out>) at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc.c:2686
No locals.
#5  set_collect_gcc_options () at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc.c:4710
        i = <optimized out>
        first_time = <optimized out>
#6  0x000000000041063c in do_spec (spec=<optimized out>) at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc.c:4863
        value = <optimized out>
#7  0x0000000000410b45 in do_spec (spec=<optimized out>) at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc.c:8194
        value = <optimized out>
        value = <optimized out>
#8  driver::do_spec_on_infiles (this=0x1, this@entry=0x7ffcb3d4b400) at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc.c:8133
        value = <optimized out>
        this_file_error = 0
        i = 2396870773383653632
        __FUNCTION__ = "do_spec_on_infiles"
#9  0x00000000004037dc in driver::main (this=this@entry=0x7ffcb3d4b400, argc=<optimized out>, argc@entry=59, argv=<optimized out>, 
    argv@entry=0x7ffcb3d4b528) at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc.c:7262
        early_exit = <optimized out>
#10 0x0000000000403aa4 in main (argc=59, argv=0x7ffcb3d4b528) at /var/tmp/portage/sys-devel/gcc-7.3.0-r3/work/gcc-7.3.0/gcc/gcc-main.c:46
        d = {explicit_link_files = 0x1c18ab0 "", decoded_options = 0x1c10e70, decoded_options_count = 58, m_option_suggestions = 0x0}
(gdb) print __environ
$1 = (char **) 0x1c1a430
(gdb) p *__environ
$2 = 0x2d20786d6d6d2d20 <error: Cannot access memory at address 0x2d20786d6d6d2d20>
[/code]
Comment 1 Walther 2018-12-22 07:18:15 UTC
Created attachment 558336 [details]
emerge --info
Comment 2 Sergei Trofimovich (RETIRED) gentoo-dev 2018-12-22 10:22:49 UTC
Perhaps gcc or glibc itself is miscompiled against -march=native.

Can you follow
    https://wiki.gentoo.org/wiki/Gcc-ICE-reporting-guide
to extract minimal source and expand -march=native for me?

I'll try to reproduce locally meanwhile.
Comment 3 Walther 2018-12-22 18:28:16 UTC
The issue I have with reproducibility is that I can't get it to trigger outside of an emerge command (or "ebuild openssl*.ebuild compile"). It always happens on the same step of the build process, so I know it's reproducible somehow.

Since the backtrace from the coredump mentions the functions putenv() / __add_to_environ(), I am pretty certain the environment variables play into this, but I can't figure out how to duplicate the env setup (the '/var/tmp/portage/dev-libs/openssl-1.0.2p-r1/temp/environment' file cannot be sourced or anything?) to try and reproduce the bug.

I'll attach the expansion of -march=native, as explained in the guide, as well as the environment file, since I believe it's related.
Comment 4 Walther 2018-12-22 18:29:14 UTC
Created attachment 558366 [details, diff]
-march=native expansion
Comment 5 Walther 2018-12-22 18:30:21 UTC
Created attachment 558368 [details]
environment of the failed emerge
Comment 6 Sergei Trofimovich (RETIRED) gentoo-dev 2018-12-22 20:04:41 UTC
I failed to reproduce the failure locally. You'll need to try to figure it out yourself.

The crash happens in setenv():

    https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/setenv.c;h=58b4a2a3100765f3f30422bdff3847e46b260a6e;hb=23158b08a0908f381459f273a984c6fd328363cb#l143

 139   size = 0;
 140   if (ep != NULL)
 141     {
 142       for (; *ep != NULL; ++ep)
 143         if (!strncmp (*ep, name, namelen) && (*ep)[namelen] == '=')
 144           break;
 145         else
 146           ++size;
 147     }

It's not complicated code. The strncmp implementation you have is __strncmp_sse42:

    https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/strcmp-sse42.S;h=6fa0c2c7d257c1a0f825b64987997e15fa5a8dad;hb=23158b08a0908f381459f273a984c6fd328363cb#l198

; int strncmp(const char *s1, const char *s2, size_t n);
; s1: %rdi (%edi)
; s2: %rsi (%esi)
; n:  %rdx (%edx)

 127 STRCMP_SSE42:
 ...
 157 #if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
 158         test    %rdx, %rdx
 159         je      LABEL(strcmp_exitz) ; if (n == 0) return ...;
 160         cmp     $1, %rdx            ; if (n == 1) return ...;
 161         je      LABEL(Byte0)
 162         mov     %rdx, %r11          ; %r11 = n
 163 #endif
 ...
 164         mov     %esi, %ecx
 165         mov     %edi, %eax
 166 /* Use 64bit AND here to avoid long NOP padding.  */
 167         and     $0x3f, %rcx             /* rsi alignment in cache line */
 168         and     $0x3f, %rax             /* rdi alignment in cache line */
 ...
 194         cmp     $0x30, %ecx
 195         ja      LABEL(crosscache)/* rsi: 16-byte load will cross cache line */
 196         cmp     $0x30, %eax
 197         ja      LABEL(crosscache)/* rdi: 16-byte load will cross cache line */
 198         movdqu  (%rdi), %xmm1 ; crash here: load *(u128*)(s1 + off)
 199         movdqu  (%rsi), %xmm2 ; load *(u128*)(s2 + off)

The above hints at __environ corruption and we need to check if it's true.

You can check a few things:
- look at 'print __environ[0]', 'print __environ[1]', ... in gdb to see if all entries are corrupted or just the first.
- check the actual addresses of __environ[0] and friends: do they look like heap pointers or stack pointers?
    (gdb) print __environ[0]
    $5 = 0x7fffffffdbe0 "LC_ALL="
    (gdb) print __environ[1]
    $6 = 0x7fffffffdbe8 "LS_COLORS=rs=0:di=0"...

    0x7ff... is normally stack (+/- ASLR offset).
- run gcc under ltrace to check what it usually passes to putenv:
  $ ltrace -e putenv gcc ...
- write a small program that does the same:
  int main() {
      putenv("FOO=1");
      putenv("BAR=2");
      ...
- run gcc and small tool with different environment sizes:
      $ A_VAR=<some-stuff> ./a-tool
  to try to reproduce it
- run gcc under valgrind. maybe it will point out heap corruption earlier

But it could also be bad code generation that corrupted local variables.
Comment 7 Walther 2018-12-25 19:29:01 UTC
Yes, it is actually that same bug: I managed to compile all affected packages once I removed the sandbox from the PORTAGE_FEATURES. That also explains why I could not replicate it when re-running the same commands from the terminal.

How could I try and help fix the bug, since I don't know how to use the sandbox manually to try & replicate the bug? Perhaps I'll just CC myself on the other bug and see if there's any tests posted to try out.

*** This bug has been marked as a duplicate of bug 673724 ***
Comment 8 Sergei Trofimovich (RETIRED) gentoo-dev 2018-12-25 19:56:28 UTC
I'm not sure it's a sandbox bug but it's a possibility.

You can also try sandbox tests themselves:
    FEATURES=test emerge -v1 sandbox
to see if it triggers any failures.

sandbox is a wrapper tool. You can run original command with sandbox around it:
    $ gcc a.c -o a
    $ sandbox gcc a.c -o a
     * ACCESS DENIED:  symlink:      /gentoo/ccache/8/stats.lock

(you can pass SANDBOX* variables to sandbox from your 'environment' file to pass the same parameters)
Comment 9 Walther 2018-12-28 20:38:59 UTC
Since I had difficulties replicating the crash on openssl, I picked a package with a simpler ebuild, busybox.

Going to the workdir, I can reproduce the crash if I change into bash, source the environment file, and run "sandbox make -j8 -s V=1 busybox". Without the sandbox part, it compiles normally.

However, if I use "make --trace" and locate the actual gcc command that caused the crash, I can't reproduce the crash by skipping make and invoking "sandbox gcc..." directly. The closest to the source of the crash I've gotten is:

# sandbox make -j1 -s V=1 scripts/kconfig/conf.o
Segmentation fault (core dumped)
make[2]: *** [scripts/Makefile.host:120: scripts/kconfig/conf.o] Error 139

# gdb /usr/bin/x86_64-pc-linux-gnu-gcc -core core
GNU gdb (Gentoo 8.1 p1) 8.1
Core was generated by `x86_64-pc-linux-gnu-gcc -Wp,-MD,scripts/kconfig/.conf.o.d -Wall -Wstrict-protot'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __strncmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:199
199		movdqu	(%rdi), %xmm1
(gdb) bt
#0  __strncmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:199
#1  0x00007fad40190cda in __add_to_environ (name=0x7ffd9d567300 "COLLECT_GCC_OPTIONS", value=value@entry=0x0, 
    combined=combined@entry=0x1a71100 "COLLECT_GCC_OPTIONS='-Wall' '-Wstrict-prototypes' '-O2' '-fomit-frame-pointer' '-c' '-o' 'scripts/kconfig/conf.o' '-mtune=generic' '-march=x86-64'", replace=replace@entry=1) at setenv.c:143
#2  0x00007fad40190bb1 in putenv (
    string=0x1a71100 "COLLECT_GCC_OPTIONS='-Wall' '-Wstrict-prototypes' '-O2' '-fomit-frame-pointer' '-c' '-o' 'scripts/kconfig/conf.o' '-mtune=generic' '-march=x86-64'") at putenv.c:77
#3  0x0000000000408773 in env_manager::xput(char const*) () at /var/tmp/portage/sys-devel/gcc-8.2.0-r5/work/gcc-8.2.0/gcc/gcc.c:135
#4  0x0000000000408de2 in xputenv (string=<optimized out>) at /var/tmp/portage/sys-devel/gcc-8.2.0-r5/work/gcc-8.2.0/gcc/gcc.c:2681
#5  set_collect_gcc_options () at /var/tmp/portage/sys-devel/gcc-8.2.0-r5/work/gcc-8.2.0/gcc/gcc.c:4730
#6  0x0000000000411881 in do_spec (spec=<optimized out>) at /var/tmp/portage/sys-devel/gcc-8.2.0-r5/work/gcc-8.2.0/gcc/gcc.c:4883
#7  0x0000000000411976 in driver::do_spec_on_infiles() const () at /var/tmp/portage/sys-devel/gcc-8.2.0-r5/work/gcc-8.2.0/gcc/gcc.c:8193
#8  0x00000000004041fc in driver::main (this=this@entry=0x7ffd9d5674c0, argc=<optimized out>, argc@entry=10, argv=<optimized out>, 
    argv@entry=0x7ffd9d5675e8) at /var/tmp/portage/sys-devel/gcc-8.2.0-r5/work/gcc-8.2.0/gcc/gcc.c:7322
#9  0x00000000004044c4 in main (argc=10, argv=0x7ffd9d5675e8) at /var/tmp/portage/sys-devel/gcc-8.2.0-r5/work/gcc-8.2.0/gcc/gcc-main.c:46
(gdb) up
#1  0x00007fad40190cda in __add_to_environ (name=0x7ffd9d567300 "COLLECT_GCC_OPTIONS", value=value@entry=0x0, 
    combined=combined@entry=0x1a71100 "COLLECT_GCC_OPTIONS='-Wall' '-Wstrict-prototypes' '-O2' '-fomit-frame-pointer' '-c' '-o' 'scripts/kconfig/conf.o' '-mtune=generic' '-march=x86-64'", replace=replace@entry=1) at setenv.c:143
143		if (!strncmp (*ep, name, namelen) && (*ep)[namelen] == '=')
(gdb) p __environ
$1 = (char **) 0x1a72010
(gdb) p __environ[0]
$2 = 0x1a6f780 "SANDBOX_MESSAGE_P@TH=/var/log/sandbox/sandbox-debug-6864.log"
(gdb) p __environ[1]
$3 = 0x1 <error: Cannot access memory at address 0x1>
(gdb) p __environ[2]
$4 = 0x0
(gdb) p __environ[3]
$5 = 0x21 <error: Cannot access memory at address 0x21>

I can't use ltrace because I get "operation not permitted" errors, probably because of the sandbox itself?

However, the valgrind output, despite being very messy, printed a few /bin/dash related issues, and it turns out that if I change my shell from dash to bash, the crash disappears.

So perhaps the problem is caused by how dash sets up the environment prior to executing gcc?

I'll attach the valgrind output, though it's full of "trace_child_signal: child (28782) signal SIGCHLD(17), code CLD_???(128), status SIG???(0)" and the file just keeps growing, I don't know if it ever ends. I think the relevant section is:

==28783== Conditional jump or move depends on uninitialised value(s)
==28783==    at 0x45923A: fork (fork.c:121)
==28783==    by 0x40723D: forkshell (jobs.c:936)
==28783==    by 0x402DAE: evalpipe (eval.c:575)
==28783==    by 0x402834: evaltree (eval.c:287)
==28783==    by 0x40300D: evalstring (eval.c:176)
==28783==    by 0x400CB8: main (main.c:170)

Note: updating dash from current stable (0.5.9.1-r3) to the latest version (0.5.10.2) doesn't change the output; changing to bash does.
Comment 10 Walther 2018-12-28 20:43:24 UTC
Created attachment 558724 [details]
Valgrind output, reduced to fit the 1000KB limit.
Comment 11 Sergei Trofimovich (RETIRED) gentoo-dev 2018-12-28 22:01:56 UTC
Thank you!

To trace all children you need to pass --trace-children=yes to valgrind to have visibility into gcc and dash.

I still hope it's a generic bug that depends on the environment size and not on the host processes creating it. In my understanding, corrupting the environment of a child process is not trivial. It must be a problem in gcc itself (or in the force-injected libsandbox.so).

I wonder if you would be able to reproduce the crash by changing host environment a bit, like 'A=1 gcc-command', 'A=11 gcc-command', 'A=111 gcc-command' and so on.
Comment 12 Walther 2018-12-29 00:25:39 UTC
I am using "--trace-children=yes", but it seems Make spawns SO MANY PROCESSES that the log just becomes infinite. Or maybe it's just very slow: I decided to wait, and after 15 or so minutes (and a 5.6MB log) it finished, but the output file doesn't seem to contain any segmentation faults?

sandbox make --trace -j1 -s V=1 scripts/kconfig/conf.o

Causes the crash, so I valgrinded it as:

valgrind --trace-children=yes sandbox make -j1 -s V=1 scripts/kconfig/conf.o > valgrind.out 2>&1

If compressed, the whole log does fit into the limits here, but I don't see any segmentation faults in the log file, or I could be missing something else. Anyway, the relevant parts are found with "Command: /usr/bin/x86_64-pc-linux-gnu-gcc" at lines 61486 (scripts/basic/fixdep.c), 65031 (scripts/basic/split-include.c) and 68709 (scripts/basic/docproc.c).

I'll keep trying to see if I can figure out how to reproduce the crash without invoking Make, as that just adds too many commands into the execution process before the crash. :/
Comment 13 Walther 2018-12-29 00:26:22 UTC
Created attachment 558742 [details]
Valgrind output
Comment 14 Sergei Trofimovich (RETIRED) gentoo-dev 2018-12-29 12:04:16 UTC
Created attachment 558796 [details, diff]
sandbox-2.14-sigchld-init.patch

Interesting. A few things are visible in the log.

1. sandbox does not initialize the child sigaction handler. That's one source of non-determinism. Try dropping sandbox-2.14-sigchld-init.patch into /etc/portage/patches/sys-apps/sandbox/ to see if it decreases the amount of valgrind spam.

2. Numerous out-of-bounds accesses in dash:

==10372== Use of uninitialised value of size 8
==10372==    at 0x405B49: argstr (expand.c:278)
==10372==    by 0x405E82: expandarg (expand.c:196)
==10372==    by 0x4032DA: evalcommand (eval.c:741)
==10372==    by 0x402834: evaltree (eval.c:287)
==10372==    by 0x402834: evaltree (eval.c:287)
==10372==    by 0x403001: evalstring (eval.c:176)
==10372==    by 0x400CB8: main (main.c:170)

  Those might be real or might be fake (as sse instructions are by design working on larger blocks).

Can you rebuild sandbox and rerun valgrind? For better error reporting and failure detection I suggest a few more options to valgrind:
    --track-origins=yes --num-callers=50 --malloc-fill=0xE1 --free-fill=0xF1
Comment 15 Walther 2018-12-30 15:26:42 UTC
Created attachment 559000 [details]
valgrind output after patching sandbox

Here's the output of 
valgrind --trace-children=yes --track-origins=yes --num-callers=50 --malloc-fill=0xE1 --free-fill=0xF1 sandbox make -j1 -s V=1 scripts/kconfig/conf.o > valgrind.out 2>&1

Note:
1. The patch of this thread was bugged, it was invoking memset after setting the fields, which broke sandbox :P It took me a while to notice that.
2. I updated dash to latest portage version (0.5.10.2) and compiled it with -O0 to make sure the related bugs do exist and aren't optimization related.
3. It still crashes, but there's no crash if I run it through valgrind (if I re-run the command afterwards without Valgrind, the second execution shows there's nothing to compile).
Comment 16 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-02 11:01:26 UTC
(In reply to Walther from comment #15)
> Created attachment 559000 [details]
> valgrind output after patching sandbox
> 
> Here's the output of 
> valgrind --trace-children=yes --track-origins=yes --num-callers=50
> --malloc-fill=0xE1 --free-fill=0xF1 sandbox make -j1 -s V=1
> scripts/kconfig/conf.o > valgrind.out 2>&1
> 
> Note:
> 1. The patch of this thread was bugged, it was invoking memset after setting
> the fields, which broke sandbox :P It took me a while to notice that.

Oh, nice catch!

> 2. I updated dash to latest portage version (0.5.10.2) and compiled it with
> -O0 to make sure the related bugs do exist and aren't optimization related.
> 3. It still crashes, but there's no crash if I run it through valgrind (if I
> re-run the command afterwards without Valgrind, the second execution shows
> there's nothing to compile).

That perhaps means that valgrind's ELF loader is different enough from the real system to avoid the crash. That is unfortunate. valgrind won't help us here.

https://bugs.gentoo.org/669702 suggests USE=static dash managed to kill gcc. That is a very amusing failure mode. Do you use the same?

With USE=static dash I finally managed to reproduce the failure \o/.
Comment 17 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-02 11:47:51 UTC
Given that the original bug was closed without investigation, let's poke at it a bit more on the toolchain@ side.

Maybe we'll find something actionable.
Comment 18 Walther 2019-01-02 18:09:23 UTC
After poking around some other package (grep-3.1), which fails during the configuration phase...

checking for suffix of object files... configure: error: in `/var/tmp/portage/sys-apps/grep-3.1-r1/work/grep-3.1':
configure: error: cannot compute suffix of object files: cannot compile
See `config.log' for more details

...allowed me to set up a very simple replication method:
> sandbox /bin/sh
> x86_64-pc-linux-gnu-gcc test.c
Segmentation fault

test.c (taken from the output of grep's config.log):

/* confdefs.h */
#define PACKAGE_NAME "GNU grep"
#define PACKAGE_TARNAME "grep"
#define PACKAGE_VERSION "3.1"
#define PACKAGE_STRING "GNU grep 3.1"
#define PACKAGE_BUGREPORT "bug-grep@gnu.org"
#define PACKAGE_URL "http://www.gnu.org/software/grep/"
#define GREP 1
#define PACKAGE "grep"
#define VERSION "3.1"
/* end confdefs.h.  */

int
main (void)
{

  ;
  return 0;
}

Yes, dash is statically linked. It appears the bug involves having a sandboxed statically linked executable: this somehow messes up the interaction with the library, causing a corrupted environment, which in turn triggers a segmentation fault when the underlying process (gcc) tries to access said environment.

I am having difficulties setting the test to be a single command (to attempt a valgrind run of it), since:

LANG=C sandbox /bin/sh -c x86_64-pc-linux-gnu-gcc test.c
x86_64-pc-linux-gnu-gcc: fatal error: no input files
compilation terminated.

Despite the manual for dash telling me this is how I should use "-c".

Anyway, this may be the end of the road for me, as it appears the bug is something deep within the interaction of sandbox and library linking, which is beyond my area of expertise. Also, as per the resolution of bug 669702, it appears there is no interest in having sandbox support statically linked programs anyway, so the suggested resolution for this bug would be to recompile dash with USE="-static" and to not worry about it anymore. :S
Comment 19 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-02 18:34:54 UTC
(In reply to Walther from comment #18)

1. Whatever a static binary is doing, it should never manage to corrupt the environment of child processes. The environment is a series of strings and not much more. That interface is hard to break to the point where getenv crashes your binary. On environment corruption either the exec*() syscall will fail or the binary will start with a "meaningful" environment.

Thus the crash is a real bug in one of the components.

2. Whatever sandbox's support for static binaries is, its effect should never be a crash. Lack of tracing is fine.

3. By carefully tweaking the environment I can get a similar gcc crash against dynamically linked bash locally. That eliminates two main suspects: static linking and ptrace()-based tracing.
Comment 20 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-02 18:39:48 UTC
> 3. Carefully tweaking an environment I can get similar gcc crash against
> dynamically linked bash locally. Which eliminates two main suspects: static
> linking and ptrace()-based tracing.

Oh, and passing gdb instead of gcc does not make the failure go away. This eliminates gcc as a buggy component.
Comment 21 Walther 2019-01-03 00:01:44 UTC
I am just glad that more experienced people can reproduce the bug, and that I have a workaround that I can use in the meantime to keep my system updated.

I'll keep an eye on this bug in case there's anything else I can do to help figure it out.
Comment 22 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-03 14:23:00 UTC
I think I finally found it: subtle 'environ' corruption happens in sandbox's execv() wrapper: https://bugs.gentoo.org/669702#c8

*** This bug has been marked as a duplicate of bug 669702 ***
Comment 23 Sergei Trofimovich (RETIRED) gentoo-dev 2019-01-06 20:54:41 UTC
I wonder if you could try the sandbox patch attached to bug #669702 and check if it helps you.
Comment 24 Walther 2019-01-07 00:50:12 UTC
Using USE="static" emerge dash, I go back to crashing when emerging busybox. Then recompiling sandbox with the mentioned patch and reemerging busybox leads to a successful compile.

Yes, it does seem the patch works. I'll try recompiling a few other packages which used to cause segmentation faults (if there's no further comment from me then assume the patch works well as far as I can test it).