Summary: | www-client/chromium-65.0.3325.146 with sys-devel/clang-6.0.0 - FAILED: mksnapshot // corrupted size vs. prev_size | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Michael Mair-Keimberger (iamnr3) <mmk> |
Component: | Current packages | Assignee: | Chromium Project <chromium> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | alexander, chris.murtagh1, creideiki+gentoo-bugzilla, dharding, dschridde+gentoobugs, EoD, faa1976, finkandreas, gentoo, gentoo, gentoo, gmturner007, jarausch, jasmin+gentoo, llvm, marcan, me, mgorny, mlen, orodruinlair, samuel, sarnex, silencly07, wbrana, weedy2887 |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
build.tar.gz
chromium-65.0.3325.146\:20180311-190046.log.gz minimally-reproducing response file Patch for sys-devel/clang-6.0.0 |
Description
Michael Mair-Keimberger (iamnr3)
2018-03-10 09:22:32 UTC
Comment on attachment 523264 [details]
build.tar.gz
Why would you put a single file in a tar archive and then compress it when you could just compress the file itself?
www-client/chromium-66.0.3355.0 Same problem

What version(s) of sys-devel/clang are installed?

sys-devel/clang-6.0.0

(In reply to Mike Gilbert from comment #3)
> What version(s) of sys-devel/clang are installed?
sys-devel/clang-6.0.0

Same here. Also using sys-devel/clang-6.0.0

I can reproduce this problem with clang-6.0.0. Building works with clang-5.0.1.

Copying llvm.
> corrupted size vs. prev_size
This seems to be glibc(?) indicating a memory problem. Does this come from clang or the compiled executable?

(In reply to Michał Górny from comment #8)
> > corrupted size vs. prev_size
> > This seems to be glibc(?) indicating a memory problem. Does this come from clang or the compiled executable?
Based on the log, I would say from clang, or possibly the python wrapper that calls it (gcc_link_wrapper.py).

Could you maybe check 'dmesg' in case glibc were more specific?

I'm not seeing anything interesting in dmesg.

Created attachment 523532 [details]
chromium-65.0.3325.146\:20180311-190046.log.gz
I have a stack trace in the build log and the following in dmesg:
[ 7094.589347] x86_64-pc-linux[27602]: segfault at 0 ip 00007eff5ab83ec9 sp 00007ffcd2a5d7a0 error 4 in libclangDriver.so.6.0.0[7eff5ab28000+18e000]
I just did -uDN @world which updated clang(-runtime) to 6.0.0 and have the same error with chromium. Nothing in dmesg.

If you have some spare resources, you may want to try if it still happens with clang-9999. If you do that, you may want to use EGIT_COMMIT_DATE to make sure LLVM/clang don't mis-sync in the long time it takes to build LLVM ;-). You may also try the usual tools -- valgrind, gdb, etc.

I've the same error:
[1013/1013] python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt
FAILED: mksnapshot
python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt
ninja: build stopped: subcommand failed.
 * ERROR: www-client/chromium-65.0.3325.146::gentoo failed (compile phase):
 * ninja -v -j14 -l13 -C out/Release mksnapshot failed
 *
 * Call stack:
 * ebuild.sh, line 124: Called src_compile
 * environment, line 5008: Called eninja '-C' 'out/Release' 'mksnapshot'
 * environment, line 1695: Called die
 * The specific snippet of code:
 * "$@" || die "${nonfatal_args[@]}" "${*} failed"
I found the following in dmesg that I don't know if it's related:
traps: x86_64-pc-linux[14834] general protection ip:7f1d25569bf1 sp:55eea0cd7c70 error:0 in libc-2.26.so[7f1d254e3000+1c1000]
I'm using ccache, but it was never a problem in previous chromium compilations.
Similar problem here-- mksnapshot link fails. Different in that LLVM segfaults. Chromium-66 fails identically, chromium-64 is fine. Tried both clang/LLVM 5 and 6, same errors. Tried bfd instead of gold, same errors. 32GB ram, no shortage there. 615/616] touch obj/v8/v8_init.stamp [616/616] python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata FAILED: mksnapshot python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata LLVMSymbolizer: error reading file: No such file or directory #0 0x00007f29c49f55ca llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/lib/llvm/6/bin/../lib64/libLLVMSupport.so.6+0xa55ca) #1 0x00007f29c49ecae6 llvm::sys::RunSignalHandlers() (/usr/lib/llvm/6/bin/../lib64/libLLVMSupport.so.6+0x9cae6) #2 0x00007f29c49ed131 (/usr/lib/llvm/6/bin/../lib64/libLLVMSupport.so.6+0x9d131) #3 0x00007f29c33ada20 (/lib64/libc.so.6+0x35a20) #4 0x00007f29c3f218da clang::driver::Driver::setDriverModeFromOption(llvm::StringRef) (/usr/lib/llvm/6/bin/../lib64/libclangDriver.so.6+0x598da) #5 0x00007f29c3f21d64 clang::driver::Driver::ParseDriverMode(llvm::StringRef, llvm::ArrayRef<char const*>) (/usr/lib/llvm/6/bin/../lib64/libclangDriver.so.6+0x59d64) #6 0x00007f29c3f3aaa6 
clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) (/usr/lib/llvm/6/bin/../lib64/libclangDriver.so.6+0x72aa6) #7 0x0000563e93c7e15d (x86_64-pc-linux-gnu-clang+++0xc15d) #8 0x00007f29c3398eec __libc_start_main (/lib64/libc.so.6+0x20eec) #9 0x0000563e93c8089a (x86_64-pc-linux-gnu-clang+++0xe89a) Stack dump: 0. Program arguments: x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o ./mksnapshot -Wl,--start-group @./mksnapshot.rsp -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata 1. Compilation construction ninja: build stopped: subcommand failed. dmesg says: [52875.128708] x86_64-pc-linux[28292]: segfault at 0 ip 00007face6f318da sp 00007ffe597ae3b0 error 4 in libclangDriver.so.6.0.0[7face6ed8000+17e000] Saw the comment about trying clang-9999 so maybe I'll try that later. sh bash 4.4_p19 ld GNU gold (Gentoo 2.30 p1 2.30.0) 1.15 app-shells/bash: 4.4_p19::gentoo dev-java/java-config: 2.2.0-r3::gentoo dev-lang/perl: 5.24.3::gentoo dev-lang/python: 2.7.14-r1::gentoo, 3.4.6-r1::gentoo, 3.5.4-r1::gentoo, 3.6.3-r1::gentoo dev-util/cmake: 3.9.6::gentoo dev-util/pkgconfig: 0.29.2::gentoo sys-apps/baselayout: 2.4.1-r2::gentoo sys-apps/openrc: 0.34.11::gentoo sys-apps/sandbox: 2.12::gentoo sys-devel/autoconf: 2.13::gentoo, 2.69-r4::gentoo sys-devel/automake: 1.11.6-r3::gentoo, 1.15.1-r2::gentoo sys-devel/binutils: 2.29.1-r1::gentoo, 2.30::gentoo sys-devel/gcc: 6.4.0-r1::gentoo, 7.2.0-r1::gentoo, 7.3.0::gentoo sys-devel/gcc-config: 1.9.1::gentoo sys-devel/libtool: 2.4.6-r4::gentoo sys-devel/make: 4.2.1::gentoo sys-kernel/linux-headers: 4.15::gentoo (virtual/os-headers) sys-libs/glibc: 2.26-r6::gentoo Just built the whole clang/llvm-7 (**9999) toolchain. Now it ends with libc segfaulting... 
[61366.676207] x86_64-pc-linux[18592]: segfault at 241 ip 00007fbfebd4deb6 sp 000055afd7271000 error 6 in libc-2.26.so[7fbfebcc8000+1b7000] [608/610] touch obj/v8/v8_nosnapshot.stamp [609/610] touch obj/v8/v8_init.stamp [610/610] python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O2 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata FAILED: mksnapshot python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O2 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata ninja: build stopped: subcommand failed. * ERROR: www-client/chromium-66.0.3359.22::gentoo failed (compile phase): * ninja -v -j6 -l0 -C out/Release mksnapshot failed I have both of the above outcomes in two different build chroots: "corrupted size vs. prev_size" vs the segfault with stacktrace. However, running the given command manually works: portage@basestar ~/www-client/chromium-65.0.3325.146/work/chromium-65.0.3325.146 $ ninja -v -C out/Release mksnapshot [...] [3/3] python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. 
-Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata FAILED: mksnapshot python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata LLVMSymbolizer: error reading file: No such file or directory #0 0x00007f2367213599 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/lib64/llvm/6/bin/../lib64/libLLVMSupport.so.6+0x121599) [...] vs. portage@basestar ~/www-client/chromium-65.0.3325.146/work/chromium-65.0.3325.146/out/Release $ python "../../build/toolchain/gcc_link_wrapper.py" --output="./mksnapshot" -- x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o "./mksnapshot" -Wl,--start-group @"./mksnapshot.rsp" -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata So there seems to be something different about the environment that ninja runs the command in. (In reply to Hector Martin from comment #18) > So there seems to be something different about the environment that ninja > runs the command in. Very interesting. I can confirm this behavior difference. Created attachment 523914 [details]
minimally-reproducing response file
This is the smallest response file with which I could reproduce the crash on x86_64-gentoo-linux-musl
I am also running into this on hardened/linux/musl/amd64 after upgrading to clang-6.0.0 and trying to build chromium-65. I've attached a response file based on the one used for the failing link command. I can trigger a segfault with

    for f in $(cat tst.rsp); do mkdir -p $(dirname $f); gcc -c -o $f -xc /dev/null; done
    x86_64-gentoo-linux-musl-clang++ @./tst.rsp

Removing any one file from the list avoids the crash (allowing the link to fail as expected, since some source files are missing). Running clang under gdb shows the following backtrace:

#0 0x00007ffff7db7346 in strlen () from /lib/ld-musl-x86_64.so.1
#1 0x00007fffee65de40 in clang::driver::Driver::ParseDriverMode(llvm::StringRef, llvm::ArrayRef<char const*>) () from /usr/lib/llvm/6/bin/../lib/libclangDriver.so.6
#2 0x00007fffee678025 in clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) () from /usr/lib/llvm/6/bin/../lib/libclangDriver.so.6
#3 0x0000555555561c14 in main ()

Disassembly:

0x00007fffee65de25 <+405>: je 0x7fffee65de57 <_ZN5clang6driver6Driver15ParseDriverModeEN4llvm9StringRefENS2_8ArrayRefIPKcEE+455>
0x00007fffee65de27 <+407>: nopw 0x0(%rax,%rax,1)
0x00007fffee65de30 <+416>: mov (%rbx),%rbp
0x00007fffee65de33 <+419>: test %rbp,%rbp <<<< "if (ArgPtr == nullptr) continue;"
0x00007fffee65de36 <+422>: je 0x7fffee65de4e <_ZN5clang6driver6Driver15ParseDriverModeEN4llvm9StringRefENS2_8ArrayRefIPKcEE+446>
0x00007fffee65de38 <+424>: mov %rbp,%rdi
0x00007fffee65de3b <+427>: callq 0x7fffee64f2e0 <strlen@plt> <<<< FAILING CALL HERE
0x00007fffee65de40 <+432>: mov %rbp,%rsi
0x00007fffee65de43 <+435>: mov %r12,%rdi
0x00007fffee65de46 <+438>: mov %rax,%rdx
0x00007fffee65de49 <+441>: callq 0x7fffee64f360 <_ZN5clang6driver6Driver23setDriverModeFromOptionEN4llvm9StringRefE@plt>
0x00007fffee65de4e <+446>: add $0x8,%rbx
0x00007fffee65de52 <+450>: cmp %rbx,%r13
0x00007fffee65de55 <+453>: jne 0x7fffee65de30 <_ZN5clang6driver6Driver15ParseDriverModeEN4llvm9StringRefENS2_8ArrayRefIPKcEE+416>

Which is at https://clang.llvm.org/doxygen/Driver_8cpp_source.html#l00130

The strlen() call is from the StringRef constructor on line 139:

    const StringRef Arg = ArgPtr;

So somehow a not-quite-null pointer is getting into the Args array, or it's reading out of bounds.

There are two workarounds:

1) Replace the response file with command-line arguments.

    @"./mksnapshot.rsp" → $(cat ./mksnapshot.rsp)

This doesn't work within the build system, and only works if there aren't too many arguments, but it allows the command to proceed.

2) Remove the triple from the beginning of the clang command. For some reason, calling "clang++" instead of "x86_64-pc-linux-gnu-clang++" or "x86_64-gentoo-linux-musl-clang++" works fine. So modify the chromium ebuild:

     # Force clang since gcc is pretty broken at the moment.
    - CC=${CHOST}-clang
    - CXX=${CHOST}-clang++
    + CC=clang
    + CXX=clang++
     strip-unsupported-flags

I also changed the lines below to always select the "tc-is-clang" case:

    - if tc-is-clang; then
    + if true; then
     myconf_gn+=" is_clang=true clang_use_chrome_plugins=false"
     else

I'm not sure this second change is necessary. With these two changes, I was able to get chromium-65.0.3325.146 to build with clang-6.0.0.

*** Bug 650556 has been marked as a duplicate of this bug. ***

I also encountered this and was also able to resolve it by removing ${CHOST}- from the CC and CXX assignments.

A quick brain-dump on this before I give up and go to bed :( Freaky how this manifests pseudo-randomly but very repeatably on each host. Makes me think something in toolchain is running amok in the heap, i.e., use-after-free somewhere in the llvm/clang/glibc/wrapper...?
Some things the problem does /not/ seem to be sensitive to: * emerge -1 $(qlist -ISC llvm clang) * USE=anything Thanks to the magic of coredumpctl, I get this partial stack trace: Stack trace of thread 4418: #0 0x00007f54b2920c31 __strlen_avx2 (libc.so.6) #1 0x00007f54b3767cd8 _ZN5clang6driver6Driver15ParseDriverModeEN4llvm9StringRefENS2_8ArrayRefIPKcEE (libclangDriver.so.6) #2 0x00007f54b3780f9a _ZN5clang6driver6Driver16BuildCompilationEN4llvm8ArrayRefIPKcEE (libclangDriver.so.6) #3 0x000056237b876bde main (clang-6.0) #4 0x00007f54b27d6faa __libc_start_main (libc.so.6) #5 0x000056237b87935a _start (clang-6.0) GNU gdb (Gentoo 8.1 p1) 8.1 Copyright (C) 2018 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-pc-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://bugs.gentoo.org/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/lib64/llvm/6/bin/clang-6.0...(no debugging symbols found)...done. warning: core file may not match specified executable file. [New LWP 4418] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `x86_64-pc-linux-gnu-clang++ -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -W'. Program terminated with signal SIGSEGV, Segmentation fault. 
#0 __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62 62 VPCMPEQ (%rdi), %ymm0, %ymm1 (gdb) list __strlen_avx2 47 # ifdef USE_AS_WCSLEN 48 shl $2, %rsi 49 # endif 50 movq %rsi, %r8 51 # endif 52 movl %edi, %ecx 53 movq %rdi, %rdx 54 vpxor %xmm0, %xmm0, %xmm0 55 56 /* Check if we may cross page boundary with one vector load. */ (gdb) 57 andl $(2 * VEC_SIZE - 1), %ecx 58 cmpl $VEC_SIZE, %ecx 59 ja L(cros_page_boundary) 60 61 /* Check the first VEC_SIZE bytes. */ 62 VPCMPEQ (%rdi), %ymm0, %ymm1 63 vpmovmskb %ymm1, %eax 64 testl %eax, %eax 65 66 # ifdef USE_AS_STRNLEN (gdb) Looks like I don't build llvm with symbols (maybe I need to change that). I wonder if one removed the avx strlen optimization from glibc (https://github.com/kraj/glibc/commit/dc485ceb2ac596d27294cc1942adf3181f15e8bf.patch?diff=unified), what would happen? Maybe it's recent libc actually to blame rather than llvm? I've seen reports elsewhere that the avx2-optimized strlen breaks valgrind and other stack-unwinders, contributing to the culprit-ey smell there. Presumably matt blanc's trace above is equivalent to mine except that (a) he has not libc symbols and I do, while he has llvm symbols and I don't and (b) for whatever reason my trace is captured before some exception handlers run and his after (I've seen traces that look more like his also locally). Has anyone ruled out sandbox? I haven't. I have. Running the ninja command outside sandbox still crashes. Running the python gcc_link_wrapper.py command directly inside sandbox does not. FWIW, running the clang++ command under valgrind segfaults, while outside valgrind it does not, but inside ninja it does. So it looks like a heisenbug that sometimes happens not to segfault depending on the exact environment, but valgrind probably reliably catches it. This is what valgrind says: ==20764== Memcheck, a memory error detector ==20764== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. 
==20764== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==20764== Command: x86_64-pc-linux-gnu-clang++ -pie -fPIC -Wl,-z,noexecstack -Wl,-z,now -Wl,-z,relro -Wl,-z,defs -Wl,--no-as-needed -lpthread -Wl,--as-needed -m64 -Wl,-rpath-link=. -Wl,--disable-new-dtags -Wl,-O1 -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -o ./mksnapshot -Wl,--start-group @./mksnapshot.rsp -Wl,--end-group -ldl -lpthread -lrt -licui18n -licuuc -licudata ==20764== ==20764== Invalid read of size 8 ==20764== at 0x4C3507C: memmove (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20764== by 0x1184BB: ??? (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== by 0x115658: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== Address 0x17f78b18 is 8 bytes before a block of size 9,080 alloc'd ==20764== at 0x4C303AE: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20764== by 0xCDE9247: llvm::SmallVectorBase::grow_pod(void*, unsigned long, unsigned long) (in /usr/lib64/llvm/6/lib64/libLLVMSupport.so.6.0.0) ==20764== by 0x116110: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== ==20764== Invalid write of size 8 ==20764== at 0x4C35086: memmove (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20764== by 0x1184BB: ??? (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== by 0x115658: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== Address 0x17f78b18 is 8 bytes before a block of size 9,080 alloc'd ==20764== at 0x4C303AE: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20764== by 0xCDE9247: llvm::SmallVectorBase::grow_pod(void*, unsigned long, unsigned long) (in /usr/lib64/llvm/6/lib64/libLLVMSupport.so.6.0.0) ==20764== by 0x116110: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== ==20764== Invalid write of size 8 ==20764== at 0x4C34FB3: memmove (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20764== by 0x118466: ??? 
(in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== by 0x115658: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== Address 0x17eb1478 is 8 bytes inside a block of size 4,536 free'd ==20764== at 0x4C303AE: realloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20764== by 0xCDE9247: llvm::SmallVectorBase::grow_pod(void*, unsigned long, unsigned long) (in /usr/lib64/llvm/6/lib64/libLLVMSupport.so.6.0.0) ==20764== by 0x116110: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== Block was alloc'd at ==20764== at 0x4C2DF3E: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==20764== by 0xCDE928C: llvm::SmallVectorBase::grow_pod(void*, unsigned long, unsigned long) (in /usr/lib64/llvm/6/lib64/libLLVMSupport.so.6.0.0) ==20764== by 0xCDB73C0: llvm::cl::ExpandResponseFiles(llvm::StringSaver&, void (*)(llvm::StringRef, llvm::StringSaver&, llvm::SmallVectorImpl<char const*>&, bool), llvm::SmallVectorImpl<char const*>&, bool, bool) (in /usr/lib64/llvm/6/lib64/libLLVMSupport.so.6.0.0) ==20764== by 0x1143CF: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== ==20764== Invalid read of size 8 ==20764== at 0xE2D4E86: clang::driver::Driver::setDriverModeFromOption(llvm::StringRef) (in /usr/lib64/llvm/6/lib64/libclangDriver.so.6.0.0) ==20764== by 0xE2D5324: clang::driver::Driver::ParseDriverMode(llvm::StringRef, llvm::ArrayRef<char const*>) (in /usr/lib64/llvm/6/lib64/libclangDriver.so.6.0.0) ==20764== by 0xE2EF431: clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) (in /usr/lib64/llvm/6/lib64/libclangDriver.so.6.0.0) ==20764== by 0x11502D: main (in /usr/lib64/llvm/6/bin/clang-6.0) ==20764== Address 0x3e40 is not stack'd, malloc'd or (recently) free'd ==20764== LLVMSymbolizer: error reading file: No such file or directory #0 0x000000000ce4e599 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/lib64/llvm/6/bin/../lib64/libLLVMSupport.so.6+0x121599) #1 0x000000000ce4c306 llvm::sys::RunSignalHandlers() 
(/usr/lib64/llvm/6/bin/../lib64/libLLVMSupport.so.6+0x11f306)
#2 0x000000000ce4c627 (/usr/lib64/llvm/6/bin/../lib64/libLLVMSupport.so.6+0x11f627)
#3 0x000000000f243220 (/lib64/libc.so.6+0x37220)
#4 0x000000000e2d4e86 clang::driver::Driver::setDriverModeFromOption(llvm::StringRef) (/usr/lib64/llvm/6/bin/../lib64/libclangDriver.so.6+0x5be86)
#5 0x000000000e2d5325 clang::driver::Driver::ParseDriverMode(llvm::StringRef, llvm::ArrayRef<char const*>) (/usr/lib64/llvm/6/bin/../lib64/libclangDriver.so.6+0x5c325)
#6 0x000000000e2ef432 clang::driver::Driver::BuildCompilation(llvm::ArrayRef<char const*>) (/usr/lib64/llvm/6/bin/../lib64/libclangDriver.so.6+0x76432)
#7 0x000000000011502e (x86_64-pc-linux-gnu-clang+++0xd02e)
#8 0x000000000f22cf56 __libc_start_main /var/tmp/portage/sys-libs/glibc-2.26-r6/work/glibc-2.26/csu/../csu/libc-start.c:342:0
#9 0x000000000011783a (x86_64-pc-linux-gnu-clang+++0xf83a)

Since this is a heisenbug, I think all of the previous hacks to "make it work" are red herrings (like the $(cat ./mksnapshot.rsp) thing). It seems changing anything will flip a coin again as to whether it'll crash or not in that exact configuration, without actually affecting the root cause. Valgrind should reliably complain in all cases.

The invalid reads/writes *before* the actual crash one are a few bytes before an allocated block. Those would not segfault themselves, but would corrupt the heap. That probably explains the "corrupted size vs. prev_size" and could explain the other potential crash.

Building clang with USE=debug actually asserts before doing any invalid loads/stores.
x86_64-pc-linux-gnu-clang++: /usr/lib64/llvm/6/include/llvm/ADT/SmallVector.h:605: llvm::SmallVectorImpl<T>::iterator llvm::SmallVectorImpl<T>::insert(llvm::SmallVectorImpl<T>::iterator, ItTy, ItTy) [with ItTy = const char**; <template-parameter-2-2> = void; T = const char*; llvm::SmallVectorImpl<T>::iterator = const char**]: Assertion `I >= this->begin() && "Insertion iterator is out of bounds."' failed.
[...]
==58227== Process terminating with default action of signal 6 (SIGABRT)
==58227== at 0x103A314C: raise (raise.c:51)
==58227== by 0x103A4E6E: abort (abort.c:119)
==58227== by 0x1039A298: __assert_fail_base (assert.c:92)
==58227== by 0x1039A320: __assert_fail (assert.c:101)
==58227== by 0x118BA3: char const** llvm::SmallVectorImpl<char const*>::insert<char const**, void>(char const**, char const**, char const**) (in /usr/lib64/llvm/6/bin/clang-6.0)
==58227== by 0x1152B9: main (in /usr/lib64/llvm/6/bin/clang-6.0)

Interestingly, though, it still only does this under valgrind. Looks like the problem is really early in clang's main function or something inlined into it.

Found the bug. From cfe-6.0.0.src/tools/driver/driver.cpp:

    static void insertTargetAndModeArgs(const ParsedClangName &NameParts,
                                        SmallVectorImpl<const char *> &ArgVector,
                                        std::set<std::string> &SavedStrings) {
      // Put target and mode arguments at the start of argument list so that
      // arguments specified in command line could override them. Avoid putting
      // them at index 0, as an option like '-cc1' must remain the first.
      auto InsertionPoint = ArgVector.begin();
      if (InsertionPoint != ArgVector.end())
        ++InsertionPoint;

      if (NameParts.DriverMode) {
        // Add the mode flag to the arguments.
        ArgVector.insert(InsertionPoint,
                         GetStableCStr(SavedStrings, NameParts.DriverMode));
      }

      if (NameParts.TargetIsValid) {
        const char *arr[] = {"-target", GetStableCStr(SavedStrings,
                                                      NameParts.TargetPrefix)};
        ArgVector.insert(InsertionPoint, std::begin(arr), std::end(arr));
      }
    }

The first .insert() potentially invalidates the InsertionPoint iterator, so the second .insert() is illegal.

This explains why it's a heisenbug. It only happens *if* a target is specified (which is why clang++ without the x86_64-pc-linux-gnu- works), *if* there are enough arguments that the insertion causes a realloc of the array, and also only *if* the realloc changes the base address of the vector (which under valgrind it probably does by design, but is probably a lot less common normally).

Created attachment 524256 [details, diff]
Patch for sys-devel/clang-6.0.0
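For anyone who wants to see the failure mode in isolation, here is a minimal standalone sketch. It is not the attached patch and not the upstream change; it uses std::vector<std::string> in place of LLVM's SmallVector, and NameParts, insert_moves_storage and insert_target_and_mode are made-up names for illustration only. The first helper shows when an insert leaves earlier iterators stale (the storage had to move). The second keeps the insertion point as an index and rebuilds the iterator right before each insert, which is one common way to sidestep the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for clang's ParsedClangName; only the two pieces of
// information discussed in this thread are modelled.
struct NameParts {
    std::string driver_mode;    // e.g. "--driver-mode=g++", empty if none
    std::string target_prefix;  // e.g. "x86_64-pc-linux-gnu", empty if none
};

// Inserts one argument after argv[0] and reports whether the vector's
// storage moved. If it did, any iterator obtained before the call is
// dangling and must not be used again.
bool insert_moves_storage(std::vector<std::string>& args,
                          const std::string& arg) {
    assert(!args.empty());
    const std::string* before = args.data();
    args.insert(args.begin() + 1, arg);
    return args.data() != before;
}

// Same job as the quoted insertTargetAndModeArgs, but the insertion point
// is an index. The iterator is recomputed from begin() for every insert, so
// a reallocation caused by the first insert cannot invalidate it.
void insert_target_and_mode(const NameParts& name,
                            std::vector<std::string>& args) {
    // Never insert at index 0: argv[0] (or an option like -cc1) stays first.
    const std::ptrdiff_t pos = args.empty() ? 0 : 1;

    if (!name.driver_mode.empty()) {
        args.insert(args.begin() + pos, name.driver_mode);
    }
    if (!name.target_prefix.empty()) {
        const std::string extra[] = {"-target", name.target_prefix};
        args.insert(args.begin() + pos, std::begin(extra), std::end(extra));
    }
}
```

With std::vector, growing past capacity typically lands at a new address every time, so the stale iterator would show up quickly. The valgrind trace above shows the argument vector growing via realloc, which can extend a block in place, and that fits the observation that the crash came and went with the environment.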
(In reply to Hector Martin from comment #30)
> Found the bug.
>
> [...]
>
> This explains why it's a heisenbug. It only happens *if* a target is
> specified (which is why clang++ without the x86_64-pc-linux-gnu- works),
> *if* there are enough arguments that the insertion causes a realloc of the
> array, and also only *if* the realloc changes the base address of the vector
> (which under valgrind it probably does by design, but is probably a lot less
> common normally).
This sounds worth reporting upstream, doesn't it?

(In reply to EoD from comment #32)
> This sounds worth reporting upstream, doesn't it?
Yes. I'm going to send the patch upstream and see how that goes.

Upstream code review: https://reviews.llvm.org/D44607

Wow man, good work! Valgrind to the freaking rescue! Wish I'd thought of that. GL making them a test though... I guess you can give them a test that runs bootstrap-rap.sh and then builds chromium in the resulting prefix :P :P :P

Great job Hector. I can confirm that with this patch in clang, chromium-65.0.3325.146 compiles fine. Although, I don't know if it is because of this fix, chromium took 4h22 to compile. Usually it was around 2h50 with my CPU (overclocked i7-4790K and -j5).

I see that upstream has accepted the patch. I have one more in review queue (bug #650316) and once that one's merged too, I will make 6.0.0-r1.

I confirm that the patched sys-devel/clang-6.0.0 succeeds in linking mksnapshot where the unpatched clang had been consistently failing. Thank you, Hector and Mikał.

Sorry. Michał. (I was so focused on getting that slashed lowercase 'ł' right that I botched the other letters! No offense intended.)

Could someone please change this bug report to allow finding the patch faster?
URL: https://reviews.llvm.org/D44607
See-Also: https://reviews.llvm.org/D44607

(In reply to Dennis Schridde from comment #40)
> Could someone please change this bug report to allow finding the patch
> faster?
Do you have some suggestion?

Requested backport in https://bugs.llvm.org/show_bug.cgi?id=36824

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e461935372c17b9715f87b6ce16620046d465add

commit e461935372c17b9715f87b6ce16620046d465add
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2018-03-20 20:19:44 +0000
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2018-03-20 22:13:13 +0000

    sys-devel/clang: Backport fix for crash with long cmdline to 6.0.0

    Closes: https://bugs.gentoo.org/650082

 .../{clang-6.0.0.ebuild => clang-6.0.0-r1.ebuild}  |  4 ++
 ...d-invalidated-iterator-in-insertTargetAnd.patch | 55 ++++++++++++++++++++++
 2 files changed, 59 insertions(+)

Fixes bug for me too using recent clang/llvm-9999 build.

*** Bug 652076 has been marked as a duplicate of this bug. ***

*** Bug 653962 has been marked as a duplicate of this bug. *** |