Summary: | dev-lang/rust: stage0 rustc segfaults on -system-bootstrap build on phenom | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | JohnLM <janis.smedins> |
Component: | Current packages | Assignee: | Gentoo Rust Project <rust> |
Status: | UNCONFIRMED --- | ||
Severity: | normal | CC: | axl, ionen, janis.smedins, navi, randy, rust |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
output of a failed emerge
emerge failure, glibc w/o clone3 |
Description
JohnLM
2022-11-07 19:37:07 UTC
$ emerge --info glibc
[...]
sys-libs/glibc-2.35-r8::gentoo was built with the following:
USE="clone3 multiarch (multilib) ssp stack-realign (static-libs) systemd -audit -caps (-cet) -compile-locales (-crypt) (-custom-cflags) -doc (-experimental-loong) -gd -headers-only -multilib-bootstrap -nscd -profile (-selinux) -suid -systemtap -test (-vanilla)" ABI_X86="(64)"
CFLAGS="-march=barcelona -pipe -fno-builtin-strlen -O2"
CXXFLAGS="-march=barcelona -pipe -fno-builtin-strlen -O2"
FEATURES="binpkg-docompress network-sandbox userfetch binpkg-logs ipc-sandbox pid-sandbox usersync merge-sync config-protect-if-modified distlocks protect-owned assume-digests userpriv unknown-features-warn parallel-fetch noinfo qa-unresolved-soname-deps fixlafiles multilib-strict preserve-libs nostrip sfperms binpkg-dostrip news sandbox xattr usersandbox buildpkg-live unmerge-logs unmerge-orphans strict ebuild-locks"

Created attachment 828519 [details]
output of a failed emerge

The exact place of error varies, but it's not completely random. I've noticed some pattern in the address offset. On last build the syslog got:

Nov 07 21:29:29 isg005 kernel: opt build_scrip[263489]: segfault at 18 ip 00007f6c4c8a5a85 sp 00007f6c3d1fb318 error 4 in libc.so.6[7f6c4c82b000+16f000]
Nov 07 21:29:29 isg005 kernel: Code: 83 c8 01 c3 90 ff ca 7e dc 0f b7 0e 0f b7 07 0f c9 0f c8 d1 e9 d1 e8 8a 0c 16 0f b6 3c 17 09 f8 29 c8 c3 66 90 b9 ff ff 00 00 <0f> 10 06 0f 10 0f 66 0f 74 c8 66 0f d7 c1 29 c8 75 39 48 83 ea 20
Nov 07 21:29:29 isg005 systemd[1]: Started Process Core Dump (PID 263491/UID 0).
Nov 07 21:29:29 isg005 kernel: opt build_scrip[263510]: segfault at 10 ip 00007fcb05eda53c sp 00007fcaf83fc8a0 error 4 in libLLVM-14-rust-1.63.0-stable.so[7fcb0536f000+29f1000]
Nov 07 21:29:29 isg005 kernel: Code: 48 89 84 24 18 01 00 00 31 db 4c 89 74 24 30 4c 89 7c 24 28 66 2e 0f 1f 84 00 00 00 00 00 49 8b 42 30 4c 8b 2c c8 49 8b 45 00 <48> 8b 40 10 48 3b 05 41 41 2b 02 89 9c 24 0c 01 00 00 48 89 8c 24
Nov 07 21:29:29 isg005 systemd[1]: Started Process Core Dump (PID 263512/UID 0).
Nov 07 21:29:30 isg005 systemd-coredump[263492]: elfutils disabled, parsing ELF objects not supported
Nov 07 21:29:30 isg005 systemd-coredump[263492]: [🡕] Process 263459 (rustc) of user 250 dumped core.
Nov 07 21:29:30 isg005 systemd[1]: systemd-coredump@19-263491-0.service: Deactivated successfully.
Nov 07 21:29:30 isg005 systemd-coredump[263513]: elfutils disabled, parsing ELF objects not supported
Nov 07 21:29:30 isg005 systemd-coredump[263513]: [🡕] Process 263457 (rustc) of user 250 dumped core.
Nov 07 21:29:30 isg005 systemd[1]: systemd-coredump@20-263512-0.service: Deactivated successfully.

The emerge output, including the traceback, is in the attachment. (Copied from the terminal; portage didn't leave the tmpdir around.)

Oh! I should probably tell that this started versions 1.53.0 and 1.58.0 as dev-lang/rust-1.58.0 was the first one to fail to build. Somehow I ended up trying system-bootstrapping, and I bootstrapped from 1.53.0 (which I had installed at the time) through 1.58.0 (latest then) to 1.64.0-r1 (latest now).

* _between_ versions 1.53.0 and 1.58.0

If you re-emerge systemd with USE=elfutils, you'd get more info in the logs. I highly recommend enabling it for the time being; it's very useful.
Also, you should be able to get a stack trace with
coredumpctl info rustc

Ideally, try these steps to reproduce and provide more info:
1. install gdb, if not installed yet
2. re-install systemd with USE=elfutils
3. reproduce the crash, do not clean up /var/tmp/portage/... yet
4. run: coredumpctl info rustc (provide output)
5. run: coredumpctl debug rustc
6. in the gdb prompt, which opens as a result of step 5, type: bt (provide output)
7. provide output of 'qlop -vmu'

The qlop output will help me see what else of significance has been updated in the time period around rust-1.53 and later.

You may need to rebuild glibc with debugging information (https://wiki.gentoo.org/wiki/Debugging); -ggdb in CFLAGS and nostrip or splitdebug in FEATURES should be enough. But since you already have fno-builtin-strlen for it, I guess you already have debuginfo for it as well.

Basically, I'd really like to see a gdb backtrace or at least a coredumpctl backtrace.

Rust failure gdb backtrace

(gdb) bt full
#0 arena_get_from_edata (edata=0x0) at include/jemalloc/internal/arena_inlines_b.h:16
No locals.
#1 _rjem_je_large_dalloc (tsdn=0x7f5cb99febf8, edata=0x0) at src/large.c:271
        arena = <optimized out>
#2 0x000055c51b8dc48c in arena_dalloc_large (ptr=<optimized out>, tcache=<optimized out>, szind=<optimized out>, slow_path=false, tsdn=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:297
        edata = 0x0
        edata = <optimized out>
#3 arena_dalloc (ptr=<optimized out>, tcache=<optimized out>, slow_path=false, tsdn=<optimized out>, caller_alloc_ctx=<optimized out>) at include/jemalloc/internal/arena_inlines_b.h:334
        alloc_ctx = <optimized out>
#4 idalloctm (ptr=<optimized out>, tcache=<optimized out>, is_internal=false, slow_path=false, tsdn=<optimized out>, alloc_ctx=<optimized out>) at include/jemalloc/internal/jemalloc_internal_inlines_c.h:120
No locals.
#5 ifree (ptr=<optimized out>, tcache=<optimized out>, slow_path=false, tsd=<optimized out>) at src/jemalloc.c:2887
        alloc_ctx = <optimized out>
        usize = 289078108240281600
#6 _rjem_je_free_default (ptr=<optimized out>) at src/jemalloc.c:3014
        tcache = <optimized out>
        tsd = <optimized out>
#7 0x00007f5cd30cf097 in llvm::BranchProbabilityInfo::calculate(llvm::Function const&, llvm::LoopInfo const&, llvm::TargetLibraryInfo const*, llvm::DominatorTree*, llvm::PostDominatorTree*) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#8 0x00007f5cd39c2f94 in llvm::detail::AnalysisPassModel<llvm::Function, llvm::BranchProbabilityAnalysis, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>::Invalidator>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#9 0x00007f5cd269c6e7 in llvm::AnalysisManager<llvm::Function>::getResultImpl(llvm::AnalysisKey*, llvm::Function&) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#10 0x00007f5cd29f3cb6 in llvm::detail::AnalysisPassModel<llvm::Function, llvm::BlockFrequencyAnalysis, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>::Invalidator>::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#11 0x00007f5cd269c6e7 in llvm::AnalysisManager<llvm::Function>::getResultImpl(llvm::AnalysisKey*, llvm::Function&) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#12 0x00007f5cd2a1cdaf in llvm::AlwaysInlinerPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#13 0x00007f5cd2a1caad in llvm::detail::PassModel<llvm::Module, llvm::AlwaysInlinerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#14 0x00007f5cd35060e6 in llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/libLLVM-15-rust-1.65.0-stable.so
No symbol table info available.
#15 0x00007f5cd7d8b341 in LLVMRustOptimizeWithNewPassManager () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-a21dfa8672cc0cdd.so
No symbol table info available.
#16 0x00007f5cd7d877ca in rustc_codegen_llvm::back::write::optimize_with_new_llvm_pass_manager () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-a21dfa8672cc0cdd.so
No symbol table info available.
#17 0x00007f5cd7d86815 in rustc_codegen_llvm::back::write::optimize () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-a21dfa8672cc0cdd.so
No symbol table info available.
#18 0x00007f5cd7bb172b in rustc_codegen_ssa::back::write::execute_work_item::<rustc_codegen_llvm::LlvmCodegenBackend> () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-a21dfa8672cc0cdd.so
No symbol table info available.
#19 0x00007f5cd7bafc38 in std::sys_common::backtrace::__rust_begin_short_backtrace::<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()> () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-a21dfa8672cc0cdd.so
No symbol table info available.
#20 0x00007f5cd7b8a738 in <<std::thread::Builder>::spawn_unchecked_<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()>::{closure#1} as core::ops::function::FnOnce<()>>::call_once::{shim:vtable#0} () from /home/johnlm/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-a21dfa8672cc0cdd.so
No symbol table info available.
#21 0x00007f5cd58cd003 in alloc::boxed::{impl#44}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1940
No locals.
#22 alloc::boxed::{impl#44}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1940
No locals.
#23 std::sys::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/unix/thread.rs:108
No locals.
#24 0x00007f5cd56246da in start_thread (arg=<optimized out>) at pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140036239554608, 5389326673050754667, 140036227987008, 0, 140036693705728, 0, -5443779712600351125, -5443982464774676885}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#25 0x00007f5cd56a9bdc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

Looks like clone3() is responsible. Disabling that flag on glibc might be a workaround; however, I can't answer why it's happening.
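The "pattern in the address offset" mentioned in the description is easier to compare across crashes once the mapping base is subtracted from the faulting instruction pointer. Below is a minimal sketch of that arithmetic, assuming kernel lines in the same `segfault at ... ip ... in NAME[BASE+SIZE]` shape as the syslog excerpt above. The function name `segv_offset` is hypothetical, and the result is an offset into the mapping the kernel reported, which is not necessarily an address inside the ELF file itself.

```shell
# segv_offset (hypothetical helper): read kernel "segfault at" lines on stdin
# and print "<object> <offset of ip inside the reported mapping>" for each one.
segv_offset() {
    # Extract: object name, instruction pointer, mapping base (all hex).
    sed -n 's/.* ip \([0-9a-f]*\) .* in \(.*\)\[\([0-9a-f]*\)+[0-9a-f]*\].*/\2 \1 \3/p' |
    while read -r obj ip base; do
        # Shell arithmetic accepts 0x-prefixed hex; print the difference as hex.
        printf '%s 0x%x\n' "$obj" "$((0x$ip - 0x$base))"
    done
}
```

Feeding it the two kernel lines from the description (for example via `journalctl -k | segv_offset`) gives `libc.so.6 0x7aa85` and `libLLVM-14-rust-1.63.0-stable.so 0xb6b53c`.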
>
> $ emerge --info
> Portage 3.0.38.1 (python 3.10.8-final-0, default/linux/amd64/17.1/systemd,
> gcc-11.3.0, glibc-2.35-r8, 5.15.75-gentoo x86_64)
Could you please update to the newest glibc-2.35 available and try again?
i.e. sys-libs/glibc-2.35-r11
There were some code path selection fixes...
(In reply to JohnLM from comment #0)
> And rustup-installed compiler seems to work when running another distro
> (Linux Mint) on the same machine.

fwiw what glibc version does your linux mint install have? If it's glibc-2.33 or older it likely wouldn't say much given it didn't use clone3.

Created attachment 830299 [details]
emerge failure, glibc w/o clone3
OK, I rebuilt glibc with USE=-clone3. It didn't make the segfault go away. Attached log for `USE=-system-bootstrap MAKEOPTS=-j1 emerge -a1v rust`
The coredump did stack traces of 5 threads -- which is cool -- but I'm not sure if it's any more useful.
In the attached log the segfault was triggered in jemalloc code again, but that's not always the case. I did a bunch more `cargo +stable build` runs: 2/3 of the segfaults were in LLVM code, and about half of the time the build simply froze without triggering a segfault. Hmm...
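Counting outcomes over repeated runs like these by hand gets tedious, so a small tally loop can help. This is only a sketch under stated assumptions: the names `classify_run` and `tally_runs` are hypothetical, a hang is detected through GNU `timeout` (which exits with status 124 when the limit is reached), and a segfault is detected either by exit status 139 (128 + SIGSEGV) or by the string `SIGSEGV` appearing in the captured output, which assumes the build tool reports a crashed child that way.

```shell
# classify_run STATUS LOGFILE (hypothetical): label how one run ended.
classify_run() {
    if [ "$1" -eq 124 ]; then
        echo hang                      # GNU timeout hit its limit
    elif [ "$1" -eq 139 ] || grep -q SIGSEGV "$2"; then
        echo segfault                  # direct SIGSEGV, or one reported in the output
    elif [ "$1" -eq 0 ]; then
        echo ok
    else
        echo other
    fi
}

# tally_runs COUNT LIMIT CMD... (hypothetical): run CMD COUNT times, each under
# `timeout LIMIT`, and print one summary line. The body runs in a subshell so
# its variables do not leak into the caller.
tally_runs() (
    count=$1 limit=$2
    shift 2
    ok=0 hang=0 segfault=0 other=0 i=0
    log=$(mktemp)
    while [ "$i" -lt "$count" ]; do
        status=0
        timeout "$limit" "$@" >"$log" 2>&1 || status=$?
        case $(classify_run "$status" "$log") in
            ok)       ok=$((ok + 1)) ;;
            hang)     hang=$((hang + 1)) ;;
            segfault) segfault=$((segfault + 1)) ;;
            *)        other=$((other + 1)) ;;
        esac
        i=$((i + 1))
    done
    rm -f "$log"
    echo "ok=$ok hang=$hang segfault=$segfault other=$other"
)
```

A call might look like `tally_runs 10 600 cargo +stable build`. The 600-second limit is an arbitrary placeholder, and the project would need to be cleaned between runs for each build to do real work.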
(In reply to Ionen Wolkens from comment #9)
> (In reply to JohnLM from comment #0)
> > And rustup-installed compiler seems to work when running another distro
> > (Linux Mint) on the same machine.
> fwiw what glibc version does your linux mint install have? If it's
> glibc-2.33 or older it likely wouldn't say much given it didn't use clone3.

I used Linux Mint 20.3 LiveUSB. It runs glibc-2.31.

(In reply to Andreas K. Hüttel from comment #8)
> Could you please update to the newest glibc-2.35 available and try again?
> i.e. sys-libs/glibc-2.35-r11
>
> There were some code path selection fixes...

I unkeyworded and updated to sys-libs/glibc-2.35-r11, still with USE=-clone3.
No changes, unfortunately.

can you attach backtrace failure without clone3?

I see it still fails in pthread_create, but since clone3 is disabled - stack should look differently.

(In reply to Georgy Yakovlev from comment #12)
> can you attach backtrace failure without clone3?

(In reply to Georgy Yakovlev from comment #13)
> I see it still fails in pthread_create, but since clone3 is disabled - stack
> should look differently.

Yes, I already did. That's attachment 830299 [details] -- it's different in that it now uses clone() instead of clone3().

I think clone* is a red herring here, it's just that clone3 was a problem with seccomp before. It's something glibc uses a lot (which is why it was a problem with seccomp+clone3) internally.

(In reply to Sam James from comment #15)
> I think clone* is a red herring here, it's just that clone3 was a problem
> with seccomp before. It's something glibc uses a lot (which is why it was a
> problem with seccomp+clone3) internally.

++

Thanks for the help so far! Do you have any more ideas for me to try? Seems like a dead end at the moment.

At least out of curiosity I tried grepping for AVX instructions.
I tried some commands like this:

$ objdump -d ~/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/librustc_driver-a21dfa8672cc0cdd.so | awk '/^[0-9a-f]/ { l = $0 ; x = 1 }; /^ .*v(broad|insert|mask|per|zero)/ { if (x) { print l ; x = 0 } }'

and -- sure enough -- I found some in libLLVM and librustc_driver (but not in rustc itself). So far so nice, but I also found some AVX in my system's librustc_driver (the working one)! Given this _and_ the fact that all the function names had "avx" somewhere in their names, I'm completely sure the AVX code is used only if a runtime check passes. So _not_ the culprit. Oh well.
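The one-liner above can be wrapped in a function so the same scan is easy to repeat for each library. This is a sketch of the same awk logic with longer variable names; the function name `avx_functions` is hypothetical, and the mnemonic prefixes are just the ones used in the comment, not a complete list of AVX instructions.

```shell
# avx_functions (hypothetical helper): read `objdump -d` output on stdin and
# print the header line of every function that contains at least one
# instruction matching the mnemonic prefixes below.
avx_functions() {
    awk '
        # A line starting with a hex digit is a function header: "ADDR <name>:".
        /^[0-9a-f]/ { header = $0; pending = 1 }
        # An indented line is an instruction; report the function once on first match.
        /^ .*v(broad|insert|mask|per|zero)/ {
            if (pending) { print header; pending = 0 }
        }
    '
}
```

It would be used as `objdump -d path/to/lib.so | avx_functions`. On Linux, `grep -cw avx /proc/cpuinfo` shows whether the CPU advertises AVX at all, which helps when deciding if such code paths could be taken at runtime.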