Created attachment 917254 [details]
llvm-19.1.4 build failure

Build failure on llvm-core/llvm-19.1.4 (also tested 19.1.7, same error) on AMD64 Zen5.

# emerge -pqv =llvm-core/llvm-19.1.4
emerge -pqv '=llvm-core/llvm-19.1.4::gentoo'
[ebuild   R    ] llvm-core/llvm-19.1.4  USE="binutils-plugin libffi verify-sig* xml zstd -debug -debuginfod -doc -exegesis -libedit -test -z3" ABI_X86="(64) -32 (-x32)" LLVM_TARGETS="(AArch64) (AMDGPU) (ARM) (AVR) (BPF) (Hexagon) (Lanai) (LoongArch) (MSP430) (Mips) (NVPTX) (PowerPC) (RISCV) (Sparc) (SystemZ) (VE) (WebAssembly) (X86) (XCore) -ARC -CSKY -DirectX -M68k -SPIRV -Xtensa"

emerge --info attached. Build log attached.
Created attachment 917255 [details]
emerge --info
My gcc was built with LTO, if that's relevant. I tried the llvm build both with -flto and without, and got the same error in both cases.
# sed -n '642p' /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/avx2intrin.h
return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (__v8si)__Y);

There is no "W_v8si" here. Can you show the output of the command above?
(In reply to Zhixu Liu from comment #3)
> # sed -n '642p' /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/avx2intrin.h
> return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (__v8si)__Y);
>
> no "W_v8si" here, can you show the output of the command above?

# sed -n '642p' /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/avx2intrin.h
return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (W_v8si)_[Y);
(In reply to Dan Arnold from comment #4)
> (In reply to Zhixu Liu from comment #3)
> > # sed -n '642p' /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/avx2intrin.h
> > return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (__v8si)__Y);
> >
> > no "W_v8si" here, can you show the output of the command above?
>
> # sed -n '642p' /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/avx2intrin.h
> return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (W_v8si)_[Y);

I think this might be filesystem corruption. There has never been a 'W_v8si' in that file, looking at git history, and that line has been unchanged since 2011.

commit 977e83a3edc1a58077e33143ad3cc1f9349d6197
Author: Kirill Yukhin <kirill.yukhin@intel.com>
Date:   Mon Aug 22 13:57:18 2011 +0000

    Add support for AVX2 builtin functions.

    2011-08-22  Kirill Yukhin  <kirill.yukhin@intel.com>
(In reply to Sam James from comment #5)
> (In reply to Dan Arnold from comment #4)
> > (In reply to Zhixu Liu from comment #3)
> > > # sed -n '642p' /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/avx2intrin.h
> > > return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (__v8si)__Y);
> > >
> > > no "W_v8si" here, can you show the output of the command above?
> >
> > # sed -n '642p' /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/avx2intrin.h
> > return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (W_v8si)_[Y);
>
> I think this might be filesystem corruption. There has never been a 'W_v8si'
> in that file, looking at git history, and that line has been unchanged since
> 2011.
>
> commit 977e83a3edc1a58077e33143ad3cc1f9349d6197
> Author: Kirill Yukhin <kirill.yukhin@intel.com>
> Date:   Mon Aug 22 13:57:18 2011 +0000
>
>     Add support for AVX2 builtin functions.
>
>     2011-08-22  Kirill Yukhin  <kirill.yukhin@intel.com>

Weird. This is a brand new install. I did experience a segfault during a build, so maybe that somehow corrupted it. I'm going to re-emerge gcc and see if that fixes it, thank you.
Could this be caused by a cosmic ray? Only one bit changed:

_ = 0x5f = 0101 1111
W = 0x57 = 0101 0111
(In reply to Zhixu Liu from comment #7)
> could be caused by cosmic ray? only one bit changed
>
> _ = 0x5f = 0101 1111
> W = 0x57 = 0101 0111

If you haven't re-emerged yet, can you try 'equery check gcc'? If you have already re-emerged, is there any backup left so we can verify the checksum?
(In reply to Zhixu Liu from comment #7)
> could be caused by cosmic ray? only one bit changed
>
> _ = 0x5f = 0101 1111
> W = 0x57 = 0101 0111

The other difference is also a one-bit change:

[ = 0x5b = 0101 1011
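The one-bit observations above are easy to verify: XOR the ASCII codes of each original/corrupted character pair and count the set bits in the result. A minimal sketch (my own illustration, not part of the original report):

```python
def bit_flips(a: str, b: str) -> int:
    """Number of bits that differ between the ASCII codes of two characters."""
    return bin(ord(a) ^ ord(b)).count("1")

# The two corruptions observed in avx2intrin.h, each a single flipped bit:
print(bit_flips("_", "W"))  # 0x5f ^ 0x57 = 0x08 -> 1
print(bit_flips("_", "["))  # 0x5f ^ 0x5b = 0x04 -> 1
```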
I already re-emerged gcc, sorry! I'm guessing the checksum would have failed had I run it before re-emerging; it passes now.

But I do have a log of what happened. I was emerging plasma-meta, then opened another pane in tmux and tried to emerge lm-sensors at the same time, while watching btop in a third tmux pane. The CPU was totally pegged, 100% on all 24 logical cores. It seemed like lm-sensors should have been moving faster (I emerged it later when idle and it took less than a minute), so after waiting a while I ctrl-C'd the lm-sensors emerge, and at that exact moment the plasma-meta merge blew up with a segfault. Attaching my journalctl output from the segfault.

Maybe something had that file open and it got corrupted? I am using zfs-2.3.0 from guru if that's relevant; root is on ZFS.
Created attachment 917304 [details]
emerge segfault log

This segfault happened when I was emerging plasma-meta and then emerged lm-sensors simultaneously in another terminal. It seemed frozen (lm-sensors is tiny and emerges very quickly on this brand-new AMD Ryzen 9 9900X), so I ctrl-C'd the lm-sensors emerge, and that's when the segfault happened.
Created attachment 917305 [details]
qlop -m output

qlop -m output around the time of the segfault.
Created attachment 917306 [details]
qlop -qEm output

qlop -qEm output around the time of the emerge segfault.
This PC is brand new with brand new hardware, built yesterday, so if there's a possibility there's a hardware issue causing this I'd love to know. Thanks for bearing with me on this :)
(In reply to Dan Arnold from comment #10)
> I am using zfs-2.3.0 from guru if that's relevant; root is on ZFS.

I *have* hit various issues with ZFS before (such that I even have a wiki page listing them), but I'm not convinced that's what's going on here.

Note: hopefully not from guru, as it's in ::gentoo, and guru isn't allowed to have ebuilds for ::gentoo packages.

(In reply to Dan Arnold from comment #14)
> This PC is brand new with brand new hardware, built yesterday, so if there's
> a possibility there's a hardware issue causing this I'd love to know. Thanks
> for bearing with me on this :)

The first candidate that comes to mind here is XMP defaulting to on in the BIOS/UEFI firmware settings -- could you check that, and for any other silly overclocking options? A lot of OEMs turn them on by default these days.

I'd also do a memtest at least overnight (ideally 12+ hours minimum).
(In reply to Sam James from comment #15)
> (In reply to Dan Arnold from comment #10)
> > I am using zfs-2.3.0 from guru if that's relevant; root is on ZFS.
>
> I *have* hit various issues with ZFS before (such that I even have a wiki
> page listing them) but I'm not convinced that's what's going on here.
>
> note: Hopefully not from guru, as it's in ::gentoo, and guru isn't allowed
> to have ebuilds for ::gentoo packages.
>
> (In reply to Dan Arnold from comment #14)
> > This PC is brand new with brand new hardware, built yesterday, so if there's
> > a possibility there's a hardware issue causing this I'd love to know. Thanks
> > for bearing with me on this :)
>
> First candidate coming to mind here is XMP defaulting on in
> bios/uefi/firmware settings -- check that and for any other silly overclock
> stuff? A lot of OEMs turn them on by default these days.
>
> I'd also do a memtest at least overnight (ideally 12+ hours minimum).

You're right, my zfs package is from ::gentoo (~amd64), not guru.

I'm using the built-in EXPO profile for the RAM, running at 5600 MT/s 1:1 with CAS latency 36 (the RAM is DDR5-5600). The RAM is on the motherboard manufacturer's compatibility list. I'm not otherwise overclocking; that's the only thing I changed in the UEFI settings other than fan curves.

I will try a memtest overnight. Thank you!
Haha, you were spot on Sam, it's bad RAM.

Time: 0:20:09  Status: Failed!  Pass: 0  Errors: 64360

pCPU  Pass  Test  Failing Address        Expected          Found
----  ----  ----  ---------------------  ----------------  ----------------
   6     0     8  000fc8adaf70 (63.1GB)  948338d38a501631  94c388d38a501631
...etc

We can close this bug, thanks for everyone's assistance!
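As a side note, the memtest word can be checked the same way as the header corruption earlier in this bug. A quick sketch (my own check, not output from memtest) XORs the expected and found values:

```python
# First failing word from the memtest report above
expected = 0x948338D38A501631
found    = 0x94C388D38A501631

diff = expected ^ found
print(hex(diff))             # 0x40b00000000000
print(bin(diff).count("1"))  # 4
```

Four differing bits across two bytes, so unlike the single-bit header corruption, this looks like a genuinely failing DIMM (or unstable EXPO timings) rather than a one-off single-event upset.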
Hardware error, resolving
You're most welcome -- sorry for the bad news, but really glad we got to the bottom of it quickly!