After upgrading to glibc-2.32-r1, emerge and portageq receive SIGILL and terminate:

# emerge
Illegal instruction
# portageq
Illegal instruction

Reproducible: Always

Steps to Reproduce:
1. Can reproduce it on two similar machines just by upgrading glibc

Actual Results: Broken glibc
Expected Results: Working glibc

dmesg:
[17259097.333206] traps: emerge[7462] trap invalid opcode ip:7f9d56390868 sp:7ffde023e558 error:0
[17259097.333212] in libm-2.32.so[7f9d56317000+9b000]
[17259112.509811] traps: portageq[7465] trap invalid opcode ip:7f1e2e351868 sp:7ffe829d55c8 error:0
[17259112.509816] in libm-2.32.so[7f1e2e2d8000+9b000]

The offset 9b000 is identical on both systems, but changes with the gcc version (a minor difference in layout, I assume).

Building with USE="-multiarch" yields a working glibc.

I will post follow-ups with additional information.
Created attachment 657966 [details] emerge --info emerge --info
With FEATURES="test", I am getting for USE="multiarch":

FAIL: elf/tst-ldconfig-ld_so_conf-update
FAIL: io/tst-copy_file_range
FAIL: math/test-double-acos
FAIL: math/test-double-asin
FAIL: math/test-double-pow
FAIL: math/test-double-tgamma
FAIL: math/test-double-vlen2-pow
FAIL: math/test-double-vlen4-pow
FAIL: math/test-float32x-acos
FAIL: math/test-float32x-asin
FAIL: math/test-float32x-pow
FAIL: math/test-float32x-tgamma
FAIL: math/test-float64-acos
FAIL: math/test-float64-asin
FAIL: math/test-float64-pow
FAIL: math/test-float64-tgamma
FAIL: stdlib/tst-system
FAIL: string/tst-strerror
FAIL: string/tst-strsignal
Summary of test results: 19 FAIL

--

With FEATURES="test" and USE="-multiarch":

FAIL: elf/tst-ldconfig-ld_so_conf-update
FAIL: io/tst-copy_file_range
FAIL: stdlib/tst-system
FAIL: string/tst-strerror
FAIL: string/tst-strsignal
Summary of test results: 5 FAIL

--

There seems to be quite a difference regarding the math tests: all 14 additional failures with USE="multiarch" are in math/.
Short addition since the description was updated. The problem occurs with glibc versions 2.32 and 2.32-r1 (so I assume it's not the patches added in r1 causing this). I tried gcc versions 9.2, 9.3, 10.1, 10.2 and binutils 2.32, 2.33.1, 2.34. So as far as I can tell, the gcc and binutils versions do not really matter.
Please attach glibc's build.log and get a backtrace of the illegal instruction. Usually you can use a core dump and gdb for that. Something like:

$ gdb path/to/executable path/to/core
(gdb) bt
(gdb) disassemble
Can I force emerge to keep the build.log? Because the problems start in postrm:

/usr/bin/sprof
/usr/bin/pldd
/sbin/sln
/sbin/ldconfig
>>> Installing (1 of 1) sys-libs/glibc-2.32-r1::gentoo
 * Defaulting /etc/host.conf:multi to on
/usr/lib/portage/python3.7/phase-functions.sh: line 931: 4114 Illegal instruction "$PORTAGE_BIN_PATH"/ebuild-ipc exit $?

Regarding gdb, I will need to emerge that first, which might take a while.
You can use the PORTAGE_LOGDIR variable to store build logs of successful builds. Something like:

# PORTAGE_LOGDIR=/path/to/result emerge -v1 glibc
Can you also upload the bad libm-2.32.so binary? I'll try to look at the exact instruction that hides at the problematic offset.
Created attachment 658202 [details] build.log Full build.log
Created attachment 658204 [details] libm.so Defective libm-2.32.so (compressed)
Created attachment 658206 [details] backtrace
Disassemble didn't work. Let me emerge glibc again and see if I can create a minimal program to trigger the problem.
Okay, the minimal example I wrote in C calling pow indeed gets SIGILL as well:

Program received signal SIGILL, Illegal instruction.
0x00007ffff7f09868 in ?? () from /lib64/libm.so.6

(gdb) bt
#0 0x00007ffff7f09868 in ?? () from /lib64/libm.so.6
#1 0x00007ffff7ec3484 in powf64 () from /lib64/libm.so.6
#2 0x00005555555551df in main () at math.c:8
(gdb) disassemble 0x00007ffff7f09860,0x00007ffff7f098ff
Dump of assembler code from 0x7ffff7f09860 to 0x7ffff7f098ff:
   0x00007ffff7f09860: add %rcx,%rdx
   0x00007ffff7f09863: vmovq %rax,%xmm6
=> 0x00007ffff7f09868: vfmaddsd %xmm4,0x8(%rdx),%xmm6,%xmm0
   0x00007ffff7f0986f: vmovsd 0x594e9(%rip),%xmm6 # 0x7ffff7f62d60
   0x00007ffff7f09877: vmulsd 0x594f1(%rip),%xmm0,%xmm9 # 0x7ffff7f62d70
   0x00007ffff7f0987f: vfmaddsd 0x18(%rdx),%xmm6,%xmm2,%xmm3
   0x00007ffff7f09886: vmovsd 0x594da(%rip),%xmm6 # 0x7ffff7f62d68

This would be an FMA4 instruction, if I am not mistaken.
(In reply to Sven E. from comment #12)
> This would be an FMA4 Instruction, if I am not mistaken.

Yeah, it's a 4-operand FMA. That means glibc mistakenly detects your CPU as capable of AVX+FMA4 and enables that implementation for the math functions.

There are a few moving parts here:
1. the kernel needs basic support for AVX context save/restore
2. glibc's cpuid detection of features on your CPU

Can you install sys-apps/cpuid and upload the output of 'cpuid' and 'cpuid --raw'? It should report a bunch of leaf values and might ease tracing through the CPU feature detection.

Or you can try it yourself. glibc detects features at:

sysdeps/x86/cpu-features.c:init_cpu_features():
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/cpu-features.c;h=b0ded20486f299535fa3cbcb2f9021aaf3ab8503;hb=HEAD#l359

The specific things we are looking for are the bits that enable the fma implementation.
I think it is sysdeps/x86_64/fpu/multiarch/e_powf.c:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/fpu/multiarch/e_powf.c;h=c5bd42b099b581efd35c5b166829661dfb83d0f2;hb=HEAD#l30

There glibc uses only the FMA selector, via #include "ifunc-fma.h":
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/fpu/multiarch/ifunc-fma.h;h=0a25a44ab083093f5374f4c492ff073d5fdb8d91;hb=HEAD#l24

"""
static inline void *
IFUNC_SELECTOR (void)
{
  const struct cpu_features* cpu_features = __get_cpu_features ();

  if (CPU_FEATURE_USABLE_P (cpu_features, FMA)
      && CPU_FEATURE_USABLE_P (cpu_features, AVX2))
    return OPTIMIZE (fma);

  return OPTIMIZE (sse2);
}
"""

So the ultimate question is whether your CPU and kernel support AVX2+FMA.
Created attachment 658222 [details] cpuid.txt.bz2 cpuid
Created attachment 658224 [details] cpuid.raw.txt.bz2 cpuid --raw
/proc/cpuinfo says:

model name : Intel(R) Xeon(R) CPU E5-2650L v4 @ 1.70GHz
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts nopl tsc_reliable nonstop_tsc pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx hypervisor lahf_lm kaiser arat

cpuid identifies the CPU as Haswell; however, that Xeon CPU is Sandy Bridge uarch, if the research I did earlier is correct.
(In reply to Sergei Trofimovich from comment #13)
> I think it is a
> sysdeps/x86_64/fpu/multiarch/e_powf.c:

Correction: having looked at your libm.so, the crash happens in __ieee754_pow_fma4. That is sysdeps/x86_64/fpu/multiarch/e_pow.c with the fma4 ifunc:

"""
static inline void *
IFUNC_SELECTOR (void)
{
  const struct cpu_features* cpu_features = __get_cpu_features ();

  if (CPU_FEATURE_USABLE_P (cpu_features, FMA)
      && CPU_FEATURE_USABLE_P (cpu_features, AVX2))
    return OPTIMIZE (fma);

  if (CPU_FEATURE_USABLE_P (cpu_features, FMA))
    return OPTIMIZE (fma4);

  return OPTIMIZE (sse2);
}
"""

And it looks like we have a bug here. The second condition should be 'if (CPU_FEATURE_USABLE_P (cpu_features, FMA4))'.
Yes, I think you are right, this looks plain wrong.

One question though:
> Correction: having looked at your libm.so the crash happens at __ieee754_pow_fma4. That is sysdeps/x86_64/fpu/multiarch/e_pow.c with fma4 ifunc:

How did you find this out, if I may ask?
Was fixed upstream. If you are feeling brave you can try upstream patch: https://sourceware.org/git/?p=glibc.git;a=patch;h=23af890b3f04e80da783ba64e6b6d94822e01d54 You will need to drop it with .patch extension to /etc/portage/patches/sys-libs/glibc and rebuild glibc.
(In reply to Sven E. from comment #18)
> Yes, I think you are right, this looks plain wrong.
>
> One question though:
> Correction: having looked at your libm.so the crash happens at
> __ieee754_pow_fma4. That is sysdeps/x86_64/fpu/multiarch/e_pow.c with fma4
> ifunc:
>
> How did you find this out, if I may ask?

I cheated a bit and built glibc locally with the same CFLAGS you were using (I only added -ggdb3 on top). Then I searched for the 'vfmaddsd %xmm4,0x8(%rdx),%xmm6,%xmm0' instruction sequence you had in the gdb output and was lucky to get the same snippet. The instruction offset matched perfectly as well.

Debugging symbols make it more obvious and show that the instruction is part of the __ieee754_pow_fma4 function. I think you would see the same with -ggdb3 in CFLAGS.
Ah, thanks, that explains. For the sake of completeness I just built glibc with FEATURES="nostrip" and then gdb does indeed display the name too in the backtrace. Should have thought of that earlier :-/. --- Will this be taken upstream?
(In reply to Sven E. from comment #21)
> Will this be taken upstream?

The patch above mentions the existing https://sourceware.org/PR26534
Queued into 2.32 patchset as: https://gitweb.gentoo.org/fork/glibc.git/commit/?h=gentoo/2.32&id=5752df8c01162de92e83a031f61e4441b4ea432b
Thanks for your efforts. In the meantime I had already written a quick patch myself and dropped it in as a user patch. Since it is identical, we can close this as soon as you are done rolling it out.
*** Bug 744586 has been marked as a duplicate of this bug. ***
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=55104ab0a33759928f0cb6bb8edc9a39dc3f5079

commit 55104ab0a33759928f0cb6bb8edc9a39dc3f5079
Author:     Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2020-09-25 18:53:13 +0000
Commit:     Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2020-09-25 18:54:25 +0000

    sys-libs/glibc: Revbump to 2.32 patchset 2

    Contains the following fix:
    x86-64: Fix FMA4 detection in ifunc [BZ #26534]

    Closes: https://bugs.gentoo.org/740110
    Package-Manager: Portage-3.0.4, Repoman-3.0.1
    Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

 sys-libs/glibc/Manifest             |    1 +
 sys-libs/glibc/glibc-2.32-r2.ebuild | 1505 +++++++++++++++++++++++++++++++++++
 2 files changed, 1506 insertions(+)