Created attachment 885848 [details]
xz-utils-5.6.0[pgo] log
I'm not sure if this is part of the PGO profiling process, but after the initial build, app-arch/xz-utils-5.6.0[pgo] runs its test suite. However, all tests fail and the build gets cancelled. USE=-pgo builds fine.
[ebuild R ] app-arch/xz-utils-5.6.0 USE="extra-filters nls pgo* static-libs verify-sig -doc"
Created attachment 885849 [details] emerge --info
It is supposed to run the test suite because PGO relies on running something to obtain training data. However, we definitely have a problem in that it's segfaulting...
```
make[2]: Leaving directory '/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0-abi_x86_64.amd64/tests'
make check-TESTS
make[2]: Entering directory '/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0-abi_x86_64.amd64/tests'
make[3]: Entering directory '/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0-abi_x86_64.amd64/tests'
/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/build-aux/test-driver: line 112: 5787 Segmentation fault (core dumped) "$@" >> "$log_file" 2>&1
FAIL: test_index
/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/build-aux/test-driver: line 112: 5776 Segmentation fault (core dumped) "$@" >> "$log_file" 2>&1
FAIL: test_hardware
/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/build-aux/test-driver: line 112: 5789 Segmentation fault (core dumped) "$@" >> "$log_file" 2>&1
```
Could you try with USE=-pgo, and FEATURES=test?
USE="-pgo" FEATURES=test emerge -v1 app-arch/xz-utils
Then if it fails, show me tests/test-suite.log, and we will have to probe to get a backtrace.
Created attachment 885850 [details]
app-arch/xz-utils[pgo] FEATURES=test build log
FEATURES=test does not work for me, I'm afraid. Relevant dmesg output:
[ 6874.934145] test_hardware[46609]: segfault at 41b0 ip 00000000000041b0 sp 00007ffd555380c8 error 14 in test_hardware[564b2ae1e000+2000] likely on CPU 8 (core 16, socket 0)
[ 6874.934169] Code: Unable to access opcode bytes at 0x4186.
[ 6874.937168] test_check[46605]: segfault at 41b0 ip 00000000000041b0 sp 00007ffff3b404a8 error 14 in test_check[5580a5e49000+2000] likely on CPU 11 (core 19, socket 0)
[ 6874.937188] Code: Unable to access opcode bytes at 0x4186.
[ 6874.939060] test_block_head[46621]: segfault at 41b0 ip 00000000000041b0 sp 00007ffd0956c0a8 error 14 in test_block_header[55e7e5145000+2000] likely on CPU 11 (core 19, socket 0)
[ 6874.939077] Code: Unable to access opcode bytes at 0x4186.
[ 6874.940545] test_vli[46633]: segfault at 41b0 ip 00000000000041b0 sp 00007fff06908448 error 14 in test_vli[55baaeb94000+2000] likely on CPU 4 (core 8, socket 0)
[ 6874.940560] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941136] test_stream_fla[46612]: segfault at 41b0 ip 00000000000041b0 sp 00007ffde0643368 error 14 in test_stream_flags[55e8502b9000+2000] likely on CPU 7 (core 12, socket 0)
[ 6874.941149] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941155] test_index_hash[46626]: segfault at 41b0 ip 00000000000041b0 sp 00007fff64eaa068 error 14 in test_index_hash[55c4cef05000+2000] likely on CPU 10 (core 18, socket 0)
[ 6874.941172] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941419] test_filter_fla[46618]: segfault at 41b0 ip 00000000000041b0 sp 00007ffd3f1c6ba8 error 14 in test_filter_flags[55f9d60b9000+2000] likely on CPU 2 (core 4, socket 0)
[ 6874.941436] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941621] test_index[46622]: segfault at 41b0 ip 00000000000041b0 sp 00007fff9ce4f648 error 14 in test_index[5586063d4000+2000] likely on CPU 1 (core 0, socket 0)
[ 6874.941638] Code: Unable to access opcode bytes at 0x4186.
[ 6874.942096] test_filter_str[46616]: segfault at 41b0 ip 00000000000041b0 sp 00007fffec36cf88 error 14 in test_filter_str[5556c5024000+2000] likely on CPU 15 (core 23, socket 0)
[ 6874.942113] Code: Unable to access opcode bytes at 0x4186.
[ 6874.942572] test_memlimit[46632]: segfault at 41b0 ip 00000000000041b0 sp 00007fff8c01f368 error 14 in test_memlimit[5636f4800000+2000] likely on CPU 3 (core 4, socket 0)
[ 6874.942589] Code: Unable to access opcode bytes at 0x4186.
Created attachment 885851 [details]
xz-utils-5.6.0[-pgo] FEATURES=test build log (successful)
Never mind, I did not read your comment properly. Sorry for that. Here is the log for USE=-pgo FEATURES=test.
Ugh, that's suspicious. That implies a possible miscompilation (so maybe a compiler bug, let's see) which requires profile data to trigger. Could I ask you to make it fail again with USE=pgo, and then upload the tarballed workdir?
Reproduced and speaking to Jia Tan. My usual *FLAGS work, but yours trigger it for me.
Great, thanks! There's no need for me to upload my workdir, then? Because that saves me the headache of making that 12MB tarball fit into the 1000KB attachment file size limit.
(In reply to Johannes Penßel from comment #7)
> Great, thanks! There's no need for me to upload my workdir, then? Because
> that saves me the headache of making that 12MB tarball fit into the 1000KB
> attachment file size limit.
You're off the hook ;)
I can't reproduce it yet in a Dockerfile I'm crafting. Backtrace though:
```
#0 0x00000000000041b6 in ?? ()
(gdb) bt
#0 0x00000000000041b6 in ?? ()
#1 0x00007f0d2f4f9c75 in lzma_crc32 () from /var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0-abi_x86_64.amd64/src/liblzma/.libs/liblzma.so.5
#2 0x00007f0d2f5511e4 in elf_machine_rela (map=<optimized out>, scope=<optimized out>, reloc=0x7f0d2f4dd5c8, sym=0x7f0d2f4dafd8, version=<optimized out>, reloc_addr_arg=0x7f0d2f527b10 <lzma_crc32@got[plt]>, skip_ifunc=<optimized out>) at ../sysdeps/x86_64/dl-machine.h:314
#3 elf_dynamic_do_Rela (map=0x7f0d2f540160, scope=<optimized out>, reladdr=<optimized out>, relsize=<optimized out>, nrelative=<optimized out>, lazy=<optimized out>, skip_ifunc=<optimized out>) at /var/tmp/portage/sys-libs/glibc-2.39-r1/work/glibc-2.39/elf/do-rel.h:147
#4 _dl_relocate_object (l=l@entry=0x7f0d2f540160, scope=<optimized out>, reloc_mode=<optimized out>, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:301
#5 0x00007f0d2f560d61 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2311
#6 0x00007f0d2f55d59f in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7ffeee637260, dl_main=dl_main@entry=0x7f0d2f55f060 <dl_main>) at ../sysdeps/unix/sysv/linux/dl-sysdep.c:140
#7 0x00007f0d2f55eda2 in _dl_start_final (arg=0x7ffeee637260) at rtld.c:494
#8 _dl_start (arg=0x7ffeee637260) at rtld.c:581
#9 0x00007f0d2f55db88 in _start () from /lib64/ld-linux-x86-64.so.2
#10 0x0000000000000001 in ?? ()
#11 0x00007ffeee6394dd in ?? ()
#12 0x0000000000000000 in ?? ()
(gdb)
```
I can only reproduce so far with -march=x86-64-v3, not with my usual -march=znver2.
(reposting w/ better bt)
I can't reproduce it yet in a Dockerfile I'm crafting, but I can on my usual system with -march=x86-64-v3 but not my normal -march=znver2. Backtrace though:
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000000041b6 in ?? ()
(gdb) bt
#0 0x00000000000041b6 in ?? ()
#1 0x00007f861b2fcc75 in crc32_resolve () at /var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/src/liblzma/check/crc32_fast.c:140
#2 0x00007f861b3541e4 in elf_machine_rela (map=<optimized out>, scope=<optimized out>, reloc=0x7f861b2e05c8, sym=0x7f861b2ddfd8, version=<optimized out>, reloc_addr_arg=0x7f861b32ab10 <lzma_crc32@got[plt]>, skip_ifunc=<optimized out>) at ../sysdeps/x86_64/dl-machine.h:314
#3 elf_dynamic_do_Rela (map=0x7f861b343160, scope=<optimized out>, reladdr=<optimized out>, relsize=<optimized out>, nrelative=<optimized out>, lazy=<optimized out>, skip_ifunc=<optimized out>) at /var/tmp/portage/sys-libs/glibc-2.39-r1/work/glibc-2.39/elf/do-rel.h:147
#4 _dl_relocate_object (l=l@entry=0x7f861b343160, scope=<optimized out>, reloc_mode=<optimized out>, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:301
#5 0x00007f861b363d61 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2311
#6 0x00007f861b36059f in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7ffdeae5bd20, dl_main=dl_main@entry=0x7f861b362060 <dl_main>) at ../sysdeps/unix/sysv/linux/dl-sysdep.c:140
#7 0x00007f861b361da2 in _dl_start_final (arg=0x7ffdeae5bd20) at rtld.c:494
#8 _dl_start (arg=0x7ffdeae5bd20) at rtld.c:581
#9 0x00007f861b360b88 in _start () from /lib64/ld-linux-x86-64.so.2
#10 0x0000000000000006 in ?? ()
#11 0x00007ffdeae5cfc9 in ?? ()
#12 0x00007ffdeae5d021 in ?? ()
#13 0x00007ffdeae5d026 in ?? ()
#14 0x00007ffdeae5d034 in ?? ()
#15 0x00007ffdeae5d03a in ?? ()
#16 0x00007ffdeae5d04b in ?? ()
#17 0x0000000000000000 in ?? ()
(gdb)
```
```
(gdb) frame 1
#1 0x00007f861b2fcc75 in crc32_resolve () at /var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/src/liblzma/check/crc32_fast.c:140
140 {
(gdb) list
135 // This resolver is shared between all three dispatch methods. It serves as
136 // the ifunc resolver if ifunc is supported, otherwise it is called as a
137 // regular function by the constructor or first call resolution methods.
138 static crc32_func_type
139 crc32_resolve(void)
140 {
141     return is_arch_extension_supported()
142             ? &crc32_arch_optimized : &crc32_generic;
143 }
144
(gdb)
```
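For readers unfamiliar with the dispatch methods the source comment above mentions, here is a minimal sketch of the "first call" resolution method in plain C. Every name in it (`checksum`, `checksum_resolve`, `cpu_feature_available`, and so on) is a hypothetical stand-in rather than xz's actual code, and the feature probe is hard-coded instead of querying the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*checksum_func)(const uint8_t *buf, size_t size);

/* Portable implementation: sum the bytes one at a time. */
static uint32_t checksum_generic(const uint8_t *buf, size_t size)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical "optimized" variant: same result, four bytes per step. */
static uint32_t checksum_unrolled(const uint8_t *buf, size_t size)
{
    uint32_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= size; i += 4)
        sum += (uint32_t)buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
    for (; i < size; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical stand-in for a CPU feature probe; a real one would use CPUID. */
static int cpu_feature_available(void)
{
    return 1;
}

/* The resolver: picks an implementation. With ifunc, the dynamic loader
   would call a function like this during relocation instead. */
static checksum_func checksum_resolve(void)
{
    return cpu_feature_available() ? &checksum_unrolled : &checksum_generic;
}

static uint32_t checksum_first_call(const uint8_t *buf, size_t size);

/* Starts out pointing at the trampoline below. */
static checksum_func checksum_ptr = &checksum_first_call;

/* Runs only on the first call: resolve, cache the choice, forward the call.
   This sketch is not thread-safe. */
static uint32_t checksum_first_call(const uint8_t *buf, size_t size)
{
    checksum_ptr = checksum_resolve();
    return checksum_ptr(buf, size);
}

/* Public entry point. */
uint32_t checksum(const uint8_t *buf, size_t size)
{
    assert(buf != NULL || size == 0);
    return checksum_ptr(buf, size);
}
```

In this variant the resolver runs as an ordinary function after the program is fully relocated, whereas an ifunc resolver runs inside the dynamic loader, which is where frame #1 of the backtrace sits.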
Created attachment 885874 [details]
Dockerfile (take 1)
First cut of a Dockerfile which reproduces it. Not sure if it's to do with CET (there are some GCC bugs open about CET + ifunc) or if it's to do with the order of binutils and gcc/glibc.
Created attachment 885875 [details] Dockerfile (take 2) OK no, I think it's somehow hardened related.
Created attachment 885876 [details]
Dockerfile (take 3)
Yeah, if I just change:
-FROM gentoo/stage3:amd64-hardened-openrc
+FROM gentoo/stage3
it starts to pass.
Created attachment 885877 [details] Dockerfile (take 4)
It's BIND_NOW + ifunc.
Hello! It's not exactly an issue with BIND_NOW. BIND_NOW only guarantees that the GNU IFUNC resolver will run before all the needed symbols are resolved. It's possible that this could still segfault even without BIND_NOW.
I was able to reproduce the issue with a simple test library and driver. I hope this can help narrow down the GCC compile issue. I noticed that reproducing it needs a combination of Gentoo Hardened and an experimental GCC compiler. The experimental version alone or the Gentoo Hardened flags with a stable GCC were not enough to cause the segfaults. It appears the code coverage profiling caused the compiler to emit a symbol that was not safe in a GNU IFUNC resolver. I didn't narrow down which exact flags caused this, but it will be some subset of the flags I list :)
---
GCC version:
(Gentoo Hardened 14.0.9999 p, commit e54a7fbca63053b5753fd9ba543c27ef392d3084) 14.0.1 20240224 (experimental) 0394ae31e832c5303f3b4aad9c66710a30c097f0
---
Build commands for the library:
Compile:
x86_64-pc-linux-gnu-gcc -pthread -fvisibility=hidden -Wall -Wextra -Wvla -Wformat=2 -Winit-self -Wmissing-include-dirs -Wshift-overflow=2 -Wstrict-overflow=3 -Walloc-zero -Wduplicated-cond -Wfloat-equal -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wwrite-strings -Wdate-time -Wsign-conversion -Wfloat-conversion -Wlogical-op -Waggregate-return -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wmissing-variable-declarations -O2 -pipe -Wall -g1 -fno-omit-frame-pointer -march=x86-64-v3 -fcf-protection -flto=auto -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -fprofile-update=atomic -fprofile-dir=/var/tmp/test/amd64-pgo -fprofile-generate=/var/tmp/test/amd64-pgo -fprofile-partial-training -c test.c -fPIC -DPIC -o test.o
Create the shared library:
x86_64-pc-linux-gnu-gcc -o libapp.so test.o -shared -Wl,-z,now -fPIC -lgcov
---
Compile the driver, link the library:
x86_64-pc-linux-gnu-gcc -o app main.c -lgcov -L. -lapp
---
Execute the driver:
LD_LIBRARY_PATH=. ./app
---
test.c file contents:
#include <stdlib.h>

__attribute__((visibility("default"))) void *foo_ifunc2() __attribute__((ifunc("foo_resolver")));

__attribute__((visibility("default"))) int bar()
{
    return 10;
}

static int f3()
{
    return 5;
}

__attribute__((visibility("default"))) void *foo_resolver()
{
    f3();
    return bar;
}

__attribute__((optimize("O0"))) __attribute__((visibility("default"))) int func()
{
    foo_ifunc2();
    return 0;
}
---
main.c file contents:
#include <stdio.h>

extern int func();

int main(void)
{
    printf( "Hello world %p\n", func);
    return 0;
}
---
I hope this helps!
The bug has been closed via the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=97ebdf452e739583cb3f1d5cbcff6bb145811e2a
commit 97ebdf452e739583cb3f1d5cbcff6bb145811e2a
Author: Sam James <sam@gentoo.org>
AuthorDate: 2024-03-04 10:03:49 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2024-03-04 10:05:37 +0000
    app-arch/xz-utils: workaround USE=pgo build failure
    Workaround a build failure with USE=pgo by disabling instrumentation of the crc{32,64} IFUNC resolvers.
    No revbump as it shouldn't affect runtime at all - instrumentation would kill it immediately if at all, it's not an issue from the profiled binaries, just the instrumentation to profile them.
    Bug: https://gcc.gnu.org/PR114115
    Closes: https://bugs.gentoo.org/925415
    Signed-off-by: Sam James <sam@gentoo.org>
 .../xz-utils-5.6.0-ifunc-crc-workaround.patch | 27 ++++++++++++++++++++++
 app-arch/xz-utils/xz-utils-5.6.0-r1.ebuild | 1 +
 2 files changed, 28 insertions(+)
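As a rough illustration of what disabling instrumentation of an IFUNC resolver can look like: GCC documents a `no_profile_instrument_function` function attribute that excludes a single function from profile instrumentation. The sketch below assumes that attribute is the mechanism, since the patch itself is not shown in this thread. All names (`byte_sum`, `sum_resolve`, `NO_PROFILE`) are hypothetical, and the ifunc path is only taken on GCC-compatible compilers with glibc, falling back to a plain call elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_func)(const uint8_t *buf, uint32_t len);

/* Hypothetical stand-in for the real CRC implementation. */
static uint32_t sum_generic(const uint8_t *buf, uint32_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

#if defined(__GNUC__)
/* Ask the compiler not to add profiling code to the marked function. */
#  define NO_PROFILE __attribute__((__no_profile_instrument_function__))
#else
#  define NO_PROFILE
#endif

/* The resolver runs while the dynamic loader is still relocating the
   library, so it should touch as little as possible: no counters, no
   calls into other libraries. */
NO_PROFILE
static sum_func sum_resolve(void)
{
    return &sum_generic;
}

#if defined(__GLIBC__) && defined(__GNUC__)
/* GNU IFUNC: the loader calls sum_resolve() and binds byte_sum to the result. */
uint32_t byte_sum(const uint8_t *buf, uint32_t len)
        __attribute__((ifunc("sum_resolve")));
#else
/* Fallback without ifunc support: resolve on every call. */
uint32_t byte_sum(const uint8_t *buf, uint32_t len)
{
    return sum_resolve()(buf, len);
}
#endif
```

Only the resolver is excluded here. The implementation it returns can still be instrumented, because it runs after relocation has finished.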
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=5b2cdd1c7d1743ea2937248ccc02bca9517a5771
commit 5b2cdd1c7d1743ea2937248ccc02bca9517a5771
Author: Sam James <sam@gentoo.org>
AuthorDate: 2024-03-09 21:02:15 +0000
Commit: Sam James <sam@gentoo.org>
CommitDate: 2024-03-09 21:02:15 +0000
    app-arch/xz-utils: add 5.6.1
    Bug: https://bugs.gentoo.org/925415
    Signed-off-by: Sam James <sam@gentoo.org>
 app-arch/xz-utils/Manifest | 2 +
 app-arch/xz-utils/xz-utils-5.6.1.ebuild | 141 ++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)
With hindsight 20/20 this is actually bug 928134 (CVE-2024-3094).
(In reply to Niklāvs Koļesņikovs from comment #19)
> With hindsight 20/20 this is actually bug 928134 (CVE-2024-3094).
NO, it isn't. Please don't confuse issues. The PGO issue here was always a real one, hence the upstream GCC bug remains open. The relevant part didn't fire on Gentoo.