Bug 925415 - =app-arch/xz-utils-5.6.0[pgo]: segfaults when running tests
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-24 11:18 UTC by Johannes Penßel
Modified: 2024-05-24 22:24 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
xz-utils-5.6.0[pgo] log (app-arch:xz-utils-5.6.0:20240224-110451.log,544.95 KB, text/x-log)
2024-02-24 11:18 UTC, Johannes Penßel
emerge --info (file_925415.txt,9.35 KB, text/plain)
2024-02-24 11:20 UTC, Johannes Penßel
app-arch/xz-utils[pgo] FEATURES=test build log (app-arch:xz-utils-5.6.0:20240224-112353.log,544.95 KB, text/x-log)
2024-02-24 11:31 UTC, Johannes Penßel
xz-utils-5.6.0[-pgo] FEATURES=test build log (successful) (app-arch:xz-utils-5.6.0:20240224-113244.log,532.61 KB, text/x-log)
2024-02-24 11:34 UTC, Johannes Penßel
Dockerfile (take 1) (file_925415.txt,2.06 KB, text/plain)
2024-02-24 15:00 UTC, Sam James
Dockerfile (take 2) (file_925415.txt,1.35 KB, text/plain)
2024-02-24 15:13 UTC, Sam James
Dockerfile (take 3) (file_925415.txt,1.20 KB, text/plain)
2024-02-24 15:31 UTC, Sam James
Dockerfile (take 4) (file_925415.txt,1.17 KB, text/plain)
2024-02-24 15:32 UTC, Sam James

Description Johannes Penßel 2024-02-24 11:18:40 UTC
Created attachment 885848 [details]
xz-utils-5.6.0[pgo] log

I'm not sure if this is part of the PGO profiling process, but after the initial build, app-arch/xz-utils-5.6.0[pgo] runs its test suite. However, it fails all tests and the build gets cancelled. USE=-pgo builds fine.

[ebuild   R   ] app-arch/xz-utils-5.6.0  USE="extra-filters nls pgo* static-libs verify-sig -doc"
Comment 1 Johannes Penßel 2024-02-24 11:20:10 UTC
Created attachment 885849 [details]
emerge --info
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 11:22:55 UTC
It is supposed to run the test suite because PGO relies on running something to obtain training data.

However, we definitely have a problem in that it's segfaulting...
```
make[2]: Leaving directory '/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0-abi_x86_64.amd64/tests'
make  check-TESTS
make[2]: Entering directory '/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0-abi_x86_64.amd64/tests'
make[3]: Entering directory '/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0-abi_x86_64.amd64/tests'
/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/build-aux/test-driver: line 112:  5787 Segmentation fault      (core dumped) "$@" >> "$log_file" 2>&1
FAIL: test_index
/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/build-aux/test-driver: line 112:  5776 Segmentation fault      (core dumped) "$@" >> "$log_file" 2>&1
FAIL: test_hardware
/var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/build-aux/test-driver: line 112:  5789 Segmentation fault      (core dumped) "$@" >> "$log_file" 2>&1
```

Could you try with USE=-pgo, and FEATURES=test?

USE="-pgo" FEATURES=test emerge -v1 app-arch/xz-utils

Then if it fails, show me tests/test-suite.log, and we will have to probe to get a backtrace.
Comment 3 Johannes Penßel 2024-02-24 11:31:25 UTC
Created attachment 885850 [details]
app-arch/xz-utils[pgo] FEATURES=test build log

FEATURES=test does not work for me, I'm afraid.

Relevant dmesg output:
[ 6874.934145] test_hardware[46609]: segfault at 41b0 ip 00000000000041b0 sp 00007ffd555380c8 error 14 in test_hardware[564b2ae1e000+2000] likely on CPU 8 (core 16, socket 0)
[ 6874.934169] Code: Unable to access opcode bytes at 0x4186.
[ 6874.937168] test_check[46605]: segfault at 41b0 ip 00000000000041b0 sp 00007ffff3b404a8 error 14 in test_check[5580a5e49000+2000] likely on CPU 11 (core 19, socket 0)
[ 6874.937188] Code: Unable to access opcode bytes at 0x4186.
[ 6874.939060] test_block_head[46621]: segfault at 41b0 ip 00000000000041b0 sp 00007ffd0956c0a8 error 14 in test_block_header[55e7e5145000+2000] likely on CPU 11 (core 19, socket 0)
[ 6874.939077] Code: Unable to access opcode bytes at 0x4186.
[ 6874.940545] test_vli[46633]: segfault at 41b0 ip 00000000000041b0 sp 00007fff06908448 error 14 in test_vli[55baaeb94000+2000] likely on CPU 4 (core 8, socket 0)
[ 6874.940560] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941136] test_stream_fla[46612]: segfault at 41b0 ip 00000000000041b0 sp 00007ffde0643368 error 14 in test_stream_flags[55e8502b9000+2000] likely on CPU 7 (core 12, socket 0)
[ 6874.941149] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941155] test_index_hash[46626]: segfault at 41b0 ip 00000000000041b0 sp 00007fff64eaa068 error 14 in test_index_hash[55c4cef05000+2000] likely on CPU 10 (core 18, socket 0)
[ 6874.941172] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941419] test_filter_fla[46618]: segfault at 41b0 ip 00000000000041b0 sp 00007ffd3f1c6ba8 error 14 in test_filter_flags[55f9d60b9000+2000] likely on CPU 2 (core 4, socket 0)
[ 6874.941436] Code: Unable to access opcode bytes at 0x4186.
[ 6874.941621] test_index[46622]: segfault at 41b0 ip 00000000000041b0 sp 00007fff9ce4f648 error 14 in test_index[5586063d4000+2000] likely on CPU 1 (core 0, socket 0)
[ 6874.941638] Code: Unable to access opcode bytes at 0x4186.
[ 6874.942096] test_filter_str[46616]: segfault at 41b0 ip 00000000000041b0 sp 00007fffec36cf88 error 14 in test_filter_str[5556c5024000+2000] likely on CPU 15 (core 23, socket 0)
[ 6874.942113] Code: Unable to access opcode bytes at 0x4186.
[ 6874.942572] test_memlimit[46632]: segfault at 41b0 ip 00000000000041b0 sp 00007fff8c01f368 error 14 in test_memlimit[5636f4800000+2000] likely on CPU 3 (core 4, socket 0)
[ 6874.942589] Code: Unable to access opcode bytes at 0x4186.
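
A note on reading the dmesg lines above: the kernel prints the x86 page-fault error code in hexadecimal, so `error 14` is 0x14. The sketch below decodes those bits; the names `pf_info`, `decode_pf_error`, `is_wild_jump` and `check_reported_fault` are made up for illustration and are not kernel APIs. It suggests that every test binary tried to execute code at the unmapped address 0x41b0, which looks like a call through a function pointer that was never relocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (hypothetical local names). */
enum {
    PF_PROT  = 1 << 0, /* 1 = protection violation, 0 = page not present */
    PF_WRITE = 1 << 1, /* 1 = write access, 0 = read access */
    PF_USER  = 1 << 2, /* 1 = fault happened in user mode */
    PF_RSVD  = 1 << 3, /* 1 = reserved bit set in a page-table entry */
    PF_INSTR = 1 << 4  /* 1 = fault was an instruction fetch */
};

struct pf_info {
    bool present;
    bool write;
    bool user;
    bool reserved_bit;
    bool instr_fetch;
};

static struct pf_info decode_pf_error(uint32_t code)
{
    struct pf_info info = {
        .present      = (code & PF_PROT)  != 0,
        .write        = (code & PF_WRITE) != 0,
        .user         = (code & PF_USER)  != 0,
        .reserved_bit = (code & PF_RSVD)  != 0,
        .instr_fetch  = (code & PF_INSTR) != 0,
    };
    return info;
}

/* True when the CPU faulted while fetching an instruction from an
 * unmapped address, i.e. the program jumped somewhere bogus. */
static bool is_wild_jump(uint64_t fault_addr, uint64_t ip, uint32_t code)
{
    struct pf_info info = decode_pf_error(code);
    return fault_addr == ip && info.instr_fetch && !info.present;
}

/* Values copied from the log: "segfault at 41b0 ip ...41b0 ... error 14". */
static void check_reported_fault(void)
{
    assert(is_wild_jump(0x41b0, 0x41b0, 0x14));
}
```

Calling `check_reported_fault()` from any `main` passes silently, which matches the identical fault address and instruction pointer in every line of the log.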
Comment 4 Johannes Penßel 2024-02-24 11:34:51 UTC
Created attachment 885851 [details]
xz-utils-5.6.0[-pgo] FEATURES=test build log (successful)

nvm, I did not read your comment properly. Sorry for that. Here is the log for USE=-pgo FEATURES=test.
Comment 5 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 11:42:04 UTC
Ugh, that's suspicious. That implies a possible miscompilation (so maybe a compiler bug, let's see) which requires profile data to trigger.

Could I ask you to make it fail again with USE=pgo, and then upload the tarballed workdir?
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 11:57:28 UTC
Reproduced and speaking to Jia Tan. My usual *FLAGS work, but yours trigger it for me.
Comment 7 Johannes Penßel 2024-02-24 12:05:25 UTC
Great, thanks! There's no need for me to upload my workdir, then? Because that saves me the headache of making that 12MB tarball fit into the 1000KB attachment file size limit.
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 12:21:11 UTC
(In reply to Johannes Penßel from comment #7)
> Great, thanks! There's no need for me to upload my workdir, then? Because
> that saves me the headache of making that 12MB tarball fit into the 1000KB
> attachment file size limit.

You're off the hook ;)
Comment 9 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 13:33:52 UTC Comment hidden (obsolete)
Comment 10 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 13:37:12 UTC
(reposting w/ better bt)

I can't reproduce it yet in a Dockerfile I'm crafting, but I can on my usual system with -march=x86-64-v3, though not with my normal -march=znver2.

Backtrace though:
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000000041b6 in ?? ()
(gdb) bt
#0  0x00000000000041b6 in ?? ()
#1  0x00007f861b2fcc75 in crc32_resolve () at /var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/src/liblzma/check/crc32_fast.c:140
#2  0x00007f861b3541e4 in elf_machine_rela (map=<optimized out>, scope=<optimized out>, reloc=0x7f861b2e05c8, sym=0x7f861b2ddfd8, version=<optimized out>,
    reloc_addr_arg=0x7f861b32ab10 <lzma_crc32@got[plt]>, skip_ifunc=<optimized out>) at ../sysdeps/x86_64/dl-machine.h:314
#3  elf_dynamic_do_Rela (map=0x7f861b343160, scope=<optimized out>, reladdr=<optimized out>, relsize=<optimized out>, nrelative=<optimized out>, lazy=<optimized out>,
    skip_ifunc=<optimized out>) at /var/tmp/portage/sys-libs/glibc-2.39-r1/work/glibc-2.39/elf/do-rel.h:147
#4  _dl_relocate_object (l=l@entry=0x7f861b343160, scope=<optimized out>, reloc_mode=<optimized out>, consider_profiling=<optimized out>, consider_profiling@entry=0) at dl-reloc.c:301
#5  0x00007f861b363d61 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at rtld.c:2311
#6  0x00007f861b36059f in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7ffdeae5bd20, dl_main=dl_main@entry=0x7f861b362060 <dl_main>)
    at ../sysdeps/unix/sysv/linux/dl-sysdep.c:140
#7  0x00007f861b361da2 in _dl_start_final (arg=0x7ffdeae5bd20) at rtld.c:494
#8  _dl_start (arg=0x7ffdeae5bd20) at rtld.c:581
#9  0x00007f861b360b88 in _start () from /lib64/ld-linux-x86-64.so.2
#10 0x0000000000000006 in ?? ()
#11 0x00007ffdeae5cfc9 in ?? ()
#12 0x00007ffdeae5d021 in ?? ()
#13 0x00007ffdeae5d026 in ?? ()
#14 0x00007ffdeae5d034 in ?? ()
#15 0x00007ffdeae5d03a in ?? ()
#16 0x00007ffdeae5d04b in ?? ()
#17 0x0000000000000000 in ?? ()
(gdb)
```

```
(gdb) frame 1
#1  0x00007f861b2fcc75 in crc32_resolve () at /var/tmp/portage/app-arch/xz-utils-5.6.0/work/xz-5.6.0/src/liblzma/check/crc32_fast.c:140
140     {
(gdb) list
135     // This resolver is shared between all three dispatch methods. It serves as
136     // the ifunc resolver if ifunc is supported, otherwise it is called as a
137     // regular function by the constructor or first call resolution methods.
138     static crc32_func_type
139     crc32_resolve(void)
140     {
141             return is_arch_extension_supported()
142                             ? &crc32_arch_optimized : &crc32_generic;
143     }
144
(gdb)
```
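
The source comment in the listing above mentions three dispatch methods sharing one resolver. As a rough, portable sketch of the "first call resolution" variant only (this is not liblzma's code; `demo_crc32`, `demo_resolve`, `cpu_has_fast_path` and both implementations are hypothetical), a function pointer can start out aimed at a trampoline that runs the resolver once and then forwards the call:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32_fn)(const uint8_t *buf, size_t len, uint32_t crc);

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320): slow but simple. */
static uint32_t demo_crc32_generic(const uint8_t *buf, size_t len, uint32_t crc)
{
    assert(buf != NULL || len == 0);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Stand-in for an arch-optimized routine; here it only forwards. */
static uint32_t demo_crc32_arch(const uint8_t *buf, size_t len, uint32_t crc)
{
    return demo_crc32_generic(buf, len, crc);
}

/* Hypothetical feature probe; a real one would query the CPU. */
static int cpu_has_fast_path(void)
{
    return 0;
}

/* One resolver, usable by any of the dispatch methods. */
static crc32_fn demo_resolve(void)
{
    return cpu_has_fast_path() ? &demo_crc32_arch : &demo_crc32_generic;
}

static uint32_t demo_crc32_first_call(const uint8_t *buf, size_t len, uint32_t crc);

/* Starts at the trampoline and is overwritten on the first call. */
static crc32_fn demo_crc32_ptr = &demo_crc32_first_call;

static uint32_t demo_crc32_first_call(const uint8_t *buf, size_t len, uint32_t crc)
{
    demo_crc32_ptr = demo_resolve();
    return demo_crc32_ptr(buf, len, crc);
}

/* Public entry point: always calls through the pointer. */
static uint32_t demo_crc32(const uint8_t *buf, size_t len, uint32_t crc)
{
    return demo_crc32_ptr(buf, len, crc);
}
```

Unlike an ifunc resolver, this one runs as ordinary code after the dynamic loader has finished relocating, so it is free to call other functions. The trade-off is an indirect call on every use, and the unsynchronized pointer update here is only acceptable in a single-threaded sketch.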
Comment 11 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 15:00:38 UTC
Created attachment 885874 [details]
Dockerfile (take 1)

First cut of a Dockerfile which reproduces it. Not sure if it's to do with CET (there are some GCC bugs open about CET + ifunc) or with the order of binutils and gcc/glibc.
Comment 12 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 15:13:31 UTC
Created attachment 885875 [details]
Dockerfile (take 2)

OK, no, I think it's somehow hardened-related.
Comment 13 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 15:31:52 UTC
Created attachment 885876 [details]
Dockerfile (take 3)

Yeah, if I just change:
-FROM gentoo/stage3:amd64-hardened-openrc
+FROM gentoo/stage3
it starts to pass.
Comment 14 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 15:32:39 UTC
Created attachment 885877 [details]
Dockerfile (take 4)
Comment 15 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-24 16:13:04 UTC
It's BIND_NOW + ifunc.
Comment 16 Jia Tan 2024-02-24 21:19:48 UTC
Hello!

It's not exactly an issue with BIND_NOW. BIND_NOW only guarantees that the GNU IFUNC resolver will run before all the needed symbols are resolved. It's possible that this could still segfault even without BIND_NOW.

I was able to reproduce the issue with a simple test library and driver. I hope this can help narrow down the GCC compile issue.

I noticed that reproducing it requires the combination of Gentoo Hardened and an experimental GCC. Neither the experimental GCC alone nor the Gentoo Hardened flags with a stable GCC were enough to cause the segfaults.

It appears the code coverage profiling caused the compiler to emit a symbol that was not safe in a GNU IFUNC resolver. I didn't narrow down which exact flags caused this, but it will be some subset of the flags I list :)

---

GCC version: (Gentoo Hardened 14.0.9999 p, commit e54a7fbca63053b5753fd9ba543c27ef392d3084) 14.0.1 20240224 (experimental) 0394ae31e832c5303f3b4aad9c66710a30c097f0

---

Build commands for the library:

Compile:

x86_64-pc-linux-gnu-gcc -pthread -fvisibility=hidden -Wall -Wextra -Wvla -Wformat=2 -Winit-self -Wmissing-include-dirs -Wshift-overflow=2 -Wstrict-overflow=3 -Walloc-zero -Wduplicated-cond -Wfloat-equal -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wwrite-strings -Wdate-time -Wsign-conversion -Wfloat-conversion -Wlogical-op -Waggregate-return -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wmissing-variable-declarations -O2 -pipe -Wall -g1 -fno-omit-frame-pointer -march=x86-64-v3 -fcf-protection -flto=auto -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing -fprofile-update=atomic -fprofile-dir=/var/tmp/test/amd64-pgo -fprofile-generate=/var/tmp/test/amd64-pgo -fprofile-partial-training -c test.c -fPIC -DPIC -o test.o

Create the shared library:

x86_64-pc-linux-gnu-gcc -o libapp.so test.o -shared -Wl,-z,now -fPIC -lgcov

---

Compile the driver, link the library:

x86_64-pc-linux-gnu-gcc -o app main.c -lgcov -L. -lapp

---

Execute the driver:

LD_LIBRARY_PATH=. ./app


---

test.c file contents:

#include <stdlib.h>


__attribute__((visibility("default")))
void *foo_ifunc2() __attribute__((ifunc("foo_resolver")));


__attribute__((visibility("default")))
int bar()
{
    return 10;
}


static int f3()
{
    return 5;
}


__attribute__((visibility("default")))
void *foo_resolver()
{
    f3();
    return bar;
}


__attribute__((optimize("O0")))
__attribute__((visibility("default")))
int func()
{
    foo_ifunc2();
    return 0;
}

---

main.c file contents:

#include <stdio.h>


extern int func();

int main(void)
{
    printf( "Hello world %p\n", func);

    return 0;
}

---

I hope this helps!
Comment 17 Larry the Git Cow gentoo-dev 2024-03-04 10:05:47 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=97ebdf452e739583cb3f1d5cbcff6bb145811e2a

commit 97ebdf452e739583cb3f1d5cbcff6bb145811e2a
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-03-04 10:03:49 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-03-04 10:05:37 +0000

    app-arch/xz-utils: workaround USE=pgo build failure
    
    Workaround a build failure with USE=pgo by disabling instrumentation of the
    crc{32,64} IFUNC resolvers.
    
    No revbump as it shouldn't affect runtime at all - instrumentation would kill
    it immediately if at all, it's not an issue from the profiled binaries, just
    the instrumentation to profile them.
    
    Bug: https://gcc.gnu.org/PR114115
    Closes: https://bugs.gentoo.org/925415
    Signed-off-by: Sam James <sam@gentoo.org>

 .../xz-utils-5.6.0-ifunc-crc-workaround.patch      | 27 ++++++++++++++++++++++
 app-arch/xz-utils/xz-utils-5.6.0-r1.ebuild         |  1 +
 2 files changed, 28 insertions(+)
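
The patch itself is not reproduced in this thread. As a hedged illustration of what "disabling instrumentation of an IFUNC resolver" can look like, GCC provides the `no_profile_instrument_function` function attribute; whether the Gentoo patch uses exactly this attribute is an assumption here. All names below (`demo_op`, `demo_op_resolve`, `DEMO_PREFER_FAST`) are hypothetical, and the ifunc path is only compiled on GNU/ELF/glibc targets, with a plain function as the fallback elsewhere.

```c
#include <assert.h>
#include <stdint.h> /* on glibc systems this also defines __GLIBC__ */

typedef unsigned (*op_fn)(unsigned);

static unsigned op_generic(unsigned x) { return x * 2u; }
static unsigned op_fast(unsigned x)    { return x << 1; }

/* Compile-time stand-in for a CPU feature check (hypothetical). */
#define DEMO_PREFER_FAST 0

#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__)
/* The resolver runs while the dynamic loader is still processing
 * relocations, so it should not depend on anything that needs
 * relocation itself. The attribute asks GCC not to add profiling
 * instrumentation to this one function. */
__attribute__((no_profile_instrument_function))
static op_fn demo_op_resolve(void)
{
    return DEMO_PREFER_FAST ? &op_fast : &op_generic;
}

/* Calls to demo_op are bound to whatever the resolver returned. */
unsigned demo_op(unsigned x) __attribute__((ifunc("demo_op_resolve")));
#else
/* Fallback for targets without GNU IFUNC support. */
unsigned demo_op(unsigned x)
{
    return DEMO_PREFER_FAST ? op_fast(x) : op_generic(x);
}
#endif

/* Usage: the caller cannot tell which dispatch path was compiled in. */
static void demo_op_selfcheck(void)
{
    assert(demo_op(21u) == 42u);
}
```

Only the resolver is excluded from instrumentation; the implementations it selects are still profiled as usual, which is consistent with the commit message's point that the profiled binaries themselves are unaffected.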
Comment 18 Larry the Git Cow gentoo-dev 2024-03-09 21:05:43 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=5b2cdd1c7d1743ea2937248ccc02bca9517a5771

commit 5b2cdd1c7d1743ea2937248ccc02bca9517a5771
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-03-09 21:02:15 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-03-09 21:02:15 +0000

    app-arch/xz-utils: add 5.6.1
    
    Bug: https://bugs.gentoo.org/925415
    Signed-off-by: Sam James <sam@gentoo.org>

 app-arch/xz-utils/Manifest              |   2 +
 app-arch/xz-utils/xz-utils-5.6.1.ebuild | 141 ++++++++++++++++++++++++++++++++
 2 files changed, 143 insertions(+)
Comment 19 Niklāvs Koļesņikovs 2024-03-30 00:40:20 UTC Comment hidden (spam)
Comment 20 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-03-30 00:46:16 UTC
(In reply to Niklāvs Koļesņikovs from comment #19)
> With hindsight 20/20 this is actually bug 928134 (CVE-2024-3094).

NO, it isn't. Please don't confuse issues. The PGO issue here was always a real one, hence the upstream GCC bug remains open. The relevant part didn't fire on Gentoo.