Bug 955635 - sys-libs/glibc: backtrace(3) call crashes with abort() when built with -fno-strict-aliasing
Summary: sys-libs/glibc: backtrace(3) call crashes with abort() when built with -fno-s...
Status: IN_PROGRESS
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 915000
 
Reported: 2025-05-08 18:24 UTC by Henryk Paluch
Modified: 2025-06-14 18:30 UTC (History)
4 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
Output of: emerge --info (emerge--info.txt,5.43 KB, text/plain)
2025-05-08 18:24 UTC, Henryk Paluch
gdb ./backtrace session of "bad" (glibc build with -fno-strict-aliasing) (gdb.log.bad5,25.51 KB, text/plain)
2025-05-13 18:47 UTC, Henryk Paluch
gdb ./backtrace session of "ok" (glibc build with default flags) (gdb.log.ok5,25.51 KB, text/plain)
2025-05-13 18:54 UTC, Henryk Paluch
List of failed tests on "bad" configuration (glibc build with -fno-strict-aliasing) (gentoo-srv2-bad-failed-tests.log,3.15 KB, text/x-log)
2025-05-14 17:28 UTC, Henryk Paluch

Description Henryk Paluch 2025-05-08 18:24:13 UTC
Created attachment 928164 [details]
Output of: emerge --info

Originally, building app-editors/emacs failed because the "temacs" binary aborted when it was run during the build. Only later did I find that the glibc backtrace(3) call itself is broken ("temacs" calls it in its main() function just to test it).

The example code below, taken from https://issues.chromium.org/issues/41145149, reproduces the problem:

// ------------- backtrace.c - start -----------
#define _GNU_SOURCE
#include <execinfo.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	void *bt[10];	/* return addresses filled in by backtrace() */
	int i, cnt;

	cnt = backtrace(bt, 10);	/* aborts here on the affected system */
	printf("backtrace() = %i\n", cnt);
	for (i = 0; i < cnt; ++i)
		printf("\t%p\n", bt[i]);
	return EXIT_SUCCESS;
}
// ------------- backtrace.c - end -----------

Built with the usual command:

cc -Wall -ggdb3    backtrace.c   -o backtrace

Running it directly:

./backtrace 
Aborted

When running it in GDB (I had already followed https://wiki.gentoo.org/wiki/Debugging and https://wiki.gentoo.org/wiki/GCC/ICE_Reporting_Guide to get debug symbols), I get something like:

$ gdb ./backtrace
...
(gdb) run
...
Program received signal SIGABRT, Aborted.
0x00007ffff7e5db0c in ?? () from /usr/lib64/libc.so.6
#0  0x00007ffff7e5db0c in ?? () from /usr/lib64/libc.so.6
#1  0x00007ffff7e07be6 in raise () from /usr/lib64/libc.so.6
#2  0x00007ffff7def8f7 in abort () from /usr/lib64/libc.so.6
#3  0x00007ffff7d9e79d in uw_init_context_1 (context=context@entry=0x7fffffffdbd0, outer_cfa=outer_cfa@entry=0x7fffffffde00, outer_ra=0x7ffff7ee72c1 <backtrace+97>) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2.c:1336
#4  0x00007ffff7dbd796 in _Unwind_Backtrace (trace=0x7ffff7ee71c0, trace_argument=0x7fffffffde00) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind.inc:296
#5  0x00007ffff7ee72c1 in backtrace () from /usr/lib64/libc.so.6
#6  0x00005555555551bc in main (argc=1, argv=0x7fffffffdfe8) at backtrace.c:12
(gdb) frame 3
#3  0x00007ffff7d9e79d in uw_init_context_1 (context=context@entry=0x7fffffffdbd0, outer_cfa=outer_cfa@entry=0x7fffffffde00, outer_ra=0x7ffff7ee72c1 <backtrace+97>) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2.c:1336
(gdb) list
1336      gcc_assert (code == _URC_NO_REASON);
1331      context->ra = ra;
1332      if (!ASSUME_EXTENDED_UNWIND_CONTEXT)
1333        context->flags = EXTENDED_CONTEXT_BIT;
1334    
1335      code = uw_frame_state_for (context, &fs);
1336      gcc_assert (code == _URC_NO_REASON);
1337    
1338    #if __GTHREADS
1339      {
1340        static __gthread_once_t once_regsizes = __GTHREAD_ONCE_INIT;

I tried "list" and "disass" but was unable to get the value of the "code" variable: print $code says "optimized out", and in the disassembly I could not locate the gcc_assert to infer the value from registers.

I even tried the standard toolchain rebuild procedure (roughly what https://wiki.gentoo.org/wiki/Changing_the_CHOST_variable describes, even though I did not change CHOST):

emerge -av1  sys-devel/binutils
emerge -a1v sys-devel/gcc
emerge -a1v sys-libs/glibc
emerge -av dev-debug/gdb

The results were the same. Even more puzzling, a similar Gentoo installation (same glibc, same gcc) works fine. There, the same program prints:

./backtrace 
backtrace() = 4
	0x560e3baf81bc
	0x7f76948ae16e
	0x7f76948ae229
	0x560e3baf80c5

Also emacs builds fine there.
Comment 1 Henryk Paluch 2025-05-09 07:49:47 UTC
Found the exact cause: it was my custom "-fno-strict-aliasing" in COMMON_FLAGS.

Removing it from COMMON_FLAGS in /etc/portage/make.conf and rebuilding glibc with

emerge -a1v sys-libs/glibc

fixes the crash in backtrace(3).
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-05-09 08:26:19 UTC
(In reply to Henryk Paluch from comment #1)
> Found exact cause - it was my custom "-fno-strict-aliasing" in COMMON_FLAGS.

I think this is still curious as -fno-strict-aliasing shouldn't cause that (the opposite could if a program normally builds with -fno-strict-aliasing and your -fstrict-aliasing overrode that).

I'd read your bug in bed earlier but hadn't yet had a chance to ask for some bits, but suspected it might be this.
Comment 3 Henryk Paluch 2025-05-09 15:21:27 UTC
I fully agree that such behavior of -fno-strict-aliasing is unexpected and counterintuitive, but I verified it on two Gentoo installations.

As soon as I rebuild sys-libs/glibc without -fno-strict-aliasing, everything works properly:

* the backtrace.c example works
* the emacs build works (temacs no longer aborts when run during the build stage)
* and even the /usr/src/linux/scripts/sorttable binary works (the 'SORTTAB vmlinux' task
  at the end of the kernel build). It also aborted before; I had forgotten about it, but it turned out to have the same cause :-)

As soon as I rebuild sys-libs/glibc with -fno-strict-aliasing, all regressions are back.

Yes, in a sane world the opposite switch, -fstrict-aliasing, should be the one that breaks things, but the real world is somewhat playful and full of surprises.
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-05-09 15:23:56 UTC
(In reply to Henryk Paluch from comment #3)
> I fully agree that such behavior of -fno-strict-aliasing is unexpected and
> counterintuitive, but I verified it on two Gentoo installations.
> 

What I'm saying is that it's a bug that merits investigation, not that you're wrong (hence reopening it).
Comment 5 Henryk Paluch 2025-05-09 15:31:37 UTC
Yes, I understood your comment that way (that -fno-strict-aliasing should not cause regressions, so there is something wrong on the glibc or gcc side).

I forgot to mention why I used -fno-strict-aliasing at all. It was inspired
by the following comment from https://man.openbsd.org/gcc-local.1:

> The -O2 option does not include -fstrict-aliasing,
> as this option causes issues on some legacy code.
> -fstrict-aliasing is very unsafe with code that
> plays tricks with casts, bypassing the already weak type system of C.

And because I use the default -O2 in COMMON_FLAGS, I thought it would be a good idea to add -fno-strict-aliasing, but ... now we know what happened.
Comment 6 Henryk Paluch 2025-05-13 18:47:54 UTC
Created attachment 928763 [details]
gdb ./backtrace session of "bad" (glibc build with -fno-strict-aliasing)

Details will follow with the next attachment...
Comment 7 Henryk Paluch 2025-05-13 18:54:28 UTC
Created attachment 928764 [details]
gdb ./backtrace session of "ok" (glibc build with default flags)

I compared two gdb sessions: one on the Gentoo installation with the "ok" glibc (working backtrace(3)) and one with the "bad" glibc (built with -fno-strict-aliasing).

In both sessions I ran:
break main
run
break find_fde_tail # answer y
cont

The difference is here:
- on "bad", find_fde_tail returns NULL, which later leads to the abort
- on "ok", find_fde_tail returns non-NULL data and the program finishes without error

Here is the relevant gdb part for "ok":

Breakpoint 2, find_fde_tail (dbase=0, pc=140737351776149, hdr=0x7ffff7dc4b88, bases=0x7fffffffdd48) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2-fde-dip.c:396
396	  if (hdr->version != 1)
+where
#0  find_fde_tail (dbase=0, pc=140737351776149, hdr=0x7ffff7dc4b88, bases=0x7fffffffdd48) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2-fde-dip.c:396
#1  _Unwind_Find_FDE (pc=<optimized out>, bases=bases@entry=0x7fffffffdd48) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2-fde-dip.c:552
#2  0x00007ffff7dbd50a in uw_frame_state_for (context=context@entry=0x7fffffffdca0, fs=fs@entry=0x7fffffffdb60) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2.c:1005
#3  0x00007ffff7dbe860 in uw_init_context_1 (context=context@entry=0x7fffffffdca0, outer_cfa=outer_cfa@entry=0x7fffffffded0, outer_ra=0x7ffff7ee8201 <backtrace+97>) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2.c:1335
#4  0x00007ffff7dbf796 in _Unwind_Backtrace (trace=0x7ffff7ee8100, trace_argument=0x7fffffffded0) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind.inc:296
#5  0x00007ffff7ee8201 in backtrace () from /usr/lib64/libc.so.6
#6  0x00005555555551bc in main (argc=1, argv=0x7fffffffe0b8) at backtrace.c:12
+info frame
Stack level 0, frame at 0x7fffffffdb00:
 rip = 0x7ffff7dc1e89 in find_fde_tail (/usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2-fde-dip.c:396); saved rip = 0x7ffff7dbd50a
 inlined into frame 1
 source language c.
 Arglist at unknown address.
 Locals at unknown address, Previous frame's sp in rsp
+next
399	  if (__builtin_expect (hdr->eh_frame_ptr_enc == (DW_EH_PE_sdata4
+next
406	      eh_frame = (_Unwind_Ptr) (p + value);
+next
407	      p += sizeof (value);
+next
418	  if (hdr->fde_count_enc != DW_EH_PE_omit
+next
_Unwind_Find_FDE (pc=<optimized out>, bases=bases@entry=0x7fffffffdd48) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2-fde-dip.c:552
552	      return find_fde_tail ((_Unwind_Ptr) pc, dlfo.dlfo_eh_frame,
+print hdr
No symbol "hdr" in current context.
+info locals
dlfo = {dlfo_flags = 0, dlfo_map_start = 0x7ffff7d9c000, dlfo_map_end = 0x7ffff7dc91e8, dlfo_link_map = 0x5555555592e0, dlfo_eh_frame = 0x7ffff7dc4b88, __dflo_reserved = {140737351639552, 93824992252640, 140737488345896, 140737488345892, 140737353949664, 140737488346096, 140737353563753}}
data = <optimized out>
ret = <optimized out>
dbase = <optimized out>
+next
530	      bases->func = (void *) func;
+info locals
data = <optimized out>
ret = <optimized out>
dbase = <optimized out>
+next
uw_frame_state_for (context=context@entry=0x7fffffffdca0, fs=fs@entry=0x7fffffffdb60) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2.c:1007
1007	  if (fde == NULL)
+print fde
$1 = (const struct dwarf_fde *) 0x7ffff7dc7430
+where
#0  uw_frame_state_for (context=context@entry=0x7fffffffdca0, fs=fs@entry=0x7fffffffdb60) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2.c:1007
#1  0x00007ffff7dbe860 in uw_init_context_1 (context=context@entry=0x7fffffffdca0, outer_cfa=outer_cfa@entry=0x7fffffffded0, outer_ra=0x7ffff7ee8201 <backtrace+97>) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind-dw2.c:1335
#2  0x00007ffff7dbf796 in _Unwind_Backtrace (trace=0x7ffff7ee8100, trace_argument=0x7fffffffded0) at /usr/src/debug/sys-devel/gcc-14.2.1_p20241221/gcc-14-20241221/libgcc/unwind.inc:296
#3  0x00007ffff7ee8201 in backtrace () from /usr/lib64/libc.so.6
#4  0x00005555555551bc in main (argc=1, argv=0x7fffffffe0b8) at backtrace.c:12
+list
1002	  if (context->ra == 0)
1003	    return _URC_END_OF_STACK;
1004	
1005	  fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
1006				  &context->bases);
1007	  if (fde == NULL)
1008	    {
1009	#ifdef MD_FALLBACK_FRAME_STATE_FOR
1010	      /* Couldn't find frame unwind info for this function.  Try a
1011		 target-specific fallback mechanism.  This will necessarily


On the "bad" machine, "fde" is NULL at line 1007, so the fallback branch (line 1008 onwards) is taken and the program ends with an abort.

Unfortunately, I don't yet understand why find_fde_tail() returns NULL in the "bad" case...
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-05-14 03:15:22 UTC
(Sorry, not had a chance to poke more yet, and probably won't for a week or two.)

Mind checking if FEATURES=test emerge -v1 glibc (i.e. the glibc testsuite) passes with -fno-strict-aliasing too?
Comment 9 Henryk Paluch 2025-05-14 17:28:38 UTC
Created attachment 928821 [details]
List of failed tests on "bad" configuration (glibc build with -fno-strict-aliasing)

Test results are consistent with expectations:
- the "bad" glibc build fails with an error because of 123 FAIL tests (the list is in this attachment)
- the "ok" glibc build finishes successfully (not counting XFAIL, which are the same in both builds)

Failed tests include all backtrace tests:

$ fgrep backtrace gentoo-srv2-bad-failed-tests.log 

FAIL: debug/backtrace-tst
FAIL: debug/tst-backtrace2
FAIL: debug/tst-backtrace3
FAIL: debug/tst-backtrace4
FAIL: debug/tst-backtrace5
FAIL: debug/tst-backtrace6
FAIL: nptl/tst-backtrace1

Here is a quick diff comparing the summaries ("ok" vs "bad"):
diff -w  gentoo-srv2-failed-tests-summary.log gentoo-srv2-bad-failed-tests-summary.log
8c8,9
<    5313 PASS
---
>     123 FAIL
>    5190 PASS
12c13
< make[1]: Leaving directory '/var/tmp/portage/sys-libs/glibc-2.40-r8/work/glibc-2.40'
---
> make[1]: *** [Makefile:672: tests] Error 1


And here are the full summaries:

$ cat gentoo-srv2-failed-tests-summary.log # "ok" build

		=== Summary of results ===
   4949 PASS
     44 UNSUPPORTED
     17 XFAIL
     10 XPASS
make[1]: Leaving directory '/var/tmp/portage/sys-libs/glibc-2.40-r8/work/glibc-2.40'
		=== Summary of results ===
   5313 PASS
     99 UNSUPPORTED
     17 XFAIL
      8 XPASS
make[1]: Leaving directory '/var/tmp/portage/sys-libs/glibc-2.40-r8/work/glibc-2.40'

$ cat gentoo-srv2-bad-failed-tests-summary.log # "bad" build

		=== Summary of results ===
   4949 PASS
     44 UNSUPPORTED
     17 XFAIL
     10 XPASS
make[1]: Leaving directory '/var/tmp/portage/sys-libs/glibc-2.40-r8/work/glibc-2.40'
		=== Summary of results ===
    123 FAIL
   5190 PASS
     99 UNSUPPORTED
     17 XFAIL
      8 XPASS
make[1]: *** [Makefile:672: tests] Error 1