Bug 942573 - >=sys-devel/gcc:14 miscompiles www-client/firefox[-clang], mail-client/thunderbird[-clang] (crash on startup when compiled in MakeDay in jsdate.cpp:429)
Status: IN_PROGRESS
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on: 947021
Blocks: gcc-14 915000
Reported: 2024-10-31 04:43 UTC by jospezial
Modified: 2024-12-27 13:12 UTC (History)
9 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
backtrace (firefox_gcc_crash.txt,44.59 KB, text/plain)
2024-10-31 04:43 UTC, jospezial
emerge --info (emerge--info.txt,13.22 KB, text/plain)
2024-10-31 04:44 UTC, jospezial
valgrind on gcc firefox (firefox_valgrind_gcc.txt,23.53 KB, text/plain)
2024-10-31 17:51 UTC, jospezial
thunderbird-132.0_gcc_build.log.bz2 part2 (thunderbird-132.0_gcc_build.log.bz2ab,519.81 KB, application/octet-stream)
2024-11-02 14:46 UTC, jospezial
thunderbird-132.0_gcc_build.log.bz2 part1 (thunderbird-132.0_gcc_build.log.bz2aa,900.00 KB, application/x-bzip)
2024-11-02 14:49 UTC, jospezial
firefox_valgrind_gcc libs_march_x86-64 (firefox_valgrind_gcc__libs_march_x86-64.txt,388.67 KB, text/plain)
2024-11-03 20:42 UTC, jospezial
gdb-info-registers-all.txt (gdb.txt,9.45 KB, text/plain)
2024-12-04 19:51 UTC, Sam James
instrumentation for missed emms hunting (x87-mmx-mixing.tgz,609 bytes, application/gzip)
2024-12-05 08:20 UTC, Alexander Monakov
backtrace-on-sigill.txt (file_942573.txt,31.32 KB, text/plain)
2024-12-05 19:47 UTC, Sam James
disas-of-hb_font_set_scale.txt (file_942573.txt,7.48 KB, text/plain)
2024-12-05 19:53 UTC, Sam James
Unified_cpp_gfx_harfbuzz_src0.ii.xz (Unified_cpp_gfx_harfbuzz_src0.ii.xz,556.01 KB, application/x-xz)
2024-12-05 20:15 UTC, Sam James

Description jospezial 2024-10-31 04:43:53 UTC
Created attachment 907408 [details]
backtrace

Works when compiled with clang.

Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
0x00007ffff12e028f in MakeDay (year=<optimized out>, month=<optimized out>, date=<optimized out>) at /usr/src/debug/www-client/firefox-132.0/firefox-132.0/js/src/jsdate.cpp:429
429       double monthday = DayFromMonth(mn, leap);
Comment 1 jospezial 2024-10-31 04:44:27 UTC
Created attachment 907409 [details]
emerge --info
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-31 04:49:34 UTC
This will be fun :)

* Does Valgrind work on your CPU? I think it should but there's a few instructions on some old AMD CPUs which it can't handle. If you can, please try launching Firefox under Valgrind and show me the output.

* Can you tell me what -march=native expands to for you? resolve-march-native can give the value

* Does it happen with a fresh profile?

* Is it literally as soon as FF starts up?
Comment 3 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-31 04:52:08 UTC
My hunch is it'll be specific to some tuning which only happens on your older AMD, so when I have your values, I'll try repro.
Comment 4 jospezial 2024-10-31 07:26:35 UTC
(In reply to Sam James from comment #2)
> This will be fun :)
> 
> * Does Valgrind work on your CPU? I think it should but there's a few
> instructions on some old AMD CPUs which it can't handle. If you can, please
> try launching Firefox under Valgrind and show me the output.
I used it on the clang-compiled Firefox. There is a lot of output, but I am not sure whether it is usable.
> 
> * Can you tell me what -march=native expands to for you?
> resolve-march-native can give the value

resolve-march-native
-march=amdfam10 --param=l1-cache-line-size=64 --param=l1-cache-size=64 --param=l2-cache-size=1024

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 16
model           : 6
model name      : AMD Athlon(tm) II X2 255 Processor
stepping        : 3
microcode       : 0x10000c8
cpu MHz         : 3100.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate vmmcall npt lbrv svm_lock nrip_save
bugs            : tlb_mmatch fxsave_leak sysret_ss_attrs null_seg amd_e400 spectre_v1 spectre_v2
bogomips        : 6228.14
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate


> 
> * Does it happen with a fresh profile?
Need to test it when I compile it again with gcc.
Unfortunately I can't use firefox-bin as fallback for a good working browser on my system. Bug 940767
> 
> * Is it literally as soon as FF starts up?
Yes. No window seen.
Comment 5 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-31 12:43:16 UTC
(In reply to jospezial from comment #4)
> (In reply to Sam James from comment #2)
> > This will be fun :)
> > 
> > * Does Valgrind work on your CPU? I think it should but there's a few
> > instructions on some old AMD CPUs which it can't handle. If you can, please
> > try launching Firefox under Valgrind and show me the output.
> I used it on the clang compiled Firefox, there is a lot of output but I am
> not sure if that is usable .

Please attach it compressed, although I will need it for the GCC build too unfortunately.

> > 
> > * Does it happen with a fresh profile?
> Need to test it when I compile it again with gcc.
> Unfortunately I can't use firefox-bin as fallback for a good working browser
> on my system. Bug 940767

Yeah, I wish I had some ideas for that too. Although, speaking of, I actually wonder if it works for you with a new Linux user, and/or also a clean FF profile. I find that bug really odd.
Comment 6 Holger Hoffstätte 2024-10-31 16:45:18 UTC
Shot in the dark: can you try disabling the JS JIT by setting:

javascript.options.baselinejit = false
javascript.options.ion = false

just to see whether it's the JIT or JS itself.
Comment 7 jospezial 2024-10-31 17:51:17 UTC
Created attachment 907496 [details]
valgrind on gcc firefox

I hope this helps you. I also unmasked and enabled the valgrind USE flag on firefox.
(Bug 906509)
I don't know if that is good. The output looks nearly the same as with the clang-built firefox.

The crash also happens in new user profile.
Comment 8 jospezial 2024-10-31 18:31:19 UTC
(In reply to Holger Hoffstätte from comment #6)
> Shot in the dark: can you try disabling the JS JIT by setting:
> 
> javascript.options.baselinejit = false
> javascript.options.ion = false
> 
> just to see whether it's the JIT or JS itself.

That does not change the crash.
I changed these settings from Fedora, which shares my home folder with Gentoo.
Comment 9 jospezial 2024-10-31 19:23:54 UTC
If I look at
https://crash-stats.mozilla.org/signature/?signature=MakeDay&date=%3E%3D2024-04-30T18%3A38%3A00.000Z&date=%3C2024-10-31T18%3A38%3A00.000Z&_sort=-date

I mostly see there Windows as OS. Maybe because of http://support.microsoft.com/kb/982107 ?

And the problem seems to be very old.
https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL+MakeDay

https://bugzilla.mozilla.org/show_bug.cgi?id=635617
https://bugzilla.mozilla.org/show_bug.cgi?id=732897
These bugs are 13 years old, but the signature still gets crash report hits.

Btw, what does the date 2022-02-08 in the backtrace stand for, and why is it processed?
Comment 10 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-31 19:57:05 UTC
Building using gcc trunk with -march=k8 -mtune=k8 fails on znver4 because of pi2fd not being available.

Building using gcc trunk with -mtune=k8 works fine.
Comment 11 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-31 19:57:19 UTC
Could you try to get me the build.log from Firefox?
Comment 12 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-31 20:30:30 UTC
(In reply to jospezial from comment #7)
> Created attachment 907496 [details]
> valgrind on gcc firefox
> 
> I hope this helps you. I also unmasked and enabled valgrind USEflag on
> firefox.
> (Bug 906509)
> I don't know if that is good. The output looks nearly same as with clang
> firefox.
> 
> The crash also happens in new user profile.

Thank you -- unfortunately, it does not appear helpful (not your fault) because Valgrind is dying on that issue I mentioned with some AMD CPUs (it cannot recognise a somewhat-rare instruction) :(
Comment 13 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-31 20:32:21 UTC
(In reply to Sam James from comment #10)
> Building using gcc trunk with -march=k8 -mtune=k8 fails on znver4 because of
> pi2fd not being available.
> 
> Building using gcc trunk with -mtune=k8 works fine.

k8 != k10, retrying...
Comment 14 jospezial 2024-11-02 12:26:46 UTC
Thunderbird behaves the same.
I used thunderbird-128.4.0.ebuild and modified it for 132.0

-FIREFOX_PATCHSET="firefox-128esr-patches-04.tar.xz"
+FIREFOX_PATCHSET="firefox-132-patches-01.tar.xz"
-MOZ_ESR=yes
+MOZ_ESR=
-               --disable-gpsd \

The build log for jsdate.cpp has many warnings and notes.

I'm trying to upload the build.log. The uncompressed size is 40 MB, and with bzip2 --best it is 1.4 MB. The maximum allowed is 1 MB.

Sam, I have sent it to you by e-mail.
Comment 15 Joonas Niilola gentoo-dev 2024-11-02 12:53:47 UTC
Try xz -9
Comment 16 jospezial 2024-11-02 14:46:45 UTC
Created attachment 907681 [details]
thunderbird-132.0_gcc_build.log.bz2 part2

xz -9 only saved a few KB. xz -9 -e compressed it to 1.1 MB.

I have now used split on the bz2 file.
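
The split-and-rejoin step described above can be sketched as follows. This is only an illustration, not the exact command used in this report: the helper names `split_bytes` and `join_parts` are hypothetical, and the 900 KiB part size is an assumption inferred from the size of the part1 attachment.

```python
def split_bytes(data: bytes, part_size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def join_parts(parts: list[bytes]) -> bytes:
    """Concatenate the chunks in order to recover the original data."""
    return b"".join(parts)


# Stand-in for a compressed log that is larger than the attachment limit.
payload = bytes(range(256)) * 5600        # 1,433,600 bytes
parts = split_bytes(payload, 900 * 1024)  # assumed part size of 900 KiB

# Every part must fit under the 1 MB limit, and the parts must rejoin losslessly.
print([len(p) for p in parts])
print(join_parts(parts) == payload)
```

Whoever downloads the parts has to concatenate them in the right order (part1, then part2) before decompressing.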
Comment 17 jospezial 2024-11-02 14:49:27 UTC
Created attachment 907682 [details]
thunderbird-132.0_gcc_build.log.bz2 part1

part1
Comment 18 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-02 17:53:22 UTC
Thanks, thunderbird might be a nicer case. Bit smaller and less complex. Will check log when back at pc.
Comment 19 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-03 00:07:38 UTC
Minor status update:
* Asked around for more hardware I can hopefully reproduce on
* Trying a chroot w/ qemu-user using `/usr/bin/qemu-x86_64 -cpu Opteron_G2-v1 /bin/bash` with firefox+xvfb-run
Comment 20 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-03 00:16:43 UTC
jospezial, while I work on those leads, can you try something a bit tedious for me? :(

I am really hoping we can get Valgrind output for that crash. The problem is, Valgrind can't decode certain old AMD instructions in libraries that Firefox uses.

In your output, there was:
```
vex amd64->IR: unhandled instruction bytes: 0xF 0xF 0x43 0x28 0xD 0x48 0x89 0x4 0x24 0x48
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==18537== valgrind: Unrecognised instruction at address 0x6998e1c.
==18537==    at 0x6998E1C: ??? (in /usr/lib64/libharfbuzz.so.0.61001.0)
==18537==    by 0x6BDFA89: ??? (in /usr/lib64/libpangoft2-1.0.so.0.5200.2)
==18537==    by 0x6904652: pango_font_get_hb_font (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x6927A0A: ??? (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x6928476: ??? (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x6928BCE: pango_shape_item (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x6915062: ??? (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x691681F: ??? (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x6918A29: ??? (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x691AB12: pango_layout_get_unknown_glyphs_count (in /usr/lib64/libpango-1.0.so.0.5200.2)
==18537==    by 0x5DDC771: ??? (in /usr/lib64/libgtk-3.so.0.2410.32)
==18537==    by 0x5DDCAD7: ??? (in /usr/lib64/libgtk-3.so.0.2410.32)
```

Could you try building pango+harfbuzz without -march=... and then try Valgrind + Firefox again?

If it fails again with "Unrecognised instruction" inside of non-Firefox, repeat the same steps (build $library without -march). If it is Firefox itself at the top of the stack, then we're stuck ofc.
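
For readers wondering why those bytes point at an old AMD instruction: 3DNow! instructions are encoded as `0F 0F`, then a ModRM byte (plus optional SIB and displacement), then a one-byte suffix that selects the operation. The sketch below is a minimal decoder written for illustration under stated assumptions: the function name `decode_3dnow` is hypothetical, the suffix table is only a small subset, and it assumes no legacy or REX prefixes (which matches the `REX=0` lines in the Valgrind output).

```python
# Small subset of 3DNow! suffix bytes (the byte that follows the operands).
SUFFIXES = {
    0x0D: "pi2fd",  # packed int32 -> float
    0x1D: "pf2id",  # packed float -> int32
    0x9E: "pfadd",
    0xB4: "pfmul",
}


def decode_3dnow(code: bytes):
    """Return the 3DNow! mnemonic at the start of code, or None if not 0F 0F."""
    if len(code) < 4 or code[0] != 0x0F or code[1] != 0x0F:
        return None
    modrm = code[2]
    mod, rm = modrm >> 6, modrm & 7
    pos = 3
    if mod != 3 and rm == 4:          # a SIB byte follows ModRM
        sib = code[pos]
        pos += 1
        if mod == 0 and (sib & 7) == 5:
            pos += 4                  # SIB with disp32 and no base register
    if mod == 1:
        pos += 1                      # disp8
    elif mod == 2:
        pos += 4                      # disp32
    elif mod == 0 and rm == 5:
        pos += 4                      # RIP-relative disp32
    if pos >= len(code):
        return None
    suffix = code[pos]
    return SUFFIXES.get(suffix, f"unknown(0x{suffix:02x})")


# The bytes Valgrind reported inside libharfbuzz.
harfbuzz_bytes = bytes([0x0F, 0x0F, 0x43, 0x28, 0x0D, 0x48, 0x89, 0x04, 0x24, 0x48])
print(decode_3dnow(harfbuzz_bytes))  # prints "pi2fd"
```

Under these assumptions the reported bytes decode to `pi2fd` with a memory operand, which is consistent with a library built with a `-march` that enables 3DNow!.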
Comment 21 jospezial 2024-11-03 20:42:05 UTC
Created attachment 907825 [details]
firefox_valgrind_gcc libs_march_x86-64

I have rebuilt media-libs/harfbuzz x11-libs/gtk+ x11-libs/pango with
 -march=x86-64 -Og -pipe -ggdb3 .

Now firefox and thunderbird crash with the same MakeDay segfault right after the window opens.
Valgrind now goes a lot further but has:

vex amd64->IR: unhandled instruction bytes: 0xF 0xF 0x44 0x24 0x20 0xD 0x48 0xC7 0x84 0x24
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=0F
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==30431== valgrind: Unrecognised instruction at address 0x83a0f99.
==30431==    at 0x83A0F99: mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer(mozilla::nsDisplayList*, mozilla::nsDisplayListBuilder*, WrFiltersHolder&&, mozilla::layers::WebRenderBackgroundData*, double) (WebRenderLayerManager.cpp:446)
==30431==    by 0xB5DD04F: mozilla::nsDisplayList::PaintRoot(mozilla::nsDisplayListBuilder*, gfxContext*, unsigned int, mozilla::Maybe<double>) (nsDisplayList.cpp:2294)
==30431==    by 0xB309C29: nsLayoutUtils::PaintFrame(gfxContext*, nsIFrame*, nsRegion const&, unsigned int, mozilla::nsDisplayListBuilderMode, nsLayoutUtils::PaintFrameFlags) (nsLayoutUtils.cpp:3195)
==30431==    by 0xB271B69: mozilla::PresShell::PaintInternal(nsView*, mozilla::PaintInternalFlags) (PresShell.cpp:6513)
==30431==    by 0xAF2363D: nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) (nsViewManager.cpp:406)
==30431==    by 0xAF23A4B: nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) (nsViewManager.cpp:341)
==30431==    by 0xAF24002: ProcessPendingUpdates (nsViewManager.cpp:896)
==30431==    by 0xAF24002: nsViewManager::ProcessPendingUpdates() (nsViewManager.cpp:882)
==30431==    by 0xB2470F8: nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsRefreshDriver::IsExtraTick) (nsRefreshDriver.cpp:2885)
==30431==    by 0xB247C21: TickDriver (nsRefreshDriver.cpp:368)
==30431==    by 0xB247C21: mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp, nsTArray<RefPtr<nsRefreshDriver> >&) [clone .isra.0] (nsRefreshDriver.cpp:346)
==30431==    by 0xB247D93: mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (nsRefreshDriver.cpp:362)
==30431==    by 0xB247F73: RunRefreshDrivers (nsRefreshDriver.cpp:952)
==30431==    by 0xB247F73: mozilla::VsyncRefreshDriverTimer::TickRefreshDriver(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType>, mozilla::TimeStamp) (nsRefreshDriver.cpp:862)
==30431==    by 0xB248940: NotifyVsyncTimerOnMainThread (nsRefreshDriver.cpp:593)
==30431==    by 0xB248940: operator() (nsRefreshDriver.cpp:565)
==30431==    by 0xB248940: mozilla::detail::RunnableFunction<mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsync(mozilla::VsyncEvent const&)::{lambda()#1}>::Run() (nsThreadUtils.h:548)
Comment 22 jospezial 2024-11-03 20:45:09 UTC
(In reply to Sam James from comment #20)
 
> Could you try build pango+harfbuzz without -march=... and then try Valgrind
> + Firefox again?
> 
> If it fails again with "Unrecognised instruction" inside of non-Firefox,
> repeat the same steps (build $library without -march). If it is Firefox
> itself at the top of the stack, then we're stuck ofc.

Can you explain the benefit of compiling the libs a second time with the same settings?
Or am I misunderstanding you?
Comment 23 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-03 20:48:40 UTC
(In reply to jospezial from comment #22)

The idea was that if a different library shows up again, build that one with generic -march too, until no libraries are causing an error.

So it might be pango, then you rebuild pango, then valgrind shows an issue in freetype, then ...
Comment 24 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-03 20:52:13 UTC
(In reply to jospezial from comment #21)
> Created attachment 907825 [details]
> firefox_valgrind_gcc libs_march_x86-64
> 
> I have rebuilt media-libs/harfbuzz x11-libs/gtk+ x11-libs/pango with
>  -march=x86-64 -Og -pipe -ggdb3 .
> 
> Now firefox and thunderbird crash with the same MakeDay segfault right after
> the window opens.
> Valgrind now goes a lot further but has:

Thanks. Unfortunately, we cannot proceed further:
> On x86 and amd64, there is no support for 3DNow! instructions. If the translator encounters these, Valgrind will generate a SIGILL when the instruction is executed. Apart from that, on x86 and amd64, essentially all instructions are supported, up to and including AVX and AES in 64-bit mode and SSSE3 in 32-bit mode. 32-bit mode does in fact support the bare minimum SSE4 instructions needed to run programs on MacOSX 10.6 on 32-bit targets. 

I have some more requests (thank you for your patience too):
* Could you create a binpkg of broken Firefox, ideally with debug symbols?
* Could you tell me if Firefox works if you drop -march=...? My current theory is that it's a GCC bug involving 3dnow instructions (which I will debug once we get there) but I need to be sure.
Comment 25 jospezial 2024-11-03 21:41:00 UTC
https://www.phoronix.com/news/Linux-Kernel-Drop-AMD-3DNow

Since then, in /etc/portage/make.conf I have changed the line
CPU_FLAGS_X86="3dnow 3dnowext mmx mmxext popcnt sse sse2 sse3 sse4a"
to
CPU_FLAGS_X86="mmx mmxext popcnt sse sse2 sse3 sse4a"

But does gcc use that variable?

So GCC with native still uses 3dnow instructions on my system?
llvm/clang has removed that since 19.1. But I get a working firefox and tb with sys-devel/clang-18.1.8-r6

https://github.com/search?q=repo%3Allvm%2Fllvm-project+3dnow&type=commits&s=committer-date&o=desc

Where do you see 3dnow in the logs?
Comment 26 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-03 21:53:04 UTC
(In reply to jospezial from comment #25)
> https://www.phoronix.com/news/Linux-Kernel-Drop-AMD-3DNow
> 

This is a misunderstanding. Your CPU still supports 3dnow instructions and it still works. But the kernel dropped some accelerated paths using it.

> And since then I have in /etc/portage/make.conf
> the line 
> CPU_FLAGS_X86="3dnow 3dnowext mmx mmxext popcnt sse sse2 sse3 sse4a"
> changed to
> CPU_FLAGS_X86="mmx mmxext popcnt sse sse2 sse3 sse4a"
> 

This only controls hand-written asm in programs.

> But does gcc use that variable?

No.

> 
> So GCC with native still uses 3dnow instructions on my system?

Yes!

> llvm/clang has removed that since 19.1. But I get a working firefox and tb
> with sys-devel/clang-18.1.8-r6

My theory is that it is a GCC bug when it is emitting 3dnow instructions, so even older Clang + 3dnow would work.
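
To make the distinction above concrete: what the hardware supports is reported in the `flags` line of /proc/cpuinfo, independently of what is listed in CPU_FLAGS_X86. The following is a rough sketch; the helper names `cpu_flags` and `hardware_only_3dnow` are hypothetical, and the sample text is an abbreviated copy of the values posted in this report.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Extract the feature names from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def hardware_only_3dnow(cpuinfo_text: str, cpu_flags_x86: str) -> set[str]:
    """3DNow! features the CPU reports that CPU_FLAGS_X86 does not list."""
    wanted = {"3dnow", "3dnowext"}
    return (cpu_flags(cpuinfo_text) & wanted) - set(cpu_flags_x86.split())


# Abbreviated excerpt of the reporter's /proc/cpuinfo.
sample = "flags           : fpu mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow pni popcnt sse4a\n"
make_conf_value = "mmx mmxext popcnt sse sse2 sse3 sse4a"

# The CPU still advertises 3DNow! even though CPU_FLAGS_X86 no longer lists it.
print(sorted(hardware_only_3dnow(sample, make_conf_value)))  # prints ['3dnow', '3dnowext']
```

Only the two 3DNow! names are compared here, because the kernel and Gentoo do not use identical names for every feature (for example `pni` versus `sse3`).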
Comment 27 jospezial 2024-11-04 10:58:26 UTC
(In reply to Sam James from comment #24)
> 
> I have some more requests (thank you for your patience too):
> * Could you create a binpkg of broken Firefox, ideally with debug symbols?
Non-public download link sent by e-mail.
Comment 28 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-05 07:49:16 UTC
(In reply to jospezial from comment #27)
> Non public Downloadlink per e-mail.

Thanks! I have this saved locally.

Minor update:
* I'm currently preparing a machine (an Opteron 252) with some help from a kind volunteer & contributor. Its native -march is k8-sse3, not amdfam10.
* That machine seemed to work when I opened FF but it was built with GCC 13. I'm upgrading everything now to ~arch and so on.
* floppym has a Phenom which *does* reproduce the crash (!) but it is his main workstation so I can't have/request access to it. We may have to ask him to probe it.
* Chiitoo has mentioned he may have a Phenom around that may be an option too.

The differences between -march=k8-sse3 and -march=amdfam10 aren't too big; the main one is that -msse4a is available on amdfam10. If needed, I'll try diffing the binary you provided against mine built with k8-sse3 to see whether jsdate even differs much there.
Comment 29 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-05 07:51:05 UTC
jospezial, if you are able, it might be useful to know:
* does GCC 13 work? unfortunately, you cannot test this easily because Firefox depends on some C++ libraries. You would have to test it with USE=-system-icu at least. It's OK if you don't try this, but it may be useful data.

* does dropping -march=native help? (does "-O2 -mtune=native" fail?)

* does "-O2 -march=amdfam10" fail?
Comment 30 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-05 07:52:09 UTC
(In reply to Sam James from comment #29)
> jospezial, if you are able, it might be useful to know:
> * does GCC 13 work? unfortunately, you cannot test this easily because
> Firefox depends on some C++ libraries. You would have to test it with
> USE=-system-icu at least. It's OK if you don't try this, but it may be
> useful data.
> 
> * does dropping -march=native help? (does "-O2 -mtune=native" fail?)
> 
> * does "-O2 -march=amdfam10" fail?

* Under gdb, when you are at:


Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
0x00007ffff12e028f in MakeDay (year=<optimized out>, month=<optimized out>, date=<optimized out>) at /usr/src/debug/www-client/firefox-132.0/firefox-132.0/js/src/jsdate.cpp:429
429       double monthday = DayFromMonth(mn, leap);
(gdb) bt f

Can you please do: 'x/5i $pc'.
Comment 31 jospezial 2024-11-05 10:59:22 UTC
I am writing this now from firefox built with
-march=x86-64 -Og -pipe -ggdb3

I remember I have seen in build.log that firefox changes that to something like -march=x86-64 -O2 -pipe -gdwarf-4

So far no crash in these first minutes.
Comment 32 jospezial 2024-11-05 22:19:29 UTC
(In reply to Sam James from comment #30)
> 
> * Under gdb, when you are at:
> 
> 
> Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
> 0x00007ffff12e028f in MakeDay (year=<optimized out>, month=<optimized out>,
> date=<optimized out>) at
> /usr/src/debug/www-client/firefox-132.0/firefox-132.0/js/src/jsdate.cpp:429
> 429       double monthday = DayFromMonth(mn, leap);
> (gdb) bt f
> 
> Can you please do: 'x/5i $pc'.

from thunderbird
I hope it has all the information,
because I cleaned up /usr/src/debug/ when I needed space:
(gdb) x/5i $pc
=> 0x7ffff159ab8f:      movd   (%rdx,%rax,4),%xmm0
   0x7ffff159ab94:      cvtdq2pd %xmm0,%xmm0
   0x7ffff159ab98:      addsd  %xmm1,%xmm0
   0x7ffff159ab9c:      addsd  %xmm2,%xmm0
   0x7ffff159aba0:      subsd  %xmm4,%xmm0
Comment 33 jospezial 2024-11-07 09:29:28 UTC
(In reply to Sam James from comment #29)
> jospezial, if you are able, it might be useful to know:
> * does dropping -march=native help? (does "-O2 -mtune=native" fail?)
> 
> * does "-O2 -march=amdfam10" fail?

-march=amdfam10 -Og -pipe -ggdb3
Without mtune works. no crash. thunderbird-133.0_beta2
Comment 34 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-07 09:35:02 UTC
(In reply to jospezial from comment #33)
> (In reply to Sam James from comment #29)
> > jospezial, if you are able, it might be useful to know:
> > * does dropping -march=native help? (does "-O2 -mtune=native" fail?)
> > 
> > * does "-O2 -march=amdfam10" fail?
> 
> -march=amdfam10 -Og -pipe -ggdb3
> Without mtune works. no crash. thunderbird-133.0_beta2


Can you clarify?

-march=amdfam10 should already imply -mtune=amdfam10.

You can verify this with: 'for t in param target optimize optimizer; do cmd="gcc -Q --help=$t"; diff -U0 <(LANG=C $cmd -O2 -march=amdfam10) <(LANG=C $cmd -O2 -march=amdfam10 -mtune=amdfam10); done'.
Comment 35 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-07 09:35:36 UTC
But note that -march=amdfam10 is different from -march=native because -march=native may include more --param ...

So does '-march=amdfam10 --param=l1-cache-line-size=64 --param=l1-cache-size=64 --param=l2-cache-size=1024' crash?

And then '-march=amdfam10' works?
Comment 36 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-07 09:35:48 UTC
(with -O2 on top of course for both)
Comment 37 jospezial 2024-11-07 09:59:23 UTC
(In reply to Sam James from comment #36)
> (with -O2 on top of course for both)

The build process changes that to -O2 anyway.
Comment 38 jospezial 2024-11-07 10:04:40 UTC
(In reply to Sam James from comment #34)
> (In reply to jospezial from comment #33)
> > (In reply to Sam James from comment #29)
> > > jospezial, if you are able, it might be useful to know:
> > > * does dropping -march=native help? (does "-O2 -mtune=native" fail?)
> > > 
> > > * does "-O2 -march=amdfam10" fail?
> > 
> > -march=amdfam10 -Og -pipe -ggdb3
> > Without mtune works. no crash. thunderbird-133.0_beta2
> 
> 
> Can you clarify?
> 
> -march=amdfam10 should already imply -mtune=amdfam10.
> 
> You can verify this with: 'for t in param target optimize optimizer; do
> cmd="gcc -Q --help=$t"; diff -U0 <(LANG=C $cmd -O2 -march=amdfam10) <(LANG=C
> $cmd -O2 -march=amdfam10 -mtune=amdfam10); done'.

no output
Comment 39 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-07 10:08:46 UTC
(In reply to jospezial from comment #38)
> > 
> > -march=amdfam10 should already imply -mtune=amdfam10.
> > 
> > You can verify this with: 'for t in param target optimize optimizer; do
> > cmd="gcc -Q --help=$t"; diff -U0 <(LANG=C $cmd -O2 -march=amdfam10) <(LANG=C
> > $cmd -O2 -march=amdfam10 -mtune=amdfam10); done'.
> 
> no output

Exactly - they're the same.
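
The comparison from the shell one-liner above can also be scripted. The sketch below is an assumption-laden illustration, not part of the original exchange: the helper names `gcc_help` and `changed_lines` are hypothetical, and `gcc_help` needs a `gcc` on PATH, so it is only defined here and not called.

```python
import difflib
import os
import subprocess


def gcc_help(kind: str, flags: list[str]) -> str:
    """Return the output of 'gcc -Q --help=<kind> <flags...>' with LANG=C."""
    env = {**os.environ, "LANG": "C"}
    result = subprocess.run(
        ["gcc", "-Q", f"--help={kind}", *flags],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout


def changed_lines(before: str, after: str) -> list[str]:
    """Return only the '-'/'+' lines of a zero-context unified diff."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(), n=0, lineterm=""))
    # Skip the two file header lines, then drop the '@@' hunk markers.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Example with canned text, so it runs without a compiler installed.
a = "  -march=   amdfam10\n  -mtune=   amdfam10\n"
b = "  -march=   amdfam10\n  -mtune=   amdfam10\n"
print(changed_lines(a, b))  # prints [] because the two outputs are identical
```

On a machine with GCC installed, something like `changed_lines(gcc_help("target", ["-O2", "-march=amdfam10"]), gcc_help("target", ["-O2", "-march=amdfam10", "-mtune=amdfam10"]))` would be expected to return an empty list, matching the "no output" result above.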
Comment 40 jospezial 2024-11-07 10:32:15 UTC
(In reply to jospezial from comment #32)
> (In reply to Sam James from comment #30)
> > 
> > * Under gdb, when you are at:
> > 
> > 
> > Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
> > 0x00007ffff12e028f in MakeDay (year=<optimized out>, month=<optimized out>,
> > date=<optimized out>) at
> > /usr/src/debug/www-client/firefox-132.0/firefox-132.0/js/src/jsdate.cpp:429
> > 429       double monthday = DayFromMonth(mn, leap);
> > (gdb) bt f
> > 
> > Can you please do: 'x/5i $pc'.
> 
> from thunderbird
> I hope it has all infos
> because I cleaned up /usr/src/debug/ when I needed space.:
> (gdb) x/5i $pc
> => 0x7ffff159ab8f:      movd   (%rdx,%rax,4),%xmm0
>    0x7ffff159ab94:      cvtdq2pd %xmm0,%xmm0
>    0x7ffff159ab98:      addsd  %xmm1,%xmm0
>    0x7ffff159ab9c:      addsd  %xmm2,%xmm0
>    0x7ffff159aba0:      subsd  %xmm4,%xmm0

Does that tell us why it crashes?
I have read this is SSE2 stuff. Not 3Dnow.
Could we isolate a testcase? My PC takes about 7 hours for each build.
Comment 41 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-07 10:38:53 UTC
Yes, I'm trying to, but it's not easy when I can't yet reproduce it. The Opteron is still updating. If you are able to try working on that, that would be great though.

In the meantime, finding the exact options which do trigger it would help a lot.
Comment 42 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-07 10:39:29 UTC
(In reply to jospezial from comment #40)
> I have read this is SSE2 stuff. Not 3Dnow.

Yes, it's not necessarily 3dnow, but tuning related to the instructions you have (including 3dnow).
Comment 43 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-07 10:42:52 UTC
Understanding exactly which options do and don't trigger it would mean that I can at least have a minimised diff b/t binaries, and also speeds up being able to reproduce.

That includes understanding which instruction sets trigger it and if -mtune is required or not.
Comment 44 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-08 08:37:04 UTC
I can reproduce it now on the Opteron.
Comment 45 jospezial 2024-11-11 16:16:28 UTC
Sam, any news from your builds and debugging on your Opteron?
Did you try the equivalent for your machine of
'-march=amdfam10 --param=l1-cache-line-size=64 --param=l1-cache-size=64 --param=l2-cache-size=1024'

or whatever resolve-march-native reports?
Comment 46 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-11 16:33:36 UTC
I'll post updates as they occur. The only bits so far aren't really worth mentioning, but given I'm writing this anyway: -Og works, -O2 doesn't; __attribute__((optimize("O0"))) on MakeDay still fails (as does noipa on it, which I tried first). All of that was with -march=k8-sse3. I don't remember if I tried without it yet.

To give an idea: miscompilations (suspected compiler bugs) in non-trivial applications usually take me at least a week of effort, though some are much quicker. Firefox, on the other hand, is a massive application, and I don't even have the luxury of debugging it on a fast machine.

The last few days I was busy with other work. Iteration time is pretty painful, and I spent some time trying to set up an environment where it is shorter (build locally, copy over, run).
Comment 47 jospezial 2024-11-13 12:31:07 UTC
(In reply to Sam James from comment #35)
> But note that -march=amdfam10 is different from -march=native because
> -march=native may include more --param ...
> 
> So does '-march=amdfam10 --param=l1-cache-line-size=64
> --param=l1-cache-size=64 --param=l2-cache-size=1024' crash?

No crash, works.
-march=amdfam10 --param=l1-cache-line-size=64 --param=l1-cache-size=64 --param=l2-cache-size=1024 -O2 -pipe
firefox-133.0_beta7 USE="-valgrind"
Comment 48 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-13 14:04:08 UTC
That is a bit unexpected.
Comment 49 Alexander Monakov 2024-12-02 11:31:26 UTC
From looking at the disassembly, I suspect that ParseISOStyleDate, the caller of MakeDay, is miscompiled by GCC such that MakeDay receives a NaN as the first argument.

Since you already can reliably hit the segfault under gdb, you may be able to confirm that by placing a breakpoint on MakeDay, and then using 'p $xmm0' to print the first argument (assuming the caller is ParseISOStyleDate on the first hit).
Comment 50 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 12:37:59 UTC
(In reply to Alexander Monakov from comment #49)
> From looking at disassembly, I suspect the ParseISOStyleDate, the caller of
> MakeDay, is miscompiled by GCC such that MakeDay receives a NaN as the first
> argument.
> 
> Since you already can reliably hit the segfault under gdb, you may be able
> to confirm that by placing a breakpoint on MakeDay, and then using 'p $xmm0'
> to print the first argument (assuming the caller is ParseISOStyleDate on the
> first hit).

Thread 1 hit Breakpoint 1, 0x00007ffff170a190 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p $xmm0
$1 = {v8_bfloat16 = {0, 0, -1.654e-24, 4.969, 0, 0, 0, 0}, v8_half = {0, 0, -0.0019531, 2.3105, 0, 0, 0, 0}, v4_float = {0, 4.98730469, 0, 0}, v2_double = {2022, 0}, 
  v16_int8 = {0, 0, 0, 0, 0, -104, -97, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {0, 0, -26624, 16543, 0, 0, 0, 0}, v4_int32 = {0, 1084200960, 0, 0}, v2_int64 = {
    4656607665491804160, 0}, uint128 = 4656607665491804160}
(gdb) bt
#0  0x00007ffff170a190 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
#1  0x00007ffff170cefd in bool ParseISOStyleDate<unsigned char>(js::DateTimeInfo::ForceUTC, unsigned char const*, unsigned long, JS::ClippedTime*) ()
   from target:/usr/lib64/firefox/libxul.so
[...]
Comment 51 Alexander Monakov 2024-12-04 13:59:01 UTC
Thanks. Here you show 2022 as the first argument (double year), which looks fine, but then presumably this invocation of MakeDay will not segfault at all.

In situations like these, when you want to breakpoint on the faulting call to a function, but it's not the first call, you can use the 'ignore' command in GDB, first to count the number of calls up to the fault, then to stop right after the last non-faulting call:

(gdb) b MakeDay

breakpoint 1 at ...

(gdb) ignore 1 9999999

(gdb) r

after GDB reports the segfault:

(gdb) i b 1

GDB will inform you that breakpoint 1 was already hit N times. Next, 'ignore' it N-1 times and restart, and you will be stopped on the faulting call, where you can inspect its arguments, the callers' arguments, etc.
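
The N-1 arithmetic in this recipe can be sketched as a small Python helper. It is only an illustration: the function name `ignore_command` and the sample text are made up here, and the sample merely imitates the "breakpoint already hit N times" line that 'info breakpoints' prints.

```python
import re

def ignore_command(info_breakpoints: str, number: int) -> str:
    """Build the 'ignore' command that stops on the last recorded hit.

    The hit count reported by GDB includes the faulting call itself,
    so N-1 hits have to be skipped on the next run.
    """
    match = re.search(r"breakpoint already hit (\d+) times?", info_breakpoints)
    if match is None:
        raise ValueError("no hit count found in 'info breakpoints' output")
    hits = int(match.group(1))
    return f"ignore {number} {hits - 1}"

# Hypothetical sample imitating GDB's 'i b 1' output.
SAMPLE = """\
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000001000 <MakeDay(double, double, double)>
        breakpoint already hit 4 times
"""

print(ignore_command(SAMPLE, 1))
```

This only works if the number of calls before the fault is the same on every run, which is an assumption that has to be checked separately.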
Comment 52 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 14:07:37 UTC
gah, thanks.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ffff170a551 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) i b 1
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007ffff170a190 <MakeDay(double, double, double)>
        breakpoint already hit 4 times
        ignore next 9999995 hits
(gdb) 

so

Thread 1 hit Breakpoint 1, 0x00007ffff170a190 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p $xmm0
$1 = {v8_bfloat16 = {0, 0, -1.084e-19, 4.969, 0, 0, 0, 0}, v8_half = {0, 0, -0.0078125, 2.3105, 0, 0, 0, 0}, v4_float = {0, 4.98828125, 0, 0}, v2_double = {2024, 0}, 
  v16_int8 = {0, 0, 0, 0, 0, -96, -97, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {0, 0, -24576, 16543, 0, 0, 0, 0}, v4_int32 = {0, 1084203008, 0, 0}, v2_int64 = {
    4656616461584826368, 0}, uint128 = 4656616461584826368}
(gdb) n
Single stepping until exit from function _ZL7MakeDayddd,
which has no line number information.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ffff170a551 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p $xmm0
$2 = {v8_bfloat16 = {0, 0, 96, 6.594, 0, 0, 0, 0}, v8_half = {0, 0, 3.375, 2.4121, 0, 0, 0, 0}, v4_float = {0, 6.60189819, 0, 0}, v2_double = {19723, 0}, v16_int8 = {0, 
    0, 0, 0, -64, 66, -45, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {0, 0, 17088, 16595, 0, 0, 0, 0}, v4_int32 = {0, 1087587008, 0, 0}, v2_int64 = {4671150630914490368, 
    0}, uint128 = 4671150630914490368}
(gdb) n
Single stepping until exit from function _ZL7MakeDayddd,
which has no line number information.
[Thread 25819.25847 exited]
0x00007ffff1d3fbd0 in WasmTrapHandler(int, siginfo_t*, void*) () from target:/usr/lib64/firefox/libxul.so
(gdb)
Comment 53 Alexander Monakov 2024-12-04 15:17:29 UTC
Okay, 2024 is still correct. Can you check $xmm1 and $xmm2 (month and day, respectively) on entry too? I was looking at immolo's binaries, there's a chance yours are miscompiled differently.

If all of xmm0/xmm1/xmm2 on entry are fine, that would mean that my analysis of immolo's binary is inapplicable to you, and you'll have to step through your MakeDay to figure out what causes an out-of-bounds access (again, looking at full backtrace supplied by immolo I deduced that it was a rogue NaN).
Comment 54 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 15:43:27 UTC
(gdb) p $xmm0
$11 = {v8_bfloat16 = {0, 0, -1.084e-19, 4.969, 0, 0, 0, 0}, v8_half = {0, 0, -0.0078125, 2.3105, 0, 0, 0, 0}, v4_float = {0, 4.98828125, 0, 0}, v2_double = {2024, 0}, 
  v16_int8 = {0, 0, 0, 0, 0, -96, -97, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {0, 0, -24576, 16543, 0, 0, 0, 0}, v4_int32 = {0, 1084203008, 0, 0}, v2_int64 = {
    4656616461584826368, 0}, uint128 = 4656616461584826368}

(gdb) p $xmm1
$8 = {v8_bfloat16 = {0, 0, 0, 2.5, 0, 0, 0, 0}, v8_half = {0, 0, 0, 2.0625, 0, 0, 0, 0}, v4_float = {0, 2.5, 0, 0}, v2_double = {8, 0}, v16_int8 = {0, 0, 0, 0, 0, 0, 32, 
    64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {0, 0, 0, 16416, 0, 0, 0, 0}, v4_int32 = {0, 1075838976, 0, 0}, v2_int64 = {4620693217682128896, 0}, 
  uint128 = 4620693217682128896}

(gdb) p $xmm2
$9 = {v8_bfloat16 = {0, 0, 0, 2.625, 0, 0, 0, 0}, v8_half = {0, 0, 0, 2.0781, 0, 0, 0, 0}, v4_float = {0, 2.625, 0, 0}, v2_double = {12, 0}, v16_int8 = {0, 0, 0, 0, 0, 
    0, 40, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {0, 0, 0, 16424, 0, 0, 0, 0}, v4_int32 = {0, 1076363264, 0, 0}, v2_int64 = {4622945017495814144, 0}, 
  uint128 = 4622945017495814144}
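
As a cross-check of the register dumps above, the v2_int64 lane of each register can be reinterpreted as an IEEE 754 double outside of GDB. This is a minimal Python sketch; the helper name `int64_to_double` is made up for illustration, and the three integers are copied from the GDB output above.

```python
import struct

def int64_to_double(bits: int) -> float:
    """Reinterpret a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]

# Low 64-bit lanes of xmm0/xmm1/xmm2 as printed by GDB on entry to MakeDay.
lanes = {
    "xmm0 (year)": 4656616461584826368,
    "xmm1 (month)": 4620693217682128896,
    "xmm2 (day)": 4622945017495814144,
}

for name, bits in lanes.items():
    print(name, int64_to_double(bits))
```

Only the low lane matters for a scalar double argument in the x86-64 System V calling convention, so the high lane of each register can hold anything.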
Comment 55 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 16:03:38 UTC
so it looks like it's fine and I need to step through?

I'll ask for advice on doing that if possible, but I will also build again manually, given that'll be useful to have anyway (and I am curious whether it has the NaN instead).
Comment 56 Alexander Monakov 2024-12-04 16:27:45 UTC
If it works fine the first three times around, and segfaults on the fourth call with (2024.0, 8.0, 12.0) in arguments, that is surprising.

For stepping, the 'display' GDB command might be helpful to request printing values of floating-point registers after each 'si' command, for instance:

display $st0
display $xmm0.v2_double[0]
Comment 57 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 16:50:08 UTC
Thanks. Let me first double check (as I agree it's suspicious).
Comment 58 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 18:37:40 UTC
To verify:
```
# Attaching to the remote (gdbserver)
0x00007ffff7fe4840 in _start () from target:/lib64/ld-linux-x86-64.so.2
(gdb) b MakeDay
Function "MakeDay" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (MakeDay) pending.
(gdb) ignore 1 9999999
Will ignore next 9999999 crossings of breakpoint 1.
(gdb) c
Continuing.

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ffff170a551 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p yearday
i No symbol "yearday" in current context.
(gdb) i b 1
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007ffff170a190 <MakeDay(double, double, double)>
        breakpoint already hit 4 times
        ignore next 9999995 hits
(gdb) 
```

OK, so N=4, ignore it 3 times. Had to restart the session as FF has an annoying "clean startup" warning/error prompt first if the last start failed.

Then trying again:
```
Reading symbols from target:/lib64/ld-linux-x86-64.so.2...
Reading /usr/lib/debug/.build-id/ed/a8453b0094ddfaae7ee9a1f557682089f9abef.debug from remote target...
0x00007ffff7fe4840 in _start () from target:/lib64/ld-linux-x86-64.so.2
(gdb) b MakeDay
Function "MakeDay" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (MakeDay) pending.
(gdb) ignore 1 3
Will ignore next 3 crossings of breakpoint 1.
(gdb) c
Continuing.
[...]
Thread 1 hit Breakpoint 2, 0x00007ffff170a190 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p $xmm0
$2 = {v8_bfloat16 = {0, 0, -1.654e-24, 4.969, 0, 0, 0, 0}, v8_half = {0, 0, -0.0019531, 2.3105, 0, 0, 0, 0}, v4_float = {0, 4.98730469, 0, 0}, v2_double = {2022, 0}, 
  v16_int8 = {0, 0, 0, 0, 0, -104, -97, 64, 0, 0, 0, 0, 0, 0, 0, 0}, v8_int16 = {0, 0, -26624, 16543, 0, 0, 0, 0}, v4_int32 = {0, 1084200960, 0, 0}, v2_int64 = {
    4656607665491804160, 0}, uint128 = 4656607665491804160}
(gdb) p $xmm1
$3 = {v8_bfloat16 = {0, 0, 0, 1.875, 2.503e-06, 1.685e-33, 1.414e-34, -nan(0x7e)}, v8_half = {0, 0, 0, 1.9844, 0.38477, 0.00015402, 0.00011039, -nan(0x3fe)}, v4_float = {
    0, 1.875, 1.68773512e-33, -nan(0x7e073c)}, v2_double = {1, -nan(0xe073c090c3628)}, v16_int8 = {0, 0, 0, 0, 0, 0, -16, 63, 40, 54, 12, 9, 60, 7, -2, -1}, v8_int16 = {
    0, 0, 0, 16368, 13864, 2316, 1852, -2}, v4_int32 = {0, 1072693248, 151795240, -129220}, v2_int64 = {4607182418800017408, -554995522193880}, 
  uint128 = 340272129060578498174165271179148918784}
```
so I made a mistake last time!
Comment 59 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 18:43:55 UTC
(although I wonder what happened in https://bugs.gentoo.org/942573#c52, and maybe the profile affected it, but w/e)
Comment 60 Alexander Monakov 2024-12-04 19:41:46 UTC
Oh, please also do 'p $ftag' on entry to MakeDay. It was visible in the pastebinned log supplied by immolo on IRC, but I don't have that part of conversation anymore. Perhaps some previous function executed an mmx/3dnow instruction without subsequent (f)emms, leaving x87 state invalid.
Comment 61 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 19:49:44 UTC
Immediately after the above prints in the same session:

(gdb) p $ftag
$6 = 65535

(gdb) info registers
rax            0x0                 0
rbx            0x0                 0
rcx            0xa                 10
rdx            0x8                 8
rsi            0x0                 0
rdi            0x0                 0
rbp            0x1                 0x1
rsp            0x7fffffff9db8      0x7fffffff9db8
r8             0xa                 10
r9             0x0                 0
r10            0x8                 8
r11            0x7e6               2022
r12            0x0                 0
r13            0x0                 0
r14            0x7fffffff9f90      140737488330640
r15            0x1                 1
rip            0x7ffff170a190      0x7ffff170a190 <MakeDay(double, double, double)>
eflags         0x242               [ ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
fs_base        0x7ffff7e9c780      140737352681344
gs_base        0x0                 0
(gdb)
Comment 62 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 19:51:30 UTC
Created attachment 913323 [details]
gdb-info-registers-all.txt
Comment 63 Alexander Monakov 2024-12-04 20:09:36 UTC
Hunting for the missing femms would have been a spicy challenge, but (un)fortunately $ftag being 0xffff is perfectly fine. But that would explain why the first three calls work fine, and the fourth call faults while arguments are still okay.

I hope Firefox is not multithreaded yet at that point, and it's really, deterministically, the fourth call that segfaults each time?
Comment 64 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-04 20:12:45 UTC
(In reply to Alexander Monakov from comment #63)
> Hunting for the missing femms would have been a spicy challenge, but
> (un)fortunately $ftag being 0xffff is perfectly fine. But that would explain
> why the first three calls work fine, and the fourth call faults while
> arguments are still okay.
> 
> I hope Firefox is not multithreaded yet at that point, and it's really,
> deterministically, the fourth call that segfaults each time?

I have bad news. Immediately after the above:

(gdb) p $ftag
$10 = 65535
(gdb) c
Continuing.
[Thread 17683.17713 exited]

Thread 1 hit Breakpoint 2, 0x00007ffff170a190 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p $ftag
$11 = 65535
(gdb) c
Continuing.

Thread 1 hit Breakpoint 2, 0x00007ffff170a190 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, 0x00007ffff170a190 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p $ftag
$12 = 20822
(gdb) c
Continuing.
[New Thread 17683.19969]

Thread 1 received signal SIGSEGV, Segmentation fault.
0x00007ffff170a551 in MakeDay(double, double, double) () from target:/usr/lib64/firefox/libxul.so
(gdb) p $ftag
$13 = 64854

(gdb) i b 1
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00007ffff170a190 <MakeDay(double, double, double)>
        breakpoint already hit 8 times

so it's not deterministically 4th at all...
Comment 65 Alexander Monakov 2024-12-04 20:21:37 UTC
(gdb) p $ftag
$12 = 20822

Bingo! This confirms that MakeDay is being called with invalid x87 state (all x87 stack slots are in use). Woohoo, progress! The most likely cause would be some other function using an mmx or a 3dnow instruction, marking x87 stack registers used, and not releasing them via the emms or femms instruction.

(I don't have a ready recipe for that particular needle)
Comment 66 Alexander Monakov 2024-12-05 08:20:13 UTC
Created attachment 913370 [details]
instrumentation for missed emms hunting

Okay, here's a recipe. Use the attachment to create libgcc.a with instrumentation helpers, and replace the system libgcc with it (gcc -print-file-name=libgcc.a gives you the location).

Then rebuild Firefox with

  -pg -mfentry -minstrument-return=call

in flags (as 'make test' in the attachment does). Then you should get SIGILL under GDB; check the backtrace and work from there. With luck, you will be stopped exactly when the naughty function returns (if you're in the __return__ handler) or calls another function (if in the __fentry__ handler) with invalid x87 state.
Comment 67 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-05 18:55:54 UTC
Thread 1 received signal SIGILL, Illegal instruction.
0x00005555555cae3d in __return__ ()
(gdb) bt
#0  0x00005555555cae3d in __return__ ()
#1  0x00007fffeba939f5 in ?? ()
#2  0x00007fffebf3a013 in ?? ()
#3  0x00007ffff4e3006c in ?? () from target:/usr/lib64/libfreetype.so.6
#4  0x00007fffe530d780 in ?? ()
#5  0x0000000000000000 in ?? ()
(gdb) 

I'm not sure why the backtrace is useless, it was certainly built w/ -ggdb3 and `file` at least claims firefox isn't stripped.
Comment 68 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-05 19:36:15 UTC
.
Reading /home/sjames/build/ff-instrumented/dist/bin/libxul.so from remote target...
Error while mapping shared library sections:
`target:/home/sjames/build/ff-instrumented/dist/bin/libxul.so': not in executable format: file format not recognized
Comment 69 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-05 19:47:55 UTC
Created attachment 913410 [details]
backtrace-on-sigill.txt

Moving on from gdbserver for now.

#0  0x00005555555cae3d in __return__ ()
#1  0x00007fffeba939f5 in hb_font_set_scale (font=<optimized out>, x_scale=<optimized out>, y_scale=<optimized out>) at /home/sjames/git/firefox-132.0/gfx/harfbuzz/src/hb-font.cc:2347
#2  0x00007fffebf3a013 in gfxHarfBuzzShaper::CreateHBFont (aFont=0x7fffeba939f5 <hb_font_set_scale(hb_font_t*, int, int)+341>, aFontFuncs=<optimized out>, aCallbackData=aCallbackData@entry=0x7fffe01a95d8) at /home/sjames/git/firefox-132.0/gfx/thebes/gfxHarfBuzzShaper.cpp:1323
#3  0x00007fffebf3a50e in gfxHarfBuzzShaper::Initialize (this=this@entry=0x7fffe01a9580) at /home/sjames/git/firefox-132.0/gfx/thebes/gfxHarfBuzzShaper.cpp:1302
#4  0x00007fffebf3a794 in gfxHarfBuzzShaper::Initialize (this=this@entry=0x7fffe01a9580) at /home/sjames/git/firefox-132.0/gfx/thebes/gfxHarfBuzzShaper.cpp:1305
#5  0x00007fffebefd7c4 in gfxFont::GetHarfBuzzShaper (this=this@entry=0x7fffe15e0c30) at /home/sjames/git/firefox-132.0/gfx/thebes/gfxFont.cpp:1078
#6  0x00007fffebf0678b in gfxFont::ShapeText (this=0x7fffe15e0c30, aDrawTarget=<optimized out>, aText=<optimized out>, aOffset=0, aLength=4, aScript=mozilla::intl::Script::LATIN, aLanguage=<optimized out>, aVertical=<optimized out>, aRounding=<optimized out>, aShapedText=<optimized out>) at /home/sjames/git/firefox-132.0/gfx/thebes/gfxFont.cpp:3443
#7  0x00007fffebf005ea in gfxFont::ShapeText (this=this@entry=0x7fffe15e0c30, aDrawTarget=aDrawTarget@entry=0x7fffe64cf340, aText=aText@entry=0x7fffffff2bd0 "File\020", aOffset=aOffset@entry=0, aLength=aLength@entry=4, aScript=aScript@entry=mozilla::intl::Script::LATIN, aLanguage=aLanguage@entry=0x7fffe08277a0, aVertical=aVertical@entry=false, aRounding=aRounding@entry=gfxFontShaper::RoundingFlags::kRoundY, aShapedText=aShapedText@entry=0x7fffdf3fe180) at /home/sjames/git/firefox-132.0/gfx/thebes/gfxFont.cpp:3412
Comment 70 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-05 19:53:30 UTC
Created attachment 913411 [details]
disas-of-hb_font_set_scale.txt
Comment 71 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-05 19:54:07 UTC
(gdb) p $ftag
$2 = 342
Comment 72 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-05 20:15:07 UTC
Created attachment 913414 [details]
Unified_cpp_gfx_harfbuzz_src0.ii.xz

/usr/bin/ccache /usr/bin/g++ -o Unified_cpp_gfx_harfbuzz_src0.o -c -I/home/sjames/build/ff-instrumented/dist/stl_wrappers -I/home/sjames/build/ff-instrumented/dist/system_wrappers -include /home/sjames/git/firefox-132.0/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fstrict-flex-arrays=1 -DNDEBUG=1 -DTRIMMED=1 '-DPACKAGE_VERSION="moz"' '-DPACKAGE_BUGREPORT="http://bugzilla.mozilla.org/"' -DHAVE_OT=1 -DHAVE_ROUND=1 -DHB_NO_BUFFER_VERIFY -DHB_NO_FALLBACK_SHAPE -DHB_NO_UCD -DHB_NO_UNICODE_FUNCS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DMOZ_SUPPORT_LEAKCHECKING -DSTATIC_EXPORTABLE_JS_API -I/home/sjames/git/firefox-132.0/gfx/harfbuzz/src -I/home/sjames/build/ff-instrumented/gfx/harfbuzz/src -I/home/sjames/build/ff-instrumented/dist/include -I/usr/include/nspr -I/usr/include/nss -I/usr/include/nspr -I/home/sjames/build/ff-instrumented/dist/include/nss -I/usr/include/pixman-1 -DMOZILLA_CLIENT -include /home/sjames/build/ff-instrumented/mozilla-config.h -fno-rtti -pthread -fno-sized-deallocation -fno-aligned-new -ffunction-sections -fdata-sections -fno-math-errno -fno-exceptions -pipe -fPIC -specs=/home/sjames/gcc.specs -O2 -ggdb3 -pipe -march=k8-sse3 -pg -mfentry -minstrument-return=call -gdwarf-4 -O2 -fomit-frame-pointer -funwind-tables -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wcomma-subscript -Wvolatile -Wno-deprecated-enum-enum-conversion -Wduplicated-cond -Wimplicit-fallthrough -Wlogical-op -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wformat -Wformat-security -Wformat-overflow=2 -Wno-psabi -Wno-error=builtin-macro-redefined -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/lib64/libffi/include -fno-strict-aliasing -ffp-contract=off -MD -MP -MF 
.deps/Unified_cpp_gfx_harfbuzz_src0.o.pp Unified_cpp_gfx_harfbuzz_src0.cpp -save-temps
Comment 73 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-05 22:14:13 UTC
(In reply to Sam James from comment #68)
> .
> Reading /home/sjames/build/ff-instrumented/dist/bin/libxul.so from remote
> target...
> Error while mapping shared library sections:
> `target:/home/sjames/build/ff-instrumented/dist/bin/libxul.so': not in
> executable format: file format not recognized

FTR, I think this is https://sourceware.org/bugzilla/show_bug.cgi?id=26196.
Comment 74 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-07 12:56:49 UTC
Thank you jospezial for the report, amonakov for the extensive help debugging & analysing the problem, immolo for doing initial debugging with amonakov, an unnamed contributor who kindly set up a machine for me to use and test on, and all other offers of help.

amonakov reported it to GCC at https://gcc.gnu.org/PR117926 and Uros has fixed it already on trunk (not yet backported to 14). I'll test the fix over the weekend. I'll let you know when a version in-tree is expected to work.
Comment 75 jospezial 2024-12-09 22:30:46 UTC
sys-devel/gcc-14.3.9999 www-client/firefox-133.0
-march=native -O2 -pipe

No crash, runs fine.
Comment 76 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-09 22:42:07 UTC
(In reply to jospezial from comment #75)
> sys-devel/gcc-14.3.9999 www-client/firefox-133.0
> -march=native -O2 -pipe
> 
> No crash, runs fine.

Fantastic. Uros already backported it to the 14 branch, so I imagine you had it in there, depending on when you started the build.

Alexander pointed out on IRC that you will need to rebuild a lot of packages, unfortunately. The issue is that it's not as simple as a particular package crashing. The bug involved x87 FPU state being left corrupted which means it can "carry across" packages.

You have a few options:
1) We could try to find which binaries on your system at least use MMX and rebuild those;
2) We could analyse those results and see if they seem miscompiled, and only rebuild those;
3) Just rebuild everything.

Which would you like to do?
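
For option 1, a rough first pass could look like the Python sketch below. It is an illustration only, not an agreed procedure: it assumes AT&T-syntax disassembly text (for example from 'objdump -d'), it counts lines that touch the %mm0-%mm7 registers and lines that contain emms/femms, and the function name `summarize` and the sample listing are made up. A binary with MMX/3DNow! register use and no (f)emms at all would be a candidate for a closer look, but the counts alone cannot prove a miscompilation.

```python
import re

# %mm0..%mm7 only; %xmm/%ymm/%zmm do not match because '%' must be followed by 'mm'.
MMX_REG = re.compile(r"%mm[0-7]\b")
EMMS = re.compile(r"\b(?:emms|femms)\b")

def summarize(disassembly: str) -> tuple[int, int]:
    """Return (lines using MMX registers, lines containing emms/femms)."""
    mmx_lines = 0
    emms_lines = 0
    for line in disassembly.splitlines():
        if MMX_REG.search(line):
            mmx_lines += 1
        if EMMS.search(line):
            emms_lines += 1
    return mmx_lines, emms_lines

# Hypothetical sample in the style of 'objdump -d' output.
SAMPLE = """\
  401000:\t0f 6f 07             \tmovq   (%rdi),%mm0
  401003:\t0f fc c1             \tpaddb  %mm1,%mm0
  401006:\t66 0f 6f c1          \tmovdqa %xmm1,%xmm0
  40100a:\t0f 77                \temms
"""

print(summarize(SAMPLE))
```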
Comment 77 Larry the Git Cow gentoo-dev 2024-12-15 01:11:19 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7e3e2d5257fe376adc87504bc09505eccff7aab0

commit 7e3e2d5257fe376adc87504bc09505eccff7aab0
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-12-15 01:10:54 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-12-15 01:10:54 +0000

    sys-devel/gcc: add 14.2.1_p20241214
    
    Bug: https://bugs.gentoo.org/942573
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-devel/gcc/Manifest                    |  1 +
 sys-devel/gcc/gcc-14.2.1_p20241214.ebuild | 54 +++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)
Comment 78 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-12-23 02:30:17 UTC
https://forums.gentoo.org/viewtopic-p-8849550.html may be another instance.
Comment 79 Larry the Git Cow gentoo-dev 2024-12-23 02:34:17 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=589141eab7000d561f95958da64317c461f3595b

commit 589141eab7000d561f95958da64317c461f3595b
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2024-12-23 02:30:05 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-12-23 02:33:48 +0000

    sys-devel/gcc: keyword 14.2.1_p20241221
    
    Has a bunch of misc. fixes but importantly fixes a miscompilation with
    3DNow! instructions where x87 FPU state was left corrupted. This only
    affected >= GCC 14 and is now fixed.
    
    Unfortunately, the nature of the bug means that all packages may need
    to be recompiled (see https://bugs.gentoo.org/942573#c76 for more detail
    there). We can still consider a news item describing how to find potentially
    affected packages, but that's not a reason to put off keywording (and shortly,
    stabilisation).
    
    Thanks again to amonakov for the help in debugging, jospezial for the report,
    immolo for initially working with amonakov on it, and all others who
    helped & offered help. And Uros for fixing it upstream, of course.
    
    Will file a stable bug soon. I'd been planning on keywording this
    shortly anyway but was waiting for Christmas for things to settle down:
    now is a good time, and also was prompted by a potential other report
    of this on the forums at https://forums.gentoo.org/viewtopic-p-8849550.html.
    
    Bug: https://gcc.gnu.org/PR117926
    Bug: https://bugs.gentoo.org/942573
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-devel/gcc/gcc-14.2.1_p20241221.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)