Bug 950229 - sys-devel/gcc: www-client/firefox-135.0.1 built with -O3 -flto crashes after a while of use
Summary: sys-devel/gcc: www-client/firefox-135.0.1 built with -O3 -flto crashes after ...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Duplicates: 950352
Depends on:
Blocks: lto
Reported: 2025-02-24 12:39 UTC by Alfred Wingate
Modified: 2025-03-13 10:47 UTC (History)
4 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info firefox (emerge--info-firefox.txt, 25.15 KB, text/plain)
2025-02-24 12:39 UTC, Alfred Wingate
gdb bt full (firefox-bt-full.txt, 23.90 KB, text/plain)
2025-02-24 13:10 UTC, Alfred Wingate
disassembly (file_950229.txt, 78.16 KB, text/plain)
2025-02-24 23:46 UTC, Sam James
register dump (file_950229.txt, 19.29 KB, text/plain)
2025-02-24 23:46 UTC, Sam James

Description Alfred Wingate 2025-02-24 12:39:36 UTC
Created attachment 919803 [details]
emerge --info firefox

Reporting the known issue that sam has been afflicted with as well, now with attached backtraces.

#0  0x00007fb7a7cabefc in pthread_kill () at /lib64/libc.so.6
#1  0x00007fb7a7c47c96 in raise () at /lib64/libc.so.6
#2  0x00007fb7a0befd8a in nsProfileLock::FatalSignalHandler
    (signo=11, info=0x7fffc1c19db0, context=0x7fffc1c19c80)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/toolkit/profile/nsProfileLock.cpp:177
#3  0x00007fb7a7c47dc0 in <signal handler called> () at /lib64/libc.so.6
#4  mozilla::RefPtrTraits<mozilla::dom::BrowsingContext>::Release (aPtr=<optimized out>, aPtr=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/dist/include/mozilla/RefPtr.h:49
#5  RefPtr<mozilla::dom::BrowsingContext>::ConstRemovingRefPtrTraits<mozilla::dom::BrowsingContext>::Release
    (aPtr=<optimized out>, aPtr=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/dist/include/mozilla/RefPtr.h:409
#6  RefPtr<mozilla::dom::BrowsingContext>::~RefPtr (this=<optimized out>, this=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/dist/include/mozilla/RefPtr.h:80
#7  mozilla::dom::MaybeDiscarded<mozilla::dom::BrowsingContext>::~MaybeDiscarded
    (this=<optimized out>, this=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/dist/include/mozilla/dom/MaybeDiscarded.h:23
#8  IPC::ReadResult<mozilla::dom::MaybeDiscarded<mozilla::dom::BrowsingContext>, true>::~ReadResult
    (this=<optimized out>, this=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/chromium/src/chrome/common/ipc_message_utils.h:248
#9  mozilla::dom::PSessionStoreParent::OnMessageReceived (this=<optimized out>, msg__=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/ipc/ipdl/PSessionStoreParent.cpp:352
#10 0x00007fb79fb566cd in mozilla::dom::PContentParent::OnMessageReceived (this=<optimized out>, msg__=...)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/ipc/ipdl/PContentParent.cpp:6735
#11 0x00007fb79cb48da1 in mozilla::ipc::MessageChannel::DispatchAsyncMessage
    (this=0x7fb754322c80, aProxy=0x7fb75b3b0da0, aMsg=...)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/glue/MessageChannel.cpp:1790
#12 mozilla::ipc::MessageChannel::DispatchMessage
    (this=this@entry=0x7fb754322c80, aProxy=aProxy@entry=0x7fb75b3b0da0, aMsg=...)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/glue/MessageChannel.cpp:1717
#13 0x00007fb79cb4a5df in mozilla::ipc::MessageChannel::RunMessage
    (this=<optimized out>, aProxy=0x7fb75b3b0da0, aTask=...)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/glue/MessageChannel.cpp:1508
#14 mozilla::ipc::MessageChannel::RunMessage (this=<optimized out>, aProxy=0x7fb75b3b0da0, aTask=...)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/glue/MessageChannel.cpp:1469
#15 mozilla::ipc::MessageChannel::MessageTask::Run (this=0x7fb752962350)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/glue/MessageChannel.cpp:1608
#16 0x00007fb79c459252 in mozilla::RunnableTask::Run (this=0x7fb753bfd180)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/TaskController.cpp:688
#17 0x00007fb79c4ce976 in mozilla::TaskController::RunTask (aTask=0x7fb753bfd180)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/TaskController.cpp:247
#18 mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal
    (this=this@entry=0x7fb798773260, aProofOfLock=...)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/TaskController.cpp:1015
#19 0x00007fb79c4cf574 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal
    (this=0x7fb798773260, aProofOfLock=...)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/TaskController.cpp:838
#20 mozilla::TaskController::ProcessPendingMTTask (aMayWait=false, this=0x7fb798773260)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/TaskController.cpp:624
#21 operator() (__closure=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/TaskController.cpp:336
#22 mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::<lambda()> >::Run(void)
    (this=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/nsThreadUtils.h:548
#23 0x00007fb79c50346c in nsThread::ProcessNextEvent
    (this=0x7fb7a7a90280, aMayWait=<optimized out>, aResult=0x7fffc1c1bc70)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/nsThread.cpp:1159
#24 0x00007fb79cafeee2 in nsThread::ProcessNextEvent
    (aMayWait=false, this=0x7fb7a7a90280, aResult=0x7fffc1c1bc70)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/nsThread.cpp:1055
#25 NS_ProcessNextEvent (aMayWait=false, aThread=0x7fb7a7a90280)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/xpcom/threads/nsThreadUtils.cpp:480
#26 mozilla::ipc::MessagePump::Run (this=0x7fb79a7d9300, aDelegate=0x7fb7a7a69040)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/glue/MessagePump.cpp:85
#27 0x00007fb7a006f680 in MessageLoop::RunInternal (this=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/chromium/src/base/message_loop.cc:369
#28 MessageLoop::RunHandler (this=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/chromium/src/base/message_loop.cc:362
#29 MessageLoop::Run (this=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/ipc/chromium/src/base/message_loop.cc:344
#30 nsBaseAppShell::Run (this=0x7fb79a75c500)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/widget/nsBaseAppShell.cpp:148
#31 nsAppShell::Run (this=0x7fb79a75c500)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/widget/gtk/nsAppShell.cpp:470
#32 0x00007fb7a0c68361 in nsAppStartup::Run (this=0x7fb795efe600)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/dist/include/nsCOMPtr.h:751
#33 XREMain::XRE_mainRun (this=this@entry=0x7fffc1c1c400)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/toolkit/xre/nsAppRunner.cpp:5835
#34 0x00007fb7a0c6b362 in XREMain::XRE_main
    (this=0x7fffc1c1c400, argc=<optimized out>, argv=<optimized out>, aConfig=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/toolkit/xre/nsAppRunner.cpp:6075
#35 XRE_main (argc=<optimized out>, argv=<optimized out>, aConfig=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/toolkit/xre/nsAppRunner.cpp:6148
#36 0x000055d378478a25 in do_main (argc=<optimized out>, argv=<optimized out>, envp=envp@entry=0x7fffc1c1d7a8)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/browser/app/nsBrowserApp.cpp:232
#37 0x000055d37846cc7a in main (argc=<optimized out>, argv=<optimized out>, envp=0x7fffc1c1d7a8)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox-135.0.1/browser/app/nsBrowserApp.cpp:464
Comment 1 Alfred Wingate 2025-02-24 12:40:01 UTC
about:buildconfig

Build Configuration

Please be aware that this page doesn't reflect all the options used to build Firefox.
Build platform
target
x86_64-pc-linux-gnu
Build tools
Compiler 	Version 	Compiler flags
/usr/bin/x86_64-pc-linux-gnu-gcc -std=gnu17 	15.0.1 	-pthread -ffunction-sections -fdata-sections -fno-math-errno -pipe -fPIC -march=znver2 -pipe -fuse-linker-plugin -frecord-gcc-switches -Werror=strict-aliasing
/usr/bin/x86_64-pc-linux-gnu-g++ 	15.0.1 	-fno-rtti -pthread -fno-sized-deallocation -fno-aligned-new -ffunction-sections -fdata-sections -fno-math-errno -fno-exceptions -pipe -fPIC -march=znver2 -pipe -fuse-linker-plugin -frecord-gcc-switches -Werror=strict-aliasing -gdwarf-4 -O3 -fomit-frame-pointer -funwind-tables
/usr/lib/rust/1.85.0/bin/rustc 	1.85.0 	
Configure options

--enable-application=browser --host=x86_64-pc-linux-gnu --target=x86_64-pc-linux-gnu --enable-update-channel=release MOZBUILD_STATE_PATH=/var/tmp/notmpfs/portage/www-client/firefox-135.0.1/work/firefox_build --prefix=/usr --libdir=/usr/lib64 CPPFLAGS= 'CFLAGS=-march=znver2 -pipe -fuse-linker-plugin -frecord-gcc-switches -Werror=strict-aliasing' 'CXXFLAGS=-march=znver2 -pipe -fuse-linker-plugin -frecord-gcc-switches -Werror=strict-aliasing' 'LDFLAGS=-Wl,-O1 -Wl,--as-needed -Wl,--defsym=__gentoo_check_ldflags__=0 -frecord-gcc-switches -Werror=strict-aliasing -Wl,--undefined-version -Wl,-rpath=/usr/lib64/firefox,--enable-new-dtags' --enable-optimize=-O3 --with-toolchain-prefix=x86_64-pc-linux-gnu- CC=x86_64-pc-linux-gnu-gcc LD=x86_64-pc-linux-gnu-ld CXX=x86_64-pc-linux-gnu-g++ HOST_CC=x86_64-pc-linux-gnu-gcc HOST_CXX=x86_64-pc-linux-gnu-g++ --enable-linker=bfd 'AS=x86_64-pc-linux-gnu-gcc -c' AR=x86_64-pc-linux-gnu-ar NM=x86_64-pc-linux-gnu-nm PKG_CONFIG=x86_64-pc-linux-gnu-pkg-config --enable-lto=full READELF=llvm-readelf RUSTC=/usr/lib/rust/1.85.0/bin/rustc CARGO=/usr/lib/rust/1.85.0/bin/cargo --disable-cargo-incremental --with-libclang-path=/usr/lib/llvm/19/lib64 --with-system-ffi --enable-rust-simd --with-system-icu --enable-default-toolkit=cairo-gtk3-x11-wayland --disable-wmf --with-system-av1 --disable-real-time-tracing --with-mozilla-api-keyfile=/var/tmp/notmpfs/portage/www-client/firefox-135.0.1/work/firefox-135.0.1/api-mozilla.key --with-google-location-service-api-keyfile=/var/tmp/notmpfs/portage/www-client/firefox-135.0.1/work/firefox-135.0.1/api-location.key --with-google-safebrowsing-api-keyfile=/var/tmp/notmpfs/portage/www-client/firefox-135.0.1/work/firefox-135.0.1/api-google.key --with-system-webp --with-system-graphite2 --with-system-harfbuzz --disable-geckodriver --enable-elf-hack=relr --with-unsigned-addon-scopes=app,system --allow-addon-sideload --with-system-libvpx --with-system-jpeg --without-wasm-sandboxed-libraries --with-system-nss 
--disable-updater --with-system-libevent --disable-crashreporter --disable-necko-wifi --disable-parental-controls --enable-system-pixman --disable-legacy-profile-creation XARGS=/usr/bin/xargs --disable-install-strip --with-system-zlib --enable-official-branding --x-includes=/usr/include --x-libraries=/usr/lib64
Comment 2 Alfred Wingate 2025-02-24 13:10:27 UTC
Created attachment 919805 [details]
gdb bt full
Comment 3 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-24 23:44:18 UTC
Example of a previous such bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1664151.
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-24 23:46:21 UTC
Created attachment 919873 [details]
disassembly
Comment 5 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-24 23:46:34 UTC
[Saturday 25 January 2025] [09:11:40 Greenwich Mean Time] <sam_>        not sure I get it yet
[Saturday 25 January 2025] [09:11:42 Greenwich Mean Time] <sam_>        x
[Saturday 25 January 2025] [09:11:42 Greenwich Mean Time] <sam_>           0x00007ffff068382f <+1471>:  test   %rcx,%rcx
[Saturday 25 January 2025] [09:11:42 Greenwich Mean Time] <sam_>           0x00007ffff0683832 <+1474>:  je     0x7ffff0683845 <_ZN7mozilla3dom19PSessionStoreParent17OnMessageReceivedERKN3IPC7MessageE+1493>
[Saturday 25 January 2025] [09:11:42 Greenwich Mean Time] <sam_>        => 0x00007ffff0683834 <+1476>:  mov    (%rcx),%rax
[Saturday 25 January 2025] [09:12:12 Greenwich Mean Time] <sam_>        so %rcx is 0 and it dereferences it
[Saturday 25 January 2025] [09:12:45 Greenwich Mean Time] <sam_>        and it was supposed to be whatever is 0x58 on the stack?
[Saturday 25 January 2025] [09:12:49 Greenwich Mean Time] <sam_>        *is at
[Saturday 25 January 2025] [09:44:29 Greenwich Mean Time] <sam_>        i'm not sure if it's hard to follow because of the std::move
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-24 23:46:43 UTC
Created attachment 919874 [details]
register dump
Comment 7 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-25 08:27:05 UTC
For anyone else coming across this: what would be *really* useful is a way to reproduce it on demand.

It crashes with a high probability within say, 5 minutes of FF startup, but not always, and I can't trigger it on-demand.
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-28 06:47:27 UTC
https://searchfox.org/mozilla-release/source/__GENERATED__/ipc/ipdl/PSessionStoreParent.cpp#352

#9  mozilla::dom::PSessionStoreParent::OnMessageReceived (this=<optimized out>, msg__=<optimized out>)
    at /usr/src/debug/www-client/firefox-135.0.1/firefox_build/ipc/ipdl/PSessionStoreParent.cpp:352
        maybe__aBrowsingContext = {mIsOk = true, mData = {mId = 57982058497, mPtr = {mRawPtr = 0x0}}}
        aBrowsingContext = <optimized out>
[...]
Comment 9 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-28 06:47:56 UTC
(I think we're in the PSessionStore::Msg_IncrementalSessionStoreUpdate__ID case, and then we die on leaving the scope when it destructs the obj)
Comment 10 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-28 06:50:21 UTC
then down the line

  ~RefPtr() {
    if (mRawPtr) {
      ConstRemovingRefPtrTraits<T>::Release(mRawPtr);
    }
  }

and I guess the check for it being null gets optimised out
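
For readers unfamiliar with the pattern, here is a minimal sketch of a reference-counting smart pointer that carries the same null guard in its destructor. `Counted`, `Ptr`, and `refs_after_scope` are hypothetical stand-ins made up for illustration; they are not Mozilla's `RefPtr` or `BrowsingContext`.

```cpp
// Hypothetical stand-in for a refcounted object.
struct Counted {
    int refs = 0;
    void AddRef() { ++refs; }
    void Release() { --refs; }
};

// Hypothetical stand-in for a RefPtr-like owner.
template <typename T>
class Ptr {
public:
    explicit Ptr(T *raw = nullptr) : mRawPtr(raw) {
        if (mRawPtr) mRawPtr->AddRef();
    }
    Ptr(const Ptr &) = delete;
    Ptr &operator=(const Ptr &) = delete;
    ~Ptr() {
        // The guard under discussion: if this test is dropped, a null
        // mRawPtr is dereferenced by Release().
        if (mRawPtr) mRawPtr->Release();
    }
    T *get() const { return mRawPtr; }

private:
    T *mRawPtr;
};

// Returns the reference count left on the object after one owning Ptr and
// one empty (null) Ptr have both gone out of scope.
int refs_after_scope() {
    Counted obj;
    {
        Ptr<Counted> held(&obj);
        Ptr<Counted> empty;  // null: its destructor must skip Release()
    }
    return obj.refs;
}
```

With the guard intact, destroying the empty `Ptr` is a no-op and the count on `obj` returns to zero. The crash in the backtrace is what the same shape would look like if the guard were not executed for a null pointer.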
Comment 11 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-28 07:10:52 UTC
kostadin gave me a useful reproducer, youtube.com shorts and scrolling down with just the down arrow key seems to hit it within <10s for me
Comment 12 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-02-28 07:12:50 UTC
*** Bug 950352 has been marked as a duplicate of this bug. ***
Comment 13 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 21:13:31 UTC
So far, bisected objects to:
* build/toolkit/components/sessionstore/Unified_cpp_sessionstore0.o
* build/docshell/base/Unified_cpp_docshell_base0.o
w/ LTO.

Combining the two of them builds fine, but the combined object doesn't lead to the crash whether it is built with LTO (to allow merging with other LTO objects, not that there are many left; just thinking of the .a, which I didn't need to split out) or without.

Uploaded a tarball [0] as a checkpoint of progress with preprocessed sources (Unified_cpp_sessionstore0.ii, Unified_cpp_docshell_base0.ii), compile/link commands for those, and the full link command + list of objects from bisection.

[0] https://dev.gentoo.org/~sam/bugs/gentoo/950229/950229-checkpoint.tar.xz
Comment 14 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 21:14:31 UTC
Next steps are (given the merging attempt failed):
* split the unified versions out to make it easier to see what's going on (and discard a lot of code) -- hopefully it still reproduces there, and/or
* do a full build w/ unified disabled and hope it still repros there and if needed redo full bisection of objects (bleh)
Comment 15 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 21:26:23 UTC
# Split out from Unified_cpp_docshell_base0.cpp
merge_1=(
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/BaseHistory.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/BrowsingContext.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/BrowsingContextGroup.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/BrowsingContextWebProgress.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/CanonicalBrowsingContext.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/ChildProcessChannelListener.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/LoadContext.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/SerializedLoadContext.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/WindowContext.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/nsAboutRedirector.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/nsDSURIContentListener.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/nsDocShell.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/nsDocShellEditorData.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/nsDocShellEnumerator.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/nsDocShellLoadState.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/docshell/base/nsDocShellTelemetryUtils.cpp
)

# Split out from Unified_cpp_sessionstore0.cpp
merge_2=(
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/BrowserSessionStore.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/RestoreTabContentObserver.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreChangeListener.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreChild.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreFormData.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreListener.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreParent.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreRestoreData.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreScrollData.cpp
        /opt/sam-debugging/ff/firefox-135.0.1/toolkit/components/sessionstore/SessionStoreUtils.cpp
        /opt/sam-debugging/ff/build/ipc/ipdl/PSessionStore.cpp
        /opt/sam-debugging/ff/build/ipc/ipdl/PSessionStoreChild.cpp
        /opt/sam-debugging/ff/build/ipc/ipdl/PSessionStoreParent.cpp
        /opt/sam-debugging/ff/build/ipc/ipdl/SessionStoreTypes.cpp
)
Comment 16 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 22:27:45 UTC
(In reply to Sam James from comment #14)
> Next steps are (given the merging attempt failed):
> * split the unified versions out to make it easier to see what's going on
> (and discard a lot of code) -- hopefully it still reproduces there

This builds fine and runs fine w/o any crashes :(

> * do a full build w/ unified disabled and hope it still repros there and if
> needed redo full bisection of objects (bleh)

I'll try this next but will probably call it a night here.
Comment 17 stefan11111 2025-03-01 23:08:20 UTC
(In reply to Sam James from comment #13)
> So far, bisected objects to:
> * build/toolkit/components/sessionstore/Unified_cpp_sessionstore0.o
> * build/docshell/base/Unified_cpp_docshell_base0.o
> w/ LTO.
> 
> Combining the two of them builds fine but either building it with LTO (to
> allow merging with other LTO objects, not that there's many left, just
> thinking of the .a which I didn't need to split out) or without doesn't lead
> to the crash.
> 
> Uploaded a tarball [0] as a checkpoint of progress with preprocessed sources
> (Unified_cpp_sessionstore0.ii, Unified_cpp_docshell_base0.ii), compile/link
> commands for those, and the full link command + list of objects from
> bisection.
> 
> [0] https://dev.gentoo.org/~sam/bugs/gentoo/950229/950229-checkpoint.tar.xz

I'm sure you are aware of this, but you can't really bisect lto failures
by compiling some files with -flto and others without.

I learned that when debugging an lto segfault due to strict aliasing violations.
I ended up writing a separate file to put functions into, and got it down to
a few (3 iirc) functions that when built without lto, everything would work, but
build any of them with lto, and the segfault would start happening.
In the end, the strict aliasing violation wasn't in those functions, it was in
a completely different place.

If it helps give any ideas here, I tracked it down by adding volatile's to function args and then removing some of them, and bisected it that way.

When debugging that lto crash, gdb was showing some insane output.

Something like, foo = NULL, foo->bar = something, foo->bar->baz (segfault)

Again I'm sure you already know what can happen when debugging lto crashes,
but I wanted to point it out.
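
A minimal sketch of the volatile trick described above, assuming the goal is to stop the optimizer from carrying facts about a pointer from one use to the next. The function name `read_or_default` is hypothetical. This variant copies the argument into a volatile local rather than qualifying the parameter itself, because volatile-qualified function parameters are deprecated as of C++20.

```cpp
// Hypothetical helper standing in for a function under investigation.
int read_or_default(int *p, int fallback) {
    // The pointer object itself is volatile, so every use below is a fresh
    // load that the compiler must perform and cannot reason about. Facts
    // learned about `p` elsewhere do not transfer to `vp`.
    int *volatile vp = p;

    if (vp == nullptr) {
        return fallback;
    }
    return *vp;
}
```

Adding such a barrier to suspect functions one at a time, and removing it again where the crash persists, narrows down where an assumption about the pointer is being propagated from.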
Comment 18 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 23:11:43 UTC
(In reply to stefan11111 from comment #17)
> (In reply to Sam James from comment #13)
> > So far, bisected objects to:
> > * build/toolkit/components/sessionstore/Unified_cpp_sessionstore0.o
> > * build/docshell/base/Unified_cpp_docshell_base0.o
> > w/ LTO.
> > 
> > Combining the two of them builds fine but either building it with LTO (to
> > allow merging with other LTO objects, not that there's many left, just
> > thinking of the .a which I didn't need to split out) or without doesn't lead
> > to the crash.
> > 
> > Uploaded a tarball [0] as a checkpoint of progress with preprocessed sources
> > (Unified_cpp_sessionstore0.ii, Unified_cpp_docshell_base0.ii), compile/link
> > commands for those, and the full link command + list of objects from
> > bisection.
> > 
> > [0] https://dev.gentoo.org/~sam/bugs/gentoo/950229/950229-checkpoint.tar.xz
> 
> I'm sure you are aware of this, but you can't really bisect lto failures
> by compiling some files with -flto and others without.
> 

No, you can -- that's what I did. The rub is partitioning, but it doesn't mean you can't do it at all, or that it has no value. It just means you can't (obviously) get it down to a single TU.

Any files built with LTO *can't* be relevant because they don't have any GIMPLE in them anymore, they're simply object files with no metadata.

(Most LTO issues also boil down to "more inlining" which is why you can almost-always transform an LTO bug into a single-file testcase in the end. It's just getting there that sucks.)
Comment 19 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 23:12:53 UTC
(In reply to Sam James from comment #18)
> 
> Any files built with LTO *can't* be relevant because they don't have any
> GIMPLE in them anymore, they're simply object files with no metadata.
> 

with->without
Comment 20 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 23:18:59 UTC
(https://gcc.gnu.org/PR117315 is a recent example where it ended up being the malloc attribute causing DSE a while down the line, by the way.)
Comment 21 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-01 23:19:35 UTC
s/DSE/DCE, and now bed..
Comment 22 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-02 17:13:10 UTC
Testing the patch from https://bugzilla.mozilla.org/show_bug.cgi?id=1790526 (https://bugzilla.mozilla.org/attachment.cgi?id=9465385), but if it works, I think it'd make sense too.

https://searchfox.org/mozilla-central/source/toolkit/components/sessionstore/SessionStoreParent.cpp#197

because of the access of aBrowsingContext in:
         aBrowsingContext.GetMaybeDiscarded()->Canonical(), aFormData,
it might, with LTO, conclude "ok, it's not null, as you accessed it before", and then kill the mRawPtr check later (which would be legitimate).
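
As a rough illustration of the reasoning above, the sketch below shows the guarded shape: the possibly-null pointer is tested before anything dereferences it, so no code path gives the optimizer grounds to assume it is non-null. `Context`, `MaybeHolder`, `IsNull`, `Get`, and `canonical_or` are hypothetical names that only loosely echo the backtrace. Whether the upstream patch takes exactly this shape is not confirmed here.

```cpp
// Hypothetical stand-in for the pointed-to object.
struct Context {
    int id;
    int Canonical() const { return id; }
};

// Hypothetical wrapper whose pointer may legitimately be null.
struct MaybeHolder {
    Context *ptr;
    bool IsNull() const { return ptr == nullptr; }
    Context *Get() const { return ptr; }
};

// Guarded use: check first, dereference second. An unguarded
// `h.Get()->Canonical()` would be undefined for a null pointer, and it
// would also license the compiler to drop later null checks on the same
// pointer.
int canonical_or(const MaybeHolder &h, int fallback) {
    if (h.IsNull()) {
        return fallback;
    }
    return h.Get()->Canonical();
}
```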
Comment 23 Joonas Niilola gentoo-dev 2025-03-02 17:23:42 UTC
Awesome, let me know if it works and can be added to 136.
Comment 24 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-03 09:36:01 UTC
(In reply to Joonas Niilola from comment #23)
> Awesome, let me know if it works and can be added to 136.

Thank you! It's looking very good so far. No luck crashing on the standard reproducer with a manual build, and my system FF has been patched since around the time of my comment. Please do add it in if you can on the next version.

I'll comment on the upstream bug/review later today.
Comment 25 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-03 09:36:20 UTC
Stefan, it would be nice if you could check it out and see if it helps your crash too. parona too?
Comment 26 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-03 13:13:11 UTC
(Not 15 specific; kostadin hit it w/ 14 at least, not 13. It also started between 129 and 131, we think.)
Comment 27 stefan11111 2025-03-03 14:36:24 UTC
(In reply to Sam James from comment #25)
> Stefan, it would be nice if you could check it out and see if it helps your
> crash too. parona too?

The patch fixes both my crashes.
Comment 28 Larry the Git Cow gentoo-dev 2025-03-03 20:17:02 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7dfd8a45f803cd33f4714dbcc664fc965a5748d9

commit 7dfd8a45f803cd33f4714dbcc664fc965a5748d9
Author:     Joonas Niilola <juippis@gentoo.org>
AuthorDate: 2025-03-03 20:13:21 +0000
Commit:     Joonas Niilola <juippis@gentoo.org>
CommitDate: 2025-03-03 20:16:59 +0000

    www-client/firefox: add 136.0
    
     - add a patch that's currently being worked upstream which fixes runtime
       issues when compiled with gcc and lto,
     - attempt removing
       "*bgo-925101-force-software-rendering-during-pgo-build.patch" as it's
       supposed to be fixed by mesa updates,
     - disable pref "permissions.manager.remote.enabled" by default - can be
       enabled by corporations managing their browsers via remote-settings,
     - handle rust-simd by enabling it on supported arches, so unkeyworded arches
       can probably compile the browser with --disable-rust-simd by default
       without editing the ebuild,
     - increase nss, icu and libpng version requirements,
     - remove our custom patch enabling vaapi on all amd cards since it's merged
       upstream,
     - remove our custom system-av1 & system-libvpx patches as they've been merged
       upstream.
    
    Bug: https://bugs.gentoo.org/950229
    Bug: https://bugs.gentoo.org/950305
    Signed-off-by: Joonas Niilola <juippis@gentoo.org>

 www-client/firefox/Manifest                      |  102 ++
 www-client/firefox/files/gentoo-default-prefs.js |    3 +
 www-client/firefox/firefox-136.0.ebuild          | 1382 ++++++++++++++++++++++
 3 files changed, 1487 insertions(+)
Comment 29 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-03 20:30:14 UTC
Thank you all! Given that we think it got introduced after the last ESR, closing.
Comment 30 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-03 20:30:29 UTC
(In reply to Sam James from comment #29)
> Thank you all! Given that we think it got introduced after the last ESR,
> closing.

or "introduced", I should say -- the upstream bug is older than that (2 years old) and it's been latent, I think.
Comment 31 stefan11111 2025-03-03 22:27:38 UTC
Can someone explain why this patch fixes the issue?

I know gcc removes NULL-pointer checks in cases like:

void bar (char *ptr);

void foo (char *ptr)
{
    if (ptr) bar(ptr);
}

void bar (char *ptr)
{
    if (ptr) *ptr = 0;
}

And cases like:

void foo (volatile char *ptr)
/* volatile so the if won't get optimized out for other reasons */
{
    *ptr = 0;
    if (ptr) *ptr = 1;
}

If it were the first case, then the NULL-pointer check wouldn't cause segfaults.
If it were the second case, it wouldn't have mattered if the NULL-pointer check was removed or not, a segfault would still happen.

Is there something I'm missing here?
Comment 32 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-03 22:38:02 UTC
It's nearly the first case.

#include <stdio.h>

void bar(int *p) {
    if (!p) /* check is deleted with LTO if all callers are known because it has to be non-NULL for the earlier printf to be valid */
        __builtin_abort();
}

void foo(int *p) {
    printf("%p\n", (void *)p); /* p must be non-null given we used it */
    bar(p);
}
Comment 33 Larry the Git Cow gentoo-dev 2025-03-04 14:43:01 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4e2aeadbeb3d2f677a1ed5d4286b090620c896ec

commit 4e2aeadbeb3d2f677a1ed5d4286b090620c896ec
Author:     Joonas Niilola <juippis@gentoo.org>
AuthorDate: 2025-03-04 14:37:32 +0000
Commit:     Joonas Niilola <juippis@gentoo.org>
CommitDate: 2025-03-04 14:37:32 +0000

    www-client/firefox: add 128.8.0
    
     - add a patch that's currently being worked upstream which fixes runtime
       issues when compiled with gcc and lto,
     - handle rust-simd by enabling it on supported arches, so unkeyworded arches
       can probably compile the browser with --disable-rust-simd by default
       without editing the ebuild,
     - sync the updated configure option from rapid to esr
       (system-ffi, update-channel).
     - while "permissions.manager.remote.enabled" is disabled through the default
       pref settings, the setting will actually only be active in the rapid (136)
       version.
    
    Bug: https://bugs.gentoo.org/950229
    Signed-off-by: Joonas Niilola <juippis@gentoo.org>

 www-client/firefox/Manifest               |  102 +++
 www-client/firefox/firefox-128.8.0.ebuild | 1380 +++++++++++++++++++++++++++++
 2 files changed, 1482 insertions(+)
Comment 35 stefan11111 2025-03-12 17:51:20 UTC
(In reply to Sam James from comment #32)
> It's nearly the first case.
> 
> int bar(int *p) {
>     if (!p) /* check is deleted with LTO if all callers are known because it
> has to be non-NULL for the earlier printf to be valid */
>         __builtin_abort(); 
> }
> 
> int foo(int* p) {
>     printf("%p\n", p); /* p must be non-null given we used it */
>     bar(p);
> }

Printing a NULL pointer is UB or something in c++?
I don't see why a NULL pointer can't be printed, as it's just printing
an unsigned long as hex, with some formatting.

$ cat printf.cpp
#include <stdio.h>

int main()
{
    int *p = NULL;
    printf("%p\n", p);
    return 0;
}

$ g++ printf.cpp -O3 -flto -o printf

$ ./printf
(nil)
Comment 36 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2025-03-13 10:47:44 UTC
I think printf was a bad example. The rest stands. Please read the linked post.
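
A variant of the earlier `foo`/`bar` example that uses an actual dereference instead of `printf` may make the point clearer. This is a sketch for illustration only. It keeps the function names from the earlier comment but changes their signatures, and it is not taken from the Firefox sources.

```cpp
#include <cstdlib>

// Callee with a defensive null check, as in bar() earlier in the thread.
static int bar(const int *p) {
    if (!p) {
        std::abort();  // candidate for deletion once p is known non-null
    }
    return *p + 1;
}

// Caller that reads through p before calling bar(). The read is only valid
// for a non-null p, so once bar() is inlined (which LTO makes possible
// across translation units) its check is redundant from the compiler's
// point of view and may be removed.
int foo(const int *p) {
    int first = *p;  // undefined behavior if p is null
    return first + bar(p);
}
```

For a valid pointer, both versions of the program behave identically. The difference only shows when a null pointer reaches `foo`, at which point the program has already invoked undefined behavior and the removed check can no longer catch it.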