Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 941878 - www-client/firefox: reconsider IUSE=+clang default
Summary: www-client/firefox: reconsider IUSE=+clang default
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal normal
Assignee: Mozilla Gentoo Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-10-20 07:19 UTC by Sam James
Modified: 2024-11-26 17:40 UTC (History)
3 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-20 07:19:13 UTC
www-client/firefox currently defaults to +clang, but I'm not sure this is the right decision these days:
* +clang means more users are exposed to issues like bug 934385 (and this kind of thing crops up on every new LLVM and Rust version)
* in general, I think +clang needs some specific (ongoing) rationale to retain, rather than it being the status quo
* the bug it was added for (bug 727028) is long fixed

WDYT?
Comment 1 Joonas Niilola gentoo-dev 2024-10-20 07:35:26 UTC
It's default because it's "upstream-recommended" - ie upstream builds their releases with clang. And that's what devmanual recommends to do, too.
As you linked, there was a long time period when gcc didn't work at all (IIRC it was even longer than in the linked bug, because upstream didn't test with gcc at all). Even though upstream _does_ enable gcc builds with their CI now, I've anxiously been waiting for another huge gcc break to happen. We still have some issues with gcc, like +jumbo-build randomly crashing. 

Also, mine and immolo's own weak empiric speedometer benchmarks tests shows that +clang does give a minor but constant performance increase compared to -clang. Although I do acknowledge it's pretty subjective and shouldn't be taken seriously, as we are not benchmark gurus. I would actually like to do similar benchmarks as what SuSE used to do (sorry, the sources are now 404) which in their tests benefited GCC. Let's remember that SuSE have paid GCC developers and currently they do build their Firefox with GCC, so their whole distro may have some GCC-magic we don't - and it's easier to optimize a binary distro for that.

But you're right. The user experience is currently pretty awful. I'd hate to lose the (possibility of) increased performance when switching over, but I'm fine trying it out on next version bump. Let me know how you feel after reading my rambling on the subject :P
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-20 07:57:01 UTC
(In reply to Joonas Niilola from comment #1)
> It's default because it's "upstream-recommended" - ie upstream builds their
> releases with clang. And that's what devmanual recommends to do, too.

I think the same argument would also then apply to -O3 though (which is perhaps not a terrible idea for FF, actually, as Clang -O3 doesn't even change much, so it'd be more like GCC -O3). But we don't have a good consistent policy on what to do when upstream build with -O3 (or LTO).

> As you linked, there was a long time period when gcc didn't work at all
> (IIRC it was even longer than in the linked bug, because upstream didn't
> test with gcc at all). Even though upstream _does_ enable gcc builds with
> their CI now, I've anxiously been waiting for another huge gcc break to
> happen. We still have some issues with gcc, like +jumbo-build randomly
> crashing. 

It doesn't help that it's kind of a circular thing. I noticed some issues when I started -clang recently on Firefox with GCC trunk and it led to a GCC bug being fixed (quickly!) as well as a performance improvement in FF. But less testing means less improvements means a worse state.

> 
> Also, mine and immolo's own weak empiric speedometer benchmarks tests shows
> that +clang does give a minor but constant performance increase compared to
> -clang. Although I do acknowledge it's pretty subjective and shouldn't be
> taken seriously, as we are not benchmark gurus. I would actually like to do
> similar benchmarks as what SuSE used to do (sorry, the sources are now 404)
> which in their tests benefited GCC.

https://build.opensuse.org/package/show/mozilla:Factory/MozillaFirefox

I see the following notable patches:
* https://build.opensuse.org/projects/mozilla:Factory/packages/MozillaFirefox/files/mozilla-pgo.patch?expand=1 (IIRC we have one for PGO but it might be an older version of this, not compared)
* https://build.opensuse.org/projects/mozilla:Factory/packages/MozillaFirefox/files/mozilla-silence-no-return-type.patch?expand=1 (this will shut up -Wreturn-type with GCC but may also provide better results, especially given some are in Skia)

For PGO, they make sure to run xvfb-run explicitly (no idea if that's needed these days):
```
%if 0%{?do_profiling}
xvfb-run --server-args="-screen 0 1920x1080x24" \
%endif
./mach build -v
```

Fedora has https://src.fedoraproject.org/rpms/firefox/blob/rawhide/f/firefox-gcc-always-inline.patch but I think that may be a compile-time difference they're referring to in terms of speed (always_inline ofc might make a runtime perf. difference but the comment kind of reads like it's talking about compile-time).

> Let's remember that SuSE have paid GCC
> developers and currently they do build their Firefox with GCC, so their
> whole distro may have some GCC-magic we don't - and it's easier to optimize
> a binary distro for that.

That's a fair point, but they don't AFAIK. I know who the suse developers on GCC are and they are very much upstream-first.

> 
> But you're right. The user experience is currently pretty awful. I'd hate to
> lose the (possibility of) increased performance when switching over, but I'm
> fine trying it out on next version bump. Let me know how you feel after
> reading my rambling on the subject :P

IMO the pain of bug 934385 means that the balance tips s.t. those who seriously want Clang should work on it ;)

I suggest we try it out and see how it goes. The next version bump should contain that bug I reported wrt __builtin_shuffle as well which improved some GL code.
Comment 3 Joonas Niilola gentoo-dev 2024-10-20 08:02:41 UTC
https://documentation.suse.com/sbp/devel-tools/html/SBP-GCC-10/index.html#sec-gcc10-firefox here's the document I was referring to. Although it's a bit old, at least the CPU used is somewhat recent (AMD Ryzen 7 3700X).

Interesting part:
> Note that, since a big portion of Firefox is written in Rust and the whole 
> program analysis is limited to the parts written in C++, the LTO benefits 
> are smaller than the typical case 

> Work on the Rust GCC front-end has started only recently but we hope that we 
> will overcome this limitation.

does this mean lto on rust parts doesn't "work" at all with gcc, while clang backed by lld and llvm can make rust-lto work? 

Other quotes I want to raise from the doc: 
> To evaluate how GCC is doing compared to other compilers, we regularly 
> compare the Firefox binaries produced by GCC to those emitted by LLVM/Clang, 
> which is currently the preferred compiler by the team at Mozilla. 

>  In the tp50 benchmark and LTO plus PGO configurations, the aggressive 
> optimizing for size means that GCC is slower, albeit measurably. Because 
> the singletons test is part of the train run, GCC does not have this issue 
> when running this benchmark (see section 8.4 for more details on the train 
> run). In the responsiveness test, GCC wins with profile feedback but lags a 
> little without it. Because Mozilla Firefox is a large application that is 
> intended to be built with PGO, we consider the results with profile 
> feedback the most important. Running speedometer on the AMD machine leads to 
> similar results. 

I'll reply to your points in a new message to make this easier to read.
Comment 4 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-20 08:12:18 UTC
(In reply to Joonas Niilola from comment #3)
> https://documentation.suse.com/sbp/devel-tools/html/SBP-GCC-10/index.
> html#sec-gcc10-firefox here's the document I was referring to. Although it's
> a bit old, at least the CPU used is somewhat recent (AMD Ryzen 7 3700X).
> 

Ah, yeah, I found the newer version of that doc for GCC 12 and it indeed doesn't have the FF section apart from briefly.

For GCC 11, it's https://documentation.suse.com/sbp/devel-tools/html/SBP-GCC-11/index.html#sec-gcc11-firefox.

> Interesting part:
> > Note that, since a big portion of Firefox is written in Rust and the whole 
> > program analysis is limited to the parts written in C++, the LTO benefits 
> > are smaller than the typical case 
> 
> > Work on the Rust GCC front-end has started only recently but we hope that we 
> > will overcome this limitation.
> 
> does this mean lto on rust parts doesn't "work" at all with gcc, while clang
> backed by lld and llvm can make rust-lto work? 
> 

This is right and a shame. We won't have that until gccrs is ready. I don't know if Honza has some plan to somehow make it work without gccrs (I don't see how it's possible but I wonder what was meant by that last part...).
Comment 5 Joonas Niilola gentoo-dev 2024-10-20 08:12:44 UTC
(In reply to Sam James from comment #2)
> 
> I think the same argument would also then apply to -O3 though (which is
> perhaps not a terrible idea for FF, actually, as Clang -O3 doesn't even
> change much, so it'd be more like GCC -O3). But we don't have a good
> consistent policy on what to do when upstream build with -O3 (or LTO).

The linked SuSE benchmark above talks about this, actually. -O3 makes a huge difference in Firefox only with lto. 


> 
> I see the following notable patches:
> *
> https://build.opensuse.org/projects/mozilla:Factory/packages/MozillaFirefox/
> files/mozilla-pgo.patch?expand=1 (IIRC we have one for PGO but it might be
> an older version of this, not compared)

We have essentially the same patch in our tarballs. I usually take it from Fedora's code repos because Fedora is faster to bump Firefox than openSUSE is. But the patch is pretty much identical.


> *
> https://build.opensuse.org/projects/mozilla:Factory/packages/MozillaFirefox/
> files/mozilla-silence-no-return-type.patch?expand=1 (this will shut up
> -Wreturn-type with GCC but may also provide better results, especially given
> some are in Skia)
> 
> For PGO, they make sure to run xvfb-run explicitly (no idea if that's needed
> these days):
> ```
> %if 0%{?do_profiling}
> xvfb-run --server-args="-screen 0 1920x1080x24" \
> %endif
> ./mach build -v
> ```

For PGO and with +X we also run it through virtx, which utilizes xvfb-run. With -X +wayland, I don't actually know how it works. A user named ambasta provided the code snippet that works with -X +wayland.


> 
> Fedora has
> https://src.fedoraproject.org/rpms/firefox/blob/rawhide/f/firefox-gcc-always-
> inline.patch but I think that may be a compile-time difference they're
> referring to in terms of speed (always_inline ofc might make a runtime perf.
> difference but the comment kind of reads like it's talking about
> compile-time).

We have a modified version of https://bugzilla.mozilla.org/show_bug.cgi?id=1874059#c3 this patch which turns on "-fvisibility=hidden", "-fvisibility-inlines-hidden" for clang:

$ cat 0022-bmo-1874059-fix-libcxx-18.patch 
--- a/build/moz.configure/toolchain.configure   2024-03-16 22:53:15.409390707 +0000
+++ b/build/moz.configure/toolchain.configure   2024-03-16 22:57:02.661805132 +0000
@@ -2183,10 +2183,10 @@
 set_define("_LIBCPP_HIDE_FROM_ABI", libcxx_override_visibility.hide_from_abi)
 
 
-@depends(target, build_environment)
-def visibility_flags(target, env):
+@depends(target, build_environment, c_compiler)
+def visibility_flags(target, env, c_compiler):
     if target.os != "WINNT":
-        if target.kernel == "Darwin":
+        if target.kernel == "Darwin" or (c_compiler.type == "clang" and c_compiler.version >= "17.0.0"):
             return ("-fvisibility=hidden", "-fvisibility-inlines-hidden")
         return (
             "-I%s/system_wrappers" % os.path.join(env.dist),



> 
> I suggest we try it out and see how it goes. The next version bump should
> contain that bug I reported wrt __builtin_shuffle as well which improved
> some GL code.

You mean these patches which are already included? :)
 0028-bmo-1917964-gcc-15-swgl-fix.patch
 0029-bmo-1917964-gcc-15-swgl-fix-2.patch
Comment 6 Larry the Git Cow gentoo-dev 2024-10-29 11:29:14 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=cef95379ab8792e466d965c27d26a4f095a8feda

commit cef95379ab8792e466d965c27d26a4f095a8feda
Author:     Joonas Niilola <juippis@gentoo.org>
AuthorDate: 2024-10-29 10:02:24 +0000
Commit:     Joonas Niilola <juippis@gentoo.org>
CommitDate: 2024-10-29 11:27:37 +0000

    www-client/firefox: add 132.0
    
     - drop 'append-ldflags "-Wl,--compress-debug-sections=zlib"'
     - drop apulse-related messages - apulse[sdk] should be a 1:1 replacement for
       pulseaudio,
     - drop bunch of upstreamed patches,
     - include a patch in an attempt to enhance results when using gcc to build,
     - toggle the default 'clang' use flag off.
    
    Closes: https://bugs.gentoo.org/939445
    Closes: https://bugs.gentoo.org/941878
    Signed-off-by: Joonas Niilola <juippis@gentoo.org>

 www-client/firefox/Manifest             |  102 +++
 www-client/firefox/firefox-132.0.ebuild | 1336 +++++++++++++++++++++++++++++++
 2 files changed, 1438 insertions(+)
Comment 7 Joonas Niilola gentoo-dev 2024-10-29 11:40:52 UTC
ESR will follow.

https://build.opensuse.org/projects/mozilla:Factory/packages/MozillaFirefox/files/mozilla-silence-no-return-type.patch?expand=1 needed a rebase and if that becomes a monthly affair, I'm not interested. People can utilize it via /etc/portage/patches. The inlining patch from Fedora is included.
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-29 15:33:16 UTC
(In reply to Joonas Niilola from comment #7)
> ESR will follow.
> 

Thanks!

> https://build.opensuse.org/projects/mozilla:Factory/packages/MozillaFirefox/
> files/mozilla-silence-no-return-type.patch?expand=1 needed a rebase and if
> that becomes a monthly affair, I'm not interested. People can utilize it via
> /etc/portage/patches. The inlining patch from Fedora is included.

If it does become an issue:
1) that's totally reasonable & np;
2) I'll try take a look to see what happened (if anything) with it being upstreamed etc
Comment 9 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-10-29 15:35:43 UTC
I have a suspicion that the root of it might be that e.g. SK_ABORT doesn't get the right attributes.
Comment 10 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-26 14:22:26 UTC
For completeness:

commit c4bc0b2fa035a160a15a017e38f21b8db9f2b1e3
Author: Joonas Niilola <juippis@gentoo.org>
Date:   Tue Nov 26 16:03:45 2024 +0200

    www-client/firefox: add 133.0

     - add support for (64-bit) riscv-musl,
     - require autoconf-2.71,
     - restore +clang use flag since the issues that made us disable it are mostly
       solved with slotted rust approach.

    Signed-off-by: Joonas Niilola <juippis@gentoo.org>
Comment 11 Joonas Niilola gentoo-dev 2024-11-26 15:25:12 UTC
Yeah, that was a short-lived experiment :) 

As said before, +clang gives by default a better experience since it reduces resources and time needed to compile Firefox, and provides better performance out of the box. I think we all were surprised, especially users, by the increased resource requirements with gcc.

Curiously btw 
https://src.fedoraproject.org/rpms/firefox/c/3b51a028b9adf9231eb80ef6c3f9de935eb5a186?branch=rawhide

we took this off previously. If I recall the conversation we had about these flags, they were supposed to be noop?
Comment 12 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-26 15:31:19 UTC
It served its purpose in avoiding the Rust hell while we waited for Rust slotting (which came around much quicker than I was expecting, which is great). I hope the numbers improve and we can look at it again in a while.

> If I recall the conversation we had about these flags, they were supposed to be noop?

No, they shouldn't be a noop, but they'll make linking slower. But I think only -Wl,--no-keep-memory is supported by LLD (and no idea how effective it is), it doesn't support -Wl,--reduce-memory-overheads.
Comment 13 Joonas Niilola gentoo-dev 2024-11-26 15:40:22 UTC
We have this currently in 115:

if use clang ; then
	# Nothing to do
	:;
elif use lto ; then
	append-ldflags -Wl,--no-keep-memory
else
	append-ldflags -Wl,--no-keep-memory -Wl,--reduce-memory-overheads
fi

but it's only for arm.
Comment 14 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-11-26 15:48:41 UTC
This is something we keep doing in a few ebuilds ad-hoc and I keep thinking about it. It doesn't cater for people who have low-memory amd64 systems and I don't want to penalise everyone building on amd64.

Maybe we could do two things:
1) Add a check-reqs function / add a toolchain-funcs (whatever) function which adds these flags if memory is <N
2) Update the handbook and wiki to recommend these flags if users have lower amounts of RAM

Or what do you think? They do help quite a bit.
Comment 15 Joonas Niilola gentoo-dev 2024-11-26 17:40:25 UTC
(In reply to Sam James from comment #14)
> This is something we keep doing in a few ebuilds ad-hoc and I keep thinking
> about it. It doesn't cater for people who have low-memory amd64 systems and
> I don't want to penalise everyone building on amd64.
> 
> Maybe we could do two things:
> 1) Add a check-reqs function / add a toolchain-funcs (whatever) function
> which adds these flags if memory is <N
> 2) Update the handbook and wiki to recommend these flags if users have lower
> amounts of RAM
> 
> Or what do you think? They do help quite a bit.

I've been meaning to add a check-req function for memory, but there are just so many conditions (and I'd have to update them frequently with new versions) it's not easy. 

Like, makeopts, -+clang, +-lto, +-pgo...

so yeah, not sure where to draw the red line.

Handbook/wiki addition could be made, but I doubt people are going to "find" them if the build crashes. I mean, people don't often even realize they've oomed until someone looks at the logs and tells them that.

So, yeah, not sure what is the best choice here, but defaulting to +clang should be a good start.