Summary: | www-client/firefox-99.0: Needs ~ 64 GB RAM to build with specific use flags | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Simeon Simeonov <sgs> |
Component: | Current packages | Assignee: | Mozilla Gentoo Team <mozilla> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | ajak, amanzharov, dharding, gentoobugs, joe, jstein, kuba.iluvatar, Letto2, lo48576, mail, perfect007gentleman, rodolfo.boer, vovan |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | build.log, emerge --info |
Description
Simeon Simeonov
2022-04-07 16:41:35 UTC
Created attachment 769286 [details]
build.log
I have had this, too. Despite having plenty of RAM, lld consumed more and more until the OOM killer kicked in. If nobody has a better idea, I will perhaps try again without lto. Note that I do not have pgo enabled.

USE="-clang" seems to fix the build for me

Getting hit by this too, having 16GB RAM with nothing else running on a system that was previously able to build Firefox 98 even with /var/tmp mounted as zram including pgo and lto support. I re-tried with PORTAGE_TMPDIR set to a HDD to give it the maximum amount of ram. The linker seems to loop until it fully runs out of memory and the OOM kicks in, regardless of the available memory.

With 64GB of RAM, I get this result:
9:49.78 Your build was successful!
I was watching "top", and memory used got up to 15-16GB during certain parts of the build. You may be able to squeak-by using some memory saving tricks like: MAKEOPTS="-j1" and/or booting the machine with no GUI ...

Same here. Using clang, too. Can it be related with:
--with-libclang-path=/usr/lib/llvm/13/lib64
[...]
checking the target C compiler version... 14.0.0
?
And the error is:
/usr/lib/llvm/14/bin/../../../../lib/clang/14.0.0/include/emmintrin.h:2378:19: error: use of undeclared identifier '__builtin_elementwise_max', err: true
No lto, 8GB RAM, no oom.

Yes, I would also guess the problem has something to do with mixing llvm 13 and 14. And then there is the issue of having built rust with system-llvm, so being tied to lld-13. So now I will go on with USE=-clang.

lld seems to hang when trying to link libmozavcodec, so if any use flag disables avcodec that would be a workaround for this issue. I don't think any use flag does this (although openh264, system-av1, system-libvpx and system-webp seem somewhat related...)
Maybe I'll try to switch to building without /var/tmp mounted as tmpfs or to USE="-clang" later this evening.

I am unable to build www-client/firefox-99.0[clang,lto] too.
Specifically, /usr/bin/ld.lld OOMs when linking libxul.so.
Build log snippets:
> --with-libclang-path=/usr/lib/llvm/13/lib64 Gentoo default
> checking for the target C++ compiler... /usr/lib/llvm/13/bin/x86_64-pc-linux-gnu-clang++
Versions:
$ /usr/bin/ld.lld --version
LLD 13.0.1 (compatible with GNU linkers)
[IP-] [ ] sys-devel/lld-13.0.1:0
[IP-] [ ] sys-devel/clang-11.1.0:11/11.1
[IP-] [ ] sys-devel/clang-13.0.1:13
[IP-] [ ] sys-devel/clang-14.0.0-r1:14
[IP-] [ ] sys-devel/llvm-11.1.0:11
[IP-] [ ] sys-devel/llvm-13.0.1:13
[IP-] [ ] sys-devel/llvm-14.0.0:14
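To rule a toolchain mismatch in or out, the major versions in the banners quoted above can be compared mechanically. The following is a minimal sketch, not anything taken from the ebuild: the function names `tool_major` and `same_major` are hypothetical, and the only inputs are version strings of the kind shown in this report.

```shell
# Extract the major version from a banner line, e.g.
# "LLD 13.0.1 (compatible with GNU linkers)" -> 13.
tool_major() {
    # Take the first dotted version number and keep the part before the first dot.
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1 | cut -d. -f1
}

# Print "match" if both banners carry the same major version, else "mismatch".
same_major() {
    if [ "$(tool_major "$1")" = "$(tool_major "$2")" ]; then
        echo match
    else
        echo mismatch
    fi
}

# Example with two strings quoted in this report (lld 13 next to a clang 14 configure check).
same_major "LLD 13.0.1 (compatible with GNU linkers)" "checking the target C compiler version... 14.0.0"
```

A "mismatch" result only shows that two slots are being mixed; it does not by itself explain the memory usage.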
> Specifically, /usr/bin/ld.lld OOMs when linking libxul.so.
Update: with enough swap it does eventually build successfully in this USE="clang lto" config.
The peak memory usage of the ld.lld process while linking libxul.so is about 40 GiB, however.
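Peak figures like the one above can be read from /proc on Linux instead of watching top. This is a rough sketch with hypothetical helper names (`peak_rss_kib`, `kib_to_gib`). It reads the `VmHWM` field of `/proc/PID/status`, which is the peak resident size only, so memory that was pushed out to swap is not counted; `VmPeak` in the same file gives the peak virtual size.

```shell
# Print the peak resident set size (VmHWM) of a PID in KiB, read from /proc.
peak_rss_kib() {
    awk '/^VmHWM:/ { print $2 }' "/proc/$1/status"
}

# Convert KiB to whole GiB, rounding down.
kib_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Example: report the peak of the current shell (Linux only).
if [ -r "/proc/$$/status" ]; then
    echo "peak RSS of this shell: $(peak_rss_kib $$) KiB"
fi
```

For a running link step, the PID of the ld.lld process would be passed in place of `$$`, and the value has to be sampled before the process exits.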
I really think this is because of mixing lld-13 on a clang-14 system and the title should be updated to match that. Digging around mozilla's bugzilla and it doesn't appear like we should be blocking 14 in both cases (see: https://bugzilla.mozilla.org/show_bug.cgi?id=1758780 ). Doing a test compile right now and will open a new bug to lift the max slot if it succeeds.

Nope, still uses obscene amounts of ram with lld-14, can't build it on a 32GB ram system.

I have this problem on a system with only llvm and clang 13 so it doesn't seem a problem of conflicting llvm versions.

Managed to successfully build FF99 with the following USE flags with lld 14:
clang dbus gmp-autoupdate hwaccel lto openh264 pgo pulseaudio system-av1 system-harfbuzz system-icu system-jpeg system-libevent system-libvpx system-png system-webp wayland -debug -eme-free -geckodriver -hardened -jack -libproxy -screencast -selinux -sndio -wifi
emerge --info here: https://pastebin.com/JWPK0hzs
Basically what I did: I changed the ebuild manually to accept lld 14 with simple digit swaps. Also guess that rust-1.60 binary played some role too.
I have 32G RAM and set tmp ZRAM to '36G'. Peak mem consumption was about 29-30GB (KDE Desktop + Google Chrome with few tabs to watch YT) and peak amount of data in tmp was about 5G compressed (which is 12G+ uncompressed). Guess that saved me from running out of RAM.
Building process took about 49-50 mins, when usually it was 33-34 mins, and lld phase (or what it is) was ~15 minutes, while lld ate 66% of available RAM. Did it twice.
sys-devel/clang-13.0.1
dev-lang/rust-1.59.0
www-client/firefox-99.0::gentoo was built with the following:
USE="clang dbus geckodriver gmp-autoupdate hardened hwaccel lto openh264 pulseaudio screencast system-av1 system-harfbuzz system-icu system-jpeg system-libevent system-libvpx system-png system-webp wayland wifi -debug -eme-free -jack -libproxy -pgo (-selinux) -sndio" ABI_X86="(64)"
16g ram + 18g swap is not enough to link libxul.so
Adding 10g swap file solves link issue. Peak ld.lld memory consumption near to 30G
PID %MEM VIRT SWAP RES CODE DATA SHR nMaj nDRT %CPU COMMAND
*** 78,6 28,8g 15,2g 12,1g 0,0m 27,9g 136,8m 3,3m 0 197,4 /usr/bin/ld.lld --eh-frame-hdr -m elf_x86_64 -shared -o libxul.so ...

Just another voice in the chorus: Ran into the same system. Was able to get it compiled by adding >20 GB swapfile.

Maybe it's time to add
append-ldflags "-Wl,--no-keep-memory"
somewhere in the ebuild. This memory requirement is insane...

(In reply to Lars Wendler (Polynomial-C) from comment #17)
> Maybe it's time to add
>
> append-ldflags "-Wl,--no-keep-memory"
>
> somewhere in the ebuild.
Does this flag even exist for lld?

The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fd5635e0c7269edb0261a35ff9e779cfc26288e9
commit fd5635e0c7269edb0261a35ff9e779cfc26288e9
Author: Joonas Niilola <juippis@gentoo.org>
AuthorDate: 2022-04-09 15:25:02 +0000
Commit: Joonas Niilola <juippis@gentoo.org>
CommitDate: 2022-04-09 15:25:02 +0000
www-client/firefox: enable llvm:14 for 99.0
Bug: https://bugs.gentoo.org/836587
Bug: https://bugs.gentoo.org/837122
Signed-off-by: Joonas Niilola <juippis@gentoo.org>
www-client/firefox/firefox-99.0.ebuild | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

So I haven't been able to reproduce this. When I pushed firefox-99, I had rust-1.59 built with llvm:13 stack and nothing from llvm:14 installed.
Today, I've done multiple clang+lto+pgo runs while testing llvm:14, and the container capped at 22 GiB memory usage with MAKEOPTS="-j16". The lowest was 19 GiB. And while that seems higher than ever, and higher compared to ESR, it's not out of the ordinary for linking to require 1 GiB memory per thread. And for me it seems to be pretty close to 1 GiB, maybe a few hundred MB higher.
So I hope this has been caused by some mismatched versions between rust, llvm, clang and lld - which wouldn't be the first time it's caused troubles...

Created attachment 769703 [details]
emerge --info
Since you can't reproduce please include your emerge --info.
I managed to get it compiled disabling tmpfs and having things be clang/lld-14 pure and not mixed. Still, it cuts it close on a 32GB system without swap.
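The "1 GiB memory per thread" rule of thumb mentioned above can be turned into a job count for MAKEOPTS. This is only a sketch under stated assumptions: `suggest_jobs` is a hypothetical helper, and the 2 GiB per job used in the example is an assumed safety margin, not a figure from this report. It limits the parallel compile phase only; the final libxul.so link is a single ld.lld process, so it does not lower the peak of that step.

```shell
# Suggest a parallel job count: min(CPU threads, total RAM in GiB / GiB per job).
# Arguments: cpu_threads mem_total_kib gib_per_job
suggest_jobs() {
    cpus=$1
    by_mem=$(( $2 / 1024 / 1024 / $3 ))
    if [ "$by_mem" -lt "$cpus" ]; then
        jobs=$by_mem
    else
        jobs=$cpus
    fi
    # Never go below one job.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

# Example for the local machine (Linux only), budgeting 2 GiB per job.
if [ -r /proc/meminfo ] && command -v nproc >/dev/null 2>&1; then
    mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    echo "MAKEOPTS=\"-j$(suggest_jobs "$(nproc)" "$mem_kib" 2)\""
fi
```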
(In reply to Joonas Niilola from comment #20)
> So I haven't been able to reproduce this. When I pushed firefox-99, I had
> rust-1.59 built with llvm:13 stack and nothing from llvm:14 installed.
I have Rust-1.60, LLVM-14. Cannot build Firefox-99 with LTO. Have no problems with <=FF-98.0.2.
IMO, that Mozilla's issue.

(In reply to Perfect Gentleman from comment #22)
> (In reply to Joonas Niilola from comment #20)
> > So I haven't been able to reproduce this. When I pushed firefox-99, I had
> > rust-1.59 built with llvm:13 stack and nothing from llvm:14 installed.
>
> I have Rust-1.60, LLVM-14. Cannot build Firefox-99 with LTO. Have no
> problems with <=FF-98.0.2.
> IMO, that Mozilla's issue.
Yeah, Firefox-ESR capped at <7 GB with clang+lto+pgo. Could you just for reference tell the max amount of RAM used with 98.0.2?
99.0 got stuck linking libxul.so for 20 minutes which also raised memory usage to 20 GB, but it passed and the browser runs fine. So there's definitely something going on in there.
FYI I've disabled domstreams, but it didn't seem to affect the huge RAM usage when building Firefox.
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=8bc3be8f915ac1fe436546c0b007c7e293600b9f
I'm gonna get a new build.log and flag a Mozilla ticket about this. Weird that no other distro seems to be suffering from this, we can't be the only ones using clang+lld to build Firefox. So I fear it's something to do with our rust+lld connections.

I think the issue might be introduced by one of the patches from the new patchset of firefox 99.
When there was no official ebuild available I copied firefox-98.0.2.ebuild to a local repository and renamed it to firefox-99.0.ebuild. Then I removed 0033-resolve-fs-symlinks-bmo1753182.patch from the firefox-98-patches-04j-org.tar.xz patchset as it did not apply to firefox 99. Compiling firefox 99 using the reduced patchset from firefox 98 with clang and lto works fine without using a huge amount of memory.
When I try to emerge firefox 99 using the official ebuild and official patchset one lld process takes forever and fails with huge memory consumption:
/usr/bin/ld.lld --eh-frame-hdr -m elf_x86_64 -shared -o libxul.so /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.1/../../../../lib64/crti.o /usr/lib/llvm/14/bin/../../../../lib/clang/14.0.0/lib/linux/clang_rt.crtbegin-x86_64.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.1 -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.1/../../../../lib64 -L/lib/../lib64 -L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.1/../../../../x86_64-pc-linux-gnu/lib -L/lib -L/usr/lib -plugin-opt=mcpu=znver3 -plugin-opt=O2 -plugin-opt=-function-sections -plugin-opt=-data-sections -z defs --gc-sections -h libxul.so /mnt/portage/compilespace/portage/www-client/firefox-99.0/work/firefox_build/toolkit/library/build/libxul_so.list -plugin-opt=-import-instr-limit=10 -plugin-opt=-import-hot-multiplier=30 -lpthread -O2 --as-needed -z relro -z now -z relro -z now --compress-debug-sections=zlib -rpath=/usr/lib64/firefox --enable-new-dtags -z noexecstack -z text -z relro -z nocopyreloc -Bsymbolic-functions -rpath-link /mnt/portage/compilespace/portage/www-client/firefox-99.0/work/firefox_build/dist/bin -rpath-link /usr/lib ../../../js/src/build/libjs_static.a /mnt/portage/compilespace/portage/www-client/firefox-99.0/work/firefox_build/x86_64-unknown-linux-gnu/release/libgkrust.a ../../../security/sandbox/linux/libmozsandbox.so ../../../config/external/lgpllibs/liblgpllibs.so ../../../config/external/sqlite/libmozsqlite3.so ../../../widget/gtk/mozgtk/libmozgtk.so --version-script symverscript -licui18n -licuuc -licudata -laom -ldav1d -lasound -lrt -lm -ldl -lX11 -lXcomposite -lXdamage -lXext -lXfixes -lXrandr -lXrender -lXtst -lpthread -lc -lffi -lplds4 -lplc4 -lnspr4 -lz -lssl3 -lsmime3 -lnss3 -lnssutil3 -lfreetype -lfontconfig -lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -lharfbuzz -latk-1.0 -lcairo-gobject -lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0 -lgraphite2 -lwebpdemux -lwebp -levent -lvpx -lpixman-1 -ldbus-glib-1 -ldbus-1 -lxcb-shm -lX11-xcb -lxcb -lXcursor -lXi -lstdc++ -lm /usr/lib/llvm/14/bin/../../../../lib/clang/14.0.0/lib/linux/libclang_rt.builtins-x86_64.a -l:libunwind.so -lpthread -lc /usr/lib/llvm/14/bin/../../../../lib/clang/14.0.0/lib/linux/libclang_rt.builtins-x86_64.a -l:libunwind.so /usr/lib/llvm/14/bin/../../../../lib/clang/14.0.0/lib/linux/clang_rt.crtend-x86_64.o /usr/lib/gcc/x86_64-pc-linux-gnu/11.2.1/../../../../lib64/crtn.o

Firefox is built with "clang dbus gmp-autoupdate hardened hwaccel lto openh264 pulseaudio system-av1 system-harfbuzz system-icu system-jpeg system-libevent system-libvpx system-webp -debug -eme-free -geckodriver -jack -libproxy -pgo -screencast (-selinux) -sndio -system-png -wayland -wifi"

app-misc/pax-utils: 1.3.3::gentoo
app-shells/bash: 5.1_p16::gentoo
dev-java/java-config: 2.3.1::gentoo
dev-lang/perl: 5.34.1::gentoo
dev-lang/python: 3.9.12::gentoo, 3.10.4::gentoo
dev-lang/rust: 1.60.0::gentoo
dev-util/cmake: 3.23.0::gentoo
dev-util/meson: 0.61.4-r2::gentoo
sys-apps/baselayout: 2.8::gentoo
sys-apps/sandbox: 2.29::gentoo
sys-apps/systemd: 250.4-r1::gentoo
sys-devel/autoconf: 2.13-r1::gentoo, 2.71-r1::gentoo
sys-devel/automake: 1.16.5::gentoo
sys-devel/binutils: 2.38-r1::gentoo
sys-devel/binutils-config: 5.4.1::gentoo
sys-devel/clang: 14.0.0-r1::gentoo
sys-devel/gcc: 11.2.1_p20220115::gentoo
sys-devel/gcc-config: 2.5-r1::gentoo
sys-devel/libtool: 2.4.7::gentoo
sys-devel/lld: 14.0.0::gentoo
sys-devel/llvm: 14.0.0::gentoo
sys-devel/make: 4.3::gentoo
sys-kernel/linux-headers: 5.17::gentoo (virtual/os-headers)
sys-libs/glibc: 2.35-r2::gentoo

(In reply to Florian K. from comment #24)
> I think the issue might be introduced by one of the patches from the new
> patchset of firefox 99.
>
What a great hint. We have 3 patches for gcc+pgo and I had to rebase them for 99.0 because gcc+pgo failed, so I may have missed something there. On it!
So just looking at the patch listings, there is one new patch for Firefox 99, 0031-pgo-use-toolchain-disable-watchdog-fix-on-gcc.patch, which seems to change -flto=thin to -flto in build/moz.configure/lto-pgo.configure. I'm trying a rebuild right now with just those lines from that patch reverted and I will report when finished (previously, I could not successfully link libxul for Firefox 99 even with 32GB RAM and 32GB swap).

Yep, can already say that's extra. Gonna do runs with clang/gcc +lto +pgo and see what happens.

So dropping -flto lines from the 0031-pgo-use-toolchain-disable-watchdog-fix-on-gcc.patch was sufficient for firefox-99 to build for me.

(In reply to Joonas Niilola from comment #23)
> (In reply to Perfect Gentleman from comment #22)
> > (In reply to Joonas Niilola from comment #20)
> > > So I haven't been able to reproduce this. When I pushed firefox-99, I had
> > > rust-1.59 built with llvm:13 stack and nothing from llvm:14 installed.
> >
> > I have Rust-1.60, LLVM-14. Cannot build Firefox-99 with LTO. Have no
> > problems with <=FF-98.0.2.
> > IMO, that Mozilla's issue.
>
> Yeah, Firefox-ESR capped at <7 GB with clang+lto+pgo. Could you just for
> reference tell the max amount of RAM used with 98.0.2?
>
> 99.0 got stuck linking libxul.so for 20 minutes which also raised memory
> usage to 20 GB, but it passed and the browser runs fine. So there's
> definitely something going on in there.
>
> FYI I've disabled domstreams, but it didn't seem to affect the huge RAM
> usage when building Firefox.
> https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=8bc3be8f915ac1fe436546c0b007c7e293600b9f
>
> I'm gonna get a new build.log and flag a Mozilla ticket about this. Weird
> that no other distro seems to be suffering from this, we can't be the only
> ones using clang+lld to build Firefox. So I fear it's something to do with
> our rust+lld connections.
Don't know. I've 16GB of RAM, portage_tmp_dir in tmpfs. Built FF with LTO+PGO. Before 99 it was okay.
The bug has been closed via the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7ed1fb8a42366e60ea020b1dc40983dd7b414243
commit 7ed1fb8a42366e60ea020b1dc40983dd7b414243
Author: Joonas Niilola <juippis@gentoo.org>
AuthorDate: 2022-04-10 12:53:21 +0000
Commit: Joonas Niilola <juippis@gentoo.org>
CommitDate: 2022-04-10 12:54:40 +0000
www-client/firefox: update patch set for 99.0
- accidentally enabled full-lto instead of thinlto with clang, which
requires tons more of RAM. With bfd (gcc) we already use '--no-keep-memory'.
Closes: https://bugs.gentoo.org/837122
Signed-off-by: Joonas Niilola <juippis@gentoo.org>
www-client/firefox/Manifest | 2 +-
www-client/firefox/firefox-99.0.ebuild | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

Thanks! Here's an image to show the difference we had I guess.
https://bug1644409.bmoattachments.org/attachment.cgi?id=9156178

(In reply to Joonas Niilola from comment #31)
> Thanks! Here's an image to show the difference we had I guess.
>
> https://bug1644409.bmoattachments.org/attachment.cgi?id=9156178
Thanks juippis!

(In reply to Larry the Git Cow from comment #30)
> - accidentally enabled full-lto instead of thinlto with clang, which
So I guess at least these people who managed to build the package in its previous config (despite huge memory requirements) got themselves a really well optimized browser.

ld.lld: warning: Linking two modules of different target triples: '/var/tmp/portage/www-client/firefox-99.0.1/work/firefox_build/x86_64-unknown-linux-gnu/release/libgkrust.a(encoding_c-49418a4a95d4a761.encoding_c.509369a0-cgu.0.rcgu.o at 97910880)' is 'x86_64-unknown-linux-gnu' whereas '/var/tmp/portage/www-client/firefox-99.0.1/work/firefox_build/toolkit/library/build/../../../dom/media/eme/Unified_cpp_dom_media_eme0.o' is 'x86_64-pc-linux-gnu'
These warnings during linking occur only with -flto=thin. Maybe it is an ld.lld bug.
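
Since the root cause was a bare -flto replacing -flto=thin, a quick check of which LTO mode a set of compiler flags requests can help when reviewing a build log. The sketch below uses a hypothetical helper name (`lto_mode`) and assumes clang semantics, where a bare `-flto` means full LTO and the last LTO flag on the command line wins.

```shell
# Classify the LTO mode requested by a string of compiler flags.
# Prints "thin", "full", or "none". The last LTO-related flag wins.
lto_mode() {
    mode=none
    for flag in $1; do
        case "$flag" in
            -flto=thin) mode=thin ;;
            -flto|-flto=full) mode=full ;;
            -fno-lto) mode=none ;;
        esac
    done
    echo "$mode"
}

lto_mode "-O2 -flto=thin -fuse-ld=lld"   # prints: thin
lto_mode "-O2 -flto -fuse-ld=lld"        # prints: full
```

In this report, "full" is the mode that drove ld.lld to tens of GiB while linking libxul.so.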