Summary: | www-client/firefox-94.0.2: crashes with glibc 2.34 | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Andreas Fink <finkandreas> |
Component: | Current packages | Assignee: | Gentoo Toolchain Maintainers <toolchain> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | ajak, da5id2001, de.techno, gentoo, herrtimson, jlp.bugs, jstein, mail, mozilla, sam |
Priority: | Normal | Keywords: | PATCH |
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://sourceware.org/bugzilla/show_bug.cgi?id=28752 | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=803950, https://bugzilla.redhat.com/show_bug.cgi?id=2084588 | ||
Whiteboard: | Workaround: use dev-libs/glib[-fam] | ||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 803482 | ||
Attachments: | small reproducer, nss-database-yield-lock.patch | ||
Description
Andreas Fink
2021-12-03 12:21:22 UTC
Created attachment 757256 [details]
small reproducer
Your reproducer doesn't crash on <glibc-2.34 but crashes as expected on systems using glibc-2.34 (I also verified on Fedora, for example). I am assigning this bug to toolchain for now. They should tell us whether this is a problem in glibc or working as designed.

Because you mentioned you experienced this crash when using Firefox: could you please describe how to reproduce it with Firefox? I haven't seen this crash in Firefox yet, and I am also failing to find crash reports for Fedora at https://crash-stats.mozilla.org/

I have described two ways to consistently crash it here: https://bugs.gentoo.org/803950#c17

I added a backtrace in this comment: https://bugs.gentoo.org/803950#c14

The backtrace involves the functions `g_app_info_get_default_for_type` and `FAMOpen`. I do not have any insight into why the two crashing actions (dragging an image / opening the context menu of uBlock Origin) at some point enter the function `g_app_info_get_default_for_type`, which ultimately leads to calling `getpwuid(getuid())` (this happens in /usr/lib64/libfam.so.0 as a result of a call to `FAMOpen`). Maybe I have a non-standard setup where libfam is involved but not on Fedora, no idea. But I'm not the only one who sees this exact crash.

Also there's a discussion in the forum about this crash: https://forums.gentoo.org/viewtopic-p-8650991.html?sid=30f593c8b7d401d9b8a22662ebed0cb3

Forgot to mention that I also sent it to the libc-help mailing list: https://sourceware.org/pipermail/libc-help/2021-December/006061.html

[Note that there was some discussion first in the other bug and an experimental patch was posted there too: https://bugs.gentoo.org/803950#c15].

Compiling dev-libs/glib with USE=-fam solves the issue too. Maybe on Fedora glib is not compiled with FAM support, which is why there is no bug report showing up on Fedora's bug tracker.
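The call path described above ends in `getpwuid(getuid())`. For readers who want to see that call in isolation, here is a minimal sketch in C. It is not the attached reproducer, and the helper name `lookup_current_user` and its return codes are made up for illustration. On its own it should not crash; going by the patch title quoted later in this thread, the segfault additionally requires a `stat` call made inside glibc to fail, which is the kind of failure a sandboxed process could plausibly hit.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the bug report): look up the passwd entry
 * of the calling user, the same lookup libfam is reported to perform.
 * Returns 0 on success, 1 if no entry was found, 2 on a lookup error. */
int lookup_current_user(char *name_out, size_t name_len)
{
    assert(name_out != NULL && name_len > 0);

    errno = 0;
    struct passwd *pw = getpwuid(getuid());
    if (pw == NULL) {
        /* POSIX convention: NULL with errno left at 0 means "no entry";
         * any other errno value indicates an error during the lookup. */
        return errno == 0 ? 1 : 2;
    }

    /* pw points into static storage, so copy out what we need. */
    snprintf(name_out, name_len, "%s", pw->pw_name);
    return 0;
}
```

A caller would pass a buffer, for example `char name[256]; lookup_current_user(name, sizeof name);`, and check the return code before using `name`.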
So far it seems that you need to compile dev-libs/glib with +fam, use glibc-2.34, and NOT use the patch that I posted in the other bug report; then you will probably experience the crash (start dragging an image, or open the context menu of the uBlock Origin addon).

I experienced the described UI behaviour wrt uBlock Origin (and similar for Multi-Account Containers) on updating to glibc-2.34 and firefox-bin-96. I can confirm rebuilding dev-libs/glib USE="-fam" prevents it. Andreas, is this the best workaround for the time being, or would it be better instead to apply the glibc patch from the other bug?

I also use "dev-libs/glib -fam" as workaround. I'm not going to enable FAM support, even when the bug is fixed in glibc. After finding out about it and reading about FAM I got the impression that it is not really useful. Also there is the glibc patch in the other bug report. Sam reported the bug at the libc bugzilla (see URL above) with a different (and more appropriate) patch.

*** Bug 832504 has been marked as a duplicate of this bug. ***

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/toolchain/glibc-patches.git/commit/?id=6415ce699bf1dafc403be7464df662b2879687e8

commit 6415ce699bf1dafc403be7464df662b2879687e8
Author: Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2022-02-12 18:44:46 +0000
Commit: Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2022-02-12 18:44:46 +0000

Add patch for bug 828070

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28752
Bug: https://bugs.gentoo.org/828070
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

...ault-in-getpwuid-when-stat-fails-BZ-28752.patch | 39 ++++++++++++++++++++++
1 file changed, 39 insertions(+)

Any plans for new sys-libs/glibc release with updated glibc patchset containing the above fix?

(In reply to Maciej S. Szmigiero from comment #11)
> Any plans for new sys-libs/glibc release with updated glibc patchset
> containing the above fix?

We tend to batch them up with stable backports once they're posted.

> We tend to batch them up with stable backports once they're posted.

So we are waiting for upstream to apply the patch, then backport it to glibc 2.34, correct?

(In reply to Maciej S. Szmigiero from comment #13)
> > We tend to batch them up with stable backports once they're posted.
>
> So we are waiting for upstream to apply the patch, then backport it to glibc
> 2.34, correct?

No, we don't need to wait for this one in particular. We just usually don't do a new patchset release until a few fixes are queued up. I'll check what the situation is when I'm not on mobile.

(In reply to Sam James from comment #14)
> (In reply to Maciej S. Szmigiero from comment #13)
> > > We tend to batch them up with stable backports once they're posted.
> >
> > So we are waiting for upstream to apply the patch, then backport it to glibc
> > 2.34, correct?
>
> No, we don't need to wait for this one in particular. We just usually don't
> do a new patchset release until a few fixes are queued up. I'll check what
> the situation is when I'm not on mobile.

https://gitweb.gentoo.org/proj/toolchain/glibc-patches.git/commit/?id=6415ce699bf1dafc403be7464df662b2879687e8

This is in the (currently unkeyworded) =sys-libs/glibc-2.34-r9. Please test and give feedback. Thanks!

(In reply to Sam James from comment #15)
> This is in the (currently unkeyworded) =sys-libs/glibc-2.34-r9. Please test
> and give feedback.

Thanks for quickly providing an updated glibc Sam. With this version the Firefox web page process indeed no longer crashes. Instead, it hangs, with a spinner rotating endlessly.

To be sure that it isn't something specific to my setup it would be great if somebody else had confirmed this finding. Andreas?

My test picture: https://en.wikichip.org/wiki/File:intel_nehalem_lynfield_die_shot.jpg

My test system package versions:
* www-client/firefox
  Latest version available: 97.0.1
  Latest version installed: 97.0.1
* sys-devel/clang
  Latest version available: 13.0.1
  Latest version installed: 13.0.1
* sys-devel/gcc
  Latest version available: 11.2.1_p20220115
  Latest version installed: 11.2.1_p20211127

(In reply to Maciej S. Szmigiero from comment #16)
>
> To be sure that it isn't something specific to my setup it would be great if
> somebody else had confirmed this finding.
> Andreas?
>

I am seeing this behavior too, as before with #803950 starting MOZ_DISABLE_CONTENT_SANDBOX=1 did get it going.

* www-client/firefox
  Latest version available: 97.0.1
  Latest version installed: 97.0.1
* sys-devel/clang
  Latest version available: 13.0.1
  Latest version installed: 13.0.1
* sys-devel/gcc
  Latest version available: 11.2.1_p20220115
  Latest version installed: 11.2.1_p20220115 (11.2.1 is active)

Same behaviour here (no tabs will load).
firefox-bin-97.0.1
glibc-2.34-r9
Resolved by rebuilding dev-libs/glib with USE="-fam"

I think we need someone hitting this to report it to Firefox upstream (Mozilla) then. I'm pretty confident the glibc patch is right (although it could still be an issue on that side) though.

(In reply to Sam James from comment #19)
> I think we need someone hitting this to report it to Firefox upstream
> (Mozilla) then. I'm pretty confident the glibc patch is right (although it
> could still be an issue on that side) though.

... although that said, I can't explain why -fam still fixes it. So maybe I'm missing something.

(In reply to Sam James from comment #20)
> (In reply to Sam James from comment #19)
> > I think we need someone hitting this to report it to Firefox upstream
> > (Mozilla) then. I'm pretty confident the glibc patch is right (although it
> > could still be an issue on that side) though.
>
> ... although that said, I can't explain why -fam still fixes it. So maybe
> I'm missing something.

waaait. We're taking a lock in nss_database_check_reload_and_get() and not yielding it when we return.

Can someone try the attached patch please?

Created attachment 766418 [details, diff]
nss-database-yield-lock.patch
(Note this patch is *on top* of the previous changes, so just add it in /etc/portage/patches and emerge -v1 glibc).
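
The diagnosis above (a lock taken in `nss_database_check_reload_and_get()` and not released on return) is a common bug pattern, and it would explain why a crash turned into a hang: the next caller waits for a lock nobody will release. The following is a toy sketch in C of that pattern only. It is not the glibc code or the attached patch; the function names, the `stat_ok` flag and the `atomic_flag` used as a stand-in lock are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum { CHECK_OK = 0, CHECK_WOULD_HANG = -1 };

/* Toy lock built on atomic_flag: try_lock() returns true if acquired. */
static bool try_lock(atomic_flag *lock) { return !atomic_flag_test_and_set(lock); }
static void unlock(atomic_flag *lock) { atomic_flag_clear(lock); }

/* Hypothetical buggy variant: the early return skips the unlock. */
int check_reload_buggy(atomic_flag *lock, bool stat_ok)
{
    assert(lock != NULL);
    if (!try_lock(lock))
        return CHECK_WOULD_HANG;  /* a blocking lock would wait here forever */
    if (!stat_ok)
        return CHECK_OK;          /* BUG: returns with the lock still held */
    unlock(lock);
    return CHECK_OK;
}

/* Hypothetical fixed variant: every return path releases the lock. */
int check_reload_fixed(atomic_flag *lock, bool stat_ok)
{
    assert(lock != NULL);
    if (!try_lock(lock))
        return CHECK_WOULD_HANG;
    if (!stat_ok) {
        unlock(lock);             /* drop the lock before returning early */
        return CHECK_OK;
    }
    unlock(lock);
    return CHECK_OK;
}
```

With the buggy variant, the first call that takes the early-return path succeeds, and every later call on the same lock reports `CHECK_WOULD_HANG`; with the fixed variant, repeated calls keep succeeding.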

(In reply to Sam James from comment #21)
> (In reply to Sam James from comment #20)
> > (In reply to Sam James from comment #19)
> > > I think we need someone hitting this to report it to Firefox upstream
> > > (Mozilla) then. I'm pretty confident the glibc patch is right (although it
> > > could still be an issue on that side) though.
> >
> > ... although that said, I can't explain why -fam still fixes it. So maybe
> > I'm missing something.
>
> waaait. We're taking a lock in nss_database_check_reload_and_get() and not
> yielding it when we return.
>
> Can someone try the attached patch please?

I can confirm that the attached patch fixes the issue for me indeed, thanks Sam! (but the patch needs a small whitespace change to apply to sys-libs/glibc-2.34-r9).

Sorry I haven't found the time to reproduce this yet, but I'm glad to see the progress that's happening in here.

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/toolchain/glibc-patches.git/commit/?id=6002612f230d2b8d88fefba6c6477a20e77efc23

commit 6002612f230d2b8d88fefba6c6477a20e77efc23
Author: Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2022-03-07 01:03:46 +0000
Commit: Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2022-03-07 01:03:46 +0000

Additional fixup for the glibc/firefox/seccomp interaction

Bug: https://bugs.gentoo.org/828070
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

...0302-Drop-glibc-lock-when-returning-early.patch | 36 ++++++++++++++++++++++
1 file changed, 36 insertions(+)

(In reply to Sam James from comment #22)
> Created attachment 766418 [details, diff]
> nss-database-yield-lock.patch
>
> (Note this patch is *on top* of the previous changes, so just add it in
> /etc/portage/patches and emerge -v1 glibc).

Patch is in 2.34-r10

Don't forget to update the proposed patch upstream at ${URL}.

(In reply to Maciej S. Szmigiero from comment #27)
> Don't forget to update the proposed patch upstream at ${URL}.

Done & submitted: https://sourceware.org/bugzilla/show_bug.cgi?id=28752#c2.

Thanks to all who helped out in this bug, we've got there!

Fix landed upstream:
- https://sourceware.org/git/?p=glibc.git;a=commit;h=3fdf0a205b622e40fa7e3c4ed1e4ed4d5c6c5380
- https://sourceware.org/git/?p=glibc.git;a=commit;h=ace9e3edbca62d978b1e8f392d8a5d78500272d9

It'll be in the next patchset for glibc-2.35 in Gentoo (possibly 2.34, but we're looking to move to 2.35 completely soon).

Fixed upstream and in stable now via 2.35.