828070 – www-client/firefox-94.0.2: crashes with glibc 2.34

Summary: www-client/firefox-94.0.2: crashes with glibc 2.34

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo Toolchain Maintainers

URL:	https://sourceware.org/bugzilla/show_...
Whiteboard:	Workaround: use dev-libs/glib[-fam]
Keywords:	PATCH

Duplicates (1):	832504
Depends on:
Blocks:	glibc-2.34

Reported:	2021-12-03 12:21 UTC by Andreas Fink
Modified:	2022-08-08 09:16 UTC (History)
CC List:	10 users

See Also:	803950 https://bugzilla.redhat.com/show_bug.cgi?id=2084588
Package list:
Runtime testing required:	---

Attachments
small reproducer (test.c, 1.62 KB, text/x-csrc) 2021-12-03 12:22 UTC, Andreas Fink
nss-database-yield-lock.patch (file_828070.txt, 721 bytes, patch) 2022-03-06 12:08 UTC, Sam James

Description Andreas Fink 2021-12-03 12:21:22 UTC

This is a fairly in-depth analysis of mine, including a small reproducer program that segfaults with glibc-2.34, whereas with glibc-2.33 it only reports a "Permission denied" error.
My reproducer (hopefully) performs the same steps that lead to the crash in firefox-94.0.2 (and probably in many other versions too).

Let me start with an strace of Firefox at the moment it segfaults in libc; these are the last lines of the strace:
getuid()                                = 1000
newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|012545400720, st_size=0, ...}, 0) = 262
--- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_errno=EINVAL, si_call_addr=0x7f5c85cbaaaa, si_syscall=__NR_newfstatat, si_arch=AUDIT_ARCH_X86_64} ---
socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [33, 35]) = 0
sendmsg(28, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\220\0\0\0\0\0\0\0", iov_len=16}, {iov_base="/etc/nsswitch.conf\0", iov_len=19}, {iov_base=NULL, iov_len=0}], msg_iovlen=3, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[35]}], msg_controllen=24, msg_flags=0}, MSG_NOSIGNAL) = 35
close(35)                               = 0
recvmsg(33, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\0\0\0\0", iov_len=4}, {iov_base="\1\374\0\0\0\0\0\0\255p@\0\0\0\0\0\1\0\0\0\0\0\0\0\244\201\0\0\0\0\0\0"..., iov_len=144}], msg_iovlen=2, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 148
close(33)                               = 0
rt_sigreturn({mask=[]})                 = 0
newfstatat(AT_FDCWD, "/", {st_mode=S_IFBLK|014535000562, st_rdev=makedev(0, 0), ...}, 0) = 262
--- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_errno=EINVAL, si_call_addr=0x7f5c85cbaaaa, si_syscall=__NR_newfstatat, si_arch=AUDIT_ARCH_X86_64} ---
socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [33, 35]) = 0
sendmsg(28, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2\0\0\0\0\0\0\0\220\0\0\0\0\0\0\0", iov_len=16}, {iov_base="/\0", iov_len=2}, {iov_base=NULL, iov_len=0}], msg_iovlen=3, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[35]}], msg_controllen=24, msg_flags=0}, MSG_NOSIGNAL) = 18
close(35)                               = 0
recvmsg(33, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\363\377\377\377", iov_len=4}, {iov_base="", iov_len=144}], msg_iovlen=2, msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 4
close(33)                               = 0
writev(2, [{iov_base="Sandbox: ", iov_len=9}, {iov_base="Failed errno -13 op stat flags 0"..., iov_len=40}, {iov_base="\n", iov_len=1}], 3) = 50
rt_sigreturn({mask=[]})                 = -1 EACCES (Permission denied)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
rt_sigaction(SIGSEGV, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f5c85bffb40}, NULL, 8) = 0
rt_sigreturn({mask=[]})                 = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV +++


From studying Firefox's sandbox a little, it seems that all filesystem syscalls are trapped by seccomp, which raises SIGSYS. A signal handler catches the signal, Firefox decides whether access to the file is allowed, and the handler sets the value that the syscall returns after rt_sigreturn.
As you can see, there is this syscall: newfstatat(AT_FDCWD, "/", {st_mode=S_IFBLK|014535000562, st_rdev=makedev(0, 0), ...}, 0) = 262
It is directly followed by SIGSYS; then some messages are exchanged between processes to decide whether access should be granted, and the decision is -13 == EACCES (don't ask me why it shows up as -1 in rt_sigreturn).


My reproducer skips the whole socketpair/sendmsg/recvmsg part and just sets the rt_sigreturn values observed in Firefox, i.e. I end up with this strace:
getuid()                                = 1000
newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=01200367467, st_size=1000, ...}, 0) = 262
--- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x7f040a040aaa, si_syscall=__NR_newfstatat, si_arch=AUDIT_ARCH_X86_64} ---
rt_sigreturn({mask=[]})                 = 0
newfstatat(AT_FDCWD, "/", {st_mode=000, st_size=0, ...}, 0) = 262
--- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x7f040a040aaa, si_syscall=__NR_newfstatat, si_arch=AUDIT_ARCH_X86_64} ---
rt_sigreturn({mask=[]})                 = -1 EACCES (Permission denied)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=NULL} ---
+++ killed by SIGSEGV +++
Segmentation fault

This strace is similar to the one above apart from the messaging, which seems to be irrelevant to the crash anyway.
Running the reproducer with glibc-2.33, I do not get a segmentation fault (only the "Permission denied").

In my humble opinion there are two problems:
1) glibc should not segfault (see my patch in the referenced bug, which would at least avoid the crash).
2) The operation denied by the Firefox sandbox is an fstatat on `/`. The man page is hard to interpret here, but in my opinion this should always be an allowed call: fstatat needs no permissions on the file itself, only execute (search) permission on the directories leading to it, and it is not obvious what that means for the root directory.

What I am trying to say: even if glibc did not segfault, I think Firefox's sandbox rules behave incorrectly.
I also want to note that I do not know whether setting the registers in the sigsys_handler is allowed and leads to defined behaviour, but it copies what Firefox does. So if it is undefined behaviour, Firefox should be told that fiddling with the return values of syscalls is not allowed.

Reproducible: Always

Steps to Reproduce:
1. Compile the reproducer test.c: gcc test.c -lseccomp
2. Execute: ./a.out (or: strace ./a.out)
3. Watch it crash with glibc-2.34

Comment 1 Andreas Fink 2021-12-03 12:22:30 UTC

Created attachment 757256
small reproducer

Comment 2 Thomas Deutschmann (RETIRED) gentoo-dev 2021-12-06 15:59:12 UTC

Your reproducer doesn't crash on <glibc-2.34 but crashes as expected on systems using glibc-2.34 (I also verified on Fedora for example).

I am assigning this bug to toolchain for now. They should tell us if this is a problem in glibc or working as designed.

Since you mentioned experiencing this crash in Firefox, could you please describe how to reproduce it with Firefox? I haven't seen this crash in Firefox yet, and I also cannot find any crash reports for Fedora at https://crash-stats.mozilla.org/.

Comment 3 Andreas Fink 2021-12-06 16:18:25 UTC

I have described two ways to consistently crash it here: https://bugs.gentoo.org/803950#c17

I added a backtrace in this comment: https://bugs.gentoo.org/803950#c14
The backtrace involves the functions `g_app_info_get_default_for_type` and `FAMOpen`.

I have no insight into why the two crashing actions (dragging an image / opening the context menu of uBlock Origin) at some point enter the function `g_app_info_get_default_for_type`, which ultimately leads to a call to `getpwuid(getuid())` (this happens in /usr/lib64/libfam.so.0 as a result of a call to `FAMOpen`).

Maybe I have a non-standard setup where libfam is involved, which is not the case on Fedora; no idea. But I'm not the only one seeing this exact crash, and there is also a forum discussion about it: https://forums.gentoo.org/viewtopic-p-8650991.html?sid=30f593c8b7d401d9b8a22662ebed0cb3

Comment 4 Andreas Fink 2021-12-06 16:19:24 UTC

Forgot to mention that I also sent it to the libc-help mailing list: https://sourceware.org/pipermail/libc-help/2021-December/006061.html

Comment 5 Sam James archtester 2021-12-06 19:43:28 UTC

[Note that there was some discussion first in the other bug and an experimental patch was posted there too: https://bugs.gentoo.org/803950#c15].

Comment 6 Andreas Fink 2021-12-07 14:02:54 UTC

Compiling dev-libs/glib with USE=-fam solves the issue too. Maybe glib on Fedora is not compiled with FAM support, which would explain why no bug report shows up on Fedora's bug tracker.

So far it seems that you need dev-libs/glib built with USE=fam, glibc-2.34, and NOT the patch I posted in the other bug report; then you will probably experience the crash (start dragging an image, or open the context menu of the uBlock Origin add-on).

Comment 7 Robin Bankhead 2022-01-13 17:06:04 UTC

I experienced the described UI behaviour with uBlock Origin (and similarly with Multi-Account Containers) after updating to glibc-2.34 and firefox-bin-96.

I can confirm rebuilding dev-libs/glib USE="-fam" prevents it.

Andreas, is this the best workaround for the time being, or would it be better instead to apply the glibc patch from the other bug?

Comment 8 Andreas Fink 2022-01-13 19:06:31 UTC

I also use "dev-libs/glib -fam" as a workaround. I'm not going to enable FAM support, even once the bug is fixed in glibc. After finding out about it and reading up on FAM, I got the impression that it is not really useful.

There is also the glibc patch in the other bug report. Sam reported the bug in the glibc Bugzilla (see URL above) with a different (and more appropriate) patch.

Comment 9 Sam James archtester 2022-02-01 15:00:50 UTC

*** Bug 832504 has been marked as a duplicate of this bug. ***

Comment 10 Larry the Git Cow gentoo-dev 2022-02-12 18:45:22 UTC

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/toolchain/glibc-patches.git/commit/?id=6415ce699bf1dafc403be7464df662b2879687e8

commit 6415ce699bf1dafc403be7464df662b2879687e8
Author:     Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2022-02-12 18:44:46 +0000
Commit:     Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2022-02-12 18:44:46 +0000

    Add patch for bug 828070
    
    Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28752
    Bug: https://bugs.gentoo.org/828070
    Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

 ...ault-in-getpwuid-when-stat-fails-BZ-28752.patch | 39 ++++++++++++++++++++++
 1 file changed, 39 insertions(+)

Comment 11 Maciej S. Szmigiero 2022-02-20 19:16:41 UTC

Any plans for new sys-libs/glibc release with updated glibc patchset containing the above fix?

Comment 12 Sam James archtester 2022-02-20 19:17:49 UTC

(In reply to Maciej S. Szmigiero from comment #11)
> Any plans for new sys-libs/glibc release with updated glibc patchset
> containing the above fix?

We tend to batch them up with stable backports once they're posted.

Comment 13 Maciej S. Szmigiero 2022-02-20 19:20:23 UTC

> We tend to batch them up with stable backports once they're posted.

So we are waiting for upstream to apply the patch, then backport it to glibc 2.34, correct?

Comment 14 Sam James archtester 2022-02-20 19:34:17 UTC

(In reply to Maciej S. Szmigiero from comment #13)
> > We tend to batch them up with stable backports once they're posted.
> 
> So we are waiting for upstream to apply the patch, then backport it to glibc
> 2.34, correct?

No, we don't need to wait for this one in particular. We just usually don't do a new patchset release until a few fixes are queued up. I'll check what the situation is when I'm not on mobile.

Comment 15 Sam James archtester 2022-02-22 02:33:22 UTC

(In reply to Sam James from comment #14)
> (In reply to Maciej S. Szmigiero from comment #13)
> > > We tend to batch them up with stable backports once they're posted.
> > 
> > So we are waiting for upstream to apply the patch, then backport it to glibc
> > 2.34, correct?
> 
> No, we don't need to wait for this one in particular. We just usually don't
> do a new patchset release until a few fixes are queued up. I'll check what
> the situation is when I'm not on mobile.

https://gitweb.gentoo.org/proj/toolchain/glibc-patches.git/commit/?id=6415ce699bf1dafc403be7464df662b2879687e8

This is in the (currently unkeyworded) =sys-libs/glibc-2.34-r9. Please test and give feedback. Thanks!

Comment 16 Maciej S. Szmigiero 2022-02-22 16:35:42 UTC

(In reply to Sam James from comment #15)
> This is in the (currently unkeyworded) =sys-libs/glibc-2.34-r9. Please test
> and give feedback.

Thanks for quickly providing an updated glibc, Sam.

With this version the Firefox web page process indeed no longer crashes.
Instead, it hangs, with a spinner rotating endlessly.

To be sure that it isn't something specific to my setup it would be great if somebody else had confirmed this finding.
Andreas?

My test picture: 
https://en.wikichip.org/wiki/File:intel_nehalem_lynfield_die_shot.jpg

My test system package versions:
*  www-client/firefox
      Latest version available: 97.0.1
      Latest version installed: 97.0.1

*  sys-devel/clang
      Latest version available: 13.0.1
      Latest version installed: 13.0.1

*  sys-devel/gcc
      Latest version available: 11.2.1_p20220115
      Latest version installed: 11.2.1_p20211127

Comment 17 Brandon Penglase 2022-02-25 00:03:48 UTC

(In reply to Maciej S. Szmigiero from comment #16)
> 
> To be sure that it isn't something specific to my setup it would be great if
> somebody else had confirmed this finding.
> Andreas?
> 

I am seeing this behavior too. As before with bug 803950, starting Firefox with MOZ_DISABLE_CONTENT_SANDBOX=1 gets it going.

*  www-client/firefox
      Latest version available: 97.0.1
      Latest version installed: 97.0.1

*  sys-devel/clang
      Latest version available: 13.0.1
      Latest version installed: 13.0.1

*  sys-devel/gcc
      Latest version available: 11.2.1_p20220115
      Latest version installed: 11.2.1_p20220115
(11.2.1 is active)

Comment 18 Robin Bankhead 2022-02-25 09:46:40 UTC

Same behaviour here (no tabs will load).

firefox-bin-97.0.1
glibc-2.34-r9

Resolved by rebuilding dev-libs/glib with USE="-fam"

Comment 19 Sam James archtester 2022-03-06 11:53:00 UTC

I think we need someone hitting this to report it to Firefox upstream (Mozilla) then. I'm pretty confident the glibc patch is right (although it could still be an issue on that side) though.

Comment 20 Sam James archtester 2022-03-06 11:55:16 UTC

(In reply to Sam James from comment #19)
> I think we need someone hitting this to report it to Firefox upstream
> (Mozilla) then. I'm pretty confident the glibc patch is right (although it
> could still be an issue on that side) though.

... although that said, I can't explain why -fam still fixes it. So maybe I'm missing something.

Comment 21 Sam James archtester 2022-03-06 12:07:33 UTC

(In reply to Sam James from comment #20)
> (In reply to Sam James from comment #19)
> > I think we need someone hitting this to report it to Firefox upstream
> > (Mozilla) then. I'm pretty confident the glibc patch is right (although it
> > could still be an issue on that side) though.
> 
> ... although that said, I can't explain why -fam still fixes it. So maybe
> I'm missing something.

waaait. We're taking a lock in nss_database_check_reload_and_get() and not yielding it when we return.

Can someone try the attached patch please?

Comment 22 Sam James archtester 2022-03-06 12:08:02 UTC

Created attachment 766418
nss-database-yield-lock.patch

(Note this patch is *on top* of the previous changes, so just add it in /etc/portage/patches and emerge -v1 glibc).

Comment 23 Maciej S. Szmigiero 2022-03-06 16:12:35 UTC

(In reply to Sam James from comment #21)
> (In reply to Sam James from comment #20)
> > (In reply to Sam James from comment #19)
> > > I think we need someone hitting this to report it to Firefox upstream
> > > (Mozilla) then. I'm pretty confident the glibc patch is right (although it
> > > could still be an issue on that side) though.
> > 
> > ... although that said, I can't explain why -fam still fixes it. So maybe
> > I'm missing something.
> 
> waaait. We're taking a lock in nss_database_check_reload_and_get() and not
> yielding it when we return.
> 
> Can someone try the attached patch please?

I can confirm that the attached patch fixes the issue for me indeed, thanks Sam!
(but the patch needs a small whitespace change to apply to sys-libs/glibc-2.34-r9).

Comment 24 Joonas Niilola gentoo-dev 2022-03-06 17:15:27 UTC

Sorry I haven't found the time to reproduce this yet, but I'm glad to see the progress that's happening in here.

Comment 25 Larry the Git Cow gentoo-dev 2022-03-07 01:04:27 UTC

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/toolchain/glibc-patches.git/commit/?id=6002612f230d2b8d88fefba6c6477a20e77efc23

commit 6002612f230d2b8d88fefba6c6477a20e77efc23
Author:     Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2022-03-07 01:03:46 +0000
Commit:     Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2022-03-07 01:03:46 +0000

    Additional fixup for the glibc/firefox/seccomp interaction
    
    Bug: https://bugs.gentoo.org/828070
    Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>

 ...0302-Drop-glibc-lock-when-returning-early.patch | 36 ++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Comment 26 Andreas K. Hüttel archtester 2022-03-07 01:11:18 UTC

(In reply to Sam James from comment #22)
> Created attachment 766418
> nss-database-yield-lock.patch
> 
> (Note this patch is *on top* of the previous changes, so just add it in
> /etc/portage/patches and emerge -v1 glibc).

Patch is in 2.34-r10

Comment 27 Maciej S. Szmigiero 2022-03-09 14:50:17 UTC

Don't forget to update the proposed patch upstream at ${URL}.

Comment 28 Sam James archtester 2022-03-14 17:05:05 UTC

(In reply to Maciej S. Szmigiero from comment #27)
> Don't forget to update the proposed patch upstream at ${URL}.

Done & submitted: https://sourceware.org/bugzilla/show_bug.cgi?id=28752#c2.

Comment 29 Sam James archtester 2022-06-19 02:18:10 UTC

Thanks to all who helped out in this bug, we've got there!

Fix landed upstream:
- https://sourceware.org/git/?p=glibc.git;a=commit;h=3fdf0a205b622e40fa7e3c4ed1e4ed4d5c6c5380
- https://sourceware.org/git/?p=glibc.git;a=commit;h=ace9e3edbca62d978b1e8f392d8a5d78500272d9

It'll be in the next patchset for glibc-2.35 in Gentoo (possibly 2.34, but we're looking to move to 2.35 completely soon).

Comment 30 Sam James archtester 2022-08-08 09:16:06 UTC

Fixed upstream and in stable now via 2.35.