```
========================================================================
Local information disclosure in systemd-coredump (CVE-2025-4598)
========================================================================

------------------------------------------------------------------------
Background
------------------------------------------------------------------------

While working on Ubuntu's apport, we remembered that various other
distributions (Red Hat Enterprise Linux 9 and Fedora for example) use
systemd-coredump as a core-dump handler in /proc/sys/kernel/core_pattern
(instead of apport). We began to wonder: how does systemd-coredump solve
the kill-and-replace race condition that we exploited against apport?

Similarly to apport, systemd-coredump writes all core files into a
hard-coded directory, /var/lib/systemd/coredump/. Before December 2022,
systemd-coredump allowed users to read all of their core files (through
file ACLs), including the core files of SUID or SGID programs, which of
course allowed local attackers to read the contents of /etc/shadow by
simply crashing su for example; this vulnerability was CVE-2022-4415,
discovered and published by Matthias Gerstner:

  https://www.openwall.com/lists/oss-security/2022/12/21/3

This old vulnerability was patched by introducing a new function,
grant_user_access(), which decides whether a user should be allowed to
read a core file or not, by analyzing the /proc/pid/auxv of the crashed
process: if its AT_UID and AT_EUID match, and if its AT_GID and AT_EGID
match, and if its AT_SECURE flag is 0, then read access is allowed;
otherwise (if the crashed process is SUID or SGID), read access is
denied (only root can read the core file).
------------------------------------------------------------------------
Analysis
------------------------------------------------------------------------

Unfortunately, we soon realized that systemd-coredump does not provide
any protection at all against the kill-and-replace race condition that
we exploited in apport. In other words, an attacker can simply crash a
SUID process such as unix_chkpwd, SIGKILL and replace it with a
non-SUID process (before its /proc/pid/auxv is analyzed by
systemd-coredump), and therefore gain read access to the core file of
the crashed SUID process, and hence to the contents of /etc/shadow.

On the one hand, exploiting systemd-coredump is easier than exploiting
apport, because we do not need to replace the crashed SUID process with
a namespaced process: we can replace it with any non-SUID process, whose
AT_UID and AT_EUID match, whose AT_GID and AT_EGID match, and whose
AT_SECURE flag is 0.

On the other hand, winning the kill-and-replace race condition against
systemd-coredump is harder: unlike apport, systemd-coredump is written
in C, and its initialization takes little time. To widen the window of
the race condition, we pass an argv[0] of 128K '\177' characters to the
SUID process: this slows down the analysis of its /proc/pid/cmdline (by
systemd-coredump, before the analysis of its /proc/pid/auxv) and gives
us enough time to replace the crashed SUID process with a non-SUID
process.
```
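To make the access rule concrete, here is a minimal C sketch of the check the advisory attributes to grant_user_access(). It walks the (type, value) pairs of /proc/<pid>/auxv and allows access only if AT_UID equals AT_EUID, AT_GID equals AT_EGID, and AT_SECURE is 0. This is not systemd's code. The function names auxv_allows_user_access() and read_proc_auxv() are hypothetical, denying access when an entry is missing is this sketch's own choice, and it assumes the auxv entries are native unsigned long pairs (that is, the crashed process has the same word size as the reader).

```c
#define _GNU_SOURCE
#include <elf.h> /* AT_NULL, AT_UID, AT_EUID, AT_GID, AT_EGID, AT_SECURE */
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not systemd's grant_user_access()): apply the rule
 * described in the advisory to a raw auxv buffer of (type, value) pairs.
 * Assumes the entries are native unsigned long pairs. */
static bool auxv_allows_user_access(const unsigned long *auxv, size_t n_words) {
    bool have_uid = false, have_euid = false, have_gid = false;
    bool have_egid = false, have_secure = false;
    unsigned long uid = 0, euid = 0, gid = 0, egid = 0, secure = 1;

    for (size_t i = 0; i + 1 < n_words; i += 2) {
        unsigned long type = auxv[i], val = auxv[i + 1];
        if (type == AT_NULL) /* end of the auxiliary vector */
            break;
        switch (type) {
        case AT_UID:    uid = val;    have_uid = true;    break;
        case AT_EUID:   euid = val;   have_euid = true;   break;
        case AT_GID:    gid = val;    have_gid = true;    break;
        case AT_EGID:   egid = val;   have_egid = true;   break;
        case AT_SECURE: secure = val; have_secure = true; break;
        default:        break;
        }
    }

    /* Sketch's choice: deny if any of the five entries is missing. */
    if (!(have_uid && have_euid && have_gid && have_egid && have_secure))
        return false;
    /* UIDs match, GIDs match and AT_SECURE is 0: not treated as SUID/SGID. */
    return uid == euid && gid == egid && secure == 0;
}

/* Hypothetical helper: read /proc/<pid>/auxv into buf, addressing the
 * process by PID number only. Returns the number of words read, or -1. */
static ssize_t read_proc_auxv(pid_t pid, unsigned long *buf, size_t max_words) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/auxv", (int)pid);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    size_t total = 0, cap = max_words * sizeof *buf;
    while (total < cap) {
        ssize_t r = read(fd, (char *)buf + total, cap - total);
        if (r < 0) {
            close(fd);
            return -1;
        }
        if (r == 0)
            break;
        total += (size_t)r;
    }
    close(fd);
    return (ssize_t)(total / sizeof *buf);
}
```

The sketch also shows where the race bites. read_proc_auxv() trusts the PID number, so if the crashed SUID process is SIGKILLed and a non-SUID process takes over its PID before the read, the buffer describes the replacement and auxv_allows_user_access() grants access.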
I guess this is fixed in v257.6? I see several coredump related changes there. https://github.com/systemd/systemd/commits/v257.6
(In reply to Mike Gilbert from comment #1)
> I guess this is fixed in v257.6? I see several coredump related changes
> there.
>
> https://github.com/systemd/systemd/commits/v257.6

Yeah, looks like they rolled stable releases for each series.
(Sorry, to be explicit: yes, it's fixed in those releases)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=f3c67366ac9d9ae76f4e0d9fedfc6f7d44501f5d

commit f3c67366ac9d9ae76f4e0d9fedfc6f7d44501f5d
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2025-06-06 05:06:36 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2025-06-06 05:06:36 +0000

    sys-apps/systemd: add 257.6

    Bug: https://bugs.gentoo.org/956816
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/systemd/Manifest             |   1 +
 sys-apps/systemd/systemd-257.6.ebuild | 571 ++++++++++++++++++++++++++++++++++
 2 files changed, 572 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=459bc2aa304be935a0d19b9eb3326839f45a6c3c

commit 459bc2aa304be935a0d19b9eb3326839f45a6c3c
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2025-06-06 04:57:35 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2025-06-06 04:57:35 +0000

    sys-apps/systemd: add 256.16

    Bug: https://bugs.gentoo.org/956816
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/systemd/Manifest              |   1 +
 sys-apps/systemd/systemd-256.16.ebuild | 572 +++++++++++++++++++++++++++++++++
 2 files changed, 573 insertions(+)
There have also been some kernel-side changes:

commit 536f763c0e611854ddcc5d49f779b3024ad426e1
Author: Christian Brauner <brauner@kernel.org>
Date:   Mon Apr 14 15:55:07 2025 +0200

    coredump: hand a pidfd to the usermode coredump helper

    commit b5325b2a270fcaf7b2a9a0f23d422ca8a5a8bdea upstream.

    Give userspace a way to instruct the kernel to install a pidfd into the
    usermode helper process. This makes coredump handling a lot more
    reliable for userspace. In parallel with this commit we already have
    systemd adding support for this in [1].

    We create a pidfs file for the coredumping process when we process the
    corename pattern. When the usermode helper process is forked we then
    install the pidfs file as file descriptor three into the usermode
    helpers file descriptor table so it's available to the exec'd program.

    Since usermode helpers are either children of the system_unbound_wq
    workqueue or kthreadd we know that the file descriptor table is empty
    and can thus always use three as the file descriptor number.

    Note, that we'll install a pidfd for the thread-group leader even if a
    subthread is calling do_coredump(). We know that task linkage hasn't
    been removed due to delay_group_leader() and even if this @current
    isn't the actual thread-group leader we know that the thread-group
    leader cannot be reaped until @current has exited.

    [brauner: This is a backport for the v6.12 series. The upstream kernel
    has changed pidfs_alloc_file() to set O_RDWR implicitly instead of
    forcing callers to set it. Let's minimize the churn and just let the
    coredump umh handler raise O_RDWR.]

    Link: https://github.com/systemd/systemd/pull/37125 [1]
    Link: https://lore.kernel.org/20250414-work-coredump-v2-3-685bf231f828@kernel.org
    Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 53b37363eff91da17bbf2b3cf87ecc0e1611fe55
Author: Christian Brauner <brauner@kernel.org>
Date:   Mon Apr 14 15:55:06 2025 +0200

    coredump: fix error handling for replace_fd()

    commit 95c5f43181fe9c1b5e5a4bd3281c857a5259991f upstream.

    The replace_fd() helper returns the file descriptor number on success
    and a negative error code on failure. The current error handling in
    umh_pipe_setup() only works because the file descriptor that is
    replaced is zero but that's pretty volatile. Explicitly check for a
    negative error code.

    Link: https://lore.kernel.org/20250414-work-coredump-v2-2-685bf231f828@kernel.org
    Tested-by: Luca Boccassi <luca.boccassi@gmail.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
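The first kernel commit above installs a pidfd for the coredumping process as file descriptor three in the usermode helper. One generic way a helper could use such a pidfd against the kill-and-replace race is sketched below in C. It is not systemd's implementation (that work is in the linked pull request). The function names are hypothetical, and the sketch assumes a kernel with pidfd support, a "Pid:" line in /proc/self/fdinfo for pidfds, a helper running as root in the same PID namespace as /proc, and native-word auxv entries. The idea is to resolve the PID behind the pidfd, read /proc/<pid>/auxv, and only then check that the pidfd still refers to a process that has not been reaped. Because a PID can only be reused after its previous owner is reaped, passing that final check means the data was not read from a replacement process.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks for older headers, using the syscall numbers shared by most
 * architectures (assumption: not valid on every architecture). */
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* only needed to obtain a pidfd for testing */
#endif

/* Hypothetical helper: look up the PID behind a pidfd via the "Pid:"
 * line of /proc/self/fdinfo/<fd>. Returns -1 if no positive PID is found. */
static pid_t pid_from_pidfd(int pidfd) {
    char path[64], line[256];
    pid_t pid = -1;

    snprintf(path, sizeof path, "/proc/self/fdinfo/%d", pidfd);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        int v;
        if (sscanf(line, "Pid: %d", &v) == 1) {
            pid = v > 0 ? (pid_t)v : -1;
            break;
        }
    }
    fclose(f);
    return pid;
}

/* Signal 0 only checks that the target exists; EPERM still means it does. */
static bool pidfd_still_alive(int pidfd) {
    if (syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) == 0)
        return true;
    return errno == EPERM;
}

/* Hypothetical helper: read /proc/<pid>/auxv for the process behind
 * pidfd, then confirm the pidfd still refers to an unreaped process.
 * Returns the number of words read, or -1 (fail closed). */
static ssize_t read_auxv_via_pidfd(int pidfd, unsigned long *buf, size_t max_words) {
    pid_t pid = pid_from_pidfd(pidfd);
    if (pid <= 0)
        return -1;

    char path[64];
    snprintf(path, sizeof path, "/proc/%d/auxv", (int)pid);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    size_t total = 0, cap = max_words * sizeof *buf;
    while (total < cap) {
        ssize_t r = read(fd, (char *)buf + total, cap - total);
        if (r < 0) {
            close(fd);
            return -1;
        }
        if (r == 0)
            break;
        total += (size_t)r;
    }
    close(fd);

    /* This check must come last: it vouches for everything read before it. */
    if (!pidfd_still_alive(pidfd))
        return -1;
    return (ssize_t)(total / sizeof *buf);
}
```

Inside a coredump helper the call would be read_auxv_via_pidfd(3, buf, n), since the commit states the pidfd is always installed as descriptor three. The resulting words could then feed an access rule like the one the advisory describes. The order is the design choice that matters. A liveness check done before the /proc read would leave the same window the advisory exploits. The sketch also fails closed: if the crashing process is already gone, it refuses instead of guessing.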