Bug 926652 - MemoryDenyWriteExecute breaks ARMv5 due to RWX mappings in binaries
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: ARM Linux
Importance: Low minor
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2024-03-10 08:20 UTC by Calvin Owens
Modified: 2024-04-02 17:42 UTC

Description Calvin Owens 2024-03-10 08:20:14 UTC
MemoryDenyWriteExecute is working as intended, but is impossible to successfully use because everything in stage3-armv5tel-systemd-mergedusr appears to have W+X mappings:

  armv5 ~ # grep libc.so.6 /proc/1/maps
  b692b000-b6a9c000 r-xp 00000000 00:0d 13950      /usr/lib/libc.so.6
  b6a9c000-b6a9e000 r-xp 00170000 00:0d 13950      /usr/lib/libc.so.6
  b6a9e000-b6a9f000 rwxp 00172000 00:0d 13950      /usr/lib/libc.so.6

...so prctl(PR_SET_MDWE), correctly, prevents all subsequent attempts to execve():

  prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0, 0, 0) = 0
  seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=11, filter=0x1ca7e28}) = 0
  seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=42, filter=0x1ca9508}) = 0
  seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=34, filter=0x1ca8a70}) = 0
  seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=12, filter=0x1ca0990}) = 0
  personality(0xffffffff)                 = 0xc00000 (PER_LINUX|READ_IMPLIES_EXEC|ADDR_LIMIT_32BIT)
  seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=8, filter=0x1ca82b8}) = 0
  brk(0x1cde000)                          = 0x1cde000
  brk(0x1cff000)                          = 0x1cff000
  seccomp(SECCOMP_SET_MODE_FILTER, 0, {len=517, filter=0x1cfa348}) = 0
  execve("/usr/lib/systemd/systemd-journald", ["/usr/lib/systemd/systemd-journal"...], 0x1c9c790 /* 14 vars */) = -1 EACCES (Permission denied)
  --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
  +++ killed by SIGSEGV +++
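
The personality() return value in the trace above can also be decoded by hand. A minimal sketch in Python, assuming the flag values from the kernel's include/uapi/linux/personality.h; only a subset of the flags is listed, and decode_personality is a hypothetical helper, not part of any tool mentioned in this report:

```python
# Subset of the personality flag bits from include/uapi/linux/personality.h.
PERSONALITY_FLAGS = {
    0x0040000: "ADDR_NO_RANDOMIZE",
    0x0400000: "READ_IMPLIES_EXEC",
    0x0800000: "ADDR_LIMIT_32BIT",
}


def decode_personality(value):
    """Return the names of the known flag bits set in a personality value."""
    return [name for bit, name in sorted(PERSONALITY_FLAGS.items()) if value & bit]


# 0xc00000 is the value returned by personality(0xffffffff) in the trace.
print(decode_personality(0xC00000))
```

For 0xc00000 this reports READ_IMPLIES_EXEC and ADDR_LIMIT_32BIT, which matches strace's own decoding of the call.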

Applying debug patch [0] to the kernel yields this backtrace from the failed execve():

  [    6.212802][   T68] ------------[ cut here ]------------
  [    6.213002][   T68] WARNING: CPU: 0 PID: 68 at include/linux/mman.h:200 mmap_region+0x60c/0x794
  [    6.213278][   T68] CPU: 0 PID: 68 Comm: systemd-journal Not tainted 6.8.0-rc7-arm-00239-g08d1972cbe6f #4
  [    6.213475][   T68] Hardware name: ARM-Versatile (Device Tree Support)
  [    6.214022][   T68] [<c010ae54>] (unwind_backtrace) from [<c0108ac4>] (show_stack+0x18/0x1c)
  [    6.214245][   T68] [<c0108ac4>] (show_stack) from [<c0732398>] (dump_stack_lvl+0x38/0x5c)
  [    6.214377][   T68] [<c0732398>] (dump_stack_lvl) from [<c01198cc>] (__warn+0x7c/0xf4)
  [    6.214507][   T68] [<c01198cc>] (__warn) from [<c0727a10>] (warn_slowpath_fmt+0x70/0x90)
  [    6.214663][   T68] [<c0727a10>] (warn_slowpath_fmt) from [<c026c474>] (mmap_region+0x60c/0x794)
  [    6.214790][   T68] [<c026c474>] (mmap_region) from [<c026c97c>] (do_mmap+0x380/0x3cc)
  [    6.214954][   T68] [<c026c97c>] (do_mmap) from [<c024eac8>] (vm_mmap_pgoff+0xb8/0xf4)
  [    6.215113][   T68] [<c024eac8>] (vm_mmap_pgoff) from [<c02e4fa4>] (elf_load+0x18c/0x1f0)
  [    6.215280][   T68] [<c02e4fa4>] (elf_load) from [<c02e5580>] (load_elf_binary+0x578/0xf1c)
  [    6.215448][   T68] [<c02e5580>] (load_elf_binary) from [<c02a0ea0>] (bprm_execve+0x1a8/0x364)
  [    6.215600][   T68] [<c02a0ea0>] (bprm_execve) from [<c02a1640>] (do_execveat_common+0x18c/0x1b0)
  [    6.215724][   T68] [<c02a1640>] (do_execveat_common) from [<c02a2180>] (sys_execve+0x34/0x3c)
  [    6.215848][   T68] [<c02a2180>] (sys_execve) from [<c01001d0>] (__sys_trace_return+0x0/0x10)
  [    6.216003][   T68] Exception stack(0xd0b15fa8 to 0xd0b15ff0)
  [    6.216163][   T68] 5fa0:                   bea44a60 bea449a8 00bba528 00bc5798 00bba770 00bba770
  [    6.216295][   T68] 5fc0: bea44a60 bea449a8 00bc5798 0000000b 00bba770 bea44798 bea44788 bea4472c
  [    6.216411][   T68] 5fe0: b6da0cfc bea4448c b6b4f938 b69a1dcc
  [    6.216545][   T68] ---[ end trace 0000000000000000 ]---

Applying debug patch [1] to the kernel makes everything work, proving PR_SET_MDWE is the problem.  See [2] and [3] and [4] for discussion about a similar MDWE problem on parisc.

I guess this is because pre-ARMv6 CPUs lack NX support? But:

  armv5 ~ # grep heap /proc/1/maps
  01ec7000-01fab000 rwxp 00000000 00:00 0          [heap]
  armv5 ~ # grep stack /proc/1/maps
  bee62000-bee83000 rw-p 00000000 00:00 0          [stack]

...so clearly, at least the stack can be mapped RW without being executable? I don't get it.
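
To check a whole maps file rather than grepping for individual entries, something like the following could be used. It is only an illustrative sketch: find_wx_mappings is a hypothetical helper, and the sample input is the set of /proc/1/maps lines quoted in this report.

```python
def find_wx_mappings(maps_text):
    """Return (address_range, perms, name) for mappings that are both W and X."""
    hits = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a /proc/<pid>/maps line
        perms = fields[1]
        name = fields[5].strip() if len(fields) > 5 else ""
        if "w" in perms and "x" in perms:
            hits.append((fields[0], perms, name))
    return hits


SAMPLE = """\
b692b000-b6a9c000 r-xp 00000000 00:0d 13950      /usr/lib/libc.so.6
b6a9c000-b6a9e000 r-xp 00170000 00:0d 13950      /usr/lib/libc.so.6
b6a9e000-b6a9f000 rwxp 00172000 00:0d 13950      /usr/lib/libc.so.6
01ec7000-01fab000 rwxp 00000000 00:00 0          [heap]
bee62000-bee83000 rw-p 00000000 00:00 0          [stack]
"""

for hit in find_wx_mappings(SAMPLE):
    print(*hit)
```

On the sample lines this flags the last libc.so.6 segment and the heap, but not the stack. On a live system the input would be the contents of /proc/1/maps instead of SAMPLE.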

[0] https://gist.github.com/jcalvinowens/d5b46c707284d8ec2bac6e9ac7a07140
[1] https://gist.github.com/jcalvinowens/50ca950d26d1f9a453fe39b98bb1941a 
[2] https://bugs.gentoo.org/916469
[3] https://github.com/systemd/systemd/issues/29775
[4] https://lore.kernel.org/linux-parisc/87lebjz9z6.fsf@gentoo.org/T/#u

Reproducible: Always

Steps to Reproduce:
1. Create a VM image from stage3-armv5tel-systemd-mergedusr-20240309T100447Z.tar.xz
2. Boot the image using qemu-system-arm -M versatilepb
3. Observe the failures
Actual Results:
[    6.533025][    T1] systemd[1]: Starting Journal Service...
[    7.832046][    T1] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
[    7.834339][    T1] systemd[1]: systemd-journald.service: Failed with result 'signal'.
[    7.836415][    T1] systemd[1]: Failed to start Journal Service.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
[    7.895879][    T1] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2.
[    8.012296][    T1] systemd[1]: Starting Journal Service...
[    9.024597][    T1] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=11/SEGV
[    9.026651][    T1] systemd[1]: systemd-journald.service: Failed with result 'signal'.
[    9.044774][    T1] systemd[1]: Failed to start Journal Service.
[FAILED] Failed to start Journal Service.
See 'systemctl status systemd-journald.service' for details.
[    9.075509][    T1] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 3.
[    9.207493][    T1] systemd[1]: Starting Journal Service...


Expected Results:
The system should boot and work as normal.

The workaround is simple: remove MemoryDenyWriteExecute= from the affected system units.
Comment 1 Sam James 2024-03-10 08:22:57 UTC
There's a fix on its way for older ARM: https://lore.kernel.org/linux-parisc/20240227013546.15769-4-zev@bewilderbeest.net/T/#t
Comment 2 Calvin Owens 2024-03-10 08:50:50 UTC
Thanks Sam, I'll just mark this as fixed when that hits upstream.

Answering my own question, as to how READ_IMPLIES_EXEC can be true, yet I can find a supposedly non-executable stack:

  armv5 ~ # grep stack /proc/1/maps
  bee62000-bee83000 rw-p 00000000 00:00 0          [stack]

...the answer is here. This code effectively skips the check in do_mmap() that everything else hits: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/exec.c#n750
Comment 3 Calvin Owens 2024-03-10 16:37:56 UTC
I'm a little surprised there wasn't more pushback on that kernel patch. Unlike with parisc, systemd is arguably at fault: it could check for READ_IMPLIES_EXEC in the return from sys_personality() and not issue the PR_SET_MDWE prctl() in that case. Maybe it should anyway.

Something like https://gist.github.com/jcalvinowens/cdbddd7749c390b723146e7b4c9c9f2f
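
For illustration only, the check described above (query the personality, and skip PR_SET_MDWE when READ_IMPLIES_EXEC is set) could be sketched in Python as follows. This is not the content of the linked gist and not systemd code. The helper names are hypothetical, and the numeric constants are assumed from the kernel's include/uapi/linux/personality.h and include/uapi/linux/prctl.h.

```python
import ctypes
import os

READ_IMPLIES_EXEC = 0x0400000   # assumed from include/uapi/linux/personality.h
PR_SET_MDWE = 65                # assumed from include/uapi/linux/prctl.h
PR_MDWE_REFUSE_EXEC_GAIN = 1


def mdwe_is_usable(persona):
    """MDWE is only usable if readable mappings are not implicitly executable."""
    return not (persona & READ_IMPLIES_EXEC)


def enable_mdwe_if_usable():
    """Set PR_SET_MDWE unless the current personality has READ_IMPLIES_EXEC.

    Returns True if MDWE was enabled, False if it was skipped.
    Linux only; the prctl() cannot be undone for this process.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    # 0xffffffff queries the current personality without changing it.
    persona = libc.personality(0xFFFFFFFF)
    if persona < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if not mdwe_is_usable(persona):
        return False  # skip MDWE: execve() fails afterwards, as in this report
    libc.prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    libc.prctl.restype = ctypes.c_int
    if libc.prctl(PR_SET_MDWE, PR_MDWE_REFUSE_EXEC_GAIN, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return True
```

With the personality value from the trace in the description, 0xc00000, mdwe_is_usable() returns False, so the prctl() would be skipped and the later execve() would not be refused.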