Summary: | sys-apps/man-db-2.8.2 USE=seccomp man: nroff: Bad system call (core dumped) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Georgy Yakovlev <gyakovlev> |
Component: | Current packages | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | cjwatson |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
strace -f -s 1024 -o man.trace man man output
Strace output of man-db-2.8.5-r1 |
Description
Georgy Yakovlev
2018-03-17 23:33:09 UTC
also tried applying this patch https://git.savannah.gnu.org/cgit/man-db.git/commit/?id=232899c9776625e75cec72a4fb9e588968a6fa2f but it does not make any difference.

it was happening in a chroot, can no longer re-produce after booting properly into the system...

Copying the upstream maintainer.

Could you please run the command in question under "strace -f -s 1024 -o man.trace" and attach the resulting man.trace file? (Make sure that it actually fails when you do so; sometimes strace can perturb the environment enough to make the problem go away, especially if it's timing-dependent.)

(In reply to Colin Watson from comment #4)
> Could you please run the command in question under "strace -f -s 1024 -o
> man.trace" and attach the resulting man.trace file? (Make sure that it
> actually fails when you do so; sometimes strace can perturb the environment
> enough to make the problem go away, especially if it's timing-dependent.)

I tried to re-create this one with no success so far. I guess it was something temporary on my side in that particular chroot. I'm planning more installs in a couple of weeks, if I encounter this I'll let you guys know. No data from me for now.

Created attachment 527168 [details]
strace -f -s 1024 -o man.trace man man output
I was able to replicate this again in a chroot.
strace output uploaded.
this time it's version 2.8.3, so it's still affected.
I've created a snapshot this time, let me know if anything else is needed, I'll run it.
From another trace with unwinding.

    101197 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0 <unfinished ...>
    101197 <... socket resumed> ) = 41
     > /lib64/libc-2.26.so(socket+0x7) [0x10b157]
     > /lib64/libc-2.26.so(open_socket+0x4d) [0x14ad2d]
    101197 --- SIGSYS {si_signo=SIGSYS, si_code=SYS_SECCOMP, si_call_addr=0x7f856f132157, si_syscall=__NR_socket, si_arch=AUDIT_ARCH_X86_64} ---

some pipeline debug at the moment

    Waiting for pipeline: (cd /usr/share/man && /usr/libexec/man-db/zsoelim) | (cd /usr/share/man && /usr/libexec/man-db/manconv -f UTF-8:ISO-8859-1 -t UTF-8//IGNORE) | (cd /usr/share/man && preconv -e UTF-8) | (cd /usr/share/man && tbl) | (cd /usr/share/man && nroff -mandoc -c -rLL=186n -rLT=186n -Tutf8) [input: {-1, NULL}, output: {-1, NULL}]
    Active processes (5):
      "/usr/libexec/man-db/zsoelim" (8375) -> 0
      "/usr/libexec/man-db/manconv" (8376) -> 0
      "preconv" (8377) -> 0
      "tbl" (8378) -> 0
      "nroff" (8379) -> 159
    man: nroff: Bad system call (core dumped)

Interesting. Normally trying to create a socket is due to some kind of strange preloaded antivirus thing or similar, but there's no evidence of that here.

What's your /bin/sh (assuming that /usr/bin/nroff is #!/bin/sh)? Is it possible that it's bash configured with SYSLOG_HISTORY, or something like that?

(In reply to Colin Watson from comment #8)
> Interesting. Normally trying to create a socket is due to some kind of
> strange preloaded antivirus thing or similar, but there's no evidence of
> that here.
>
> What's your /bin/sh (assuming that /usr/bin/nroff is #!/bin/sh)? Is it
> possible that it's bash configured with SYSLOG_HISTORY, or something like
> that?

nothing weird with bash or nroff, /bin/sh is just bash, no logger support enabled.

I've found it however: it tries to open socket to glibc's nscd. it's a chroot and nscd is not running yet in here. nscd is running outside of chroot, however stopping or starting it on the upper host does not make any difference in the chroot.
both host and chroot glibc have nscd support compiled in. the only thing that helped is building glibc in the chroot without nscd support, so it does not attempt to call nscd functions. otherwise calls to nscd are hardwired into glibc and any call to a map-related function (getpwnam(), gethostbyname() etc...) will try to query nscd first. that explains why I can't reproduce it after I actually boot the chroot.

so I think filters either need to account for that, or bypass nscd explicitly with something like __nss_disable_nscd on glibc systems. check this bug for example https://sourceware.org/bugzilla/show_bug.cgi?id=13696 they want to disable nscd exactly because they want to avoid calling socket() in a sandbox, similar to this problem. hope that helps.

contents of /etc/nsswitch.conf:

    passwd: compat files
    shadow: compat files
    group: compat files
    hosts: files dns
    networks: files dns
    services: db files
    protocols: db files
    rpc: db files
    ethers: db files
    netmasks: files
    netgroup: files
    bootparams: files
    automount: files
    aliases: files

but changing it does not really help, because glibc still tries to call nscd in almost any case.

Thanks. I haven't quite been able to reproduce this so far, but I can see the general shape of the problem. Looking at bash's startup sequence, it may also depend on which environment variables are already set (since bash does extra work to calculate them if they aren't); is it possible that variables such as HOME or SHELL were unset in the environment where you hit this bug, or that you have something unusual in your shell startup scripts?

I think, really, the only way we're going to be able to avoid this reliably is to avoid executing the shell under seccomp; there are just too many unknowns, and I don't want to permit socket access in general as that weakens the sandbox too much. My plan is to refactor man a little so that it uses groff directly (if applicable) rather than going through the nroff wrapper.
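The question about HOME and SHELL can be checked mechanically before running man in the chroot. A minimal sketch, assuming a POSIX shell: the function name `missing_startup_vars` is made up for illustration, and the variable list (HOME, SHELL) is only what this thread mentions, not a complete account of what bash looks up at startup.

```shell
#!/bin/sh
# Sketch: print the names of variables that are unset in the current
# environment. HOME and SHELL are taken from the discussion above; the
# function name is hypothetical.
missing_startup_vars() {
    for name in HOME SHELL; do
        # ${VAR+x} expands to "x" only when VAR is set (even if empty).
        eval "is_set=\${$name+x}"
        if [ -z "$is_set" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example: warn before running man inside the chroot.
for v in $(missing_startup_vars); do
    echo "warning: $v is unset; /bin/sh may have to look it up itself" >&2
done
```

If both variables are set the loop prints nothing. Setting them explicitly on the `env -i` command line is one way to keep the environment clean without leaving either of them unset.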
I do filter env by running /usr/bin/env -i HOME=/root TERM="${TERM}" chroot ... but run source /etc/profile right after entering it. will be doing another chroot build testing soon and will keep an eye on environment state.

I agree, the bug is kinda hard to reproduce and the issue is not really that important. seccomp does its job and it's a good thing. normal usage is not affected.

you were absolutely right. I tried to chroot with unclean environment, without filtering, and found out the exact variable.

the reason: missing SHELL=/bin/bash
Setting it was enough to fix weird behaviour with bash trying to open nscd socket.

Example:
nscd running on host
chroot in /mnt/gentoo, glibc compiled with nscd, but it's not running of course.

    chroot /mnt/gentoo /bin/bash
    (not filtering env here, SHELL=/bin/bash)
    unset SHELL
    man man
    man: nroff: Bad system call (core dumped)
    man: command exited with status 159: (cd /usr/share/man && /usr/libexec/man-db/zsoelim) | (cd /usr/share/man && /usr/libexec/man-db/manconv -f UTF-8:ISO-8859-1 -t UTF-8//IGNORE) | (cd /usr/share/man && preconv -e UTF-8) | (cd /usr/share/man && tbl) | (cd /usr/share/man && nroff -mandoc -c -rLL=186n -rLT=186n -Tutf8)
    export SHELL=/bin/bash
    man man

displays the page.

no longer hit it on current versions, even with unset SHELL =) closing, thanks.

The problem still reproduces even for the latest version =sys-apps/man-db-2.8.5-r1
On i386 architecture (if that matters)
I had to work around it with USE=-seccomp

(In reply to Max Satula from comment #15)
> The problem still reproduces even for the latest version

please strace the failing program and attach the log so we can see which syscall is failing

Created attachment 742401 [details]
Strace output of man-db-2.8.5-r1
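For anyone reading a trace like the ones attached here, the syscall that seccomp rejected shows up on the SIGSYS line. A small sketch for pulling it out of an `strace -f` log, assuming the signal line format seen earlier in this bug (`--- SIGSYS {... si_syscall=__NR_socket ...} ---`); the function name `sigsys_syscalls` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the distinct syscall names that triggered SIGSYS in an
# strace log. Assumes lines of the form
#   PID --- SIGSYS {si_signo=SIGSYS, ..., si_syscall=__NR_<name>, ...} ---
# The function name is made up for illustration.
sigsys_syscalls() {
    # $1 = path to the strace output file
    sed -n 's/.*--- SIGSYS .*si_syscall=__NR_\([A-Za-z0-9_]*\).*/\1/p' "$1" | sort -u
}

# Example: inspect man.trace if it exists in the current directory.
if [ -r man.trace ]; then
    sigsys_syscalls man.trace
fi
```

The pattern only matches when strace prints a symbolic `__NR_` name; if a given strace build prints the syscall as a bare number instead, the line is skipped and has to be read by hand. The status 159 reported for nroff is 128 + 31, and 31 is SIGSYS on x86 Linux, which is consistent with the "Bad system call" message.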
(In reply to SpanKY from comment #16)
> (In reply to Max Satula from comment #15)
> > The problem still reproduces even for the latest version
>
> please strace the failing program and attach the log so we can see which
> syscall is failing

Well, I looked into the strace log I attached, that seems to be a call to clock_gettime64. A clock_gettime function is already listed in src/sandbox.c, but on a 32-bit system clock_gettime64 is used. After adding a line (the patch is listed below) I could compile it with USE=seccomp and run it successfully.

    --- lib/sandbox.c 2021-10-01 15:02:30.875185637 -0400
    +++ lib/sandbox.c 2021-10-01 15:02:38.938228602 -0400
    @@ -270,6 +270,7 @@
      /* systemd: SystemCallFilter=@default */
      SC_ALLOW ("clock_getres");
      SC_ALLOW ("clock_gettime");
    + SC_ALLOW ("clock_gettime64");
      SC_ALLOW ("clock_nanosleep");
      SC_ALLOW ("execve");
      SC_ALLOW ("exit");

In the previous message, whenever I mention src/sandbox.c, that is actually lib/sandbox.c

(In reply to Max Satula from comment #15)
> The problem still reproduces even for the latest version
> =sys-apps/man-db-2.8.5-r1

2.8.5-r1 is quite old. This was fixed in man-db-2.9.4, which is available and stable in Gentoo.
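To see whether a given source tree already lists a syscall, the `SC_ALLOW ("...")` lines quoted in the patch above can be searched directly. A rough sketch, assuming the allow-list entries look exactly like the ones in that patch; the function name `sandbox_allows` is hypothetical, and other macro forms that the real file may use are not covered.

```shell
#!/bin/sh
# Sketch: succeed if the given file contains an SC_ALLOW ("<name>") entry
# for exactly that syscall name. Only the macro form quoted in the patch
# above is recognised; the function name is made up for illustration.
sandbox_allows() {
    # $1 = source file, $2 = syscall name
    grep -q "SC_ALLOW *(\"$2\")" "$1"
}

# Example: compare the two clock_gettime variants in an unpacked tree.
if [ -r lib/sandbox.c ]; then
    for sc in clock_gettime clock_gettime64; do
        if sandbox_allows lib/sandbox.c "$sc"; then
            echo "$sc: listed"
        else
            echo "$sc: not listed"
        fi
    done
fi
```

Because the closing quote and parenthesis are part of the pattern, a search for `clock_gettime` does not match a `clock_gettime64` entry.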