Created attachment 515300
run x86 syscall test on x86 only

Hello,

the check for a broken vendor kernel introduced in bug https://bugs.gentoo.org/279260 causes oops-like kernel log entries on aarch64:

[ 189.143682] glibc-test[2118]: syscall 1000
[ 189.143728] Code: aa0503e4 aa0603e5 aa0703e6 d4000001 (b13ffc1f)
[ 189.143750] CPU: 1 PID: 2118 Comm: glibc-test Not tainted 4.15.0-rc7-00232-g2c1cfa499018 #3
[ 189.143755] Hardware name: SoPine with baseboard (DT)
[ 189.143762] pstate: 80000000 (Nzcv daif -PAN -UAO)
[ 189.143774] pc : 0xffffb8fb0104
[ 189.143779] lr : 0xaaaab43c563c
[ 189.143781] sp : 0000ffffd4fa1180
[ 189.143786] x29: 0000ffffd4fa1190 x28: 0000000000000000
[ 189.143795] x27: 0000000000000000 x26: 0000000000000000
[ 189.143802] x25: 0000000000000000 x24: 0000000000000000
[ 189.143809] x23: 0000000000000000 x22: 0000000000000000
[ 189.143816] x21: 0000aaaab43c564c x20: 0000000000000000
[ 189.143823] x19: 0000aaaab43c5770 x18: 0000000000000a03
[ 189.143829] x17: 0000aaaab43d6020 x16: 0000ffffb8fb00e0
[ 189.143837] x15: 0000ffffb8ed4000 x14: 0000ffffb8ed7540
[ 189.143844] x13: 0000ffffb8ee45d8 x12: 0000000000000000
[ 189.143851] x11: 0000000000000020 x10: 0000000000000000
[ 189.143857] x9 : 00000000000000ff x8 : 00000000000003e8
[ 189.143864] x7 : e607cc2262a01600 x6 : e607cc2262a01600
[ 189.143872] x5 : 0000ffffd4fa12c0 x4 : 0000000000000000
[ 189.143879] x3 : 0000000000000000 x2 : 0000aaaab43c5630
[ 189.143886] x1 : 0000ffffd4fa12d8 x0 : 0000ffffd4fa12c8

They can be disabled using /proc/sys/debug/exception-trace. This however would obscure legitimate reports of actual problems of this kind as well as various other kinds of misbehaviour. Strangely enough, x86_64 does not produce this kind of message even though sysctl exception-trace is on by default.

Since the check is a number of years old and only relevant to a small window of kernel versions on x86 I propose to either remove it or apply it on x86 only.
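For anyone who wants to check the sysctl mentioned above without changing it, here is a minimal sketch. It assumes a Linux system that exposes /proc/sys/debug/exception-trace as a plain integer; the helper name `exception_trace_enabled` is made up for illustration and is not part of any ebuild or tool.

```python
from pathlib import Path

# Path named in the report; it may be absent on kernels or architectures
# that do not expose this sysctl (assumption: the value is a plain integer).
EXCEPTION_TRACE = Path("/proc/sys/debug/exception-trace")


def exception_trace_enabled(path=EXCEPTION_TRACE):
    """Return True/False for the sysctl value, or None if it cannot be read."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable: report "unknown" instead of guessing.
        return None
    try:
        return int(text) != 0
    except ValueError:
        # Unexpected content: also treated as "unknown".
        return None
```

Calling `exception_trace_enabled()` with no argument reads the real sysctl; passing another path is only useful for testing the parsing.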
It will be gone on 2.27 (out end of the month).
the whole point of that test is to locate broken kernels exactly like this because glibc will actively try to use newer syscalls than the kernel might support which in turn will lead to crashes like this. in fact, the test was added not for x86 at all, but for people using broken vendor kernels on non-x86 hardware. so both restricting the test and dropping it are wrong. file a bug with your vendor instead.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e1643a784e0c0626933e93d60e6dc229deb0757e

commit e1643a784e0c0626933e93d60e6dc229deb0757e
Author:     Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2018-01-20 19:14:07 +0000
Commit:     Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2018-01-20 19:14:23 +0000

    sys-libs/glibc: Re-add check for bug 279260, see bug 645016 comment 2

    Bug: https://bugs.gentoo.org/279260
    Bug: https://bugs.gentoo.org/645016
    Package-Manager: Portage-2.3.19, Repoman-2.3.6

 sys-libs/glibc/glibc-9999.ebuild | 6 ++++++
 1 file changed, 6 insertions(+)
I think you misunderstood:

- the kernel is not crashing. It's just displaying a message that scarily looks like an oops.
- the kernel is not broken. It's mainline, even had it with gentoo-sources. It's behaving exactly as intended: https://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/arm64/kernel/traps.c#L530.
- the only thing on that system triggering that kernel message is your test. glibc isn't actively trying anything.

Apparently aarch64 upstream wants to know when userland starts issuing unimplemented syscalls. The ebuild trying to protect users from broken kernels is the one thing that makes it look as if something is broken, causing unnecessary confusion.
(In reply to Michael Weiser from comment #4)

you're unfamiliar with how glibc actually works. it frequently attempts syscalls that don't exist and then falls back when it detects they don't. which is basically what this test does because that behavior has been known to break, not only on vendor kernels, but sometimes with mainline kernels.

if you don't like the kernel message, take it up with LKML. there's no way the test can attach extra metadata to that specific message since it isn't the test logging things.
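To illustrate the try-then-fall-back pattern described in this comment, here is a rough sketch. It is not glibc's code and not the ebuild's test, only the general shape: issue a syscall, and if the kernel answers ENOSYS, use an older path instead. The names `raw_syscall` and `call_with_fallback` are hypothetical, and the number 1000 is taken from the kernel log in this bug on the assumption that it is unimplemented on the running kernel.

```python
import ctypes
import errno
import os

# Number seen in the kernel log of this bug ("syscall 1000"); assumed here
# to be unimplemented on the running kernel.
UNIMPLEMENTED_NR = 1000


def raw_syscall(nr, *args):
    """Issue a raw syscall through libc's syscall(2); return (result, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ctypes.set_errno(0)
    res = libc.syscall(ctypes.c_long(nr), *args)
    return res, ctypes.get_errno()


def call_with_fallback(primary, fallback):
    """Try `primary`; if the kernel reports ENOSYS, use `fallback` instead.

    `primary` returns a (result, errno) pair, `fallback` returns a result.
    Any error other than ENOSYS is raised as OSError.
    """
    res, err = primary()
    if res == -1 and err == errno.ENOSYS:
        # The kernel does not implement the call: take the older code path.
        return fallback()
    if res == -1:
        raise OSError(err, os.strerror(err))
    return res
```

As a usage example, `call_with_fallback(lambda: raw_syscall(UNIMPLEMENTED_NR), os.getpid)` should return the PID through the fallback on a kernel that answers ENOSYS. On the aarch64 kernels discussed here, that probe is also what would produce the log entry from the original report.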
if your kernel isn't actually crashing, then it's more of an UPSTREAM issue
here's the upstream thread Michael started:
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/555397.html

looks like they'll be changing the aarch64 kernel to not do this.