645016 – sys-libs/glibc: check for broken kernel causes oops-like kernel log entries on non-broken kernels

Status:	RESOLVED UPSTREAM

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	ARM64 Linux

Importance:	Normal normal
Assignee:	Gentoo Toolchain Maintainers

URL:	http://lists.infradead.org/pipermail/...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2018-01-19 10:41 UTC by Michael Weiser
Modified:	2018-01-22 19:01 UTC
CC List:	0 users

See Also:	279260
Package list:
Runtime testing required:	---

Attachments
run x86 syscall test on x86 only (glibc-2.26-r5.ebuild-x86-syscall-test.patch, 601 bytes, patch) 2018-01-19 10:41 UTC, Michael Weiser

Description Michael Weiser 2018-01-19 10:41:03 UTC

Created attachment 515300
run x86 syscall test on x86 only

Hello,

the check for a broken vendor kernel introduced in bug https://bugs.gentoo.org/279260 causes oops-like kernel log entries on aarch64:

[  189.143682] glibc-test[2118]: syscall 1000
[  189.143728] Code: aa0503e4 aa0603e5 aa0703e6 d4000001 (b13ffc1f) 
[  189.143750] CPU: 1 PID: 2118 Comm: glibc-test Not tainted 4.15.0-rc7-00232-g2c1cfa499018 #3
[  189.143755] Hardware name: SoPine with baseboard (DT)
[  189.143762] pstate: 80000000 (Nzcv daif -PAN -UAO)
[  189.143774] pc : 0xffffb8fb0104
[  189.143779] lr : 0xaaaab43c563c
[  189.143781] sp : 0000ffffd4fa1180
[  189.143786] x29: 0000ffffd4fa1190 x28: 0000000000000000 
[  189.143795] x27: 0000000000000000 x26: 0000000000000000 
[  189.143802] x25: 0000000000000000 x24: 0000000000000000 
[  189.143809] x23: 0000000000000000 x22: 0000000000000000 
[  189.143816] x21: 0000aaaab43c564c x20: 0000000000000000 
[  189.143823] x19: 0000aaaab43c5770 x18: 0000000000000a03 
[  189.143829] x17: 0000aaaab43d6020 x16: 0000ffffb8fb00e0 
[  189.143837] x15: 0000ffffb8ed4000 x14: 0000ffffb8ed7540 
[  189.143844] x13: 0000ffffb8ee45d8 x12: 0000000000000000 
[  189.143851] x11: 0000000000000020 x10: 0000000000000000 
[  189.143857] x9 : 00000000000000ff x8 : 00000000000003e8 
[  189.143864] x7 : e607cc2262a01600 x6 : e607cc2262a01600 
[  189.143872] x5 : 0000ffffd4fa12c0 x4 : 0000000000000000 
[  189.143879] x3 : 0000000000000000 x2 : 0000aaaab43c5630 
[  189.143886] x1 : 0000ffffd4fa12d8 x0 : 0000ffffd4fa12c8 

They can be disabled via /proc/sys/debug/exception-trace. This, however, would also hide legitimate reports of actual problems of this kind, as well as various other kinds of misbehaviour. Strangely enough, x86_64 does not produce this kind of message even though the exception-trace sysctl is on by default.
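
As a cross-check of the log above: on AArch64 Linux the syscall number is passed in register x8, and the dump shows x8 = 0x3e8, which is 1000 and matches the "syscall 1000" line. A minimal sketch of that decoding follows; the helper names `parse_registers` and `syscall_number` are made up for illustration, and the sample is two lines copied from the dump.

```python
import re

# Matches register fields such as "x8 : 00000000000003e8" or "x29: 0000ffffd4fa1190".
_REG = re.compile(r"\b(x\d+)\s*:\s*([0-9a-fA-F]+)\b")


def parse_registers(dump):
    """Return a {register name: integer value} mapping from a kernel register dump."""
    return {name: int(value, 16) for name, value in _REG.findall(dump)}


def syscall_number(dump):
    """AArch64 Linux passes the syscall number in x8."""
    return parse_registers(dump)["x8"]


# Two lines taken from the log in this report.
sample = """\
[  189.143857] x9 : 00000000000000ff x8 : 00000000000003e8
[  189.143864] x7 : e607cc2262a01600 x6 : e607cc2262a01600
"""

print(syscall_number(sample))  # 1000
```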

Since the check is a number of years old and only relevant to a small window of kernel versions on x86, I propose to either remove it or apply it on x86 only.

Comment 1 Andreas K. Hüttel archtester

2018-01-19 19:28:55 UTC

It will be gone in 2.27 (due out at the end of the month).

Comment 2 SpanKY gentoo-dev

2018-01-20 18:02:36 UTC

the whole point of that test is to locate broken kernels exactly like this because glibc will actively try to use newer syscalls than the kernel might support which in turn will lead to crashes like this.

in fact, the test was added not for x86 at all, but for people using broken vendor kernels on non-x86 hardware.  so both restricting the test and dropping it are wrong.

file a bug with your vendor instead.

Comment 3 Larry the Git Cow gentoo-dev

2018-01-20 19:14:35 UTC

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e1643a784e0c0626933e93d60e6dc229deb0757e

commit e1643a784e0c0626933e93d60e6dc229deb0757e
Author:     Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2018-01-20 19:14:07 +0000
Commit:     Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2018-01-20 19:14:23 +0000

    sys-libs/glibc: Re-add check for bug 279260, see bug 645016 comment 2
    
    Bug: https://bugs.gentoo.org/279260
    Bug: https://bugs.gentoo.org/645016
    Package-Manager: Portage-2.3.19, Repoman-2.3.6

 sys-libs/glibc/glibc-9999.ebuild | 6 ++++++
 1 file changed, 6 insertions(+)

Comment 4 Michael Weiser 2018-01-20 21:33:41 UTC

I think you misunderstood:

- the kernel is not crashing. It's just displaying a message that scarily looks like an oops.

- the kernel is not broken. It's mainline, and I even saw the same thing with gentoo-sources. It's behaving exactly as intended: https://elixir.free-electrons.com/linux/v4.15-rc8/source/arch/arm64/kernel/traps.c#L530.

- the only thing on that system triggering that kernel message is your test. glibc isn't actively trying anything. Apparently aarch64 upstream wants to know when userland starts issuing unimplemented syscalls.

The ebuild trying to protect users from broken kernels is the one thing that makes it look as if something is broken, causing unnecessary confusion.
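
For readers who have not seen the check, here is a rough sketch of what a probe of this kind might look like. It is not the ebuild's actual test: the assumption (based only on the "syscall 1000" log line) is that it issues syscall number 1000 and expects the kernel to reject it with ENOSYS. The names `raw_syscall` and `kernel_rejects_cleanly` are hypothetical.

```python
import ctypes
import errno

PROBE_SYSCALL_NR = 1000  # number seen in the kernel log; assumed to be unimplemented


def raw_syscall(nr):
    """Issue syscall `nr` with no arguments through libc; return (result, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    ctypes.set_errno(0)
    result = libc.syscall(ctypes.c_long(nr))
    return result, ctypes.get_errno()


def kernel_rejects_cleanly(nr=PROBE_SYSCALL_NR, do_syscall=raw_syscall):
    """True if the kernel answers an unknown syscall with -1 / ENOSYS.

    `do_syscall` is injectable so the decision logic can be exercised
    without touching the real kernel.
    """
    result, err = do_syscall(nr)
    return result == -1 and err == errno.ENOSYS
```

Calling `kernel_rejects_cleanly()` with no arguments on a Linux system issues the real syscall. On the kernel described in this report that call still returns ENOSYS to the process; the log entry is a side effect on the kernel side and does not change what the probe sees.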

Comment 5 SpanKY gentoo-dev

2018-01-21 02:40:48 UTC

(In reply to Michael Weiser from comment #4)

you're unfamiliar with how glibc actually works.  it frequently attempts syscalls that don't exist and then falls back when it detects they don't.  which is basically what this test does because that behavior has been known to break, not only on vendor kernels, but sometimes with mainline kernels.
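
The try-then-fall-back pattern described above can be illustrated roughly as follows. This is a Python sketch of the general idea only, not glibc code (glibc does this internally in C); `call_with_fallback`, `new_style_call` and `old_style_call` are hypothetical names.

```python
import errno


def call_with_fallback(preferred, fallback):
    """Try the newer call first; fall back only if the kernel says ENOSYS."""
    try:
        return preferred()
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # any other failure is a real error, not a missing syscall
        return fallback()


def new_style_call():
    # Stand-in for a syscall the running kernel does not implement.
    raise OSError(errno.ENOSYS, "Function not implemented")


def old_style_call():
    # Stand-in for the older interface that is always available.
    return "result from older syscall"


print(call_with_fallback(new_style_call, old_style_call))
```

The pattern relies on the kernel reporting ENOSYS for an unknown syscall number, and that is the property the test is meant to confirm.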

if you don't like the kernel message, take it up with LKML.  there's no way the test can attach extra metadata to that specific message since it isn't the test logging things.

Comment 6 SpanKY gentoo-dev

2018-01-21 03:34:57 UTC

if your kernel isn't actually crashing, then it's more of an UPSTREAM issue

Comment 7 SpanKY gentoo-dev

2018-01-22 19:01:29 UTC

here's the upstream thread Michael started:
  http://lists.infradead.org/pipermail/linux-arm-kernel/2018-January/555397.html

looks like they'll be changing the aarch64 kernel to not do this.