Summary: | io_submit syscall oops on alpha linux 4.19.0+ | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Dmitry V. Levin <gentoo.dl> |
Component: | Current packages | Assignee: | Alpha Porters <alpha> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | slyfox |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | Alpha | ||
OS: | Linux | ||
URL: | https://lkml.org/lkml/2018/12/30/141 | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Attachments: | 0001-alpha-fix-page-fault-handling-for-r16-r18-targets.patch |
Description
Dmitry V. Levin
2018-11-27 06:00:57 UTC
Confirmed it locally in qemu-system-alpha. Bisected down to:

commit 95af8496ac48263badf5b8dde5e06ef35aaace2b
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat May 26 19:43:16 2018 -0400

aio: shift copyin of iocb into io_submit_one()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

:040000 040000 20dd44ac4706540b1c1d4085e4269bd8590f4e80 05d477161223e5062f2f781b462e0222c733fe3d M fs

I poked at it a bit more. I think this commit only exposed a latent bug, either in gcc's code generation or in the register effects declared by the alpha assembly macros in arch/alpha. I'm trying to pinpoint the exact place where I can isolate what goes wrong. So far, adding printk() statements after this 'get_user' call in 'SYSCALL_DEFINE3(io_submit, ...' makes the bug disappear:

if (unlikely(get_user(user_iocb, iocbpp + i))) {
	ret = -EFAULT;
	goto err;
}

Looking more into it.

Created attachment 559046: 0001-alpha-fix-page-fault-handling-for-r16-r18-targets.patch. This patch fixes the kernel crash for me in qemu-system-alpha. I proposed the patch upstream as: https://lkml.org/lkml/2018/12/30/141

Tobias applied the patch on monolith, and I ran 'make check' from strace git master. The machine seems to have survived.

For the record, the machine always survived those calls (we don't panic on oops). However, the calls would leave behind an aio kernel thread stuck in D state (uninterruptible sleep). These threads accumulated over time and produced bogus load averages. With the patch applied, the strace test suite no longer causes stuck aio threads or oopses, so we're good. The affected tests pass on monolith now, thanks!

The patch is now in the master branch and will be in Linux 5.0.
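The crash sits on the path where get_user() reads an entry of the iocbpp array passed to io_submit and faults. A minimal user-space sketch that deliberately takes that fault path follows. It assumes the oops is triggered when get_user() faults on a user address (consistent with the patch title, but not stated in the report), and it is not claimed to be what the strace test suite does. The function name probe_io_submit_efault is hypothetical. It sets up an AIO context with raw syscalls, then passes the address of a page that was just unmapped as iocbpp, so the kernel has to resolve the fault through its exception fixup and return EFAULT instead of oopsing.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>   /* aio_context_t */
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical probe (illustration only): call io_submit() with an
 * iocbpp array pointing at an unmapped user page, so the kernel's
 * get_user(user_iocb, iocbpp + i) takes a page fault that must be
 * handled by the exception fixup code.
 *
 * Returns the errno reported by io_submit() (EFAULT is expected on a
 * healthy kernel), 0 if io_submit() unexpectedly succeeded, or -1 if
 * the probe could not be set up in this environment.
 */
int probe_io_submit_efault(void)
{
    aio_context_t ctx = 0;
    long page = sysconf(_SC_PAGESIZE);
    int result = -1;

    if (syscall(SYS_io_setup, 1, &ctx) != 0)
        return -1;  /* AIO unavailable here (ENOSYS, EPERM, EAGAIN, ...) */

    /* Map a page and unmap it again: in this single-threaded sketch the
     * address should now be a plausible but unmapped user address, so
     * the kernel accepts it as a user pointer and then faults on it. */
    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED && munmap(p, (size_t)page) == 0) {
        long ret = syscall(SYS_io_submit, ctx, 1L, p);
        result = (ret == -1) ? errno : 0;  /* capture errno before cleanup */
    }

    syscall(SYS_io_destroy, ctx);
    return result;
}
```

On a kernel with working fault handling the probe should report EFAULT. A return value of -1 only means the sketch could not create an AIO context in the current environment (for example, under a restrictive seccomp policy), not that the kernel is affected. Unmapping a freshly mapped page is used instead of an arbitrary bad pointer such as a kernel address, because a kernel address would be rejected by the user-pointer range check before any page fault occurs.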