Bug 646438 - =sys-kernel/gentoo-sources-4.15.0: BUG: unable to handle kernel NULL pointer dereference at (null)
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://git.kernel.org/pub/scm/linux/...
Whiteboard:
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2018-02-02 19:32 UTC by Alexander Sergeyev
Modified: 2018-03-01 19:28 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
null pointer dereference screen (null-pointer-dereference.png,116.69 KB, image/png)
2018-02-02 19:32 UTC, Alexander Sergeyev
multi calltrace screen (nmi.png,98.00 KB, image/png)
2018-02-02 19:33 UTC, Alexander Sergeyev
freeze screen (no kallsyms) (kthread-starvation.png,90.14 KB, image/png)
2018-02-02 19:35 UTC, Alexander Sergeyev
kernel configuration (kconfig,111.13 KB, text/plain)
2018-02-02 19:39 UTC, Alexander Sergeyev
patch, suitable for epatch() on 4.15 (v2-platform-x86-dell-laptop-Allocate-buffer-on-heap-rather-than-globally.patch,14.60 KB, patch)
2018-02-04 10:06 UTC, Alexander Sergeyev

Description Alexander Sergeyev 2018-02-02 19:32:08 UTC
The following partial kernel log was transcribed from a screen image; the full logs (as far as they were visible on the screen) are attached as images.
Kernel panics happen early during bootup (init/userspace is not reached).

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: dell_set_arguments+0x7/0x40
PGD 0 P4D 0 
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 0 PID: 45 Comm: kworker/0:1 Not tainted 4.15.0-gentoo #6
Hardware name: Dell Inc. Latitude E5570/0R75KF, BIOS 1.18.6 12/08/2017
Workqueue: events azx_probe_work
RIP: 0010:dell_set_arguments+0x7/0x40
<...>
Call Trace:
dell_micmute_led_set+0x31/0x54
alc_fixup_dell_wmi+0x3f/0xd0
apply_fixup+0xea/0x180
patch_alc269+0x336/0x5c0
hda_codec_driver_probe+0x46/0xd0
driver_probe_device+0x241/0x330
? __driver_attach+0x90/0x90
bus_for_each_drv+0x70/0xb0
__device_attach+0xe5/0x140
bus_probe_device+0x82/0x90
device_add+0x3a8/0x5d0
snd_hdac_device_register+0xd/0x40
snd_hda_codec_configure+0x32/0x130
azx_codec_configure+0x2a/0x60
azx_probe_work+0x43c/0x8f0
process_one_work+0x17c/0x2f0
worker_thread+0x2x/0x380
? process_one_work+0x2f0/0x2f0
kthread+0x106/0x120
? __kthread_bind_mask+0x60/0x60
ret_from_fork+0x35/0x40
Code: 48 f4 08 01 bf 40 00 00 00 48 c7 c2 e0 a9 c8 a9 48 89 c1 e8 7c e0 8c ff eb d0 66 2e 0f 1f 84 00 00 00 00 00 48 8b 05 19 e1 a0 01 <48> c7 00 00 00 00 00 48 c7 40 10 00 00 00 00 89 78 04 48 c7 40
RIP: dell_set_arguments+0x7/0x40 RSP: ffffb5d1001c7c50
<...>
Kernel panic - not syncing: Fatal exception
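
The "Oops: 0002" field in the log above is the x86 page-fault error code. As a reading aid, here is a minimal sketch of how its low three bits are interpreted. The struct and function names (`pf_info`, `decode_pf_error`, `print_pf_error`) are hypothetical helpers written for this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Low bits of the x86 page-fault error code shown after "Oops:". */
#define PF_PROT  (1u << 0) /* set: protection violation; clear: page not present */
#define PF_WRITE (1u << 1) /* set: write access; clear: read access */
#define PF_USER  (1u << 2) /* set: fault in user mode; clear: kernel mode */

/* Hypothetical container for the decoded bits (not a kernel type). */
struct pf_info {
	bool protection;
	bool write;
	bool user;
};

static struct pf_info decode_pf_error(unsigned int code)
{
	struct pf_info info = {
		.protection = (code & PF_PROT) != 0,
		.write = (code & PF_WRITE) != 0,
		.user = (code & PF_USER) != 0,
	};
	return info;
}

/* Print a one-line summary of the decoded error code to the given stream. */
static void print_pf_error(FILE *out, unsigned int code)
{
	struct pf_info info = decode_pf_error(code);

	assert(out != NULL);
	fprintf(out, "%s access to a %s page in %s mode\n",
		info.write ? "write" : "read",
		info.protection ? "protected" : "not-present",
		info.user ? "user" : "kernel");
}
```

For this report, `decode_pf_error(0x0002)` yields a kernel-mode write to a not-present page. That agrees with the faulting bytes marked `<48> c7 00 ...` in the Code: line, which encode a store through a register that held zero.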

Kernel: sys-kernel/gentoo-sources-4.15.0:4.15.0 experimental -build -symlink
Compiler: sys-devel/gcc-7.2.0-r1:7.2.0 cxx fortran go mpx nptl openmp pie sanitize ssp vtv -altivec -awt -cilk -debug -doc -fixed-point -gcj -graphite -hardened -jit -libssp -multilib -nls -objc -objc++ -objc-gc -pch -pgo -regression-test -vanilla

Other failure variants also occur, including a ~30-second freeze with kthread starvation messages and multi-calltrace panics with NMI (see attachments).
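
One plausible reading of the trace, sketched in user-space C: the helper writes through a file-scope buffer pointer that is only set by an init routine, so a caller arriving earlier (here, the audio codec fixup) hits a NULL pointer. Everything below is an assumption made for illustration. The struct layout is simplified, `model_init` and `set_arguments` are hypothetical names, and the explicit NULL check stands in for the point where unguarded code would fault.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the driver's request buffer (assumed layout). */
struct calling_interface_buffer {
	uint16_t cmd_class;
	uint16_t cmd_select;
	uint32_t input[4];
	uint32_t output[4];
};

/* File-scope pointer: stays NULL until model_init() has run. */
static struct calling_interface_buffer *buffer;
static struct calling_interface_buffer storage;

/* Hypothetical init routine that makes the shared buffer available. */
static void model_init(void)
{
	buffer = &storage;
}

/*
 * Fill the shared buffer with four arguments.
 * Returns 0 on success, or -1 in the situation where code without
 * this check would write through a NULL pointer.
 */
static int set_arguments(uint32_t arg0, uint32_t arg1,
			 uint32_t arg2, uint32_t arg3)
{
	if (buffer == NULL)
		return -1;

	memset(buffer, 0, sizeof(*buffer));
	buffer->input[0] = arg0;
	buffer->input[1] = arg1;
	buffer->input[2] = arg2;
	buffer->input[3] = arg3;
	return 0;
}
```

Calling `set_arguments()` before `model_init()` returns -1 in this model; without the check, the `memset()` would be the first write through the NULL pointer.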
Comment 1 Alexander Sergeyev 2018-02-02 19:32:55 UTC
Created attachment 517610 [details]
null pointer dereference screen
Comment 2 Alexander Sergeyev 2018-02-02 19:33:53 UTC
Created attachment 517612 [details]
multi calltrace screen
Comment 3 Alexander Sergeyev 2018-02-02 19:35:45 UTC
Created attachment 517614 [details]
freeze screen (no kallsyms)
Comment 4 Alexander Sergeyev 2018-02-02 19:39:14 UTC
Created attachment 517616 [details]
kernel configuration

Note: the configuration includes the EFI stub and a compiled-in command line (redacted).
Comment 5 Alexander Sergeyev 2018-02-02 20:06:50 UTC
I will soon try to bisect the bug using the linux-stable repository. I started doing so a day ago, but could not finish because I did not have enough time.

However, strange things happened shortly after the unsuccessful kernel upgrade. I rolled back to a known-good 4.14.12 kernel and ran into another issue that had not happened before: when waking up from suspend-to-RAM, my machine instantly rebooted (reproducible). The BIOS settings interface also froze unusually a couple of times (long freezes, about a minute or so), but after switching the POST diagnostic mode from minimal to thorough I can no longer reproduce the suspend-to-RAM problem.

Having said all that, I'm not really sure that bisecting will be reliable, since it appears that some state is preserved across kernel switches. I mean, I already have a previously reliable kernel giving me a new problem (reboots during wakeups). It might be an unrelated hardware failure, but it sure seems like too big of a coincidence to me.

Anyway, I would really appreciate some guidance here.
Comment 6 Alexander Sergeyev 2018-02-04 10:04:49 UTC
Upstream patch (not merged so far): https://patchwork.kernel.org/patch/10194287/
Also see https://lkml.org/lkml/2018/2/3/113

I rebased the patch onto 4.15; it is attached.
Comment 7 Alexander Sergeyev 2018-02-04 10:06:26 UTC
Created attachment 517766 [details, diff]
patch, suitable for epatch() on 4.15
Comment 9 Arisu Tachibana Gentoo Infrastructure gentoo-dev 2018-02-08 07:35:21 UTC
thanks
Comment 10 Arisu Tachibana Gentoo Infrastructure gentoo-dev 2018-02-28 19:05:03 UTC
patch upstreamed in 4.15.7
https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.15.7
Comment 11 Arisu Tachibana Gentoo Infrastructure gentoo-dev 2018-02-28 19:27:35 UTC
still not upstream
Comment 12 Arisu Tachibana Gentoo Infrastructure gentoo-dev 2018-02-28 20:14:51 UTC
added in gentoo-sources-4.15.7
Comment 13 Jiri Netolicky 2018-03-01 09:16:16 UTC
There is a bug in genpatches-4.15.8.base, in 2901_allocate_buffer_on_heap_rather_than_globally.patch.

It makes the 4.15.7 kernel build fail with:

drivers/platform/x86/dell-laptop.c: In function ‘dell_rfkill_set’:
drivers/platform/x86/dell-laptop.c:441:2: error: implicit declaration of function ‘dell_fill_request’; did you mean ‘dell_send_request’? [-Werror=implicit-function-declaration]
  dell_fill_request(&buffer, 0, 0, 0, 0);
Comment 14 Alexander Sergeyev 2018-03-01 09:58:25 UTC
> There is a bug in genpatches-4.15.8.base

In the original patch (the function is renamed):

-static void dell_set_arguments(u32 arg0, u32 arg1, u32 arg2, u32 arg3)
+static void dell_fill_request(struct calling_interface_buffer *buffer,
+                              u32 arg0, u32 arg1, u32 arg2, u32 arg3)

In 2901_allocate_buffer_on_heap_rather_than_globally.patch:

-void dell_set_arguments(u32 arg0, u32 arg1, u32 arg2, u32 arg3)
+void dell_set_arguments(struct calling_interface_buffer *buffer,
+        u32 arg0, u32 arg1, u32 arg2, u32 arg3)
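
The mismatch explains the build error: the callers in the patched file use the new name `dell_fill_request`, while the genpatches hunk still defines the function under the old name, so the compiler sees an implicit declaration. Below is a minimal user-space sketch of the consistent form, with the definition and the call site agreeing on both the name and the buffer parameter. The struct layout is a simplified assumption, the function body is a guess at what the helper does, and `example_caller` is a hypothetical function modelled on the failing line in `dell_rfkill_set`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the driver's request buffer (assumed layout). */
struct calling_interface_buffer {
	uint16_t cmd_class;
	uint16_t cmd_select;
	uint32_t input[4];
	uint32_t output[4];
};

/*
 * Name and parameter list follow the upstream hunk quoted above:
 * the buffer is passed in by the caller instead of being shared.
 */
static void dell_fill_request(struct calling_interface_buffer *buffer,
			      uint32_t arg0, uint32_t arg1,
			      uint32_t arg2, uint32_t arg3)
{
	assert(buffer != NULL);
	memset(buffer, 0, sizeof(*buffer));
	buffer->input[0] = arg0;
	buffer->input[1] = arg1;
	buffer->input[2] = arg2;
	buffer->input[3] = arg3;
}

/* Hypothetical caller: owns its buffer and uses the same name as the definition. */
static uint32_t example_caller(void)
{
	struct calling_interface_buffer buffer;

	dell_fill_request(&buffer, 0, 0, 0, 0);
	return buffer.input[0];
}
```

Because each caller supplies its own buffer, the helper no longer depends on any shared pointer having been set up beforehand.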
Comment 15 Arisu Tachibana Gentoo Infrastructure gentoo-dev 2018-03-01 13:35:16 UTC
Oh, I missed that.

I will add a revision in an hour or so.
Comment 16 Arisu Tachibana Gentoo Infrastructure gentoo-dev 2018-03-01 14:35:06 UTC
Added. Can someone confirm that it works?
Comment 17 Alexander Sergeyev 2018-03-01 17:14:56 UTC
> Can someone confirm that it works?

Well, it definitely works on 4.15.2, since I'm using the exact same patch via epatch. The only changed file is dell-laptop.c, and it hasn't been touched between 4.15.2 and 4.15.7, so everything should be fine. I will be able to try 4.15.7-r1 once egencache completes, which takes some time.
Comment 18 Alexander Sergeyev 2018-03-01 19:19:02 UTC
> I will be able to try 4.15.7-r1

Compiled and running without problems.
Comment 19 Arisu Tachibana Gentoo Infrastructure gentoo-dev 2018-03-01 19:28:36 UTC
ok so we can close this