The following partial kernel log was transcribed from a photo of the screen; the full logs (as far as they were visible on the screen) are attached as images. The kernel panics early during boot, before init/userspace is reached.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: dell_set_arguments+0x7/0x40
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 0 PID: 45 Comm: kworker/0:1 Not tainted 4.15.0-gentoo #6
Hardware name: Dell Inc. Latitude E5570/0R75KF, BIOS 1.18.6 12/08/2017
Workqueue: events azx_probe_work
RIP: 0010:dell_set_arguments+0x7/0x40
<...>
Call Trace:
 dell_micmute_led_set+0x31/0x54
 alc_fixup_dell_wmi+0x3f/0xd0
 apply_fixup+0xea/0x180
 patch_alc269+0x336/0x5c0
 hda_codec_driver_probe+0x46/0xd0
 driver_probe_device+0x241/0x330
 ? __driver_attach+0x90/0x90
 bus_for_each_drv+0x70/0xb0
 __device_attach+0xe5/0x140
 bus_probe_device+0x82/0x90
 device_add+0x3a8/0x5d0
 snd_hdac_device_register+0xd/0x40
 snd_hda_codec_configure+0x32/0x130
 azx_codec_configure+0x2a/0x60
 azx_probe_work+0x43c/0x8f0
 process_one_work+0x17c/0x2f0
 worker_thread+0x2x/0x380
 ? process_one_work+0x2f0/0x2f0
 kthread+0x106/0x120
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x35/0x40
Code: 48 f4 08 01 bf 40 00 00 00 48 c7 c2 e0 a9 c8 a9 48 89 c1 e8 7c e0 8c ff eb d0 66 2e 0f 1f 84 00 00 00 00 00 48 8b 05 19 e1 a0 01 <48> c7 00 00 00 00 00 48 c7 40 10 00 00 00 00 89 78 04 48 c7 40
RIP: dell_set_arguments+0x7/0x40 RSP: ffffb5d1001c7c50
<...>
Kernel panic - not syncing: Fatal exception

Kernel: sys-kernel/gentoo-sources-4.15.0:4.15.0 experimental -build -symlink
Compiler: sys-devel/gcc-7.2.0-r1:7.2.0 cxx fortran go mpx nptl openmp pie sanitize ssp vtv -altivec -awt -cilk -debug -doc -fixed-point -gcj -graphite -hardened -jit -libssp -multilib -nls -objc -objc++ -objc-gc -pch -pgo -regression-test -vanilla

Other failure variants also occur: a freeze of about 30 seconds with kthread starvation messages, or panics with multiple call traces and NMIs (see attachments).
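For readers of the log: the `Oops: 0002` field is the x86 page-fault error code printed in hex. Below is a minimal sketch of decoding it, assuming the standard x86 error-code bit layout; the struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper type: one flag per x86 page-fault error-code bit. */
struct pf_error {
	int protection;        /* bit 0: 0 = page not present, 1 = protection violation */
	int write;             /* bit 1: 0 = read access, 1 = write access */
	int user;              /* bit 2: 0 = kernel mode, 1 = user mode */
	int reserved_bit;      /* bit 3: reserved bit set in a page-table entry */
	int instruction_fetch; /* bit 4: fault happened on an instruction fetch */
};

/* Split the numeric error code into its individual flags. */
struct pf_error decode_pf_error(unsigned long code)
{
	struct pf_error e;

	e.protection = (int)((code >> 0) & 1);
	e.write = (int)((code >> 1) & 1);
	e.user = (int)((code >> 2) & 1);
	e.reserved_bit = (int)((code >> 3) & 1);
	e.instruction_fetch = (int)((code >> 4) & 1);
	return e;
}

/* Print a one-line, human-readable summary of the error code. */
void print_pf_error(unsigned long code)
{
	struct pf_error e = decode_pf_error(code);

	printf("%04lx: %s, %s, %s mode\n", code,
	       e.protection ? "protection violation" : "page not present",
	       e.write ? "write" : "read",
	       e.user ? "user" : "kernel");
}
```

Read this way, 0002 is a write in kernel mode to a page that is not present. That fits a store through a NULL pointer, and the faulting bytes marked `<48> c7 00 ...` in the Code line look like exactly such a store.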
Created attachment 517610 [details] null pointer dereference screen
Created attachment 517612 [details] multi calltrace screen
Created attachment 517614 [details] freeze screen (no kallsyms)
Created attachment 517616 [details] kernel configuration
Note: the configuration includes an EFI stub and a compiled-in cmdline (redacted).
I will soon try to bisect the bug using the linux-stable repository. I started doing so a day ago, but did not get through it due to insufficient time.

However, strange things happened shortly after the unsuccessful kernel upgrade. I rolled back to a known-good 4.14.12 kernel and hit another issue that had not happened before: when waking up from suspend-to-ram, my machine instantly rebooted (reproduced consistently). The BIOS settings interface also froze unusually a couple of times (long freezes, a minute or so), but after switching the POST diagnostic mode from minimal to thorough I can no longer reproduce the suspend-to-ram problem.

Having said all that, I'm not really sure that bisecting will be reliable, since it appears that some state is preserved across kernel switches. I mean, I already have a previously reliable kernel giving me a new problem (reboots on wakeup). It might be an unrelated hardware failure, but it seems like too big a coincidence to me.

Anyway, I would really appreciate some guidance here.
Upstream patch (not merged so far): https://patchwork.kernel.org/patch/10194287/
Also see https://lkml.org/lkml/2018/2/3/113
I rebased the patch for 4.15; it's attached.
Created attachment 517766 [details, diff] patch, suitable for epatch() on 4.15
The patch is merged into mainline: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cbd7b8a76b79a2ff6112ef2e77031b694843b8a1
thanks
patch upstreamed in 4.15.7 https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.15.7
still not upstream
added in gentoo-sources-4.15.7
There is a bug in genpatches-4.15.8.base, in 2901_allocate_buffer_on_heap_rather_than_globally.patch. It makes the 4.15.7 kernel build fail with:

drivers/platform/x86/dell-laptop.c: In function ‘dell_rfkill_set’:
drivers/platform/x86/dell-laptop.c:441:2: error: implicit declaration of function ‘dell_fill_request’; did you mean ‘dell_send_request’? [-Werror=implicit-function-declaration]
  dell_fill_request(&buffer, 0, 0, 0, 0);
> There is a bug in genpatches-4.15.8.base

In the original patch (the function is renamed):

-static void dell_set_arguments(u32 arg0, u32 arg1, u32 arg2, u32 arg3)
+static void dell_fill_request(struct calling_interface_buffer *buffer,
+                              u32 arg0, u32 arg1, u32 arg2, u32 arg3)

In 2901_allocate_buffer_on_heap_rather_than_globally.patch:

-void dell_set_arguments(u32 arg0, u32 arg1, u32 arg2, u32 arg3)
+void dell_set_arguments(struct calling_interface_buffer *buffer,
+                        u32 arg0, u32 arg1, u32 arg2, u32 arg3)
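For context, the change these two hunks are carrying can be sketched in plain user-space C. This is only an illustration under assumptions, not the driver's code: the struct layout is assumed, the `sketch_*` names and `shared_buffer` are hypothetical, and the NULL check in the old-style function is added here so the sketch reports the problem instead of faulting the way the oops did.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, for illustration only; not copied from the kernel headers. */
struct calling_interface_buffer {
	uint16_t cmd_class;
	uint16_t cmd_select;
	uint32_t input[4];
	uint32_t output[4];
};

/* Old pattern: one shared buffer that some init code must allocate first. */
struct calling_interface_buffer *shared_buffer; /* NULL until allocated */

/*
 * Hypothetical stand-in for the old-style helper without a buffer parameter.
 * Returns -1 if the shared buffer was never allocated, 0 on success.
 */
int sketch_set_arguments(uint32_t arg0, uint32_t arg1,
			 uint32_t arg2, uint32_t arg3)
{
	if (shared_buffer == NULL)
		return -1;
	memset(shared_buffer, 0, sizeof(*shared_buffer));
	shared_buffer->input[0] = arg0;
	shared_buffer->input[1] = arg1;
	shared_buffer->input[2] = arg2;
	shared_buffer->input[3] = arg3;
	return 0;
}

/*
 * Hypothetical stand-in for the new-style helper: the caller owns the
 * buffer and passes it in, so there is no dependency on shared state.
 */
void sketch_fill_request(struct calling_interface_buffer *buffer,
			 uint32_t arg0, uint32_t arg1,
			 uint32_t arg2, uint32_t arg3)
{
	assert(buffer != NULL);
	memset(buffer, 0, sizeof(*buffer));
	buffer->input[0] = arg0;
	buffer->input[1] = arg1;
	buffer->input[2] = arg2;
	buffer->input[3] = arg3;
}
```

A caller of the new-style helper would declare a local `struct calling_interface_buffer buffer;` and pass `&buffer`, which matches the shape of the `dell_fill_request(&buffer, 0, 0, 0, 0);` call in the build error. Since the callers use the new name, a patch that keeps the old name for the definition leaves them without a declaration.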
Oh, I missed that. I will add a revision within the hour.
Added. Can someone confirm that it works?
> can confirm me that it work?

Well, it definitely works on 4.15.2, since I'm using the exact same patch via epatch. The only changed file is dell-laptop.c, and it hasn't been touched between 4.15.2 and 4.15.7, so everything should be fine. I will be able to try 4.15.7-r1 once egencache has completed; it takes some time.
> I will be able to try 4.15.7-r1

Compiled and running without problems.
ok so we can close this