Bug 932391 - x11-drivers/nvidia-drivers-550.78 oops on ACPI events
Summary: x11-drivers/nvidia-drivers-550.78 oops on ACPI events
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Ionen Wolkens
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-21 23:16 UTC by Hank Leininger
Modified: 2024-05-22 00:34 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
oops w/6.6.30 + nvidia-drivers-550.78 (oops_6.6.30_nvidia-550.78, 5.29 KB, text/plain)
2024-05-21 23:16 UTC, Hank Leininger

Description Hank Leininger 2024-05-21 23:16:16 UTC
Created attachment 893692
oops w/6.6.30 + nvidia-drivers-550.78

With x11-drivers/nvidia-drivers-550.78 on a Lenovo P16v Gen 1 AMD, I get an oops almost every time I plug or unplug AC power. The system also cannot complete shutdown -r or shutdown -h cleanly.

Reproduced with gentoo-sources-{6.6.30,6.6.31,6.8.10,6.9.1}, and vanilla-sources-6.9.1.

Downgrading to nvidia-drivers-535 avoids the problem.

The full oops is attached, but first, here are the most reproducible parts:

localhost kernel: [   10.541555] BUG: kernel NULL pointer dereference, address: 000000000000064c
^^^
the address varies; often 0000... but sometimes ffff...

localhost kernel: [   10.541672] CPU: 0 PID: 1612 Comm: kworker/0:2 Tainted: P           O    T  6.6.30-gentoo-4 #1
localhost kernel: [   10.541698] Hardware name: LENOVO 21FE001WUS/21FE001WUS, BIOS N3VET28W (1.10 ) 09/05/2023
^^^
reproduced with multiple BIOS versions

localhost kernel: [   10.541722] Workqueue: kacpi_notify acpi_os_execute_deferred
^^^
it's always these two in the Workqueue line

localhost kernel: [   10.554162]  acpi_ev_notify_dispatch+0x4c/0xa0
localhost kernel: [   10.554747]  acpi_os_execute_deferred+0x16/0x30
^^^
these two frames are consistent. On 6.6.30 they decode to:
acpi_ev_notify_dispatch (drivers/acpi/acpica/evmisc.c:180)
acpi_os_execute_deferred (drivers/acpi/osl.c:847)
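Because part of the point of this report is to help others recognize the same problem, a small script can check a saved kernel log for the signature described above. The following is a minimal sketch, and its names (`parse_frames`, `matches_signature`, `SAMPLE_LOG`) are hypothetical, not part of any Gentoo or NVIDIA tooling. It matches only on the strings quoted in the excerpts, ignores the faulting address because that varies, and parses backtrace frames in the kernel's usual `symbol+0xoffset/0xsize` form (offset into the function, then the function's size).

```python
import re

# Hypothetical helper (not part of any Gentoo or NVIDIA tooling): checks a
# saved kernel log for the oops signature described in this report.

# Matches backtrace frames printed as "symbol+0xoffset/0xsize".
FRAME_RE = re.compile(r"\b([A-Za-z_][\w.]*)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)")

# Strings copied from the excerpts in this report. The faulting address is
# deliberately not matched because it varies between occurrences.
NULL_DEREF = "BUG: kernel NULL pointer dereference"
WORKQUEUE = "Workqueue: kacpi_notify acpi_os_execute_deferred"
ACPI_FRAMES = {"acpi_ev_notify_dispatch", "acpi_os_execute_deferred"}


def parse_frames(log_text):
    """Return (symbol, offset, size) tuples for every frame in log_text."""
    return [
        (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        for m in FRAME_RE.finditer(log_text)
    ]


def matches_signature(log_text):
    """True if the log has the NULL dereference, the kacpi_notify workqueue
    line, and both ACPI notify frames."""
    symbols = {symbol for symbol, _, _ in parse_frames(log_text)}
    return (
        NULL_DEREF in log_text
        and WORKQUEUE in log_text
        and ACPI_FRAMES <= symbols
    )


# Excerpt lines taken verbatim from this report.
SAMPLE_LOG = """\
localhost kernel: [   10.541555] BUG: kernel NULL pointer dereference, address: 000000000000064c
localhost kernel: [   10.541722] Workqueue: kacpi_notify acpi_os_execute_deferred
localhost kernel: [   10.554162]  acpi_ev_notify_dispatch+0x4c/0xa0
localhost kernel: [   10.554747]  acpi_os_execute_deferred+0x16/0x30
"""

if __name__ == "__main__":
    print(matches_signature(SAMPLE_LOG))  # True
    print(parse_frames(SAMPLE_LOG))
    # [('acpi_ev_notify_dispatch', 76, 160), ('acpi_os_execute_deferred', 22, 48)]
```

To check your own system, pass the text of a saved log (for example, `dmesg` output redirected to a file) to `matches_signature`. A match only means the same ACPI notify path is involved. It does not prove the cause is the same.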

Seems very similar to this:
https://forums.developer.nvidia.com/t/rtx-3060-ti-driver-550-54-14-fedora-kinoite-39-6-7-6-200-fc39-x86-64-kernel-oops-on-boot-and-no-display/284746/2

...Except that person said the problem went away after upgrading from 550.54.14 to 550.76, whereas I'm getting it with 550.78.

As for shutdown/reboot behavior: whether or not an oops has occurred, shutdown -h / shutdown -r stops services, etc., but then the system is stuck at a blank screen with the power still on. It neither powers off nor reboots. In that state, you have to hold the power button down for about 15 seconds to power it off, after which you can power it back up.

Gentoo's default ACPI actions don't catch the AC/battery events, so the logs I'll upload also include "ACPI event unhandled" messages. However, even if new entries are added to /etc/acpi/default.sh to catch these events, the oops/reboot behavior persists.
Comment 1 Ionen Wolkens gentoo-dev 2024-05-22 00:29:35 UTC
There's really little I can do downstream about bugs in nvidia-drivers unless it's a packaging issue (better to take it upstream), and it doesn't help when it affects only specific hardware that I do not have. At this point I also wouldn't mask 550.78, given that it works and improves things for plenty of people.

Lenovo laptops in particular did have plenty of issues with the 550 branch, though; see the massive thread at: https://forums.developer.nvidia.com/t/series-550-freezes-laptop/284772. So I would personally recommend sticking with 535 (still supported by nvidia) on these laptops until nvidia sorts it out.

Alternatively, you could try the new 555 beta drivers released today; they may or may not help (or may even make things worse). However, they're unkeyworded as a precaution because they're a beta, so you'd need `x11-drivers/nvidia-drivers:0/555 **` in package.accept_keywords.

(In reply to Hank Leininger from comment #0)
> Seems very similar to this:
> https://forums.developer.nvidia.com/t/rtx-3060-ti-driver-550-54-14-fedora-kinoite-39-6-7-6-200-fc39-x86-64-kernel-oops-on-boot-and-no-display/284746/2
This may be different/unrelated: that backtrace shows several nvidia-related lines while yours doesn't. That's not to say nvidia-drivers isn't triggering something.

Anyhow, better to report upstream.
Comment 2 Hank Leininger 2024-05-22 00:34:16 UTC
Good call, I'll try to report upstream. Mostly I wanted to create this so that it'd be discoverable in case others have a similar issue (and then we might find out whether hardware beyond this Lenovo model is affected).

It took a lot of head-scratching to point the finger at NVIDIA, since that wasn't obvious from the oops. It was worse before I found I could reproduce the oops, when the _only_ symptom was "can't shutdown -r", which I've hardly ever encountered before.