I use gentoo-kernel-bin with nvidia-drivers. After upgrading to 5.17.4, GDM loaded to a black screen showing only a mouse cursor, and I was unable to switch to other TTYs. Reverting to the latest 5.16.x gentoo-kernel-bin resolved the issue.

ionen on IRC hypothesized that this is caused by the upstream Fedora change that disables FB_EFI and replaces it with SIMPLEDRM. I will begin exploring a custom-compiled gentoo-sources with a minimal change set to resolve the symptoms.

Reproducible: Always

Steps to Reproduce:
1. Boot an otherwise-working system using nvidia-drivers with 5.17.x
2. See X and text consoles not working properly.

Actual Results:
Black screen with only mouse cursor, and strange behavior when attempting to use text consoles.

Expected Results:
System to continue working as it did with 5.16.x and earlier.

A Reddit thread on /r/fedora seems to reference this breakage as well:
https://www.reddit.com/r/Fedora/comments/tvu8h5/fedora_36_beta_nvidia_drivers_failing_reverting/i3c0c5w/
Possibly related, so adding my info to the mix (if this needs a separate ticket instead, please let me know). I have the same kernel (although the gentoo-sources equivalent version) and driver, and /dev/dri does not get populated for me. I'm using the card in headless mode only, for decoding support.

❯ lsmod
Module                  Size  Used by
nvidia_modeset       1163264  1
nvidia              39141376  18 nvidia_modeset
nvidia_drm             16384  0

❯ /opt/bin/nvidia-smi
Thu May 5 09:22:03 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 510.68.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2200        On   | 00000000:01:00.0 Off |                  N/A |
| 45%   26C    P8     3W /  75W |      1MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

❯ ls /dev/dri*
zsh: no matches found: /dev/dri*

❯ lspci -v | grep nvidia -B10
01:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2200] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Dell GP106GL [Quadro P2200]
        Flags: bus master, fast devsel, latency 0, IRQ 191, NUMA node 0, IOMMU group 1
        Memory at b9000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 380fe0000000 (64-bit, prefetchable) [size=256M]
        Memory at 380ff0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 4000 [size=128]
        Expansion ROM at ba000000 [virtual] [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia_drm, nvidia
I recently discovered that the system this error was occurring on had a stick of RAM going bad. I'd consider my input suspect at best at this point, and since I've switched from NVIDIA to AMD I can no longer attempt to reproduce it. Leaving the bug open for Simon Alman's sake; if you figure out your problem, Simon, I can close this bug.
I fixed my own issue by compiling in CONFIG_DRM_NOUVEAU as a module and adding nvidia-drm.modeset=1 to the kernel command line via GRUB_CMDLINE_LINUX in my grub config. Now /dev/dri is populated as expected:

❯ tree /dev/dri
/dev/dri
├── by-path
│   ├── pci-0000:01:00.0-card -> ../card0
│   └── pci-0000:01:00.0-render -> ../renderD128
├── card0
└── renderD128

1 directory, 4 files

❯ lsmod
Module                  Size  Used by
nvidia_uvm           1241088  0
nvidia_drm             69632  0
nvidia_modeset       1146880  2 nvidia_drm
nvidia              40833024  19 nvidia_uvm,nvidia_modeset
backlight              20480  1 nvidia_modeset

❯ uname -r
5.19.0-gentoo

I've moved on from 5.17.4 and validated against the latest kernel, which is 5.19.0 at the time of this comment. I'm happy for this to be closed off as user error.
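For anyone wanting to confirm the same result on their own machine, a minimal sketch of the two checks follows. The helper names `modeset_state` and `count_dri_nodes` are hypothetical, and the sketch assumes the nvidia-drm module exposes its `modeset` parameter as `Y`/`N` under /sys/module/nvidia_drm/parameters/ (reading it typically needs root).

```shell
# Hypothetical helpers, not part of nvidia-drivers or any Gentoo tool.

# Print "enabled", "disabled", or "unknown" for the nvidia-drm modeset parameter.
# Assumption: the parameter file contains "Y" when modeset=1 took effect.
modeset_state() {
    param_file="${1:-/sys/module/nvidia_drm/parameters/modeset}"
    if [ ! -r "$param_file" ]; then
        echo "unknown"
    elif [ "$(cat "$param_file")" = "Y" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Count the card*/renderD* nodes in a DRM device directory (default /dev/dri).
count_dri_nodes() {
    dri_dir="${1:-/dev/dri}"
    n=0
    for node in "$dri_dir"/card* "$dri_dir"/renderD*; do
        # An unmatched glob stays literal, so test that the entry exists.
        if [ -e "$node" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Running `modeset_state` and `count_dri_nodes` with no arguments checks the live system; on the setup described above, the second would be expected to report the card0 and renderD128 nodes.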
*** Bug 873751 has been marked as a duplicate of this bug. ***
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=619d0545c04fd9b68c4ab27a1baaa63521c32e9d

commit 619d0545c04fd9b68c4ab27a1baaa63521c32e9d
Author:     Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2022-10-01 16:18:05 +0000
Commit:     Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2022-10-02 03:58:23 +0000

    x11-drivers/nvidia-drivers: warn about simpledrm again + FB/nouveau

    Thought SIMPLEDRM issues had improved. Last time I tried, X was still
    working just without tty console display so not "that" bad (that was
    with kernel 5.14.x or so).

    gentoo-kernel-bin uses Fedora's configs which enables SIMPLEDRM since
    5.17.x or so. Formerly without FB_EFI then later re-enabled (was under
    the impression this improved things, but I only try -bin when
    stabilizing drivers, so 5.15.x), however SIMPLEDRM=y takes priority
    and X/wayland breaks entirely and then messes up the tty (worse than
    before).

    Difference between gentoo-kernel-bin and Fedora's is that they now
    patch their kernel to let this configuration work:
    https://src.fedoraproject.org/rpms/kernel/blob/e762b5dd/f/patch-5.19-redhat.patch#_882
    (seems they do not do this for kernel-6.0, unsure for status with it)

    Have not found a (working) way to disable SIMPLEDRM from the kernel's
    commandline, so merely adding a warning for bug #840439 if it's builtin

    For FB_EFI or FB_VESA to work (aka get a console), also need to
    disable SYSFB_SIMPLEFB. FB_SIMPLE seems broken since kernel-5.18.13
    due to another issue. Albeit this doesn't stop X from working.

    Ideal would be for gentoo-kernel* to do it by default, but non-bin
    gentoo-kernel users using the generic config can (tested with 5.19.12):

    mkdir -p /etc/kernel/config.d &&
    cat <<EOF > /etc/kernel/config.d/50nvidia.config
    # CONFIG_DRM_SIMPLEDRM is not set
    # CONFIG_SYSFB_SIMPLEFB is not set
    EOF

    (this is what gentoo-kernel-bin-5.15.x has)

    While here also add an overdue warning for builtin nouveau (formerly
    skipped given CONFIG_CHECK was unsuitable), and try to inform about
    making the tty console work even though nvidia-drivers doesn't drive it.

    Bug: https://bugs.gentoo.org/840439
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 .../nvidia-drivers/nvidia-drivers-390.154.ebuild   | 63 +++++++++++++++++++++-
 .../nvidia-drivers-470.141.03.ebuild               | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-510.85.02.ebuild | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-515.49.19.ebuild | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-515.65.01.ebuild | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-515.76.ebuild    | 63 +++++++++++++++++++++-
 6 files changed, 372 insertions(+), 6 deletions(-)
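To see whether a given kernel config has the two options from the commit message built in, something along these lines may help. It is only a sketch: `check_builtin` is a hypothetical helper (not what the ebuild itself runs), and it assumes a plain-text config such as /usr/src/linux/.config or a decompressed copy of /proc/config.gz.

```shell
# Hypothetical helper: report whether a kernel option is built in (=y).
#   $1 = path to a plain-text kernel config
#   $2 = option name without the CONFIG_ prefix
check_builtin() {
    if grep -q "^CONFIG_$2=y\$" "$1"; then
        echo "$2: builtin"
    else
        # Covers "=m", "# CONFIG_... is not set", and a missing option alike.
        echo "$2: not builtin"
    fi
}
```

For example, `check_builtin /usr/src/linux/.config DRM_SIMPLEDRM` followed by the same call with `SYSFB_SIMPLEFB`; after applying the 50nvidia.config fragment and rebuilding, both would be expected to print "not builtin".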
I've prepared 'g3' config for this, and hopefully will remember to bump when bumping kernels next.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6509199fb5ec004813bf8d19b36c3382e8d2b3c5

commit 6509199fb5ec004813bf8d19b36c3382e8d2b3c5
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2022-10-02 07:48:04 +0000
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2022-10-02 16:15:42 +0000

    sys-kernel/gentoo-kernel: Add a 5.19 config bump reminder

    Bug: https://bugs.gentoo.org/840439
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

 sys-kernel/gentoo-kernel/gentoo-kernel-5.19.12.ebuild | 1 +
 1 file changed, 1 insertion(+)
Thanks. On a side note, I had a look at Fedora's kernel configs for 6.0-rc7 and they're trying to disable CONFIG_FB_* again (neither FB_EFI nor FB_VESA). Unsure if that is permanent or just a rawhide-only thing, but may want to keep an eye on it whenever doing a 6.0.x bump, given that without DRM_SIMPLEDRM this leaves nothing for early boot. 6.0-rc7's kernel-x86_64-rhel.config, on the other hand, keeps both and has SIMPLEDRM disabled, albeit it still has SYSFB_SIMPLEFB, which will likely still cause issues with nvidia for the console.
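A quick way to spot a config that leaves nothing for early boot might look like the sketch below. `early_console_drivers` is a hypothetical helper, and it assumes that only built-in (=y) options count for early boot and that the three options named in this bug are the relevant ones.

```shell
# Hypothetical helper: list which early-boot display options are built in.
#   $1 = path to a plain-text kernel config
# Prints a space-separated list, or "none" if nothing is enabled.
early_console_drivers() {
    found=""
    for opt in FB_EFI FB_VESA DRM_SIMPLEDRM; do
        if grep -q "^CONFIG_${opt}=y\$" "$1"; then
            # Append with a separating space only when the list is non-empty.
            found="${found:+$found }$opt"
        fi
    done
    echo "${found:-none}"
}
```

A result of "none" would correspond to the 6.0-rc7 Fedora situation described above once DRM_SIMPLEDRM is also turned off.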
Situation may improve eventually from the drivers side, nvidia is aware of these.

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/228 (simpledrm)
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/341 (sysfb)

Worst case could add a patch to nvidia-drivers to make it force-disable simpledrm, but that'd still leave users without a console and may potentially have other adverse effects.
Gave gentoo-kernel-bin-5.19.13 a try and it works perfectly for me (just a simple single-GPU 1070 setup with EFI though): console and X/wayland work, and I can switch back and forth. Think this can be closed, unless you want to keep it open to track support for Fedora's defaults with nvidia, but it's probably not worth trying to push simpledrm in a generic kernel build for a while.
Thanks for reporting back. I agree that the bug can be closed at this point. If anything else needs to be done, we can address that in a subsequent bug.