Bug 840439

Summary: sys-kernel/gentoo-kernel-bin-5.17.4: various breakages with nvidia-drivers
Product: Gentoo Linux    Reporter: Jay Faulkner <jay>
Component: Current packages    Assignee: Distribution Kernel Project <dist-kernel>
Status: RESOLVED FIXED    
Severity: critical CC: haven, ionen, mgorny, sam, telans, thezombiehunter
Priority: Normal    
Version: unspecified   
Hardware: AMD64   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Jay Faulkner 2022-04-24 04:44:17 UTC
I use gentoo-kernel-bin with nvidia-drivers. After upgrading to 5.17.4, I noticed when GDM loaded, I only had a mouse cursor on an otherwise black screen. I was also unable to change to other TTYs. Reverting to the latest 5.16.x gentoo-kernel-bin resolved the issue.

ionen hypothesized on IRC that this is caused by the upstream Fedora change to disable FB_EFI and replace it with SIMPLEDRM. I will begin exploring a custom-compiled gentoo-sources with a minimal change set to resolve the symptoms.

Reproducible: Always

Steps to Reproduce:
1. Boot an otherwise-working system using nvidia-drivers with 5.17.x
2. See X and text consoles not working properly.

Actual Results:  
Black screen with only mouse cursor, and strange behavior when attempting to use text consoles.

Expected Results:  
System to continue working as it did with 5.16.x and earlier.

A reddit thread on /r/fedora seems to reference this breakage as well -> https://www.reddit.com/r/Fedora/comments/tvu8h5/fedora_36_beta_nvidia_drivers_failing_reverting/i3c0c5w/
Comment 1 Simon Alman 2022-05-05 08:18:06 UTC
Possibly related, so adding my info to the mix (if this needs a separate ticket instead, please let me know). I have the same kernel version (although the gentoo-sources equivalent) and the same driver, and /dev/dri does not get populated for me (I'm using the card in headless mode only, for decoding support).

❯ lsmod
Module                  Size  Used by
nvidia_modeset       1163264  1
nvidia              39141376  18 nvidia_modeset
nvidia_drm             16384  0

❯ /opt/bin/nvidia-smi
Thu May  5 09:22:03 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 510.68.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P2200        On   | 00000000:01:00.0 Off |                  N/A |
| 45%   26C    P8     3W /  75W |      1MiB /  5120MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

❯ ls /dev/dri*
zsh: no matches found: /dev/dri*

❯ lspci -v | grep nvidia -B10

01:00.0 VGA compatible controller: NVIDIA Corporation GP106GL [Quadro P2200] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Dell GP106GL [Quadro P2200]
        Flags: bus master, fast devsel, latency 0, IRQ 191, NUMA node 0, IOMMU group 1
        Memory at b9000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 380fe0000000 (64-bit, prefetchable) [size=256M]
        Memory at 380ff0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 4000 [size=128]
        Expansion ROM at ba000000 [virtual] [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia_drm, nvidia
Comment 2 Jay Faulkner 2022-05-18 22:24:44 UTC
I recently discovered that the system this error was occurring on had a stick of RAM going bad. I'd consider my input suspect at best at this point, and since I've switched from NVIDIA to AMD I can no longer attempt to reproduce it.

Leaving the bug open for Simon Alman's sake; if you figure out your problem Simon I can close this bug.
Comment 3 Simon Alman 2022-08-02 11:57:30 UTC
I fixed my own issue by building CONFIG_DRM_NOUVEAU as a module and adding nvidia-drm.modeset=1 to the kernel command line via GRUB_CMDLINE_LINUX in my grub config.

Now /dev/dri is populated as expected:

❯ tree /dev/dri
/dev/dri
├── by-path
│   ├── pci-0000:01:00.0-card -> ../card0
│   └── pci-0000:01:00.0-render -> ../renderD128
├── card0
└── renderD128

1 directory, 4 files

❯ lsmod
Module                  Size  Used by
nvidia_uvm           1241088  0
nvidia_drm             69632  0
nvidia_modeset       1146880  2 nvidia_drm
nvidia              40833024  19 nvidia_uvm,nvidia_modeset
backlight              20480  1 nvidia_modeset

❯ uname -r
5.19.0-gentoo

I've moved on from 5.17.4 and validated against the latest, which is 5.19.0 at the time of this comment. I'm happy for this to be closed off as user error.
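
For anyone wanting to verify the same fix, the check can be sketched roughly as below. This is a minimal POSIX sh sketch, not something taken from the thread: the function name check_nvidia_kms is hypothetical, and it assumes the nvidia_drm module exposes its modeset parameter at /sys/module/nvidia_drm/parameters/modeset (reading Y when enabled) and that DRM nodes appear as /dev/dri/card*, as in the tree output above.

```shell
# Report whether nvidia-drm kernel modesetting is enabled and a DRM card
# node exists. Both paths are parameters so the function can be pointed at
# other locations; the defaults are assumed sysfs/devtmpfs paths.
check_nvidia_kms() {
    param_file=${1:-/sys/module/nvidia_drm/parameters/modeset}
    dri_dir=${2:-/dev/dri}

    # The parameter file only exists while the nvidia_drm module is loaded.
    if [ ! -r "$param_file" ]; then
        echo "nvidia_drm not loaded"
        return 1
    fi

    # Boolean module parameters read back as Y or N.
    if [ "$(cat "$param_file")" != "Y" ]; then
        echo "modeset disabled"
        return 1
    fi

    # If the glob matches nothing it stays literal and the -e test fails.
    for node in "$dri_dir"/card*; do
        if [ -e "$node" ]; then
            echo "modeset enabled, found $node"
            return 0
        fi
    done

    echo "modeset enabled, but no card node in $dri_dir"
    return 1
}
```

Calling check_nvidia_kms with no arguments after a reboot uses the default paths. A "modeset disabled" result would suggest that nvidia-drm.modeset=1 did not reach the kernel command line.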
Comment 4 Ionen Wolkens gentoo-dev 2022-10-02 03:59:32 UTC
*** Bug 873751 has been marked as a duplicate of this bug. ***
Comment 5 Larry the Git Cow gentoo-dev 2022-10-02 03:59:52 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=619d0545c04fd9b68c4ab27a1baaa63521c32e9d

commit 619d0545c04fd9b68c4ab27a1baaa63521c32e9d
Author:     Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2022-10-01 16:18:05 +0000
Commit:     Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2022-10-02 03:58:23 +0000

    x11-drivers/nvidia-drivers: warn about simpledrm again + FB/nouveau
    
    Thought SIMPLEDRM issues had improved. Last time I tried, X was still
    working just without tty console display so not "that" bad (that was
    with kernel 5.14.x or so).
    
    gentoo-kernel-bin uses Fedora's configs which enables SIMPLEDRM since
    5.17.x or so. Formerly without FB_EFI then later re-enabled (was under
    the impression this improved things, but I only try -bin when
    stabilizing drivers, so 5.15.x), however SIMPLEDRM=y takes priority
    and X/wayland breaks entirely and then messes up the tty (worse than
    before).
    
    Difference between gentoo-kernel-bin and Fedora's is that they now
    patch their kernel to let this configuration work:
    https://src.fedoraproject.org/rpms/kernel/blob/e762b5dd/f/patch-5.19-redhat.patch#_882
    (seems they do not do this for kernel-6.0, unsure for status with it)
    
    Have not found a (working) way to disable SIMPLEDRM from the kernel's
    commandline, so merely adding a warning for bug #840439 if it's builtin
    
    For FB_EFI or FB_VESA to work (aka get a console), also need to disable
    SYSFB_SIMPLEFB. FB_SIMPLE seems broken since kernel-5.18.13 due to
    another issue. Albeit this doesn't stop X from working.
    
    Ideal would be for gentoo-kernel* to do it by default, but non-bin
    gentoo-kernel users using the generic config can (tested with 5.19.12):
    
    mkdir -p /etc/kernel/config.d &&
    cat <<EOF > /etc/kernel/config.d/50nvidia.config
    # CONFIG_DRM_SIMPLEDRM is not set
    # CONFIG_SYSFB_SIMPLEFB is not set
    EOF
    
    (this is what gentoo-kernel-bin-5.15.x has)
    
    While here also add an overdue warning for builtin nouveau (formerly
    skipped given CONFIG_CHECK was unsuitable), and try to inform about
    making the tty console work even though nvidia-drivers doesn't drive
    it.
    
    Bug: https://bugs.gentoo.org/840439
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 .../nvidia-drivers/nvidia-drivers-390.154.ebuild   | 63 +++++++++++++++++++++-
 .../nvidia-drivers-470.141.03.ebuild               | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-510.85.02.ebuild | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-515.49.19.ebuild | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-515.65.01.ebuild | 63 +++++++++++++++++++++-
 .../nvidia-drivers/nvidia-drivers-515.76.ebuild    | 63 +++++++++++++++++++++-
 6 files changed, 372 insertions(+), 6 deletions(-)
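
The conditions described in the commit message above can be approximated with a small check against a kernel config file. This is only a sketch under stated assumptions: the function names is_builtin and diagnose_nvidia_fb are hypothetical, the messages paraphrase the commit message rather than reproduce the ebuild's actual warning text, and it does not claim to mirror how the ebuild performs its check.

```shell
# Succeeds only when option $2 is built in (=y) in config file $1;
# modular (=m), unset, or missing options all count as "not built in".
is_builtin() {
    grep -q "^CONFIG_$2=y" "$1"
}

# Interpret a kernel config roughly as the commit message describes:
# built-in SIMPLEDRM breaks X/wayland, and SYSFB_SIMPLEFB must be off
# for FB_EFI or FB_VESA to provide a console.
diagnose_nvidia_fb() {
    config_file=$1

    if is_builtin "$config_file" DRM_SIMPLEDRM; then
        echo "warn: DRM_SIMPLEDRM=y, X/wayland may break with nvidia-drivers"
    fi

    if is_builtin "$config_file" SYSFB_SIMPLEFB; then
        echo "warn: SYSFB_SIMPLEFB=y, FB_EFI/FB_VESA console may not work"
    elif is_builtin "$config_file" FB_EFI || is_builtin "$config_file" FB_VESA; then
        echo "ok: FB_EFI or FB_VESA available for the tty console"
    else
        echo "warn: no FB_EFI or FB_VESA, tty console may stay blank"
    fi
}
```

It could be run as diagnose_nvidia_fb "/boot/config-$(uname -r)", assuming the running kernel's config is installed at that path. With the 50nvidia.config fragment from the commit message applied and FB_EFI enabled, only the "ok" line should be printed.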
Comment 6 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-10-02 07:46:33 UTC
I've prepared a 'g3' config for this, and hopefully will remember to bump it when bumping kernels next.
Comment 7 Larry the Git Cow gentoo-dev 2022-10-02 16:15:46 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6509199fb5ec004813bf8d19b36c3382e8d2b3c5

commit 6509199fb5ec004813bf8d19b36c3382e8d2b3c5
Author:     Michał Górny <mgorny@gentoo.org>
AuthorDate: 2022-10-02 07:48:04 +0000
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2022-10-02 16:15:42 +0000

    sys-kernel/gentoo-kernel: Add a 5.19 config bump reminder
    
    Bug: https://bugs.gentoo.org/840439
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

 sys-kernel/gentoo-kernel/gentoo-kernel-5.19.12.ebuild | 1 +
 1 file changed, 1 insertion(+)
Comment 8 Ionen Wolkens gentoo-dev 2022-10-03 01:29:06 UTC
Thanks.

On a side note, I had a look at Fedora's kernel configs for 6.0-rc7 and they're trying to disable CONFIG_FB_* again (no FB_EFI nor FB_VESA). Unsure if this is permanent or just a rawhide-only thing, but we may want to keep an eye on it whenever we do a 6.0.x, given that without DRM_SIMPLEDRM that leaves nothing for early boot.

6.0-rc7's kernel-x86_64-rhel.config on the other hand keeps both and has SIMPLEDRM disabled, albeit still has SYSFB_SIMPLEFB which will likely still cause issues with nvidia for the console.
Comment 9 Ionen Wolkens gentoo-dev 2022-10-03 01:36:07 UTC
The situation may eventually improve from the driver side; NVIDIA is aware of these issues.

https://github.com/NVIDIA/open-gpu-kernel-modules/issues/228 (simpledrm)
https://github.com/NVIDIA/open-gpu-kernel-modules/issues/341 (sysfb)

Worst case could add a patch to nvidia-drivers to make it force-disable simpledrm, but that'd still leave users without a console and may potentially have other adverse effects.
Comment 10 Ionen Wolkens gentoo-dev 2022-10-04 22:49:06 UTC
Gave gentoo-kernel-bin-5.19.13 a try and it works perfectly for me (just a simple single-GPU 1070 setup with EFI though): console and X/wayland both work, and I can switch back and forth between them.

I think this can be closed, unless we want to keep it open to track support for Fedora's defaults with nvidia, but it's probably not worth trying to push simpledrm in a generic kernel build for a while.
Comment 11 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-10-05 06:49:48 UTC
Thanks for reporting back. I agree that the bug can be closed at this point. If anything else needs to be done, we can address that in a subsequent bug.