Bug 670340 - x11-drivers/nvidia-drivers-410.xx do not work [found workaround]
Status: RESOLVED DUPLICATE of bug 667362
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Jeroen Roovers (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-11-05 02:17 UTC by Ivan
Modified: 2021-02-20 03:45 UTC
CC List: 10 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (emerge--info, 6.85 KB, text/plain), 2018-11-05 02:17 UTC, Ivan
dmesg (dmesg.log, 68.03 KB, text/x-log), 2018-11-05 02:18 UTC, Ivan
/var/log/messages (messages, 6.84 KB, text/plain), 2018-11-05 02:26 UTC, Ivan
Xorg.0.log (Xorg.0.log, 30.20 KB, text/plain), 2018-11-05 02:30 UTC, Ivan
.config - 4.18.16-gentoo Kernel Configuration (kernel_config.txt, 108.65 KB, text/plain), 2018-11-05 02:32 UTC, Ivan

Description Ivan 2018-11-05 02:17:05 UTC
Created attachment 554108 [details]
emerge --info

* What I did:

1. Emerged the latest nvidia-drivers.
2. Rebooted after that.
3. Noticed that the OS itself booted fine.
4. Found out that the X server failed to start and I got a tty1 prompt.

* What I expected:

I expected to see X, sddm and KDE loading successfully.

* What I tried to do:

1. I tried every 410.xx driver available in Portage with 4.18.xx kernels, and also tried 410.73 with 4.18.xx and 4.19.0. No luck.
2. I ran nvidia-xconfig with the new driver (from tty1 after the first boot with the new driver) and rebooted afterwards. That didn't help.
3. I also tried to blacklist nvidia modules. Didn't help.
4. I tried to build nvidia-drivers-410.73 with the patch from here: https://devtalk.nvidia.com/default/topic/1043346/nvidia-driver-v410-73-fails-to-build-functional-modules/ (with modified paths according to Chris Torske at https://bugs.gentoo.org/669902#c1). Builds fine, but doesn't solve my problem.


* What I noticed:

1. After a successful launch of X, the DE etc., I usually get the following output:
$ lsmod | grep nv
nvidia_drm             40960  7
nvidia_modeset       1060864  19 nvidia_drm
nvidia              13549568  943 nvidia_modeset

When X fails to start with the new driver, only the "nvidia" module is loaded.

According to dmesg with 410.73:
ivan@pc ~ $ cat dmesg.log | grep nvid
[    1.692049] nvidia: loading out-of-tree module taints kernel.
[    1.692054] nvidia: module license 'NVIDIA' taints kernel.
[    1.703574] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[    1.703858] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem


And successful load with 396.54:
ivan@pc ~ $ dmesg | grep nvid
[    2.045808] nvidia: loading out-of-tree module taints kernel.
[    2.045813] nvidia: module license 'NVIDIA' taints kernel.
[    2.055768] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[    2.055979] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    2.063402] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.54  Tue Aug 14 23:08:44 PDT 2018
[    2.065804] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[    2.065806] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 0
[    2.493934] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[    8.716686] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
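
The module comparison in item 1 can be checked mechanically. The following is a minimal sketch, assuming the three modules seen in the working 396.54 boot (nvidia, nvidia_modeset, nvidia_drm) are the ones expected; the function name missing_nvidia_modules is hypothetical and not part of any package.

```shell
#!/bin/sh
# Print each expected NVIDIA module that is absent from lsmod-style
# output read on stdin. The expected list is an assumption taken from
# the working 396.54 boot shown in this report.
missing_nvidia_modules() {
    # The first column of lsmod output is the module name.
    loaded=$(awk '{print $1}')
    for mod in nvidia nvidia_modeset nvidia_drm; do
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}

# Usage on a live system (prints nothing when all three are loaded):
#   lsmod | missing_nvidia_modules
```

In the failing 410.73 boot described above, this would be expected to list nvidia_modeset and nvidia_drm.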


2. Sometimes I get these messages:
timeout 'nvidia-udev.sh add'
slow: 'nvidia-udev.sh add'
timeout: killing 'nvidia-udev.sh add'
slow: 'nvidia-udev.sh add'

Which is similar to https://bugs.gentoo.org/667362#c0

3. I also sometimes (not always) see the following messages in /var/log/messages:
NVRM: API mismatch: the client has the version 410.73, but\x0aNVRM: this kernel module has the version 396.54.  Please\x0aNVRM: make sure that this kernel module and all NVIDIA driver\x0aNVRM: components have the same version.

However, I made sure that I built 410.73 against the kernel that I load. Checked multiple times. 
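
One way to double-check the "built against the running kernel" claim is to compare the module's vermagic string with the running kernel release. This is a sketch only: the helper name vermagic_matches is hypothetical, and it assumes the usual vermagic layout in which the first word is the kernel release.

```shell
#!/bin/sh
# Succeed when a module's vermagic string names the given kernel release.
# Assumes vermagic looks like "4.18.16-gentoo SMP mod_unload", i.e. the
# first space-separated word is the release the module was built for.
vermagic_matches() {
    # $1: vermagic string, $2: kernel release to compare against
    [ "${1%% *}" = "$2" ]
}

# Usage on a live system:
#   vermagic_matches "$(modinfo -F vermagic nvidia)" "$(uname -r)" \
#       || echo "nvidia.ko was built for a different kernel"
#   modinfo -F version nvidia         # version of the module on disk
#   cat /proc/driver/nvidia/version   # version of the module currently loaded
```

If the on-disk version and the loaded version differ, the old module is still in memory, which is the situation the NVRM "API mismatch" message describes.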

I also think this might be some kind of hardware problem, because I got a kernel panic when I ran 'emerge @module-rebuild' to re-emerge nvidia-drivers-410.73 after trying to boot with 410.73. But I am not sure.
Comment 1 Ivan 2018-11-05 02:18:05 UTC
Created attachment 554110 [details]
dmesg
Comment 2 Ivan 2018-11-05 02:25:21 UTC
Found out that 

> NVRM: API mismatch: the client has the version 410.73, but\x0aNVRM: this kernel module has the version 396.54.  Please\x0aNVRM: make sure that this kernel module and all NVIDIA driver\x0aNVRM: components have the same version.

took place BEFORE I actually rebooted, so that is not relevant.
Comment 3 Ivan 2018-11-05 02:26:41 UTC
Created attachment 554114 [details]
/var/log/messages
Comment 4 Ivan 2018-11-05 02:30:52 UTC
Created attachment 554116 [details]
Xorg.0.log
Comment 5 Ivan 2018-11-05 02:32:48 UTC
Created attachment 554118 [details]
.config - 4.18.16-gentoo Kernel Configuration
Comment 6 Jeroen Roovers (RETIRED) gentoo-dev 2018-11-05 09:13:10 UTC
Comment on attachment 554116 [details]
Xorg.0.log

>[     7.968] (--) Log file renamed from "/var/log/Xorg.pid-3852.log" to "/var/log/Xorg.0.log"

...

>[    10.477] (II) NVIDIA(0): Setting mode "DVI-D-0: nvidia-auto-select @1920x1080 +0+0 {ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0, ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
>[   891.841] (II) config/udev: Adding input device Plantronics Plantronics GameCom 780 (/dev/input/event7)

...

>[   891.860] (II) event7  - Plantronics Plantronics GameCom 780: device is a keyboard
>[  6250.366] (II) config/udev: removing device Plantronics Plantronics GameCom 780

...

>[  6282.424] (II) NVIDIA(GPU-0): Deleting GPU-0
>[  6282.426] (II) Server terminated successfully (0). Closing log file.

Looks like it worked just fine.
Comment 7 Ivan 2018-11-06 01:56:28 UTC
Tried with the new kernel 4.19.1. As expected, it doesn't work either.

Dmesg says:
[  182.952613] udevd[2153]: timeout 'nvidia-udev.sh add'
[  182.952626] udevd[2153]: slow: 'nvidia-udev.sh add' [2305]
[  183.953608] udevd[2153]: timeout: killing 'nvidia-udev.sh add' [2305]
[  183.953622] udevd[2153]: slow: 'nvidia-udev.sh add' [2305]
[  183.953717] udevd[2153]: 'nvidia-udev.sh add' [2305] terminated by signal 9 (Killed)
Comment 8 Ivan 2018-11-09 18:16:01 UTC
So it appears that I was able to circumvent the issue by rethinking all the comments about blacklisting modules that came from wise people.

At last I noticed that IF I blacklist all the modules to prevent them from being loaded automatically (by udev, as far as I know), I can load them manually via modprobe, and somehow that works perfectly. NOTE: I couldn't load or remove the nvidia modules unless I had blacklisted them.

After that the solution was simple. It's probably not the best way, and maybe it's a plain dumb way, but it works for me.

So here's what I did:

1. Added the following lines to /etc/modprobe.d/blacklist.conf:
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset

(basically, I just blacklisted all nvidia modules that usually are loaded, which you can see by typing 'lsmod | grep -i nvidia' when your DE works)

2. Created file /etc/local.d/nvidia-udev-workaround.start
Added the following lines in it:
#!/bin/sh

echo "NVIDIA WORKAROUND IN PROGRESS";
modprobe nvidia_drm;

3. Made that script executable by:
chmod +x /etc/local.d/nvidia-udev-workaround.start

4. Made sure that local appears in default runlevel:
rc-update show default

If "local" is not listed there, add it with 'rc-update add local default' before trying this workaround.

Then reboot.

Works for me with 410.73 and 415.13 nvidia-drivers, with 4.18.17 and 4.19.1 kernels.
Comment 9 alpir 2018-12-01 06:51:53 UTC
Confirm this bug with kernel 4.19.3 and nvidia-drivers-415.18.
Comment 10 Valeriy Malov 2018-12-15 14:09:52 UTC
Related to bug #667362?

I can reproduce it with GTX 660.
Maybe it's worth changing keywords on 410.x from stable to unstable.
Comment 11 Alexander Polozov 2018-12-16 11:05:32 UTC
(In reply to alpir from comment #9)
> Confirm this bug with kernel 4.19.3 and nvidia-drivers-415.18.

I was able to run my X server with nvidia-drivers-415.18 just by commenting out the last line, "#options nvidia NVreg_DeviceFileMode=432 NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=27 NVreg_ModifyDeviceFiles=1", in /etc/modprobe.d/nvidia.conf
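
The same edit can be scripted. This is a minimal sketch, not a tested fix: the function name comment_out_nvreg_options is hypothetical, and it assumes the line to disable starts with "options nvidia" and mentions NVreg_DeviceFileMode, as in the line quoted above. It keeps a .bak copy of the original file.

```shell
#!/bin/sh
# Comment out every active "options nvidia ..." line that sets
# NVreg_DeviceFileMode, leaving a backup next to the original file.
comment_out_nvreg_options() {
    # $1: path to the modprobe configuration file
    cp "$1" "$1.bak" &&
    # "&" re-inserts the matched text, so "#" is simply prepended.
    sed 's/^options nvidia .*NVreg_DeviceFileMode/#&/' "$1.bak" > "$1"
}

# Usage (as root):
#   comment_out_nvreg_options /etc/modprobe.d/nvidia.conf
```

Lines that are already commented out do not match the pattern and are left unchanged.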
Comment 12 Jeroen Roovers (RETIRED) gentoo-dev 2018-12-17 00:31:33 UTC
*** Bug 667362 has been marked as a duplicate of this bug. ***
Comment 13 Jeroen Roovers (RETIRED) gentoo-dev 2018-12-17 00:32:08 UTC

*** This bug has been marked as a duplicate of bug 667362 ***
Comment 14 David Bařina 2019-11-07 13:27:55 UTC
(In reply to Ivan from comment #8)

Same problem here. Blacklisting the modules helped me (it prevents eudev from loading the nvidia module). However, the /etc/local.d/nvidia-udev-workaround.start trick is not really necessary; /etc/modules-load.d/ is a better place to do this.
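
The comment does not give a file name or contents for /etc/modules-load.d/, so the following is only a sketch under assumptions: the file name nvidia.conf and the helper name write_modules_load_entry are hypothetical. It lists only nvidia_drm because, per the lsmod output in the description, nvidia_drm pulls in nvidia_modeset, which pulls in nvidia.

```shell
#!/bin/sh
# Write a modules-load.d entry that requests nvidia_drm at boot.
# The file name "nvidia.conf" is an assumption; the directory only
# requires that the file end in .conf.
write_modules_load_entry() {
    # $1: target directory (normally /etc/modules-load.d)
    printf '%s\n' '# Load the NVIDIA DRM module at boot' 'nvidia_drm' \
        > "$1/nvidia.conf"
}

# Usage (as root):
#   write_modules_load_entry /etc/modules-load.d
```

Some boot-time loaders apply the modprobe blacklist to these entries as well (for example when they call modprobe with --use-blacklist), so it is worth confirming with lsmod after a reboot that the module was actually loaded.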
Comment 15 gletonai 2020-07-02 19:26:14 UTC
(In reply to Ivan from comment #8)
(In reply to David Bařina from comment #14)
This worked.
Comment 16 kartebi 2021-02-20 03:45:55 UTC
I think it's -fomit-frame-pointer.
I deleted /lib/modules and rebuilt the kernel and nvidia-drivers without it, and everything is back to normal...