Created attachment 554108 [details]
emerge --info

* What I did:
1. Emerged the latest nvidia-drivers.
2. Rebooted after that.
3. Noticed that the OS itself booted fine.
4. Found out that the X server failed to start and I got a tty1 prompt.

* What I expected:
I expected to see X, sddm and KDE load successfully.

* What I tried to do:
1. I tried every 410.xx driver available in portage with 4.18.xx kernels, and also tried 410.73 with 4.18.xx and 4.19.0. No luck.
2. I tried running nvidia-xconfig with the new driver (from tty1 after the first boot with the new driver) and rebooting afterwards. Didn't work out.
3. I also tried to blacklist the nvidia modules. Didn't help.
4. I tried to build nvidia-drivers-410.73 with the patch from here:
https://devtalk.nvidia.com/default/topic/1043346/nvidia-driver-v410-73-fails-to-build-functional-modules/
(with modified paths according to Chris Torske at https://bugs.gentoo.org/669902#c1). It builds fine, but doesn't solve my problem.

* What I noticed:
1. Usually, after a successful launch of X, the DE etc., I get the following output:

$ lsmod | grep nv
nvidia_drm 40960 7
nvidia_modeset 1060864 19 nvidia_drm
nvidia 13549568 943 nvidia_modeset

When the boot fails with the new driver, only the "nvidia" module is loaded. According to dmesg with 410.73:

ivan@pc ~ $ cat dmesg.log | grep nvid
[ 1.692049] nvidia: loading out-of-tree module taints kernel.
[ 1.692054] nvidia: module license 'NVIDIA' taints kernel.
[ 1.703574] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[ 1.703858] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem

And a successful load with 396.54:

ivan@pc ~ $ dmesg | grep nvid
[ 2.045808] nvidia: loading out-of-tree module taints kernel.
[ 2.045813] nvidia: module license 'NVIDIA' taints kernel.
[ 2.055768] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[ 2.055979] nvidia 0000:03:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 2.063402] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 396.54 Tue Aug 14 23:08:44 PDT 2018
[ 2.065804] [drm] [nvidia-drm] [GPU ID 0x00000300] Loading driver
[ 2.065806] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:03:00.0 on minor 0
[ 2.493934] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[ 8.716686] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs

2. Sometimes I get these messages:

timeout 'nvidia-udev.sh add'
slow: 'nvidia-udev.sh add'
timeout: killing 'nvidia-udev.sh add'
slow: 'nvidia-udev.sh add'

This is similar to https://bugs.gentoo.org/667362#c0

3. Sometimes (again, only sometimes) I also get the following messages in /var/log/messages:

NVRM: API mismatch: the client has the version 410.73, but\x0aNVRM: this kernel module has the version 396.54. Please\x0aNVRM: make sure that this kernel module and all NVIDIA driver\x0aNVRM: components have the same version.

However, I made sure that I built 410.73 against the kernel that I boot. Checked multiple times.

I also think that this might be some kind of hardware problem, because I got a kernel panic when I ran 'emerge @module-rebuild' to re-emerge nvidia-drivers-410.73 after I had tried to boot with 410.73. But I am not sure.
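The symptom in the report (only "nvidia" loaded instead of all three modules) can be checked with a short script. This is a minimal sketch: the function name missing_nvidia_modules is hypothetical, and the list of expected modules is taken from the lsmod output in the report, so it may differ on other setups.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and prints every module
# from the report's working setup that is NOT currently loaded.
missing_nvidia_modules() {
    # First column of lsmod output is the module name.
    loaded=$(awk '{ print $1 }')
    for mod in nvidia nvidia_modeset nvidia_drm; do
        # -x matches whole lines only, so "nvidia" does not match "nvidia_drm".
        printf '%s\n' "$loaded" | grep -qx "$mod" || printf '%s\n' "$mod"
    done
}

# Usage on a live system:
#   lsmod | missing_nvidia_modules
```

Empty output means all three modules are present. In the failing case described in the report it would be expected to print nvidia_modeset and nvidia_drm.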
Created attachment 554110 [details] dmesg
Found out that

> NVRM: API mismatch: the client has the version 410.73, but\x0aNVRM: this kernel module has the version 396.54. Please\x0aNVRM: make sure that this kernel module and all NVIDIA driver\x0aNVRM: components have the same version.

took place BEFORE I actually rebooted, so it is not relevant.
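A mismatch of this kind (old module still loaded, new version already installed) can be spotted by comparing the two version strings. A minimal sketch follows. The function name check_nvidia_versions is hypothetical, and the sysfs path in the usage comment is an assumption: it is only present while the module is loaded and exports a version.

```shell
#!/bin/sh
# Hypothetical helper: compares the version of the currently loaded nvidia
# module ($1) with the version of the module file installed on disk ($2).
check_nvidia_versions() {
    loaded=$1
    on_disk=$2
    if [ -z "$loaded" ]; then
        echo "nvidia module is not loaded"
    elif [ "$loaded" = "$on_disk" ]; then
        echo "match: $loaded"
    else
        echo "mismatch: loaded $loaded, on disk $on_disk (reload the module or reboot)"
    fi
}

# Usage on a live system (sysfs path is an assumption, see above):
#   check_nvidia_versions "$(cat /sys/module/nvidia/version 2>/dev/null)" \
#                         "$(modinfo -F version nvidia)"
```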
Created attachment 554114 [details] /var/log/messages
Created attachment 554116 [details] Xorg.0.log
Created attachment 554118 [details] .config - 4.18.16-gentoo Kernel Configuration
Comment on attachment 554116 [details]
Xorg.0.log

>[ 7.968] (--) Log file renamed from "/var/log/Xorg.pid-3852.log" to "/var/log/Xorg.0.log"
...
>[ 10.477] (II) NVIDIA(0): Setting mode "DVI-D-0: nvidia-auto-select @1920x1080 +0+0 {ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0, ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
>[ 891.841] (II) config/udev: Adding input device Plantronics Plantronics GameCom 780 (/dev/input/event7)
...
>[ 891.860] (II) event7 - Plantronics Plantronics GameCom 780: device is a keyboard
>[ 6250.366] (II) config/udev: removing device Plantronics Plantronics GameCom 780
...
>[ 6282.424] (II) NVIDIA(GPU-0): Deleting GPU-0
>[ 6282.426] (II) Server terminated successfully (0). Closing log file.

Looks like it worked just fine.
Tried with the new kernel 4.19.1. As expected, it doesn't work either. Dmesg says:

[ 182.952613] udevd[2153]: timeout 'nvidia-udev.sh add'
[ 182.952626] udevd[2153]: slow: 'nvidia-udev.sh add' [2305]
[ 183.953608] udevd[2153]: timeout: killing 'nvidia-udev.sh add' [2305]
[ 183.953622] udevd[2153]: slow: 'nvidia-udev.sh add' [2305]
[ 183.953717] udevd[2153]: 'nvidia-udev.sh add' [2305] terminated by signal 9 (Killed)
So it appears that I was able to circumvent the issue by rethinking all the comments about blacklisting modules coming from wise people.

At last I noticed that IF I blacklist all the modules to prevent them from being loaded automatically (by udev, from what I know), I can load them manually via modprobe, and somehow that works perfectly. NOTE: I couldn't load or remove the nvidia modules unless I had blacklisted them.

After that the solution was simple. It's probably not the best way, maybe it's a plain dumb way, but it works for me.

So here's what I did:

1. Added the following lines to /etc/modprobe.d/blacklist.conf:
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset

(basically, I just blacklisted all the nvidia modules that are usually loaded, which you can see by typing 'lsmod | grep -i nvidia' when your DE works)

2. Created the file /etc/local.d/nvidia-udev-workaround.start
and added the following lines to it:
#!/bin/sh

echo "NVIDIA WORKAROUND IN PROGRESS";
modprobe nvidia_drm;

3. Made that script executable with:
chmod +x /etc/local.d/nvidia-udev-workaround.start

4. Made sure that local appears in the default runlevel:
rc-update show default

If there's no "local" there, you need to add it in order to try this workaround:
rc-update add local default

Then reboot.

Works for me with nvidia-drivers 410.73 and 415.13, with kernels 4.18.17 and 4.19.1.
Confirm this bug with kernel 4.19.3 and nvidia-drivers-415.18.
Related to bug #667362? I can reproduce it with GTX 660. Maybe it's worth changing keywords on 410.x from stable to unstable.
(In reply to alpir from comment #9)
> Confirm this bug with kernel 4.19.3 and nvidia-drivers-415.18.

I was able to run my X server with nvidia-drivers-415.18 just by commenting out the last line in /etc/modprobe.d/nvidia.conf:

#options nvidia NVreg_DeviceFileMode=432 NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=27 NVreg_ModifyDeviceFiles=1
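To preview that change before touching the file, something like the following could be used. It is only a sketch: the function name comment_out_nvidia_options is hypothetical, it comments out every "options nvidia ..." line (not just the last one), and it prints the result to stdout instead of editing the file in place.

```shell
#!/bin/sh
# Hypothetical helper: prints the given modprobe config with every
# "options nvidia ..." line commented out. The input file is not modified.
comment_out_nvidia_options() {
    # "&" re-inserts the matched text, so the line is kept and only prefixed with "#".
    # The trailing [[:space:]] keeps "options nvidia_drm ..." lines untouched.
    sed 's/^[[:space:]]*options[[:space:]][[:space:]]*nvidia[[:space:]]/#&/' "$1"
}

# Usage on a live system:
#   comment_out_nvidia_options /etc/modprobe.d/nvidia.conf
```

If the output looks right, it can be redirected to a new file and moved over the original as root.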
*** Bug 667362 has been marked as a duplicate of this bug. ***
*** This bug has been marked as a duplicate of bug 667362 ***
(In reply to Ivan from comment #8)
> So it appears that I was able to circumwent that issue by rethinking all
> comments about blacklisting modules coming from wise people.
>
> At last I noticed that IF I blacklist all modules to prevent them from
> loading (by udev, from what I know), I can actually load modules manually
> via modprobe and somehow that works perfectly. NOTE: I couldn't load or
> remove nvidia modules if I haven't blacklisted them.
>
> After that the solution was simple. Probably it's not the best way, maybe
> it's plain dumb way, but it works for me.
>
> So here's what I did:
>
> 1. Added the in /etc/modprobe.d/blacklist.conf following lines:
> blacklist nvidia
> blacklist nvidia_drm
> blacklist nvidia_modeset
>
> (basically, I just blacklisted all nvidia modules that usually are loaded,
> which you can see by typing 'lsmod | grep -i nvidia' when your DE works)
>
> 2. Created file /etc/local.d/nvidia-udev-workaround.start
> Added the following lines in it:
> #!/bin/sh
>
> echo "NVIDIA WORKAROUND IN PROGRESS";
> modprobe nvidia_drm;
>
> 3. Made that script executable by:
> chmod +x /etc/local.d/nvidia-udev-workaround.start
>
> 4. Made sure that local appears in default runlevel:
> rc-update show default
>
> If there's no "local", in order to try that workaround, you should add it by
> rc-update add local default
>
> Then reboot.
>
> Works for me with 410.73 and 415.13 nvidia-drivers, with 4.18.17 and 4.19.1
> kernels.

Same problem here. Blacklisting the modules helped me (it prevents eudev from loading the nvidia module). However, the /etc/local.d/nvidia-udev-workaround.start trick is not really necessary; /etc/modules-load.d/ is a better place to do this.
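A minimal sketch of the /etc/modules-load.d/ variant follows. It rests on assumptions: the file name nvidia.conf and the function name write_nvidia_modules_load are hypothetical, and your init system must read /etc/modules-load.d/*.conf with one module name per line. Whether a blacklisted module is still loaded from there depends on how the init script calls modprobe, which is not verified here.

```shell
#!/bin/sh
# Hypothetical helper: writes a modules-load.d entry that requests nvidia_drm
# at boot. The target directory can be overridden for a dry run.
write_nvidia_modules_load() {
    dir=${1:-/etc/modules-load.d}
    # Only nvidia_drm is listed, mirroring the local.d script from comment #8;
    # per the lsmod output in the report it depends on nvidia_modeset and nvidia.
    printf '%s\n' \
        '# Load the NVIDIA DRM module at boot' \
        'nvidia_drm' > "$dir/nvidia.conf"
}

# Usage on a live system (as root):
#   write_nvidia_modules_load
```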
(In reply to Ivan from comment #8)
(In reply to David Bařina from comment #14)

This worked.
I think it's -fomit-frame-pointer. I deleted /lib/modules and rebuilt the kernel and nvidia-drivers without it, and everything is back to normal...