
Bug 504326

Summary: x11-drivers/nvidia-drivers-334.21 - efi mode boot problem with /opt/bin/nvidia-smi
Product: Gentoo Linux Reporter: Ulenrich <ulenrich>
Component: Current packages Assignee: David Seifert <soap>
Status: RESOLVED DUPLICATE    
Severity: normal CC: ionen, qrilka, stijn+gentoo, xarthisius, zerochaos
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Ulenrich 2014-03-12 15:58:55 UTC
I am able to boot in
hybrid-mbr and
efi mode

If I boot in efi mode I get a big scattered bar above the kde menu bar.
I found out it is related to /opt/bin/nvidia-smi

I copied /etc/udev/rules.d/99-nvidia.rules
and disabled the first Action=add line, which starts
/usr/lib/udev/nvidia-udev.sh
      add|ADD)    /opt/bin/nvidia-smi

This workaround only holds until I log in again:
then I get the same video error (a big scattering bar) again.
Plan: I am going to try without any bin/nvidia-smi (removing the file).
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2014-03-12 16:01:25 UTC
Can you attach a screenshot showing the big scattering bar?
Comment 2 Jeroen Roovers (RETIRED) gentoo-dev 2014-03-12 16:10:31 UTC
Also,

1) Please post your `emerge --info' output in a comment.
2) Tell us what graphics cards are installed on that system.
Comment 3 Ulenrich 2014-03-12 16:35:19 UTC
I cannot provide a screenshot because it is somehow input related:
if I move the mouse or
I type some text into a Konsole window
that is big enough to reach the place of the scattering bar.

I have an older macMini 
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation C79 [GeForce 9400] [10de:0861] (rev b1)

I found out it is not directly /opt/bin/nvidia-smi related:
The first login without that executable works perfectly, but on any relogin into KDE I get the same quirky screen. From experience, ignoring this can crash my system after some hours.

I have a Gentoo~unstable system with systemd-208.9999 (I tried this git version because of this video error). For Mesa I tried both the stable version and the very new mesa-10.1. It is not related to my kernel, because I copied over the whole kernel of a Debian derivative, siduction, which results in the same behavior. But booting the Debian installation with that linux-13.6 kernel shows no quirks. Debian-sid has gcc-4.8, systemd-204 and mesa-9.2.2 installed but is otherwise quite the same as my Gentoo~unstable.
Comment 4 Ulenrich 2014-03-12 16:41:41 UTC
It may be a more general efi/nvidia issue that only shows up with efi mode boot:
Because of this video error I installed Gentoo+stable on a brand-new partition, without the proprietary nvidia driver:
The first boot stalls with some video error on the console long before X:
A crash with no input possible. Any reboot does the same.

But if I successfully boot any other partition and then WARM reboot,
my new Gentoo+stable nouveau installation works well.

This doesn't happen booted in hybrid-mbr mode.
Comment 5 Jeroen Roovers (RETIRED) gentoo-dev 2014-03-13 00:45:05 UTC
(In reply to Ulenrich from comment #3)
> I cannot provide a screenshot because it is somehow input related:

Use a (camera) phone, maybe? Or describe in (more) words what you see. Right now we have no idea what it is you are seeing.
Comment 6 Ulenrich 2014-03-14 11:16:35 UTC
I wonder how I can make this bug report a useful one:
I have written too much;
at first I thought I knew the cause, but only now do I know:

The proprietary nvidia-drivers do not fully support efi:
It is only possible to load the xorg-server once.
Thus if there is a udev rule which pulls in the nvidia kernel module early,
then the second start, the loading of kdm, is scrambled; something memory or input related.

Workaround for me:
1. copy /etc/udev/rules.d/99-nvidia.rules,
commenting out the first line, which indirectly started /opt/bin/nvidia-smi

2. /usr/share/config/kdm/kdmrc 
TerminateServer=false
This way I can do relogins with X-kdm

Maybe this is only an issue for older efi (I have a MacMini of 2009)
and not modern uefi versions.
Comment 7 Jeroen Roovers (RETIRED) gentoo-dev 2014-03-14 14:35:54 UTC
(In reply to Ulenrich from comment #6)
> The proprietary nvidia-drivers do not fully support efi:
> It is only possible to load the xorg-server once.
> Thus if there is a udev rule which pulls in the nvidia kernel module early,
> then the second start, the loading of kdm, is scrambled; something memory or
> input related.

Trying to load nvidia.ko a second time is a NOOP. It looks like your system is perhaps trying to start two X servers, which has nothing to do with nvidia.ko.

> Workaround for me:
> 1. copy /etc/udev/rules.d/99-nvidia.rules,
> commenting out the first line, which indirectly started /opt/bin/nvidia-smi.

x11-drivers/nvidia-drivers does /not/ install /etc/udev/rules.d/99-nvidia.rules

It does install /lib/udev/rules.d/99-nvidia.rules which does not include that line. If you have that file in /etc, then you should remove it anyway or make sure it works properly yourself.

> 2. /usr/share/config/kdm/kdmrc 
> TerminateServer=false
> This way I can do relogins with X-kdm

I cannot tell what you actually changed there, or how it directly relates to x11-drivers/nvidia-drivers.

> Maybe this is only an issue for older efi (I have a MacMini of 2009)
> and not modern uefi versions.

You keep mentioning this, but EFI and UEFI shouldn't affect what the operating system does while starting up services.
Comment 8 Ulenrich 2014-03-14 15:29:31 UTC
@Jeroen 
I copied this file
/usr/lib64/udev/rules.d/99-nvidia.rules

to /etc/udev/rules.d/99-nvidia.rules
to override it and comment out this first line:
ACTION=="add", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="nvidia-udev.sh $env{ACTION}"

which runs /usr/lib64/udev/nvidia-udev.sh
with this action in the pre-X boot stage:
    add|ADD)
        /opt/bin/nvidia-smi > /dev/null
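
The override described above can be sketched as a small shell function. This is only an illustration: the function name override_nvidia_rule and its two arguments are hypothetical, and the paths in the usage comment are the ones quoted in this comment. It relies on udev preferring a rules file in /etc/udev/rules.d over a file of the same name in the system rules directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-drivers): copy a packaged udev
# rules file to an override location and comment out the rule that runs
# nvidia-udev.sh on the "add" action.
override_nvidia_rule() {
    src=$1   # e.g. /usr/lib64/udev/rules.d/99-nvidia.rules
    dst=$2   # e.g. /etc/udev/rules.d/99-nvidia.rules
    # Prefix only the ACTION=="add" line that calls nvidia-udev.sh with '#';
    # every other line is copied unchanged.
    sed '/^ACTION=="add".*nvidia-udev\.sh/s/^/#/' "$src" > "$dst"
}

# Usage (as root), with the paths from this comment:
#   override_nvidia_rule /usr/lib64/udev/rules.d/99-nvidia.rules \
#                        /etc/udev/rules.d/99-nvidia.rules
#   udevadm control --reload
```

Writing the copy through sed keeps the packaged file untouched, so removing the file in /etc restores the original behavior.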
Comment 9 Stijn Tintel 2015-03-23 23:54:05 UTC
Seeing this problem as well, with nvidia-drivers-346.47. Disabling nvidia-smi in nvidia-udev.sh solves the problem, otherwise I see a stack trace during boot and pretty soon after the system freezes. Please fix this, this causes headaches after every kernel update.

--- nvidia-udev.sh.bak  2015-03-24 00:51:43.262017277 +0100
+++ nvidia-udev.sh      2015-03-24 00:44:44.543335853 +0100
@@ -7,7 +7,7 @@
 
 case $1 in
        add|ADD)
-               /opt/bin/nvidia-smi > /dev/null
+               #/opt/bin/nvidia-smi > /dev/null
                ;;
        remove|REMOVE)
                rm -f /dev/nvidia*
Comment 10 Stijn Tintel 2015-06-28 18:57:02 UTC
Once again bitten by this bug, caused by the udev script that runs "nvidia-smi" after modprobe nvidia:

jun 28 20:27:54 taz kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
jun 28 20:27:54 taz kernel: IP: [<ffffffff815498a0>] __down+0x3b/0x8e
jun 28 20:27:54 taz kernel: PGD 6603db067 PUD 306120067 PMD 0 
jun 28 20:27:54 taz kernel: Oops: 0002 [#1] PREEMPT SMP 
jun 28 20:27:54 taz kernel: Modules linked in: nvidia(PO+) dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag bnep bluetooth xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntr
jun 28 20:27:54 taz kernel:  snd_pcm e1000e snd_timer ptp snd i2c_i801 lpc_ich firewire_ohci pps_core xhci_pci soundcore mei_me shpchp tpm_infineon tpm_tis processor tpm button nls_iso8859_1 nls_cp437 vfat fat openvswitch pptp gre pppox ppp_generic slhc netconsole vhba(O) vhost_net tun vhost 
jun 28 20:27:54 taz kernel:  hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid ohci_pci ohci_hcd uhci_hcd usb_storage hid arcmsr sg ehci_pci xhci_hcd ehci_hcd sr_mod cdrom firewire_core crc_itu_t usbcore usb_common [last unloaded: nvidia]
jun 28 20:27:54 taz kernel: CPU: 9 PID: 32388 Comm: nvidia-smi Tainted: P           O    4.0.5-gentoo #1
jun 28 20:27:54 taz kernel: Hardware name: System manufacturer System Product Name/P9X79 WS, BIOS 4701 08/26/2014
jun 28 20:27:54 taz kernel: task: ffff880612a26340 ti: ffff88082633c000 task.ti: ffff88082633c000
jun 28 20:27:54 taz kernel: RIP: 0010:[<ffffffff815498a0>]  [<ffffffff815498a0>] __down+0x3b/0x8e
jun 28 20:27:54 taz kernel: RSP: 0018:ffff88082633fb38  EFLAGS: 00010092
jun 28 20:27:54 taz kernel: RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: ffffffffa24d74a0
jun 28 20:27:54 taz kernel: RDX: ffff88082633fb38 RSI: ffffffff817b579b RDI: ffffffffa24d7480
jun 28 20:27:54 taz kernel: RBP: ffff88082633fb78 R08: 000060f7b0006580 R09: 0000000000000292
jun 28 20:27:54 taz kernel: R10: 00000000000000d0 R11: ffffffffa21f4866 R12: ffffffffa24d7480
jun 28 20:27:54 taz kernel: R13: ffff880612a26340 R14: 00000000000000ff R15: ffff88030422d538
jun 28 20:27:54 taz kernel: FS:  00007f4cbbb2c700(0000) GS:ffff88084fd20000(0000) knlGS:0000000000000000
jun 28 20:27:54 taz kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
jun 28 20:27:54 taz kernel: CR2: 0000000000000000 CR3: 0000000580fd7000 CR4: 00000000000407e0
jun 28 20:27:54 taz kernel: Stack:
jun 28 20:27:54 taz kernel:  ffffffffa24d74a0 0000000000000000 0000000000000003 00000000000000ff
jun 28 20:27:54 taz kernel:  ffff88082633fb78 ffffffffa24d7480 ffff88030cea0000 0000000000000003
jun 28 20:27:54 taz kernel:  ffff88082633fba8 ffffffff8108281c 000060f7b0006580 0000000000000292
jun 28 20:27:54 taz kernel: Call Trace:
jun 28 20:27:54 taz kernel:  [<ffffffff8108281c>] down+0x3c/0x50
jun 28 20:27:54 taz kernel:  [<ffffffffa21f4b8f>] nvidia_open+0x3af/0x8c0 [nvidia]
jun 28 20:27:54 taz kernel:  [<ffffffffa21f3b28>] nvidia_frontend_open+0x48/0xa0 [nvidia]
jun 28 20:27:54 taz kernel:  [<ffffffff8114a7da>] chrdev_open+0x9a/0x1d0
jun 28 20:27:54 taz kernel:  [<ffffffff8114a740>] ? cdev_put+0x30/0x30
jun 28 20:27:54 taz kernel:  [<ffffffff81144047>] do_dentry_open.isra.13+0xf7/0x320
jun 28 20:27:54 taz kernel:  [<ffffffff811442e9>] vfs_open+0x49/0x50
jun 28 20:27:54 taz kernel:  [<ffffffff81151e42>] do_last+0x132/0xdf0
jun 28 20:27:54 taz kernel:  [<ffffffff8115464b>] path_openat+0x7b/0x630
jun 28 20:27:54 taz kernel:  [<ffffffff810d8f07>] ? acct_account_cputime+0x17/0x20
jun 28 20:27:54 taz kernel:  [<ffffffff81155dd5>] do_filp_open+0x35/0x90
jun 28 20:27:54 taz kernel:  [<ffffffff8154b2c9>] ? _raw_spin_unlock+0x9/0x20
jun 28 20:27:54 taz kernel:  [<ffffffff8116224f>] ? __alloc_fd+0x9f/0x130
jun 28 20:27:54 taz kernel:  [<ffffffff81145344>] do_sys_open+0x124/0x220
jun 28 20:27:54 taz kernel:  [<ffffffff8101094d>] ? syscall_trace_enter_phase1+0x10d/0x180
jun 28 20:27:54 taz kernel:  [<ffffffff81145459>] SyS_open+0x19/0x20
jun 28 20:27:54 taz kernel:  [<ffffffff8154b9b6>] system_call_fastpath+0x16/0x1b
jun 28 20:27:54 taz kernel: Code: 49 89 fc 53 48 bb ff ff ff ff ff ff ff 7f 48 83 ec 28 48 89 4d c0 48 8b 47 28 48 89 57 28 65 4c 8b 2c 25 00 aa 00 00 48 89 45 c8 <48> 89 10 4c 89 6d d0 c6 45 d8 00 4c 89 e7 49 c7 45 00 02 00 00
jun 28 20:27:54 taz kernel: RIP  [<ffffffff815498a0>] __down+0x3b/0x8e
jun 28 20:27:54 taz kernel:  RSP <ffff88082633fb38>
jun 28 20:27:54 taz kernel: CR2: 0000000000000000
jun 28 20:27:54 taz kernel: ---[ end trace 84cb727e5ed71186 ]---
jun 28 20:27:54 taz kernel: note: nvidia-smi[32388] exited with preempt_count 1
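
One detail in the trace above is consistent with a race: in the "Modules linked in:" line the driver appears as nvidia(PO+), and a trailing "+" in that list marks a module that is still being loaded. A minimal sketch for spotting this in a saved log follows; the function name module_still_loading is hypothetical and the log is read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel log on stdin lists the given
# module with a "+" flag (module state "coming", i.e. still initializing).
module_still_loading() {
    mod=$1
    # Matches e.g. "nvidia(PO+)": module name, "(", taint letters, "+", ")".
    grep -Eq "(^|[[:space:]])${mod}\([A-Z]*\+\)"
}

# Usage:
#   journalctl -k | module_still_loading nvidia && echo "nvidia was still loading"
```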
Comment 11 Jeroen Roovers (RETIRED) gentoo-dev 2015-07-02 04:37:38 UTC
So should we add a USE=efi (In reply to Stijn Tintel from comment #9)
> Seeing this problem as well, with nvidia-drivers-346.47. Disabling
> nvidia-smi in nvidia-udev.sh solves the problem, otherwise I see a stack
> trace during boot and pretty soon after the system freezes. Please fix this,
> this causes headaches after every kernel update.
> 
> --- nvidia-udev.sh.bak  2015-03-24 00:51:43.262017277 +0100
> +++ nvidia-udev.sh      2015-03-24 00:44:44.543335853 +0100
> @@ -7,7 +7,7 @@
>  
>  case $1 in
>         add|ADD)
> -               /opt/bin/nvidia-smi > /dev/null
> +               #/opt/bin/nvidia-smi > /dev/null
>                 ;;
>         remove|REMOVE)
>                 rm -f /dev/nvidia*

That's not a bug fix. I can't fix problems in nvidia-smi: please talk to Nvidia directly about that.
Comment 12 Stijn Tintel 2015-07-03 14:17:37 UTC
(In reply to Jeroen Roovers from comment #11)
> So should we add a USE=efi (In reply to Stijn Tintel from comment #9)
> > Seeing this problem as well, with nvidia-drivers-346.47. Disabling
> > nvidia-smi in nvidia-udev.sh solves the problem, otherwise I see a stack
> > trace during boot and pretty soon after the system freezes. Please fix this,
> > this causes headaches after every kernel update.
> > 
> > --- nvidia-udev.sh.bak  2015-03-24 00:51:43.262017277 +0100
> > +++ nvidia-udev.sh      2015-03-24 00:44:44.543335853 +0100
> > @@ -7,7 +7,7 @@
> >  
> >  case $1 in
> >         add|ADD)
> > -               /opt/bin/nvidia-smi > /dev/null
> > +               #/opt/bin/nvidia-smi > /dev/null
> >                 ;;
> >         remove|REMOVE)
> >                 rm -f /dev/nvidia*
> 
> That's not a bug fix. I can't fix problems in nvidia-smi: please talk to
> Nvidia directly about that.

The nvidia-udev.sh script doesn't seem to come with the driver, but was added to fix #376527. I am inclined to say that the bug is caused by the udev script, and that it should be fixed there.

The problem doesn't always occur, so I suspect that sometimes nvidia-smi is being run too soon, when the driver isn't fully initialized yet and thus causing problems. Fix could be as simple as adding "sleep 2" or so before running nvidia-smi.

Other option would indeed be a USE flag to optionally install the udev script.
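
As an alternative to the fixed sleep suggested above, and offered only as an untested sketch that nobody in this thread proposed: the kernel exposes a module's load state in /sys/module/<name>/initstate, which reads "live" once initialization has finished. The helper name wait_for_module_live, its arguments, and the one-second polling interval are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: poll a module's sysfs "initstate" file until it
# reads "live", instead of sleeping for a fixed time.
wait_for_module_live() {
    state_file=$1          # e.g. /sys/module/nvidia/initstate
    tries=${2:-10}         # maximum number of one-second polls
    while [ "$tries" -gt 0 ]; do
        # The file holds "coming" while the module is still initializing.
        if [ "$(cat "$state_file" 2>/dev/null)" = "live" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Possible use inside the add|ADD) branch of nvidia-udev.sh:
#   wait_for_module_live /sys/module/nvidia/initstate && /opt/bin/nvidia-smi > /dev/null
```

Programs started from a udev RUN key are expected to be short-lived, so the retry count should stay small.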
Comment 13 Rick Farina (Zero_Chaos) gentoo-dev 2015-09-18 17:56:30 UTC
(In reply to Stijn Tintel from comment #12)
> The nvidia-udev.sh script doesn't seem to come with the driver, but was
> added to fix #376527. I am inclined to say that the bug is caused by the
> udev script, and that it should be fixed there.
> 
> The problem doesn't always occur, so I suspect that sometimes nvidia-smi is
> being run too soon, when the driver isn't fully initialized yet and thus
> causing problems. Fix could be as simple as adding "sleep 2" or so before
> running nvidia-smi.
> 
> Other option would indeed be a USE flag to optionally install the udev
> script.

the udev script should be run only when the module is inserted, you can try adding some sleep in there to test your theory of nvidia-smi loading before the driver is ready to deal with it.  while I don't love the idea of adding random sleep in there I would be open to a small sleep if it fixes your bug.
Comment 14 Stijn Tintel 2015-10-03 14:28:32 UTC
(In reply to Rick Farina (Zero_Chaos) from comment #13)
> the udev script should be run only when the module is inserted, you can try
> adding some sleep in there to test your theory of nvidia-smi loading before
> the driver is ready to deal with it.  while I don't love the idea of adding
> random sleep in there I would be open to a small sleep if it fixes your bug.

I've added "sleep 1" after I wrote my last comment, and have not seen this bug since.
Comment 15 Rick Farina (Zero_Chaos) gentoo-dev 2015-10-03 21:59:14 UTC
this should be fixed if Jer ever stabilizes the fixed udev script.
Comment 16 Stijn Tintel 2016-02-12 14:37:05 UTC
Another update of nvidia-drivers, and I run into this problem again:

feb 12 15:30:37 taz kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
feb 12 15:30:37 taz kernel: IP: [<ffffffff8156838c>] __down+0x3c/0xa0
feb 12 15:30:37 taz kernel: PGD 8195cd067 PUD 8195cc067 PMD 0
feb 12 15:30:37 taz kernel: Oops: 0002 [#1] PREEMPT SMP
feb 12 15:30:37 taz kernel: Modules linked in: iTCO_wdt iTCO_vendor_support intel_rapl iosf_mbi acpi_cpufreq(-) x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul nvidia(PO+) c
feb 12 15:30:37 taz kernel:  xts gf128mul aes_x86_64 cbc sha512_generic sha256_generic sha1_generic iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi e1000 fuse overlay nfs lockd grace s
feb 12 15:30:37 taz kernel: CPU: 4 PID: 2040 Comm: nvidia-smi Tainted: P           O    4.4.0-gentoo #1
feb 12 15:30:37 taz kernel: Hardware name: System manufacturer System Product Name/P9X79 WS, BIOS 4701 08/26/2014
feb 12 15:30:37 taz kernel: task: ffff8800ad8d5580 ti: ffff88081877c000 task.ti: ffff88081877c000
feb 12 15:30:37 taz kernel: RIP: 0010:[<ffffffff8156838c>]  [<ffffffff8156838c>] __down+0x3c/0xa0
feb 12 15:30:37 taz kernel: RSP: 0018:ffff88081877fbc0  EFLAGS: 00010086
feb 12 15:30:37 taz kernel: RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: 00000000000000ff
feb 12 15:30:37 taz kernel: RDX: ffffffffa1a7ab60 RSI: ffffffff817d1fbc RDI: ffffffffa1a7ab40
feb 12 15:30:37 taz kernel: RBP: ffff88081877fc00 R08: 000060f7c0008370 R09: 000000000000003d
feb 12 15:30:37 taz kernel: R10: 0000000000000000 R11: ffffffffa11a979e R12: ffffffffa1a7ab40
feb 12 15:30:37 taz kernel: R13: ffff8800ad8d5580 R14: 0000000000000003 R15: ffff880817c8a1c8
feb 12 15:30:37 taz kernel: FS:  00007f6e1b07d700(0000) GS:ffff88083fc80000(0000) knlGS:0000000000000000
feb 12 15:30:37 taz kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
feb 12 15:30:37 taz kernel: CR2: 0000000000000000 CR3: 0000000819552000 CR4: 00000000000406e0
feb 12 15:30:37 taz kernel: Stack:
feb 12 15:30:37 taz kernel:  ffffffffa1a7ab60 0000000000000000 ffff8800bb6a4600 ffff8808188c8000
feb 12 15:30:37 taz kernel:  0000000000000003 ffffffffa1a7ab40 ffff8800bb6a4600 ffff8808188c8000
feb 12 15:30:37 taz kernel:  ffff88081877fc20 ffffffff8108b1cc 0000000000000282 ffff8800bb6a4600
feb 12 15:30:37 taz kernel: Call Trace:
feb 12 15:30:37 taz kernel:  [<ffffffff8108b1cc>] down+0x3c/0x50
feb 12 15:30:37 taz kernel:  [<ffffffffa11a9977>] nvidia_open+0x257/0x300 [nvidia]
feb 12 15:30:37 taz kernel:  [<ffffffffa11a8328>] nvidia_frontend_open+0x58/0xc0 [nvidia]
feb 12 15:30:37 taz kernel:  [<ffffffff8115996a>] chrdev_open+0x9a/0x1c0
feb 12 15:30:37 taz kernel:  [<ffffffff811598d0>] ? cdev_put+0x20/0x20
feb 12 15:30:37 taz kernel:  [<ffffffff81153a49>] do_dentry_open.isra.13+0x149/0x2d0
feb 12 15:30:37 taz kernel:  [<ffffffff8115488a>] vfs_open+0x4a/0x50
feb 12 15:30:37 taz kernel:  [<ffffffff81162492>] path_openat+0x352/0x1140
feb 12 15:30:37 taz kernel:  [<ffffffff81164599>] do_filp_open+0x79/0xd0
feb 12 15:30:37 taz kernel:  [<ffffffff81569e99>] ? _raw_spin_unlock+0x9/0x20
feb 12 15:30:37 taz kernel:  [<ffffffff81170307>] ? __alloc_fd+0xb7/0x170
feb 12 15:30:37 taz kernel:  [<ffffffff81154bd0>] do_sys_open+0x120/0x210
feb 12 15:30:37 taz kernel:  [<ffffffff81154cd9>] SyS_open+0x19/0x20
feb 12 15:30:37 taz kernel:  [<ffffffff8156a3db>] entry_SYSCALL_64_fastpath+0x16/0x6e
feb 12 15:30:37 taz kernel: Code: bb ff ff ff ff ff ff ff 7f 65 4c 8b 2c 25 80 ae 00 00 48 83 e4 f0 48 83 ec 20 48 8b 47 28 48 89 14 24 48 89 67 28 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10
feb 12 15:30:37 taz kernel: RIP  [<ffffffff8156838c>] __down+0x3c/0xa0
feb 12 15:30:37 taz kernel:  RSP <ffff88081877fbc0>
feb 12 15:30:37 taz kernel: CR2: 0000000000000000
feb 12 15:30:37 taz kernel: ---[ end trace 1d67269097ae32d1 ]---
feb 12 15:30:37 taz kernel: note: nvidia-smi[2040] exited with preempt_count 1

As I mentioned before, adding "sleep 1" in /lib/udev/nvidia-udev.sh before running /opt/bin/nvidia-smi fixes the problem.

#!/bin/sh

if [ $# -ne 1 ]; then
        echo "Invalid args" >&2
        exit 1
fi

case $1 in
        add|ADD)
                #hopefully this prevents infinite loops like bug #454740
                if lsmod | grep -iq nvidia; then
                        sleep 1
                        /opt/bin/nvidia-smi > /dev/null
                fi
                ;;
        remove|REMOVE)
                rm -f /dev/nvidia*
                ;;
esac

exit 0


Please add the sleep 1 in the script. Thanks.
Comment 17 Stijn Tintel 2016-05-20 20:03:28 UTC
And once again:

vgaarb: device changed decodes: PCI:0000:01:00.0,olddecodes=none,decodes=none:owns=io+mem
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8156a86c>] __down+0x3c/0xa0
PGD 234cf8067 PUD 23487a067 PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nvidia(PO+) rfcomm xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag
 snd_pcm snd_timer snd mei_me soundcore shpchp nls_iso8859_1 nls_cp437 vfat fat processor tpm_infineon tpm_tis tpm button sch_fq_codel openvswitch nf_defrag_ipv6
 hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid ohci_pci ohci_hcd uhci_hcd usb_storage hid arcmsr s
CPU: 1 PID: 21613 Comm: nvidia-smi Tainted: P           O    4.4.10-gentoo #1
Hardware name: System manufacturer System Product Name/P9X79 WS, BIOS 4701 08/26/2014
task: ffff880818195580 ti: ffff8802321a8000 task.ti: ffff8802321a8000
RIP: 0010:[<ffffffff8156a86c>]  [<ffffffff8156a86c>] __down+0x3c/0xa0
RSP: 0018:ffff8802321abbc0  EFLAGS: 00010086
RAX: 0000000000000000 RBX: 7fffffffffffffff RCX: 00000000000000ff
RDX: ffffffffa27e5420 RSI: ffffffff817d28fc RDI: ffffffffa27e5400
RBP: ffff8802321abc00 R08: 000060f7c0008040 R09: 0000000000000013
R10: 0000000000000000 R11: ffffffffa1f0e75e R12: ffffffffa27e5400
R13: ffff880818195580 R14: 0000000000000003 R15: ffff880817db7a08
FS:  00007fa506fea700(0000) GS:ffff88083fc20000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000007b1084000 CR4: 00000000000406e0
Stack:
 ffffffffa27e5420 0000000000000000 ffff880233e9e600 ffff880232198000
 0000000000000003 ffffffffa27e5400 ffff880233e9e600 ffff880232198000
 ffff8802321abc20 ffffffff8108b1ec 0000000000000282 ffff880233e9e600
Call Trace:
 [<ffffffff8108b1ec>] down+0x3c/0x50
 [<ffffffffa1f0e937>] nvidia_open+0x257/0x300 [nvidia]
 [<ffffffffa1f0d323>] nvidia_frontend_open+0x53/0xa0 [nvidia]
 [<ffffffff81159f6a>] chrdev_open+0x9a/0x1c0
 [<ffffffff81159ed0>] ? cdev_put+0x20/0x20
 [<ffffffff81154049>] do_dentry_open.isra.13+0x149/0x2d0
 [<ffffffff81154e8a>] vfs_open+0x4a/0x50
 [<ffffffff81162ca7>] path_openat+0x557/0x10e0
 [<ffffffff81164b69>] do_filp_open+0x79/0xd0
 [<ffffffff8156c369>] ? _raw_spin_unlock+0x9/0x20
 [<ffffffff81170917>] ? __alloc_fd+0xb7/0x170
 [<ffffffff811551d0>] do_sys_open+0x120/0x210
 [<ffffffff811552d9>] SyS_open+0x19/0x20
 [<ffffffff8156c89b>] entry_SYSCALL_64_fastpath+0x16/0x6e
Code: bb ff ff ff ff ff ff ff 7f 65 4c 8b 2c 25 80 ae 00 00 48 83 e4 f0 48 83 ec 20 48 8b 47 28 48 89 14 24 48 89 67 28 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10
RIP  [<ffffffff8156a86c>] __down+0x3c/0xa0
 RSP <ffff8802321abbc0>
CR2: 0000000000000000
---[ end trace 2302e17022023252 ]---
note: nvidia-smi[21613] exited with preempt_count 1

Please fix this. Two solutions have been offered: a sleep, which is ugly, or making installation of this script optional via a USE flag. Make a choice and fix it. I am getting really pissed off that I need to hard reset my box every time nvidia-drivers is updated.
Comment 18 f0o 2020-10-31 08:58:50 UTC
Old ticket, just chipping in my 2 cents as I found it while writing a new report about something related:

01:00.0 3D controller: NVIDIA Corporation TU117GLM [Quadro T2000 Mobile / Max-Q] (rev a1)
        Subsystem: Dell TU117GLM [Quadro T2000 Mobile / Max-Q]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia

I cannot reproduce this issue or get any stack trace whatsoever.
Comment 19 Ionen Wolkens gentoo-dev 2021-03-04 21:33:06 UTC
nvidia-udev.sh will likely be removed, so if there are still any issues here, they'll likely go away with it.

*** This bug has been marked as a duplicate of bug 454740 ***