Bug 452762 - x11-drivers/nvidia-drivers - install udev rule only when built with USE="-X"

Status:	RESOLVED OBSOLETE

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	AMD64 Linux

Importance:	Normal minor
Assignee:	David Seifert

Reported:	2013-01-18 04:54 UTC by Oliver Deppert
Modified:	2020-11-18 18:32 UTC
CC List:	5 users

Attachments
lsmod before nvidia udev rule (lsmod_before, 596 bytes, text/plain) 2013-03-06 18:16 UTC, Oliver Deppert
lsmod after nvidia udev rule (lsmod_after, 629 bytes, text/plain) 2013-03-06 18:17 UTC, Oliver Deppert

Description Oliver Deppert 2013-01-18 04:54:04 UTC

Hi all,

Since nvidia-drivers-295.33 there has been a udev rule that sets the proper permissions on the video device for everyone who uses CUDA but doesn't run an X server. See "nvidia-udev.sh" and the corresponding rule under /lib/udev/rules.d.

The changelog reads "Add support for creating device nodes for NVIDIA graphics cards when not using X. Users of CUDA only specificially need this. Work for this done by Rick Farina <sidhayn@gmail.com> bug #376527"

I suggest installing this udev rule only if nvidia-drivers is built without the X USE flag. Typically the permissions for the nvidia device are set when X starts, so they don't need to be set earlier by udev.

Background:
I'm running an initramfs with fbsplash, and uvesafb is compiled directly into the kernel. As soon as my system starts and udev detects my nvidia card, the script nvidia-udev.sh gets executed. The screen goes black and the splash screen vanishes. If I remove the nvidia-udev rule, the splash screen starts and stays visible until X is started.

Since users without X (server environments) typically don't use a splash screen, I suggest installing the nvidia-udev rule only for them, so that they get access to the nvidia device for CUDA and the like.
Everyone else using X doesn't really need this udev rule, because as soon as X starts the nvidia module gets loaded automatically and the device permissions are set at that point.

Setting the permissions earlier in the boot phase via udev makes it impossible to run a splash screen with fbsplash.

regards,
Oliver Deppert

Reproducible: Always

Steps to Reproduce:
1. Install >=nvidia-drivers-295.33, fbsplash and uvesafb.
2. Reboot the system => the splash screen vanishes as soon as udev executes the nvidia rule early in the boot phase.
3. Remove 99-nvidia-rule under /lib/udev/rules.d.
4. Reboot the system => the splash screen and progress bar stay visible until X starts.

Actual Results:
Loading the nvidia module and/or setting permissions for the nvidia nodes destroys the framebuffer used by fbsplash.

Expected Results:
Setting permissions of the nvidia nodes or loading the nvidia module shouldn't overwrite the video framebuffer.

Comment 1 Sergey Popov gentoo-dev 2013-01-18 09:23:47 UTC

please, do NOT CC arches yourself

Comment 2 Sergey Popov gentoo-dev 2013-01-18 09:27:04 UTC

Let's see what maintainers think about this...

Comment 3 Doug Goldstein (RETIRED) gentoo-dev 2013-01-18 15:03:25 UTC

It's not about permissions, it's about initializing hardware at boot. The cards are uninitialized and the drivers not loaded. So you're saying that once you load the nvidia driver, uvesafb and fbsplash don't work. So when you shut down it doesn't give you a shutting-down splash screen? That seems like a failure to me wrt the configuration of uvesafb and fbsplash. Are you using the frame buffer driver for nvidia?

Comment 4 Oliver Deppert 2013-01-19 07:25:36 UTC

Hi Doug,

>So when you shutdown it doesn't give you a shutting down splash screen? 

No, when I shut down my system I get a nice splash screen. The console decoration also works fine when I switch to the console while an X session is running. So in principle uvesafb and fbsplash/fbdecoration work perfectly.

>That seems like a failure to me wrt to the configuration with uvesafb and fbsplash.

No...as I said, everything works fine if I remove the udev rule for the nvidia driver manually. My start-up phase looks like this:

uvesafb is compiled into the kernel.

1) boot into a self-built initramfs
2) prompt for the password of the encrypted hard drive (cryptsetup)
3) after this my initramfs calls a "repaint" on the framebuffer (uvesafb) to bring the bootscreen back on screen
4) I can see the bootscreen for one or two seconds until udev executes the nvidia script. As soon as this happens, my bootscreen vanishes and the screen is black. If I remove either the udev rule or the nvidia module, the bootscreen doesn't vanish.

To me it looks like the udev rule "initializes" the nvidia card by loading the module. As soon as this happens, the nvidia driver "takes" control of the framebuffer and therefore my screen gets blanked.

Not tested yet: I think it could work if I put udev (with all its rules) and the nvidia module into my initramfs. Then, I think, udev would be called before the bootscreen starts, and a "repaint" would force the bootscreen back onto uvesafb. But this is just a hypothesis, because as I said...on shutdown (nvidia module running) I can see a splash screen!
On the other hand, I don't want to build such a complex initramfs, because I don't need any rules or modules that early in the boot phase.

>Are you using the frame buffer driver for nvidia?
As far as I know, there is no framebuffer driver in the proprietary nvidia driver. I followed the Gentoo wiki, which recommends using uvesafb alongside the nvidia driver. If there is a "framebuffer driver" in the proprietary nvidia driver, I could give it a try?!

I also tried different things to get this combination working during the boot phase, like:
(/etc/modprobe.d/blacklist.conf)
# Sometimes loading a framebuffer driver at boot gets the console black
install pci:v*d*sv*sd*bc03sc*i* /bin/true

but this had no effect on my problem.

The only "workaround" is to remove the nvidia udev rule.

So I was asking myself why I need this udev rule when I'm running an X environment. Since the nvidia ebuild already has the X USE flag, this would be easy to implement: I think no one who builds nvidia-drivers with X needs the rule, and the others (no X, servers) typically don't have a splash screen!

regards,
Oliver Deppert

Comment 5 Rick Farina (Zero_Chaos) gentoo-dev 2013-03-04 22:56:46 UTC

Please don't remove this use flag.  I can build with X support, use my intel for X 99% of the time, I'd like to be able to compute on my nvidia.

Comment 6 Jeroen Roovers (RETIRED) gentoo-dev 2013-03-05 04:14:45 UTC

Couldn't you use CONFIG_PROTECT to stop updates from changing your udev rules? For special cases like yours that's the ideal solution.

Also, I never see this behaviour (nvidia.ko only gets loaded when X is started) but then I don't use uvesafb either. What could be triggering this in your case?

Comment 7 Rick Farina (Zero_Chaos) gentoo-dev 2013-03-05 04:33:26 UTC

Tbh I'm not certain how this happens.  The udev rule doesn't load the kernel module for you at all; it triggers when the module is loaded.  The rule simply creates the devices that would also be created when you start X.  I really don't see how this could interfere with your framebuffer unless the framebuffer is configured in some odd way.

You are specifically passing video=uvesafb at boot, right?

Can we get any kind of debug info on what exactly is happening? dmesg may have something useful?

Comment 8 Oliver Deppert 2013-03-06 18:16:57 UTC

Created attachment 341134 [details]
lsmod before nvidia udev rule

=> no nvidia module!

Comment 9 Oliver Deppert 2013-03-06 18:17:59 UTC

Created attachment 341136 [details]
lsmod after nvidia udev rule

=> after triggering nvidia udev rule the module is loaded "automatically" !

Comment 10 Oliver Deppert 2013-03-06 18:18:24 UTC

Hi,

ok...thanks for the hints, but my framebuffer device definitely works like a charm...

and yes, I also pass video=uvesafb in grub...

so far I've figured out that I don't need to delete the rule...at the moment I've only commented out this line:

#ACTION=="add", DEVPATH=="/module/nvidia", SUBSYSTEM=="module", RUN+="nvidia-udev.sh $env{ACTION}"
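
As an alternative to editing the packaged file in place (a package update may overwrite it), udev lets a same-named file under /etc/udev/rules.d take precedence over the one in /lib/udev/rules.d, and a symlink to /dev/null there disables that rules file entirely. A minimal sketch, assuming a hypothetical helper name (mask_udev_rule) and leaving the rules file name as a parameter, since the exact name is not spelled out in this report. Note that this masks the whole file, not just the one line:

```shell
#!/bin/sh
# mask_udev_rule is a hypothetical helper, not part of nvidia-drivers.
# It disables a packaged udev rules file by placing a same-named symlink
# to /dev/null in etc/udev/rules.d.
#   $1: file name of the rules file (assumption: check /lib/udev/rules.d
#       for the name actually installed on your system)
#   $2: optional root prefix, handy for trying this in a scratch directory
mask_udev_rule() {
    rule_name=$1
    root=${2:-}
    mkdir -p "$root/etc/udev/rules.d" || return 1
    ln -sf /dev/null "$root/etc/udev/rules.d/$rule_name"
}
```

Called as root with only the file name, it writes to the real /etc/udev/rules.d; removing the symlink restores the packaged rule.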

I can reproduce the bug quite simply:
-start my laptop
-after xdm starts, switch to a console (ALT+F1)
-stop xdm (/etc/init.d/xdm stop)
-unload the nvidia module (rmmod nvidia)

=> after this, I saved the output of lsmod (lsmod_before, attached)

-then I ran the script manually from my console (/lib/udev/nvidia-udev.sh add)

=> my console vanished and didn't come back....I've attached a video showing the bug described...

I've also saved the output of lsmod "after" the screen went dark (lsmod_after, attached)

As you can see, the nvidia module is loaded after I've executed the command "nvidia-udev.sh add"...

exactly the same happens when I start my laptop...I can see the bootsplash/framebuffer decoration up to the moment udev triggers the nvidia rule...then the screen goes black until xdm starts...then it comes back...

so, at the moment it looks like "loading" the nvidia module "overwrites" the framebuffer console!
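
The before/after comparison of the two lsmod dumps can be sketched roughly as follows. The helper names (new_modules, module_in_listing) are made up for illustration; lsmod_before and lsmod_after are the attachment names from this bug.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved lsmod listings.

# new_modules BEFORE AFTER
# Print the names of modules listed in file AFTER but not in file BEFORE.
# Assumes BEFORE is not empty (lsmod always prints at least a header line).
new_modules() {
    awk 'NR == FNR { seen[$1] = 1; next } !($1 in seen) { print $1 }' "$1" "$2"
}

# module_in_listing NAME < listing
# Succeed if NAME appears in the first column of lsmod-style input.
module_in_listing() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}
```

For example, `new_modules lsmod_before lsmod_after` lists what got loaded in between, and `lsmod | module_in_listing nvidia` checks the running system.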

regards,
Oliver

Comment 11 Oliver Deppert 2013-03-06 18:20:08 UTC

btw...

neither /var/log/messages nor dmesg showed anything after the screen vanished....

Comment 12 Rick Farina (Zero_Chaos) gentoo-dev 2013-03-06 18:29:57 UTC

(In reply to comment #9)
> Created attachment 341136 [details]
> lsmod after nvidia udev rule
> 
> => after triggering nvidia udev rule the module is loaded "automatically" !

The rule doesn't load the module, at least it didn't in my testing.  Udev may very well be loading the module, that's what udev does after all, it detects your hardware and loads appropriate modules...

Comment 13 Oliver Deppert 2013-03-06 18:32:05 UTC

here is the link to the video showing the bug in detail, after switching to a console, stopping xdm and unloading the nvidia module....

https://docs.google.com/file/d/0B2OfgWxpfOBSREJ4bFJwS2puelk/edit?usp=sharing

you can see what happens as soon as I trigger "/lib/udev/nvidia-udev.sh add" manually...

Comment 14 Oliver Deppert 2013-03-06 18:34:37 UTC

(In reply to comment #12)
> (In reply to comment #9)
> > Created attachment 341136 [details]
> > lsmod after nvidia udev rule
> > 
> > => after triggering nvidia udev rule the module is loaded "automatically" !
> 
> The rule doesn't load the module, at least it didn't in my testing.  Udev
> may very well be loading the module, that's what udev does after all, it
> detects your hardware and loads appropriate modules...

aha, strange....very strange....in my lsmod output there is no nvidia module....as soon as I run "/lib/udev/nvidia-udev.sh add", there is an nvidia module....so to me it looks like "creating" the device /dev/nvidia forces udev to load the module in the background....these are the results of my testing....sorry that they aren't consistent with your results...

regards,
Oliver

Comment 15 Rick Farina (Zero_Chaos) gentoo-dev 2013-03-06 19:00:33 UTC

There seems to be an amount of confusion here.

the rule does not load the module... it doesn't, it just doesn't.

The rule triggers when the module is loaded.

What you are doing is manually running the rule, claiming the hardware is detected, and that is causing it to create devices, which causes the kernel to detect the hardware and load the module.

This is enormously non-standard and makes no sense as a test case.

The kernel is detecting your nvidia at boot, which then loads the module, which then runs the rule.  If you want a test case that makes sense, what happens when you boot without that rule in place vs with?  The simple act of creating the devices shouldn't have any effect on your console and that is all this rule does.  Something else is loading the module or this rule wouldn't trigger at all (you know, unless you manually run it).

Comment 16 Oliver Deppert 2013-03-07 04:57:21 UTC

>>If you want a test case that makes sense, what happens when you boot without that rule in place vs with?

As I already mentioned above, if I remove the rule and reboot, everything works as it should...no nvidia module is loaded until the X server starts...if I boot with the rule, the screen goes black as soon as the rule gets executed! Then the device gets created, the kernel detects my nvidia card and udev loads the module...so even with such a test case it looks like the udev rule triggers the loading of the module.

>>The kernel is detecting your nvidia at boot, which then loads the module, which then runs the rule.

No, the kernel is only able to detect my nvidia card once /dev/nvidia is available...before that, my kernel doesn't know anything about my nvidia card. The rule makes /dev/nvidia available at boot...then the kernel detects this and loads the module with the help of udev...triggering the udev rule manually shows exactly this behavior...at the beginning there is no /dev/nvidia...after triggering the rule, the device gets created and "then" the module is loaded, not vice versa...

regards,
Oliver

Comment 17 Oliver Deppert 2013-03-07 05:37:48 UTC

Hi,

ok, I rechecked everything....and I have to say that you are right...sorry...the order really is: reboot - kernel detects nvidia card - kernel loads the nvidia module - udev rule gets executed...

I started my laptop without xdm in the default runlevel and without the nvidia-udev rule...I found out that the module is already loaded "before" xdm starts...at the moment I'm not sure what exactly is loading the nvidia module...I don't need this module that early...I need it just before or at the same time my X server starts...I've checked everything...no nvidia module is listed in /etc/conf.d/modules...but the kernel "automatically" loads the module quite early in the boot phase...
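
One rough way to check the usual configuration locations for an explicit reference to the module is sketched below. find_module_refs is a hypothetical helper, and the paths in the usage note are an assumption about a typical OpenRC system. An empty result would be consistent with comment 12, i.e. the module being loaded automatically on hardware detection rather than by configuration.

```shell
#!/bin/sh
# find_module_refs MODULE PATH...
# Hypothetical helper: print every file under the given files/directories
# that mentions MODULE as a whole word. Returns non-zero if nothing matches.
find_module_refs() {
    mod=$1
    shift
    grep -rlw -- "$mod" "$@" 2>/dev/null
}
```

For example: `find_module_refs nvidia /etc/conf.d/modules /etc/modprobe.d`.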

nevertheless, after this test, when I "manually" run the command "/opt/bin/nvidia-smi" from the console (with uvesafb and fbdecoration running)...the screen vanishes...

so, at the moment I claim that nvidia-smi, when executed, does something to my framebuffer that it doesn't like...

regards,
Oliver

Comment 18 Oliver Deppert 2013-03-07 05:59:23 UTC

Hi all,

maybe I've found a compromise...

As I said, at the moment it looks like running /opt/bin/nvidia-smi interferes with my framebuffer in some way...on the other hand, some of you need /dev/nvidia to be created in order to run GPU calculations on the nvidia card, even when running the X server on a graphics card other than the nvidia...

so, I've found out (source: https://forums.opensuse.org/english/get-technical-help-here/hardware/477074-nvidia-gt440-framebuffer.html and here: https://forums.opensuse.org/english/get-technical-help-here/64-bit/471770-12-1-hangs-time-time-2.html) that running:

/opt/bin/nvidia-smi -pm 1
works for me...running with this option "also" creates /dev/nvidia (good for you) and the framebuffer is still there (good for me)...the only thing is that the screen flickers briefly after running this, but the uvesafb console doesn't vanish.

on the other hand:
/opt/bin/nvidia-smi -pm 0 (default, just like running /opt/bin/nvidia-smi) breaks my framebuffer again.

I suggest modifying the nvidia rule to use /opt/bin/nvidia-smi -pm 1....and then everyone should be happy...
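
What that change could look like is sketched below. This is NOT the nvidia-udev.sh shipped by the ebuild, whose contents are not shown in this report; the function name create_nvidia_nodes and the NVIDIA_SMI override variable are assumptions for illustration. The only parts taken from this bug are the /opt/bin/nvidia-smi path, the "add" argument and the -pm 1 option.

```shell
#!/bin/sh
# Sketch of a udev helper in the spirit of nvidia-udev.sh (hypothetical).
# NVIDIA_SMI can be overridden from the environment for testing.
NVIDIA_SMI=${NVIDIA_SMI:-/opt/bin/nvidia-smi}

# create_nvidia_nodes ACTION
create_nvidia_nodes() {
    # Only act on "add" events, mirroring ACTION=="add" in the udev rule.
    [ "$1" = "add" ] || return 0
    # -pm 1 enables persistence mode. Per the observation above, on the
    # reporter's machine this also creates the device nodes without
    # blanking the uvesafb console.
    "$NVIDIA_SMI" -pm 1
}
```

The udev rule would then call the helper the same way as before, e.g. `create_nvidia_nodes "$1"` at the end of the script.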

I'm not sure what is causing this problem...but I know that with older nvidia drivers (27x.x and before) I never had this fb problem....but I'm also not sure whether the udev rule was available for the "old" nvidia modules...

maybe someone has an idea why -pm 1 works and -pm 0 doesn't?!

regards,
Oliver

Comment 19 Rick Farina (Zero_Chaos) gentoo-dev 2013-03-07 15:16:44 UTC

> maybe someone has an idea, why -pm 1 works and -pm 0 not?!

I'm glad you found the real issue, and I'm more than willing to compromise IF someone, anyone, can answer the above question (or at least a few people test it to make sure there are no side effects).  Honestly this setting is rather meaningless to me even after reading the man page.

I wrote the udev rule because I needed devices on boot, that's all I need. If -pm 1 makes the devices and doesn't break the card for others I'm more than happy to take that change, but I need a really good explanation of how it won't break things, or a few normal users to test it and tell me it's okay.

Comment 20 Oliver Deppert 2013-03-07 19:00:43 UTC

Hi,

ok...sounds like a deal...let's make sure some people here give "-pm 1" a try....the only thing I've found about persistence mode is this:

http://timelordz.com/wiki/Nvidia-Settings#Persistence_Mode

so, for me, not doing CUDA at all, -pm 1 would be nice...

At the moment I'm running my system with -pm 1....as expected, no problems so far...

regards,
Oliver

Comment 21 Rick Farina (Zero_Chaos) gentoo-dev 2013-03-08 18:24:39 UTC

Some additional info

http://www.cyberciti.biz/faq/debian-ubuntu-rhel-fedora-linux-nvidia-nvrm-gpu-fallen-off-bus/

Comment 22 Zenitur 2015-03-09 03:52:33 UTC

Fixed in 331.xx - the nvidia-uvm driver was added so that CUDA works without an X server.