Summary: | x11-drivers/nvidia-drivers - NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:1140) installed in this system is not supported by the 337.12 NVIDIA Linux driver release. | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Nuria Masip <nuriamp1988> |
Component: | Current packages | Assignee: | David Seifert <soap> |
Status: | RESOLVED WONTFIX | ||
Severity: | normal | CC: | amonakov+bugs.gentoo, cyberbat83, drunkenbatman, feiticeir0, main.haarp, mattsch, mva, pacho, rei4dan |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
Errors from nvidia-drivers in message log
emerge --info
Boot spam
Error after removing nvidia-modprobe on boot |
Description
Nuria Masip
2014-04-19 22:56:07 UTC
Created attachment 375334 [details]
emerge --info
How do you expect a browser plugin to load kernel modules? Obviously it is not adobe-flash doing that, and it possibly is bumblebee that triggers this. But it isn't even that. Bumblebee is probably erroneously doing the same stupid thing over and over and over, fully expecting it to succeed the next time. Because what could be possibly wrong here? "The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:1140) installed in this system is not supported by the 337.12 NVIDIA Linux driver release." Oh, well, bumblebee couldn't have known about that. Managing which driver to install for which card isn't what it's good at. Opera and Flash are completely out of the loop then. Could you give the stable 331 branch and the "short-lived" 334 branch a go instead? I didn't mean to actually close this. Hello Jeroen, I already have stated in the forum link that bumblebee is working well on my system, also optirun/primusrun glxspheres work perfectly and logs show how the nvidia kernel module loads and unloads successfully, this is why I didn't point a finger at it. Needless to say, I have an Nvidia 620M and I have support for some more years to come. It is obvious that adobe-flash can't load kernel modules, it doesn't even have privileges, I'm sorry for that mistake. Anyways, this is what I have got done so far... 0.- Tried to emerge nvidia-drivers-334.21-r3, failed because it only works on linux d3.14. 1.- Switched to linux-3.12.13-gentoo and emerged them, still happens. 2.- Re-emerged bbswitch-0.8 and bumblebee-3.2.1 just in case, still the same bug. 3.- Went further back to nvidia-drivers-331.67 and the same error still appears. 4.- Repeated step 2. No change. 5.- Downloaded adobe-flash-11.2.202.346.ebuild from http://sources.gentoo.org/cgi-bin/viewvc.cgi/gentoo-x86/www-plugins/adobe-flash/adobe-flash-11.2.202.346.ebuild?hideattic=0&view=log to /usr/portage and updated the relevant Manifest signatures, installed and the same error keeps on going. 
6.- Downgraded bumblebee-3.2.1 to bumblebee-3.1 and still the same :/ I'll go back to linux-3.14, bumblebee-3.2.1, adobe-flash-11.2.202.350 and nvidia-drivers-337.12 while I await new instructions. I'd like to add that "jhallward" and "cyberjun" on the forums have the same issue so there is something going on. Thank you. (In reply to Nuria Masip from comment #4) > Hello Jeroen, > > I already have stated in the forum link that [...] I don't see what the forums have to do with this bug report. I don't mind if you include a link, but I'm certainly not going to waste my time reading forums. > 0.- Tried to emerge nvidia-drivers-334.21-r3, failed because it only works > on linux d3.14. What does you mean? 334.21-r3 works with every Linux 3.* kernel from 2.6.9 and newer but /older than/ 3.14. > 1.- Switched to linux-3.12.13-gentoo and emerged them, still happens. > 2.- Re-emerged bbswitch-0.8 and bumblebee-3.2.1 just in case, still the same > bug. > 3.- Went further back to nvidia-drivers-331.67 and the same error still > appears. Do you mean the error I copied to the Summary, or something else? > 4.- Repeated step 2. No change. > 5.- Downloaded adobe-flash-11.2.202.346.ebuild from [...] Why are you still talking about adobe-flash? > 6.- Downgraded bumblebee-3.2.1 to bumblebee-3.1 and still the same :/ > > I'll go back to linux-3.14, bumblebee-3.2.1, adobe-flash-11.2.202.350 and > nvidia-drivers-337.12 while I await new instructions. It's nice to know there is a magical combination that works (in the strict understanding that adobe-flash has nothing in common with the others). > I'd like to add that "jhallward" and "cyberjun" on the forums have the same > issue so there is something going on. As I said, I don't care about forums. I'd rather see twenty duplicate bug reports with proper corroborating and dissenting information that efficiently narrows down the suspects. I'll CC the bumblebee maintainers as I don't see how nvidia-drivers is misbehaving here. The "NVRM: ..." 
is mystifying, unless bumblebee is doing something wicked with PCI IDs. (In reply to Jeroen Roovers from comment #5) > (In reply to Nuria Masip from comment #4) > > Hello Jeroen, > > > > I already have stated in the forum link that [...] > > I don't see what the forums have to do with this bug report. I don't mind if > you include a link, but I'm certainly not going to waste my time reading > forums. > no comment. > > 0.- Tried to emerge nvidia-drivers-334.21-r3, failed because it only works > > on linux d3.14. > > What does you mean? 334.21-r3 works with every Linux 3.* kernel from 2.6.9 > and newer but /older than/ 3.14. > It means that in order to test older nvidia drivers I did downgrade to the latest stable kernel. I guess you expect to be notified of every change I make in a test system. > > 1.- Switched to linux-3.12.13-gentoo and emerged them, still happens. > > 2.- Re-emerged bbswitch-0.8 and bumblebee-3.2.1 just in case, still the same > > bug. > > 3.- Went further back to nvidia-drivers-331.67 and the same error still > > appears. > > Do you mean the error I copied to the Summary, or something else? > For proof: With 334.21 I only get this on the message log: NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:1140) NVRM: installed in this system is not supported by the 334.21 NVRM: NVIDIA Linux driver release. Please see 'Appendix NVRM: A - Supported NVIDIA GPU Products' in this release's NVRM: README, available on the Linux driver download page NVRM: at www.nvidia.com. nvidia: probe of 0000:01:00.0 failed with error -1 NVRM: The NVIDIA probe routine failed for 1 device(s). NVRM: None of the NVIDIA graphics adapters were initialized! [drm] Module unloaded NVRM: NVIDIA init module failed! [...] ad infinitum. With 331.67 I get: NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:1140) NVRM: installed in this system is not supported by the 331.67 NVRM: NVIDIA Linux driver release. 
Please see 'Appendix NVRM: A - Supported NVIDIA GPU Products' in this release's NVRM: README, available on the Linux driver download page NVRM: at www.nvidia.com. nvidia: probe of 0000:01:00.0 failed with error -1 NVRM: The NVIDIA probe routine failed for 1 device(s). NVRM: None of the NVIDIA graphics adapters were initialized! [drm] Module unloaded NVRM: NVIDIA init module failed! [...] ad infinitum. It has a difference, It doesn't print the stack trace, I don't know if it is because of a different nvidia-driver o a different kernel version. > > 4.- Repeated step 2. No change. > > 5.- Downloaded adobe-flash-11.2.202.346.ebuild from [...] > > Why are you still talking about adobe-flash? I followed comment #1 in the forum post. > > 6.- Downgraded bumblebee-3.2.1 to bumblebee-3.1 and still the same :/ > > > > I'll go back to linux-3.14, bumblebee-3.2.1, adobe-flash-11.2.202.350 and > > nvidia-drivers-337.12 while I await new instructions. > > It's nice to know there is a magical combination that works (in the strict > understanding that adobe-flash has nothing in common with the others). > I didn't mean that it worked, just that I'm going back to the "emerge --info" system and I'm lending a hand to test any other combinations any dev would ask. Anyway, disabling HW acceleration in adobe-flash makes this adobe-flash unrelated issue not happen again in my day-to-day usage. > > I'd like to add that "jhallward" and "cyberjun" on the forums have the same > > issue so there is something going on. > > As I said, I don't care about forums. I'd rather see twenty duplicate bug > reports with proper corroborating and dissenting information that > efficiently narrows down the suspects. > And I think I'd rather like a proper discussion in a forum and after some confirmations, corrections or ideas, placing a proper bug report. Also, why would anyone think that a bugzilla report would be more efficient way solving newlyfound problems than any other method of communication? 
I'll quote gentoo's bug reporting guidelines: "A bug tracker is used for technical reports and chitchat should be avoided. Keep them in the forums, IRC or mailing-lists" > I'll CC the bumblebee maintainers as I don't see how nvidia-drivers is > misbehaving here. The "NVRM: ..." is mystifying, unless bumblebee is doing > something wicked with PCI IDs. Thank you. BTW, forums comment #6 and #7 think that the problem may be some code inserting the module without properly initializing bbswitch. udev? bbswitch? ... FWIW, I've provided an explanation in bug 507098, comment 13. In short, making nvidia-modprobe suid "breaks" bumblebee-using systems because there the kernel module is supposed to only be loaded by the bumblebee daemon; suid nvidia-modprobe circumvents this. My suggestion is to introduce a USE flag to install nvidia-modprobe non-suid (I suppose it could be appreciated by some non-bumblebee users as well). Regarding this particular bug report, NVRM message is due to nvidia.ko not being able to bring up the card after it's been powered down by bbswitch; it's old behavior, and I assume not a bug. High cpu usage from repeating attempts to load the kernel module _is_ a bug imo though; afaik you also get such behavior from running 'modprobe nvidia' on a gentoo system without an nvidia gpu. At the moment I don't know how the endless retries are triggered. Since =nvidia-drivers-331.49-r1, where nvidia-modprobe was introduced, loading a page with Flash somehow triggers nvidia.ko insertion, which fails for obvious reasons (nvidia GPU is offline). This either begins an endless loop of modprobes, or hangs with the modprobe process in "D" state (as well as some other processes, including nvidia-smi). A workaround is removing /opt/bin/nvidia-modprobe (reverting the changes introduced in bug #505092). So far I haven't seen any side effects of this binary being gone (OpenGL, CUDA, all working as before). 
Disabling nvidia-modprobe installation by a USE flag sounds like a good idea, because then it could block x11-misc/bumblebee and/or sys-power/bbswitch. (In reply to Martin Sekera from comment #8) > Since =nvidia-drivers-331.49-r1, where nvidia-modprobe was introduced, > loading a page with Flash somehow triggers nvidia.ko insertion, which fails Somehow triggers? What triggers it and how do you stop it from doing that? I bet it's everything to do with using the current OpenGL/WebGL/GLES libraries[1], which of course are Nvidia's, which trigger the modprobe. You can't make that switch without also swapping out the libraries. > for obvious reasons (nvidia GPU is offline). This either begins an endless > loop of modprobes, or hangs with the modprobe process in "D" state (as well > as some other processes, including nvidia-smi). > > A workaround is removing /opt/bin/nvidia-modprobe (reverting the changes > introduced in bug #505092). So far I haven't seen any side effects of this > binary being gone (OpenGL, CUDA, all working as before). Right, so that's a "workaround". > Disabling nvidia-modprobe installation by a USE flag sounds like a good > idea, because then it could block x11-misc/bumblebee and/or > sys-power/bbswitch. The "workaround" is now the fix? [1] Not necessarily Adobe Flash, because at the same time that's loaded, the web page you're visiting might be trying to check for HTML5 video support. (In reply to Jeroen Roovers from comment #9) > (In reply to Martin Sekera from comment #8) > > Since =nvidia-drivers-331.49-r1, where nvidia-modprobe was introduced, > > loading a page with Flash somehow triggers nvidia.ko insertion, which fails > > Somehow triggers? What triggers it and how do you stop it from doing that? > > I bet it's everything to do with using the current OpenGL/WebGL/GLES > libraries[1], which of course are Nvidia's, which trigger the modprobe. You > can't make that switch without also swapping out the libraries. 
> > > for obvious reasons (nvidia GPU is offline). This either begins an endless > > loop of modprobes, or hangs with the modprobe process in "D" state (as well > > as some other processes, including nvidia-smi). > > Couldn't we make a shell script that is called nvidia-modprobe and in it conditionally call the nvidia-modprobe binary if the nvidia GPU is not offline (caused by bbswitch) thereby avoiding the endless calls and filling up the syslog? (In reply to Jeroen Roovers from comment #9) > (In reply to Martin Sekera from comment #8) > > Since =nvidia-drivers-331.49-r1, where nvidia-modprobe was introduced, > > loading a page with Flash somehow triggers nvidia.ko insertion, which fails > > Somehow triggers? What triggers it and how do you stop it from doing that? Is my explanation in bug 507098, comment 13 somehow not sufficient? > The "workaround" is now the fix? The minimally invasive fix is to introduce USE=suid to let users opt-out of installing nvidia-modprobe suid, ewarn for USE="+uvm -suid", and let bumblebee depend on nvidia-drivers[-suid] in future. (In reply to Alexander Monakov from comment #11) > Is my explanation in bug 507098, comment 13 somehow not sufficient? That's a duplicate of this bug report. Why do you expect me to read it? (In reply to Jeroen Roovers from comment #12) > (In reply to Alexander Monakov from comment #11) > > Is my explanation in bug 507098, comment 13 somehow not sufficient? > > That's a duplicate of this bug report. Why do you expect me to read it? OK, it's not, but you probably get what I mean. (In reply to Alexander Monakov from comment #11) > The minimally invasive fix is to introduce USE=suid to let users opt-out of > installing nvidia-modprobe suid, ewarn for USE="+uvm -suid", and let > bumblebee depend on nvidia-drivers[-suid] in future. Let's assume that a) this isn't going to happen and that b) you actually need to fix bumblebee since it is obviously broken. 
If you use Nvidia's libraries, then you can expect it to load its own libraries in support. If you do not want that to happen, you'll have to figure out a way to point whatever is loaded under X to use the generic media-libs/mesa libraries instead. (In reply to Jeroen Roovers from comment #13) > (In reply to Jeroen Roovers from comment #12) > > (In reply to Alexander Monakov from comment #11) > > > Is my explanation in bug 507098, comment 13 somehow not sufficient? > > > > That's a duplicate of this bug report. Why do you expect me to read it? > > OK, it's not, but you probably get what I mean. Note that I was linking to a specific comment that provides new information, not the bug report in general. I expected you to at least read that comment in isolation. And as for the "duplicate" part, in my second comment there I question why it was closed as a duplicate, because it really doesn't look to me as such. (In reply to Jeroen Roovers from comment #14) > (In reply to Alexander Monakov from comment #11) > > The minimally invasive fix is to introduce USE=suid to let users opt-out of > > installing nvidia-modprobe suid, ewarn for USE="+uvm -suid", and let > > bumblebee depend on nvidia-drivers[-suid] in future. > > Let's assume that a) this isn't going to happen and that b) you actually > need to fix bumblebee since it is obviously broken. If you use Nvidia's > libraries, then you can expect it to load its own libraries in support. If > you do not want that to happen, you'll have to figure out a way to point > whatever is loaded under X to use the generic media-libs/mesa libraries > instead. I'm confused. When I say "please adjust modprobe.d for nvidia-uvm" you quickly roll out a new ebuild, and when then I follow up to say, "oops, that breaks bumblebee" you quickly re-roll it again. But here you're acting so uncooperatively, why did the mood change so much? Can you please provide some perspective on why USE=suid isn't going to happen? 
There are already a few packages with such a USE flag.

(In reply to Alexander Monakov from comment #16)
> Can you please provide some perspective on why USE=suid isn't going to
> happen?

1) Installing nvidia-modprobe setuid is what Nvidia says we should do[1].
2) Without setuid, CUDA fails for non-privileged users (bug #505092), and this was corrected in the affected ebuilds.

[1] nvidia-modprobe(1): "When installed by nvidia-installer, nvidia-modprobe is installed setuid root."

*** Bug 509418 has been marked as a duplicate of this bug. ***

(In reply to Nuria Masip from comment #0)
> Created attachment 375332 [details]
> Errors from nvidia-drivers in message log

My system is: amd64, gentoo-sources-3.12.20, x11-drivers/nvidia-drivers-334.21-r3, notebook with Optimus.

I made similar errors go away (after some googling) by doing these things:
1) Turning on CONFIG_RCU_FAST_NO_HZ and CONFIG_NO_HZ_IDLE in the kernel
2) Adding 'blacklist nvidia' to /etc/modprobe.d/blacklist.conf

I suspect that 2) is what made the difference, but I don't want to recompile my kernel right now to rule out 1).

My remaining problem is that after loading a page with Flash (with the Flash plugin turned off there is no such error), my notebook turns on the nvidia card (the nvidia module is loaded without bbswitch, and the discrete card indicator on my notebook lights up) and doesn't turn it off until bumblebee is restarted.

The remaining bug has gone away with gentoo-sources-3.14.4 and nvidia-drivers-337.19.

(In reply to cyberbat from comment #20)
> Remaining bug has gone away with gentoo-sources-3.14.4 and
> nvidia-drivers-337.19

I'm sorry, the bug only seemed to have gone away because the nvidia-drivers beta just crashed when loading. So the bug remains: nvidia.ko gets loaded without bbswitch and isn't unloaded until bumblebee is restarted.

NB: it is caused only by adobe-flash. Completely disabling the Flash plugin in browsers helps. HTML5 video doesn't turn on the nvidia card, only Flash does.
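The two conditions described in the comment above (a 'blacklist nvidia' entry being present, and the nvidia module ending up loaded anyway) can be checked with a small shell sketch. The function names and the choice to pass file paths as arguments are illustrative assumptions, not part of any package discussed in this report. Note that, per modprobe.d(5), a blacklist entry only makes modprobe ignore the module's internal aliases (device-triggered autoloading); a request to load the module by name is not blocked by it.

```shell
#!/bin/sh
# Diagnostic sketch; function names and default paths are illustrative.

# Succeeds if the module named $1 appears in the module list file $2
# (default /proc/modules, where the first field of each line is the name).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Succeeds if any of the given modprobe.d files contains "blacklist <module>".
# Usage: module_blacklisted nvidia /etc/modprobe.d/*.conf
module_blacklisted() {
    mod="$1"
    shift
    # Without any files grep would read stdin, so treat that as "not found".
    [ "$#" -gt 0 ] || return 1
    grep -qsE "^[[:space:]]*blacklist[[:space:]]+${mod}[[:space:]]*(#.*)?\$" "$@"
}
```

For example, `module_blacklisted nvidia /etc/modprobe.d/*.conf && module_loaded nvidia && echo "nvidia loaded despite blacklist"` would flag the situation reported above, where something loads the module by name behind bumblebee's back.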
FWIW, in a machine with no Nvidia graphics card, calling nvidia-modprobe triggers this behavior.

# nvidia-modprobe
modprobe: ERROR: could not insert 'nvidia': No such device

But it keeps loading the nvidia driver. I get these two messages repeated multiple times a second in my logs:
... kernel: NVRM: NVIDIA init module failed!
... kernel: NVRM: No NVIDIA graphics adapter found!

pkill -9 systemd-udevd
Stops the endless loop.

I have nvidia-drivers-337.19 and kernel 3.13.4 installed. Nvidia driver is not necessary in this machine, so it's not critical for me. This started happening after May 22nd when I did some major software updates.

(In reply to Cengiz Gunay from comment #22)
> FWIW, in a machine with no Nvidia graphics card, calling nvidia-modprobe
> triggers this behavior.
>
> # nvidia-modprobe
> modprobe: ERROR: could not insert 'nvidia': No such device
>
> But it keeps loading the nvidia driver. I get these two messages repeated
> multiple times a second in my logs:
> ... kernel: NVRM: NVIDIA init module failed!
> ... kernel: NVRM: No NVIDIA graphics adapter found!
>
> pkill -9 systemd-udevd
> Stops the endless loop.

So you're saying that:
1) You run nvidia-modprobe (ignoring for the moment that you shouldn't ever need to do that on a system with no Nvidia graphics hardware).
2) systemd-udevd then basically takes over and keeps trying.

If you don't need nvidia-modprobe then you simply shouldn't run it in the first place, and as far as I am aware, a properly configured system should never trigger nvidia-modprobe automatically. A proper configuration would include having none of the Nvidia library symlinks in /usr/LIBDIR/ installed through eselect opengl. Having the Nvidia libraries linked up there _would_ trigger nvidia-modprobe.

Far more interesting is why systemd-udevd would try to load the module (repeatedly) despite the obvious failure.
I am aware that nvidia-modprobe might be trying to do too much and that it currently doesn't adhere to a static configuration that an admin might set up to inhibit some of its behaviour. In the 337.25 HTML documentation, optimus.html explains that "[i]f the setuid root nvidia-modprobe(1) utility is installed (the default when the driver is installed from .run file), this should be handled automatically." It goes on to explain that on some systems a bad ACPI configuration would throw a spanner in the works, so that's what you all might be looking at here. As far as I can tell from the documentation, on such systems both devices should always be visible so that the drivers can manage them. Having the BIOS "hide" the Nvidia graphics device is apparently a bad thing. commonproblems.html merely mentions "[s]uch problems are typically beyond the control of the NVIDIA driver, which relies on proper cooperation of ACPI and the System BIOS to retrieve important information about the GPU, including the Video BIOS." Same problem here on the 3.12 kernel. Created attachment 382180 [details]
Boot spam
Seen at boot
Created attachment 382182 [details]
Error after removing nvidia-modprobe on boot
This started up for me on an Optimus machine after a recent update, on kernel 3.15.8 + nvidia 340.24 and bumblebee, though it triggers well before any Flash is played: the console was being spammed with the contents of "boot_spam.txt" before GDM had even loaded.

After deleting /opt/bin/nvidia, the log shows "systemd-udevd[4067]: Failed to apply ACL on /dev/dri/card1: Invalid argument", as can be seen in "reboot-nvidia-modprobe.txt". However, the nvidia card is powered on. Trying to manually turn it off via 'tee /proc/acpi/bbswitch <<<OFF' shows the following in the log:

bbswitch: device 0000:01:00.0 is in use by driver 'nvidia', refusing OFF

If the bumblebee service is restarted, the card is then powered off. |
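The wrapper idea raised earlier in this report (call the real nvidia-modprobe only when bbswitch has not powered the card off) can be sketched roughly as follows. This is a minimal sketch under several assumptions that do not come from the report: the real binary is assumed to have been renamed to a hypothetical /opt/bin/nvidia-modprobe.real, and /proc/acpi/bbswitch is assumed to report state as a PCI address followed by ON or OFF. It has not been tested against the NVIDIA libraries, so whether they stop retrying when the helper exits quietly is unknown. Also note that Linux ignores the setuid bit on interpreted scripts, so a shell wrapper alone would not preserve the setuid behaviour discussed above.

```shell
#!/bin/sh
# Sketch of a guard around nvidia-modprobe for bumblebee/bbswitch systems.
# Hypothetical names: REAL_MODPROBE path and both function names.

BBSWITCH_STATE="${BBSWITCH_STATE:-/proc/acpi/bbswitch}"
REAL_MODPROBE="${REAL_MODPROBE:-/opt/bin/nvidia-modprobe.real}"

# Succeeds unless bbswitch explicitly reports the card as OFF.
# If the state file is missing (no bbswitch), the card is assumed powered.
gpu_is_on() {
    [ -r "$BBSWITCH_STATE" ] || return 0
    case "$(cat "$BBSWITCH_STATE")" in
        *OFF*) return 1 ;;
        *)     return 0 ;;
    esac
}

# Run the real helper only when the card is powered; otherwise skip the
# module load and return success without touching the kernel.
guarded_modprobe() {
    if gpu_is_on; then
        "$REAL_MODPROBE" "$@"
    else
        return 0
    fi
}
```

To act as a drop-in wrapper, the script would end with `guarded_modprobe "$@"`. Keeping the check in a function and the paths in overridable variables makes it possible to try the logic against a scratch file before pointing it at the real /proc/acpi/bbswitch.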