Bug 433102 - x11-drivers/nvidia-drivers - CONFIG_{AMD,INTEL}_IOMMU causes kernel module to fail?
Summary: x11-drivers/nvidia-drivers - CONFIG_{AMD,INTEL}_IOMMU causes kernel module to...
Status: RESOLVED CANTFIX
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: David Seifert
Duplicates: 507010 507452 507592
Reported: 2012-08-28 21:25 UTC by Tango
Modified: 2020-11-18 18:31 UTC (History)

Attachments
emerge --info (file_433102.txt,4.90 KB, text/plain)
2012-08-28 21:25 UTC, Tango

Description Tango 2012-08-28 21:25:05 UTC
Created attachment 322482
emerge --info

Ever since the gentoo-sources-3.x.x kernels became stable, I have been trying to get a working kernel installed.  I have successfully installed and booted 3 separate 3.x.x kernels.  All have failed when I attempt to run startx.

kernel-3.4.9 (dmesg)
[    8.368285] nvidia: module license 'NVIDIA' taints kernel.
[    8.368289] Disabling lock debugging due to kernel taint
[    8.382108] vgaarb: device changed decodes: PCI:0000:08:00.0,olddecodes=io+mem,decodes=none:owns=io+mem
NVRM: loading NVIDIA UNIX x86_64 Kernel Module  295.71  Thu Aug  2 19:22:08 PDT 2012
AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x000000041a386200 flags=0x0010]

(snipped 5 additional messages just like the one above)

[   17.341563] NVRM: RmInitAdapter failed! (0x27:0x38:1190)
[   17.341573] NVRM: rm_init_adapter(0) failed


/var/log/Xorg.0.log reports various errors, but the main one was this:
[   910.102] (EE) NVIDIA(0): Failed to initialize the NVIDIA GPU at PCI:8:0:0.  Please
[   910.102] (EE) NVIDIA(0):     check your system's kernel log for additional error
[   910.102] (EE) NVIDIA(0):     messages and refer to Chapter 8: Common Problems in the
[   910.102] (EE) NVIDIA(0):     README for additional information.
[   910.102] (EE) NVIDIA(0): Failed to initialize the NVIDIA graphics device!
[   910.102] (II) UnloadModule: "nvidia"
[   910.102] (II) UnloadSubModule: "wfb"
[   910.102] (II) UnloadSubModule: "fb"
[   910.102] (EE) Screen(s) found, but none have a usable configuration.
[   910.102] 
Fatal server error:
[   910.102] no screens found
[   910.102]

Very early in the boot process, dmesg was also complaining that my IOMMU BIOS setting was disabled.  I verified that it was in fact turned on and didn't give it much more thought, as my working 2.6.37-gentoo-r4 was also giving me the same message.  I have 16 GB of RAM and system-resources reports 15.7 GB, so I ignored the warning.

No hardware changes; the kernels are configured as nearly identically as possible.
2.6.37-gentoo-r4: boots very quickly, all devices load properly (including nvidia), and X starts without errors.

3.4.9-gentoo: boots very quickly, all devices load properly (including nvidia), but X constantly fails to start.  As soon as I attempt to start X, the dmesg messages above appear at the end of the log.

After a week of banging my head against the wall, I was able to work around the problem by setting "amd_iommu=off" on the kernel command line in grub.

I am not sure if this is a kernel bug or an nvidia-drivers bug, or what effect, if any, "amd_iommu=off" will have on my system and its performance.

Documentation for the IOMMU in the kernel seems poor and vague.
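
For anyone triaging a similar log, the AMD-Vi lines quoted in this description can be tallied per PCI device. The following is a minimal Python sketch and not part of the original report: the function names are hypothetical, and the pattern assumes the exact message layout shown above, which may differ between kernel versions.

```python
import re

# Assumed layout, copied from the dmesg excerpt in this report:
# AMD-Vi: Event logged [IO_PAGE_FAULT device=08:00.0 domain=0x0016 address=0x... flags=0x0010]
FAULT_RE = re.compile(
    r"AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]) "
    r"domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) "
    r"flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)


def parse_io_page_faults(log_text):
    """Return one dict per IO_PAGE_FAULT event found in log_text."""
    faults = []
    for match in FAULT_RE.finditer(log_text):
        faults.append({
            "device": match.group("device"),            # PCI bus:dev.fn
            "domain": int(match.group("domain"), 16),
            "address": int(match.group("address"), 16),
            "flags": int(match.group("flags"), 16),
        })
    return faults


def faulting_devices(log_text):
    """Count IO_PAGE_FAULT events per PCI device."""
    counts = {}
    for fault in parse_io_page_faults(log_text):
        counts[fault["device"]] = counts.get(fault["device"], 0) + 1
    return counts
```

Feeding it the output of dmesg (for example, text read from a saved log file) shows which device the IOMMU is rejecting accesses from; in this report that would be 08:00.0, the same address the nvidia module and the VGA arbiter mention.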
Comment 1 Doug Goldstein (RETIRED) gentoo-dev 2012-09-04 00:49:44 UTC
What happens when you disable the IOMMU in the BIOS? The off state may be called Direct Access. It seems to me that Linux is doing something with the IOMMU setup that gives the graphics card driver a range to live in, but the NVIDIA driver maps parts of the card outside of that range. I believe Linux now uses AMD's IOMMU to perform remapping and such, as it used to with the GART. That is where the theory comes from, along with your log, where the VGA arbiter is doing some mucking with io+mem, which leads me to believe it's using the IOMMU instead of the GART.

What graphics card do you have?
What motherboard and motherboard chipset do you have?
What processor do you have?

One of my Gentoo machines is an AMD 780 with a Phenom II processor and a GeForce 9800, but I'm not seeing this, though I'm not sure if that setup has an IOMMU or if it's on.
Comment 2 Tango 2012-09-04 01:50:42 UTC
(In reply to comment #1)
> What happens when you disable the IOMMU in the BIOS? The off state may be
> called Direct Access. It seems to me that Linux is doing something with the
> IOMMU setup giving the graphics card driver a range to live in but the
> NVIDIA driver maps parts of the card outside of that. I believe Linux now
> uses AMD's IOMMU to perform remapping and such like it used to with the GART
> so that's where the theory is coming from as well as your log where the VGA
> arbiter is doing some mucking with io+mem which leads me to believe it's
> using the IOMMU instead of GART.

It's called IOMMU.  Interestingly enough, disabling it also solves the problem.  After I removed amd_iommu=off from the boot command line in grub and disabled the IOMMU in the BIOS, the system booted perfectly and X starts without complaints.  The system also reports the correct amount of installed RAM, 15.7GB; I have a total of 16GB on board.

I will run in this setup as long as the system remains stable and see what effects, if any, it has on the system.
> 
> What graphics card do you have?

MSI GeForce GTX 460 with 1024MB RAM
> What motherboard and motherboard chipset do you have?

MSI 890FXA-GD70
> What processor do you have?

AMD Phenom II X6 1100T Black Edition CPU
> 
> One of my Gentoo machines is an AMD 780 with a Phenom II processor and a
> GeForce 9800 but I'm not seeing this, though I'm not sure if that setup has
> an IOMMU or if it's on.

I have also reported this upstream at https://bugzilla.kernel.org/show_bug.cgi?id=42782

I think I reported it in the wrong kernel component, but I could not find the first bug I had located there, the one that gave me the insight and my first solution for working around the problem.  This problem seems to be cropping up in different kernel components.
Comment 3 Martin Jansa 2012-11-08 05:56:14 UTC
(In reply to comment #2)
> (In reply to comment #1)
> > What happens when you disable the IOMMU in the BIOS? The off state may be
> > called Direct Access. It seems to me that Linux is doing something with the
> > IOMMU setup giving the graphics card driver a range to live in but the
> > NVIDIA driver maps parts of the card outside of that. I believe Linux now
> > uses AMD's IOMMU to perform remapping and such like it used to with the GART
> > so that's where the theory is coming from as well as your log where the VGA
> > arbiter is doing some mucking with io+mem which leads me to believe it's
> > using the IOMMU instead of GART.
> 
> It's called IOMMU.  Interestingly enough this also solves the problem.  After
> removing the amd_iommu=off from the boot command line in grub and disabling
> the IOMMU in the BIOS, the system booted perfectly and X starts without
> complaints.  Also the system reports the correct amount of installed RAM of
> 15.7GB.  I have a total of 16GB on board.

I have a similar problem. Disabling the IOMMU in the BIOS wasn't enough for me; iommu=soft on the kernel cmdline works for me. Otherwise I get:
(EE) NVIDIA(GPU-0): WAIT: (E, 0, 0x837d)
while starting the X server.

jama linux # grep -i IOMMU .config
CONFIG_GART_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
CONFIG_IOMMU_SUPPORT=y
CONFIG_AMD_IOMMU=y
# CONFIG_AMD_IOMMU_STATS is not set
# CONFIG_INTEL_IOMMU is not set
# CONFIG_IOMMU_DEBUG is not set
# CONFIG_IOMMU_STRESS is not set
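
A listing like the grep output above can also be checked programmatically when comparing kernel configs between a working and a failing kernel. This is a minimal Python sketch under stated assumptions: the helper names are hypothetical, and it only understands the two line shapes that appear in this comment (`CONFIG_X=value` and `# CONFIG_X is not set`).

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_kernel_config(text):
    """Map option name -> value string, or None for '# ... is not set' lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
    return options


def enabled_iommu_options(options):
    """Return the sorted names of IOMMU-related options built in (y) or modular (m)."""
    return sorted(
        name for name, value in options.items()
        if "IOMMU" in name and value in ("y", "m")
    )
```

Running `enabled_iommu_options(parse_kernel_config(open(".config").read()))` against two kernel trees and comparing the results is one way to spot whether CONFIG_AMD_IOMMU differs between them.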

> I will run in this setup as long as the system remains stable and see what
> effects, if any, it has on the system.
> > 
> > What graphics card do you have?
> 
> MSI GeForce GTX 460 with 1024MB RAM

GTX260 

> > What motherboard and motherboard chipset do you have?

ASUS M5A99X EVO

> MSI 890FXA-GD70
> > What processor do you have?
> 
> AMD Phenom II X6 1100T Black Edition CPU

AMD FX-8120

It worked fine for me with my previous motherboard:

ASUS M3A32-MVP DELUXE


> > One of my Gentoo machines is an AMD 780 with a Phenom II processor and a
> > GeForce 9800 but I'm not seeing this, though I'm not sure if that setup has
> > an IOMMU or if it's on.
> 
> I have also reported this upstream at
> https://bugzilla.kernel.org/show_bug.cgi?id=42782
> 
> I think I reported it in the wrong kernel component, but I could not find
> the first bug I located there, that provided me the insight and my first
> solution for working around the problem.  This problem seems to be cropping up
> in different kernel components.
Comment 4 Chí-Thanh Christopher Nguyễn gentoo-dev 2013-05-04 17:15:38 UTC
The 780 does not support IOMMU. For Socket AM3/AM3+, the 890FX, 970, 990X and 990FX chipsets support IOMMU, as do the server and FM1/FM2 chipsets.
Comment 5 Reuben Martin 2013-07-07 04:24:24 UTC
I have the same motherboard, and wanted to note that you have to make sure you are using BIOS version 1.8 for the IOMMU to work correctly. It is broken in all BIOS releases after that.

If you have a later BIOS version installed, you might get it to boot if you set iommu=soft in the kernel parameters.
Comment 6 Reuben Martin 2013-07-08 00:21:43 UTC
I got it working.

Add the following to your kernel parameters:

iommu=1 iommu=pt iommu=memmap=4

From dmesg:
[    0.714386] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[    0.714657] AMD-Vi: Initialized for Passthrough Mode


Nvidia module works as expected.
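
To confirm which of these parameters the running kernel was actually booted with, the contents of /proc/cmdline can be split into tokens. The following is a minimal Python sketch with hypothetical helper names; it assumes whitespace-separated tokens without quoting, and it does not check whether the kernel recognizes a given value.

```python
def parse_cmdline(cmdline):
    """Map each parameter name to the list of values given for it, in order.

    A bare flag without '=' is recorded with the value None.
    """
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'iommu=memmap=4' keeps 'memmap=4' as its value.
        name, sep, value = token.partition("=")
        params.setdefault(name, []).append(value if sep else None)
    return params


def iommu_values(params):
    """Flatten every iommu= value, also splitting comma-separated lists."""
    values = []
    for value in params.get("iommu", []):
        if value is not None:
            values.extend(value.split(","))
    return values


def requests_passthrough(cmdline):
    """True if some iommu= parameter on the command line asks for 'pt'."""
    return "pt" in iommu_values(parse_cmdline(cmdline))
```

On a live system the input would come from reading /proc/cmdline, for example `requests_passthrough(open("/proc/cmdline").read())`. The dmesg line "AMD-Vi: Initialized for Passthrough Mode" quoted above remains the authoritative confirmation that the setting took effect.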
Comment 7 Reuben Martin 2013-07-27 17:30:58 UTC
Starting with kernel 3.10, you can manually override the bogus settings given by the firmware table.

In my case, adding ivrs_ioapic[6]=00:14.0 fixed it, but the mapping numbers are different for each version of the firmware, so you may have to dump the IOMMU mappings to figure out what needs to be changed (add amd_iommu_dump to the kernel options).

This change wrapped up the last bits of missing IOMMU functionality by enabling interrupt remapping:

[    0.719830] AMD-Vi: Interrupt remapping enabled

You can read more details about it here: https://bbs.archlinux.org/viewtopic.php?id=163102

Make sure you keep the iommu=pt kernel option in place. That keeps the nvidia driver from self-destructing.
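
Because the override has to be adapted per firmware version, it can help to sanity-check the parameter before rebooting. This is a minimal Python sketch with a hypothetical function name; it assumes only the `ivrs_ioapic[<id>]=<bus>:<dev>.<fn>` form used in this comment, with a decimal IOAPIC ID and a hexadecimal PCI address, and does not cover any other syntax a given kernel may accept.

```python
import re

# Assumed form, as written in this comment: ivrs_ioapic[6]=00:14.0
IVRS_IOAPIC_RE = re.compile(
    r"^ivrs_ioapic\[(\d+)\]=([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_ivrs_ioapic(token):
    """Parse one ivrs_ioapic override; return None if the token is malformed."""
    match = IVRS_IOAPIC_RE.match(token)
    if match is None:
        return None
    return {
        "ioapic_id": int(match.group(1)),      # decimal IOAPIC ID
        "bus": int(match.group(2), 16),        # PCI bus, hexadecimal
        "device": int(match.group(3), 16),     # PCI device, hexadecimal
        "function": int(match.group(4)),       # PCI function, 0-7
    }
```

A return value of None signals a typo such as a missing bracket or a function number above 7; whether the mapping itself is correct still has to be read from the amd_iommu_dump output for the firmware in question.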
Comment 8 Jeroen Roovers (RETIRED) gentoo-dev 2014-04-12 17:27:44 UTC
*** Bug 507010 has been marked as a duplicate of this bug. ***
Comment 9 Jeroen Roovers (RETIRED) gentoo-dev 2014-04-12 17:28:08 UTC
*** Bug 507452 has been marked as a duplicate of this bug. ***
Comment 10 Jeroen Roovers (RETIRED) gentoo-dev 2014-04-13 22:54:12 UTC
*** Bug 507592 has been marked as a duplicate of this bug. ***