When running Chromium, PyMOL or other programs that use OpenGL, the driver crashes from time to time. The applications freeze and I see this in dmesg:

[ 536.756263] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[ 536.756268] IP: [<ffffffff8158ff97>] iommu_no_mapping+0x7/0x100
[ 536.756273] PGD 0
[ 536.756275] Oops: 0000 [#1] PREEMPT SMP
[ 536.756276] Modules linked in: ip6table_mangle nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_amanda vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) snd_hda_codec_hdmi snd_hda_codec_realtek nvidia(PO) snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc
[ 536.756292] CPU: 6 PID: 5011 Comm: chrome Tainted: P AW O 3.12.16-lh #1
[ 536.756293] Hardware name: System manufacturer System Product Name/P8P67 REV 3.1, BIOS 3602 11/01/2012
[ 536.756294] task: ffff8803a54987b0 ti: ffff8803e2dde000 task.ti: ffff8803e2dde000
[ 536.756295] RIP: 0010:[<ffffffff8158ff97>] [<ffffffff8158ff97>] iommu_no_mapping+0x7/0x100
[ 536.756297] RSP: 0018:ffff8803e2ddfd28 EFLAGS: 00010246
[ 536.756298] RAX: ffffffff81590090 RBX: 0000000000000000 RCX: 0000000000000000
[ 536.756299] RDX: 0000000000000001 RSI: ffff8803a50b7218 RDI: 0000000000000000
[ 536.756299] RBP: ffff8803a50b7280 R08: 0000000000000000 R09: 00000000d6e87000
[ 536.756300] R10: ffff880080000000 R11: 0000000000000001 R12: ffff8803a50b7218
[ 536.756301] R13: 000077ff80000000 R14: 0000000000000000 R15: ffff8803a50b7200
[ 536.756302] FS: 00002b82fa0a0800(0000) GS:ffff88041ed80000(0000) knlGS:0000000000000000
[ 536.756303] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 536.756304] CR2: 0000000000000088 CR3: 000000000e971000 CR4: 00000000000407e0
[ 536.756304] Stack:
[ 536.756305] 0000000000000000 ffff8803a50b7280 ffff8803a50b7218 ffffffff815900a5
[ 536.756307] 0000000000000000 ffff8803a50b7280 ffffea0000000000 000077ff80000000
[ 536.756308] 0000000000000000 ffff8803a50b7200 ffffffffa05e3a3d 0000000000000098
[ 536.756309] Call Trace:
[ 536.756312] [<ffffffff815900a5>] ? intel_unmap_sg+0x15/0x120
[ 536.756349] [<ffffffffa05e3a3d>] ? nv_free_system_pages+0xad/0x3a0 [nvidia]
[ 536.756369] [<ffffffffa05dda6d>] ? nv_free_pages+0xbd/0xd0 [nvidia]
[ 536.756389] [<ffffffffa05dddae>] ? nvidia_close+0x32e/0x440 [nvidia]
[ 536.756409] [<ffffffffa05e67df>] ? nvidia_frontend_close+0x3f/0x90 [nvidia]
[ 536.756410] [<ffffffff811b4655>] ? __fput+0xb5/0x200
[ 536.756413] [<ffffffff810e9c5c>] ? task_work_run+0xac/0xd0
[ 536.756415] [<ffffffff810d27aa>] ? do_exit+0x77a/0xa60
[ 536.756417] [<ffffffff8113fdcb>] ? __secure_computing+0x6b/0x240
[ 536.756418] [<ffffffff810d2af4>] ? do_group_exit+0x34/0xa0
[ 536.756420] [<ffffffff810d2b6b>] ? SyS_exit_group+0xb/0x10
[ 536.756422] [<ffffffff8173531b>] ? tracesys+0xdd/0xe2
[ 536.756423] Code: fa 6f 00 48 89 de e8 29 c9 c0 ff 5b 89 e8 5d 41 5c 41 5d c3 b8 f4 ff ff ff e9 46 ff ff ff e8 9f ec 19 00 90 41 54 55 53 48 89 fb <48> 81 bf 88 00 00 00 80 54 9c 81 0f 85 c8 00 00 00 48 8b 87 c0
[ 536.756437] RIP [<ffffffff8158ff97>] iommu_no_mapping+0x7/0x100
[ 536.756439] RSP <ffff8803e2ddfd28>
[ 536.756439] CR2: 0000000000000088
[ 536.756442] ---[ end trace 2f61fa8ceb94d612 ]---
[ 548.919276] Fixing recursive fault but reboot is needed!

This happens with any kernel sources, including gentoo-sources.
# lspci -s 01:00.0 -vvv
01:00.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 4000] (rev a3) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 0780
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 63
    Region 0: Memory at f4000000 (32-bit, non-prefetchable) [size=32M]
    Region 1: Memory at e0000000 (64-bit, prefetchable) [size=128M]
    Region 3: Memory at e8000000 (64-bit, prefetchable) [size=64M]
    Region 5: I/O ports at e000 [size=128]
    [virtual] Expansion ROM at f6000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee003d8 Data: 0000
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <256ns, L1 <4us
            ClockPM+ Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: nvidia
    Kernel modules: nvidia
Run nvidia-bug-report.sh and report it upstream.
(In reply to Jeroen Roovers from comment #1)
> Run nvidia-bug-report.sh and report it upstream.

Hopefully you don't plan to stabilize it. This driver is broken.
(In reply to Justin Lecher from comment #2)
> Hopefully you don't plan to stabilize it. This driver is broken.

14 Mar 2014; Jeroen Roovers <jer@gentoo.org> nvidia-drivers-334.21.ebuild:
Stable for AMD64 x86 too.

I would expect more bug reports in that case.
This happens reproducibly whenever I close Chromium, but there is no segfault or any other dump of information.
I start Chromium and go to html5test.com. The test never completes (it hangs or runs forever). When I then close Chromium, I get this crash.
What happens when you set intel_iommu=off on the kernel command line?
So far, completely disabling the IOMMU code in the kernel helps here, but I will keep observing this. This has also been reported to NVIDIA.
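For anyone who wants to confirm which intel_iommu setting a machine actually booted with, here is a minimal sketch in Python. It only looks at the kernel command line, so it says nothing about an IOMMU that was disabled at build time in the kernel configuration, which may be what was done above. The function name intel_iommu_setting is made up for this illustration.

```python
def intel_iommu_setting(cmdline):
    """Return the value of the last intel_iommu= parameter, or None if absent.

    `cmdline` is the kernel command line as one string, e.g. the
    contents of /proc/cmdline on a running Linux system.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            # Remember the most recent occurrence and keep scanning,
            # so a later duplicate replaces an earlier one.
            value = token.split("=", 1)[1]
    return value
```

On a running system the input string could be read from /proc/cmdline. A result of "off" means the parameter suggested in the previous comment was passed at boot; None means it was not given at all.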
*** Bug 507452 has been marked as a duplicate of this bug. ***
Looks exactly like bug #433102 and bug #410631.
*** This bug has been marked as a duplicate of bug 433102 ***