Bug 455432 - x11-drivers/xf86-video-ati - Repeated GPU lockup and crash
Summary: x11-drivers/xf86-video-ati - Repeated GPU lockup and crash
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Library
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo X packagers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-04 13:48 UTC by Brendan Jurd
Modified: 2013-02-18 12:10 UTC
CC List: 0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Excerpt from /var/log/messages during a series of GPU lockups (messages-radeon-lockups,17.52 KB, text/plain)
2013-02-04 13:48 UTC, Brendan Jurd
Output of emerge --info (emerge.info,14.66 KB, text/plain)
2013-02-04 13:49 UTC, Brendan Jurd
Current copy of /usr/src/linux/.config for 3.8.0-rc6 (kernel.config,69.67 KB, text/plain)
2013-02-04 13:53 UTC, Brendan Jurd

Description Brendan Jurd 2013-02-04 13:48:33 UTC
Created attachment 337912 [details]
Excerpt from /var/log/messages during a series of GPU lockups

I have a Radeon HD 4850 (RV770), and something goes very badly wrong when I try to use X.

I believe the problems started when I switched from ati-drivers to xf86-video-ati, but at this point I have spent so much time emerging different versions of things that I can't remember when it began.

The problems have been observed with about four different major kernel versions and both XFCE and KDE.  It pains me to tell you that the video card works fine under Windows.

The symptoms are:

1) I start up /etc/init.d/xdm.  KDM or Slim comes up okay.
2) I enter my username and password.
3) Sometime during the login process, the screen freezes up -- the mouse cursor usually will still move but clicking has no effect and the system doesn't seem to respond to keyboard events.
4) A few seconds later, the screen goes black.
5) A few seconds later, I get "no signal" on my monitor.
6) A few seconds after that, the monitor wakes back up.
7) Sometimes the screen works for a second or two, but most often it comes back up already frozen, and the cycle repeats from (3).
8) After about three or four such freezes, the display doesn't come back up at all.  When this happens, I can't even Ctrl-Alt-F1 to get out to the TTY, and the only way to recover the system is to SSH in and SIGKILL the X process.

I have attached an excerpt of what /var/log/messages looks like while this plays out.  Quite a few other people have reported bugs with similar error output, but the fixes for those problems do not seem to have solved my problem.  It seems that the messages describe a common set of symptoms which can be triggered by a variety of different bugs.

Upgrading to later kernel versions (3.7 and 3.8) seems to have made the behaviour more consistent -- that is, it fails every time instead of just most times.

I'm not really sure where I should be reporting this bug (freedesktop?  kernel?) so I thought I would begin here.  Any guidance on where to direct this, or how to diagnose further, would be greatly appreciated.

lspci output follows:

02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV770 [Radeon HD 4850] (prog-if 00 [VGA controller])
	Subsystem: Giga-byte Technology Device 21b4
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 43
	Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Region 2: Memory at dfff0000 (64-bit, non-prefetchable) [size=64K]
	Region 4: I/O ports at e800 [size=256]
	Expansion ROM at dffc0000 [disabled] [size=128K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
		Address: 00000000fee0300c  Data: 41a1
	Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
	Kernel driver in use: radeon


Thank you for your time.
Comment 1 Brendan Jurd 2013-02-04 13:49:22 UTC
Created attachment 337916 [details]
Output of emerge --info
Comment 2 Brendan Jurd 2013-02-04 13:53:21 UTC
Created attachment 337920 [details]
Current copy of /usr/src/linux/.config for 3.8.0-rc6
Comment 3 Chí-Thanh Christopher Nguyễn gentoo-dev 2013-02-04 20:42:07 UTC
Please also provide your Xorg.0.log
You have vesafb enabled in your kernel, does it become active according to dmesg? If yes, this can cause problems.
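
One way to answer the dmesg question is to filter the kernel log for lines that mention vesafb. The following is a minimal Python sketch and was not part of the original comment. The helper names find_vesafb_lines and read_dmesg are hypothetical, and the sketch assumes only that the driver tags its messages with the string "vesafb".

```python
import subprocess


def find_vesafb_lines(dmesg_text):
    """Return the kernel log lines that mention vesafb (case-insensitive)."""
    return [line for line in dmesg_text.splitlines() if "vesafb" in line.lower()]


def read_dmesg():
    """Capture the output of the dmesg command (may require root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling find_vesafb_lines(read_dmesg()) returns a list; an empty list suggests that vesafb did not log anything during this boot, although it does not prove the driver is absent from the kernel.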
Comment 4 Brendan Jurd 2013-02-05 01:29:15 UTC
(In reply to comment #3)
> Please also provide your Xorg.0.log
> You have vesafb enabled in your kernel, does it become active according to
> dmesg? If yes, this can cause problems.

Hi Chí-Thanh,

You picked it -- I disabled VESA framebuffer and my problems disappeared.  I haven't seen a single GPU lockup since making the change.

If vesafb can interfere with radeon this severely, maybe the two options should block each other in the kernel?

In any case, thanks very much for your help.
Comment 5 Chí-Thanh Christopher Nguyễn gentoo-dev 2013-02-05 03:35:17 UTC
The Xorg configuration guide already recommends disabling all framebuffer drivers, including vesafb.
http://www.gentoo.org/doc/en/xorg-config.xml

In principle, vesafb handover should ensure that radeon KMS smoothly takes over, but in practice that sometimes does not work properly. You can report this at https://bugs.freedesktop.org/ if you wish, but I don't know whether there is interest upstream for fixing this.
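
A quick check for the advice above is to scan the kernel .config for framebuffer drivers that are still enabled. This is a minimal Python sketch, not something taken from the Xorg configuration guide. The function names are hypothetical, and the option list covers only the two drivers named in this thread (CONFIG_FB_VESA and CONFIG_FB_UVESA), so it is illustrative rather than exhaustive.

```python
# Kconfig symbols for the framebuffer drivers named in this thread
# (an illustrative selection, not an exhaustive list).
FB_OPTIONS = ("CONFIG_FB_VESA", "CONFIG_FB_UVESA")


def enabled_options(config_text, options=FB_OPTIONS):
    """Return the given options that are set to y or m in a kernel .config."""
    enabled = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name in options and value in ("y", "m"):
            enabled.append(name)
    return enabled


def check_config(path="/usr/src/linux/.config"):
    """Read a kernel .config from disk and report enabled framebuffer options."""
    with open(path) as fh:
        return enabled_options(fh.read())
```

An empty result from check_config() means neither option is built in or built as a module in that .config. The running kernel may have been built from a different configuration, so the file checked should be the one the kernel was actually compiled with.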
Comment 6 Brendan Jurd 2013-02-05 04:17:29 UTC
(In reply to comment #5)
> The Xorg configuration guide already recommends disabling all framebuffer
> drivers, including vesafb.
> http://www.gentoo.org/doc/en/xorg-config.xml

Yeah, the documentation is right, that was my fault.  I should have read more carefully when configuring the kernel.  The guide could maybe be a little bit more explicit and specifically call out VESA (and, I guess, UVESA) by name as something you need to disable.  It does say "VGA" which I suppose includes those two.  But given how severe and perplexing the failure can be, it might be reasonable to make a bigger fuss about it.

> 
> In principle, vesafb handover should ensure that radeon KMS smoothly takes
> over, but in practice that sometimes does not work properly. You can report
> this at https://bugs.freedesktop.org/ if you wish, but I don't know whether
> there is interest upstream for fixing this.

I don't think I'll bother them about it.  Thank you again for your help, and sorry for the noise.