Created attachment 337912 [details]
Excerpt from /var/log/messages during a series of GPU lockups

I have a Radeon HD 4850 (RV770), and something goes very badly wrong when I try to use X. I believe the problems started when I switched from ati-drivers to xf86-video-ati, but at this point I have spent so much time emerging different versions of things that I can't remember when it began. The problems have been observed with about four different major kernel versions and with both XFCE and KDE. It pains me to tell you that the video card works fine under Windows.

The symptoms are:

1) I start up /etc/init.d/xdm. KDM or Slim comes up okay.
2) I enter my username and password.
3) Sometime during the login process, the screen freezes up -- the mouse cursor usually will still move, but clicking has no effect and the system doesn't seem to respond to keyboard events.
4) A few seconds later, the screen goes black.
5) A few seconds later, I get "no signal" on my monitor.
6) A few seconds after that, the monitor wakes back up.
7) Sometimes the screen works for a second or two, but most often it comes back up already frozen, and we repeat from (3) until ...
8) this whole thing loops through about three or four freezes, and then it gets to a point where it just doesn't come back up at all. When this happens, I can't even Ctrl-Alt-F1 to get out to the TTY, and the only way to recover the system is to SSH in and SIGKILL the X process.

I have attached an excerpt of what /var/log/messages looks like while this plays out. Quite a few other people have reported bugs with similar error output, but the fixes for those bugs do not seem to have solved my problem. It seems that the messages describe a common set of symptoms which can be triggered by a variety of different bugs.

Upgrading to later kernel versions (3.7 and 3.8) seems to have made the behaviour more consistent -- that is, it fails every time instead of just most times.
I'm not really sure where I should be reporting this bug (freedesktop? kernel?), so I thought I would begin here. Any guidance on where to direct this, or how to diagnose further, would be greatly appreciated.

lspci output follows:

02:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI RV770 [Radeon HD 4850] (prog-if 00 [VGA controller])
        Subsystem: Giga-byte Technology Device 21b4
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 43
        Region 0: Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at dfff0000 (64-bit, non-prefetchable) [size=64K]
        Region 4: I/O ports at e800 [size=256]
        Expansion ROM at dffc0000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0300c  Data: 41a1
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Kernel driver in use: radeon

Thank you for your time.
Created attachment 337916 [details] Output of emerge --info
Created attachment 337920 [details] Current copy of /usr/src/linux/.config for 3.8.0-rc6
Please also provide your Xorg.0.log.

You have vesafb enabled in your kernel. Does it become active according to dmesg? If yes, this can cause problems.
(In reply to comment #3)
> Please also provide your Xorg.0.log
> You have vesafb enabled in your kernel, does it become active according to
> dmesg? If yes, this can cause problems.

Hi Chí-Thanh,

You picked it -- I disabled the VESA framebuffer and my problems disappeared. I haven't seen a single GPU lockup since making the change.

If vesafb can interfere with radeon this severely, maybe the two options should block each other in the kernel?

In any case, thanks very much for your help.
The Xorg configuration guide already recommends disabling all framebuffer drivers, including vesafb.
http://www.gentoo.org/doc/en/xorg-config.xml

In principle, vesafb handover should ensure that radeon KMS smoothly takes over, but in practice that sometimes does not work properly. You can report this at https://bugs.freedesktop.org/ if you wish, but I don't know whether there is interest upstream for fixing this.
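For anyone hitting the same symptoms, a quick way to check a kernel configuration for the framebuffer drivers discussed here is sketched below. This is only an illustration, not something taken from the guide: the function name `enabled_framebuffers` and the `SUSPECT_SYMBOLS` list are made up for this example, and the list is limited to the two drivers named in this bug (vesafb as `CONFIG_FB_VESA`, uvesafb as `CONFIG_FB_UVESA`). Other framebuffer drivers may matter on other hardware.

```python
import re

# Assumption: only the two drivers mentioned in this bug are checked.
# Extend this tuple if other framebuffer drivers are relevant to your setup.
SUSPECT_SYMBOLS = ("CONFIG_FB_VESA", "CONFIG_FB_UVESA")


def enabled_framebuffers(config_text, symbols=SUSPECT_SYMBOLS):
    """Return {symbol: value} for each listed symbol set to 'y' or 'm'.

    Lines such as '# CONFIG_FB_VESA is not set' are ignored, because they
    mark a disabled option in a kernel .config file.
    """
    found = {}
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=([ym])\s*$", line)
        if match and match.group(1) in symbols:
            found[match.group(1)] = match.group(2)
    return found
```

To use it, read the text of the kernel configuration (for example /usr/src/linux/.config, the path used in this report) and pass it to `enabled_framebuffers`. An empty result means neither of the two drivers is built in or built as a module; it does not say whether a driver actually became active at boot, which is what dmesg would show.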
(In reply to comment #5)
> The Xorg configuration guide already recommends disabling all framebuffer
> drivers, including vesafb.
> http://www.gentoo.org/doc/en/xorg-config.xml

Yeah, the documentation is right; that was my fault. I should have read more carefully when configuring the kernel.

The guide could maybe be a little more explicit and specifically call out VESA (and, I guess, UVESA) by name as something you need to disable. It does say "VGA", which I suppose includes those two. But given how severe and perplexing the failure can be, it might be reasonable to make a bigger fuss about it.

>
> In principle, vesafb handover should ensure that radeon KMS smoothly takes
> over, but in practice that sometimes does not work properly. You can report
> this at https://bugs.freedesktop.org/ if you wish, but I don't know whether
> there is interest upstream for fixing this.

I don't think I'll bother them about it.

Thank you again for your help, and sorry for the noise.