Bug 559066

Summary: media-libs/mesa-11.0.0_rc1 GPU lockup with radeon kernel driver on RV620 LE [Radeon HD 3450]
Product: Gentoo Linux Reporter: markus <markus.gapp>
Component: [OLD] Unspecified Assignee: Gentoo X packagers <x11>
Status: RESOLVED UPSTREAM    
Severity: normal    
Priority: Normal    
Version: unspecified   
Hardware: AMD64   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: emerge --info mesa

Description markus 2015-08-29 06:34:42 UTC
Hi,

After upgrading to mesa-11.0.0_rc1, my KDE desktop environment crashes repeatedly at random intervals: plasmashell segfaults and the desktop becomes unresponsive.

While Xorg.log does not show anything unusual, dmesg complains about a ring 0 stall on the GPU(?)


Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 10456msec
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000282 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: Saved 217 dwords of commands on ring 0.
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: GPU softreset: 0x00000009
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008010_GRBM_STATUS      = 0xE5700030
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008014_GRBM_STATUS2     = 0x00110103
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_000E50_SRBM_STATUS      = 0x200000C0
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008678_CP_STALLED_STAT2 = 0x00008002
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008086
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008680_CP_STAT          = 0x80018645
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEF
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: SRBM_SOFT_RESET=0x00000100
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008010_GRBM_STATUS      = 0xA0003030
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008014_GRBM_STATUS2     = 0x00000003
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_000E50_SRBM_STATUS      = 0x200080C0
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_008680_CP_STAT          = 0x80100000
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: GPU reset succeeded, trying to resume
Aug 28 21:39:07 hostname kernel: [drm] PCIE gen 2 link speeds already enabled
Aug 28 21:39:07 hostname kernel: [drm] PCIE GART of 512M enabled (table at 0x0000000000254000).
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: WB enabled
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff8802251dac00
Aug 28 21:39:07 hostname kernel: radeon 0000:80:00.0: fence driver on ring 5 use gpu addr 0x00000000000521d0 and cpu addr 0xffffc900010121d0
Aug 28 21:39:07 hostname kernel: [drm] ring test on 0 succeeded in 1 usecs
Aug 28 21:39:08 hostname kernel: [drm] ring test on 5 succeeded in 1 usecs
Aug 28 21:39:08 hostname kernel: [drm] UVD initialized successfully.
Aug 28 21:39:18 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 10210msec
Aug 28 21:39:18 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:18 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 10710msec
Aug 28 21:39:18 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:19 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 11210msec
Aug 28 21:39:19 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:19 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 11710msec
Aug 28 21:39:19 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:20 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 12210msec
Aug 28 21:39:20 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:20 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 12710msec
Aug 28 21:39:20 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:21 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 13210msec
Aug 28 21:39:21 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)
Aug 28 21:39:21 hostname kernel: radeon 0000:80:00.0: ring 0 stalled for more than 13710msec
Aug 28 21:39:21 hostname kernel: radeon 0000:80:00.0: GPU lockup (current fence id 0x0000000000000285 last fence id 0x0000000000000289 on ring 0)

[...]

Downgrading to mesa-10.* fixes the problem.

I will attach the output of emerge --info mesa.

Thank you!

markus

Reproducible: Always
Comment 1 markus 2015-08-29 06:39:41 UTC
Created attachment 410552 [details]
emerge --info mesa
Comment 2 Marek Paśnikowski 2015-08-30 19:09:20 UTC
I cannot be sure whether what I experienced recently is related to this bug.
I could not start sddm or anything related to Qt 5. After a long game of hide-and-seek with this problem I found this: https://bugs.freedesktop.org/show_bug.cgi?id=91753 . I immediately masked mesa-11-rc1, recompiled, rebooted and enjoyed the new KDE.
To me it sounds like the lockups of the RV620 card may be related to the oversized stream of commands mentioned in the linked report. Considering that I was not able to bring up anything related to 3D (I was lucky to get a startx session running with Firefox), and that in both my case and this one a revert to the previous version of mesa solved the problem, I think the two are caused by the same patch.
According to the discussion in the linked report, the offending commit has been reverted in rc2 and no new attempt at the change will be made. Masking that version of mesa seems to be a proper way to fix this bug.
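A minimal sketch of such a mask, assuming the standard Portage location /etc/portage/package.mask (either a single file, or a file inside a directory of that name); the comment line is only illustrative:

```
# GPU lockups on radeon with this release candidate, see bug 559066
=media-libs/mesa-11.0.0_rc1
```

The "=" prefix restricts the mask to exactly this version, so later release candidates and mesa-10.* stay installable.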
Comment 3 Marek Paśnikowski 2015-09-01 08:41:42 UTC
Today I updated to mesa-11-rc2 and experienced no problems. If our problems indeed had the same source, then upstream has fixed it.
Comment 4 markus 2015-09-01 17:03:55 UTC
I'm afraid mesa-11.0.0_rc2 does not fix or even change the issue for me. So it might be something different from https://bugs.freedesktop.org/show_bug.cgi?id=91753 if that one is really fixed for others. Is anyone else still affected?

Thanks a lot
Comment 5 Matt Turner gentoo-dev 2015-09-01 17:07:54 UTC
Please file a bug upstream. Upstream should know if you're seeing GPU hangs, and we at x11@ can't fix them ourselves.
Comment 6 markus 2015-09-01 19:38:38 UTC
Hi again!

Restarting the X server is not enough to properly test a new mesa version. Sorry for sharing my learning curve.

mesa-11.0.0_rc2 does fix my issue after a fresh boot. So I was likely hit by https://bugs.freedesktop.org/show_bug.cgi?id=91753 .

I will mark this as resolved upstream now. The solution is not to use mesa-11.0.0_rc1, whereas mesa-11.0.0_rc2 works fine in this regard for now.

Thank you all for your patience & help!

markus