Bug 690036 - >=sys-kernel/gentoo-sources-5.0: resume after suspend hangs screen with [amdgpu] *ERROR* ring gfx timeout, signaled
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2019-07-16 22:30 UTC by Paul Gover
Modified: 2019-11-07 17:16 UTC

Attachments
emerge --info (file_690036.txt,6.42 KB, text/plain)
2019-07-16 22:32 UTC, Paul Gover
Gzipped config for the bisected kernel (config-4.20.0-rc1) (config.bisected.gz,24.10 KB, application/gzip)
2019-07-16 22:48 UTC, Paul Gover
Gzipped config for the bisected kernel (config-5.2.1) (config-5.2.1-gentoo.gz,25.36 KB, application/gzip)
2019-07-16 22:48 UTC, Paul Gover

Description Paul Gover 2019-07-16 22:30:25 UTC
From kernel 5.0 onwards, graphics mode on my HP laptop (AMD processor with embedded graphics) locks up when resuming from suspend.

Specifically, if I log in as root on tty1 and stop xdm, I can issue pm-suspend successfully; on touching the power button the system resumes and appears to work, but only until I restart xdm.  Then the screen goes grey and locks up; after forcing the system to shut down, syslog shows something along the following lines:

kernel: [   81.096666] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=51, emitted seq=52
kernel: [   81.096671] [drm] IP block:gfx_v8_0 is hung!
kernel: [   81.096734] [drm] GPU recovery disabled.

If instead I simply log in from the sddm screen (I'm using KDE) and close the lid or otherwise select suspend, then on resume the screen locks and the same messages appear.

The problem remains in the most recent kernel 5.2.1, and if I specify amdgpu.gpu_recovery=1, I get a bit further:

kernel: [  279.726475] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=57, emitted seq=59
kernel: [  279.726536] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process X pid 2860 thread X:cs0 pid 2861
kernel: [  279.726542] [drm] IP block:gfx_v8_0 is hung!
kernel: [  279.726609] amdgpu 0000:00:01.0: GPU reset begin!
kernel: [  279.726992] amdgpu 0000:00:01.0: GRBM_SOFT_RESET=0x000F0001
kernel: [  279.727047] amdgpu 0000:00:01.0: SRBM_SOFT_RESET=0x00000100
kernel: [  279.863162] [drm] recover vram bo from shadow start
kernel: [  279.863164] [drm] recover vram bo from shadow done
kernel: [  279.863166] [drm] Skip scheduling IBs!
kernel: [  279.863191] amdgpu 0000:00:01.0: GPU reset(2) succeeded!
kernel: [  280.015794] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

That remains true with the latest x11-misc/sddm-0.18.1-r1 and media-libs/mesa-19.1.2 (I normally run stable).
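
For anyone comparing their own syslog against the messages above, the timeout lines can be picked out mechanically. What follows is a minimal Python sketch, not part of the original report: the function name `parse_ring_timeouts` and the regular expression are assumptions based only on the two log formats quoted here, and other kernel versions may word the message differently.

```python
import re

# Assumed pattern, derived from the two quoted forms of the message:
#   [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=51, emitted seq=52
#   [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=57, emitted seq=59
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"\[drm:amdgpu_job_timedout(?: \[amdgpu\])?\] \*ERROR\* "
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(lines):
    """Return one dict per amdgpu ring-timeout line found in `lines`."""
    found = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            found.append({
                "time": float(m.group("time")),      # seconds since boot
                "ring": m.group("ring"),             # e.g. "gfx"
                "signaled": int(m.group("signaled")),
                "emitted": int(m.group("emitted")),
            })
    return found
```

Passing it any iterable of log lines (an open syslog file, for instance) returns the timestamp, ring name, and the two sequence numbers for each timeout, which makes it easier to see whether the hang recurs after a GPU reset.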

You'll be pleased to know I've run git bisect.  I wish I'd painted a wall to give me something to watch :-)

106c7d6148e5aadd394e6701f7e498df49b869d1 is the first bad commit
commit 106c7d6148e5aadd394e6701f7e498df49b869d1
Author: Likun Gao <Likun.Gao@amd.com>
Date:   Thu Nov 8 20:19:54 2018 +0800

    drm/amdgpu: abstract the function of enter/exit safe mode for RLC
    
    Abstract the function of amdgpu_gfx_rlc_enter/exit_safe_mode and some part of
    rlc_init to improve the reusability of RLC.
    
    Signed-off-by: Likun Gao <Likun.Gao@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 8f3b365496f3bbd380a62032f20642ace51c8fef e14ec968011019e3f601df3f15682bb9ae0bafc6 M      drivers

I'll attach the .config for both the build identified by bisection and the one I now use with kernel 5.2.1 (which has several changes, none of which bypass the problem).
Comment 1 Paul Gover 2019-07-16 22:32:28 UTC
Created attachment 583156
emerge --info
Comment 2 Paul Gover 2019-07-16 22:48:00 UTC
Created attachment 583158
Gzipped config for the bisected kernel (config-4.20.0-rc1)
Comment 3 Paul Gover 2019-07-16 22:48:50 UTC
Created attachment 583160
Gzipped config for the bisected kernel (config-5.2.1)
Comment 4 Paul Gover 2019-07-17 06:00:41 UTC
This bug looks to be the same as:
https://bugs.freedesktop.org/show_bug.cgi?id=110258
and perhaps:
https://bugzilla.freedesktop.org/show_bug.cgi?id=110457

though the people reporting it there haven't done the git bisect.
Comment 5 Paul Gover 2019-08-04 17:07:23 UTC
I'm in communication with the module author, and currently testing a fix.
Comment 6 Mike Pagano gentoo-dev 2019-11-06 12:58:08 UTC
(In reply to Paul Gover from comment #5)
> I'm in communication with the module author, and currently testing a fix.

Hi Paul.  Any update here on a patch?
Comment 7 Paul Gover 2019-11-07 17:03:17 UTC
Mike, sorry, I forgot about this Gentoo bug.  The freedesktop bug was
https://bugs.freedesktop.org/show_bug.cgi?id=110258
and the fix
https://cgit.freedesktop.org/drm/drm/commit/?h=drm-fixes&id=72cda9bb5e219aea0f2f62f56ae05198c59022a7
has gone into the kernel; I can't actually work out which kernel it turned up in; it was a month or so back.  Since it came out, I've been happily running with an unpatched kernel.

As far as I'm concerned the bug is fixed, as is the freedesktop one.  (I note the latter is still status New; that ought to be changed; it's not my bug, so I don't know if I can/should change it.)
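
On the open question of which kernel the fix turned up in: in a clone of the kernel tree, `git describe --contains <commit>` names a tag that contains a given commit, followed by an ancestry suffix such as `~12^2~3`. A minimal Python sketch wrapping that command follows. The helper names `base_tag` and `first_tag_containing` are hypothetical, the repository path is whatever the reader's clone is, and the hash on a release branch may differ from the drm-fixes one linked above if the patch was cherry-picked.

```python
import re
import subprocess


def base_tag(described):
    """Strip the ~N / ^N ancestry suffix from `git describe --contains` output."""
    return re.split(r"[~^]", described.strip(), maxsplit=1)[0]


def first_tag_containing(repo, commit):
    """Ask git for a tag that contains `commit` in the clone at `repo`.

    Raises subprocess.CalledProcessError if git cannot describe the commit
    (for example, if the commit is unknown or no tag contains it yet).
    """
    out = subprocess.run(
        ["git", "-C", repo, "describe", "--contains", commit],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return base_tag(out)
```

With a local clone, `first_tag_containing("/path/to/linux", "72cda9bb5e219aea0f2f62f56ae05198c59022a7")` would report the tag git associates with the fix; the path here is a placeholder.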
Comment 8 Mike Pagano gentoo-dev 2019-11-07 17:16:45 UTC
(In reply to Paul Gover from comment #7)
> Mike, sorry, I forgot about this Gentoo bug.  The freedesktop bug was
> https://bugs.freedesktop.org/show_bug.cgi?id=110258
> and the fix
> https://cgit.freedesktop.org/drm/drm/commit/?h=drm-fixes&id=72cda9bb5e219aea0f2f62f56ae05198c59022a7
> has gone into the kernel; I can't actually work out which kernel it turned
> up in; it was a month or so back.  Since it came out, I've been happily
> running with an unpatched kernel.
> 
> As far as I'm concerned the bug is fixed, as is the freedesktop one.  (I
> note the latter is still status New; that ought to be changed; it's not my
> bug, so I don't know if I can/should change it.)

Thanks, Paul. Appreciate the response.