Since kernel 5.0, the graphics on my HP laptop (AMD processor with integrated graphics) lock up when resuming from suspend.
Specifically, if I log in as root on tty1 and stop xdm, I can issue pm-suspend successfully. When I press the power button the system resumes and appears to work, but only until I restart xdm. Then the screen goes grey and locks up; after forcing the system to shut down, syslog shows something along the following lines:
kernel: [ 81.096666] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=51, emitted seq=52
kernel: [ 81.096671] [drm] IP block:gfx_v8_0 is hung!
kernel: [ 81.096734] [drm] GPU recovery disabled.
If instead I simply log in from the sddm screen (I'm using KDE) and close the lid or otherwise select suspend, the screen locks up on resume and the same messages appear.
The problem remains in the most recent kernel, 5.2.1. If I specify amdgpu.gpu_recovery=1, I get a bit further:
kernel: [ 279.726475] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=57, emitted seq=59
kernel: [ 279.726536] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process X pid 2860 thread X:cs0 pid 2861
kernel: [ 279.726542] [drm] IP block:gfx_v8_0 is hung!
kernel: [ 279.726609] amdgpu 0000:00:01.0: GPU reset begin!
kernel: [ 279.726992] amdgpu 0000:00:01.0: GRBM_SOFT_RESET=0x000F0001
kernel: [ 279.727047] amdgpu 0000:00:01.0: SRBM_SOFT_RESET=0x00000100
kernel: [ 279.863162] [drm] recover vram bo from shadow start
kernel: [ 279.863164] [drm] recover vram bo from shadow done
kernel: [ 279.863166] [drm] Skip scheduling IBs!
kernel: [ 279.863191] amdgpu 0000:00:01.0: GPU reset(2) succeeded!
kernel: [ 280.015794] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
That remains true with the latest x11-misc/sddm-0.18.1-r1 and media-libs/mesa-19.1.2 (I normally run stable).
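For anyone wanting to check whether they are hitting the same failure, here is a rough Python sketch that scans log lines for the ring timeout message shown above and pulls out the signaled/emitted sequence numbers. The regular expression is only derived from the lines pasted in this report, and the function name find_ring_timeouts is my own invention for illustration; other kernel versions may format the message differently.

```python
import re

# Pattern derived from the log lines quoted in this report (an assumption:
# other kernels may word the message differently). The " [amdgpu]" module
# tag is optional because it appears in the 5.2.1 log but not the earlier one.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:amdgpu_job_timedout(?: \[amdgpu\])?\] "
    r"\*ERROR\* ring (?P<ring>\w+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(lines):
    """Return one dict per amdgpu ring-timeout line in an iterable of log lines."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append({
                "timestamp": float(match.group("ts")),
                "ring": match.group("ring"),
                "signaled": int(match.group("signaled")),
                "emitted": int(match.group("emitted")),
            })
    return hits
```

To use it, pass an open log file (for example the syslog or saved dmesg output, wherever your system keeps it) to find_ring_timeouts and look for entries where emitted is ahead of signaled, as in the logs above.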
You'll be pleased to know I've run git bisect. I wish I'd painted a wall to give me something to watch :-)
106c7d6148e5aadd394e6701f7e498df49b869d1 is the first bad commit
Author: Likun Gao <Likun.Gao@amd.com>
Date: Thu Nov 8 20:19:54 2018 +0800
drm/amdgpu: abstract the function of enter/exit safe mode for RLC
Abstract the function of amdgpu_gfx_rlc_enter/exit_safe_mode and some part of
rlc_init to improve the reusability of RLC.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Acked-by: Christian König <firstname.lastname@example.org>
Reviewed-by: Alex Deucher <email@example.com>
Signed-off-by: Alex Deucher <firstname.lastname@example.org>
:040000 040000 8f3b365496f3bbd380a62032f20642ace51c8fef e14ec968011019e3f601df3f15682bb9ae0bafc6 M drivers
I'll attach the .config for both the build identified by bisection and the one I now use with kernel 5.2.1 (which has several changes, none of which bypass the problem).
Created attachment 583156 [details]
Created attachment 583158 [details]
Gzipped config for the bisected kernel (config-4.20.0-rc1)
Created attachment 583160 [details]
Gzipped config for the current kernel (config-5.2.1)
This bug looks to be the same as:
though the people reporting it there haven't done the git bisect.
I'm in communication with the module author, and currently testing a fix.