Bug 790566

Summary: sys-kernel/linux-firmware-20210511: AMDGPU broken
Product: Gentoo Linux Reporter: Maciej Barć <xgqt>
Component: Current packages Assignee: Chí-Thanh Christopher Nguyễn <chithanh>
Status: RESOLVED OBSOLETE    
Severity: normal CC: aladjev.andrew, brezensalzer, jstein, kernel, stefan, zerochaos
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
URL: https://lists.freedesktop.org/archives/amd-gfx/2021-May/063759.html
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: emerge --info
kernel errors 2012-05-15
config - 5.12.4-gentoo-magentalane-v0.2.7
glxinfo -B
dmesg output when graphics crashes
dmesg sys-kernel/linux-firmware-20210518
dmesg of chromium errors
dmesg of chromium errors (5.13.1/20210629)

Description Maciej Barć gentoo-dev 2021-05-16 17:50:42 UTC
So, my main machine is an AMD laptop, "81NC Lenovo IdeaPad S340-15API", and recently some breakages started happening for me. About 1h after boot, while using a KDE desktop, the GUI would freeze. Sometimes it is still possible to move the mouse while everything else stays frozen. The screen may start blinking or go black.

I'm not sure if this is my kernel, firmware or the hardware.
I don't understand dmesg, which is why I'm guessing, but I think it is the firmware
since this behaviour started around 2021-05-15.
From my portage logs I see that I updated my firmware on 2021-05-14 at 18:16:06.

The breakages started with kernel 5.10.27 and FW 20210511.
After the breakage I switched to an older kernel, 5.4.97, and compiled 5.12.4.
I didn't notice a breakage on 5.4.97, but the system only ran for ~40 minutes.
Then I booted 5.12.4, which ran for ~1h before it broke.
So I booted 5.4.97 again and downgraded my FW.
As I write this I'm booted into kernel 5.12.4 with FW 20210315.

My uptime is: 2 hours, 45 minutes
Comment 1 Maciej Barć gentoo-dev 2021-05-16 17:51:52 UTC
Created attachment 709164 [details]
emerge --info
Comment 2 Maciej Barć gentoo-dev 2021-05-16 17:52:54 UTC
Created attachment 709167 [details]
kernel errors 2012-05-15
Comment 3 Maciej Barć gentoo-dev 2021-05-16 17:55:08 UTC
Created attachment 709170 [details]
config - 5.12.4-gentoo-magentalane-v0.2.7
Comment 4 Maciej Barć gentoo-dev 2021-05-16 17:55:50 UTC
I use rsyslog. Any parts of syslog that I should provide?
Comment 5 Maciej Barć gentoo-dev 2021-05-16 19:57:48 UTC
lspci -k -v

04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c2) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Picasso
        Flags: bus master, fast devsel, latency 0, IRQ 63, IOMMU group 11
        Memory at b0000000 (64-bit, prefetchable) [size=256M]
        Memory at c0000000 (64-bit, prefetchable) [size=2M]
        I/O ports at 1000 [size=256]
        Memory at c0800000 (32-bit, non-prefetchable) [size=512K]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu
Comment 6 Maciej Barć gentoo-dev 2021-05-16 19:58:54 UTC
lshw -numeric -C display                                                              

  *-display
       description: VGA compatible controller
       product: Picasso [1002:15D8]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI] [1002]
       physical id: 0
       bus info: pci@0000:04:00.0
       version: c2
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi msix vga_controller bus_master cap_list rom
       configuration: driver=amdgpu latency=0
       resources: irq:63 memory:b0000000-bfffffff memory:c0000000-c01fffff ioport:1000(size=256) memory:c0800000-c087ffff memory:c0000-dffff
Comment 7 Maciej Barć gentoo-dev 2021-05-16 20:00:37 UTC
Created attachment 709185 [details]
glxinfo -B
Comment 8 Thomas Deutschmann (RETIRED) gentoo-dev 2021-05-16 21:13:57 UTC
Thank you for letting us know but there is not much we can do for you: Please report upstream on your own and update this bug report with a link to your bug report/mail to LKML.
Comment 9 tomtom69 2021-05-17 19:18:10 UTC
Same here on a Ryzen 3350G PRO system (GPU family is also "picasso"). Since the last system update (not many packages, but including linux-firmware to 20210511) the system hangs or drops back to the login screen after 1-3 hours of normal usage.
Kernel version is 5.10.27.
I'll try to downgrade linux-firmware to 20210315 and test to be sure whether this is the cause or not.
Comment 10 tomtom69 2021-05-17 19:19:04 UTC
Created attachment 709455 [details]
dmesg output when graphics crashes
Comment 11 Maciej Barć gentoo-dev 2021-05-18 14:16:39 UTC
Another case:
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063852.html
Comment 12 Ionen Wolkens gentoo-dev 2021-05-19 20:40:33 UTC
*** Bug 790683 has been marked as a duplicate of this bug. ***
Comment 13 Stefan de Konink 2021-06-11 07:19:52 UTC
Created attachment 715227 [details]
dmesg sys-kernel/linux-firmware-20210518

Problem remains with sys-kernel/linux-firmware-20210518 and sys-kernel/gentoo-sources-5.12.10.
Comment 14 tomtom69 2021-06-12 08:19:23 UTC
There is an upstream patch available which should fix this:
https://patchwork.freedesktop.org/patch/433701/
But I found this patch is already included in 5.12.10, so there may be another issue.
Comment 15 Stefan de Konink 2021-06-12 08:45:42 UTC
(In reply to tomtom69 from comment #14)
> There is an upstream patch available which should fix this
> https://patchwork.freedesktop.org/patch/433701/

I am curious: does that mean that functionality that was previously available on this hardware is now disabled, as planned obsolescence?


> But I found this patgch included in 5.12.10, so there maybe another issue.

Likely.
Comment 16 Stefan de Konink 2021-06-13 17:33:02 UTC
Created attachment 715743 [details]
dmesg of chromium errors

[ 4233.080397] amdgpu 0000:05:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32774, for process chrome pid 2369 thread chrome:cs0 pid 2393)
[ 4233.080415] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x800114000000 from client 27
[ 4233.080427] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00341051
[ 4233.080431] amdgpu 0000:05:00.0: amdgpu:      Faulty UTCL2 client ID: TCP (0x8)
[ 4233.080435] amdgpu 0000:05:00.0: amdgpu:      MORE_FAULTS: 0x1
[ 4233.080438] amdgpu 0000:05:00.0: amdgpu:      WALKER_ERROR: 0x0
[ 4233.080440] amdgpu 0000:05:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
[ 4233.080442] amdgpu 0000:05:00.0: amdgpu:      MAPPING_ERROR: 0x0
[ 4233.080444] amdgpu 0000:05:00.0: amdgpu:      RW: 0x1

These kinds of errors may be completely unrelated, but they cause a full stop. Thankfully the system does recover.
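The decoded fields in the log above look like bit slices of the raw VM_L2_PROTECTION_FAULT_STATUS word. A minimal Python sketch, assuming a gfx9-style layout (MORE_FAULTS at bit 0, WALKER_ERROR at bits 1-3, PERMISSION_FAULTS at bits 4-7, MAPPING_ERROR at bit 8, client ID at bits 9-17, RW at bit 18, VMID at bits 20-23). This layout is an assumption, not something stated in this report, although it does reproduce the values the driver printed for 0x00341051. The names `FIELDS` and `decode_fault_status` are hypothetical.

```python
# Assumed (offset, width) of each field in VM_L2_PROTECTION_FAULT_STATUS.
# Inferred, not taken from this report; it reproduces the values the
# driver printed for 0x00341051 in the log above.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # "Faulty UTCL2 client ID" in the log
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw status word into named bit fields."""
    return {
        name: (status >> offset) & ((1 << width) - 1)
        for name, (offset, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Decode the status word from the chromium dmesg excerpt.
    for name, value in decode_fault_status(0x00341051).items():
        print(f"{name}: {value:#x}")
```

Decoding the word by hand like this makes it easier to compare fault reports across firmware versions without relying on the driver's pretty-printing.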
Comment 17 Stefan de Konink 2021-07-11 09:51:44 UTC
Created attachment 723235 [details]
dmesg of chromium errors (5.13.1/20210629)

Issues with 5.13.1/20210629 and Chromium remain.
Comment 18 Maciej Barć gentoo-dev 2021-07-11 12:28:35 UTC
(In reply to Stefan de Konink from comment #17)
> Created attachment 723235 [details]
> dmesg of chromium errors (5.13.1/20210629)
> 
> Issues with 5.13.1/20210629 and Chromium remain.

Thanks for testing. I was not able to test the latest version because I haven't had time to fight those bugs (especially because they are so annoying).
Still on 20210315.
Comment 19 Stefan de Konink 2021-07-11 12:31:56 UTC
I switched back as well. I still have to confirm that it would be possible to test this change without recompiling the kernel.
Comment 20 Andrew Aladjev 2021-08-03 09:27:31 UTC
For now we have to mask:

# amdgpu
=sys-kernel/linux-firmware-20210315
=sys-kernel/linux-firmware-20210518
=sys-kernel/linux-firmware-20210629

=sys-kernel/linux-firmware-20210208 was the last good firmware for amdgpu.
Comment 21 Stefan de Konink 2021-08-03 09:33:43 UTC
(In reply to Andrew Aladjev from comment #20)
> For now we have to mask:
> 
> # amdgpu
> =sys-kernel/linux-firmware-20210315

What issues have you experienced with 20210315? I "barely" have issues with this one. The issues I still have on my platform are with suspend-resume, plus a reproducible kernel panic at powerdown when the device woke up from a cold suspend.

I wonder whether this is related to the OpenCL issues (I experienced them with tesseract last year) or whether it is a general issue with Raven Ridge (now unsupported upstream for ROC).
https://bugs.gentoo.org/764605
Comment 22 Andrew Aladjev 2021-08-03 11:00:21 UTC
I added a mask file for amdgpu several months ago when I got a random hang (just "ring gfx timeout" without additional info) with the old kernel 5.10 and firmware 20210315. Then I upgraded the firmware to 20210511 and the kernel to 5.12 and got a consistent hang (VM_L2_PROTECTION_FAULT_STATUS + "ring gfx timeout"), so I added 20210511 to the mask file; same thing for 20210629. So firmware 20210208 is the island of "stability".

If you want a stable GPU then please do not use amdgpu (at least for now). Please review this issue: https://gitlab.freedesktop.org/drm/amd/-/issues/892. This issue is the volcano of amdgpu Linux user "experience". You can grep the Linux sources for "TIMEOUT_FOR_FLIP_PENDING", find the "dcn20_hwseq.c" file and review the quality of the "code" around it. You will immediately get a feel for how "dcn20_pipe_control_lock", "dcn20_enable_stream_timing", "dcn20_update_dchubp_dpp", "dcn20_enable_plane" and "dcn20_update_mpcc" smell. This code is experimental; it was not designed to be stable and it won't become stable. This code should be rewritten completely by AMD core developers. This rewrite may happen in the next 5 years.

If you want a stable GPU for now then use radeon (the famous r600/r700/etc.) <= GCN 1.0.
Comment 23 Andrew Aladjev 2021-08-03 15:05:05 UTC
See also https://bugzilla.kernel.org/show_bug.cgi?id=205169
Comment 24 tomtom69 2021-08-13 16:43:38 UTC
Some GPU firmware files (*sdma.bin) were now reverted upstream:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=d843e520a4b0d92b986645548d11ade3b9b239a4
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=99d72504bff7ab40c261b8509c0b9d8abf98b296
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/?id=d7b50e61669dc137924337d03d09b8986eb752a3

I also found out only the picasso_sdma.bin file from newer versions caused the issue here. So I use the current firmware files and only keep picasso_sdma.bin from linux-firmware-20210315:
https://gitlab.freedesktop.org/drm/amd/-/issues/1609

Hopefully these upstream patches fix the problem for now, as soon as they arrive in the portage tree (however it is only an intermediate solution, not a real bugfix).
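Whether the installed picasso_sdma.bin really is the one kept from an older release can be checked by comparing checksums. A minimal Python sketch, assuming a copy of the older file was saved somewhere by hand; the helper names `sha256_of` and `same_firmware` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_firmware(installed: Path, reference: Path) -> bool:
    """True when both firmware blobs are byte-for-byte identical."""
    return sha256_of(installed) == sha256_of(reference)
```

For example, `same_firmware(Path("/lib/firmware/amdgpu/picasso_sdma.bin"), Path("/root/firmware-20210315/picasso_sdma.bin"))` would compare the installed blob against a saved copy. Both paths are illustrative assumptions; the installed location can differ, for instance if the firmware is installed compressed.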
Comment 25 Maciej Barć gentoo-dev 2021-09-24 00:47:36 UTC
I've been running version 20210818 for 4 days now, seems the issue is gone.
Comment 26 Marc Lee Towers 2021-10-01 08:27:00 UTC
I had freezing too; a way to reproduce it was emerging qtwebengine with jumbo-build. My system would even lose a couple of minutes during the freeze, and KDE would crash, sometimes losing me OpenCL.
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c2)

I'm finding Wayland a bit messy. As for the fix, I'm afraid I couldn't determine whether graphics memory was pushed to my swap file; could protected video RAM be swapped by accident? Regardless, the only fix I found was adding another 16 GB of RAM to the existing 8 GB, after which the issues all went away. Not exactly the best fix, but a fix all the same. Something is still not quite right and Wayland is a bit buggy, but it seems to be a lack of RAM and some sort of swap space issue on my laptop.
Comment 27 Stefan de Konink 2021-11-24 16:39:33 UTC
Currently running 5.15.2 and linux-firmware-20211027; kernel panics are back.
Comment 28 Stefan de Konink 2021-11-25 10:08:50 UTC
Using 5.15.4-gentoo, and the latest firmware. Currently not (yet) crashing.

[ 4151.697052] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 4151.697069] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e7ef000 from IH client 0x12 (VMC)
[ 4151.697079] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140450
[ 4151.697112] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: VCN (0x2)
[ 4151.697115] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 4151.697117] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 4151.697119] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
[ 4151.697122] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 4151.697124] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x1
[ 7513.289273] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.289288] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e7ef000 from IH client 0x12 (VMC)
[ 7513.289302] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140451
[ 7513.289306] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: VCN (0x2)
[ 7513.289309] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[ 7513.289314] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.289317] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
[ 7513.289319] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.289353] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x1
[ 7513.289393] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.289424] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.289457] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140450
[ 7513.289460] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: VCN (0x2)
[ 7513.289463] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.289465] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.289468] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x5
[ 7513.289470] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.289473] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x1
[ 7513.289501] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.289508] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.289543] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.289547] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.289550] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.289593] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.289596] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.289598] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.289601] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7513.289634] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.289649] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.289672] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.289674] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.289677] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.289679] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.289681] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.289683] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.289685] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7513.289796] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.289803] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.289814] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.289816] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.289818] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.289820] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.289823] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.289825] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.289827] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7513.289883] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.289888] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.289903] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.289906] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.289909] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.289912] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.289914] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.289915] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.289917] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7513.289928] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.289943] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.289983] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.290022] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.290024] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.290026] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.290029] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.290030] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.290032] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7513.290043] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.290058] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.290068] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.290071] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.290074] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.290076] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.290078] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.290080] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.290081] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7513.290087] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.290092] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.290129] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.290132] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.290134] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.290136] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.290138] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.290140] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.290142] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7513.290196] amdgpu 0000:05:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32775, for process chrome pid 1636 thread chrome:cs0 pid 1662)
[ 7513.290202] amdgpu 0000:05:00.0: amdgpu:   in page starting at address 0x000080011e5e1000 from IH client 0x12 (VMC)
[ 7513.290214] amdgpu 0000:05:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 7513.290216] amdgpu 0000:05:00.0: amdgpu: 	 Faulty UTCL2 client ID: MP1 (0x0)
[ 7513.290218] amdgpu 0000:05:00.0: amdgpu: 	 MORE_FAULTS: 0x0
[ 7513.290220] amdgpu 0000:05:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[ 7513.290222] amdgpu 0000:05:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x0
[ 7513.290224] amdgpu 0000:05:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[ 7513.290226] amdgpu 0000:05:00.0: amdgpu: 	 RW: 0x0
[ 7518.747834] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 7518.747846] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[ 7523.788343] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
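A long excerpt like the one above is easier to compare between kernel/firmware combinations once it is reduced to counts. A minimal Python sketch, assuming the message format shown in this log; the regular expressions and the `summarize` helper are hypothetical and only match the line shapes seen here.

```python
import re
from collections import Counter

# Patterns written against the log lines in this report (an assumption:
# other kernel versions may word these messages differently).
FAULT_RE = re.compile(
    r"\[(\w+)\] retry page fault .*for process (\S+) pid (\d+)"
)
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def summarize(lines):
    """Count retry page faults per (hub, process, pid) and per status word."""
    faults = Counter()
    statuses = Counter()
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            key = (fault.group(1), fault.group(2), int(fault.group(3)))
            faults[key] += 1
            continue
        status = STATUS_RE.search(line)
        if status:
            statuses[status.group(1)] += 1
    return faults, statuses
```

`summarize` accepts any iterable of lines, so it can be fed an open attachment file or the output of dmesg captured to a file.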
Comment 29 Maciej Barć gentoo-dev 2022-01-31 01:25:44 UTC
Since my last report I had no problem with my GPU, now running version 20211216.
To people who had similar problems: if any other version causes problems, please file reports for that version.
The version 20210511 is no longer available in the tree, closing this.