Please see the following:

[46054.947725] ------------[ cut here ]------------
[46054.947726] WARNING: CPU: 15 PID: 2686 at drivers/gpu/drm/amd/amdgpu/../display/dc/hubbub/dcn31/dcn31_hubbub.c:151 dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[46054.947928] Modules linked in: fuse amdgpu 8021q garp mrp vfat fat binfmt_misc mt7921e mt7921_common mt792x_lib mt76_connac_lib mt76 mac80211 amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_usb_audio snd_hda_intel kvm_amd btusb vfio_pci vfio_pci_core snd_intel_dspcfg snd_hda_codec vfio_iommu_type1 btrtl btintel kvm vfio libarc4 btbcm btmtk snd_usbmidi_lib snd_hda_core cfg80211 bluetooth snd_ump amdxcp i2c_algo_bit snd_rawmidi drm_ttm_helper asus_nb_wmi eeepc_wmi asus_wmi snd_hwdep snd_pcm ttm sparse_keymap wmi_bmof platform_profile drm_exec gpu_sched snd_timer drm_suballoc_helper igc drm_buddy snd rapl drm_display_helper i2c_piix4 pcspkr video mc k10temp i2c_smbus rfkill soundcore wmi gpio_amdpt gpio_generic dm_crypt nvme ccp ucsi_ccg nvme_core typec_ucsi typec sp5100_tco
[46054.947980] CPU: 15 UID: 1000 PID: 2686 Comm: sway Not tainted 6.12.16-gentoo-gentoo-dist #2
[46054.947982] Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, BIOS 3222 03/05/2025
[46054.947983] RIP: 0010:dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[46054.948145] Code: 00 48 8b 43 28 8b 88 b0 01 00 00 48 8b 43 20 0f b6 50 6c 48 8b 43 18 8b b0 14 01 00 00 e8 e7 45 0e 00 85 c0 0f 85 33 01 00 00 <0f> 0b 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 0f 85 35 01 00 00
[46054.948146] RSP: 0018:ffffbb8480bbf618 EFLAGS: 00010202
[46054.948148] RAX: 0000000000000001 RBX: ffff9842c83ec000 RCX: 000000000000001f
[46054.948149] RDX: 0000000000000000 RSI: 000000000000398b RDI: ffff984331f80000
[46054.948150] RBP: 0000000000000004 R08: ffffbb8480bbf61c R09: 000000000000000d
[46054.948151] R10: ffffffffb5514028 R11: 0000000000000003 R12: ffff98431c9c0000
[46054.948152] R13: ffff984332800000 R14: ffff9842c83ec000 R15: 0000000000000001
[46054.948153] FS: 00007fcf46149a00(0000) GS:ffff9849fe780000(0000) knlGS:0000000000000000
[46054.948155] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[46054.948156] CR2: 000055a1e816b1d0 CR3: 00000001eb104000 CR4: 0000000000f50ef0
[46054.948157] PKRU: 55555554
[46054.948158] Call Trace:
[46054.948160] <TASK>
[46054.948161] ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[46054.948297] ? __warn.cold+0x93/0xf0
[46054.948300] ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[46054.948420] ? report_bug+0xff/0x140
[46054.948423] ? handle_bug+0x58/0x90
[46054.948425] ? exc_invalid_op+0x17/0x70
[46054.948427] ? asm_exc_invalid_op+0x1a/0x20
[46054.948431] ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[46054.948544] ? dcn31_program_compbuf_size+0xc9/0x230 [amdgpu]
[46054.948655] dcn20_optimize_bandwidth+0xe4/0x220 [amdgpu]
[46054.948814] dc_commit_state_no_check+0xc5b/0xeb0 [amdgpu]
[46054.948960] dc_commit_streams+0x31f/0x420 [amdgpu]
[46054.949099] amdgpu_dm_atomic_commit_tail+0x65d/0x3a80 [amdgpu]
[46054.949265] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949268] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949270] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949271] ? amdgpu_dm_atomic_check+0x15df/0x17c0 [amdgpu]
[46054.949417] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949419] ? wait_for_completion_timeout+0x13b/0x170
[46054.949421] ? wait_for_completion_interruptible+0x12d/0x1e0
[46054.949424] commit_tail+0x91/0x130
[46054.949426] drm_atomic_helper_commit+0x11a/0x140
[46054.949428] drm_atomic_commit+0xa6/0xe0
[46054.949431] ? __pfx___drm_printfn_info+0x10/0x10
[46054.949433] drm_mode_atomic_ioctl+0xa73/0xcb0
[46054.949437] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[46054.949439] drm_ioctl_kernel+0xad/0x100
[46054.949442] drm_ioctl+0x277/0x4d0
[46054.949444] ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
[46054.949448] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
[46054.949564] __x64_sys_ioctl+0x91/0xd0
[46054.949567] do_syscall_64+0x82/0x190
[46054.949570] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949571] ? __count_memcg_events+0x53/0xf0
[46054.949573] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949574] ? count_memcg_events.constprop.0+0x1a/0x30
[46054.949576] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949577] ? handle_mm_fault+0x1bb/0x2c0
[46054.949579] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949581] ? do_user_addr_fault+0x36c/0x620
[46054.949583] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949584] ? exc_page_fault+0x7e/0x180
[46054.949586] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[46054.949588] RIP: 0033:0x7fcf46c9120f
[46054.949590] Code: 00 48 89 44 24 18 31 c0 c7 04 24 10 00 00 00 48 8d 44 24 60 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[46054.949591] RSP: 002b:00007ffdad034f50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[46054.949593] RAX: ffffffffffffffda RBX: 000055a1e7177050 RCX: 00007fcf46c9120f
[46054.949594] RDX: 00007ffdad035000 RSI: 00000000c03864bc RDI: 000000000000000b
[46054.949595] RBP: 00007ffdad035000 R08: 0000000000000007 R09: 0000000000000002
[46054.949596] R10: 0000000000000003 R11: 0000000000000246 R12: 00000000c03864bc
[46054.949597] R13: 000000000000000b R14: 000055a1e7f5e720 R15: 000055a1e88a3b10
[46054.949600] </TASK>
[46054.949600] ---[ end trace 0000000000000000 ]---

Reproducible: Always

Steps to Reproduce:
1. Just wait long enough

Actual Results:
Crash

Expected Results:
Not crash

I have 2 amdgpu cards. One is active (igpu built into ryzen 7700) and in use. The other is a discrete gpu (6900 XT) which is bound to vfio-pci for use with a VM as a passthrough device. Whether or not I run the VM that uses this passthrough device, over time this crash occurs.
Probably related: if the system is left running longer, it eventually reboots on its own.
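For anyone reading the trace in this report: in an x86 kernel call trace, entries prefixed with "?" are addresses the unwinder found on the stack but does not consider a reliable part of the call chain. A minimal Python sketch for listing only the reliable frames might look like the following. The regular expression, the `reliable_frames` helper, and the embedded excerpt are illustrative assumptions and not part of any kernel tooling; the excerpt is a subset of the trace lines from the report.

```python
import re

# Excerpt of the call trace from the report (a subset of lines, copied as-is).
TRACE = """\
[46054.948544] ? dcn31_program_compbuf_size+0xc9/0x230 [amdgpu]
[46054.948655] dcn20_optimize_bandwidth+0xe4/0x220 [amdgpu]
[46054.948814] dc_commit_state_no_check+0xc5b/0xeb0 [amdgpu]
[46054.948960] dc_commit_streams+0x31f/0x420 [amdgpu]
[46054.949099] amdgpu_dm_atomic_commit_tail+0x65d/0x3a80 [amdgpu]
[46054.949265] ? srso_alias_return_thunk+0x5/0xfbef5
[46054.949424] commit_tail+0x91/0x130
[46054.949426] drm_atomic_helper_commit+0x11a/0x140
[46054.949428] drm_atomic_commit+0xa6/0xe0
[46054.949433] drm_mode_atomic_ioctl+0xa73/0xcb0
"""

# Hypothetical pattern: "[timestamp] [? ]symbol+0xoffset/0xsize".
# Group 1 captures the optional "? " marker, group 2 the symbol name.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(text):
    """Return symbol names of trace lines that are not marked with '?'."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


if __name__ == "__main__":
    for name in reliable_frames(TRACE):
        print(name)
```

Run against the excerpt, this lists the path from `drm_mode_atomic_ioctl` down to `dcn20_optimize_bandwidth`, which is the kind of summary that is handy to quote when searching for or filing an upstream report.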
I don't think CC'ing me makes sense; I don't handle anything related to amdgpu, nor do I keep up with its issues.
You'll probably get a better answer on FDO's bug tracker (there seem to be a few reports like this there already): https://gitlab.freedesktop.org/drm/amd/-/issues/ From a cursory glance, that's power management code being triggered by what looks like a spurious wakeup/hotplug event. There's a known problem where flaky DP connections are enough to crash host software (though it's usually the compositor, not the kernel).
Please seek help in support channels. We can't help you here.