Created attachment 904902 [details]
sysrq - blocked calls on stuck qemu-loop-lvm-md stack

Hello,

I am running a VM with WinXP on my box, and after an upgrade to kernel 6.11.0 it started misbehaving. The previous kernels were 6.6.xx and 6.1.xx.

The layer stack is this:
- gentoo-sources-6.11.0
- mdraid - raid 1 mirror of a HDD+SSD (hdd is flagged as write-mostly)
- lvm - to partition the above md device
- loopdev - to realign the unaligned winxp partition
- app-emulation/qemu-9.0.2-r2
- WinXP guest

Once QEMU is stuck, RDP no longer works, and neither does the spice protocol console or the monitoring socket. Furthermore, LVM management commands such as lvs get stuck in D state as well.

To me it looks like a deadlock in this kernel version, related to how my VM block-device stack is built.

The lvm volume holds an MBR-partitioned disk with C: starting at +1MiB (sector 2048, aligned). That does not boot natively, so I made a loop device that starts 63 sectors before the start of that first partition. Qemu is then booted with an unaligned drive but an aligned partition (win-win situation):

    losetup -o $(( 512 * ( 2048 - 63 ) )) "$DRIVE_BOOT" "$DRIVE_FILE"

So far I have added aio=threads to the qemu drive configuration to rule out issues with the other async APIs, but it gets stuck with this setting as well.

Yesterday I had to enable CONFIG_MAGIC_SYSRQ so that I can issue

    echo w >/proc/sysrq-trigger

whose output is attached. There is another dump attached, taken after running lvs twice (once in the background, once in the foreground).

Is there any way to unblock such locks? This even prevents a graceful reboot: I have to manually stop the other VMs, stop NFS, umount whatever is umountable and then press the hard reset button. reboot/poweroff does nothing here while those few processes are in D state.

After killing the two lvs processes that were in S/S+ state, they go into D state as well (strace showed an async API being used inside lvs):

# ps auxf | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      7358  0.0  0.0      0     0 ?        D    11:32   0:00  \_ [kworker/u16:0+loop0]
root      9444  0.0  0.0      0     0 ?        D    11:52   0:00  \_ [kworker/u16:4+flush-251:4]
root     21331  0.0  0.0   6552  2132 pts/1    S+   13:44   0:00  |   \_ grep --color=auto D
root     19518  0.0  0.0      0     0 pts/3    D    13:24   0:00  \_ [lvs]
root     19967  0.0  0.0      0     0 pts/3    D+   13:29   0:00  \_ [lvs]
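
For reference, the offset arithmetic behind the losetup call above can be checked on its own. This is only a sketch using the numbers from this report (512-byte sectors, partition start at sector 2048, 63 sectors kept in front of it); the variable names are made up for illustration.

```shell
# Values taken from the report; variable names are hypothetical.
SECTOR_SIZE=512
PART_START=2048     # first partition starts at sector 2048 (+1MiB)
LEAD_SECTORS=63     # sectors kept in front of the partition

# Byte offset into the LV at which the loop device begins.
OFFSET=$(( SECTOR_SIZE * (PART_START - LEAD_SECTORS) ))
echo "$OFFSET"      # 1016320

# Sector at which the partition starts when seen through the loop device.
PART_IN_LOOP=$(( (PART_START * SECTOR_SIZE - OFFSET) / SECTOR_SIZE ))
echo "$PART_IN_LOOP"   # 63
```

So the guest sees the partition at sector 63, while on the underlying LV it still sits at the 1MiB-aligned sector 2048.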
Created attachment 904903 [details] sysrq - blocked calls on stuck qemu + two lvs
Couple of options:

1. Try with the latest 6.11.X kernel.
2. Attempt with a 6.12-rcX kernel, but it's a little early in that series.
3. Do a git bisect to determine if an offending commit in the kernel can be identified.
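
A rough sketch of how option 3 could be started, assuming a mainline kernel git checkout rather than the gentoo-sources tree, and assuming that the v6.6 tag behaves like the 6.6.xx kernel that worked. The helper name bisect_start is hypothetical; the git bisect subcommands are standard.

```shell
# Hypothetical helper; run it from inside a mainline kernel git checkout.
bisect_start() {
    good="${1:-v6.6}"    # assumption: v6.6 is fine, like the earlier 6.6.xx
    bad="${2:-v6.11}"    # assumption: v6.11 shows the hang, like 6.11.0
    git bisect start &&
    git bisect bad "$bad" &&
    git bisect good "$good"
}

# After each step: build and boot the checked-out kernel, try to reproduce
# the hang, then mark the result with `git bisect good` or `git bisect bad`.
# Run `git bisect reset` when the first bad commit has been reported.
```

Each round roughly halves the remaining commit range, so the number of kernels to build and test grows only logarithmically with the number of commits between the two tags.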