Bug 723642 - >=sys-kernel/gentoo-sources-5.0.0 windows 10 kvm guest random freezes
Summary: >=sys-kernel/gentoo-sources-5.0.0 windows 10 kvm guest random freezes
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Linux bug wranglers
Reported: 2020-05-18 07:57 UTC by Garry Filakhtov
Modified: 2020-05-19 03:38 UTC
Description Garry Filakhtov 2020-05-18 07:57:17 UTC
This has been a long time coming. I needed a lot of time to make sure there were no hardware issues or any kind of misconfiguration on my end before opening a bug.

I have an Intel X299 platform and use it to run a Windows 10 virtual machine with PCI pass-through. I pass through an NVMe SSD (Samsung 970 EVO Plus), a PCIe USB 3.0 adapter (StarTech PEXUSB3S3GE) and a GPU (nVidia GeForce GTX 1650) to get the best possible performance and isolation from the host OS.

I had been running the 4.19 kernel without any issues, but switched once 5.4 went stable on amd64. Since then I have been experiencing random guest freezes, which can happen anywhere from immediately after boot to after multiple hours of use. When a freeze occurs, the guest completely stops responding to input, ping, etc. The host keeps working fine and I can connect to the QEMU monitor socket without any problems. I am running QEMU 4.2.0.

A freeze lasts anywhere from 1 to 5 minutes. The VM then recovers and works properly until the next freeze. Inspecting dmesg on the host reveals no relevant entries.
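
Since the host stays responsive and the monitor socket still accepts connections, the guest state can be queried from the host while a freeze is in progress. The following is a minimal sketch, assuming the socket from the -monitor option (/run/qemu/win10.sock) speaks the human monitor protocol and ends each reply with a "(qemu)" prompt. The helper names are hypothetical and the prompt handling is simplified (it does not strip the echoed command or terminal escape sequences).

```python
import socket

PROMPT = b"(qemu)"  # assumed prompt of the QEMU human monitor (HMP)


def read_until_prompt(sock):
    """Read from the socket until the monitor prompt shows up or the peer closes."""
    data = b""
    while not data.rstrip().endswith(PROMPT):
        chunk = sock.recv(4096)
        if not chunk:  # connection closed by the monitor
            break
        data += chunk
    return data.decode(errors="replace")


def query_monitor(sock, command, timeout=5.0):
    """Send one monitor command over a connected socket and return the raw reply."""
    sock.settimeout(timeout)       # do not hang forever if the monitor is stuck too
    read_until_prompt(sock)        # discard the greeting banner and first prompt
    sock.sendall(command.encode() + b"\n")
    return read_until_prompt(sock)


def query_monitor_path(path, command):
    """Connect to a monitor UNIX socket (hypothetical helper) and run one command."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return query_monitor(sock, command)
```

For example, query_monitor_path("/run/qemu/win10.sock", "info status") run during a freeze would show whether QEMU still considers the VM to be running.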

The problem appears regardless of the workload. The guest can freeze on the desktop, in the web browser or in a GPU benchmark. When I was playing music in the guest, the sound started to drop out and glitch just before a freeze and then went completely silent.

The Windows Event Viewer is of course about as useful as a fridge at the North Pole before climate change :D (pardon the pun): no entries are written during a freeze, and there is a gap in the log for as long as the freeze lasted.

So far, I have tested a good variety of kernel versions:

  [1]   linux-4.19.120-gentoo <- works fine
  [2]   linux-4.20.17-gentoo <- works fine
  [3]   linux-5.0.0-gentoo <- randomly freezes as described
  [4]   linux-5.0.21-gentoo <- randomly freezes as described
  [5]   linux-5.1.21-gentoo <- can't even boot guest, getting freeze during very early boot
  [6]   linux-5.2.20-gentoo <- qemu won't even start, complaining about KVM suberror 1
  [7]   linux-5.3.18-gentoo <- randomly freezes as described
  [8]   linux-5.4.38-gentoo <- randomly freezes as described

My takeaway is that something broke between 4.20.17 and 5.0.0 and has not been fixed since.
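
The reasoning behind that takeaway can be sketched as follows. This is a minimal illustration, not a diagnostic tool: it encodes the table above, treats the two differently failing kernels (5.1.21 and 5.2.20) as "bad", and reports the last good and first bad versions. The function names are hypothetical.

```python
# Test results from the table above: (kernel version, guest works fine?)
RESULTS = [
    ("4.19.120", True),
    ("4.20.17", True),
    ("5.0.0", False),
    ("5.0.21", False),
    ("5.1.21", False),  # guest freezes during very early boot
    ("5.2.20", False),  # qemu fails to start (KVM suberror 1)
    ("5.3.18", False),
    ("5.4.38", False),
]


def version_key(version):
    """Turn '4.20.17' into (4, 20, 17) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results):
    """Return (last good version, first bad version) in version order."""
    last_good = None
    for version, works in sorted(results, key=lambda r: version_key(r[0])):
        if not works:
            return last_good, version
        last_good = version
    return last_good, None  # no bad version found


print(regression_window(RESULTS))  # ('4.20.17', '5.0.0')
```

Note that gentoo-sources carries Gentoo patches on top of the upstream kernel, so this window only approximates the upstream range between the 4.20 and 5.0 releases.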

I have not yet tried to bisect the git source, but might give it a go, time permitting.

I am using a bare qemu-system-x86_64 command to rule out virt-manager problems. PCIe devices are attached via separate pcie-root-port devices. The guest boots via OVMF UEFI (sys-firmware/edk2-ovmf-201905) with Secure Boot enabled (disabling Secure Boot makes no difference). I have also done a clean Windows 10 install to rule out any issues with the guest OS itself, but the problem persisted. I have tried the Windows-provided GPU drivers as well as the latest ones from nVidia. I am using the "host" CPU model for qemu.

There is a similar problem reported on Reddit, where the solution was to downgrade: https://www.reddit.com/r/VFIO/comments/b1xx0g/windows_10_qemukvm_freezes_after_50x_kernel_update/

I am opening this now to raise awareness of the problem and to ask for ideas on how else I can debug it and get to the root cause.

Reproducible: Always

Steps to Reproduce:
1. Boot the host with gentoo-sources >= 5.0.0
2. Run qemu-system-x86_64 with PCIe pass-through and a Windows 10 guest OS
Actual Results:  
Guest system randomly freezes for 1 to 5 minutes.

Expected Results:  
Guest system works properly.

Motherboard: ASUS WS X299 SAGE
CPU: Intel i9-9940X
Guest GPU: nVidia GTX 1650
Host GPU: AMD Radeon PRO WX 3100
RAM: 64 GB (4x16 GB) DDR4 2666 MHz
SSD: Samsung 970 EVO Plus
PCIe adapter: StarTech PEXUSB3S3GE 3xUSB3.0 + USB Realtek Gigabit network combo adapter
Guest OS: Windows 10 Professional (1909)
QEMU version: 4.2.0

qemu options used:
-name "Microsoft Windows 10 Professional"
-M q35,kernel_irqchip=on,vmport=off,accel=kvm,mem-merge=off
-nodefaults
-display none
-vga none
-net none
-nographic
-monitor unix:/run/qemu/win10.sock,server,nowait
-pidfile /run/qemu/win10.pid
-cpu host,kvm=off
-smp sockets=1,cores=6,threads=2
-m size=16G
-drive if=pflash,format=raw,readonly,file=/usr/share/edk2-ovmf/OVMF_CODE.secboot.fd
-drive if=pflash,format=raw,file=/usr/share/edk2-ovmf/OVMF_VARS.secboot.fd
-rtc base=localtime
-device pcie-root-port,id=port0.0,bus=pcie.0,chassis=0,slot=0,addr=1.0
-device vfio-pci,host=19:0.0,multifunction=on,bus=port0.0,addr=0.0
-device vfio-pci,host=19:0.1,bus=port0.0,addr=0.1
-device pcie-root-port,id=port0.2,bus=pcie.0,chassis=0,slot=2
-device vfio-pci,host=1a:0.0,bus=port0.2
-device pcie-root-port,id=port0.5,bus=pcie.0,chassis=0,slot=5
-device vfio-pci,host=b3:0.0,bus=port0.5
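
The root-port layout in the options above (one pcie-root-port per passed-through device, with both functions of the GPU sharing one port) can be generated rather than written by hand. This is a minimal sketch under that assumption. The helper name is hypothetical, it only builds the argument strings and does not start QEMU, and it leaves out the explicit addr=1.0 used on the first root port.

```python
def passthrough_args(port_id, slot, host_functions):
    """Build -device arguments for one root port and the host PCI functions behind it.

    port_id:        id for the pcie-root-port, e.g. "port0.2"
    slot:           slot number of the root port on pcie.0
    host_functions: host PCI addresses, e.g. ["19:0.0", "19:0.1"]
    """
    args = [
        "-device",
        f"pcie-root-port,id={port_id},bus=pcie.0,chassis=0,slot={slot}",
    ]
    multi = len(host_functions) > 1
    for function, host in enumerate(host_functions):
        device = f"vfio-pci,host={host},bus={port_id}"
        if multi:
            # Place every function of a multi-function device on the same guest slot.
            device += f",addr=0.{function}"
            if function == 0:
                device += ",multifunction=on"
        args += ["-device", device]
    return args


# The three passed-through devices from the options above.
gpu = passthrough_args("port0.0", 0, ["19:0.0", "19:0.1"])
second = passthrough_args("port0.2", 2, ["1a:0.0"])
third = passthrough_args("port0.5", 5, ["b3:0.0"])
print(" ".join(gpu + second + third))
```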
Comment 1 Jonas Stein gentoo-dev 2020-05-18 23:51:03 UTC
I am sorry to read that you are having problems with the software. The situation seems to be a bit more complicated and requires some analysis.
We cannot help you efficiently via the bug tracker. The bug tracker is aimed at specific problems in ebuilds rather than at individual systems.

I have had very good experiences on the Gentoo IRC channels [1] with questions like this. Of course, there are also the forums and mailing lists [2,3].
I hope you understand that I will therefore close the bug here, and I wish you good luck on one of the channels mentioned [4].
Please reopen the ticket if you can point to a specific error in an ebuild or any Gentoo-related product.

[1] https://www.gentoo.org/get-involved/irc-channels/
[2] https://forums.gentoo.org/
[3] https://www.gentoo.org/get-involved/mailing-lists/all-lists.html
[4] https://www.gentoo.org/support/
Comment 2 Garry Filakhtov 2020-05-19 03:38:59 UTC
I understand the situation and will report this upstream. However, I would say >=gentoo-sources-5.0 should (probably) be keyword-masked as ~amd64 because of the reported KVM issue.