With recent kernels, KVM live migration causes the guest CPU to get stuck at 100% usage, with 100% of it reported as stolen time. The problem occurs almost every time once the guest has some RAM to transfer during migration (for example as buffers/cache). It is easy to see in top/htop (column st) and in the st column (stolen CPU time) of vmstat; vmstat -S M 1 10 reports it clearly.

The problem has been observed here with the 4.0.x kernel series in the guest. It does not happen with the 3.14.x kernel series. The only way to recover is to reboot the guest.

The problem has been reported upstream:
https://bugs.launchpad.net/qemu/+bug/1494350

And there is a patch that solves it:
http://www.spinics.net/lists/kvm/msg122175.html

Since this problem is really important for virtualization environments and clusters, I think it would be great to apply this patch in Portage before an upstream release containing the fix is available.

Reproducible: Always

Steps to Reproduce:
1. Use a 3.18.x or 4.0.x KVM guest kernel with qemu-2.[1-4].
2. Use memory (cat /dev/vda > /dev/null to fill buffers).
3. Live-migrate the guest.

Actual Results:
$ vmstat -S M 1 10
=> CPU/st is at 100%

Expected Results:
$ vmstat -S M 1 10
=> CPU/st is at 0%
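The st value shown by top and vmstat can also be checked from a script inside the guest, which may help when testing many migrations. The following is only a minimal sketch, assuming the usual Linux /proc/stat layout in which the eighth counter on the aggregate "cpu" line is steal time. The function names (parse_cpu_line, steal_percent, sample_steal) are illustrative and are not part of any tool mentioned in this report.

```python
import time


def parse_cpu_line(line):
    """Return (total_ticks, steal_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Assumed field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already accounted in user and nice, so only the
    # first eight counters are summed for the total.
    total = sum(values[:8])
    steal = values[7] if len(values) > 7 else 0
    return total, steal


def steal_percent(before, after):
    """Percentage of CPU time reported as stolen between two 'cpu' line samples."""
    total0, steal0 = parse_cpu_line(before)
    total1, steal1 = parse_cpu_line(after)
    delta = total1 - total0
    if delta <= 0:
        return 0.0
    return 100.0 * (steal1 - steal0) / delta


def sample_steal(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return the steal percentage."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return steal_percent(first, second)
```

Calling sample_steal() in the guest before and after a live migration should give roughly the same figure as the st column of vmstat: close to 0 on a healthy guest, close to 100 on a guest hit by this bug.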
This is a bug in the kernel, not in QEMU.
I have finally found the exact problem: it is located on the host kernel side (not the guest kernel side). The fix has been reviewed and accepted:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7cae2bedcbd4680b155999655e49c27b9cf020fa

Related thread:
https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg04306.html

Since the patch is really simple, it would be a very good thing to backport it to the Gentoo kernel patchset. I don't know whether Gentoo kernel policy allows such a backport, but this bug is a critical one. It prevents admins from using Gentoo as a KVM host hypervisor, since you need kernel 4.4 to avoid it, and some technologies such as DRBD can't be used with a recent kernel because their user-space tools need a specific kernel version.
(In reply to Zentoo from comment #2)
> I've finally find the exact problem and the problem is located on kernel
> host side (not guest kernel side).

I forgot to mention that the fix should be included in kernel 4.4, so backporting the patch to older kernels is the ideal solution.
Any version preferences for this?
The current stable versions, starting from the 3.14 series (so 3.14.56), would be a must. I didn't expect such a question :)
(In reply to Zentoo from comment #5)
> I didn't expect such question :)

I don't understand this statement.
I've checked the latest gentoo-sources: sys-kernel/gentoo-sources-4.4.0-r1 has the patch. So it would be great to backport it to older versions, since this patch has no dependencies and can't break anything else.
This bug was never present in Linux 4.4. The patch was accepted and merged well before v4.4 was tagged and released.
(In reply to Dan Moulding from comment #8)
> This bug was never present in Linux 4.4. The patch was accepted and merged
> well before v4.4 was tagged and released.

Yes, the patch has been included since the first 4.4 kernel, but the purpose of this bug is to get the patch backported to older kernel versions (3.14/3.18/4.0/4.1 series).

Why do users need it? Because most virtualisation environments can't use the latest 4.4 kernel: they depend on components tied to a specific kernel version, DRBD for example. So users have to stick to a specific kernel version. As this patch has no dependencies and changes nothing except fixing this specific KVM bug, it is easy to apply to any kernel version (I currently use it in production on the 3.14 series).
There is no longer any kernel in the Portage tree with this bug.