Summary: | sys-kernel/gentoo-sources: Live-Migration fails on XenServer | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Gaetan <gaetan> |
Component: | Current packages | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED NEEDINFO | ||
Severity: | normal | CC: | hydrapolic |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
See Also: | https://bugs.gentoo.org/show_bug.cgi?id=564276 | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- |
Description
Gaetan
2019-03-05 16:46:20 UTC
Do you run gentoo-sources on them? Which versions have you tested?

We have tested multiple versions since 4.x. Latest tested is 4.20.4. Same bug on all of them.

(In reply to Gaetan from comment #0)
> We have been using Gentoo for years now on a large number of VMs/Bare-Metal.
>
> But for the last months/years, we have been facing a bug.

This is a bit confusing. You've been using it for years, but the bug seems to have been present for years too? So it's not a fresh thing; hasn't it been there since the beginning? Which flavor of kernel do you use? Gentoo-sources / vanilla-sources / ... ?

The bug was NOT present a long time ago (>2y), and it has been present since then. We were convinced this was coming from our infrastructure or from Xen until now, but our latest tests with other OSes tend to prove we were wrong. That's why we are opening this bug @gentoo now. We are using the gentoo-sources flavor.

Do you remember the last kernel version it worked on?

I could not tell which was the latest working kernel. Let me share some info we already shared with Citrix.

1/ Here is one of the "strange" things we could observe during a motion. The dmesg output shows "weird" dates:

[Wed Mar 6 12:54:15 2019] Freezing user space processes ... (elapsed 0.001 seconds) done.
[Wed Mar 6 12:54:15 2019] OOM killer disabled.
[Wed Mar 6 12:54:15 2019] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[Wed Mar 6 12:54:15 2019] suspending xenstore...
[Sat Jun 29 15:14:46 2019] xen:events: Xen HVM callback vector for event delivery is enabled
[Sat Jun 29 15:14:46 2019] Xen Platform PCI: I/O protocol version 1
[Sat Jun 29 15:14:46 2019] xen:grant_table: Grant tables using version 1 layout
[Sat Jun 29 15:14:46 2019] xen: --> irq=9, pirq=16
[Sat Jun 29 15:14:46 2019] xen: --> irq=8, pirq=17
[Sat Jun 29 15:14:46 2019] xen: --> irq=12, pirq=18
[Sat Jun 29 15:14:46 2019] xen: --> irq=1, pirq=19
[Sat Jun 29 15:14:46 2019] xen: --> irq=6, pirq=20
[Sat Jun 29 15:14:46 2019] xen: --> irq=4, pirq=21
[Sat Jun 29 15:14:46 2019] xen: --> irq=7, pirq=22
[Sat Jun 29 15:14:46 2019] xen: --> irq=23, pirq=23
[Sat Jun 29 15:14:46 2019] xen: --> irq=30, pirq=24
[Sat Jun 29 15:14:46 2019] usb usb1: root hub lost power or was reset
[Sat Jun 29 15:14:46 2019] ata2.01: configured for MWDMA2
[Sat Jun 29 15:14:46 2019] usb 1-2: reset full-speed USB device number 2 using uhci_hcd
[Sat Jun 29 15:14:46 2019] OOM killer enabled.
[Sat Jun 29 15:14:46 2019] Restarting tasks ... done.
[Sat Jun 29 15:14:46 2019] Setting capacity to 41943040

2/ Sometimes, we can live-move a VM 100 times without a crash. Sometimes it crashes instantly. We could not determine why we get either behaviour.

3/ Sometimes, a VM crashes a few hours/days after a motion.

4/ Most of the time, the crash is a "kernel panic".

Have you tried increasing the grant table size?
https://wiki.gentoo.org/wiki/Xen#Xen_domU_hanging_with_kernel_4.3.2B

Thanks for the suggestion, Tomáš. The description is more or less what we are observing, but the documentation is unclear. Should gnttab_max_frames be set on the Dom0 or the DomU side? I suppose Dom0 (Citrix XenServer side).

By the way, I found this report, which describes more or less the same issue on Debian with Xen (not XenServer's):
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=880554

Gaëtan

It was written for Xen, and the setting depends on which version is used. For Xen 4.9 it was global, but in 4.10+ it's a per-domU setting.
Please consult the XenServer docs.

XenServer (as of 7.6) is using Xen 4.7. We will be testing the "grant table size" tuning on the Dom0 side and will get back to you if we have any interesting feedback. Let me know if you have any other lead to explore until then.

Please reopen if the problem persists.

Hello,

After a long period of migrations and upgrades, I can confirm that raising gnttab_max_frames to 256 does not solve the issue. We are still affected by random VM crashes when live-moving Gentoo VMs across XenServer members of the same pool...

Is this still an issue?
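
For anyone trying to reproduce the "weird" dates from item 1/ on other guests, here is a minimal sketch that scans `dmesg -T` style output and reports large gaps between consecutive timestamps. The function name `find_time_jumps`, the 5-minute threshold, and the script name in the usage line are assumptions made for illustration, not something used in this report. The `dmesg` man page also warns that `-T` timestamps can be inaccurate after suspend/resume, so a reported jump is only a hint, not proof of a clock problem.

```python
import re
from datetime import datetime, timedelta

# Matches lines as printed by `dmesg -T`, for example:
# "[Wed Mar  6 12:54:15 2019] suspending xenstore..."
LINE_RE = re.compile(r"^\[(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\] (.*)$")


def find_time_jumps(lines, threshold=timedelta(minutes=5)):
    """Return (previous_ts, current_ts, message) for each gap above threshold.

    The threshold is an arbitrary illustrative default. Parsing of day and
    month names assumes an English/C locale.
    """
    jumps = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a human-readable timestamp
        current = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S %Y")
        # abs() so that backward jumps are reported as well
        if previous is not None and abs(current - previous) > threshold:
            jumps.append((previous, current, match.group(2)))
        previous = current
    return jumps


if __name__ == "__main__":
    import sys

    for before, after, message in find_time_jumps(sys.stdin):
        print(f"{before} -> {after} ({after - before}): {message}")
```

It could be run on a guest right after a motion with something like `dmesg -T | python3 find_time_jumps.py` (hypothetical file name).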