Bug 558396 - =app-emulation/qemu-2.4.0 machine crash when running virsh blockcommit
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo QEMU Project
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-22 08:00 UTC by Christian Roessner
Modified: 2015-09-07 05:50 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
backup-qemu-live.sh (backup-qemu-live.sh,1.98 KB, text/plain)
2015-08-22 08:01 UTC, Christian Roessner
Current kernel configuration (config-4.1.4-mon+bgw,99.47 KB, text/plain)
2015-08-22 08:02 UTC, Christian Roessner
libvirt XML (mx.roessner-net.de.xml,3.38 KB, text/xml)
2015-08-22 08:05 UTC, Christian Roessner

Description Christian Roessner 2015-08-22 08:00:11 UTC
I have written a script that should back up all KVM guests live at night. For this, I create an external snapshot and copy the base image, preserving sparseness, to a backup store. After that, I commit the changed data from the snapshot back into the base image and, if that succeeded, remove the snapshot.
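
The workflow described above could be sketched roughly as follows. This is a minimal sketch, not the attached backup-qemu-live.sh: the function names, the DRY_RUN switch, the BACKUP_DIR destination, the snapshot name, and "vda" as the default disk target are assumptions for illustration. The image and snapshot paths follow the ones that appear in the logs of this report.

```shell
#!/bin/bash
# Sketch of a live backup via external snapshot + blockcommit.
# Assumptions (not taken from the attached script): function names, DRY_RUN,
# BACKUP_DIR, the snapshot name, and "vda" as the default disk target.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

backup_guest() {
    local dom="$1"
    local disk="${2:-vda}"
    local base="/var/lib/libvirt/images/${dom}.img"
    local snap="/var/backups/snapshots/backup-snapshot-${dom}.qcow2"
    local dest="${BACKUP_DIR:-/var/backups/images}/${dom}.img"
    local rc=0

    # 1. Redirect guest writes into an external snapshot overlay.
    run virsh snapshot-create-as "$dom" backup-snapshot \
        --disk-only --atomic --no-metadata \
        --diskspec "${disk},file=${snap}" || return 1

    # 2. The base image no longer receives writes; copy it, keeping it sparse.
    #    Remember a failed copy, but still merge the overlay afterwards.
    run cp --sparse=always "$base" "$dest" || rc=1

    # 3. Merge the overlay back into the base image and pivot to it.
    run virsh blockcommit "$dom" "$disk" \
        --active --pivot --wait --verbose || return 1

    # 4. Remove the overlay only after the commit succeeded.
    run rm -f "$snap" || return 1
    return "$rc"
}
```

Calling `DRY_RUN=1 backup_guest mx.roessner-net.de` only prints the four commands, which makes it possible to review them before touching a running guest. Step 3 is the one that triggers the crash reported here.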

Unfortunately this does not work, neither with the stable nor with the unstable ebuild.

During the blockcommit, the machine suddenly dies. I could not find out whether the kernel killed it or whether it segfaulted. All I got in the journal is:

Aug 22 05:47:17 mon unknown[514]: <audit-2501> pid=514 uid=0 auid=4294967295 ses=4294967295 msg='virt=kvm resrc=cgroup reason=allow vm="mx.roessner-net.de" uuid=b942baf4-de75-7086-854d-bfc542b4ec6d cgroup="/sys/fs/cgroup/devices/machine.slice/machine-qemu\x2dmx.roessner\x2dnet.de.scope/" class=path path="/var/lib/libvirt/images/mx.roessner-net.de.img" rdev=? acl=rw exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'
Aug 22 05:47:19 mon qemu-system-x86[1084]: <audit-1701> auid=4294967295 uid=77 gid=77 ses=4294967295 pid=1084 comm="qemu-system-x86" exe="/usr/bin/qemu-system-x86_64" sig=6
Aug 22 05:47:19 mon kernel: grsec: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 for /usr/bin/qemu-system-x86_64[qemu-system-x86:1084] uid/euid:77/77 gid/egid:77/77, parent /usr/lib64/systemd/systemd[systemd:1] uid/euid:0/0 gid/egid:0/0
Aug 22 05:47:19 mon libvirtd[514]: libvirt version: 1.2.18
Aug 22 05:47:19 mon libvirtd[514]: internal error: End of file from monitor
Aug 22 05:47:20 mon lldpd[636]: error while receiving frame on vnet3: Network is down
Aug 22 05:47:20 mon kernel: br0: port 4(vnet3) entered disabled state
Aug 22 05:47:20 mon kernel: device vnet3 left promiscuous mode
Aug 22 05:47:20 mon kernel: br0: port 4(vnet3) entered disabled state
Aug 22 05:47:20 mon unknown: <audit-1700> dev=vnet3 prom=0 old_prom=256 auid=4294967295 uid=77 gid=77 ses=4294967295
Aug 22 05:47:20 mon systemd-machined[899]: Machine qemu-mx.roessner-net.de terminated.
Aug 22 05:47:21 mon unknown[514]: <audit-2500> pid=514 uid=0 auid=4294967295 ses=4294967295 msg='virt=kvm op=stop reason=failed vm="mx.roessner-net.de" uuid=b942baf4-de75-7086-854d-bfc542b4ec6d vm-pid=-1 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? res=success'

As you can see, this is on Gentoo Hardened (grsec enabled, no RBAC in use) with systemd.

I see sig=6 (SIGABRT), but who raised it, and why?
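
For reference, the signal number in the audit record can be translated with the shell's own `kill -l`. The sketch below uses a hypothetical helper name, `audit_signal_name`, and a shortened copy of the audit line from the journal above. SIGABRT is the signal a process raises on itself through abort(3), so it usually points at an internal fatal-error path rather than an external kill, though that is only a hint and not proof.

```shell
# Hypothetical helper: extract the sig= field from an audit line and name it.
audit_signal_name() {
    # Keep only the digits that follow a whitespace-delimited "sig=".
    num="$(printf '%s\n' "$1" | sed -n 's/.*[[:space:]]sig=\([0-9][0-9]*\).*/\1/p')"
    [ -n "$num" ] || return 1
    # kill -l <number> prints the signal name without the SIG prefix.
    kill -l "$num"
}

audit_signal_name 'comm="qemu-system-x86" exe="/usr/bin/qemu-system-x86_64" sig=6'
```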

Interestingly, while I copy the image away, free memory keeps shrinking until all 48GB of RAM are used as cache. I know this would normally be okay, but a few days ago the same virtual machine died totally unexpectedly (before any backup script even existed), and at that time the physical server had also cached all its memory; Zabbix even triggered an action telling me the machine was running out of memory. So my best guess is that the two are related. When the machine crashed several days ago, the server had 24GB of RAM. I thought that was too little and doubled the memory. Now I have the same situation, and the machine crashes again.

So this bug might be difficult to track down. It could be the kernel, libvirt, qemu, or some combination of them.

My best guess: qemu

Why? Maybe some bad memory allocation?

As a current workaround I flush cached memory twice a day with:

echo 1 > /proc/sys/vm/drop_caches
echo 2 > /proc/sys/vm/drop_caches
echo 3 > /proc/sys/vm/drop_caches
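
A note on the workaround: in the kernel's drop_caches interface, 1 frees the page cache, 2 frees dentries and inodes, and 3 frees both, so a single write of 3 preceded by `sync` should have the same effect as the three writes. The sketch below uses hypothetical helper names (`cached_kib`, `flush_caches`) and only shows how the cached amount could be read before and after a flush.

```shell
# Hypothetical helper: print the "Cached" value (KiB) from a meminfo-style file.
# Defaults to /proc/meminfo; another file can be passed for testing.
cached_kib() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# Must run as root. sync first, because dirty pages cannot be dropped;
# writing 3 frees the page cache plus dentries and inodes.
flush_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}
```

Running `cached_kib` before and after `flush_caches` shows how much cache was released.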

With this bug open, I do not have a working live backup solution.

Reproducible: Always

Steps to Reproduce:
1. Use Gentoo hardened stable kernel
2. Latest libvirt
3. Latest qemu
4. Create a script as attached
5. Create some test guests.
6. Run the script
Actual Results:  
blockcommit MAY kill the guest. Some guests work, some don't, and whether it works is totally random: no single guest always fails to back up.

Expected Results:  
Working live backup.

emerge --info hardened-sources libvirt qemu
Portage 2.2.20.1 (python 2.7.9-final-0, hardened/linux/amd64/no-multilib, gcc-4.8.4, glibc-2.20-r2, 4.1.4-hardened x86_64)
=================================================================
                         System Settings
=================================================================
System uname: Linux-4.1.4-hardened-x86_64-Intel-R-_Xeon-R-_CPU_L5520_@_2.27GHz-with-gentoo-2.2
KiB Mem:    49453536 total,  36320580 free
KiB Swap:    2097148 total,   2092188 free
Timestamp of repository gentoo: Fri, 21 Aug 2015 21:15:01 +0000
sh bash 4.3_p39
ld GNU ld (Gentoo 2.24 p1.4) 2.24
ccache version 3.1.9 [enabled]
app-shells/bash:          4.3_p39::gentoo
dev-lang/perl:            5.20.2::gentoo
dev-lang/python:          2.7.9-r1::gentoo, 3.4.1::gentoo
dev-util/ccache:          3.1.9-r4::gentoo
dev-util/cmake:           3.2.2::gentoo
dev-util/pkgconfig:       0.28-r2::gentoo
sys-apps/baselayout:      2.2::gentoo
sys-apps/openrc:          0.17::gentoo
sys-apps/sandbox:         2.6-r1::gentoo
sys-devel/autoconf:       2.69::gentoo
sys-devel/automake:       1.15::gentoo
sys-devel/binutils:       2.24-r3::gentoo
sys-devel/gcc:            4.8.4::gentoo
sys-devel/gcc-config:     1.7.3::gentoo
sys-devel/libtool:        2.4.6::gentoo
sys-devel/make:           4.1-r1::gentoo
sys-kernel/linux-headers: 3.18::gentoo (virtual/os-headers)
sys-libs/glibc:           2.20-r2::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.europe.gentoo.org/gentoo-portage
    priority: -1000

x-portage
    location: /usr/local/portage
    masters: gentoo
    priority: 0

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/easy-rsa /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.6/ext-active/ /etc/php/cgi-php5.6/ext-active/ /etc/php/cli-php5.6/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--keep-going --with-bdeps=y --binpkg-respect-use=y --binpkg-changed-deps=y --usepkg=y --rebuilt-binaries=y --rebuilt-binaries-timestamp=20140405050000"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs ccache compressdebug config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://de-mirror.org/gentoo/ rsync://de-mirror.org/gentoo/"
LANG="en_US.utf8"
LC_ALL="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j17"
PKGDIR="/export/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
USE="acl adns aio amd64 bacula-clientonly bacula-console bash-completion berkdb bindist btrfs bzip2 caps cli cracklib crypt curl cxx device-mapper dri gdbm hardened iconv ipv6 justify logrotate loop-aes lzo mmap mmx mmxext modules ncurses nls nptl nscd ntp openmp openssl pam pax_kernel pcre pie readline seccomp session sse sse2 ssl ssp systemd tcpd threads unicode urandom vim-syntax xattr xtpax zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog aggregation cgroups contextswitch cpu cpufreq curl curl_json curl_xml disk email entropy ethstat exec filecount fscache hddtemp ipmi iptables logfile multimeter netlink network nfs nginx ntpd numa openvpn ping postgresql processes protocols python sensors snmp uptime users uuid" CPU_FLAGS_X86="mmx sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" 
LINGUAS="de en" NGINX_MODULES_HTTP="access auth_basic autoindex browser charset dav empty_gif fastcgi geo gzip headers_more limit_conn limit_req map memcached proxy referer rewrite scgi spdy split_clients ssi upstream_ip_hash userid uwsgi" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" QEMU_SOFTMMU_TARGETS="x86_64 i386" QEMU_USER_TARGETS="x86_64 i386" RUBY_TARGETS="ruby19 ruby20" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON

=================================================================
                        Package Settings
=================================================================

sys-kernel/hardened-sources-4.0.8::gentoo was built with the following:
USE="symlink -build -deblob"


sys-kernel/hardened-sources-4.1.4::gentoo was built with the following:
USE="symlink -build -deblob"


app-emulation/libvirt-1.2.18-r1::gentoo was built with the following:
USE="audit caps fuse iscsi libvirtd lvm lxc macvtap nfs nls numa parted pcap qemu sasl systemd udev vepa -apparmor -avahi -firewalld -glusterfs -openvz -phyp -policykit -rbd (-selinux) -uml -virt-network -virtualbox (-wireshark-plugins) -xen"


app-emulation/qemu-2.4.0::gentoo was built with the following:
USE="aio caps curl fdt filecaps jpeg lzo ncurses nls numa pin-upstream-blobs png python sasl seccomp spice threads tls uuid vhost-net vnc xattr -accessibility -alsa -bluetooth -debug -glusterfs -gtk -gtk2 -infiniband -iscsi -nfs -opengl -pulseaudio -rbd -sdl -sdl2 (-selinux) -smartcard -snappy -ssh -static -static-softmmu -static-user -systemtap -tci -test -usb -usbredir -vde -virtfs -vte -xen -xfs" PYTHON_TARGETS="python2_7" QEMU_SOFTMMU_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -cris -lm32 (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -moxie -or32 (-ppc) (-ppc64) -ppcemb -s390x -sh4 -sh4eb (-sparc) -sparc64 -unicore32 -xtensa -xtensaeb" QEMU_USER_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -armeb -cris (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -or32 (-ppc) (-ppc64) -ppc64abi32 -s390x -sh4 -sh4eb (-sparc) -sparc32plus -sparc64 -unicore32"

>>> Attempting to run pkg_info() for 'app-emulation/qemu-2.4.0'
Using:
  app-emulation/spice-protocol-0.12.3
  sys-firmware/ipxe-1.0.0_p20130925
  sys-firmware/seabios-1.7.5
    USE=binary
  sys-firmware/vgabios-0.7a
Comment 1 Christian Roessner 2015-08-22 08:01:37 UTC
Created attachment 409828 [details]
backup-qemu-live.sh

This script backs up KVM virtual machines to a different location.
Comment 2 Christian Roessner 2015-08-22 08:02:57 UTC
Created attachment 409830 [details]
Current kernel configuration

This is my current kernel configuration
Comment 3 Christian Roessner 2015-08-22 08:05:13 UTC
Created attachment 409840 [details]
libvirt XML

XML description of the VM that died last night.
Comment 4 Christian Roessner 2015-08-22 08:31:09 UTC
I am just wondering about the kernel option numa_balancing=1 and whether it could lead to such problems...
Comment 5 Christian Roessner 2015-08-22 22:34:23 UTC
Did some long-running tests today with grsecurity totally disabled. The problem occurred again, so this seems to be an issue with qemu.
Comment 6 Christian Roessner 2015-08-24 10:57:41 UTC
More information: qemu aborts the machine, as you can see in the logs:

rns root@mon  ~ # virsh blockcommit mx.roessner-net.de vda --wait --active --verbose --pivot
Block commit: [ 85 %]error: failed to query job for disk vda
error: Unable to read from monitor: Connection reset by peer

2015-08-23 16:20:28.495+0000: starting up libvirt version: 1.2.18, qemu version: 2.4.0
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name mx.roessner-net.de -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu qemu64,+kvm_pv_eoi -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid b942baf4-de75-7086-854d-bfc542b4ec6d -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mx.roessner-net.de.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot order=cd,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -drive file=/var/backups/snapshots/backup-snapshot-mx.roessner-net.de.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=23 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:5f:78:c0,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/mx.roessner-net.de.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:1 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x7 -watchdog-action reset -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
Co-routine re-entered recursively
2015-08-23 16:27:05.697+0000: shutting down
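
To check other guests for the same abort, the per-domain qemu log can be searched for the message shown above. This is a minimal sketch: the helper name `has_coroutine_abort` is hypothetical, and /var/log/libvirt/qemu is assumed to be the log directory (the usual libvirt default; it may differ on a given system).

```shell
# Hypothetical helper: print matching lines (with line numbers) if the
# domain's qemu log contains the coroutine abort message.
# The default log directory is an assumption; pass another as the second argument.
has_coroutine_abort() {
    dom="$1"
    logdir="${2:-/var/log/libvirt/qemu}"
    grep -n 'Co-routine re-entered recursively' "${logdir}/${dom}.log"
}
```

For example, `has_coroutine_abort mx.roessner-net.de` returns success and prints the matching lines if that guest hit this abort.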
Comment 7 Christian Roessner 2015-08-26 10:21:57 UTC
I opened a bug report upstream:

https://bugs.launchpad.net/qemu/+bug/1488901
Comment 8 Christian Roessner 2015-08-28 07:12:13 UTC
The bug was fixed upstream:


commit e424aff5f307227b1c2512bbb8ece891bb895cef
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Aug 13 10:41:50 2015 +0200

   mirror: Fix coroutine reentrance


I have tested the master branch for more than 24h now and can confirm the problem is gone.
Comment 9 SpanKY gentoo-dev 2015-09-07 05:50:26 UTC
fix is included in the 2.4.0-r1 bump:
http://gitweb.gentoo.org/repo/gentoo.git/commit/?id=fec667228a95981586716b7d25004c4d706943e2