app-emulation/qemu-1.4.0 segfaults on startup with -drive format=rbd, in all cache combinations. format=raw for the same rbd image works in all combinations (old-style :rbd_cache=true and cache=writeback, if that even works with this version, or no cache). ceph is 1.56.3 (just a renamed 1.56.1 ebuild). Reproducible: Always
1) Please post your `emerge --info' output in a comment. 2) Please post the command you ran and the output. 3) Please try to obtain a gdb backtrace of the failing run.
Portage 2.1.11.54 (default/linux/amd64/13.0, gcc-4.7.2, glibc-2.16.0, 3.8.0 x86_64)
=================================================================
System uname: Linux-3.8.0-x86_64-Intel-R-_Xeon-R-_CPU_E5-2620_0_@_2.00GHz-with-gentoo-2.2
KiB Mem: 66017796 total, 522636 free
KiB Swap: 125685472 total, 125685472 free
Timestamp of tree: Tue, 05 Mar 2013 00:30:01 +0000
ld GNU ld (GNU Binutils) 2.23.1
app-shells/bash: 4.2_p42
dev-lang/python: 2.7.3-r3, 3.2.3-r2
dev-util/cmake: 2.8.10.2-r1
dev-util/pkgconfig: 0.28
sys-apps/baselayout: 2.2
sys-apps/openrc: 0.11.8
sys-apps/sandbox: 2.6
sys-devel/autoconf: 2.69
sys-devel/automake: 1.11.6, 1.12.6, 1.13.1
sys-devel/binutils: 2.23.1
sys-devel/gcc: 4.6.3, 4.7.2-r1
sys-devel/gcc-config: 1.8
sys-devel/libtool: 2.4.2
sys-devel/make: 3.82-r4
sys-kernel/linux-headers: 3.8 (virtual/os-headers)
sys-libs/glibc: 2.16.0
Repositories: gentoo raw x-portage x-local-portage
ACCEPT_KEYWORDS="amd64 x86 ~amd64 ~x86"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-w -O3 -Ofast -pipe -mtune=native -march=native -mfpmath=both -fexcess-precision=fast -fpeel-loops -fomit-frame-pointer -ftree-loop-linear -minline-stringops-dynamically -maccumulate-outgoing-args -fivopts -funroll-loops -ftracer -fbranch-target-load-optimize2 -fsection-anchors -fmodulo-sched -fmodulo-sched-allow-regmoves -freschedule-modulo-scheduled-loops -fgcse-sm -fgcse-las -fsee -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fvect-cost-model -floop-optimize -fgraphite-identity -fgcse-after-reload -fipa-cp-clone -fpredictive-commoning -ftree-loop-distribute-patterns -ftree-vectorize -funswitch-loops"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"
CPPFLAGS=""
CXXFLAGS="-w -O3 -Ofast -pipe -mtune=native -march=native -mfpmath=both -fexcess-precision=fast -fpeel-loops -fomit-frame-pointer -ftree-loop-linear -minline-stringops-dynamically -maccumulate-outgoing-args -fivopts -funroll-loops -ftracer -fbranch-target-load-optimize2 -fsection-anchors -fmodulo-sched -fmodulo-sched-allow-regmoves -freschedule-modulo-scheduled-loops -fgcse-sm -fgcse-las -fsee -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fvect-cost-model -floop-optimize -fgraphite-identity -fgcse-after-reload -fipa-cp-clone -fpredictive-commoning -ftree-loop-distribute-patterns -ftree-vectorize -funswitch-loops"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://10.200.200.206/gentoo-portage/ http://ftp.byfly.by/pub/gentoo-distfiles/ http://linux.solo.by/gentoo-distfiles/ http://distfiles.gentoo.org"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--hash-style=gnu -Wl,--sort-common"
MAKEOPTS="-j12 -s"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--exclude=/metadata/cache --whole-file --no-compress --inplace --compress-level=1"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/raw /usr/local/portage /mnt/ceph/local-portage"
SYNC="rsync://10.200.200.206/gentoo-portage/"
USE="acl acpi amd64 berkdb build-kernel bzip2 cli cracklib crypt custom-arch custom-cflags cxx dri embed-hardware extensions fortran gdbm gpm graphite iconv idn ipv6 largepages libatomic lxc mmx modules mudflap multilib multitarget ncurses nls nptl openmp pcre profile radosgw rbd readline rt session sse sse2 ssl subversion tcmalloc threads tmem unicode update-boot urandom xz zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias access_compat auth_digest" APACHE2_MPMS="event" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" GRUB_PLATFORMS="pc coreboot" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" PHP_TARGETS="php5-3" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" RUBY_TARGETS="ruby18 ruby19" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, USE_PYTHON
A slightly stripped-down command:
/usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -net nic,model=virtio,macaddr=DE:AD:BE:EF:FF:02 -net tap,ifname=tap2 -drive format=rbd,file=rbd:rbd/vm2_1:rbd_cache=false,index=0,if=virtio,media=disk,aio=native,cache=none -m 1000
Stripped down further:
/usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -net nic,model=virtio,macaddr=DE:AD:BE:EF:FF:02 -net tap,ifname=tap2 -drive format=rbd,file=rbd:rbd/vm2_1 -m 1000
Same result: segfault. Changing to format=raw works fine. qemu 1.2.2 is also fine (I just upgraded a working system).
Backtrace. I am not familiar with gdb, so: with "-march=corei7 -O2 -ggdb" and USE=debug there are no debug symbols anymore (why?). But I did get some useful info:
#0 0x00007ffff7413768 in librbd::flush(librbd::ImageCtx*) () from /usr/lib64/librbd.so.1
#1 0x00005555555e3027 in ?? ()
#2 0x00005555555e30e0 in ?? ()
#3 0x000055555561444a in ?? ()
#4 0x00007ffff3d0c470 in ?? () from /lib64/libc.so.6
#5 0x00007fffffffb4f0 in ?? ()
#6 0x0000000000000000 in ?? ()
Please let me know if you want more debug info and how to get it (cflags).
OK, FEATURES=nostrip, -O0, but:
Reading symbols from /usr/bin/qemu-system-x86_64...done.
(gdb) run
Starting program: /usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -net nic,model=virtio,macaddr=DE:AD:BE:EF:FF:02 -net tap,ifname=tap2 -drive format=rbd,file=rbd:rbd/vm2_1 -m 1000
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
Warning: Cannot insert breakpoint -1.
Error accessing memory address 0x3990e2: Input/output error.
Current app-emulation/qemu-9999 - same.
More detail on the stack trace would be nice. Please take a look at http://www.gentoo.org/proj/en/qa/backtraces.xml on how to do this properly, so that most of the ?? disappear from the backtrace.
(In reply to comment #7)
> More detail on the stack trace would be nice, please take a look at
>
> http://www.gentoo.org/proj/en/qa/backtraces.xml
>
> on how do this properly such that most of the ?? disappear in the backtrace.

Yes, I followed everything I could find, including this. I also found similar reports about this trace problem, without a solution. What is left is to try the same (gdb-friendly) build of glibc (and possibly something more). If I get more results, I will report them.
I am still unsure about the point of failure, but I am sure I found the bad commit: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f500a6d3c2b9ef0bb06d0080d91d8ed3c1d68f58
On another machine, with a debugging glibc & the latest rbd, but the same rbd image, I get a backtrace (again only partial):
#1 0x00007ffff7406d19 in rbd_close (image=<optimized out>) at librbd/librbd.cc:676
So the commit id is the more precise information. I will try to find out more or fix it.
(In reply to comment #9)
> I still unsure in point of failure, but sure found bad commit:
> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f500a6d3c2b9ef0bb06d0080d91d8ed3c1d68f58
>
> On other machine, with debugging glibc & latest rbd, but same rbd image I
> have backtrace (but same partial):
> #1 0x00007ffff7406d19 in rbd_close (image=<optimized out>) at
> librbd/librbd.cc:676
> - so, commit number is more precise info. I will try find or fix more.

Thanks Denis. That gives me a lot to work with. I unfortunately don't have an RBD test setup, but I will try to find a fix for you and have you give it a whirl.
Created attachment 343334 rbd-close-notopened.patch
I think this spot can cause problems for other storage backends too, but since I use only rbd (apart from plain files), the attached simple patch is for rbd only. The problem spot (from commit f500a6d3c2b9ef0bb06d0080d91d8ed3c1d68f58):
+ if (bs->file != file) {
+ bdrv_delete(file);
It tries to close a temporary, half-opened image that was never really opened by the backend (qemu_rbd_close() without qemu_rbd_open() here).
PS I am in doubt about reporting this upstream... IMHO more thorough tests for the other backends would be preferable.
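To make the failure mode described in this comment concrete (a close call reaching a backend whose open call never ran), here is a minimal C sketch. It is a toy model only: ToyDevice, ToyBackend, toy_open() and toy_close_guarded() are hypothetical names, and this is neither QEMU nor librbd code nor the content of the attached patch. It only assumes that a backend's close routine touches private state that exists solely after a successful open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical private backend state, created only by a successful open. */
typedef struct ToyBackend {
    int flush_count; /* stand-in for work done while closing */
} ToyBackend;

/* Hypothetical device handle; opaque stays NULL until toy_open() succeeds. */
typedef struct ToyDevice {
    ToyBackend *opaque;
} ToyDevice;

/* Open the device: allocate the backend state. Returns 0 on success, -1 on failure. */
int toy_open(ToyDevice *dev)
{
    assert(dev != NULL);
    dev->opaque = calloc(1, sizeof(*dev->opaque));
    return dev->opaque != NULL ? 0 : -1;
}

/*
 * Close the device, tolerating handles that were never opened.
 * Returns 1 if backend state was released, 0 if there was nothing to close.
 * Without the NULL check, the flush below would dereference a NULL pointer
 * when the handle was created but never opened.
 */
int toy_close_guarded(ToyDevice *dev)
{
    assert(dev != NULL);
    if (dev->opaque == NULL) {
        return 0; /* never opened (or already closed): nothing to do */
    }
    dev->opaque->flush_count++; /* stand-in for flushing before close */
    free(dev->opaque);
    dev->opaque = NULL;
    return 1;
}
```

Calling toy_close_guarded() on a freshly zeroed ToyDevice returns 0 instead of crashing, while a handle that went through toy_open() is released once and then treated as closed. Whether such a guard belongs in the backend's close routine or in the caller is exactly the design question raised in the comments that follow.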
(In reply to comment #10)
> Thanks Denis. That gives me a lot to work with. I unfortunately don't have
> an RBD test setup but will try to find a fix for you and have you give it a
> whirl

Not that much. Unless you also want to test, for example, glusterfs, sheepdog, etc. ;)
Created attachment 343372 rbd-notopened.patch
Sorry, the first one was tested with cache=unsafe only. This one is also simpler.
PS If somebody wants to dig elsewhere, take a look at clock.c: bdrv_open_common() - "bs->opaque = ..." & "bdrv_swap(file, bs);".
Not "clock.c", but "block.c".
Created attachment 343386 blk-format-noclose.patch
This one is for block.c & looks better (I am just in doubt about possible leaks, but IMHO it is clean).
Created attachment 343392 bdrv-noclose.patch
This one respects the subsequent logic better.
This is fixed in git, but on the other hand the git code is still not working for me (as of a few hours ago: the system inside the VM just goes crazy), so this patch (or a backported git variant) and http://git.qemu.org/?p=qemu.git;a=patch;h=dc7588c1eb3008bda53dde1d6b890cd299758155 are still good for 1.4.
(In reply to comment #17)
> This is fixed in git, but on other hand, git code still not working for me
> (few hours ago - just crazy system inside VM), so this (or backported git
> variant) patch and
> http://git.qemu.org/?p=qemu.git;a=patch;h=dc7588c1eb3008bda53dde1d6b890cd299758155
> still good for 1.4.

I believe I did include that in 1.4.1, so once I bump that we should be set. Hopefully I'll be able to do that shortly, but newborn twins have been eating a lot of my spare time for Gentoo.
Should be fixed in master with the qemu 1.4.1 bump.
(In reply to comment #19) > Should be fixed in master with the qemu 1.4.1 bump. s/master/CVS/
Congratulations! ;) But just FYI: minutes ago I tested a git qemu snapshot. format=rbd boots, but hits random deadlocks/hangs (mostly on writes); format=raw is still good (both with rbd_cache=false, cache=none). Sometimes single tasks deadlock (also several at once), sometimes the whole system. Ceph (libceph) is a snapshot too, so it is not a source compatibility problem. And my patch is IMHO not correct for the git changes. I will try to re-patch only on release, if still needed (but my fix is too empirical; I would prefer to leave it to somebody who has meditated on this code for a long time). For 1.4.1 everything is the same as for 1.4.0.
Denis, give qemu-1.4.2 a shot; it had a fix go in related to unlocking a mutex in the error case:
0486c27: nbd: unlock mutex in nbd_co_send_request() error path (Stefan Hajnoczi)
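For readers unfamiliar with this class of bug, the pattern named in that commit title (a lock that must also be released on the error path) can be sketched in plain C. This is an illustration under stated assumptions, not the QEMU change itself: ToyClient, toy_client_init(), toy_do_send() and toy_send_request() are hypothetical, and a POSIX mutex is substituted for whatever locking primitive the real code uses.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical client with a mutex serializing sends. */
typedef struct ToyClient {
    pthread_mutex_t send_mutex;
    int fail_next; /* illustration knob: force the next send to fail */
    int sent;      /* number of successful sends */
} ToyClient;

/* Initialize the client. Returns 0 on success, an error number otherwise. */
int toy_client_init(ToyClient *c)
{
    assert(c != NULL);
    c->fail_next = 0;
    c->sent = 0;
    return pthread_mutex_init(&c->send_mutex, NULL);
}

/* Stand-in for the actual transmission; fails once when fail_next is set. */
static int toy_do_send(ToyClient *c)
{
    if (c->fail_next) {
        c->fail_next = 0;
        return -EIO;
    }
    c->sent++;
    return 0;
}

/* Send one request while holding the mutex. Returns 0 or a negative errno. */
int toy_send_request(ToyClient *c)
{
    int ret;

    assert(c != NULL);
    pthread_mutex_lock(&c->send_mutex);
    ret = toy_do_send(c);
    if (ret < 0) {
        /* Error path: returning here without unlocking would leave the
         * mutex held, so the next sender would block forever. */
        pthread_mutex_unlock(&c->send_mutex);
        return ret;
    }
    pthread_mutex_unlock(&c->send_mutex);
    return 0;
}
```

If the unlock in the error branch is dropped, the first failed send leaves send_mutex held and every later call to toy_send_request() blocks, which from the outside looks like a hang on writes after an I/O error.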