Bug 460244 - app-emulation/qemu-1.4.0 - segfault in librbd::flush(librbd::ImageCtx*) () from /usr/lib64/librbd.so.1
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Doug Goldstein (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-03-04 10:50 UTC by Denis Kaganovich
Modified: 2013-05-30 02:20 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
rbd-close-notopened.patch (rbd-close-notopened.patch,512 bytes, patch)
2013-03-26 18:40 UTC, Denis Kaganovich
rbd-notopened.patch (rbd-notopened.patch,732 bytes, patch)
2013-03-26 23:38 UTC, Denis Kaganovich
blk-format-noclose.patch (blk-format-noclose.patch,680 bytes, patch)
2013-03-27 08:07 UTC, Denis Kaganovich
bdrv-noclose.patch (bdrv-noclose.patch,741 bytes, patch)
2013-03-27 09:41 UTC, Denis Kaganovich

Description Denis Kaganovich 2013-03-04 10:50:20 UTC
app-emulation/qemu-1.4.0 segfaults on startup with -drive format=rbd, in every cache combination. format=raw works for the same rbd image in every combination (old-style :rbd_cache=true with cache=writeback, if that even works with this version, or no cache).

ceph is 1.56.3 (just renamed 1.56.1 ebuild).

Reproducible: Always
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2013-03-06 16:38:17 UTC
1) Please post your `emerge --info' output in a comment.
2) Please post the command you ran and the output.
3) Please try to obtain a gdb backtrace of the failing run.
Comment 2 Denis Kaganovich 2013-03-12 16:17:36 UTC
Portage 2.1.11.54 (default/linux/amd64/13.0, gcc-4.7.2, glibc-2.16.0, 3.8.0 x86_64)
=================================================================
System uname: Linux-3.8.0-x86_64-Intel-R-_Xeon-R-_CPU_E5-2620_0_@_2.00GHz-with-gentoo-2.2
KiB Mem:    66017796 total,    522636 free
KiB Swap:  125685472 total, 125685472 free
Timestamp of tree: Tue, 05 Mar 2013 00:30:01 +0000
ld GNU ld (GNU Binutils) 2.23.1
app-shells/bash:          4.2_p42
dev-lang/python:          2.7.3-r3, 3.2.3-r2
dev-util/cmake:           2.8.10.2-r1
dev-util/pkgconfig:       0.28
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.6
sys-devel/autoconf:       2.69
sys-devel/automake:       1.11.6, 1.12.6, 1.13.1
sys-devel/binutils:       2.23.1
sys-devel/gcc:            4.6.3, 4.7.2-r1
sys-devel/gcc-config:     1.8
sys-devel/libtool:        2.4.2
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.8 (virtual/os-headers)
sys-libs/glibc:           2.16.0
Repositories: gentoo raw x-portage x-local-portage
ACCEPT_KEYWORDS="amd64 x86 ~amd64 ~x86"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-w -O3 -Ofast -pipe -mtune=native -march=native -mfpmath=both -fexcess-precision=fast -fpeel-loops -fomit-frame-pointer -ftree-loop-linear -minline-stringops-dynamically -maccumulate-outgoing-args -fivopts -funroll-loops -ftracer -fbranch-target-load-optimize2 -fsection-anchors -fmodulo-sched -fmodulo-sched-allow-regmoves -freschedule-modulo-scheduled-loops -fgcse-sm -fgcse-las -fsee -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fvect-cost-model -floop-optimize -fgraphite-identity -fgcse-after-reload -fipa-cp-clone -fpredictive-commoning -ftree-loop-distribute-patterns -ftree-vectorize -funswitch-loops"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/sandbox.d /etc/terminfo"
CPPFLAGS=""
CXXFLAGS="-w -O3 -Ofast -pipe -mtune=native -march=native -mfpmath=both -fexcess-precision=fast -fpeel-loops -fomit-frame-pointer -ftree-loop-linear -minline-stringops-dynamically -maccumulate-outgoing-args -fivopts -funroll-loops -ftracer -fbranch-target-load-optimize2 -fsection-anchors -fmodulo-sched -fmodulo-sched-allow-regmoves -freschedule-modulo-scheduled-loops -fgcse-sm -fgcse-las -fsee -ftree-loop-distribution -ftree-loop-im -ftree-loop-ivcanon -fvect-cost-model -floop-optimize -fgraphite-identity -fgcse-after-reload -fipa-cp-clone -fpredictive-commoning -ftree-loop-distribute-patterns -ftree-vectorize -funswitch-loops"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://10.200.200.206/gentoo-portage/ http://ftp.byfly.by/pub/gentoo-distfiles/ http://linux.solo.by/gentoo-distfiles/ http://distfiles.gentoo.org"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--hash-style=gnu -Wl,--sort-common"
MAKEOPTS="-j12 -s"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_EXTRA_OPTS="--exclude=/metadata/cache --whole-file --no-compress --inplace --compress-level=1"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/raw /usr/local/portage /mnt/ceph/local-portage"
SYNC="rsync://10.200.200.206/gentoo-portage/"
USE="acl acpi amd64 berkdb build-kernel bzip2 cli cracklib crypt custom-arch custom-cflags cxx dri embed-hardware extensions fortran gdbm gpm graphite iconv idn ipv6 largepages libatomic lxc mmx modules mudflap multilib multitarget ncurses nls nptl openmp pcre profile radosgw rbd readline rt session sse sse2 ssl subversion tcmalloc threads tmem unicode update-boot urandom xz zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias access_compat auth_digest" APACHE2_MPMS="event" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" GRUB_PLATFORMS="pc coreboot" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" PHP_TARGETS="php5-3" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" RUBY_TARGETS="ruby18 ruby19" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, USE_PYTHON
Comment 3 Denis Kaganovich 2013-03-12 16:28:49 UTC
A slightly stripped-down command:

/usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -net nic,model=virtio,macaddr=DE:AD:BE:EF:FF:02 -net tap,ifname=tap2 -drive format=rbd,file=rbd:rbd/vm2_1:rbd_cache=false,index=0,if=virtio,media=disk,aio=native,cache=none -m 1000

Stripped down further:

/usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -net nic,model=virtio,macaddr=DE:AD:BE:EF:FF:02 -net tap,ifname=tap2 -drive format=rbd,file=rbd:rbd/vm2_1 -m 1000


Same result: segfault. Changing to format=raw works fine.
The same command with qemu 1.2.2 also works fine (I had just upgraded a working system).
Comment 4 Denis Kaganovich 2013-03-12 16:59:40 UTC
Backtrace. I am not familiar with gdb; even with "-march=corei7 -O2 -ggdb" and USE=debug there is no debug info (why?). But I did get some useful info:

#0  0x00007ffff7413768 in librbd::flush(librbd::ImageCtx*) () from /usr/lib64/librbd.so.1
#1  0x00005555555e3027 in ?? ()
#2  0x00005555555e30e0 in ?? ()
#3  0x000055555561444a in ?? ()
#4  0x00007ffff3d0c470 in ?? () from /lib64/libc.so.6
#5  0x00007fffffffb4f0 in ?? ()
#6  0x0000000000000000 in ?? ()


Please let me know if you want more debug info and how to get it (CFLAGS).
Comment 5 Denis Kaganovich 2013-03-12 17:38:48 UTC
OK, FEATURES=nostrip, -O0, but:

Reading symbols from /usr/bin/qemu-system-x86_64...done.
(gdb) run
Starting program: /usr/bin/qemu-system-x86_64 -enable-kvm -cpu host -net nic,model=virtio,macaddr=DE:AD:BE:EF:FF:02 -net tap,ifname=tap2 -drive format=rbd,file=rbd:rbd/vm2_1 -m 1000
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
Warning:
Cannot insert breakpoint -1.
Error accessing memory address 0x3990e2: Input/output error.
Comment 6 Denis Kaganovich 2013-03-12 19:03:53 UTC
Same with the current app-emulation/qemu-9999.
Comment 7 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-03-16 07:05:27 UTC
More detail on the stack trace would be nice, please take a look at

http://www.gentoo.org/proj/en/qa/backtraces.xml

on how to do this properly so that most of the ?? disappear in the backtrace.
Comment 8 Denis Kaganovich 2013-03-20 22:01:25 UTC
(In reply to comment #7)
> More detail on the stack trace would be nice, please take a look at
> 
> http://www.gentoo.org/proj/en/qa/backtraces.xml
> 
> on how to do this properly so that most of the ?? disappear in the backtrace.

Yes, I followed everything I could find, including this. I also found similar reports about this backtrace problem, with no solution. What remains is to try a gdb-friendly build of glibc (and possibly something more). If I get more results, I will report them.
Comment 9 Denis Kaganovich 2013-03-26 01:14:24 UTC
I am still unsure of the exact point of failure, but I have found the bad commit:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f500a6d3c2b9ef0bb06d0080d91d8ed3c1d68f58

On another machine, with a debug build of glibc and the latest rbd but the same rbd image, I got a backtrace (still only partial):
#1  0x00007ffff7406d19 in rbd_close (image=<optimized out>) at librbd/librbd.cc:676
- so the commit hash is the more precise information. I will try to find out more or fix it.
Comment 10 Doug Goldstein (RETIRED) gentoo-dev 2013-03-26 17:14:42 UTC
(In reply to comment #9)
> I am still unsure of the exact point of failure, but I have found the bad commit:
> http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f500a6d3c2b9ef0bb06d0080d91d8ed3c1d68f58
> 
> On another machine, with a debug build of glibc and the latest rbd but the same rbd image, I got a backtrace (still only partial):
> #1  0x00007ffff7406d19 in rbd_close (image=<optimized out>) at librbd/librbd.cc:676
> - so the commit hash is the more precise information. I will try to find out more or fix it.

Thanks Denis. That gives me a lot to work with. I unfortunately don't have an RBD test setup but will try to find a fix for you and have you give it a whirl
Comment 11 Denis Kaganovich 2013-03-26 18:40:45 UTC
Created attachment 343334 [details, diff]
rbd-close-notopened.patch

I think this code can cause problems for other storage backends too, but since I use only rbd (apart from plain files), I attached a simple patch for rbd only. The problematic code (from commit f500a6d3c2b9ef0bb06d0080d91d8ed3c1d68f58):

+    if (bs->file != file) {
+        bdrv_delete(file);

It tries to close a temporary, half-opened image that the backend never actually opened (here, qemu_rbd_close() is called without a prior qemu_rbd_open()).

PS: I have doubts about reporting this upstream... IMHO more thorough tests with other backends would be preferable first.
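
To illustrate the failure mode described in this comment (a close callback running for an image whose open callback never ran), here is a minimal C sketch. It is not QEMU or librbd code: ImageState, image_open() and image_close() are hypothetical names, and the malloc'd handle only stands in for a backend image handle. The only point is that close checks whether open ever succeeded before it touches the handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block driver's per-image state
 * (an assumption for illustration, not QEMU's real structure). */
typedef struct {
    bool opened;  /* set only after the backend open succeeded */
    int *handle;  /* stands in for the backend image handle */
} ImageState;

/* Simulated backend open: allocate a handle and mark the state opened.
 * Returns 0 on success, -1 on allocation failure. */
int image_open(ImageState *s)
{
    s->handle = malloc(sizeof *s->handle);
    if (s->handle == NULL) {
        return -1;
    }
    *s->handle = 0;
    s->opened = true;
    return 0;
}

/* Close that tolerates a state whose open never ran.
 * Returns 1 if a handle was released, 0 if there was nothing to do. */
int image_close(ImageState *s)
{
    if (s == NULL || !s->opened) {
        return 0;  /* never opened: do not touch the handle */
    }
    free(s->handle);
    s->handle = NULL;
    s->opened = false;
    return 1;
}
```

With this guard, calling image_close() on a zero-initialized ImageState is a no-op, whereas an unguarded close would pass an invalid handle to the backend.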
Comment 12 Denis Kaganovich 2013-03-26 18:44:08 UTC
(In reply to comment #10)

> Thanks Denis. That gives me a lot to work with. I unfortunately don't have
> an RBD test setup but will try to find a fix for you and have you give it a
> whirl

Not that much. Unless you also want to test, for example, glusterfs, sheepdog, etc. ;)
Comment 13 Denis Kaganovich 2013-03-26 23:38:16 UTC
Created attachment 343372 [details, diff]
rbd-notopened.patch

Sorry, the first patch was tested with cache=unsafe only. This one is also simpler.

PS: If somebody wants to dig elsewhere, take a look at clock.c: bdrv_open_common() - "bs->opaque = ..." & "bdrv_swap(file, bs);".
Comment 14 Denis Kaganovich 2013-03-26 23:39:25 UTC
Not "clock.c", but "block.c".
Comment 15 Denis Kaganovich 2013-03-27 08:07:41 UTC
Created attachment 343386 [details, diff]
blk-format-noclose.patch

This one is for block.c and looks better (I am just unsure about possible leaks, but IMHO it is clean).
Comment 16 Denis Kaganovich 2013-03-27 09:41:07 UTC
Created attachment 343392 [details, diff]
bdrv-noclose.patch

This one better respects the logic of the code that follows.
Comment 17 Denis Kaganovich 2013-05-06 10:24:21 UTC
This is fixed in git, but on the other hand the git code still does not work for me (as of a few hours ago, the system inside the VM just goes crazy), so this patch (or a backported git variant) and http://git.qemu.org/?p=qemu.git;a=patch;h=dc7588c1eb3008bda53dde1d6b890cd299758155 are still good for 1.4.
Comment 18 Doug Goldstein (RETIRED) gentoo-dev 2013-05-06 13:08:36 UTC
(In reply to comment #17)
> This is fixed in git, but on the other hand the git code still does not work for me (as of a few hours ago, the system inside the VM just goes crazy), so this patch (or a backported git variant) and
> http://git.qemu.org/?p=qemu.git;a=patch;h=dc7588c1eb3008bda53dde1d6b890cd299758155 are still good for 1.4.

I believe I did include that in 1.4.1, so once I bump that we should be set. Hopefully I'll be able to do that shortly, but newborn twins have been eating a lot of my spare time for Gentoo.
Comment 19 Doug Goldstein (RETIRED) gentoo-dev 2013-05-07 15:50:19 UTC
Should be fixed in master with the qemu 1.4.1 bump.
Comment 20 Doug Goldstein (RETIRED) gentoo-dev 2013-05-07 15:50:31 UTC
(In reply to comment #19)
> Should be fixed in master with the qemu 1.4.1 bump.

s/master/CVS/
Comment 21 Denis Kaganovich 2013-05-09 15:52:26 UTC
Congratulations! ;)

But just FYI.
Just now (minutes ago) I tested a git snapshot of qemu: format=rbd boots, but randomly deadlocks or gets stuck (mostly on writes), while format=raw is still good (both with rbd_cache=false, cache=none). Sometimes individual tasks deadlock (often several), sometimes the whole system. Ceph (libceph) is a snapshot too, so this is not a source compatibility problem. And IMHO my patch is not correct for the git changes. I will try to re-patch it only at release, if still needed (but my fix was too empirical; I would rather leave it to somebody who has studied this code for a long time).

For 1.4.1, everything is the same as with 1.4.0.
Comment 22 Doug Goldstein (RETIRED) gentoo-dev 2013-05-30 02:20:44 UTC
Denis,

Give qemu-1.4.2 a shot; it had a fix related to unlocking a mutex in the error case go in.

0486c27: nbd: unlock mutex in nbd_co_send_request() error path (Stefan Hajnoczi)
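
For readers unfamiliar with this class of bug, the following C sketch shows the general pattern of releasing a mutex on the error path. It is only an illustration under stated assumptions: it uses a plain pthread mutex as a stand-in rather than QEMU's own locking, and Conn, conn_init(), do_send() and conn_send_request() are hypothetical names, not the code changed by the commit above.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical connection state (an assumption for illustration only). */
typedef struct {
    pthread_mutex_t send_mutex;
    int fail_next_send;  /* knob that makes the simulated send fail */
    int sent;            /* number of requests sent successfully */
} Conn;

void conn_init(Conn *c)
{
    pthread_mutex_init(&c->send_mutex, NULL);
    c->fail_next_send = 0;
    c->sent = 0;
}

/* Simulated send: fails with -EIO when the knob is set. */
static int do_send(Conn *c)
{
    if (c->fail_next_send) {
        return -EIO;
    }
    c->sent++;
    return 0;
}

/* Send one request under the mutex. Every exit, including the error
 * exit, goes through the single unlock at the "out" label. */
int conn_send_request(Conn *c)
{
    int ret;

    pthread_mutex_lock(&c->send_mutex);
    ret = do_send(c);
    if (ret < 0) {
        goto out;  /* returning here directly would leave the mutex held */
    }
    /* further steps that need the mutex would go here */
out:
    pthread_mutex_unlock(&c->send_mutex);
    return ret;
}
```

If the error branch returned without unlocking, the next caller of conn_send_request() would block forever on the mutex, which would look much like the random hangs on writes reported earlier in this bug.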