Summary: | >=sys-fs/lvm2-2.02.49 dmevent crash on creation of snapshot volume | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Dan Goodliffe <gentoo> |
Component: | [OLD] Core system | Assignee: | Robin Johnson <robbat2> |
Status: | RESOLVED UPSTREAM | ||
Severity: | normal | CC: | agk, cardoe, gentoo, Jimmy.Jazz, markus |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
dmeventd traces
ebuild to pull in my patch
the patch |
Description
Dan Goodliffe
2009-10-27 19:01:16 UTC
Same issue: sys-fs/lvm2-2.02.51-r2 w/o option set to --with-dmeventd=/sbin/dmeventd and w/o flag LDFLAGS forced to -Wl,-z,now

# emerge --info
Portage 2.2_rc46 (default/linux/amd64/10.0, gcc-4.4.2, glibc-2.10.1-r0, 2.6.32-rc5-radeon x86_64)
=================================================================
System uname: Linux-2.6.32-rc5-radeon-x86_64-AMD_Athlon-tm-_64_Processor_3200+-with-gentoo-1.12.12
Timestamp of tree: Sat, 31 Oct 2009 07:45:03 +0000
distcc 3.1 x86_64-pc-linux-gnu [enabled]
app-shells/bash: 4.0_p28
dev-java/java-config: 2.1.9-r1
dev-lang/python: 2.6.4, 3.1.1-r1
dev-python/pycrypto: 2.0.1-r8
dev-util/cmake: 2.6.4-r3
sys-apps/baselayout: 1.12.12
sys-apps/sandbox: 2.2
sys-devel/autoconf: 2.13, 2.63-r1
sys-devel/automake: 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2, 1.11
sys-devel/binutils: 2.20
sys-devel/gcc-config: 1.4.1
sys-devel/libtool: 2.2.6a
virtual/os-headers: 2.6.30-r1
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /etc/splash/livecd-2007.0/1280x1024.cfg /lib/rcscripts/addons /sbin/rc /sbin/splash-functions-bl1.sh /sbin/splash-functions.sh /usr/local/share/cursors/xorg-x11/default/index.theme /usr/share/X11/xkb /usr/share/hddtemp/hddtemp.db /usr/src/linux/.config /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/eselect/postgresql /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/splash /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="assume-digests collision-protect distcc distlocks fixpackages news parallel-fetch preserve-libs protect-owned sandbox sfperms splitdebug strict unmerge-logs unmerge-orphans userfetch userpriv"
GENTOO_MIRRORS="ftp://ftp.free.fr/mirrors/ftp.gentoo.org/ http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ http://mirrors.sec.informatik.tu-darmstadt.de/gentoo/"
LANG="fr_FR.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="fr"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/zumastor /usr/portage/local/layman/gnome /usr/portage/local/layman/science /usr/portage/local/layman/pd-overlay /usr/portage/local/layman/sunrise /usr/portage/local/layman/x11 /usr/portage/local/java /usr/portage/local/overlay"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X a52 aac acl acpi alsa amd64 avahi bash-completion bluetooth bmp boost bzip2 cairo caps cli cracklib crypt cups curl dbus dga dlloader dri dts dv dvb dvdr exif expat fam fastcgi ffmpeg fftw firefox flac fontconfig fuse gd gdbm gif glib gmp gnome gnome-keyring gnutls gphoto2 gpm gsl gstreamer gtk guile hal iconv icu ieee1394 imagemagick imap imlib isdnlog jack jbig jpeg jpeg2k ladspa lame lash latex lcms libedit libnotify libsamplerate libwww lirc logrotate lua lzo mad maildir mailwrapper matroska midi mmap mmx mmxext mng modplug modules mp3 mpeg mudflap multilib musepack mysql ncurses network-cron newspr nls nptl nptlonly ogg openal openexr opengl openmp osc pam pch pcre pdf perl png posix pppd pulseaudio python qt4 quicktime raw readline reflection ruby sasl schroedinger sdl session slang smp sndfile speex spell spl sse sse2 ssl startup-notification svg sysfs sysvipc taglib tcpd theora threads tiff truetype udev unicode urandom usb userlocales v4l v4l2 vhosts vim-syntax vorbis wavpack webkit wmf wxwindows x264 xattr xcb xinetd xml xmp xorg xpm xulrunner xv xvid zeroconf zlib"
ALSA_CARDS="intel8x0 usb-audio"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic auth_digest authn_anon authn_dbd authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock dbd deflate dir disk_cache env expires ext_filter file_cache filter headers ident imagemap include info log_config logio mem_cache mime mime_magic negotiation proxy proxy_ajp proxy_balancer proxy_connect proxy_http rewrite setenvif so speling status unique_id userdir usertrack vhost_alias"
APACHE2_MPMS="worker"
CAMERAS="ptp2"
DVB_CARDS="usb-wt220u"
ELIBC="glibc"
INPUT_DEVICES="evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="fr"
LIRC_DEVICES="devinput userspace"
USERLAND="GNU"
VIDEO_CARDS="radeon"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

agk: this one looks like it's your field.
dan/jimmy: can one of you please capture a core dump and submit a full backtrace from it?

Double check that the matching .so files are being found and pulled in. E.g. use 'ldd <file>' to check that it finds all the files and check the paths, then confirm that timestamps show all the dm+lvm files are from the same build, and check the 'library_dir' setting in lvm.conf ('lvm dumpconfig global/library_dir').
You can also try setting environment variables like LD_WARN or LD_DEBUG ('man ld.so') before running the command to check for runtime linking problems.
And try the latest version just in case it's something we hadn't noticed but have since fixed - we did make various changes to the build process recently.

I've remerged lvm2-2.02.51-r2 (current latest) with FEATURES=nostrip and CFLAGS=-g... but I can't find a core file to get a back trace from. Any suggestions or something obvious I've missed?
P.S. The linking and config file all seem to be in order.

(In reply to comment #4)
> I've remerged lvm2-2.02.51-r2 (current latest) with FEATURES=nostrip and
> CFLAGS=-g... but I can't find a core file to get a back trace from. Any
> suggestions or something obvious I've missed?
Did you allow coredumps via ulimit? Check /proc/${PID_OF_DMEVENTD}/limits
You'll probably have to muck with the init.d to raise the core limit. Might also be good to set the sysctl for kernel.core_pattern to have an absolute path.

Every day's a school day! Ran it on the console... seemed easier... so...
firebrand ~ # ulimit -u unlimited
firebrand ~ # dmeventd -d
<sits happy>
firebrand ~ # cat /proc/`ps -C dmeventd -o %p h`/limits
Limit Soft Limit Hard Limit Units
...
Max core file size unlimited unlimited bytes
...
firebrand ~ # lvcreate -s -L100m /dev/data/www
...
firebrand ~ # dmeventd -d
Segmentation fault (core dumped)
firebrand ~ # cat /proc/sys/kernel/core_pattern
/tmp/core

#0 0xb7516b8b in _temporary_log_fn (level=6, file=0xb74e56db "commands/toolcontext.c", line=426, format=0x0) at dmeventd_snapshot.c:62
#1 0xb749c291 in print_log (level=6, file=0xb74e56db "commands/toolcontext.c", line=426, dm_errno=0, format=0xb74e5789 "Loading config file: %s") at log/log.c:216
#2 0xb7478f06 in _load_config_file (cmd=0x8052838, tag=<value optimized out>) at commands/toolcontext.c:426
#3 0xb747913b in _init_lvm_conf (cmd=0x0) at commands/toolcontext.c:457
#4 0xb747b226 in create_toolcontext (is_long_lived=0, system_dir=0x0) at commands/toolcontext.c:1118
#5 0xb74ca997 in init_lvm () at lvmcmdline.c:1177
#6 0xb74e4447 in cmdlib_lvm2_init (static_compile=0) at lvmcmdlib.c:38
#7 0xb74e447e in lvm2_init () at lvm2cmd.c:20
#8 0xb7516acb in register_device (device=0x80522a0 "data-lvol0", uuid=0x8050878 "LVM-VyAlWCo9m2AbITHG6iFHfukncJdLkZGRirEyVANHe5lyrRFbFcYmdOC01pGatGyc", major=253, minor=15, private=0x8050870) at dmeventd_snapshot.c:170
#9 0x0804b482 in _do_register_device (message_data=0x804f248) at dmeventd.c:654
#10 _register_for_event (message_data=0x804f248) at dmeventd.c:972
#11 0x0804c98c in _handle_request (argc=2, argv=0xbfa0d604) at dmeventd.c:1374
#12 _do_process_request (argc=2, argv=0xbfa0d604) at dmeventd.c:1403
#13 _process_request (argc=2, argv=0xbfa0d604) at dmeventd.c:1430
#14 main (argc=2, argv=0xbfa0d604) at dmeventd.c:1741

Created attachment 209352 [details]
dmeventd traces
I have used the following script to trace dmeventd. The kernel hasn't generated a core file.
/etc/sysctl.conf contains,
kernel.core_uses_pid = 1
kernel.core_pattern = core.%u.%e.%p
Also dmesg returns,
dmeventd[16670]: segfault at 0 ip 00007fa39ba13d84 sp 00007fff63b94c68 error 4 in libdevmapper-event-lvm2snapshot.so.2.02[7fa39ba13000+2000]
I certainly missed something...
-d is obsolete; the process still goes into the background.
#!/bin/bash
exec 2>/var/tmp/dmeventd.$$
ulimit -f unlimited
export LD_WARN='yes'
export LD_DEBUG='all'
/sbin/dmeventd -d # --pidfile /var/run/dmeventd.pid
If I understand correctly, libdevmapper-event could start dmeventd if it isn't already running, but it does nothing. Perhaps just some wrong permissions on the fifos?
The fifos are:
prw------- 1 root root 0 nov. 5 18:17 /var/run/dmeventd-client
prw------- 1 root root 0 nov. 5 18:17 /var/run/dmeventd-server
and
# ls -l /proc/sys/kernel/core_
core_pattern core_pipe_limit core_uses_pid
# cat /proc/sys/kernel/core_*
core.%u.%e.%p
0
1
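For anyone repeating the core dump steps suggested earlier in the thread, here is a minimal sketch of the checks, not a definitive recipe. Note that `ulimit -f` (used in the script above) sets the maximum file size, while the core file size limit is controlled by `ulimit -c`. The helper name `core_limit_for_pid` and the `/var/tmp/core.%e.%p` path are illustrative assumptions, not something taken from lvm2 or from this bug.

```shell
#!/bin/bash
# Hypothetical helper: print the "Max core file size" row from /proc/<pid>/limits.
core_limit_for_pid() {
    grep '^Max core file size' "/proc/$1/limits"
}

# Raise the soft core limit for this shell and its children.
# This can fail when the hard limit is lower, so report instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2

# An absolute core_pattern avoids depending on the daemon's working directory.
# Needs root, so it is left commented out here (the path is an assumption):
# sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p

# Show the limit that now applies to the current shell.
core_limit_for_pid $$
```

To inspect an already running daemon instead, pass its PID, for example ``core_limit_for_pid `ps -C dmeventd -o %p h` ``, as in the console session earlier in the thread.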
Additional info from a dmeventd strace: it doesn't seem to like the Gentoo lvm.conf file.
writev(2, [{" 32407:\t", 12}, {"symbol=", 7}, {"unregister_device", 17}, {"; lookup in file=", 18}, {"/lib/libdevmapper-event-lvm2snap"..., 39}, {" [", 2}, {"0", 1}, {"]\n", 2}], 8) = 98
getpid() = 32407
writev(2, [{" 32407:\t", 12}, {"binding file ", 13}, {"/lib/libdevmapper-event-lvm2snap"..., 39}, {" [", 2}, {"0", 1}, {"] to ", 5}, {"/lib/libdevmapper-event-lvm2snap"..., 39}, {" [", 2}, {"0", 1}, {"]: ", 3}, {"normal", 6}, {" symbol `", 9}, {"unregister_device", 17}, {"'", 1}], 14) = 150
writev(2, [{"\n", 1}], 1) = 1
open("/proc/devices", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f87814f3000
read(7, "Character devices:\n 1 mem\n 4 /"..., 1024) = 525
close(7) = 0
munmap(0x7f87814f3000, 4096) = 0
open("/proc/misc", O_RDONLY) = 7
fstat(7, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f87814f3000
read(7, "229 fuse\n 57 btrfs-control\n 58 d"..., 1024) = 190
close(7) = 0
munmap(0x7f87814f3000, 4096) = 0
stat("/dev/mapper/control", {st_mode=S_IFCHR|0660, st_rdev=makedev(10, 58), ...}) = 0
open("/dev/mapper/control", O_RDWR) = 7
uname({sys="Linux", node="snowman", ...}) = 0
open("/proc/devices", O_RDONLY) = 8
fstat(8, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f87814f3000
read(8, "Character devices:\n 1 mem\n 4 /"..., 1024) = 525
close(8) = 0
munmap(0x7f87814f3000, 4096) = 0
ioctl(7, DM_VERSION, 0x1867c90) = 0
ioctl(7, DM_DEV_STATUS, 0x1867c90) = 0
mmap(NULL, 307200, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8781478000
mprotect(0x7f8781478000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f87814c2210, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f87814c29e0, tls=0x7f87814c2910, child_tidptr=0x7f87814c29e0) = 2485
open("/usr/lib64/locale/locale-archive", O_RDONLY) = 8
fstat(8, {st_mode=S_IFREG|0644, st_size=2301760, ...}) = 0
mmap(NULL, 2301760, PROT_READ, MAP_PRIVATE, 8, 0) = 0x7f877f9d3000
close(8) = 0
stat("/etc/lvm", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/etc/lvm/lvm.conf", {st_mode=S_IFREG|0644, st_size=18597, ...}) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
Process 32407 detached

Actually, if mirror_library and snapshot_library are both commented out in the dmeventd {} section of lvm.conf, the daemon stops crashing. Also, it apparently does nothing.

I confirm this problem. After upgrading lvm2-2.02.36 to lvm2-2.02.51-r1 I couldn't create LVM snapshots anymore.
What's even more annoying is that device-mapper doesn't start with baselayout-1. I wonder why, because baselayout-2 isn't stable yet.
I tried to fix it by editing device-mapper's startup script:
start() {
    local f=/lib/rcscripts/addons/dm-start.sh
    if [ -r "$f" ]; then
        ( . "$f" )
    fi
}
This improves the situation. Sometimes LVM snapshot creation/removal works. But most of the time I get segfaults:
kernel: dmeventd[16902]: segfault at 0 ip b7d67c78 sp bfbe77a0 error 4 in libdevmapper-event-lvm2snapshot.so.2.02[b7d67000+2000]
Is there a good reason why lvm2's device-mapper doesn't support baselayout-1 anymore? This seems to be one of the origins of the problem.

I found a workaround:
* Stop all udevd
* Comment out the lines in lvm.conf as suggested in comment #10
* Stop dmeventd
I can now create/remove LVM snapshots without errors again.

I'm also having this problem. For me, comment #10 followed by restarting /etc/init.d/dmeventd solved it.

According to kdbg, there is a line of code in print_log (line 201) that doesn't get executed, which leads to a null message being logged, causing the segfault... but there's no apparent reason for it not to get executed.

Found the root cause of the problem. The _temporary_log_fn parameters as defined in daemons/dmeventd/plugins/snapshot/dmeventd_snapshot.c and daemons/dmeventd/plugins/mirror/dmeventd_mirror.c don't match the expected function call: they're missing the dm error parameter (dm_errno, passed as 0 from print_log at log/log.c:216). This matches my stack trace, and correcting those functions fixes the crash.
I think that's all I changed; I'll double check and make a patch and ebuild for anyone who wants to test it.

Created attachment 210544 [details]
ebuild to pull in my patch
Created attachment 210545 [details, diff]
the patch
agk: see patch in comment #17 please?
Fixed in 2.02.56. |