Bug 701056

Summary: sys-fs/lvm2 with sys-fs/udev-243-r2: lvremove hangs with snapshot
Product: Gentoo Linux
Component: Current packages
Status: UNCONFIRMED ---
Severity: normal
Priority: Normal
Version: unspecified
Hardware: AMD64
OS: Linux
Reporter: Robert S
Assignee: Gentoo's Team for Core System packages <base-system>
CC: marci_r, mgerstner, Sergiy.Borodych, udev-bugs
Package list:
Runtime testing required: ---

Description Robert S 2019-11-24 10:00:46 UTC
I've been using lvm2 to create a snapshot of my home partition for backup purposes.  The lvremove command now hangs:

# lvcreate -L2G -s -n homebackup /dev/system/home > /dev/null
# lvremove /dev/system/homebackup -fv
    Archiving volume group "system" metadata (seqno 1161).
    Removing snapshot volume system/homebackup.
    Loading table for system-home (254:9).
    Loading table for system-homebackup (254:13).
    Not monitoring system/homebackup with
    Unmonitored LVM-FjwEH69DmFF5fICrqPy4C2N1QO30J3ij48LeBtt2ym8aFMbY9Y5U7Lj9cJuVE4ge for events
    Suspending system-home (254:9) with device flush
    Suspending system-homebackup (254:13) with device flush
    Suspending system-home-real (254:11) with device flush
    Suspending system-homebackup-cow (254:12) with device flush
    activation/volume_list configuration setting not defined: Checking only host tags for system/homebackup.
    Resuming system-homebackup-cow (254:12).
    Resuming system-home-real (254:11).
    Resuming system-homebackup (254:13).
    Removing system-homebackup-cow (254:12)
    Resuming system-home (254:9).
    Removing system-home-real (254:11)
The command hangs at this point, and nothing happens.
Some info:
gentoo, kernel 4.19.82-gentoo (recently upgraded from 4.19.72-gentoo)
lvm2 version: sys-fs/lvm2-2.02.184-r5
udev version:  sys-fs/udev-243-r2

I've tried downgrading the kernel and lvm2, which did not help.

Downgrading udev fixed the problem.

Reproducible: Always

Steps to Reproduce:
1. As above
Actual Results:  
lvremove command hangs forever

Expected Results:  
Command should complete
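
A minimal sketch of how the reproduction could be scripted so that a hang is reported instead of blocking the terminal. The helper names `hang_check` and `reproduce` are hypothetical, and the 60-second limit is an arbitrary assumption. It relies on coreutils `timeout`, which exits with status 124 when the time limit is reached.

```shell
#!/bin/sh
# Run a command under a time limit and report whether it hung.
# coreutils `timeout` exits with status 124 when the limit is hit.
hang_check() {
    limit="$1"; shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "HANG"
    elif [ "$status" -eq 0 ]; then
        echo "OK"
    else
        echo "FAILED($status)"
    fi
}

# Reproduction wrapper: defined here but not invoked. It needs root and the
# volume group "system" with a logical volume "home", as in the report above.
reproduce() {
    lvcreate -L2G -s -n homebackup /dev/system/home >/dev/null || return 1
    hang_check 60 lvremove -f /dev/system/homebackup
}
```

Calling `reproduce` as root would print `HANG` on an affected system and `OK` where the snapshot is removed normally.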

# emerge --info
Portage 2.3.76 (python 2.7.16-final-0, default/linux/amd64/17.1, gcc-9.2.0, glibc-2.29-r2, 4.19.72-gentoo x86_64)
System uname: Linux-4.19.72-gentoo-x86_64-AMD_FX-tm-4100_Quad-Core_Processor-with-gentoo-2.6
KiB Mem:     7640564 total,   4424700 free
KiB Swap:    2097148 total,   2097148 free
Timestamp of repository gentoo: Sun, 24 Nov 2019 08:30:01 +0000
Head commit of repository gentoo: cb3fbd845ee6858e0cbbac1ecfdb0eb2bb391cf1
sh bash 4.4_p23-r1
ld GNU ld (Gentoo 2.32 p2) 2.32.0
distcc 3.3.3 x86_64-pc-linux-gnu [disabled]
ccache version 3.7.4 [enabled]
app-shells/bash:          4.4_p23-r1::gentoo
dev-lang/perl:            5.28.2-r1::gentoo
dev-lang/python:          2.7.16::gentoo, 3.5.7::gentoo, 3.6.9::gentoo
dev-util/ccache:          3.7.4::gentoo
dev-util/cmake:           3.14.6::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/openrc:          0.41.2::gentoo
sys-apps/sandbox:         2.13::gentoo
sys-devel/autoconf:       2.69-r4::gentoo
sys-devel/automake:       1.13.4-r2::gentoo, 1.15.1-r2::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils:       2.32-r1::gentoo
sys-devel/gcc:            8.3.0-r1::gentoo, 9.2.0-r2::gentoo
sys-devel/gcc-config:     2.1::gentoo
sys-devel/libtool:        2.4.6-r3::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 4.19::gentoo (virtual/os-headers)
sys-libs/glibc:           2.29-r2::gentoo

    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://
    priority: -1000
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24

    location: /usr/local/portage
    masters: gentoo
    priority: 0

CFLAGS="-O2 -pipe"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.3/ext-active/ /etc/php/cgi-php7.3/ext-active/ /etc/php/cli-php7.3/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/splash /etc/terminfo"
CXXFLAGS="-O2 -pipe"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs ccache config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
USE="acl amd64 bash-completion berkdb bzip2 cli crypt cxx dri fortran gdbm gpm iconv libtirpc mmx multilib ncurses nls nptl openmp pam pcre readline seccomp split-usr sse sse2 ssl tcpd unicode xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NETBEANS_MODULES="apisupport cnd groovy gsf harness ide identity j2ee java mobility nb php profiler soa visualweb webcommon websvccommon xml" OFFICE_IMPLEMENTATION="libreoffice" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_5" PYTHON_TARGETS="python2_7 python3_5 python3_6" QEMU_SOFTMMU_TARGETS="i386 x86_64 arm" QEMU_USER_TARGETS="i386 x86_64 arm" RUBY_TARGETS="ruby24 ruby25" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" 
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Comment 1 Lars Wendler (Polynomial-C) gentoo-dev 2019-11-24 10:40:40 UTC
Have you tried upgrading to sys-fs/lvm2-2.02.186-r1 version?
Comment 4 Robert S 2019-11-24 20:38:41 UTC
Yes - upgrading lvm2 fixed it - with the latest udev:

sys-fs/lvm2: (~)2.02.186-r1
sys-fs/udev: 243-r2

# lvcreate -L2G -s -n homebackup /dev/system/home
  Logical volume "homebackup" created.

# lvremove -v /dev/system/homebackup
Do you really want to remove active logical volume system/homebackup? [y/n]: y
    Accepted input: [y]
    Archiving volume group "system" metadata (seqno 1194).
    Removing snapshot volume system/homebackup.
    Loading table for system-home (254:9).
    Loading table for system-homebackup (254:12).
    Not monitoring system/homebackup with
    Unmonitored LVM-FjwEH69DmFF5fICrqPy4C2N1QO30J3ij6G5wyXndCv66CVMhaJca24FDURYloYn0 for events
    Suspending system-home (254:9) with device flush
    Suspending system-homebackup (254:12) with device flush
    Suspending system-home-real (254:10) with device flush
    Suspending system-homebackup-cow (254:11) with device flush
    activation/volume_list configuration setting not defined: Checking only host tags for system/homebackup.
    Resuming system-homebackup-cow (254:11).
    Resuming system-home-real (254:10).
    Resuming system-homebackup (254:12).
    Resuming system-home (254:9).
    Removing system-home-real (254:10)
    Removing system-homebackup (254:12)
    Removing system-homebackup-cow (254:11)
    Releasing logical volume "homebackup"
    Creating volume group backup "/etc/lvm/backup/system" (seqno 1196).
  Logical volume "homebackup" successfully removed

Should lvm2-2.02.186-r1 be marked "stable"?
Comment 5 Matthias Gerstner 2019-12-23 14:18:20 UTC
I can confirm this bug. It occurs only with sys-fs/udev, not with sys-fs/eudev. On a fresh Gentoo install using the still-stable sys-fs/lvm2 2.02.184-r5, it happens every time I run an `lvcreate -s`, `lvremove` pair of commands. It may also be related to the lvm2 USE flags; in particular, I disabled `thin`:

[ebuild   R    ] sys-fs/lvm2-2.02.184-r5::gentoo  USE="readline udev -device-mapper-only -lvm2create_initrd -sanlock (-selinux) -static -static-libs -systemd -thin" 0 KiB

The reason for the blocking `lvremove` is a deadlock or lost wakeup bug in lvm2. A SysV semaphore is left behind with a count of 2:

root# ipcs

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     
0x0d4d8ec1 131072     root       600        1         

The `lvremove` process waits for this semaphore to be decremented, which never happens. A race condition is also involved somehow: when I debug the `lvremove` process with breakpoints and single-stepping, the deadlock does not occur.

The semaphore key/id is randomly generated during each `lvremove` run and is somehow tied to udev requests, which most probably explains the difference between eudev and udev.
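
To spot such leftover semaphores quickly, the `ipcs -s` listing can be filtered by key prefix. This is only a sketch: the function name `dm_cookie_semids` is hypothetical, and it assumes that device-mapper udev cookies always carry the prefix 0x0d4d (the `DM_COOKIE_MAGIC` value in libdevmapper, as far as I recall), which matches the key 0x0d4d8ec1 shown above.

```shell
#!/bin/sh
# Read `ipcs -s` output on stdin and print the semid of every semaphore
# whose key starts with 0x0d4d.
# ASSUMPTION: 0x0d4d is the magic prefix libdevmapper uses for udev cookies.
dm_cookie_semids() {
    awk 'tolower($1) ~ /^0x0d4d/ { print $2 }'
}
```

Typical use would be `ipcs -s | dm_cookie_semids`, followed by `ipcs -s -i <semid>` on a printed id to look at the current semaphore value.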

The gdb backtrace of `lvremove` when it's stuck looks as follows:

(gdb) bt
#0  semop (semid=294912, sops=sops@entry=0x7ffec41cdc42, nsops=nsops@entry=1)
    at ../sysdeps/unix/sysv/linux/semop.c:30
#1  0x00007f2df7510d59 in _udev_wait (cookie=223199993, nowait=nowait@entry=0x7ffec41cdc74) at libdm-common.c:2649
#2  0x00007f2df7512438 in dm_udev_wait (cookie=<optimized out>) at libdm-common.c:2668
#3  0x000056002add0535 in fs_unlock () at activate/fs.c:493
#4  0x000056002ad28bee in _lv_info (cmd=cmd@entry=0x56002cd294a0, lv=lv@entry=0x56002d5b9c88, 
    use_layer=use_layer@entry=0, info=info@entry=0x7ffec41cdd90, seg=seg@entry=0x0, 
    seg_status=seg_status@entry=0x0, with_open_count=1, with_read_ahead=0) at activate/activate.c:675
#5  0x000056002ad2a373 in lv_info (with_read_ahead=0, with_open_count=1, info=0x7ffec41cdd90, use_layer=0, 
    lv=0x56002d5b9c88, cmd=0x56002cd294a0) at activate/activate.c:724
#6  lv_info (cmd=0x56002cd294a0, lv=0x56002d5b9c88, use_layer=0, info=0x7ffec41cdd90, with_open_count=1, 
    with_read_ahead=0) at activate/activate.c:718
#7  0x000056002ad2acd2 in lv_check_not_in_use (lv=lv@entry=0x56002d5b9c88, error_if_used=error_if_used@entry=1)
    at activate/activate.c:850
#8  0x000056002ad2fa8b in lv_deactivate (cmd=cmd@entry=0x56002cd294a0, 
    lvid_s=lvid_s@entry=0x7ffec41cef10 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", 
    lv=0x56002d5b9c88) at activate/activate.c:2664
#9  0x000056002addc16d in _file_lock_resource (cmd=0x56002cd294a0, 
    resource=0x7ffec41cef10 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", flags=24, 
    lv=<optimized out>) at locking/file_locking.c:96
#10 0x000056002ad6470a in _lock_vol (cmd=cmd@entry=0x56002cd294a0, 
    resource=resource@entry=0x7ffec41cef10 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", 
    flags=flags@entry=24, lv_op=lv_op@entry=LV_NOOP, lv=lv@entry=0x56002d5a1c58) at locking/locking.c:266
#11 0x000056002ad65159 in lock_vol (cmd=cmd@entry=0x56002cd294a0, 
    vol=vol@entry=0x56002d5a1c58 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", flags=24, 
    lv=lv@entry=0x56002d5a1c58) at locking/locking.c:351
#12 0x000056002ad7a12d in lv_remove_single (cmd=cmd@entry=0x56002cd294a0, lv=lv@entry=0x56002d5a1c58, 
    force=force@entry=DONT_PROMPT, suppress_remove_message=suppress_remove_message@entry=0)
    at metadata/lv_manip.c:6135
#13 0x000056002ad7adb8 in lv_remove_with_dependencies (cmd=cmd@entry=0x56002cd294a0, lv=lv@entry=0x56002d5a1c58, 
    force=DONT_PROMPT, level=level@entry=0) at metadata/lv_manip.c:6367
#14 0x000056002ad1acfa in lvremove_single (cmd=cmd@entry=0x56002cd294a0, lv=0x56002d5a1c58, 
    handle=handle@entry=0x56002cd61698) at toollib.c:4788
#15 0x000056002ad176e7 in process_each_lv_in_vg (cmd=cmd@entry=0x56002cd294a0, vg=vg@entry=0x56002d5a1040, 
    arg_lvnames=arg_lvnames@entry=0x7ffec41cf4a0, tags_in=tags_in@entry=0x7ffec41cf450, 
    stop_on_error=stop_on_error@entry=0, handle=handle@entry=0x56002cd61698, check_single_lv=0x0, 
    process_single_lv=0x56002ad1acd0 <lvremove_single>) at toollib.c:3232
#16 0x000056002ad18dc7 in _process_lv_vgnameid_list (process_single_lv=0x56002ad1acd0 <lvremove_single>, 
    check_single_lv=0x0, handle=0x56002cd61698, arg_tags=0x7ffec41cf450, arg_lvnames=0x7ffec41cf470, 
    arg_vgnames=0x7ffec41cf460, vgnameids_to_process=0x7ffec41cf490, read_flags=1048576, cmd=0x56002cd294a0)
    at toollib.c:3696
#17 process_each_lv (cmd=cmd@entry=0x56002cd294a0, argc=argc@entry=1, argv=argv@entry=0x7ffec41cf8f8, 
    one_vgname=one_vgname@entry=0x0, one_lvname=one_lvname@entry=0x0, read_flags=read_flags@entry=1048576, 
    handle=0x56002cd61698, check_single_lv=<optimized out>, process_single_lv=<optimized out>) at toollib.c:3854
#18 0x000056002ad025c1 in lvremove (cmd=0x56002cd294a0, argc=1, argv=0x7ffec41cf8f8) at lvremove.c:29
#19 0x000056002ad00a05 in lvm_run_command (cmd=cmd@entry=0x56002cd294a0, argc=<optimized out>, argc@entry=3, 
    argv=<optimized out>, argv@entry=0x7ffec41cf8e8) at lvmcmdline.c:3008
#20 0x000056002ad01d33 in lvm2_main (argc=3, argv=0x7ffec41cf8e8) at lvmcmdline.c:3537
#21 0x00007f2df72f9eab in __libc_start_main (main=0x56002acdeb00 <main>, argc=3, argv=0x7ffec41cf8e8, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffec41cf8d8)
    at ../csu/libc-start.c:308
#22 0x000056002acdeb3a in _start () at lvm.c:22
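
The `cookie` argument in frame #1 can be related to the semaphore key format reported by `ipcs`. The sketch below converts the decimal cookie to hex and checks for the 0x0d4d prefix. The helper names are hypothetical, and the prefix is assumed to be libdevmapper's `DM_COOKIE_MAGIC`.

```shell
#!/bin/sh
# Print a udev cookie (decimal or hex) in the 0x%08x form that ipcs uses for keys.
cookie_to_key() {
    printf '0x%08x\n' "$1"
}

# Succeed if the upper 16 bits of the value equal 0x0D4D.
# ASSUMPTION: 0x0D4D is the device-mapper udev cookie magic.
is_dm_cookie() {
    [ $(( $1 >> 16 )) -eq $(( 0x0D4D )) ]
}
```

For the backtrace above, `cookie_to_key 223199993` prints `0x0d4dc2f9`, so the cookie has the same prefix as the key seen in the earlier `ipcs` output (the backtrace comes from a different run, hence the different low bits). `dmsetup udevcookies` lists outstanding cookies, and `dmsetup udevcomplete <cookie>` might release a stuck waiter, although that has not been tested against this bug.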
Comment 6 Sergiy Borodych 2020-02-14 07:44:26 UTC
I can confirm the bug.
I also see the issue with 'lvcreate -s --size=' (so no thin) and udev.

I updated to:
 sys-fs/lvm2-2.02.186-r2::gentoo [2.02.184-r5::gentoo]

and the issue seems to be gone.

But I still see noise in dmesg such as:

dm-4: Conflicting device node '/dev/mapper/vg-label' found, link to '/dev/dm-4' will not be created.
Comment 7 Jacek 2020-02-24 14:07:34 UTC
I can confirm this too.

I downgraded udev to 242, and it is OK now.
After reading this
there is a patch, but I have not tested it.