Summary: | sys-fs/lvm2 with sys-fs/udev-243-r2: lvremove hangs with snapshot | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Robert S <robert.spam.me.senseless> |
Component: | Current packages | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | marci_r, mgerstner, Sergiy.Borodych, udev-bugs |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | | Runtime testing required: | --- |
Description
Robert S
2019-11-24 10:00:46 UTC
Have you tried upgrading to sys-fs/lvm2-2.02.186-r1?

Sorry for the double post; I accidentally hit the Return key while editing the subject...

Yes, upgrading lvm2 fixed it, with the latest udev (sys-fs/lvm2: (~)2.02.186-r1, sys-fs/udev: 243-r2):

    # lvcreate -L2G -s -n homebackup /dev/system/home
    Logical volume "homebackup" created.
    # lvremove -v /dev/system/homebackup
    Do you really want to remove active logical volume system/homebackup? [y/n]: y
    Accepted input: [y]
    Archiving volume group "system" metadata (seqno 1194).
    Removing snapshot volume system/homebackup.
    Loading table for system-home (254:9).
    Loading table for system-homebackup (254:12).
    Not monitoring system/homebackup with libdevmapper-event-lvm2snapshot.so
    Unmonitored LVM-FjwEH69DmFF5fICrqPy4C2N1QO30J3ij6G5wyXndCv66CVMhaJca24FDURYloYn0 for events
    Suspending system-home (254:9) with device flush
    Suspending system-homebackup (254:12) with device flush
    Suspending system-home-real (254:10) with device flush
    Suspending system-homebackup-cow (254:11) with device flush
    activation/volume_list configuration setting not defined: Checking only host tags for system/homebackup.
    Resuming system-homebackup-cow (254:11).
    Resuming system-home-real (254:10).
    Resuming system-homebackup (254:12).
    Resuming system-home (254:9).
    Removing system-home-real (254:10)
    Removing system-homebackup (254:12)
    Removing system-homebackup-cow (254:11)
    Releasing logical volume "homebackup"
    Creating volume group backup "/etc/lvm/backup/system" (seqno 1196).
    Logical volume "homebackup" successfully removed

Should lvm2-2.02.186-r1 be marked "stable"?

I can confirm this bug. It occurs only with sys-fs/udev, not with sys-fs/eudev. On a fresh Gentoo install with the still-stable sys-fs/lvm2 2.02.184-r5, it happens every time I run an `lvcreate -s` / `lvremove` pair of commands.
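The reproduction described above (create a snapshot with `lvcreate -s`, then remove it with `lvremove`) can be looped with a timeout, so a hang is reported instead of blocking the shell. Below is a minimal Python sketch, assuming root privileges and reusing the volume names from the transcript (`/dev/system/home`, `homebackup`), which you would replace with your own. The `snapshot_cycle` function and its `runner` parameter are hypothetical; `runner` exists only so the logic can be exercised without touching LVM. It passes `-f` to `lvremove` to skip the "remove active logical volume?" prompt.

```python
import subprocess

# Names taken from the transcript above; adjust them for your own volume group.
ORIGIN = "/dev/system/home"
SNAPSHOT = "homebackup"
SNAPSHOT_PATH = "/dev/system/homebackup"


def snapshot_cycle(runner=subprocess.run, timeout: float = 60.0) -> bool:
    """Create and remove one snapshot; return False if lvremove times out.

    `runner` is a hypothetical injection point and defaults to subprocess.run.
    """
    runner(["lvcreate", "-L2G", "-s", "-n", SNAPSHOT, ORIGIN],
           check=True, timeout=timeout)
    try:
        # -f removes the active snapshot without asking for confirmation.
        runner(["lvremove", "-f", SNAPSHOT_PATH], check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the stuck lvremove at this point.
        return False
    return True
```

Run as root, for example `all(snapshot_cycle() for _ in range(5))`. When a cycle times out, `subprocess.run` kills the stuck `lvremove`, so the snapshot and the leftover semaphore may still exist and need manual cleanup before the next attempt.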
It may also be related to the lvm2 USE flags; in particular, I disabled `thin`: `[ebuild R ] sys-fs/lvm2-2.02.184-r5::gentoo USE="readline udev -device-mapper-only -lvm2create_initrd -sanlock (-selinux) -static -static-libs -systemd -thin" 0 KiB`

The blocking `lvremove` is caused by a deadlock or lost-wakeup bug in lvm2. A SysV semaphore is left behind with a count of 2:

```
root# ipcs

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x0d4d8ec1 131072     root       600        1
```

The `lvremove` process is waiting for this semaphore to reach zero (that is, for udev to decrement it), which never happens. A race condition is also involved somehow: when I debug it with breakpoints and single-step within the `lvremove` process, the deadlock doesn't occur. The semaphore key/ID is generated randomly on each `lvremove` run and is tied to udev requests, which most probably explains the difference between eudev and udev.
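To make the "waiting for zero" behaviour concrete, here is a simplified Python model of the udev cookie handshake as we understand it from libdevmapper (the `_udev_wait` frame in the backtrace): the tool that creates the cookie holds one count, every device-mapper task tagged with the cookie adds one, the udev rule runs `dmsetup udevcomplete <cookie>` to subtract one after processing the event, and the waiter drops its own count and blocks until the value is zero. This is an illustration, not libdevmapper's code: the `UdevCookie` class and its method names are hypothetical, and the model uses a timeout where the real `semop()` call in the trace has none.

```python
import threading


class UdevCookie:
    """Hypothetical, simplified model of a libdevmapper udev cookie.

    The real cookie is a SysV semaphore; a counter guarded by a
    threading.Condition stands in for it here.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._count = 1  # reference held by the tool that created the cookie

    def task_submitted(self) -> None:
        """A device-mapper operation tagged with this cookie was issued."""
        with self._cond:
            self._count += 1

    def udev_complete(self) -> None:
        """udev finished processing one event (cf. `dmsetup udevcomplete`)."""
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait(self, timeout: float) -> bool:
        """Drop the creator's reference, then wait for the count to reach zero.

        Returns False on timeout, i.e. when some completion never arrived.
        """
        with self._cond:
            self._count -= 1
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)

    @property
    def count(self) -> int:
        with self._cond:
            return self._count


if __name__ == "__main__":
    # Normal case: both events are acknowledged, so the wait returns.
    ok = UdevCookie()
    ok.task_submitted()
    ok.task_submitted()
    for _ in range(2):
        threading.Thread(target=ok.udev_complete).start()
    print(ok.wait(timeout=1.0))  # True

    # Lost wakeup: no event is acknowledged, so the count never reaches zero.
    stuck = UdevCookie()
    stuck.task_submitted()
    stuck.task_submitted()
    print(stuck.wait(timeout=0.1), stuck.count)  # False 2
```

If udev never processes or acknowledges one of the events, the count never reaches zero and the waiter hangs, which matches the symptom in this report. The model cannot tell which layer loses the event.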
The gdb backtrace of `lvremove` when it is stuck looks as follows:

```
(gdb) bt
#0 semop (semid=294912, sops=sops@entry=0x7ffec41cdc42, nsops=nsops@entry=1) at ../sysdeps/unix/sysv/linux/semop.c:30
#1 0x00007f2df7510d59 in _udev_wait (cookie=223199993, nowait=nowait@entry=0x7ffec41cdc74) at libdm-common.c:2649
#2 0x00007f2df7512438 in dm_udev_wait (cookie=<optimized out>) at libdm-common.c:2668
#3 0x000056002add0535 in fs_unlock () at activate/fs.c:493
#4 0x000056002ad28bee in _lv_info (cmd=cmd@entry=0x56002cd294a0, lv=lv@entry=0x56002d5b9c88, use_layer=use_layer@entry=0, info=info@entry=0x7ffec41cdd90, seg=seg@entry=0x0, seg_status=seg_status@entry=0x0, with_open_count=1, with_read_ahead=0) at activate/activate.c:675
#5 0x000056002ad2a373 in lv_info (with_read_ahead=0, with_open_count=1, info=0x7ffec41cdd90, use_layer=0, lv=0x56002d5b9c88, cmd=0x56002cd294a0) at activate/activate.c:724
#6 lv_info (cmd=0x56002cd294a0, lv=0x56002d5b9c88, use_layer=0, info=0x7ffec41cdd90, with_open_count=1, with_read_ahead=0) at activate/activate.c:718
#7 0x000056002ad2acd2 in lv_check_not_in_use (lv=lv@entry=0x56002d5b9c88, error_if_used=error_if_used@entry=1) at activate/activate.c:850
#8 0x000056002ad2fa8b in lv_deactivate (cmd=cmd@entry=0x56002cd294a0, lvid_s=lvid_s@entry=0x7ffec41cef10 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", lv=0x56002d5b9c88) at activate/activate.c:2664
#9 0x000056002addc16d in _file_lock_resource (cmd=0x56002cd294a0, resource=0x7ffec41cef10 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", flags=24, lv=<optimized out>) at locking/file_locking.c:96
#10 0x000056002ad6470a in _lock_vol (cmd=cmd@entry=0x56002cd294a0, resource=resource@entry=0x7ffec41cef10 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", flags=flags@entry=24, lv_op=lv_op@entry=LV_NOOP, lv=lv@entry=0x56002d5a1c58) at locking/locking.c:266
#11 0x000056002ad65159 in lock_vol (cmd=cmd@entry=0x56002cd294a0, vol=vol@entry=0x56002d5a1c58 "cyWnRC2RdnAXEpEbXu9vEEzwS202EIDs7ZVETzHuwBpeB5FXM8L2ZYiWOYHdjUZf", flags=24, lv=lv@entry=0x56002d5a1c58) at locking/locking.c:351
#12 0x000056002ad7a12d in lv_remove_single (cmd=cmd@entry=0x56002cd294a0, lv=lv@entry=0x56002d5a1c58, force=force@entry=DONT_PROMPT, suppress_remove_message=suppress_remove_message@entry=0) at metadata/lv_manip.c:6135
#13 0x000056002ad7adb8 in lv_remove_with_dependencies (cmd=cmd@entry=0x56002cd294a0, lv=lv@entry=0x56002d5a1c58, force=DONT_PROMPT, level=level@entry=0) at metadata/lv_manip.c:6367
#14 0x000056002ad1acfa in lvremove_single (cmd=cmd@entry=0x56002cd294a0, lv=0x56002d5a1c58, handle=handle@entry=0x56002cd61698) at toollib.c:4788
#15 0x000056002ad176e7 in process_each_lv_in_vg (cmd=cmd@entry=0x56002cd294a0, vg=vg@entry=0x56002d5a1040, arg_lvnames=arg_lvnames@entry=0x7ffec41cf4a0, tags_in=tags_in@entry=0x7ffec41cf450, stop_on_error=stop_on_error@entry=0, handle=handle@entry=0x56002cd61698, check_single_lv=0x0, process_single_lv=0x56002ad1acd0 <lvremove_single>) at toollib.c:3232
#16 0x000056002ad18dc7 in _process_lv_vgnameid_list (process_single_lv=0x56002ad1acd0 <lvremove_single>, check_single_lv=0x0, handle=0x56002cd61698, arg_tags=0x7ffec41cf450, arg_lvnames=0x7ffec41cf470, arg_vgnames=0x7ffec41cf460, vgnameids_to_process=0x7ffec41cf490, read_flags=1048576, cmd=0x56002cd294a0) at toollib.c:3696
#17 process_each_lv (cmd=cmd@entry=0x56002cd294a0, argc=argc@entry=1, argv=argv@entry=0x7ffec41cf8f8, one_vgname=one_vgname@entry=0x0, one_lvname=one_lvname@entry=0x0, read_flags=read_flags@entry=1048576, handle=0x56002cd61698, check_single_lv=<optimized out>, process_single_lv=<optimized out>) at toollib.c:3854
#18 0x000056002ad025c1 in lvremove (cmd=0x56002cd294a0, argc=1, argv=0x7ffec41cf8f8) at lvremove.c:29
#19 0x000056002ad00a05 in lvm_run_command (cmd=cmd@entry=0x56002cd294a0, argc=<optimized out>, argc@entry=3, argv=<optimized out>, argv@entry=0x7ffec41cf8e8) at lvmcmdline.c:3008
#20 0x000056002ad01d33 in lvm2_main (argc=3, argv=0x7ffec41cf8e8) at lvmcmdline.c:3537
#21 0x00007f2df72f9eab in __libc_start_main (main=0x56002acdeb00 <main>, argc=3, argv=0x7ffec41cf8e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffec41cf8d8) at ../csu/libc-start.c:308
#22 0x000056002acdeb3a in _start () at lvm.c:22
```

I can confirm the bug. I also have the issue with `lvcreate -s --size=` (so no thin) and udev. After updating to sys-fs/lvm2-2.02.186-r2::gentoo [2.02.184-r5::gentoo], the issue seems to be gone.

P.S. But I still get noise in dmesg like: `dm-4: Conflicting device node '/dev/mapper/vg-label' found, link to '/dev/dm-4' will not be created.`

I can confirm this too. I downgraded udev to 242 and it is OK now. According to https://github.com/systemd/systemd/issues/13976 there is a patch, but I have not tested it.

The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=82a0082762f7687ec3750a398f4e35af55ed8714

    commit 82a0082762f7687ec3750a398f4e35af55ed8714
    Author:     Jakov Smolić <jsmolic@gentoo.org>
    AuthorDate: 2023-09-23 00:51:19 +0000
    Commit:     Jakov Smolić <jsmolic@gentoo.org>
    CommitDate: 2023-09-23 00:51:28 +0000

        sys-fs/udev: treeclean

        Closes: https://bugs.gentoo.org/514016
        Closes: https://bugs.gentoo.org/804218
        Closes: https://bugs.gentoo.org/701056
        Signed-off-by: Jakov Smolić <jsmolic@gentoo.org>

     profiles/package.mask                 |  1 -
     profiles/targets/systemd/package.mask |  3 +--
     sys-fs/udev/metadata.xml              | 11 -----------
     sys-fs/udev/udev-250.ebuild           | 15 ---------------
     4 files changed, 1 insertion(+), 29 deletions(-)