Bug 878353 - sys-fs/lvm2: init script fails to start with vgscan status code 5
Summary: sys-fs/lvm2: init script fails to start with vgscan status code 5
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords: PullRequest
Duplicates: 889559
Depends on:
Blocks:
 
Reported: 2022-10-26 11:28 UTC by Till Schäfer
Modified: 2024-03-20 10:12 UTC
CC List: 6 users

See Also:
Package list:
Runtime testing required: ---


Attachments
output of vgscan -vvvv --config 'global { locking_dir = "/run/lock/lvm" }' --mknodes (vgscan.out,184.23 KB, text/plain)
2022-10-26 11:29 UTC, Till Schäfer

Description Till Schäfer 2022-10-26 11:28:45 UTC
Whenever I try to start the lvm service (OpenRC), it fails with the following output. Note, however, that the LVM volumes are all present and LVM works without any observable issues.

# /etc/init.d/lvm start
 * Starting the Logical Volume Manager ...
  Found volume group "vgdata" using metadata type lvm2
  Found volume group "vgsystem" using metadata type lvm2
  Command failed with status code 5.
  12 logical volume(s) in volume group "vgdata" now active
  7 logical volume(s) in volume group "vgsystem" now active
 * Failed to start the Logical Volume Manager                                                                                        [ !! ]
 * ERROR: lvm failed to start


It seems that the erroneous return code is issued by the vgscan command when it is run with the --mknodes option. When this option is removed, the issue does not occur.


# vgscan --config 'global { locking_dir = "/run/lock/lvm" }' --mknodes
  Found volume group "vgdata" using metadata type lvm2
  Found volume group "vgsystem" using metadata type lvm2
  Command failed with status code 5.

A log with -vvvv is attached to the bug report. However, I cannot find any message about a failed node creation or a similar issue. 

The only thing special about that system seems to be that it runs a software RAID 5 containing a LUKS container, which in turn contains the physical volume of the volume group vgdata. However, the lvm command already fails at boot time, when the LUKS container is still closed, so this setup is unlikely to be the cause.

I found an old similar report here

https://gitlab.alpinelinux.org/alpine/aports/-/issues/3543

but there the issue is referred to as fixed by this commit

https://github.com/lvmteam/lvm2/commit/4dc602f79bd6579eef15a9227aee99fe832a7610



Reproducible: Always




# emerge --info
Portage 3.0.38.1 (python 3.10.8-final-0, default/linux/amd64/17.1/no-multilib/hardened, gcc-11.3.0, glibc-2.35-r8, 5.15.74-gentoo x86_64)
=================================================================
System uname: Linux-5.15.74-gentoo-x86_64-Intel-R-_Core-TM-_i5-4570S_CPU_@_2.90GHz-with-glibc2.35
KiB Mem:    16253124 total,   2972612 free
KiB Swap:   16777212 total,  16777212 free
Timestamp of repository gentoo: Wed, 26 Oct 2022 07:31:49 +0000
Head commit of repository gentoo: 3e19a7d9285d53513ebf2a4282312c13b9fe00c3

sh bash 5.1_p16-r1
ld GNU ld (Gentoo 2.38 p4) 2.38
app-misc/pax-utils:        1.3.5::gentoo
app-shells/bash:           5.1_p16-r1::gentoo
dev-lang/perl:             5.34.1-r3::gentoo
dev-lang/python:           3.9.15::gentoo, 3.10.8::gentoo
dev-lang/rust:             1.64.0-r1::gentoo
dev-util/cmake:            3.24.2::gentoo
dev-util/meson:            0.63.2-r1::gentoo
sys-apps/baselayout:       2.8::gentoo
sys-apps/openrc:           0.45.2-r1::gentoo
sys-apps/sandbox:          2.29::gentoo
sys-devel/autoconf:        2.71-r1::gentoo
sys-devel/automake:        1.16.5::gentoo
sys-devel/binutils:        2.38-r2::gentoo
sys-devel/binutils-config: 5.4.1::gentoo
sys-devel/gcc:             11.3.0::gentoo
sys-devel/gcc-config:      2.5-r1::gentoo
sys-devel/libtool:         2.4.7::gentoo
sys-devel/make:            4.3::gentoo
sys-kernel/linux-headers:  5.15-r3::gentoo (virtual/os-headers)
sys-libs/glibc:            2.35-r8::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/gentoo.git
    priority: -1000
    sync-git-verify-commit-signature: true

shared_overlay
    location: /opt/conf/common/var/db/repos/shared_overlay
    masters: gentoo
    priority: 100

local_overlay
    location: /var/db/repos/local_overlay
    masters: gentoo
    priority: 200

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -ftree-vectorize -fforce-addr"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/portage/package.accept_keywords/99-autounmask /etc/portage/package.unmask/99-autounmask /etc/portage/package.use/99-autounmask /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2 -pipe -ftree-vectorize -fforce-addr"
DISTDIR="/var/cache/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps=y --autounmask=y --autounmask-write --autounmask-continue --jobs=2 --load-average=4 --backtrack=100 --alert"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live compressdebug config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch parallel-install pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms sign splitdebug strict strict-keepdir unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j4"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
RUSTFLAGS="-C target-cpu=native -O"
SHELL="/bin/bash"
USE="acl aes amd64 avx avx2 bash-completion bzip2 cli crypt cups dnssec dri f16c fma3 fortran gdbm glib hardened iconv icu idn ipv4 ipv6 libglvnd libtirpc lvm mmx mmxext ncurses nfsv4 nfsv41 nls nptl ntp openmp opensslcrypt pam pclmul pcre pie popcnt rdrand readline samba seccomp split-usr sse sse2 sse3 sse4_1 sse4_2 ssl ssp ssse3 syslog test-rust threads unicode vim-syntax xattr xtpax zlib" ABI_X86="64" ADA_TARGET="gnat_2020" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-4 php8-0" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_10" PYTHON_TARGETS="python3_10" RUBY_TARGETS="ruby27" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy 
condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LEX, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 1 Till Schäfer 2022-10-26 11:29:39 UTC
Created attachment 825487 [details]
output of vgscan -vvvv --config 'global { locking_dir = "/run/lock/lvm" }' --mknodes
Comment 2 Till Schäfer 2022-10-26 11:32:30 UTC
I am currently using =sys-fs/lvm2-2.03.14-r3, but the issue was already present at version 2.02.188-r3.
Comment 3 Till Schäfer 2022-10-26 11:58:53 UTC
# pvdisplay 
  --- Physical volume ---
  PV Name               /dev/mapper/crypt_raid
  VG Name               vgdata
  PV Size               32.74 TiB / not usable 3.75 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              8583098
  Free PE               3347130
  Allocated PE          5235968
  PV UUID               BordYg-Qpir-Ksm1-iOeK-v2hA-HQya-A4Hmug
   
  --- Physical volume ---
  PV Name               /dev/sda4
  VG Name               vgsystem
  PV Size               414.78 GiB / not usable <2.82 MiB
  Allocatable           yes 
  PE Size               4.00 MiB
  Total PE              106184
  Free PE               40009
  Allocated PE          66175
  PV UUID               witdMP-cVc1-IycP-YHHB-ergZ-05c2-95zjhC


# vgdisplay 
  --- Volume group ---
  VG Name               vgdata
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  14
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                12
  Open LV               12
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               32.74 TiB
  PE Size               4.00 MiB
  Total PE              8583098
  Alloc PE / Size       5235968 / 19.97 TiB
  Free  PE / Size       3347130 / <12.77 TiB
  VG UUID               Ahewgi-Osnw-r7gf-a3SA-AzT7-URAj-XHmaMJ
   
  --- Volume group ---
  VG Name               vgsystem
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  36
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                7
  Open LV               6
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               414.78 GiB
  PE Size               4.00 MiB
  Total PE              106184
  Alloc PE / Size       66175 / <258.50 GiB
  Free  PE / Size       40009 / <156.29 GiB
  VG UUID               W7knUd-A0yQ-0BSn-wDHZ-D2O7-g4D8-tG58mD
Comment 4 Johannes Lode 2022-12-04 14:26:13 UTC
I have the same symptoms here. A temporary workaround is to remove the --mknodes option from the startup script.

The system runs on md-raid1 PVs without initramfs.

Interestingly, there is no startup script for lvmetad installed. 

Reproducible: Always

emerge --info
Portage 3.0.38.1 (python 3.10.8-final-0, default/linux/amd64/17.1, gcc-11.3.0, glibc-2.36-r5, 5.15.80-gentoo-dom0-Hyperjanus x86_64)
=================================================================
System uname: Linux-5.15.80-gentoo-dom0-Hyperjanus-x86_64-AMD_Opteron-tm-_X3418_APU-with-glibc2.36
KiB Mem:      930720 total,    695632 free
KiB Swap:   32804576 total,  32804576 free
Timestamp of repository gentoo: Sat, 03 Dec 2022 08:30:01 +0000
Head commit of repository gentoo: c2ad6974286ce29bae310d0ff67c773cfe123acc
sh bash 5.1_p16-r2
ld GNU ld (Gentoo 2.38 p4) 2.38
distcc 3.4 x86_64-pc-linux-gnu [disabled]
ccache version 4.6.3 [disabled]
app-misc/pax-utils:        1.3.5::gentoo
app-shells/bash:           5.1_p16-r2::gentoo
dev-lang/perl:             5.34.1-r4::gentoo
dev-lang/python:           3.10.8_p3::gentoo, 3.11.0_p2::gentoo
dev-util/ccache:           4.6.3::gentoo
dev-util/cmake:            3.24.3::gentoo
dev-util/meson:            0.63.3::gentoo
sys-apps/baselayout:       2.9::gentoo
sys-apps/openrc:           0.45.2-r1::gentoo
sys-apps/sandbox:          2.29::gentoo
sys-devel/autoconf:        2.71-r5::gentoo
sys-devel/automake:        1.16.5::gentoo
sys-devel/binutils:        2.38-r2::gentoo
sys-devel/binutils-config: 5.4.1::gentoo
sys-devel/gcc:             11.3.0::gentoo
sys-devel/gcc-config:      2.8::gentoo
sys-devel/libtool:         2.4.7::gentoo
sys-devel/make:            4.3::gentoo
sys-kernel/linux-headers:  5.15-r3::gentoo (virtual/os-headers)
sys-libs/glibc:            2.36-r5::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-jobs: 1
    sync-rsync-extra-opts: 
    sync-rsync-verify-max-age: 24

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=athlon64 -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=athlon64 -fomit-frame-pointer -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en"
MAKEOPTS="-j8"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="amd64 bzip2 cli crypt dri gdbm iconv libglvnd libtirpc minimal mmx multilib ncurses nptl openmp pam pcre readline seccomp split-usr sse sse2 ssl test-rust unicode xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2021" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-4 php8-0" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_10" PYTHON_TARGETS="python3_10" RUBY_TARGETS="ruby27" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LANG, LC_ALL, LD, LEX, LFLAGS, LIBTOOL, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 5 Till Schäfer 2022-12-08 10:36:02 UTC
Interesting that you are also using mdraid. 

Since mdraid and lvm are both using the device mapper, my hypothesis is that mdraid creates some device mapper nodes before lvm does, and that this causes lvm to fail when it tries to create the nodes itself. Maybe return code 5 is not a problem at all and should simply be ignored by the init script? However, I cannot find any documentation on this return code, what exactly it means, or whether it is also used for more severe situations.


(lvmetad was deprecated and has been gone since version 2.03. Thus, the missing lvmetad is expected and should not be of concern.)
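
The idea raised in this comment, letting the init script ignore return code 5, could look roughly like the sketch below. It is only an illustration: run_tolerating is a hypothetical helper, not something taken from the Gentoo init script, and suppressing a status this way would also hide any genuine failure that reports the same code, which is exactly the concern voiced above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Gentoo init script): run a command and
# downgrade one specific exit status to a warning instead of a failure.
run_tolerating() {
    tolerated=$1
    shift
    status=0
    "$@" || status=$?
    if [ "$status" -eq "$tolerated" ]; then
        # The tolerated status is reported on stderr, then treated as success.
        echo "warning: '$*' exited with status $status, continuing" >&2
        return 0
    fi
    # Any other status (including 0) is passed through unchanged.
    return "$status"
}

# Example (assumes the lvm2 tools are installed and this runs as root):
#   run_tolerating 5 vgscan --mknodes
```

Every status other than the tolerated one is passed through, so a caller can still distinguish other failures.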
Comment 6 Johannes Lode 2022-12-09 18:42:53 UTC
I do not use LVM2 to manage the RAID setup; this is left to the kernel's boot-time RAID autodetection and mdadm. But my LVM uses RAID devices /dev/md* as PVs.

So I'm no longer sure whether the bug title relates to my situation, but the symptoms are the very same.
Comment 7 Till Schäfer 2022-12-09 22:01:58 UTC
I confused md and dm devices...

My setup also uses md devices at the bottom of the stack, more precisely: mdraid -> luks -> PV.
Comment 8 Forza 2023-12-22 15:33:04 UTC
I have the same issue with the '--mknodes' option. I removed it from my init.d/lvm script and all is good.

I think the issue is that mknodes should not be used when 'udev' is enabled since udev takes care of creating all device mapper nodes.

A secondary consequence of 'lvm' failing to start is that other services such as 'lvmpolld' won't work without it.

Maybe there ought to be a check for udev in the init script, or in the ebuild to prevent this issue?
Comment 9 Jaco Kroon 2024-01-11 14:51:36 UTC
crowsnest [16:46:36] ~ # vgscan --mknodes; echo $?
  Found volume group "lvm" using metadata type lvm2
  Command failed with status code 5.
5
crowsnest [16:46:48] ~ # vgscan; echo $?
  Found volume group "lvm" using metadata type lvm2
0
crowsnest [16:46:53] ~ # vgscan --mknodes; echo $?
  Found volume group "lvm" using metadata type lvm2
  Command failed with status code 5.
5

So I can confirm that --mknodes in combination with thin volumes causes this.

I will have to check whether this also happens when udevd is NOT running, but I don't think it does, judging from our initrd (which blocks and waits in the initrd "recovery shell" on command failures).

I would thus suggest to *not* use --mknodes if (and only if) udevd is running.
Comment 10 Mike Gilbert gentoo-dev 2024-01-11 16:44:48 UTC
I don't think any of the Gentoo maintainers are using lvm in this configuration (OpenRC/thin-volumes). I suspect most of us are using systemd now.

Patches welcome.
Comment 11 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2024-01-11 20:53:14 UTC
Nope, I do use thin volumes, and no systemd.

I can't reproduce your problem.

# ps faux |grep -i -e ude[v] -e system[d]
root      6654  0.0  0.0  23660  6004 ?        Ss    2023   1:59 /lib/systemd/systemd-udevd --debug
robbat2  19770  0.0  0.0   4868  3312 ?        S     2023   0:00          |   \_ xscreensaver-systemd

# lvs -o +lv_all |grep -w -e thin |wc -l
5

# vgscan --mknodes; echo $? 
  Found volume group "vg" using metadata type lvm2
0
Comment 12 Jaco Kroon 2024-01-11 23:51:06 UTC
(In reply to Robin Johnson from comment #11)
> Nope, I do use thin volumes, and no systemd.
> 
> I can't reproduce your problem.
> 
> # ps faux |grep -i -e ude[v] -e system[d]
> root      6654  0.0  0.0  23660  6004 ?        Ss    2023   1:59
> /lib/systemd/systemd-udevd --debug
> robbat2  19770  0.0  0.0   4868  3312 ?        S     2023   0:00          | 
> \_ xscreensaver-systemd
> 
> # lvs -o +lv_all |grep -w -e thin |wc -l
> 5
> 
> # vgscan --mknodes; echo $? 
>   Found volume group "vg" using metadata type lvm2
> 0


Right.  So it's something specific to the thin volumes that mysqld (and seemingly others in the past) use or do.

Do we care to figure out exactly what?  I would have guessed thin snap related, but on the new nodes (11 of them) where I'm seeing this error we don't have snapshots (yet).  Previously we only saw this on a handful of nodes so I didn't bother to try and figure it out.

The output from lvs -o +lv_all is rather hard to read as a matter of fact, so I'm not sure how you'd like to proceed if we do want to figure it out.  Might be udev/lvm version related and that it was fixed somewhere along the line and since I'm using stable and not ~ for this package ...

So lvm2 2.03.21 compared to 2.03.22?

Either way, I've pushed a PR for -r4 in the interim. I'm not sure why the hppa, ppc, ia64, alpha and m68k keywords were dropped for -r3, so the PR needs a careful review.
Comment 13 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2024-01-12 01:35:11 UTC
I've had cases where the device nodes did NOT get created without --mknodes and I couldn't boot, so I'm wary about merging a change to not create them.

Yes, that output is hard to read, but there's also the json.
sudo lvs -o +lv_all --reportformat json


# This shows the 3x sparse-thin LVs I have.
sudo lvs -o +lv_all --reportformat json |jq '.report|.[]|.lv|.[]|select(.lv_layout|match("thin"))|select(.lv_layout|match("sparse"))' 


I'm on sys-fs/lvm2-2.03.22-r2 here right now.
Comment 14 Jaco Kroon 2024-03-19 11:23:31 UTC
(In reply to Robin Johnson from comment #13)
> I've had cases where the device nodes did NOT get created without --mknodes
> and I couldn't boot, so I'm wary about merging a change to not create them.

Do you perhaps have information on that?  Was it if udevd was not running?

In our initrd we still have this:

/sbin/lvm vgscan --mknodes --quiet

At this point udevd is NOT running, and --mknodes is required.  Since this is what we term a "managed command" (i.e., it has to succeed), we know that this works correctly at this point.

The follow-up /sbin/lvm vgchange --sysinit -a ay --quiet --ignoremonitoring, however, is allowed to fail, since it still fails with exit code 5 on thin LVs. (A failure causes a delay of 30 seconds, which gives a sysadmin the opportunity to intervene if it is a real problem.)

> Yes, that output is hard to read, but there's also the json.
> sudo lvs -o +lv_all --reportformat json

It gets marginally easier, even after passing it through jq, but as per https://bugs.gentoo.org/889559 we merely need to create a thin pool for this issue to show up.

> # This shows the 3x sparse-thin LVs I have.
> sudo lvs -o +lv_all --reportformat json |jq
> '.report|.[]|.lv|.[]|select(.lv_layout|match("thin"))|select(.
> lv_layout|match("sparse"))' 

I really need to learn jq query syntax :).

> I'm on sys-fs/lvm2-2.03.22-r2 here right now.

The relevant PR still issues vgchange with --mknodes if udevd is NOT running.  It does this by checking if /run/udev/control is a socket (if it is, we don't pass --mknodes, if it's not a socket, we pass --mknodes).
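
The socket check described above can be sketched as follows. This is an illustration, not the code from the PR: the function name vgscan_extra_args and its optional path argument are hypothetical, added so the check can be tried against any path. Only the /run/udev/control location and the "socket means udevd is running" rule come from the description above.

```shell
#!/bin/sh
# Sketch only: the actual change lives in the PR and may differ.
# vgscan_extra_args and its optional argument are hypothetical names.
vgscan_extra_args() {
    udev_control=${1:-/run/udev/control}
    if [ -S "$udev_control" ]; then
        # The control socket exists, so udevd appears to be running:
        # leave device node creation to udev and add no extra option.
        :
    else
        # No udev control socket: ask LVM to create the nodes itself.
        echo "--mknodes"
    fi
}

# Example (assumes the lvm2 tools are installed and this runs as root):
#   vgscan $(vgscan_extra_args)
```

The unquoted command substitution in the example is deliberate: when udevd is running the function prints nothing, and the command then receives no extra argument at all.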
Comment 15 dwfreed 2024-03-19 14:55:02 UTC
I debugged this on my Debian system which exhibits the same error.  The issue is that in one particular bit of code, it skips the thin pool LV, but fails to override the return code from the default error value.  This issue has been fixed in 2.03.22 already:

https://github.com/lvmteam/lvm2/commit/e3cc3e55c8e75f20997f321bfac766859337bef6

This explains why Robin can't reproduce the issue, as 2.03.22 versions in Gentoo went stable relatively recently.
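
To check whether a given lvm2 version is at least 2.03.22, a version comparison along these lines may help. It is a rough heuristic under stated assumptions: version_ge is a hypothetical helper, it relies on `sort -V` (version sort, as provided by GNU coreutils), Gentoo revision suffixes such as -r3 have to be stripped before comparing, and a distribution may backport the fix to an older version.

```shell
#!/bin/sh
# version_ge A B: succeed if version A is greater than or equal to version B.
# Hypothetical helper; assumes `sort -V` (version sort) is available.
version_ge() {
    # The smaller of the two versions sorts first; A >= B exactly when
    # that smallest entry is B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with the versions mentioned in this bug:
#   version_ge 2.03.14 2.03.22 || echo "predates the upstream fix"
#   version_ge 2.03.22 2.03.22 && echo "should contain the upstream fix"
```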
Comment 16 dwfreed 2024-03-19 14:56:22 UTC
*** Bug 889559 has been marked as a duplicate of this bug. ***
Comment 17 Jaco Kroon 2024-03-20 10:12:59 UTC
Based on further testing, I concur that this has been fixed upstream in version 2.03.22 of the lvm2 package (link in the directly preceding comment).

I'm closing this as fixed since that version is now also marked stable.