Bug 448156

Summary: Genkernel creates broken initramfs image - wrong lvm links blocks booting to a LVM rootfs
Product: Gentoo Linux
Reporter: Daniel Rozsnyo <daniel>
Component: [OLD] Core system
Assignee: Gentoo Genkernel Maintainers <genkernel>
Status: RESOLVED FIXED
Severity: critical
CC: daniel, gentoo, whissi, zerochaos
Priority: Normal
Keywords: InVCS, PATCH
Version: unspecified
Hardware: AMD64
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: Combined patches for this bug, genkernel-lvm-to-sbin.patch

Description Daniel Rozsnyo 2012-12-22 09:33:23 UTC
Hi, after upgrading my system and building the kernel with genkernel 3.4.45, the server no longer boots and requires manual intervention.

The failure is caused by broken symlinks for the lvm commands: each link points to just "lvm", which does not match the true location of the lvm binary (it is located at /bin/lvm, while the pv/vg/lv links are in sbin).
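
To illustrate why such a link dangles, here is a minimal sketch that rebuilds the layout in a scratch directory. The stub file and the scratch paths are assumptions made for illustration; this is not the real initramfs.

```shell
#!/bin/sh
# Recreate the reported layout in a scratch directory (hypothetical paths).
root=$(mktemp -d)
mkdir -p "$root/bin" "$root/sbin"
printf '#!/bin/sh\necho lvm stub\n' > "$root/bin/lvm"
chmod +x "$root/bin/lvm"

# A relative target is resolved against the directory that holds the link,
# so sbin/vgscan -> lvm means sbin/lvm, which does not exist here.
ln -s lvm "$root/sbin/vgscan"

# Collect dangling symlinks: -L is true for a symlink, -e follows it.
broken=""
for link in "$root"/sbin/*; do
    if [ -L "$link" ] && [ ! -e "$link" ]; then
        broken="$broken ${link#"$root"/}"
    fi
done
echo "dangling:$broken"
```

The same loop could be pointed at an unpacked initramfs tree to list every link that would fail in the rescue shell.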


My system is:

# emerge genkernel busybox lvm2 -pv
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild   R    ] sys-apps/busybox-1.20.2  USE="ipv6 pam static -livecd -make-symlinks -math -mdev -savedconfig (-selinux) -sep-usr -systemd" 0 kB
[ebuild   R    ] sys-kernel/genkernel-3.4.45  USE="crypt -cryptsetup (-ibm) (-selinux)" 0 kB
[ebuild   R    ] sys-fs/lvm2-2.02.88  USE="lvm1 readline static static-libs (-clvm) (-cman) (-selinux)" 0 kB


And the xzdec/cpio unpack of initramfs shows:

ls -la bin/lvm sbin/{lv,pv,vg}*
-r-xr-xr-x 1 root root 1662240 Dec 22 10:19 bin/lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvchange -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvconvert -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvcreate -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvdisplay -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvextend -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvmchange -> lvm
-r-xr-xr-x 1 root root    6691 Dec 22 10:19 sbin/lvmconf
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvmdiskscan -> lvm
-r-xr-xr-x 1 root root    5990 Dec 22 10:19 sbin/lvmdump
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvmsadc -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvmsar -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvreduce -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvremove -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvrename -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvresize -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvs -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/lvscan -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvchange -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvck -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvcreate -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvdisplay -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvmove -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvremove -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvresize -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvs -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/pvscan -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgcfgbackup -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgcfgrestore -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgchange -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgck -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgconvert -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgcreate -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgdisplay -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgexport -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgextend -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgimport -> lvm
-r-xr-xr-x 1 root root   10352 Dec 22 10:19 sbin/vgimportclone
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgmerge -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgmknodes -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgreduce -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgremove -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgrename -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgs -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgscan -> lvm
lrwxrwxrwx 1 root root       3 Dec 22 10:19 sbin/vgsplit -> lvm


Booting is possible by typing "shell" at the "rootfs not found" prompt, creating the correct symlink and issuing the vgscan/vgchange commands.
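
The report does not say which symlink was created, so the following is only a sketch of one plausible repair, tried out in a scratch tree. The helper name repair_lvm_links and the choice of an sbin/lvm -> ../bin/lvm link are assumptions, not something genkernel provides.

```shell
#!/bin/sh
# repair_lvm_links ROOT: hypothetical helper, not part of genkernel.
# Adds ROOT/sbin/lvm -> ../bin/lvm so that sbin/<tool> -> lvm resolves.
repair_lvm_links() {
    root=$1
    [ -f "$root/bin/lvm" ] || return 1    # nothing to point at
    [ -e "$root/sbin/lvm" ] && return 0   # already resolvable
    ln -s ../bin/lvm "$root/sbin/lvm"
}

# Demonstration in a scratch tree that mimics the reported layout.
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/sbin"
printf '#!/bin/sh\necho lvm stub\n' > "$demo/bin/lvm"
chmod +x "$demo/bin/lvm"
ln -s lvm "$demo/sbin/vgscan"

repair_lvm_links "$demo"
# sbin/vgscan -> lvm -> ../bin/lvm now ends at the real file.
if [ -e "$demo/sbin/vgscan" ]; then
    echo "sbin/vgscan resolves"
fi
```

In the rescue shell itself the equivalent would presumably be a single link from /sbin/lvm to /bin/lvm, followed by the vgscan and vgchange calls mentioned above.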
Comment 1 Thomas Deutschmann (RETIRED) gentoo-dev 2013-01-04 02:16:40 UTC
Created attachment 334276 [details]
Combined patches for this bug

Hi,

first, I can confirm this bug. This is really annoying. I don't understand why this build could become stable. Anyway...

I attached a patch, which will fix multiple things:

1) The compilation of lvm2, when no local static lvm2 was detected (which was always the case, because of another bug), has changed. Binaries are now stored in /sbin instead of /bin. This patch will honor this change.

2) Local lvm binaries weren't used, because of an if condition, which always evaluated to FALSE. With this patch set, any local lvm.static or lvm binary will be checked (and used, when valid) like in previous versions. See #442078

3) Finally, this patch fixes the same problem as described in this bug report, but for dmsetup.
Comment 2 Xake 2013-01-04 17:07:04 UTC
Hmm, this is something I cannot reproduce with version 3.4.45.
This is for multiple reasons:

1. Those symlinks should not exist inside the ramfs, because
2. The Genkernel init scripts do not use those symlinks, but rather the equivalent of "/bin/lvm vgscan", which works just as well and does not clutter the ramfs with symlinks (yeah, we have some outdated error messages telling lies, but those will soon be gone).

So something seems fishy in those two aspects, as a newly created genkernel initramfs works fine for me with dolvm: it starts all LVs and boots.
It does not have those symlinks at all.

So could you please check your configuration to make sure that you did not break anything in the recent updates (many users did when we removed all default values for busybox and so on from genkernel.conf).

If this does not help, please answer the following questions:
1. What kind of setup do you use (i.e. LVM on RAID, and what kind)?
2. Does Genkernel even try to start the LVs?
   - If it works you will see something equivalent to
     >> Scanning for and activating Volume Groups
       2 logical volume(s) in volume group "Test-GK" now active
   - If you have an error message instead, please post it.
     Some of those messages are misleading, so please post.
   - If you do not have
     >> Scanning for and activating Volume Groups
     at all, then please check if you have dolvm
3. Post your genkernel.conf and genkernel.log
Comment 3 Xake 2013-01-04 17:22:21 UTC
(In reply to comment #1)
> Created attachment 334276 [details]
> Combined patches for this bug
> 

Please post separate patches; that makes it much easier to cherry-pick changes.

> Hi,
> 
> first, I can confirm this bug. This is really annoying. I don't understand why
> this build could become stable. Anyway...
> 

Probably because we, the developers, are not using a configuration that meets the criteria to experience this bug, and no one else testing those unstable upgrades does either. Or because we, since we follow the development, know what is going on and adjust our configurations accordingly, but forget to update the documentation.

> I attached a patch, which will fix multiple things:
> 
> 1) The compilation of lvm2, when no local static lvm2 was detected (which
> was always the case, because of another bug), has changed. Binaries are now
> stored in /sbin instead of /bin. This patch will honor this change.
> 

Which should be a non-problem since initrd.scripts uses /bin/lvm directly and never uses those symlinks. So the placement of lvm, vgscan and vgchange should not matter.


> 2) Local lvm binaries weren't used, because of an if condition, which always
> evaluated to FALSE. With this patch set, any local lvm.static or lvm binary
> will be checked (and used, when valid) like in previous versions. See #442078
> 
> 3) Finally, this patch fixes the same problem like described in this bug
> report for dmsetup.

Let's talk about this in bug #442078.
Comment 4 Thomas Deutschmann (RETIRED) gentoo-dev 2013-01-04 19:00:16 UTC
Hi,

this is weird. I wanted to prepare a step-by-step guide on how to reproduce the issue, but now I am unable to. Genkernel 3.4.45 out of the box is now working for me.

Well, I guess I dealt with multiple problems (the syslinux 5.00 issue and a 3.7 kernel problem with DM_LOG_USERSPACE) and lost the overview :/

The error was something like "vgchange not found" but I didn't write it down.

Anyway, that patch is still valid:

1) It makes sure that the static lvm and dmsetup binaries will stay in /sbin, as proposed by upstream. Side effect: Commands like "vgchange" will work in the rescue shell as expected, because the symlinks are valid.

2) This fix about the dmsetup call is "wrong". Well, not wrong in the sense of invalid, but /sbin/dmsetup will only be called when MULTIPATH is used. When MULTIPATH is used, gen_initramfs.sh will copy /sbin/dmsetup from the local system (BTW: Isn't that an inconsistency? You advocate using genkernel's own LVM2 build, because in the past there were some problems with the system's build, but here we are using it). But if it is not used, the missing dmsetup in /sbin is not a problem.

3) It will re-add support for the system's lvm2 build. But as said before and as you mentioned, this seems to be something which we don't want back.

Because I am not affected by this bug anymore, I would step back. Maybe Daniel is still able to reproduce it and can answer your questions.

And please accept my apology if I was too harsh in my criticism. I saw the update and suddenly the system wasn't working anymore... sorry about that. Now I feel ashamed, because I cannot reproduce this anymore.

Finally, should you pick up any of my patches? Not sure.

If we don't want to use local binaries, you shouldn't.

I could provide just the patch for the move to /sbin. So that the symlinks would be valid, when you are in the rescue shell and we stay in sync with upstream.
Comment 5 Xake 2013-01-04 19:11:11 UTC
(In reply to comment #4)
<snip>
> The error was something like "vgchange not found" but I didn't write it down.
> 

This is one of those un-updated error messages I was talking about. If /bin/lvm does not exist, then genkernel errors out with a message about vgscan/vgchange, because those were the commands it used before.

> Anyway, that patch is still valid:
> 
> 1) It makes sure that the static lvm and dmsetup binary will stay in /sbin,
> like it is proposed by upstream. Side effect: Commands like "vgchange" will
> work in rescue shell as expected, because the symlinks are valid.
> 

Well, there you are right, I will take care of that.

> 2) This fix about the dmsetup call is "wrong". Well, not wrong in the
> meaning of invalid, but /sbin/dmsetup will only be called when MULTIPATH is
> used. When MULTIPATH is used, gen_initramfs.sh will copy /sbin/dmsetup from
> the local system (BTW: Isn't that an inconsistency? You advocate to use
> genkernel's own LVM2 build, because in history there were some problems with
> system's build, but here we are using it). But if not used, the missing
> dmsetup in /sbin is not a problem.
> 
> 3) It will re-add support system's lvm2 build. But as said before and you
> mentioned, this seems to be something which we don't want back.
> 

I cannot really comment on this; Robin did the change and knows what broke to begin with. Please keep that discussion to that bug.

> Finally, should you pick up any of my patches? Not sure.
> 

I will consider pieces of them too, but adapt them and let Robin take care of stuff related to not using system LVM if it is OK with you.
Comment 6 Thomas Deutschmann (RETIRED) gentoo-dev 2013-01-15 00:52:36 UTC
(In reply to comment #5)
>
> <snip>
>
> I will consider pieces of them too, but adapt them and let Robin take care of
> stuff related to not using system LVM if it is OK with you.

Yep, I am fine with that.

I installed multiple systems in the meantime and was unable to reproduce the bug. So genkernel 3.4.45 is working fine for me.
Comment 7 Eriks Latosheks 2013-01-31 12:50:54 UTC
I was experiencing the LVM root mounting problem with genkernel 3.4.45 too.
The symptom is that the "dolvm" option does not detect LVM volumes; however, if you go into "shell" mode and type "lvm vgscan"/"lvm vgchange -ay" manually, everything is detected just fine.
I also found out that the /sbin/vg* symlinks are broken in the initramfs, but this does not seem to be the primary cause (as the scripts do indeed call /bin/lvm directly, not using the symlinks).

Another potential problem that I have found is that the genkernel scripts seem to run "lvm vgscan" only when the /etc/lvm/cache directory is present in the initramfs. Isn't this a bug? It seems like a strange condition to me.

Anyway, the problem was resolved for me after I enabled CONFIG_DEVTMPFS=y in the kernel configuration. The only reason I enabled it is that the latest udev was complaining that it is required to function. And magically genkernel started to detect LVM. IMO it would be great to report the missing configuration option when calling genkernel --lvm * initramfs, or just run "lvm vgscan" every time.

PS In short: I am using genkernel to build the initramfs only, but the kernel is built manually. If you do not enable CONFIG_DEVTMPFS=y, LVM root volumes are not detected properly, even if LVM itself seems to have no hard dependency on that option (i.e. LVM can be assembled manually).
Comment 8 Xake 2013-01-31 21:54:54 UTC
(In reply to comment #7)
> Another potential problem that I have found is that it seems that genkernel
> scripts run "lvm vgscan" only when /etc/lvm/cache directory in initramfs is
> present. Isn't this a bug? Seems like strange condition to me.
> 

Nowadays pvscan/vgscan/*any*scan only exists to update the device-mapper cache. If the cache does not exist, vgchange does the "scan" all by itself. So if the cache does not exist, then there is no cache to update (unless someone has had some fun and moved the cache; I need to investigate).
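
The logic described here can be sketched roughly as follows. This is not genkernel's actual code: the function name and the LVM_BIN and CACHE_DIR variables are assumptions, introduced only so the flow can be tried with a stub instead of a real lvm binary.

```shell
#!/bin/sh
# Hypothetical sketch of the activation flow; not taken from initrd.scripts.
# LVM_BIN and CACHE_DIR are made-up knobs so a stub can stand in for lvm.
LVM_BIN=${LVM_BIN:-/bin/lvm}
CACHE_DIR=${CACHE_DIR:-/etc/lvm/cache}

activate_volume_groups() {
    # vgscan is only needed to refresh a cache that already exists.
    if [ -d "$CACHE_DIR" ]; then
        "$LVM_BIN" vgscan || return 1
    fi
    # Without a cache, vgchange is expected to scan the devices itself.
    "$LVM_BIN" vgchange -ay
}
```

Nothing runs until activate_volume_groups is called, so the file can be sourced safely on a machine without LVM.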

> Anyway the problem resolved for me after I have enabled CONFIG_DEVTMPFS=y in
> kernel configuration. The only reason I enabled it is that the latest udev
> was complaining that it is required to function. And magically genkernel
> started to detect LVM. IMO it would be great to report missing configuration
> option when calling genkernel --lvm * initramfs, or just run "lvm vgscan"
> everytime. 
> 

Tbh this should work even when devtmpfs is not used, I thought robbat2 had added a call to mkdevnodes for that? Need to investigate...

> PS In short: I am using genkernel to build the initramfs only but kernel is
> built manually. If you do not enable CONFIG_DEVTMPFS=y LVMs root volumes are
> not detected properly even if LVM itself seems to have no hard dependency on
> that option (i.e. manually LVM can be assembled).

Well, your system will have problems without devtmpfs anyway. udev+devtmpfs has for a long time (say years, long before the systemd-integration mess) been the upstream preferred default. It is just that Gentoo has only now caught up.
The reason why LVM works anyway is to support embedded/legacy/static-dev systems. But last time I heard, that support was not crisp anyway.
Comment 9 Eriks Latosheks 2013-02-07 15:36:57 UTC
Xake,

OK, makes sense. Probably most users will enable that option anyway. My minor proposal is to maybe warn the user from genkernel if --lvm is used but CONFIG_DEVTMPFS is not set in the current kernel, if such a messaging facility is available in genkernel anyway.

Thank you.
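
A minimal sketch of what such a check could look like, assuming the kernel configuration is available as a plain text file. The function name warn_if_no_devtmpfs is hypothetical and not an existing genkernel function.

```shell
#!/bin/sh
# warn_if_no_devtmpfs CONFIG_FILE: hypothetical helper, not a genkernel function.
# Returns 0 if CONFIG_DEVTMPFS=y is set, 1 if it is not, 2 if the file
# cannot be read. Warnings go to stderr.
warn_if_no_devtmpfs() {
    config=$1
    if [ ! -r "$config" ]; then
        echo "warning: cannot read kernel config $config" >&2
        return 2
    fi
    # Anchored on both ends so CONFIG_DEVTMPFS_MOUNT does not match.
    if grep -q '^CONFIG_DEVTMPFS=y$' "$config"; then
        return 0
    fi
    echo "warning: --lvm requested but CONFIG_DEVTMPFS is not set in $config" >&2
    return 1
}
```

It could be called as warn_if_no_devtmpfs /usr/src/linux/.config before the initramfs is built, with the path being whatever kernel tree is actually in use.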
Comment 10 Xake 2013-02-07 21:59:54 UTC
(In reply to comment #9)
> Xake,
> 
> OK makes sense. Probably most users will enable that option anyway. My minor
> proposal is to maybe warn user from genkernel if --lvm is used but
> CONFIG_DEVTMPFS is not set in current kernel. If such messaging possibility
> is available in genkernel anyway.
> 
> Thank you.

The latest stable udev will shout loudly if you even try to merge it without devtmpfs configured for the kernel found at /usr/src/linux.
The same goes for SYSFS_DEPRECATED and a lot of other options that also break LVM and the like. So I really do not see why we should need to do that too.

If we enable devtmpfs in our default config, then that should be enough I say.
Comment 11 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-02-17 02:23:21 UTC
Created attachment 339112 [details, diff]
genkernel-lvm-to-sbin.patch

Let's handle one bug at a time, under the restrictions of the Description...

After applying the cherry picked patch I attached we recursively look for remaining bin/lvm (non-sbin) occurrences to make sure we dealt with all of them:

 $ grep -rP '(?<!s)bin/lvm'
gen_initramfs.sh:	if [ -x /sbin/lvm -o -x /bin/lvm ]

Only one left which is a condition that will continue to work.
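
For readers unfamiliar with the pattern, here is a small sketch of how the negative lookbehind behaves. Apart from the first line, the sample lines are made up for illustration, and -P requires a grep built with PCRE support (GNU grep usually is).

```shell
#!/bin/sh
# Sample input: the first line is the one reported above,
# the other two are made-up examples.
sample=$(mktemp)
cat > "$sample" <<'EOF'
if [ -x /sbin/lvm -o -x /bin/lvm ]
/sbin/lvm vgscan
/bin/lvm vgchange -ay
EOF

# (?<!s) is a negative lookbehind: "bin/lvm" matches only when the
# character right before it is not "s", so a line that contains
# nothing but /sbin/lvm is skipped.
hits=$(grep -cP '(?<!s)bin/lvm' "$sample")
echo "lines with a non-sbin bin/lvm: $hits"
rm -f "$sample"
```

The first sample line still counts as a hit because it contains /bin/lvm next to /sbin/lvm, which is why it shows up in the grep output above.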

But now the question is whether this should be applied or not; it might fix the possible cause, but does that fix the problem?

> indeed scripts do call /bin/lvm directly not using symlinks

This is interesting, but are we sure that all scripts do that? It could be the case that some use bin whereas others use sbin. If needed we can symlink between them as a temporary workaround (ugh...) while we get the scripts to use the right path.

> ... DEVTMPFS ...

Maybe we should first get a reproducible case of this bug with DEVTMPFS enabled to get a better idea whether this is relevant and how it affects this problem.
Comment 12 LamkaSlim 2013-02-25 12:59:51 UTC
I reproduced the issue where genkernel doesn't activate volume groups.
It happened on one of our servers, and after that I reproduced it in a virtual environment.
Our test setup is:
2 disks raid
/dev/md1 - boot, mirror, metadata 0.90
/dev/md3 - root, mirror, metadata 0.90
/dev/md5 - for lvm, mirror, metadata 1.2
I created a partition on /dev/md5 and marked it as LVM,
so /dev/md5p1 is the PV.
Volume group vg01 is based on /dev/md5p1 (PV), with separate usr and var LVs.
The important difference between the working setup and the failing one is where the PV is located: on /dev/md5 or on /dev/md5p1.

grub.conf
title=Gentoo Linux (3.6.11-gentoo-r3)
root (hd0,0)
kernel /boot/kernel-genkernel-x86_64-3.6.11-gentoo-r3 root=/dev/ram0 real_root=UUID=af6c03cc-1e47-4901-a712-fe4e7e13f907 domdadm dolvm
initrd /boot/initramfs-genkernel-x86_64-3.6.11-gentoo-r3

zcat /proc/config.gz | grep DEVTMPFS
CONFIG_DEVTMPFS=y
# CONFIG_DEVTMPFS_MOUNT is not set

emerge -1av genkernel lvm2
[ebuild   R    ] sys-kernel/genkernel-3.4.45  USE="crypt -cryptsetup (-ibm) (-selinux)" 0 kB
[ebuild   R    ] sys-fs/lvm2-2.02.97-r1  USE="lvm1 readline thin udev (-clvm) (-cman) (-selinux) -static -static-libs" 0 kB

Genkernel log:
Activating mdev
...
mdadm: /dev/md1 has been started with 2 drivers
mdadm: /dev/md2 has been started with 2 drivers
mdadm: /dev/md/5 has been started with 2 drivers
Scaning for and activating Volume Groups
 No volume groups found (?!)
 No volume groups found (?!)

Using initramfs shell:
# lvm vgdisplay
 No volume groups found
# lvm vgscan
 Reading all physical volumes. This may take a while...
 Found volume group "vg01" using metadata type lvm2
# lvm vgchange -a y
 2 logical volume(s) in volume group "vg01" now active
# lvm vgdisplay
 --- Volume group ---
 VG Name             vg01
 ...etc...
As you can see, the volume group is now found and I can boot without any problems.
Comment 13 Xake 2013-02-25 15:59:36 UTC
(In reply to comment #12)

This is the kind of information I like!
This probably has to do with partitions on RAID; my first guess is that something in our code initiates the partitions, but that happens AFTER lvm has made its rounds.
As soon as my dev machine is running again I will set up a test VM to see if I can reproduce it.

I see I have missed some development and patches here; I will take a look at them at the same time.
Comment 14 Eriks Latosheks 2013-02-25 16:10:15 UTC
I was using raid partitions too, on /dev/md126p2.
Comment 15 Doug Goldstein (RETIRED) gentoo-dev 2015-12-20 22:09:08 UTC
genkernel initramfs images still put all the lvm utilities into /sbin and the actual lvm binary into /bin which results in a bunch of broken symlinks. When you have to drop to a recovery shell this is a fun one to track down. I've got this issue with genkernel 3.4.52.2
Comment 16 Rick Farina (Zero_Chaos) gentoo-dev 2016-01-05 18:53:55 UTC
For at least this second, TomWij's suggested fix is in git and being actively tested. The patch was independently recreated by myself and ryao, so I am pretty confident that it's correct.
Comment 17 Rick Farina (Zero_Chaos) gentoo-dev 2016-01-05 23:05:44 UTC
After the original fix, and then several more commits by robbat2, this is all set.