Bug 237930 - sys-fs/lvm2: problems during init in baselayout-1 because of failing to lock
Summary: sys-fs/lvm2: problems during init in baselayout-1 because of failing to lock
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High normal
Assignee: Robin Johnson
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-09-17 13:00 UTC by Michael Hammer (RETIRED)
Modified: 2008-09-18 10:25 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
Snapshot (snapshot.jpg,87.44 KB, image/jpeg)
2008-09-18 08:53 UTC, Michael Hammer (RETIRED)

Description Michael Hammer (RETIRED) gentoo-dev 2008-09-17 13:00:07 UTC
I am using / on top of lvm. Until now my config worked pretty well but now init fails during execution of /etc/init.d/fsck because lvm doesn't work. In busybox the (pv|vg|lv)scan commands fail because of the following error:

Locking type 1 initialisation failed.

That points me to the fact that /var/lock/lvm is not writable, which is of course because / is mounted ro at this point. But that has always been the case and always will be.

Did somebody change the lvm locking mechanism? Is that a regression? Did I miss something or do we have a bug here?

Thx so far, g, 

Mueli

P.S.: The problem is fixed if I use baselayout-2, which is still unstable, so the problem should also be fixed in baselayout-1, IMHO.
Comment 1 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-09-17 18:55:08 UTC
My laptop uses / on LVM, and worked perfectly with both baselayout1 and baselayout2. 

The upstream changelog doesn't list any changes in locking that would affect this.

We specifically make all the LVM calls in lvm-start.sh with --ignorelockingfailure, so what you see is a warning, NOT an error.

I find it very weird that your problem goes away if you use baselayout2, because the only difference between BL1 and BL2 is that lvm-start.sh is called from an init.d script instead of the baselayout1 internals.

Could you please provide more context on the error, including which of the pvscan/vgscan/vgchange calls it's coming from? You might want to instrument "/lib/rcscripts/addons/lvm-start.sh" with some echo output.
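A minimal sketch of the kind of echo instrumentation suggested above. The helper name run_logged and the "lvm-debug:" prefix are hypothetical, and the exact calls and arguments inside lvm-start.sh are not shown in this report, so the wrapped command in the comment is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper, not part of the real lvm-start.sh.
# It prints the command, runs it, then prints the exit status,
# so the failing LVM call can be identified from the boot output.
run_logged() {
    echo "lvm-debug: running: $*"
    "$@"
    status=$?
    echo "lvm-debug: exit status ${status}: $*"
    return "${status}"
}

# Illustrative use inside the script (kept as a comment so the sketch
# can be sourced on a machine without LVM installed):
# run_logged pvscan --ignorelockingfailure
```

Wrapping each pvscan/vgscan/vgchange call this way would show which one emits the locking message and whether it actually returns a non-zero status.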
Comment 2 Doug Goldstein (RETIRED) gentoo-dev 2008-09-17 20:02:58 UTC
Interesting... wrt bl1 vs bl2. I'm at a loss as to how/why this could be happening. Please provide the output of "rc-status boot".

I'll add a little bit more to the discussion... The reason we call things with --ignorelockingfailure is that at this point the file system is mounted ro and nothing else can be happening with the filesystem(s). The locking is primarily there to ensure that no multi-initializations occur (i.e. initializing a snapshot/mirror while the main bits are not available). These situations can occur once the system is booted and in a rw state, so we leave the configs intact to protect against them if you're tinkering with your LVM once your system is up. Once your system is alive, /var/lock/lvm is the appropriate place for it to use as a locking file. But during boot, there's one thing happening... your system is coming up. Nothing else.
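To make the point about the read-only root concrete, here is a small sketch that checks whether the lock directory can be written to. The function name lock_dir_writable is hypothetical; only the path /var/lock/lvm comes from this report. It is a plain shell check, not how LVM itself tests its lock directory.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given path is an existing
# directory that the current user can write to.
lock_dir_writable() {
    [ -d "$1" ] && [ -w "$1" ]
}

# During early boot with / mounted ro, this is expected to take the
# "not writable" branch; on a fully booted rw system, the other one.
if lock_dir_writable /var/lock/lvm; then
    echo "/var/lock/lvm is writable, file locking can work"
else
    echo "/var/lock/lvm is not writable, file locking would fail"
fi
```

Run from the busybox shell right after the failure, a check like this would confirm whether the locking message is simply the expected consequence of the ro mount.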
Comment 3 Michael Hammer (RETIRED) gentoo-dev 2008-09-18 06:21:57 UTC
Please excuse the lag, but I have to create a new test environment, as this problem occurred on the installation at my institute. So I was forced to fix it fast without any real possibility of investigating the problem. I am trying to reproduce the problem in a virtual machine and will give you some logs as soon as I have them.

thx for the fast response, g

mueli
Comment 4 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2008-09-18 07:38:22 UTC
Could you please describe your setup in the meantime?
1. emerge --info
2. What's the disk setup? single device, software RAID, fake hardware RAID (promise, highpoint), real RAID (3ware, SCSI etc).
3. What kernel config?
4. did you use genkernel, or create your own initramfs?
5. you said the fsck failed. Did it even start?
6. from the busybox environment right after lvm has failed, is there anything odd in dmesg?
Comment 5 Michael Hammer (RETIRED) gentoo-dev 2008-09-18 08:53:43 UTC
Created attachment 165704
Snapshot
Comment 6 Michael Hammer (RETIRED) gentoo-dev 2008-09-18 09:17:14 UTC
Ok - now I've a problem. I am not able to reproduce the problem in the virtual machine. I've unmerged lvm2 and restarted and then created a snapshot of the situation I had yesterday.

Maybe it's an amd64 issue, which I can't test in VirtualBox as my CPU has no HW virt, but I don't think so.

One more thing came to mind that may have happened. If so, I really have to apologize for the inconvenience. I have switched to a shared make.conf, /etc/portage/* and distfiles config here for all my machines. Perhaps I first upgraded baselayout-1 to 2 and then switched to the shared config afterwards (where baselayout was not unmasked), which forced a switch back to baselayout-1. That would explain why the upgrade to baselayout-2 fixed my problem.

I'll give it a try ...
Comment 7 Michael Hammer (RETIRED) gentoo-dev 2008-09-18 10:25:48 UTC
I am so sorry, that's it.

I downgraded baselayout-2 to baselayout-1 during an "emerge -puvND world" because of the switched package.keywords. This causes the weird error "Locking type 1 initialisation failed.", and that's the first error that occurs. I interpreted the error as an LVM bug, which in fact it isn't.

Please excuse the noise I made. Once again the golden rule is proven: "Never trust the user!" ;)

Thx, mueli
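
A quick way to catch an unintended baselayout downgrade like the one described above is to look at what is actually installed. The sketch below is an assumption-laden illustration: the helper name installed_versions is hypothetical, and it assumes the Portage package database lives in the usual /var/db/pkg location with one directory per installed package version.

```shell
#!/bin/sh
# Hypothetical helper: print the installed versions of a package by
# listing its directories in the Portage package database.
# Usage: installed_versions CATEGORY NAME [VDB_ROOT]
installed_versions() {
    category=$1
    name=$2
    vdb=${3:-/var/db/pkg}   # assumed default location of the database
    for dir in "${vdb}/${category}/${name}"-[0-9]*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "${dir}" ] || continue
        basename "${dir}"
    done
}

# Prints nothing if the package is not installed or the path differs.
installed_versions sys-apps baselayout
```

Comparing this output before and after an "emerge -uvND world" run would show whether a changed package.keywords has pulled baselayout back to the 1.x series.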