Bug 324185 - sys-fs/lvm2-2.02.67-r2 fails to start
Summary: sys-fs/lvm2-2.02.67-r2 fails to start
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High critical
Assignee: Robin Johnson
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-15 19:42 UTC by Juergen Rose
Modified: 2010-08-20 17:32 UTC
CC: 3 users

See Also:
Package list:
Runtime testing required: ---



Description Juergen Rose 2010-06-15 19:42:50 UTC
After installing sys-fs/lvm2-2.02.67-r2 my system stops booting. The last messages I see are:

* Autoloaded 8 module(s)
* Setting up the Logical Volume Manager...

If I increase the log level to 3 in /etc/lvm/lvm.conf (a config sketch follows below), I see messages like:

mlock area unreadable '7....  libncurses.so.5.7':  Skipping
...
mlock area unreadable '7...   libc-2.11.2.so': Skipping.
...
mlock default filter '/libreadline.so' matches '7f...' : Skipping
mlock area unreadable '7f...':   Skipping
...
Getting device info for vg-usr [LVM ...]
dm info ..
...
Creating vg-usr
...
Loading vg-usr ...
...
Resuming vg-usr (...)
  Udev cookie  ... created
  Udev cookie ... incremented
  Udev cookie ... incremented
  Udev cookie ... assigned to dm_task type 5 with flag 0x0
  dm resume ...
  vg-usr:  Stacking NODE_READ_AHEAD 256 (flags=1)
  Udev cookie ... decremented
  Udev cookie ... : Waiting for zero

The system waits there for several tens of minutes, then I get similar messages for vg-var, and the system hangs again for some ten minutes.
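For reference, raising the log level comes down to something like this in /etc/lvm/lvm.conf (a sketch; I am assuming the level setting in the log section, which writes debug output to the configured log file, 7 being the most verbose):

log {
    file = "/var/log/lvm2.log"    # target for level-based debug output (assumed path)
    level = 3                     # debug level 0..7
}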
 

Reproducible: Always
Comment 1 Juergen Rose 2010-06-15 19:48:48 UTC
If I boot with the install-amd64-minimal-20100408 CD, I have no problem starting the LVM system manually:

mdadm -As
mount /dev/md126 /mnt/gentoo
mount /dev/md125 /mnt/gentoo/boot

pvscan
lvscan

vgchange -a y

mount /dev/vg/usr /mnt/gentoo/usr
...
Comment 2 Jason C 2010-06-16 12:02:52 UTC
I noticed this problem with mdadm-3.1.2; downgrading to 3.1.1-r1 resolved the issue.
Comment 3 Juergen Rose 2010-06-16 13:52:39 UTC
Hi Jason,

Thanks for the hint. Perhaps I should add some more information to my bug report from yesterday. Yesterday my system crashed during 'emerge -uvND world'. Neither switching to a console nor an ssh login was possible, so I had to do a hard reboot. During the hard reboot the system stopped while trying to start lvm, as described yesterday. So I had to boot from another medium (the Fedora-13 LiveCD or the install-amd64-minimal-20100408 CD). I tried both; the Fedora-13 LiveCD seems to be dangerous: until yesterday I had the raid devices md0, md1 and md2, but after booting with Fedora-13 I have md125, md126 and md127. Then I had to repair the raid md127 (958 GB, more than four hours) and the filesystem, add mdraid to /etc/runlevels/boot, and add the raid devices to /etc/mdadm.conf (which was not necessary in the past). But the error remained. So I booted again from the Gentoo install CD, assembled the raid and lvm system manually (roughly the steps sketched below), chrooted to /mnt/gentoo and completed 'emerge -uvDN world'. During 'emerge -uvDN world' udisks-1.0.1 was installed; perhaps this is relevant. I followed your hint and downgraded mdadm to version 3.1.1-r1.
The next reboot was successful. Then I upgraded mdadm back to version 3.1.2, and I could also boot with this version.
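For the record, the manual assembly amounts to roughly this (a sketch: md126/md125 and /dev/vg/usr are the device names from comment 1, and the proc/dev mounts are the usual chroot preparation):

mdadm -As                            # assemble all arrays listed in /etc/mdadm.conf
mount /dev/md126 /mnt/gentoo
mount /dev/md125 /mnt/gentoo/boot
vgchange -a y                        # activate all volume groups
mount /dev/vg/usr /mnt/gentoo/usr
mount -t proc proc /mnt/gentoo/proc
mount --rbind /dev /mnt/gentoo/dev
chroot /mnt/gentoo /bin/bash
emerge -uvDN world                   # finish the interrupted update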
Comment 4 Juergen Rose 2010-06-22 18:16:33 UTC
Now I have the problem on a second computer. The last lines I see during boot are:

 /sys/devices/virtual/block/md1  7584
 /sys/devices/virtual/block/md2  7585
 /sys/devices/virtual/block/md1  7586
* Mounting /dev/pts
* Mounting /dev/shm
* Setting system clock using the hardware clock [UTC]
* loading module lp
* loading module cpufreq_user...
* loading module cpufreq_powersave
* loading module dvb-core
* loading module saa7134
* loading module saa7134-alsa
* loading module saa7134_dvb
* Autoloading 7 modules
* Starting up RAID devices
mdadm: /dev/md/3 has been started with 3 drives
* ----  starting md3 -----
* ERROR: md3 failed to start
* Setting up the Logical Volume Manager ...


Then I see heavy hard-disk activity (at least the HD LED is blinking), but nothing seems to happen, for as long as I waited.

If I boot again with the install-amd64-minimal-20100408 CD and do 

mdadm -As

the names of my raid devices have changed:

livecd ~ # ll /dev/md*
brw-rw---- 1 root disk 9, 123 Jun 23 00:37 /dev/md123
brw-rw---- 1 root disk 9, 124 Jun 23 00:37 /dev/md124
brw-rw---- 1 root disk 9, 125 Jun 23 00:37 /dev/md125
brw-rw---- 1 root disk 9, 126 Jun 23 00:37 /dev/md126
brw-rw---- 1 root disk 9, 127 Jun 23 00:37 /dev/md127

/dev/md:
total 0
lrwxrwxrwx 1 root root 8 Jun 23 00:37 0_0 -> ../md123
lrwxrwxrwx 1 root root 8 Jun 23 00:37 123 -> ../md123
lrwxrwxrwx 1 root root 8 Jun 23 00:37 124 -> ../md124
lrwxrwxrwx 1 root root 8 Jun 23 00:37 125 -> ../md125
lrwxrwxrwx 1 root root 8 Jun 23 00:37 126 -> ../md126
lrwxrwxrwx 1 root root 8 Jun 23 00:37 127 -> ../md127
lrwxrwxrwx 1 root root 8 Jun 23 00:37 1_0 -> ../md125
lrwxrwxrwx 1 root root 8 Jun 23 00:37 1_1 -> ../md124
lrwxrwxrwx 1 root root 8 Jun 23 00:37 2_0 -> ../md126
lrwxrwxrwx 1 root root 8 Jun 23 00:37 grizzly:3 -> ../md127
livecd ~ # mdadm -V
mdadm - v3.0 - 2nd June 2009

I identified /dev/md125 as my root device and mounted it at /mnt/gentoo. If I grep /mnt/gentoo/etc/fstab for raid devices, I find:

livecd ~ # grep md /mnt/gentoo/etc/fstab
/dev/md0 /boot         ext2  noauto,noatime     1 1
/dev/md1 /             ext3  noatime            0 0
/dev/md3 /home_grizzly ext4  noatime,user_xattr 1 2

In /mnt/gentoo/etc/mdadm.conf I find:
livecd ~ # grep "^ARRAY" /mnt/gentoo/etc/mdadm.conf 
ARRAY /dev/md/0_0 metadata=0.90 UUID=66edb608:3c66a3a3:f9ae6609:e0599e49
ARRAY /dev/md/2_0 metadata=0.90 UUID=280c0b21:27d1b9b5:ad62a584:f5a3b59c
ARRAY /dev/md/1_0 metadata=0.90 UUID=8b7f774b:26608217:52db585b:9e733037
ARRAY /dev/md/3 metadata=1.01 name=grizzly:3 UUID=b4e68642:1cf870e1:f60ad64d:daeef4e0

livecd ~ # ll /mnt/gentoo/etc/mdadm.conf 
-rw-r--r-- 1 root root 2990 Jun  6 08:59 /mnt/gentoo/etc/mdadm.conf

I can't remember changing /etc/mdadm.conf two weeks ago. Can it be that some update changed my /etc/mdadm.conf?
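If these ARRAY lines are what renamed things, one way to pin the old names would be to name the arrays explicitly (a sketch; I am assuming 0_0/1_0/2_0 correspond to the old md0/md1/md2, and a fresh set of lines can be generated with mdadm --detail --scan):

ARRAY /dev/md0 metadata=0.90 UUID=66edb608:3c66a3a3:f9ae6609:e0599e49
ARRAY /dev/md1 metadata=0.90 UUID=8b7f774b:26608217:52db585b:9e733037
ARRAY /dev/md2 metadata=0.90 UUID=280c0b21:27d1b9b5:ad62a584:f5a3b59c
ARRAY /dev/md3 metadata=1.01 name=grizzly:3 UUID=b4e68642:1cf870e1:f60ad64d:daeef4e0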

Under /mnt/gentoo/dev I still find the old device names:

livecd ~ # ll /mnt/gentoo/dev/md*
lrwxrwxrwx 1 root root    4 Jun 30  2005 /mnt/gentoo/dev/md0 -> md/0
lrwxrwxrwx 1 root root    4 May  1  2006 /mnt/gentoo/dev/md1 -> md/1
lrwxrwxrwx 1 root root    4 May  1  2006 /mnt/gentoo/dev/md2 -> md/2
lrwxrwxrwx 1 root root    4 May  1  2006 /mnt/gentoo/dev/md3 -> md/3
lrwxrwxrwx 1 root root    4 May  1  2006 /mnt/gentoo/dev/md4 -> md/4
lrwxrwxrwx 1 root root    4 May  1  2006 /mnt/gentoo/dev/md5 -> md/5
lrwxrwxrwx 1 root root    4 May  1  2006 /mnt/gentoo/dev/md6 -> md/6

/mnt/gentoo/dev/md:
total 0
brw-rw---- 1 root disk 9, 0 Apr 30  2006 0
brw-rw---- 1 root disk 9, 1 Jun 29  2005 1
brw-rw---- 1 root disk 9, 2 Jun 29  2005 2
brw-rw---- 1 root disk 9, 3 Jun 29  2005 3
brw-rw---- 1 root disk 9, 4 Jun 29  2005 4
brw-rw---- 1 root disk 9, 5 Jun 29  2005 5
brw-rw---- 1 root disk 9, 6 Jun 29  2005 6

So for me the question "who changed the names of the raid devices?" remains.
And the next question is: "Is there an easy way to solve the problem?", or is the only way to boot from an alternative boot medium and edit the entries in /etc/fstab and /etc/lilo.conf or /boot/grub/grub.conf?
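One workaround that survives any renaming would be to mount by filesystem UUID instead of device name (a sketch; <fs-uuid> is a placeholder for whatever blkid reports, and note this is the filesystem UUID, not the md array UUID):

livecd ~ # blkid /dev/md125
/dev/md125: UUID="<fs-uuid>" TYPE="ext3"

# /etc/fstab, root entry by UUID instead of /dev/md1:
UUID=<fs-uuid>  /  ext3  noatime  0 0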

 



Comment 5 Juergen Rose 2010-06-22 18:21:24 UTC
One additional piece of information:

livecd ~ # grep mdadm /mnt/gentoo/var/log/emerge.log | tail
1264379699:  ::: completed emerge (4 of 25) sys-fs/mdadm-3.1.1-r1 to /
1275631697:  >>> emerge (3 of 26) sys-fs/mdadm-3.1.2 to /
1275631700:  === (3 of 26) Cleaning (sys-fs/mdadm-3.1.2::/usr/portage_grizzly/sys-fs/mdadm/mdadm-3.1.2.ebuild)
1275631700:  === (3 of 26) Compiling/Merging (sys-fs/mdadm-3.1.2::/usr/portage_grizzly/sys-fs/mdadm/mdadm-3.1.2.ebuild)
1275631745:  === (3 of 26) Merging (sys-fs/mdadm-3.1.2::/usr/portage_grizzly/sys-fs/mdadm/mdadm-3.1.2.ebuild)
1275631758:  >>> AUTOCLEAN: sys-fs/mdadm:0
1275631758:  === Unmerging... (sys-fs/mdadm-3.1.1-r1)
1275631760:  >>> unmerge success: sys-fs/mdadm-3.1.1-r1
1275631761:  === (3 of 26) Post-Build Cleaning (sys-fs/mdadm-3.1.2::/usr/portage_grizzly/sys-fs/mdadm/mdadm-3.1.2.ebuild)
1275631761:  ::: completed emerge (3 of 26) sys-fs/mdadm-3.1.2 to /

I.e., mdadm-3.1.2 was emerged on Jun 4th. Did mdadm-3.1.2 rename the raid devices?
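For reference, the epoch timestamps in emerge.log can be converted with date, e.g.:

livecd ~ # date -u -d @1275631697
Fri Jun  4 06:08:17 UTC 2010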
Comment 6 Steffen Schaumburg 2010-08-05 20:44:46 UTC
I ran into this bug as well, and it is not fixed.
Jürgen, can you please re-open the bug? Since I'm neither the reporter nor a Gentoo dev I do not have the access rights to do so. Downgrading is merely a workaround, at least as long as the newer version is still in the tree.
I would also suggest changing the summary to this: sys-fs/mdadm-3.1.2 causes LVM2 to fail on boot

Could you also raise the severity, since the bug causes boot failure and may even lead people to mistakenly think that their RAID arrays have been destroyed/damaged, which in turn may lead them to actually destroy the arrays.

Here's how it happened to me:
- I booted my system with a 2.6.34 kernel and updated to mdadm-3.1.2
- I reshaped/grew my two main RAIDs (one from a 3-drive RAID5 to a 4-drive RAID5, the other from a 2-drive RAID1 to a 3-drive RAID5), which each have an LVM PV on top of them; see the sketch after this list. These two RAID-LVMs use mdadm superblock 1.01. Additionally I have two "plain" RAIDs (a 3-drive RAID1 for /boot and a 3-drive RAID5 for /). These two "plain" RAIDs use mdadm superblock 0.90; I have not (yet) reshaped/grown them.
- I logged out of KDE and back in
- during the loading of my many autostarted programs the system froze up (no response to Ctrl+Alt+F1, Ctrl+Alt+Del or Ctrl+Alt+Backspace)
- I did a manual power off by holding the power button
- I booted again
- It shows many error messages like "/sys/devices/virtual/block/md1  7586" and then proceeds to hang on "Setting up Logical Volume Manager" (I waited just a few minutes once, then about 10 minutes, finally over 20 minutes). It is possible to abort the LVM setup and drop to a shell. It is also possible (but not very useful, since my /usr is on one of the main RAIDs that use LVM) to boot into recovery mode with busybox.
- I tried this with vanilla-sources-2.6.35-rc6 and gentoo-sources-2.6.34 (-r0), the latter being a known-good kernel for my system (well, except for the original crash after the KDE login mentioned above).
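A sketch of the kind of reshape I mean (device and partition names are hypothetical; mdadm 3.1+ can change RAID levels online):

# grow a 3-drive RAID5 to 4 drives
mdadm --add /dev/md5 /dev/sde1
mdadm --grow /dev/md5 --raid-devices=4

# convert a 2-drive RAID1 to a 3-drive RAID5
mdadm --grow /dev/md6 --level=5
mdadm --add /dev/md6 /dev/sdf1
mdadm --grow /dev/md6 --raid-devices=3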

At this point I had sys-fs/lvm2-2.02.67-r2 and sys-fs/mdadm-3.1.2.
All drives are SATA, connected to the mainboard's SATA ports 1, 3, 4 and 5. It might be worth noting that during BIOS POST the mainboard only shows the drives up to port 4. Nothing is connected to SATA ports 2 and 6, and on PATA I merely have a DVDRW drive. I use sys-apps/baselayout-2.0.1.

From busybox I removed LVM from the boot runlevel. I then did _not_ boot the system again, but instead booted a system rescue CD v1.5.1 (amd64 kernel, Gentoo-based). The boot CD recognised and "activated" (not sure if that's the right word) all 4 of my RAIDs and recognised the LVMs. I mounted all the relevant partitions, chrooted, re-added LVM to the boot runlevel and changed the software as follows:
DOWNgrade to sys-fs/mdadm-3.1.1-r1 and UPgrade to sys-fs/lvm2-2.02.70.
Rebooting into vanilla-sources-2.6.35-rc6 now worked fine.
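In OpenRC/Portage terms the runlevel and package changes come down to something like this (a sketch; I am assuming the init script is named lvm, and -1 is emerge's --oneshot):

rc-update del lvm boot                # done from busybox before the rescue boot
# ... then, inside the chroot on the rescue CD:
rc-update add lvm boot
emerge -1 '=sys-fs/mdadm-3.1.1-r1'    # the downgrade
emerge -1 '=sys-fs/lvm2-2.02.70'      # the upgrade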

I believe the fact that the first errors occurred before LVM was even loaded clearly indicates that mdadm is at fault here, but I'm happy to help confirm this if necessary.

I would suggest considering hard-masking mdadm-3.1.2, given the non-negligible chance that people may destroy their data by misinterpreting the problem.

If there's any testing I can do or further information that may be helpful please just ask. I hope I wasn't too verbose...
Thanks, Steffen
Comment 7 Steffen Schaumburg 2010-08-05 20:46:39 UTC
Oh, and I do not have any version of udisks installed.
Comment 8 Juergen Rose 2010-08-06 08:46:30 UTC
Reopened due to comment #6 from Steffen Schaumburg.
Comment 9 Steffen Schaumburg 2010-08-20 16:28:37 UTC
Ok, whilst I can confirm that the bug occurs with mdadm-3.1.2 even if the system has shut down properly, mdadm-3.1.3 works fine. So this can be closed now :)
Comment 10 Robin Johnson (gentoo-dev) 2010-08-20 17:32:46 UTC
Closing INVALID per comment #9: it was mdadm-3.1.2 that was broken.