Bug 198892 - boot w/ mdadm fails to find device nodes and also flunks multiple devices
Bug#: 198892
Product: Gentoo Hosted Projects
Version: unspecified
Platform: All
OS/Version: Linux
Status: RESOLVED
Severity: major
Priority: P2
Resolution: FIXED
Assigned To: genkernel@gentoo.org
Reported By: robbat2@gentoo.org
Component: genkernel
Opened: 2007-11-12 06:03 0000
|
Booting my G5 with domdadm. I have 3 MD devices: md0 = /boot, md1 = /, md2 =
LVM.
>> Detected real_root as a md device. Setting up the device node...
mdadm: error opening /dev/md/0: No such file or directory
mdadm: failed to open array
mdadm: error opening /dev/md/1: No such file or directory
mdadm: failed to open array
mdadm: error opening /dev/md/2: No such file or directory
mdadm: failed to open array
>> ...
I changed /etc/mdadm.conf in the initramfs to use the /dev/mdX names instead
of /dev/md/X, and then found that it only starts the first device listed in
mdadm.conf.
The /etc/mdadm.conf was generated with the busybox mdadm in the first place, so
I guess it fails because the /dev/md/ directory doesn't exist.
OK. How do I fix it, then?
This seems to be a bug in the busybox mdadm patch.
The function "int mdassemble_main()" expects the "/dev/md" string (without the
trailing /). This function is called to assemble the arrays.
In contrast, the function "brief_examine_super1()" writes "ARRAY /dev/md/".
This function is invoked to examine the devices, and its output is what ends
up in /etc/mdadm.conf.
I tried deleting the slash in "brief_examine_super1()", and it now works for
me: the array is assembled.
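A minimal sketch of the name mismatch described above, assuming the assembler only accepts a name made of the "/dev/md" prefix followed directly by a decimal number. The helper name_matches_md_prefix() is hypothetical and is not the busybox code; it only illustrates why "/dev/md/0" (as written by "brief_examine_super1()") is rejected while "/dev/md0" is accepted.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical check modelled on the behaviour described in this bug:
 * accept only "/dev/md" immediately followed by a digit. */
static int name_matches_md_prefix(const char *name)
{
    static const char prefix[] = "/dev/md";
    size_t len = sizeof(prefix) - 1;   /* 7, without the trailing NUL */

    assert(name != NULL);

    if (strncmp(name, prefix, len) != 0)
        return 0;
    /* "/dev/md/0" fails here: the character after the prefix is '/'. */
    return isdigit((unsigned char)name[len]) != 0;
}
```

With this check, name_matches_md_prefix("/dev/md/0") returns 0 and name_matches_md_prefix("/dev/md0") returns 1, which is why dropping the slash on the writer side makes both functions agree.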
There seems to be another bug in the function "int mdassemble_main()". The if
condition for creating the node should be based on errno instead of the return
value of strtoul(). That function returns an unsigned long (the minor number),
not an error condition.
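A sketch of the errno-based check suggested above. strtoul() returns the converted value, so a return of 0 is a perfectly valid minor (md0); overflow is reported through errno (ERANGE), and "no digits at all" is detected by comparing the end pointer with the input. The helper parse_md_minor() and its return convention are hypothetical, not taken from the busybox patch.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the minor number that follows "/dev/md".
 * Returns 0 on success and stores the value in *minor, or -1 if the
 * text is not a plain decimal number. */
static int parse_md_minor(const char *digits, unsigned long *minor)
{
    char *end;
    unsigned long value;

    assert(digits != NULL && minor != NULL);

    /* strtoul() would accept leading spaces and a sign; reject those. */
    if (!isdigit((unsigned char)digits[0]))
        return -1;

    errno = 0;                    /* strtoul() only sets errno on error */
    value = strtoul(digits, &end, 10);

    if (errno != 0)               /* e.g. ERANGE on overflow */
        return -1;
    if (*end != '\0')             /* trailing junk such as "3x" */
        return -1;

    *minor = value;               /* 0 is valid: it means md0 */
    return 0;
}
```

Testing the return value of strtoul() instead would treat md0 as a failure, which matches the symptom of the node-creation condition being wrong.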
I want to test with more than one array to verify things, but it seems pretty
clear. The patch to mdadm is trivial, but I can provide it anyway.
Patch attached.
I have tested it by creating up to 3 different arrays, and all of them are
started OK. I have found problems when the metadata version is 1.1 or 1.2 (I
haven't tried the old 0.9 version). Only version 1.0 arrays are started.
BTW, sys-fs/mdadm was also affected by a bug similar to this (it was fixed in
release 2.6.3, though). Perhaps both bugs are related.
I'll try it in 12 hours or so. I have 0.9, 1.0 and 1.1 metadata arrays.
Some mdadm/kernel versions have a glitch that causes them to behave weirdly
with metadata 1.0 arrays as well.
OK, it generates the mdadm.conf correctly now. 0.9 metadata works fine; v1.1 is
detected but not assembled: 'no devices found'.
Thank you for testing, Robin ;-).
You confirm that we still have some bugs with versions 1.1 and 1.2, while
versions 0.9 and 1.0 are working OK. I will have a look at busybox mdadm and
try to propose a patch. I expect to do it this week.
BTW, could you be more explicit about that bug regarding some mdadm/kernel
versions?
To reproduce the main mdadm/kernel bug, boot one of the pre-release 2007.1
media (I ran into the bug on the ppc64 media that I was testing for rangerpb)
and try to create a 1.0 array. It will be created, but mdadm will fail to
start it. If you then try to assemble it manually using the sysfs interface,
you'll get a backtrace in the kernel, and in some cases the box hangs.
That's actually why my md3 is v1.1. I prefer 1.0 and 0.9, since their
superblocks are at the end of the device, so they don't block unmodified
booting of a RAID1 device.
I was using real disks, and offhand I'm not sure whether it also happens with
loop devices (it should, though).
Superblock locations (see "man md" for the exact details):
0.9 - 4K superblock in a 64K-aligned block starting 64-128K before the end of the device.
1.0 - superblock located 8-12K before the end of the device.
1.1 - at the start of the device.
1.2 - starting 4K from the start of the device.
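A sketch of those locations as byte offsets from the start of a member device of dev_size bytes. The arithmetic follows my reading of md(4) (round down to 64K and step back 64K for 0.9; step back 8K and round down to 4K for 1.0); the enum and function names are hypothetical and this is not mdadm's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the four metadata versions discussed here. */
enum md_meta { MD_META_0_90, MD_META_1_0, MD_META_1_1, MD_META_1_2 };

/* Approximate byte offset of the md superblock on a member device. */
static uint64_t md_superblock_offset(enum md_meta meta, uint64_t dev_size)
{
    const uint64_t k64 = 64 * 1024, k8 = 8 * 1024, k4 = 4 * 1024;

    switch (meta) {
    case MD_META_0_90:
        /* Round down to a 64K multiple, then step back 64K:
         * the superblock starts 64K..128K before the end. */
        assert(dev_size >= k64);
        return (dev_size & ~(k64 - 1)) - k64;
    case MD_META_1_0:
        /* Step back 8K, then round down to a 4K boundary:
         * the superblock starts 8K..12K before the end. */
        assert(dev_size >= k8);
        return (dev_size - k8) & ~(k4 - 1);
    case MD_META_1_1:
        return 0;            /* very start of the device */
    case MD_META_1_2:
        return k4;           /* 4K from the start */
    }
    return 0;
}
```

For 0.9 and 1.0 the first bytes of a RAID1 member are left alone, so a bootloader can read the member as a plain filesystem; 1.1 and 1.2 put the superblock where that filesystem would otherwise begin.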
I've gone ahead and added this. I'd rather we support *some* of the versions
than none.
This should be in 3.4.9_pre10, which will be hitting the tree soon.
Seems reasonable to me. We only gain functionality, since it doesn't work now.
A fully working version supporting all 4 metadata types could be released in
another genkernel revision.
Just my opinion. I would like to hear from Robin about this.
We're only moving forward by including it, so no objections from me.
Getting there in small steps helps when doing it all in one big step isn't
an option.
Attached a new patch against sys-kernel/genkernel-3.4.9 to support all 4
metadata versions (i.e., 0.9, 1.0, 1.1 and 1.2).
The patch is based on some changes from full mdadm-2.6.2 to mdadm-2.6.3 (so
credit should go to Neil Brown).
I have tested it reasonably well and think it should work fine.
OK, I added the mdadm3 patch to SVN, so it'll show up in 3.4.10_pre2...