Summary: | LVM not starting properly with baselayout-1.11.8 | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Harris Landgarten <harrisl> |
Component: | [OLD] baselayout | Assignee: | Greg Kroah-Hartman (RETIRED) <gregkh> |
Status: | RESOLVED REMIND | ||
Severity: | normal | CC: | kamensky.fb, kay.sievers |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
hotplug environment logger
hotplug log |
Description
Harris Landgarten
2004-12-22 19:58:43 UTC
adding calls to 'sleep' is not a valid fix do you use udev ? if so, what version ? Yes I am using udev-050 if you edit /sbin/rc and change this if statement: if [ "$(udev_version)" -ge "48" ] ; then read like this: if [ "$(udev_version)" -ge "500" ] ; then and then reboot, does everything work out ? The change to /sbin/rc does not fix the problem. On a hunch, I downgraded to UDEV-045 and the problem stopped. I now believe this problem is related to bug 74786. In addition to fixing the LVM problem it also solved a problem where net.eth0 was showing started before it received an IP address from dhcpcd causing ntp-client to fail to start. I appears that udevsend is broken is more ways than one and is the cause of various race conditions on boot. Any chance to try a 2.6.10 kernel? We've had a bug in 2.6.9 which causes holes hotplug sequence numbers: http://linus.bkbits.net:8080/linux-2.5/cset@41798d05sEo-J9zrpeXvVYgctOLw1w I was already running 2.6.10-gentoo Created attachment 46975 [details]
hotplug environment logger
Ah, ok. Any chance to reproduce this while recording the events with the simple
attached script?
Where should I put the script? Sorry, forgot that: /etc/hotplug.d/default/00-log.hotplug Created attachment 46978 [details]
hotplug log
The first time I rebooted with the log script LVM started properly but ntp-client did not. I rebooted and LVM hung at the maintenance point. After running vgchange -ay and exiting booting completed and this time ntp-client started. hotplug.log is attached. Seems you didn't use /sbin/udevsend as the hotplug helper as there is no MANAGED_EVENT=1 key in your log. I rebooted again and it hung again. I checked /dev/.hotplug.log from the maintenance point and it showed --------924-------- 1104169448 add /module/dm_mod SUBSYSTEM=module DEVPATH=/module/dm_mod PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ HOME=/ SHLVL=2 SEQNUM=924 --------924-------- --------925-------- 1104169448 add /module/dm_mod/sections SUBSYSTEM=module DEVPATH=/module/dm_mod/sections PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ HOME=/ SHLVL=2 SEQNUM=925 --------925-------- --------926-------- 1104169448 add /class/misc/device-mapper SUBSYSTEM=misc DEVPATH=/class/misc/device-mapper PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ HOME=/ SHLVL=2 SEQNUM=926 --------926-------- Downgrading to udev-045 definitely solved the problem so what else could it be? Don't know what is causing the problem here. It's definitely not the udevsend problem. We need to narrow the possibilities. Do the rules change when you up/downgrade udev? You may try the 045 udev with the 050 rules, or the other way around. To begin with, I wanted to verify my race condition prop. I added a sleep 5 to checkfs before the lvm section. Rebooting failed. I changed the sleep 5 to a sleep 10 and rebooting suceeded many times. This proves that checkfs is running before /dev/mapper/control can be established. This only started happening with baselayout-1.11.8 and udev > 045. I doubt it is rule based because udev-050 works fine with baselayout-1.11.7-r2. Something changed with these updates that slowed down the post modprobe processing of dm_mod using udev. Whatever is causing this may be the cause of other problems. Aside from that, checkfs needs a more robust method than testing for the existence of /dev/mapper/control to handle when dm_mod is in process but not finished. BTW, should I post the net.eth0 problems as a separate bug? > I changed the sleep 5 to a sleep 10 and rebooting suceeded many times.
You can see in your logs that the kernel emits the events with 20 seconds
time difference:
--------926--------
1104169448 add /class/misc/device-mapper
--------927--------
1104169468 add /block/dm-0
Any idea why this happens?
What is triggering the creation of the dm-0 device at bootup?
I captured the hotplug_d.log and ls -l /dev/mapper before the LVM stuff in checkfs which is right after dm_mod is modprobed hotplug_d.log contains: --------924-------- 1104206320 add /module/dm_mod SUBSYSTEM=module DEVPATH=/module/dm_mod PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ HOME=/ SHLVL=2 SEQNUM=924 --------924-------- --------925-------- 1104206321 add /module/dm_mod/sections SUBSYSTEM=module DEVPATH=/module/dm_mod/sections PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ HOME=/ SHLVL=2 SEQNUM=925 --------925-------- --------926-------- 1104206321 add /class/misc/device-mapper SUBSYSTEM=misc DEVPATH=/class/misc/device-mapper PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ HOME=/ SHLVL=2 SEQNUM=926 --------926-------- ls -l /dev/mapper ls: /dev/mapper: No such file or directory At the maintenance point /dev/.hotplug_d.log has not changed but ls -l /dev/mapper gives: crw-rw---- 1 root root 10, 61 Dec 27 22:58 control dm-0 etc are produced after vgchange -ay runs which explains the delay since it is being input manually. It seems that it is simple case of /dev/mapper/control not appearing in time for checkfs and I assume this is the responibility of udev in response to dm_mod loading. It looks like event 926 triggers this but why is it taking over 5s? Could it have anything to do with the fact that it is the last event? Is udev waiting for something else? /etc/hotplug.d/default/10-udev.hotplug is symlinked to /sbin/udevsend. Doesn't this mean that udevsend is being used as a helper agent for all hotplug events? If not udevsend -> udevd, what is processing kernel event 926 and creating /dev/mapper/control? > It seems that it is simple case of /dev/mapper/control not appearing > in time for checkfs and I assume this is the responibility of udev > in response to dm_mod loading. Yes, udev creates the node with the event. > /etc/hotplug.d/default/10-udev.hotplug is symlinked to /sbin/udevsend. > Doesn't this mean that udevsend is being used as a helper agent for all > hotplug events? If not udevsend -> udevd, what is processing kernel > event 926 and creating /dev/mapper/control? Yes, the unpredictability of events by forked helper preocesses makes it necessary for udev to reorder the events. It's that way from the early days of udev on. In the future, we want to take over the _whole_ hotplug event with udevsend by letting the kernel fork /sbin/udevsend instead of /sbin/hotplug, but that is disabled on your box, according to your logs. > Could it have anything to do with the fact that it is the last event? > Is udev waiting for something else? Not that I can think of anything. > It looks like event 926 triggers this but why is it taking over 5s? Good question. Please put the same script from /hotplug.d/ in: /etc/dev.d/default/00-log.dev and let it write to another file. This way we get the time udev has created the node. Sorry for the delay. My experimental machine was busy recovering a failed stripe 0 for a customer. The following was recorded at the maintenance point. From hotplug_d.log --------926-------- 1104519006 add /class/misc/device-mapper SUBSYSTEM=misc DEVPATH=/class/misc/device-mapper PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ HOME=/ SHLVL=2 SEQNUM=926 --------926-------- From dev_d.log --------926-------- 1104519016 add /class/misc/device-mapper SUBSYSTEM=misc DEVPATH=/class/misc/device-mapper PATH=/sbin:/bin:/usr/sbin:/usr/bin ACTION=add _=/bin/env PWD=/ MANAGED_EVENT=1 DEVNAME=/dev/mapper/control HOME=/ SHLVL=2 SEQNUM=926 --------926-------- As you can see it is taking 10s for udev to respond to the 926 request. BTW, how do I configure the system so that udevsend takes over all hotplug activity? Happy New Year! > As you can see it is taking 10s for udev to respond to the 926 request. That's bad. How does the complete sequence look like? Are more udev events delayed like this one? Or is it only this single one. I can't reproduce this on any of my boxes, both events happen in the same second after loading the module. > how do I configure the system so that udevsend takes over all hotplug activity? By echo "/sbin/udevsend > /proc/sys/kernel/hotplug". This way the kernel notifies udevd and the event is handled by the forked udev instance after the node creation and waiting for sysfs to be ready. But we should find the error first in your current setup. I looked at the .dev_d.log file at the maintance point. At this point hotplug_d.log only has 3 entries as show above. dev_d.log was a 256K file but the 926 entry was the only one with a sequence number and was the last entry in the file. The only other issue I have is that net.eth0 is shown as started before it receives an ip from dhcpcd on boot. An ifconfig immediately after root signon sometimes shows eth0 without an IP address with subsequent tries showing the IP address. I don't know if this is somehow related but it is baselayout-1.11.8 issue I think we should put some debugging statements into udevsend so that we can see what is causing the delay. udevsend is probably timing out of the code that starts on line 179. It is also possible that udev is slow in starting. Would 926 be the sequency that intially causes udevsend to load udevd? Harris Just an idea: You may try to remove the udevd kill from /sbin/rc:91: killall -q udevd This is bad. It will delay all further events for 10 seconds on a current udev. I've made a patch to prevent the initial delay, so killing it should work sometimes in the future, but doing it today isn't a good idea: http://bugs.gentoo.org/attachment.cgi?id=46753&action=view Maybe your network problems are caused by the changes according to: http://bugs.gentoo.org/show_bug.cgi?id=73327 I tried inserting a usb stick to see if there were any delays in udev creating a dev for it. Both .hotplug_d.log and .dev_d.log show the same time stamp in processing this event so it seems like all is working as it should once things get started and settle down. Since I have a totally reproducable boot related bug on a new 2004.3 installation I would like to systematically track it down. We know that there is a 10s delay in creating /dev/mapper/control. Assuming this delay is happening in the loop that starts at line 179 of udevsend.c (my first guess), could you create a version of that file with some debugging output that will show the status of that loop and whether it is waiting on udevd or something else is preventing its return. > seems like all is working as it should once things get started and settle down. Yes, seems so. Loading all basic modules _before_ the run of udevstart should solve all these kinds of problems too. That's the way other distributions are doing this and we don't get reports from them. > Since I have a totally reproducable boot related bug on a new 2004.3 installation > I would like to systematically track it down. We know that there is a 10s delay in > creating /dev/mapper/control. Did you try the patch, that reduces the startup delay? http://bugs.gentoo.org/attachment.cgi?id=46753&action=view > Assuming this delay is happening in the loop that starts at line 179 of > udevsend.c (my first guess), That's very unlikely. I don't expect any problem here. > could you create a version of that file with some > debugging output that will show the status of that loop and whether it is waiting > on udevd or something else is preventing its return. Just pass a "DEBUG=true" to the make command and udev will print that to syslog. I applied the udevd-initial-timout-fix-01.patch to udev-050 but it did not make any difference. I am still getting a 10s delay. Hmm, no idea. Any chance to pass DEBUG=true to the make command. It will write a lot of debug to syslog. Will DEBUG=true emerge udev do what you want. Also remember /var is not mounted when the error occurs. > Will DEBUG=true emerge udev do what you want. Sorry, I don't know anything about gentoo. > Also remember /var is not mounted when the error occurs. Ah, I see. It should print on the console then, but it may be too much log to follow it, don't know... Apart from the possible udev bug, that we looking for and it would be nice to find, gentoo should consider loading the dm modules before it runs udevstart. I just added a lvm home to my notebook and now it has the same problem. Another data point: I'm also seeing the problem after upgrading to baselayout-1.11.11-r3, but this isn't a ~x86 system: udev-045 lvm2-2.00.33-r1 device-mapper-1.00.19-r2 gentoo-sources-2.6.11-r6 Possibly just coincidental: when I boot, I also get /dev/usb created as a character device rather than a directory, so apcupsd fails until I recreate it as a directory and reattach my UPS. (Just like <https://www.redhat.com/archives/fedora-list/2005-January/msg07909.html>.) reopen this bug if this is still an issue. This should be fixed with newer baselayouts and newer lvm2 packages |