Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 150113 - udev doesn't wait long enough to process events
Summary: udev doesn't wait long enough to process events
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout (show other bugs)
Hardware: All Linux
: High critical (vote)
Assignee: Greg Kroah-Hartman (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: udev-meta
  Show dependency tree
 
Reported: 2006-10-04 14:45 UTC by Karl-Johan Karlsson
Modified: 2007-01-15 10:53 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Karl-Johan Karlsson 2006-10-04 14:45:11 UTC
/lib/rcscripts/addons/udev-start.sh from sys-apps/baselayout-1.12.5-r2 waits for udev to process all bootup events by calling

   /sbin/udevsettle --timeout=60

However, 60 seconds isn't nearly enough time for this to complete on my 32-CPU SGI Origin 2000, making the boot process die shortly afterwards trying to fsck not-yet-existing /dev/md/* devices. Removing "--timeout=60" makes my machine boot again.

Since the machine actually became slower when I expanded it from 16 to 32 CPU:s I suspect there are kernel misconfigurations or bugs behind it taking this long, but dying horribly on boot because of a timeout that's too short shouldn't happen anyway.

Portage 2.1.2_pre2 (default-linux/mips/2006.1/ip27/o32/nptl, gcc-4.1.1, glibc-2.4-r4, 2.6.17.10-mipsgit-20060618 mips64)
=================================================================
System uname: 2.6.17.10-mipsgit-20060618 mips64 R12000 V2.3  FPU V0.0
Gentoo Base System version 1.12.5
Last Sync: Wed, 04 Oct 2006 01:50:01 +0000
ccache version 2.4 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: [Not Present]
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.4-r2
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.18.1
sys-devel/autoconf:  2.13, 2.60
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.91.0.2
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.16.16
ACCEPT_KEYWORDS="mips ~mips"
AUTOCLEAN="yes"
CBUILD="mips-unknown-linux-gnu"
CFLAGS="-O2 -pipe -mips4 -mabi=32 -fweb -ftracer -frename-registers -ggdb"
CHOST="mips-unknown-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -pipe -mips4 -mabi=32 -fweb -ftracer -frename-registers -ggdb"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--alphabetical"
FEATURES="autoconfig buildpkg ccache distlocks fixpackages maketest metadata-transfer nostrip parallel-fetch sfperms strict userpriv"
GENTOO_MIRRORS="http://ftp.sunet.se/pub/Linux/distributions/gentoo http://mirror.gentoo.se http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
INSTALL_MASK="/var/mail /pkg"
LDFLAGS="-Wl,-O1"
LINGUAS="en en_US sv sv_SE ja ja_JP de de_DE fr fr_FR"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/opt/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X bash-completion berkdb bitmap-fonts cli cups dlloader doc elibc_glibc emacs fortran gdbm gtk gtk2 idn input_devices_keyboard input_devices_mouse ip27 isdnlog jpeg kde kerberos kernel_linux libwww linguas_de linguas_de_DE linguas_en linguas_en_US linguas_fr linguas_fr_FR linguas_ja linguas_ja_JP linguas_sv linguas_sv_SE logrotate mips ncurses nls nptl nptlonly pam pcre pdf perl png pppd python qt3 readline reflection sdl session spl ssl svg tcpd tiff truetype truetype-fonts type1-fonts unicode userland_GNU video_cards_impact xorg"
Unset:  CTARGET, LANG, LC_ALL, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Greg Kroah-Hartman (RETIRED) gentoo-dev 2006-10-05 13:27:27 UTC
So what kind of timeout value should we use here?
Comment 2 Karl-Johan Karlsson 2006-10-06 09:37:47 UTC
Is there a reason for having a timeout at all? Couldn't any missed event potentially make booting impossible? If so, successful booting requires us to wait until all events have been processed.

Or are there situations where new events come in continuously and it's better to continue and drop into an emergency shell for debugging when the boot process fails at some later stage, instead of waiting forever for the events to stop coming?
Comment 3 Greg Kroah-Hartman (RETIRED) gentoo-dev 2006-10-24 14:37:08 UTC
Can you try the 103 release?

And yes, it is good to break out eventually so we don't hang forever.  If
103 doesn't fix this, I'll bump up the timeout value some more for bigger machines.
Comment 4 Kevin Parent 2006-12-05 11:53:38 UTC
This issue affected my system as well when upgrading from ck-sources-2.6.17-ck1-r2 to ck-sources-2.6.18-ck1-r2.  I'm using the latest stable release of udev for amd64 which is udev-087-r1.

/etc/init.d/checkroot runs before udev-start has completed populating /dev.  The raid device node for my root partition (/dev/md/2) isn't created before /etc/init.d/checkroot is started resulting in an fsck error, prompting password for maintenance mode.  This happens 19 out of 20 boots or so.

I determined this fact by pausing checkroot, editing /etc/init.d/checkroot from

--- snip ---
start () {
         local retval=0
--- end snip---

to:

---snip---
start () {
         /bin/date
         sleep 3s
         /bin/date
         local retval=0
--- end snip ---

The output showed that checkroot started at 10:23:23, paused, and resumed at 10:23:26, with a udevd-event error between the timestamps.  My system booted successfully with the pause.

--- snip ---
Press I to enter interactive boot mode

* Mounting proc at /proc...						[ OK ]
* Mounting sysfs at /sys...						[ OK ]
* Mounting /dev for udev...						[ OK ]
* Seeding /dev with needed nodes...					[ OK ]
* Setting up proper hotplug agent...
*   Using netlink for hotplug events...					[ OK ]
* Starting udevd...							[ OK ]
* Populating /dev with existing devices through uevents...		[ OK ]
* Letting udev process events...					[ OK ]
* Finalizing udev configuration...					[ OK ]
* Mounting devpts at /dev/pts						[ OK ]
Tues Dec 5 10:23:23 CST 2006
udevd-event[1216]: find_free_number: %e is depreciated, will be removed and is unlikely to work correctly.  Don't use it.
Tues Dec 5 10:23:26 CST 2006
* Checking root filesystem...
/dev/md2: clean, 487402/131070 files, 2074132/2621248 blocks		[ OK ]
* Remounting root filesystem read/write...				[ OK ]
--- end snip ---

Listing /dev/md shows that /dev/md/2 was created about 3 seconds later than when checkroot started:

# ls -lRtr --full-time /dev/md
/dev/md:
total 0
brw-rw---- 1 root disk 9, 0 2006-12-05 10:23:23.890344415 -0600 0
brw-rw---- 1 root disk 9, 1 2006-12-05 10:23:23.925339095 -0600 1
brw-rw---- 1 root disk 9, 2 2006-12-05 10:23:26.565937663 -0600 2
brw-rw---- 1 root disk 9, 3 2006-12-05 10:23:27.812748119 -0600 3

The /dev/md2 and md3 links are actually created before their /dev/md/* nodes while /dev/md0 and md1 links are made at EXACTLY the same time as their nodes:

# ls -lRtr --full-time /dev/md*
lrwxrwxrwx 1 root root   4 2006-12-05 10:23:23.846351103 -0600 /dev/md2 -> md/2
lrwxrwxrwx 1 root root   4 2006-12-05 10:23:23.890344415 -0600 /dev/md0 -> md/0
lrwxrwxrwx 1 root root   4 2006-12-05 10:23:23.925339095 -0600 /dev/md1 -> md/1
lrwxrwxrwx 1 root root   4 2006-12-05 10:23:24.064317967 -0600 /dev/md3 -> md/3

Shouldn't the nodes be created before the links?

This issue is probably already known since /lib/rcscripts/addons/udev-start.sh contains the following (udev-087-r1) in the populate_udev function:

--- snip ---
        # loop until everything is finished
        # there's gotta be a better way...
        ebegin "Letting udev process events"
        loop=0
        while test -d /dev/.udev/queue; do
                sleep 0.1;
                test "$loop" -gt 300 && break
                loop=$(($loop + 1))
        done
        #einfo "loop = $loop"
        eend 0
--- end snip ---

and that loop is replaced with the udevsettle command in latter versions of udev.

udev-start.sh should complete/exit successfully before checkroot runs.  Additionally, it would be nice if the checkroot init script did a sanity check for the root fs device node before it runs, outputting a more informative/correct error message or pausing until the root fs device node is created.

# emerge --info
Portage 2.1.1-r2 (default-linux/amd64/2006.1/desktop, gcc-4.1.1, glibc-2.4-r4, 2.6.18-ck1-r2 x86_64)
=================================================================
System uname: 2.6.18-ck1-r2 x86_64 AMD Athlon(tm) 64 Processor 3500+
Gentoo Base System version 1.12.6
Last Sync: Tue, 05 Dec 2006 10:50:01 +0000
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: 1.3.7, 2.0.30
dev-lang/python:     2.3.5-r2, 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.60
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r4
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=k8 -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.datapipe.net/gentoo http://gentoo.chem.wisc.edu/gentoo/"
LANG="en_US"
LC_ALL="en_US"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.namerica.gentoo.org/gentoo-portage"
USE="amd64 X alsa apache2 berkdb bitmap-fonts bzip2 cairo cdparanoia cdr cli cracklib crypt cups dga dlloader dri eds elibc_glibc emboss encode esd evo fam ffmpeg flac foomaticdb fortran frontendonly gd gdbm gif glut gmp gnome gpm gstreamer gtk gtk2 gtkhtml iconv idn imap imlib input_devices_keyboard input_devices_mouse irda irmc isdnlog ithreads java jpeg kde kernel_linux lcms ldap libg++ logrotate lzw mad mbox mikmod mng mp3 mpeg mysql mythtv ncurses nls nptl nptlonly nsplugin nvidia oav offensive ogg oggvorbis opengl oss pam pcre pda pdf perl png posix ppds pppd pthreads python qt3 qt4 quicktime readline reflection samba sdl session slang spell spl ssl tcpd tiff truetype truetype-fonts type1-fonts udev unsupported_8bit usb userland_GNU video_cards_nvidia vorbis xine xml xorg xpm xv xvmc zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 5 Greg Kroah-Hartman (RETIRED) gentoo-dev 2006-12-06 14:55:51 UTC
(In reply to comment #4)
>
> * Mounting devpts at /dev/pts                                           [ OK ]
> Tues Dec 5 10:23:23 CST 2006
> udevd-event[1216]: find_free_number: %e is depreciated, will be removed and is
> unlikely to work correctly.  Don't use it.

Please update your rules, they are incorrect.  See if that fixes anything or not.
Comment 6 Kevin Parent 2006-12-06 19:51:18 UTC
(In reply to comment #5)
> (In reply to comment #4)
> >
> > * Mounting devpts at /dev/pts                                           [ OK ]
> > Tues Dec 5 10:23:23 CST 2006
> > udevd-event[1216]: find_free_number: %e is depreciated, will be removed and is
> > unlikely to work correctly.  Don't use it.
> 
> Please update your rules, they are incorrect.  See if that fixes anything or
> not.
> 

I did.  No help, except the %e error message is gone.  Doesn't happen with ck-sources-2.6.17-ck1-r2 or gentoo-sources-2.6.18-r3.

I was also having an issue with an ext3 error "mounted in future, fixed" as listed in bug 142850.  I added "clock" to the critial services so it would run before checkroot.  Apparently that gives udev enough time to create my md2 root node and boots successfully.

I'd upgrade to udev-103 but its listed as unstable for amd64.  Took me long enough to figure this one out.
Comment 7 Jakub Moc (RETIRED) gentoo-dev 2007-01-15 10:53:36 UTC
Get back to us once you've tested udev-103 as already requested a couple of times.