Bug 99663

Summary: using baselayout-1.12.0_pre1-r1 after a service fails to start some other services freeze when trying to start them
Product: Gentoo Linux Reporter: Petteri Räty (RETIRED) <betelgeuse>
Component: [OLD] baselayout Assignee: Gentoo's Team for Core System packages <base-system>
Status: RESOLVED FIXED    
Severity: blocker CC: chris, derek.berube, dsdale24, g1gsw, gentoo-bugs, hynek, ian, jay.phelps, kwach, matrixhax0r, nbkolchin, patrick, radek, Rainmaker526, uberlord
Priority: High Keywords: InVCS
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: /etc/conf.d/rc
/etc/conf.d/net
runscript.sh
rc-services.sh
rc-services.sh
runscript.sh

Description Petteri Räty (RETIRED) gentoo-dev 2005-07-20 06:16:13 UTC
Before updating to 1.12.0_pre1-r1 I had parallel startup on. After updating I
thought it would be better to turn off parallel startup for the first boot just
to be sure. This led to problems. Services start up until sshd and then nothing
happens. I can ssh in, but clearly some services are not running. Trying to start
/etc/init.d/{portmap,netmount} with RC_VERBOSE="yes" doesn't give any output and
nothing happens. After switching back to parallel startup I have no problems.

Reproducible: Always
Steps to Reproduce:
1. update to the latest baselayout with parallel startup on
2. turn parallel off
3. reboot (Be sure to have some way to rescue yourself)
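
Step 2 can be sketched as a small shell helper. This is only an illustration: the function name set_parallel_startup is hypothetical, and it assumes the setting is a plain RC_PARALLEL_STARTUP="..." line in the config file you pass in (presumably /etc/conf.d/rc, as attached to this bug).

```shell
# Hypothetical helper: set RC_PARALLEL_STARTUP in an rc config file.
# $1 = path to the config file, $2 = "yes" or "no".
set_parallel_startup() {
    conf=$1
    value=$2
    if grep -q '^RC_PARALLEL_STARTUP=' "$conf"; then
        # Rewrite the existing assignment via a temporary file.
        sed "s/^RC_PARALLEL_STARTUP=.*/RC_PARALLEL_STARTUP=\"$value\"/" "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        # No assignment present yet: append one.
        printf 'RC_PARALLEL_STARTUP="%s"\n' "$value" >> "$conf"
    fi
}
```

Trying it on a copy of the file first, for example set_parallel_startup /tmp/rc.copy no, avoids touching the live configuration before you have a rescue path.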




I talked with UberLord on IRC and we thought this could also have something
to do with the network, but changing the value of RC_NET_STRICT_CHECKING does
not affect the problem.
Comment 1 Petteri Räty (RETIRED) gentoo-dev 2005-07-20 06:16:36 UTC
pena netbeans1 # emerge info
Portage 2.0.51.22-r2 (default-linux/x86/2005.0, gcc-3.4.4, glibc-2.3.5-r0,
2.6.12-gentoo-r6 i686)
=================================================================
System uname: 2.6.12-gentoo-r6 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz
Gentoo Base System version
distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [enabled]
dev-lang/python:     2.4.1-r1
sys-apps/sandbox:    1.2.11
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.18-r1
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O3 -march=pentium4 -pipe -mfpmath=sse -ffast-math -fomit-frame-pointer"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env
/usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config
/usr/lib/X11/xkb /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/init.d /etc/splash /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -march=pentium4 -pipe -mfpmath=sse -ffast-math -fomit-frame-pointer"
DISTDIR="/usr/src/distfiles"
FEATURES="autoconfig ccache cvs distlocks fixpackages noauto sandbox sfperms strict"
GENTOO_MIRRORS=" http://trumpetti.atm.tut.fi/gentoo 
http://lame.lut.fi/linux/gentoo "
LANG="en_US.utf8"
LC_ALL="en_US.utf-8"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/overlays/betelgeuse
/usr/local/overlays/gentoo-java-experimental /usr/local/overlays/gentopia"
SYNC="rsync://aria/portage"
USE="x86 X aac acl acpi alsa apm audiofile avi bash-completion berkdb
bitmap-fonts browserplugin bzip2 bzlib cdb cddb cdparanoia cdr crypt cups curl
divx4linux dts dvd dvdr dvdread emboss esd fam flac foomaticdb freetype gcj gif
gnome gstreamer gtk gtk2 hal imagemagick java jpeg kde kdeenablefinal kdexdeltas
libg++ libwww logitech-mouse lzo mad makecheck mikmod mjpeg mmx mmx2 mp3 mpeg
ncurses network nptl nptlonly nvidia offensive ogg oggvorbis opengl pam pdflib
png python qt quicktime readline real rtc ruby samba slp spell sse sse2 ssl
subversion svg symlink tcpd theora tiff truetype truetype-fonts type1-fonts
unicode usb userlocales vorbis win32codecs xine xml xml2 xv xvid zlib
video_cards_nvidia userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LDFLAGS, LINGUAS, MAKEOPTS
Comment 2 Petteri Räty (RETIRED) gentoo-dev 2005-07-20 06:17:12 UTC
Created attachment 63879 [details]
/etc/conf.d/rc
Comment 3 Petteri Räty (RETIRED) gentoo-dev 2005-07-20 06:17:35 UTC
Created attachment 63880 [details]
/etc/conf.d/net
Comment 4 Petteri Räty (RETIRED) gentoo-dev 2005-07-20 06:20:50 UTC
I just need to change RC_PARALLEL_STARTUP to no and it doesn't work any more. It
is a bit weird that kdm starts so early with that turned off. Previously xdm has
always been the last service to start. Now it comes before sshd. I think there
might be some problems with switching from parallel to normal.
Comment 5 Petteri Räty (RETIRED) gentoo-dev 2005-07-20 07:45:09 UTC
It seems that this is caused by ntp-client failing to start. It is not a daemon
but just sets the clock from an NTP server. For some reason my local server
doesn't work. After changing it to pool.ntp.org I had no problems starting for
example netmount and hald. I removed it from default runlevel and now I have no
problems booting. 
Comment 6 Darren Dale 2005-07-20 07:48:35 UTC
When I rebooted my laptop, I got the report that cardmanager was watching one
socket, and then the system froze. I also noticed that services were not
starting.  I didn't try setting RC_PARALLEL_STARTUP to yes, I just booted from a
live cd and downgraded my baselayout to the previous version.

this bug seems related, http://bugs.gentoo.org/show_bug.cgi?id=98745: "init
scripts fail with our new baselayout version because we now do some tricks with
start-stop-daemon to ensure daemons start and stop correctly."

Maybe 1.12.0_pre1-r1 should be masked until these boot problems get sorted, what
do you think?
Comment 7 Petteri Räty (RETIRED) gentoo-dev 2005-07-20 08:03:18 UTC
looking at the ntp-client start script it seems that it does not use
start-stop-daemon so this is probably not related
Comment 8 Jörg Gollnick 2005-07-20 14:20:52 UTC
On my systems it hangs at different points.
Until we have a better solution I vote for masking baselayout-1.12.0_pre1-r1.
(see also 99691) 
 
Comment 9 Jay Phelps 2005-07-20 21:00:30 UTC
I tried setting RC_PARALLEL_STARTUP to 'yes' and while this got me farther along
it still hung on my system.  I rolled back baselayout for now but I'm willing to
roll forward again to test once this gets nailed down to a cause/potential fix.
Comment 10 Petteri Räty (RETIRED) gentoo-dev 2005-07-20 22:51:59 UTC
You can press ctrl+c several times to get to a console. If you can ssh in from the
outside you can use rc-status to find out which services get stuck. Here it
showed "starting" for those services. You can also look into
/var/lib/init.d/exclusive/ to see which services get stuck. Thanks to UberLord,
who gave me instructions on how to find the cause of the problem.
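
The directory check above can be sketched as a shell function. This is a rough illustration only: the name list_in_progress is hypothetical, and it assumes each entry in that directory is named after a service that is still in progress.

```shell
# Print the name of every entry in an "exclusive" state directory.
# $1 = directory to inspect; defaults to the path mentioned above.
list_in_progress() {
    dir=${1:-/var/lib/init.d/exclusive}
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty or missing;
        # -L also accepts symlinks whose target no longer exists.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        basename "$entry"
    done
}
```

Running list_in_progress with no argument over ssh while the boot is hung should then print one name per line, which can be compared with what rc-status reports.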
Comment 11 Hynek Schlawack 2005-07-21 00:11:21 UTC
I'm also voting for masking: in addition to hanging, it also breaks the
startup of rp-pppoe (seems like lock-file issues). I've masked it here for
myself.
Comment 12 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 00:33:12 UTC
Weird, works fine here .. I assume it freezes with the stuff from the default
runlevel?  If so, could you guys post a 'ls /etc/runlevels/default', and also
try to remove ntp-client as suggested?
Comment 13 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 01:04:43 UTC
*** Bug 99672 has been marked as a duplicate of this bug. ***
Comment 14 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 01:44:03 UTC
For those with order problems, add this to /etc/init.d/test:

-----
#!/sbin/runscript

start() {
        trace_dependencies $(cd /etc/runlevels/default; ls)

}
-----

and chmod +x it, and then start it, and add the output here.
Comment 15 Petteri Räty (RETIRED) gentoo-dev 2005-07-21 01:51:14 UTC
pena mozilla-thunderbird # ls /etc/runlevels/default/
acpid      cupsd  distccd     famd   hald   metalog   netmount    ntpd  xdm
alsasound  dbus   domainname  fcron  local  net.eth0  ntp-client  sshd  xfs
Comment 16 Petteri Räty (RETIRED) gentoo-dev 2005-07-21 01:58:27 UTC
pena mozilla-thunderbird # /etc/init.d/test  start
 * Caching service dependencies ...                                      [ ok ]
 checkroot hostname modules checkfs localmount clock metalog net.eth0 net.lo xfs
portmap hotplug dbus bootmisc xdm sshd ntpd ntp-client netmount local hald fcron
famd domainname distccd cupsd alsasound acpid  hostname modules checkfs
localmount clock metalog net.eth0 net.lo xfs portmap hotplug dbus bootmisc xdm
sshd ntpd ntp-client netmount local hald fcron famd domainname distccd cupsd
alsasound acpid


This seems to explain why I could ssh in.
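
The reasoning can be checked against the trace with a small shell function. This is a sketch under one assumption: that the order printed by trace_dependencies is the order in which services are started. The name starts_before is hypothetical.

```shell
# Succeed if service $1 appears before service $2 in the
# space-separated service list $3.
starts_before() {
    first=$1
    second=$2
    seen_first=no
    for svc in $3; do
        if [ "$svc" = "$first" ]; then
            seen_first=yes
        elif [ "$svc" = "$second" ]; then
            # Reached the second service: the answer depends on
            # whether the first one was already seen.
            [ "$seen_first" = yes ] && return 0
            return 1
        fi
    done
    # The second service never appeared in the list.
    return 1
}

# First pass of the trace output from this comment.
order="checkroot hostname modules checkfs localmount clock metalog net.eth0 net.lo xfs portmap hotplug dbus bootmisc xdm sshd ntpd ntp-client netmount local hald fcron famd domainname distccd cupsd alsasound acpid"
```

With that list, starts_before sshd ntp-client "$order" succeeds, while starts_before netmount ntp-client "$order" fails: sshd is listed ahead of the failing ntp-client, and netmount comes after it.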
Comment 17 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 08:32:04 UTC
Ok, I'm going to attach two scripts, please copy them where stated, make
sure they are chmod +x, and let me know if this fixes it.
Comment 18 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 08:34:09 UTC
Created attachment 63999 [details]
runscript.sh

Please copy to /sbin.
Comment 19 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 08:35:08 UTC
Created attachment 64000 [details]
rc-services.sh

Please copy to /lib/rcscripts/sh/.
Comment 20 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 08:35:50 UTC
Just note that you need baselayout-1.12.0_pre1-r1 .. they will not work with
1.11.1x ...
Comment 21 Jasper Thrussell 2005-07-21 10:24:05 UTC
(In reply to comment #19)
> Created an attachment (id=64000) [edit]
> rc-services.sh
> 
> Please copy to /lib/rcscripts/sh/.
> 

Yup, these two scripts seem to have done the trick, my system boots almost
normally now. I say 'almost' because for some reason my net.ra0 service doesn't
appear in the service list during boot, but it IS started.

Just in case:

SpectrumZX jasper # /etc/init.d/test start
 checkroot hostname modules checkfs localmount clock syslog-ng net.eth0 net.lo
net.ra0 keymaps bootmisc xdm vixie-cron shorewall portmap numlock local hdparm
domainname consolefont alsasound acpid  hostname modules checkfs localmount
clock syslog-ng net.eth0 net.lo net.ra0 keymaps bootmisc xdm vixie-cron
shorewall portmap numlock local hdparm domainname consolefont alsasound acpid
Comment 22 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-21 13:24:38 UTC
Normal if I understand the parallel stuff .. sometimes it will rather drop
printing than garble two services' output together - if using parallel, that is.
Comment 23 Marek Kwasceki 2005-07-21 13:25:02 UTC
could you mask it if it's not repaired in the portage?

i had quite a problem today, because my system refused to start net.eth0 and
made me go about 20km to fix it by hand. (first time my gentoo cost me money :/)
Comment 24 Petteri Räty (RETIRED) gentoo-dev 2005-07-21 13:28:20 UTC
(In reply to comment #23)
> could you mask it if it's not repaired in the portage?
> 
> i had quite a problem today, because my system refused to start net.eth0 and
> made me go about 20km to fix it by hand. (first time my gentoo cost me money :/)

Personally in cases like this I don't really recommend using the baselayout from
unstable/testing. If you are using it, it would probably be best to wait a few
days and see if bugs like this come up.
Comment 25 Marek Kwasceki 2005-07-21 13:35:23 UTC
oh.. i don't complain (much) about it.
it was quite a coincidence.

but still... shouldn't it be masked for a while?
Comment 26 Petteri Räty (RETIRED) gentoo-dev 2005-07-21 14:14:25 UTC
(In reply to comment #19)
> Created an attachment (id=64000) [edit]
> rc-services.sh
> 
> Please copy to /lib/rcscripts/sh/.
> 

This version is better but still has problems. Now it fails to start portmap,
which is needed by netmount and famd, so they fail with an error about required
services not starting. After logging in I can execute /etc/init.d/netmount start
and it starts correctly with:
pena betelgeuse # /etc/init.d/netmount start
 * Starting portmap ...                                                  [ ok ]
 * Mounting network filesystems ...                                      [ ok ]

I haven't added portmap to any runlevel but it has worked before so I presume it
is just broken. I had ntp-client failing while testing. I will now test what
happens if it is set to execute successfully.
Comment 27 Petteri Räty (RETIRED) gentoo-dev 2005-07-21 14:25:05 UTC
Yeah it is caused by ntp-client failing. You should use something that fails in
your testing. Something like this should be enough:
#!/sbin/runscript
start() { eerror "This always fails" ; return 1; }

I wonder why running this without eerror doesn't output anything. Well maybe
return 1 is not the right way to tell that the init script fails. Normally I see
those !! marks in the place where ok normally is. 
Comment 28 Roy Marples (RETIRED) gentoo-dev 2005-07-21 16:24:14 UTC
Created attachment 64017 [details]
rc-services.sh

Please copy to /lib/rcscripts/sh.
Comment 29 Roy Marples (RETIRED) gentoo-dev 2005-07-21 16:25:16 UTC
Created attachment 64018 [details]
runscript.sh

Please copy to /sbin.

New versions - these should solve all issues reported here
Comment 30 James White 2005-07-22 08:57:28 UTC
ntp-client is not in my init scripts.  But I fail just the same.
I booted in with liveCD and mounted my HDD - then deleted all of the symlinks in
/etc/init.d/runlevels/default.  I rebooted and was back in my system again (yay!).  

Then I /etc/init.d/<scriptname> start for all of the scripts that I had deleted
earlier.  Everything ran just fine (but net.wlan0 can't find its configuration
information) so I rc-update added all scripts.

Rebooted and the darned thing seized again.

Observations:  It isn't ntp-client related but it does seem related to my wlan
config file.
Comment 31 Marek Kwasceki 2005-07-22 09:02:35 UTC
(In reply to comment #30)
> ntp-client is not in my init scripts.  But I fail just the same.
> I booted in with liveCD and mounted my HDD - then deleted all of the symlinks in
> /etc/init.d/runlevels/default.  I rebooted and was back in my system again (yay!).
> 
> Then I /etc/init.d/<scriptname> start for all of the scripts that I had deleted
> earlier.  Everything ran just fine (but net.wlan0 can't find its configuration
> information) so I rc-update added all scripts.
> 
> Rebooted and the darned thing seized again.
> 
> Observations:  It isn't ntp-client related but it does seem related to my wlan
> config file.

Did you try with attachment (id=64018) -> runscript.sh?

Well... I had something similar but with net.eth0. It couldn't find its config and it
hung while starting xfs.

Can somebody explain the reason for this bug?
Comment 32 Roy Marples (RETIRED) gentoo-dev 2005-07-22 09:29:36 UTC
(In reply to comment #30)
> ntp-client is not in my init scripts.  But I fail just the same.

Doesn't matter - it happens whenever any script fails and/or something depends
on the failed script.
Comment 33 Petteri Räty (RETIRED) gentoo-dev 2005-07-22 11:01:20 UTC
(In reply to comment #29)
> Created an attachment (id=64018) [edit]
> runscript.sh
> 
> Please copy to /sbin.
> 
> New versions - these should solve all issues reported here

It still fails to start netmount and famd after ntp-client fails.
Comment 34 James White 2005-07-23 10:33:44 UTC
I'm not sure if I screwed up somewhere...

I have added the two scripts as required:
 rc-services.sh /lib/rcscripts/sh/
 runscript.sh /sbin

I have also removed all items except for local, net.eth0, sshd and xdm from
/etc/init.d/runlevels/default in order to boot successfully.

Now I am able to boot just fine, but none of the default scripts have run.  I
try to manually run them but nothing happens.  For example, if I run
"/etc/init.d/hostname start" the prompt just moves down a line.  If I run
"/etc/init.d/hostname" then I get the error message telling me to use either
"start|restart" etc.

So, a successful boot now means that I see ENTERING RUNLEVEL 3 and then nothing
happens for about 30 seconds and then I'm at the logon prompt with none of my
scripts having run.  What am I missing here?
Comment 35 Roy Marples (RETIRED) gentoo-dev 2005-07-24 01:28:40 UTC
baselayout-1.12.0_pre2 has all these fixes in, please test
Comment 36 Petteri Räty (RETIRED) gentoo-dev 2005-07-24 13:16:39 UTC
(In reply to comment #35)
> baselayout-1.12.0_pre2 has all these fixes in, please test

It still fails to start portmap after ntp-client fails.
Comment 37 Martin Schlemmer (RETIRED) gentoo-dev 2005-07-26 02:43:23 UTC
Remerge baselayout with portage's --noconfmem and retry please.
Comment 38 Roy Marples (RETIRED) gentoo-dev 2005-07-29 04:31:34 UTC
I've just put pre3 into portage - it's still p.masked though.

If everyone could test that things work in relation to this bug only (or fail -
but correctly) and report back here then I'll unmask it in a few days.

Thanks
Comment 39 Petteri Räty (RETIRED) gentoo-dev 2005-07-29 05:09:05 UTC
(In reply to comment #38)
> I've just put pre3 into portage - it's still p.masked though.
> 
> If everyone could test that things work in relation to this bug only (or fail -
> but correctly) and report back here then I'll unmask it in a few days.
> 

I have tested this version and it solved all the problems I had. Hopefully this
encourages others to test it too. 
Comment 40 Roy Marples (RETIRED) gentoo-dev 2005-07-31 23:10:03 UTC
Removed from package.mask
Comment 41 David Li 2005-08-20 09:19:11 UTC
Hmm, I see something "like" this bug and I'm asking for some input.

When I boot, it hangs for a long time on net.lo. After this, everything boots
successfully and normally. The weirdest part is that net.lo now starts net.eth1 as
well. net.eth1 is for my wireless network with dhcp, which could explain the
slowdown.

The annoying part is that the only way I realized this was happening was:
1) ifconfig after boot shows eth1 is up
2) Adding "echo ${IFACE} > /dev/tty1" to /etc/conf.d/net postup()
Otherwise I won't see any messages about eth1 starting even with RC_VERBOSE="yes"

I do have ntp but it doesn't appear to affect my issue.

I'm using pre6 by the way.
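
The postup() debugging trick from point 2 might look roughly like the sketch below. It assumes IFACE holds the name of the interface that just came up; POSTUP_LOG is a hypothetical variable added here only so the function can be tried without writing to a real console.

```shell
# Where to report; /dev/tty1 is the target used in the comment above.
# POSTUP_LOG is a hypothetical override for trying this out safely.
POSTUP_LOG=${POSTUP_LOG:-/dev/tty1}

postup() {
    # IFACE is expected to name the interface that was just brought up.
    echo "postup: ${IFACE} is up" >> "$POSTUP_LOG"
    return 0
}
```

Pointing POSTUP_LOG at a regular file instead of the console keeps a record of which interfaces were started during boot, and in which order.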
Comment 42 Roy Marples (RETIRED) gentoo-dev 2005-08-21 08:29:35 UTC
That's hotplug starting net.eth1 because the module for the card was inserted.
Comment 43 David Li 2005-08-21 08:39:14 UTC
(In reply to comment #42)
> That's hotplug starting net.eth1 because the module for the card was inserted.

Hmm, that's funny because I thought I loaded it in /etc/modules.autoload.d/kernel-2.6
I guess I'll look into this more.

BTW, I'm told that if you use coldplug, you aren't supposed to have hotplug
in the runlevel. Is that correct?
Comment 44 Roy Marples (RETIRED) gentoo-dev 2005-08-21 08:53:12 UTC
The hotplug script is pretty redundant these days.

When the module is loaded (by whatever means) hotplug then starts the relevant
net.xxx script if it's a network module. This is what hotplug does.
Comment 45 Ian Stakenvicius 2006-09-08 07:09:00 UTC
Related to this bug (i think) -- i'm having issues with baselayout-1.12.4-r7, when RC_PARALLEL_STARTUP=no, with netmount starting before net.eth0 does.  And it _seems_ to be because xdm has 'before alsasound net.lo' in it, as xdm needs netmount.  I got rid of that before line in xdm's depend() and everything works normally.  Note, with RC_PARALLEL_STARTUP=yes it also works, but i would prefer not to run the system with parallel startup.

Is there a particular reason why xdm (which should be in default) should be starting before net.lo and alsasound (which should be in boot) ??  This makes no sense to me..
Comment 46 Ian Stakenvicius 2006-09-08 07:25:28 UTC
Aww crud, sorry -- this is an xinit issue, i assumed /etc/init.d/xdm was part of baselayout.