Before updating to 1.12.0_pre1-r1 I had parallel startup on. After updating I thought it would be better to turn off parallel startup for the first boot, just to be sure. This led to problems: services start up until sshd and then nothing happens. I can ssh in, but clearly some services are not running. Trying to start /etc/init.d/{portmap,netmount} with RC_VERBOSE="yes" gives no output and nothing happens. After switching back to parallel startup I have no problems.

Reproducible: Always

Steps to Reproduce:
1. Update to the latest baselayout with parallel startup on.
2. Turn parallel startup off.
3. Reboot. (Be sure to have some way to rescue yourself.)

I talked with UberLord on IRC and we thought this could also have something to do with the network, but changing the value of RC_NET_STRICT_CHECKING does not affect the problem.
pena netbeans1 # emerge info Portage 2.0.51.22-r2 (default-linux/x86/2005.0, gcc-3.4.4, glibc-2.3.5-r0, 2.6.12-gentoo-r6 i686) ================================================================= System uname: 2.6.12-gentoo-r6 i686 Intel(R) Pentium(R) 4 CPU 2.40GHz Gentoo Base System version distcc 2.18.3 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled] ccache version 2.4 [enabled] dev-lang/python: 2.4.1-r1 sys-apps/sandbox: 1.2.11 sys-devel/autoconf: 2.13, 2.59-r7 sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6 sys-devel/binutils: 2.16.1 sys-devel/libtool: 1.5.18-r1 virtual/os-headers: 2.6.11-r2 ACCEPT_KEYWORDS="x86 ~x86" AUTOCLEAN="yes" CBUILD="i686-pc-linux-gnu" CFLAGS="-O3 -march=pentium4 -pipe -mfpmath=sse -ffast-math -fomit-frame-pointer" CHOST="i686-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/qmail/control" CONFIG_PROTECT_MASK="/etc/gconf /etc/init.d /etc/splash /etc/terminfo /etc/env.d" CXXFLAGS="-O3 -march=pentium4 -pipe -mfpmath=sse -ffast-math -fomit-frame-pointer" DISTDIR="/usr/src/distfiles" FEATURES="autoconfig ccache cvs distlocks fixpackages noauto sandbox sfperms strict" GENTOO_MIRRORS=" http://trumpetti.atm.tut.fi/gentoo http://lame.lut.fi/linux/gentoo " LANG="en_US.utf8" LC_ALL="en_US.utf-8" PKGDIR="/usr/portage/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/usr/local/overlays/betelgeuse /usr/local/overlays/gentoo-java-experimental /usr/local/overlays/gentopia" SYNC="rsync://aria/portage" USE="x86 X aac acl acpi alsa apm audiofile avi bash-completion berkdb bitmap-fonts browserplugin bzip2 bzlib cdb cddb cdparanoia cdr crypt cups curl divx4linux dts dvd dvdr dvdread emboss esd fam flac foomaticdb freetype gcj gif gnome gstreamer gtk gtk2 hal imagemagick java jpeg kde kdeenablefinal kdexdeltas libg++ libwww logitech-mouse lzo mad makecheck 
mikmod mjpeg mmx mmx2 mp3 mpeg ncurses network nptl nptlonly nvidia offensive ogg oggvorbis opengl pam pdflib png python qt quicktime readline real rtc ruby samba slp spell sse sse2 ssl subversion svg symlink tcpd theora tiff truetype truetype-fonts type1-fonts unicode usb userlocales vorbis win32codecs xine xml xml2 xv xvid zlib video_cards_nvidia userland_GNU kernel_linux elibc_glibc" Unset: ASFLAGS, CTARGET, LDFLAGS, LINGUAS, MAKEOPTS
Created attachment 63879 [details] /etc/conf.d/rc
Created attachment 63880 [details] /etc/conf.d/net
I only need to change RC_PARALLEL_STARTUP to "no" and it stops working. It is a bit odd that kdm starts so early with that turned off. Previously xdm was always the last service to start; now it comes before sshd. I think there might be some problems with switching from parallel to normal startup.
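For anyone reproducing this, the switch between the two modes is a single variable in /etc/conf.d/rc. The following is only a sketch, assuming the file already contains an uncommented RC_PARALLEL_STARTUP= line; the helper name set_parallel_startup is made up for illustration and is not part of baselayout.

```shell
#!/bin/sh
# Hypothetical helper (not part of baselayout): set RC_PARALLEL_STARTUP
# to "yes" or "no" in an rc config file, keeping a .bak copy next to it.
# Assumes the file already has an uncommented RC_PARALLEL_STARTUP= line.
set_parallel_startup() {
    value="$1"                      # "yes" or "no"
    conf="${2:-/etc/conf.d/rc}"     # config file; defaults to the real one

    case "$value" in
        yes|no) ;;
        *) echo "usage: set_parallel_startup yes|no [file]" >&2; return 1 ;;
    esac

    # Back up first, then rewrite the one line from the backup copy.
    cp "$conf" "$conf.bak" || return 1
    sed "s/^RC_PARALLEL_STARTUP=.*/RC_PARALLEL_STARTUP=\"$value\"/" \
        "$conf.bak" > "$conf"
}
```

Calling `set_parallel_startup no` as root before rebooting corresponds to step 2 of the report; `set_parallel_startup yes` switches back.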
It seems that this is caused by ntp-client failing to start. It is not a daemon; it just sets the clock from an NTP server. For some reason my local server doesn't work. After changing the server to pool.ntp.org I had no problems starting, for example, netmount and hald. I then removed ntp-client from the default runlevel and now I have no problems booting.
When I rebooted my laptop, I got the report that cardmanager was watching one socket, and then the system froze. I also noticed that services were not starting. I didn't try setting RC_PARALLEL_STARTUP to yes; I just booted from a live CD and downgraded baselayout to the previous version. This bug seems related: http://bugs.gentoo.org/show_bug.cgi?id=98745: "init scripts fail with our new baselayout version because we now do some tricks with start-stop-daemon to ensure daemons start and stop correctly." Maybe 1.12.0_pre1-r1 should be masked until these boot problems get sorted out. What do you think?
Looking at the ntp-client start script, it seems that it does not use start-stop-daemon, so this is probably not related.
On my systems it hangs at different points. Until we have a better solution I vote for masking baselayout-1.12.0_pre1-r1. (See also bug 99691.)
I tried setting RC_PARALLEL_STARTUP to 'yes', and while this got me farther along, it still hung on my system. I rolled back baselayout for now, but I'm willing to roll forward again to test once this gets nailed down to a cause or potential fix.
You can press Ctrl+C several times to get to a console. If you can ssh in from outside, you can use rc-status to find out which services are stuck; here it reported those services as "starting". You can also look in /var/lib/init.d/exclusive/ to see which services are stuck. Thanks to UberLord, who gave me instructions on how to find the cause of the problem.
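The second check can be wrapped in a small function. This is a sketch only: the directory is the one named in the comment above, but that each entry in it is named after a service is an assumption, and list_stuck_services is a hypothetical helper name, not a baselayout command.

```shell
#!/bin/sh
# List the entries in the "exclusive" state directory, one per line.
# Assumption: each entry is named after a service that is still in
# progress. list_stuck_services is a hypothetical name for illustration.
list_stuck_services() {
    dir="${1:-/var/lib/init.d/exclusive}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    for entry in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty,
        # but keep dangling symlinks (-L) in the listing.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        basename "$entry"
    done
}
```

Over ssh, running `rc-status` followed by `list_stuck_services` gives both views mentioned in the comment.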
I'm also voting for masking: in addition to hanging, it breaks the startup of rp-pppoe (seems like lock-file issues). I've masked it here for myself.
Weird, works fine here. I assume it freezes with the services from the default runlevel? If so, could you guys post the output of 'ls /etc/runlevels/default', and also try removing ntp-client as suggested?
*** Bug 99672 has been marked as a duplicate of this bug. ***
For those with order problems, add this to /etc/init.d/test:

-----
#!/sbin/runscript

start() {
    trace_dependencies $(cd /etc/runlevels/default; ls)
}
-----

and chmod +x it, and then start it, and add the output here.
pena mozilla-thunderbird # ls /etc/runlevels/default/
acpid      cupsd  distccd     famd   hald   metalog   netmount    ntpd  xdm
alsasound  dbus   domainname  fcron  local  net.eth0  ntp-client  sshd  xfs
pena mozilla-thunderbird # /etc/init.d/test start
 * Caching service dependencies ... [ ok ]
checkroot hostname modules checkfs localmount clock metalog net.eth0 net.lo xfs portmap hotplug dbus bootmisc xdm sshd ntpd ntp-client netmount local hald fcron famd domainname distccd cupsd alsasound acpid hostname modules checkfs localmount clock metalog net.eth0 net.lo xfs portmap hotplug dbus bootmisc xdm sshd ntpd ntp-client netmount local hald fcron famd domainname distccd cupsd alsasound acpid

This seems to explain why I could ssh in.
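One way to read that output is as a start order in which sshd comes before ntp-client. A small sketch that checks this mechanically follows. It assumes the first run of names (up to the first acpid) is the order in which services are started, and services_before is a hypothetical helper written for this illustration.

```shell
#!/bin/sh
# Print the services that come before "$1" in the remaining arguments.
# Returns 1 if the target is not in the list at all.
services_before() {
    target="$1"; shift
    before=""
    for svc in "$@"; do
        if [ "$svc" = "$target" ]; then
            echo "$before"
            return 0
        fi
        before="${before:+$before }$svc"
    done
    return 1
}

# Names copied from the trace output above, up to the first "acpid".
# Treating this as the start order is an assumption.
order="checkroot hostname modules checkfs localmount clock metalog net.eth0 net.lo xfs portmap hotplug dbus bootmisc xdm sshd ntpd ntp-client netmount local hald fcron famd domainname distccd cupsd alsasound acpid"

# Word splitting of $order is intentional here.
services_before ntp-client $order
```

The printed list includes xdm and sshd but not netmount or hald, which matches the symptoms if everything ordered after a failing ntp-client gets stuck.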
Ok, I'm going to attach two scripts. Please copy them where stated, make sure they are chmod +x, and let me know if this fixes it.
Created attachment 63999 [details] runscript.sh

Please copy to /sbin.
Created attachment 64000 [details] rc-services.sh

Please copy to /lib/rcscripts/sh/.
Just note that you need baselayout-1.12.0_pre1-r1 .. they will not work with 1.11.1x ...
(In reply to comment #19)
> Created an attachment (id=64000)
> rc-services.sh
>
> Please copy to /lib/rcscripts/sh/.

Yup, these two scripts seem to have done the trick; my system boots almost normally now. I say 'almost' because for some reason my net.ra0 service doesn't appear in the service list during boot, but it IS started. Just in case:

SpectrumZX jasper # /etc/init.d/test start
checkroot hostname modules checkfs localmount clock syslog-ng net.eth0 net.lo net.ra0 keymaps bootmisc xdm vixie-cron shorewall portmap numlock local hdparm domainname consolefont alsasound acpid hostname modules checkfs localmount clock syslog-ng net.eth0 net.lo net.ra0 keymaps bootmisc xdm vixie-cron shorewall portmap numlock local hdparm domainname consolefont alsasound acpid
That's normal, if I understand the parallel code correctly: sometimes it will drop output rather than garble the output of two services together. That is, if you are using parallel startup.
Could you mask it if it's not fixed in portage? I had quite a problem today, because my system refused to start net.eth0 and I had to travel about 20 km to fix it by hand. (First time my Gentoo cost me money :/)
(In reply to comment #23)
> Could you mask it if it's not fixed in portage?
>
> I had quite a problem today, because my system refused to start net.eth0 and
> I had to travel about 20 km to fix it by hand. (First time my Gentoo cost me money :/)

Personally, in cases like this I don't really recommend using the baselayout from unstable/testing. If you are using it, it would probably be best to wait a few days and see if bugs like this come up.
Oh, I'm not complaining (much) about it. It was quite a coincidence. But still, shouldn't it be masked for a while?
(In reply to comment #19)
> Created an attachment (id=64000)
> rc-services.sh
>
> Please copy to /lib/rcscripts/sh/.

This version is better but still has problems. Now it fails to start portmap, which is needed by netmount and famd, so they fail with an error about required services not starting. After logging in I can execute /etc/init.d/netmount start and it starts correctly:

pena betelgeuse # /etc/init.d/netmount start
 * Starting portmap ... [ ok ]
 * Mounting network filesystems ... [ ok ]

I haven't added portmap to any runlevel, but it has worked before, so I presume it is just broken. ntp-client was failing while I tested. I will now test what happens if it is set to execute successfully.
Yeah, it is caused by ntp-client failing. You should use something that fails in your testing. Something like this should be enough:

#!/sbin/runscript

start() {
    eerror "This always fails"
    return 1
}

I wonder why running this without eerror doesn't output anything. Well, maybe return 1 is not the right way to signal that the init script failed. Normally I see those !! marks in the place where ok normally is.
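On the return 1 question: in plain shell, a function's return value is its exit status, and nothing is printed unless the function itself prints it. A minimal sketch in plain sh follows; no runscript is involved, run_and_report is a hypothetical wrapper, and how runscript itself reports a failed start() is not shown here.

```shell
#!/bin/sh
# Stand-in for an init script's start() that fails without printing.
start() {
    return 1
}

# Hypothetical wrapper: run a command or function and report its exit
# status, similar in spirit to a caller choosing between "ok" and "!!".
run_and_report() {
    if "$@"; then
        echo "ok"
    else
        # In the else branch, $? still holds the status of the condition.
        echo "failed (status $?)"
    fi
}

run_and_report start   # prints "failed (status 1)"
```

So return 1 does signal failure to whoever calls start(); any visible message has to come from the caller or from the function itself.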
Created attachment 64017 [details] rc-services.sh

Please copy to /lib/rcscripts/sh.
Created attachment 64018 [details] runscript.sh

Please copy to /sbin.

New versions - these should solve all issues reported here.
ntp-client is not in my init scripts, but I fail just the same. I booted with a live CD and mounted my HDD, then deleted all of the symlinks in /etc/runlevels/default. I rebooted and was back in my system again (yay!). Then I ran /etc/init.d/<scriptname> start for all of the scripts that I had deleted earlier. Everything ran just fine (but net.wlan0 can't find its configuration information), so I added all the scripts back with rc-update. I rebooted and the darned thing seized up again. Observations: it isn't ntp-client related, but it does seem related to my wlan config file.
(In reply to comment #30)
> ntp-client is not in my init scripts, but I fail just the same.
> I booted with a live CD and mounted my HDD, then deleted all of the symlinks in
> /etc/runlevels/default. I rebooted and was back in my system again (yay!).
>
> Then I ran /etc/init.d/<scriptname> start for all of the scripts that I had deleted
> earlier. Everything ran just fine (but net.wlan0 can't find its configuration
> information), so I added all the scripts back with rc-update.
>
> I rebooted and the darned thing seized up again.
>
> Observations: it isn't ntp-client related, but it does seem related to my wlan
> config file.

Did you try with attachment (id=64018), runscript.sh? Well, I had something similar but with net.eth0: it couldn't find its config and it hung at xfs. Can somebody explain the cause of this bug?
(In reply to comment #30)
> ntp-client is not in my init scripts, but I fail just the same.

Doesn't matter. It happens if any script fails, and/or if something depends on the failed script.
(In reply to comment #29)
> Created an attachment (id=64018)
> runscript.sh
>
> Please copy to /sbin.
>
> New versions - these should solve all issues reported here

It still fails to start netmount and famd after ntp-client fails.
I'm not sure if I screwed up somewhere. I have added the two scripts as required:

rc-services.sh  /lib/rcscripts/sh/
runscript.sh    /sbin

I have also removed all items except for local, net.eth0, sshd and xdm from /etc/runlevels/default in order to boot successfully. Now I am able to boot just fine, but none of the default scripts have run. I try to run them manually but nothing happens. For example, if I run "/etc/init.d/hostname start" the prompt just moves down a line. If I run "/etc/init.d/hostname" then I get the error message telling me to use either "start|restart" etc. So, a successful boot now means that I see ENTERING RUNLEVEL 3, then nothing happens for about 30 seconds, and then I'm at the login prompt with none of my scripts having run. What am I missing here?
baselayout-1.12.0_pre2 has all these fixes in, please test
(In reply to comment #35)
> baselayout-1.12.0_pre2 has all these fixes in, please test

It still fails to start portmap after ntp-client fails.
Remerge baselayout with portage's --noconfmem and retry please.
I've just put pre3 into portage - it's still p.masked though. If everyone could test that things work in relation to this bug only (or fail - but correctly) and report back here then I'll unmask it in a few days. Thanks
(In reply to comment #38)
> I've just put pre3 into portage - it's still p.masked though.
>
> If everyone could test that things work in relation to this bug only (or fail -
> but correctly) and report back here then I'll unmask it in a few days.

I have tested this version and it solved all the problems I had. Hopefully this encourages others to test it too.
Removed from package.mask
Hmm, I'm seeing something "like" this bug and I'm asking for some input. When I boot, it hangs for a long time on net.lo. After this, everything boots successfully and normally. The weirdest part is that net.lo now starts net.eth1 as well. net.eth1 is for my wireless network with DHCP, which could explain the slowdown. The annoying part is that the only ways I realized this was happening were:
1) ifconfig after boot shows eth1 is up
2) Adding "echo ${IFACE} > /dev/tty1" to postup() in /etc/conf.d/net
Otherwise I won't see any messages about eth1 starting, even with RC_VERBOSE="yes". I do have ntp, but it doesn't appear to affect my issue. I'm using pre6, by the way.
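The debugging trick from point 2 can be written out as a complete hook. This is a sketch, assuming /etc/conf.d/net is sourced as shell and that postup() is called with IFACE set, as the comment describes; DEBUG_TTY is a name introduced here for illustration so that the output target can be changed.

```shell
# Fragment for /etc/conf.d/net (sourced as shell).
# DEBUG_TTY is a hypothetical variable; the original comment wrote
# directly to /dev/tty1.
DEBUG_TTY="${DEBUG_TTY:-/dev/tty1}"

postup() {
    # IFACE is expected to hold the interface that just came up.
    echo "postup: ${IFACE}" > "${DEBUG_TTY}"
    # Always succeed so a logging problem never fails the interface.
    return 0
}
```

With this in place, each interface that comes up, including one started by hotplug, should leave a line on the first console.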
That's hotplug starting net.eth1 because the module for the card was inserted.
(In reply to comment #42)
> That's hotplug starting net.eth1 because the module for the card was inserted.

Hmm, that's funny, because I thought I loaded it in /etc/modules.autoload.d/kernel-2.6. I guess I'll look into this more. BTW, I'm told that if you use coldplug, you aren't supposed to have hotplug in the runlevel. Is that correct?
The hotplug script is pretty redundant these days. When the module is loaded (by whatever means) hotplug then starts the relevant net.xxx script if it's a network module. This is what hotplug does.
Related to this bug (I think): I'm having issues with baselayout-1.12.4-r7 when RC_PARALLEL_STARTUP=no, with netmount starting before net.eth0 does. It _seems_ to be because xdm has 'before alsasound net.lo' in it, as xdm needs netmount. I got rid of that 'before' line in xdm's depend() and everything works normally. Note that with RC_PARALLEL_STARTUP=yes it also works, but I would prefer not to run the system with parallel startup. Is there a particular reason why xdm (which should be in default) should start before net.lo and alsasound (which should be in boot)? This makes no sense to me.
Aww crud, sorry -- this is an xinit issue; I assumed /etc/init.d/xdm was part of baselayout.