Bug 149128 - init.d scripts can't handle parallel starts when resuming
Summary: init.d scripts can't handle parallel starts when resuming
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout
Hardware: AMD64 Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-09-25 17:55 UTC by Roger Binns
Modified: 2008-08-24 23:45 UTC
Description Roger Binns 2006-09-25 17:55:06 UTC
When resuming from hibernation, various scripts are run.  In particular, net.eth0 is started, as are several others that depend on eth0, such as netmount and sshd.  These are started by the hibernate scripts rather than by init.  The error output is:

 * ERROR:  net.eth0 is already starting.
mount: RPC: Remote system error - Network is unreachable
mount: RPC: Remote system error - Network is unreachable
mount: RPC: Remote system error - Network is unreachable
 * ERROR:  cannot start sshd as net.eth0 could not start
 * ERROR:  cannot start netmount as net.eth0 could not start

net.eth0 did start successfully (it uses DHCP).  I presume the other scripts just didn't wait long enough.

I have no idea if the problem is in the scripts themselves or if hibernate should be using init to serialise and track dependencies.

Portage 2.1.1 (default-linux/amd64/2006.1, gcc-4.1.1, glibc-2.4-r3, 2.6.17-suspend2-r6 x86_64)
=================================================================
System uname: 2.6.17-suspend2-r6 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
Gentoo Base System version 1.12.5
Last Sync: Mon, 25 Sep 2006 23:00:06 +0000
ccache version 2.3 [enabled]
app-admin/eselect-compiler: [Not Present]
dev-java/java-config: 1.2.11-r1
dev-lang/python:     2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.3
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=athlon64"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=athlon64"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LINGUAS=""
MAKEOPTS="-j4"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/vmware"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 X a52 aac acpi alsa asf audiofile avahi beagle berkdb bitmap-fonts bonobo bzip2 cairo cdinstall cdparanoia cdr chm cli crypt css cups dbus dga dlloader dri dts dv dvb dvd dvdr dvdread elibc_glibc encode evo exif fbcon ffmpeg firefox flac fortran gdbm gif gimp gimpprint glitz glut gnome gphoto2 gpm gtk gtk2 hal hbci ieee1394 imap input_devices_evdev input_devices_keyboard input_devices_mouse ipod isdnlog java javascript jpeg kernel_linux ldap libg++ mad mikmod mjpeg mono mozcalendar mp3 mpeg mplayer musicbrainz ncurses nls nntp nptl nptlonly nsplugin nvidia offensive ofx ogg ole opengl pam pcre pdf perl png ppds pppd print python qt3 qt4 quicktime quotes readline reflection samba scanner sdl session silc smp sndfile speex spl sql ssl subtitles subversion svg symlink tcpd theora threads tiff transcode truetype truetype-fonts type1-fonts udev unicode usb userland_GNU v4l v4l2 video_cards_fbdev video_cards_nv video_cards_nvidia video_cards_vesa vorbis wmf wmv x264 xine xinerama xml xmms xorg xv xvid xvmc yahoo yv12 zlib"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Roy Marples (RETIRED) gentoo-dev 2007-01-10 12:29:13 UTC
I don't think this is a baselayout bug as such, since you say that hibernate is starting the scripts.

Which package are your hibernate scripts from?
Comment 2 Roger Binns 2007-01-10 16:39:35 UTC
The hibernate scripts are from sys-power/hibernate-script

The init.d scripts obviously already have some notion of concurrency.  For example in the output I gave you can see that net.eth0 detects that it is already in the process of running, and sshd/netmount both try to detect if net.eth0 has started.  Unfortunately if it hasn't finished running, they assume it has failed to start rather than waiting for it to finish.
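
A minimal sketch of the waiting behaviour described above, not the actual baselayout code. It assumes, hypothetically, that service state is recorded as one entry per service under "starting" and "started" directories inside STATE_DIR, and that the "started" entry is created before the "starting" entry is removed. The function name wait_for_service and the timeout are illustrative.

```shell
#!/bin/sh
# Sketch: wait for a service that is still starting instead of treating
# "already starting" as a failure.
# Assumption: state is kept as one entry per service under
# "$STATE_DIR/starting" and "$STATE_DIR/started" (hypothetical layout).

STATE_DIR="${STATE_DIR:-/var/lib/init.d}"

# wait_for_service NAME [TIMEOUT_SECONDS]
# Returns 0 once NAME is marked started, 1 if it is neither starting nor
# started, and 2 if it is still starting when the timeout expires.
wait_for_service() {
    svc=$1
    remaining=${2:-60}
    while :; do
        if [ -e "$STATE_DIR/started/$svc" ]; then
            return 0
        fi
        if [ ! -e "$STATE_DIR/starting/$svc" ]; then
            # Re-check: the service may have finished between the two tests.
            [ -e "$STATE_DIR/started/$svc" ] && return 0
            return 1
        fi
        if [ "$remaining" -le 0 ]; then
            return 2
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

A dependent script such as sshd could call `wait_for_service net.eth0 30` and only report "could not start" on a non-zero return, rather than failing as soon as it sees that net.eth0 is still starting.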
Comment 3 Roy Marples (RETIRED) gentoo-dev 2007-01-11 15:08:07 UTC
Are you brave enough to test baselayout-1.13.0_alpha11-r1 where I think this has been fixed?
Comment 4 Roger Binns 2007-01-12 03:24:21 UTC
I'm not brave enough to have to deal with an unbootable machine should the new initscripts screw up.  I'll test a beta though.

I can suggest being able to test the scripts using the following:

# cd /etc/init.d
# ./net.eth0 stop ; ./sshd stop ; ./ntpd stop ; ./netmount stop
# (./net.eth0 start &) ; (./sshd start &) ; (./ntpd start &) ; (./netmount start &)

Comment 5 Roy Marples (RETIRED) gentoo-dev 2007-01-12 07:23:04 UTC
(In reply to comment #4)
> I'm not brave enough to have to deal with an unbootable machine should the new
> initscripts screw up.  I'll test a beta though.

Fair enough. We do have one report where the new alpha11 doesn't boot due to clock issues, so I do understand.

> 
> I can suggest being able to test the scripts using the following:
> 
> # cd /etc/init.d
> # ./net.eth0 stop ; ./sshd stop ; ./ntpd stop ; ./netmount stop
> # (./net.eth0 start &) ; (./sshd start &) ; (./ntpd start &) ;
> (./netmount start &)

Right away I can tell you that will not work.
Only /sbin/rc can background the init scripts like that, because it creates the needed lock files first. What's more, that approach ignores all dependencies: what if sshd needs ntpd, but you've started sshd manually first?

This works
# for x in net.eth0 sshd ntp netmount ; do /etc/init.d/$x stop ; done
# rc

Well, it works in the sense that we only restart services that are in the runlevels: if service foo isn't in a runlevel but was started, and hibernate stopped it, then it won't be restarted.

With baselayout-1.13 we now provide a userland tool called rc-depend to work out init script ordering, so the hibernate script could do the above like this:

# for x in net.eth0 sshd ntp netmount ; do /etc/init.d/$x stop ; done
# for x in $(rc-depend -ineed -iuse -iafter net.eth0 sshd ntp netmount) ; do /etc/init.d/$x start ; done

Hmmmm, although maybe we need a new option, --order so that only net.eth0 sshd ntp and netmount are reported as here we are not interested in dependencies themselves, only the correct order to start things in.
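
To illustrate the "order only" idea, here is a rough sketch that does not use rc-depend at all (its output format is not shown in this bug). It swaps in the standard tsort utility and a hand-written list of dependency pairs, both of which are assumptions for illustration only. The function name order_services is hypothetical.

```shell
#!/bin/sh
# Sketch: print only the requested services, in an order that respects the
# dependency pairs. Each pair "A B" means "A must start before B".
# Assumption: the pairs are written by hand here; a real script would have
# to obtain them from the init system.

# order_services "PAIRS" SERVICE...
order_services() {
    pairs=$1
    shift
    wanted=" $* "
    {
        # List every requested service as a self-pair so that tsort also
        # prints services that appear in no dependency pair.
        for s in "$@"; do
            echo "$s $s"
        done
        printf '%s\n' "$pairs"
    } | tsort | while read -r s; do
        # Drop services that were pulled in only as dependencies.
        case $wanted in
            *" $s "*) echo "$s" ;;
        esac
    done
}
```

The output could then feed the same kind of loop as above, for example `for x in $(order_services "$pairs" net.eth0 sshd ntp netmount) ; do /etc/init.d/$x start ; done`, so that only the listed services are started, each after the services it depends on.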
Comment 6 Roger Binns 2007-01-12 07:31:15 UTC
The backgrounding line is what I believe the hibernate script does the equivalent of.  It has no idea what the dependencies are.  It also uses separate config directives for things like which network interfaces to restart on a resume, which modules to reload and which services to restart.  I don't know exactly what it does in parallel and only have the error messages to go on.

The simplest way to reproduce this problem is to put suspend/resume on your machine and then do a suspend/resume cycle from a command prompt :-)
Comment 7 Roy Marples (RETIRED) gentoo-dev 2007-01-12 07:43:23 UTC
(In reply to comment #6)
> The backgrounding line is what I believe the hibernate script does the
> equivalent of.  It has no idea what the dependencies are.  It also uses
> separate config directives for things like which network interfaces to restart
> on a resume, which modules to reload and which services to restart.  I don't
> know exactly what it does in parallel and only have the error messages to go on.

Right. So it needs to do it how I've described above. As they're in separate modules, they'll need to queue themselves up, maybe like so:

rm -f /tmp/hibernate-queue
echo net.eth0 net.eth1 >> /tmp/hibernate-queue
echo netmount >> /tmp/hibernate-queue
echo sshd >> /tmp/hibernate-queue

OR just stop it from backgrounding anything. That's probably the simplest answer.
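
As a rough sketch of how such a queue might be drained without backgrounding anything: the function below reads the queue file once and starts each service in the foreground, one after another. The names drain_queue, start_service and START_CMD are hypothetical, and the whitespace-separated queue format is assumed from the echo lines above.

```shell
#!/bin/sh
# Sketch: start every queued service one at a time, in queue order.
# Assumptions: the queue file holds whitespace-separated service names, and
# START_CMD (hypothetical) names a command taking a service name.

QUEUE_FILE="${QUEUE_FILE:-/tmp/hibernate-queue}"

# Default start command: run the service's init script in the foreground.
start_service() {
    "/etc/init.d/$1" start
}
START_CMD="${START_CMD:-start_service}"

drain_queue() {
    [ -f "$QUEUE_FILE" ] || return 0
    seen=" "
    failed=0
    # Word splitting is intentional: a line may name several services.
    for svc in $(cat "$QUEUE_FILE"); do
        case $seen in
            *" $svc "*) continue ;;   # already handled, skip duplicates
        esac
        seen="$seen$svc "
        # No "&" here: each start finishes before the next one begins.
        "$START_CMD" "$svc" || failed=1
    done
    rm -f "$QUEUE_FILE"
    return "$failed"
}
```

Note that this only serialises the starts in the order the modules queued them; it does not work out dependencies, so the queue order itself still has to be sensible.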

 
> The simplest way to reproduce this problem is to put suspend/resume on your
> machine and then do a suspend/resume cycle from a command prompt :-)

No, if what you've described is true then the simplest way to fix this is for the maintainer of this package to stop it from backgrounding as he should know it better than me.
Comment 8 Alon Bar-Lev (RETIRED) gentoo-dev 2007-01-12 09:31:16 UTC
Well...
I have never understood why modules should be stopped when the interface is gone...
I use in /etc/conf.d/rc
RC_NET_STRICT_CHECKING="lo"

So all services stay up, and I have no problems.

When I need services to be up after the interface goes up, I use preup() in /etc/conf.d/net to do so.

I mean, it is nice to clean up unneeded processes when the interface is down, but not remembering which ones should be started when the interface comes back up makes it difficult to use.

A typical configuration on my server is that net.eth0 depends on firewall, which is implemented by firehol, and openvpn, which binds to a specific interface on net.eth0. When I add a before statement to depend_eth0(), it lets openvpn start before the interface is up... So I started to use preup() in order to make sense of this.
Comment 9 Alon Bar-Lev (RETIRED) gentoo-dev 2007-01-15 19:07:32 UTC
Roger, have you tried the solution I suggested?
Comment 10 Roger Binns 2007-01-16 01:50:08 UTC
(In reply to comment #9)
> Roger, have you tried the solution I suggested?

Sort of.  I disabled the shutdown and startup of network interfaces in the hibernate script.  (By default it restarts eth0).

Fundamentally the problem is with Gentoo somewhere.  The shipped init.d scripts do some sort of checking (see my original message).  The shipped hibernate script does some sort of stopping and starting (see my original message).

If you use the default shipped scripts and the default shipped hibernate, then various network services do not restart, since they get confused about whether interfaces are up, down, started, stopped or starting.

I have no idea where the "blame" in this is.  Hibernate can be fixed to change how it starts things, the init.d scripts could be made more robust, or a completely different way of doing this sort of thing could be used (Ubuntu's upstart is a far better match for a machine that is doing hibernation and resume).
Comment 11 Alon Bar-Lev (RETIRED) gentoo-dev 2007-01-16 07:14:50 UTC
(In reply to comment #10)
> Sort of.  I disabled the shutdown and startup of network interfaces in the
> hibernate script.  (By default it restarts eth0).

I cannot see that this is the default...

> Fundamentally the problem is with Gentoo somewhere.  The shipped init.d scripts
> do some sort of checking (see my original message).  The shipped hibernate
> script does some sort of stopping and starting (see my original message).

This depends on your point of view... Everyone wishes to have different defaults.
Please also modify /etc/conf.d/rc::RC_NET_STRICT_CHECKING to "lo"

Does everything work now as expected?
Comment 12 Roger Binns 2007-01-16 08:22:09 UTC
> I cannot see that this is the default...

True.  It is set that way, but then commented out.

### network
# DownInterfaces eth0
# UpInterfaces auto

I normally have the lines uncommented because my laptop has varied network connectivity.

> This depends on your point of view... Everyone wishes to have different
> defaults.

That point I don't understand.  Look at my original description:

====
 * ERROR:  net.eth0 is already starting.
mount: RPC: Remote system error - Network is unreachable
mount: RPC: Remote system error - Network is unreachable
mount: RPC: Remote system error - Network is unreachable
 * ERROR:  cannot start sshd as net.eth0 could not start
 * ERROR:  cannot start netmount as net.eth0 could not start
====

Obviously there is an issue of some sort.  eth0 did in fact start just fine, but sshd and netmount think it failed to start when in fact it was still starting.

> Please also modify /etc/conf.d/rc::RC_NET_STRICT_CHECKING to "lo"

I already did that.  See prior comment.

> Does everything work now as expected?

Yes, I can get things to work by not restarting the interface on my desktop machine.  I suspect my laptop will also work with this workaround.

However, the original problem report keeps being ignored.  The init.d scripts have code that detects they are already being started.  There is also code that misdetects network interfaces that are still starting as having failed to start.  One workaround is to try not to restart interfaces.  But if they are restarted for whatever reason, then the problems remain.

Comment 13 Alon Bar-Lev (RETIRED) gentoo-dev 2007-01-16 09:35:54 UTC
(In reply to comment #12)
> I normally have the lines uncommented because my laptop has varied network
> connectivity.

I don't think you should do this unless you have a problem... For example, ipw3945 has one.

> > Does everything work now as expected?
> 
> Yes, I can get things to work by not restarting the interface on my desktop
> machine.  I suspect my laptop will also work with this workaround.

I am glad.

> However, the original problem report keeps being ignored.  The init.d scripts
> have code that detects they are already being started.  There is also code that
> misdetects network interfaces that are still starting as having failed to
> start.  One workaround is to try not to restart interfaces.  But if they are
> restarted for whatever reason, then the problems remain.

Roy, I guess it is back in your domain...
I think Roger has some valid points here. I had the same problem with OpenVPN, described in comment #8. It has nothing to do with the hibernate script... the hibernate script is only one user of the baselayout networking stuff...

Another example is a laptop which has a disable wireless button...
Comment 14 Roger Binns 2007-01-17 01:02:25 UTC
(In reply to comment #13)
> (In reply to comment #12)
> > I normally have the lines uncommented because my laptop has varied network
> > connectivity.
> 
> I don't think you should do this unless you have a problem... For example,
> ipw3945 has one.

My laptop has wired ethernet, builtin wireless (ipw2100, B only), pcmcia card (Airgo - G, ndiswrapper sadly), firewire, bluetooth, infra-red and who knows what else.

After a suspend, the resume environment is almost 100% certain to be different than when suspended.  An undetermined amount of time may also have passed.  Once Network Manager is stable, I'll be happy to manage things using it.  Until then the easiest solution is to restart all interfaces on a resume.

> I think Roger has some valid points here. I had the same problem with
> OpenVPN, described in comment #8. It has nothing to do with the hibernate
> script... the hibernate script is only one user of the baselayout networking
> stuff...

That was my original problem report :-)
 
> Another example is a laptop which has a disable wireless button...

I do that too :-)  The builtin wireless is B only and doesn't support WPA so I often disable it and use the PCMCIA card, but that sucks because it has to use ndiswrapper ...
Comment 15 Roy Marples (RETIRED) gentoo-dev 2007-01-17 15:37:58 UTC
You can always try baselayout-1.13.0_alpha12 hot off the press today which has loads of new code which may fix this issue.
Comment 16 Roy Marples (RETIRED) gentoo-dev 2007-07-12 14:01:20 UTC
baselayout-2.0.0_alpha4 should handle this a lot better. Please test and re-open if things are still bad.
Comment 17 Heiko Rosemann 2008-08-24 23:45:15 UTC
Hi everyone,

I just ran into the very same problem on 2008.0 x86. Tracked it down to udev starting net.eth0 (in the background) when the hibernate script loads the appropriate network module. (This started happening when I switched from an e100-PCI-card to nvidia onBoard-LAN - e100 did not have to be unloaded for suspend-resume-cycles, forcedeth is blacklisted by hibernate-script)

Fixed for me by setting RC_PLUG_SERVICES to "!net.eth0" in /etc/conf.d/rc - so udev does not start net.eth0 but hibernate can do so afterwards. When hibernate starts net.eth0, it waits for it to finish before starting anything depending on it. This somehow points to a udev problem, in my opinion.
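
For reference, the setting described above would look roughly like this as a fragment of /etc/conf.d/rc. This is only the one line from my setup, assuming the baselayout 1.12 syntax; the comment text is mine.

```shell
# /etc/conf.d/rc (fragment)
# Do not let hotplug/udev start net.eth0 on its own, so that the hibernate
# script is the only thing starting it on resume.
RC_PLUG_SERVICES="!net.eth0"
```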

Regards, Heiko

P.S: I am not brave enough to test alpha baselayouts... just had enough strange trouble upgrading to 2.6.24 and trying skge...

My system runs the following:

Portage 2.1.4.4 (default/linux/x86/2008.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-tuxonice-r9 i686)
=================================================================
System uname: 2.6.24-tuxonice-r9 i686 AMD Athlon XP-M
Timestamp of tree: Sun, 24 Aug 2008 17:16:01 +0000
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.4.4-r13, 2.5.2-r6
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r2
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="x86"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=athlon-xp -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-O2 -march=athlon-xp -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo "
LDFLAGS="-Wl,-O1"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/sunrise /usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X aac acl alsa asf berkdb browserplugin bzip2 cdparanoia cdr cli cracklib crypt cups cupsddk divx4linux dri dvb dvd dvdread encode ffmpeg fortran gdbm gif gimpprint gpm iconv icq ipv6 isdnlog jabber java javascript jpeg mad mbox midi mmx mmxext mozdevelop mozilla moznopango msn mudflap ncurses nls nptl nptlonly nsplugin oggsvg opengl openmp pam pcre pdf png postfix pppd print python quicktime readline real reflection sasl session spl sse ssl sysfs tcpd tetex tif udev usb v4l v4l2 vim vorbis win32codecs x86 xanim xinerama xorg xv xvid yahoo zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="radeon"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS