I have three network interfaces which should be started in the default runlevel: eth0, tap0 and br0. I use rc_depend_strict="NO" so that services depending on 'net' don't require all of these to be up. Since ypbind needs actual access to the LAN, I have rc_after="net.eth0" in /etc/conf.d/ypbind.

To create the tap0 interface, I have the following in /etc/conf.d/net:

  preup() {
    case "$IFACE" in
    tap0)
      tunctl -t tap0 -u nbowler
      ;;
    esac
  }

but since this requires the 'nbowler' user to exist, tap0 needs to be started after ypbind. Therefore, I also have the following in /etc/conf.d/net:

  depend_tap0() {
    need ypbind
  }

Finally, br0 is a bridge which includes the tap0 interface, so I have:

  depend_br0() {
    need net.tap0
  }

If I _manually_ start net.eth0, ypbind and net.br0, all services start with no issues. However, after adding net.eth0, ypbind and net.br0 to the default runlevel, openrc refuses to start them at boot, giving me the following errors:

  rc default logging started at Mon Sep 13 16:15:26 2010
  <snip>
   * Loading iptables state and starting firewall ... [ ok ]
   * ERROR: cannot start net.tap0 as portmap would not start
   * ERROR: cannot start net.br0 as portmap would not start
   * Bringing up interface eth0
  <snip>
   * Starting portmap ... [ok]

After logging in, net.tap0 and net.br0 can be started without issue.
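As a side note on the preup hook in the report: the dependency on ypbind exists only because tunctl needs the 'nbowler' user to be resolvable. A minimal sketch of a more defensive hook follows. It assumes that getent is available and that a non-zero return from preup aborts bringing up the interface; the error message text is made up for illustration.

```shell
# Defensive variant of the preup hook (sketch, not the reporter's actual config).
preup() {
    case "$IFACE" in
        tap0)
            # Assumption: getent consults NIS once ypbind is running.
            if ! getent passwd nbowler >/dev/null 2>&1; then
                echo "preup: user nbowler cannot be resolved (is ypbind up?)" >&2
                return 1
            fi
            tunctl -t tap0 -u nbowler
            ;;
    esac
}
```

This does not change the ordering problem itself; it only turns a missing user into an explicit error instead of a tunctl failure.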
Portage 2.1.8.3 (default/linux/amd64/10.0, gcc-4.4.3, glibc-2.11.2-r0, 2.6.36-rc4 x86_64)
=================================================================
System uname: Linux-2.6.36-rc4-x86_64-Intel-R-_Core-TM-2_Quad_CPU_Q8300_@_2.50GHz-with-gentoo-2.0.1
Timestamp of tree: Mon, 13 Sep 2010 15:30:23 +0000
distcc 3.1 x86_64-pc-linux-gnu [disabled]
ccache version 2.4 [enabled]
app-shells/bash:     4.0_p37
dev-java/java-config: 2.1.11
dev-lang/python:     2.6.5-r3, 3.1.2-r4
dev-util/ccache:     2.4-r7
dev-util/cmake:      2.8.1-r2
sys-apps/baselayout: 2.0.1
sys-apps/openrc:     0.6.3
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.13, 2.65
sys-devel/automake:  1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.20.1-r1
sys-devel/gcc:       4.4.3-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6b
sys-devel/make:      3.81-r2
virtual/os-headers:  2.6.30-r1
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=core2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/bin/startx /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/java/ /etc/eselect/postgresql /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O2 -march=core2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--keep-going --with-bdeps=y"
FEATURES="assume-digests ccache distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch"
GENTOO_MIRRORS="ftp://mirror.datapipe.net/gentoo"
LANG="en_CA.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en_CA en ja"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl alsa amd64 berkdb bzip2 cairo cjk cli cracklib crypt cups curl cvs cxx doc dri exif fbcon ffmpeg flac fontforge fortran gdbm gpm graphviz gtk iconv icu idn imagemagick ipv6 jadetex jpeg kpathsea latex lcms mmx modules mp3 mudflap multilib ncurses nis nls nptl nptlonly ogg opengl openmp pam pcre perl png postgres pppd python qt3support readline reflection sdl session smp sse sse2 sse3 ssl ssse3 svg sysfs tcpd tex4ht theora tiff truetype unicode vim-syntax vorbis xcb xinerama xorg xulrunner xv zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
CAMERAS="ptp2"
ELIBC="glibc"
INPUT_DEVICES="evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="en_CA en ja"
RUBY_TARGETS="ruby18"
USERLAND="GNU"
VIDEO_CARDS="intel fbdev"
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CPPFLAGS, CTARGET, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Created attachment 247175 [details] /etc/conf.d/net
Why is portmap/rpcbind not starting earlier? Is there a circular loop with it and ypbind?
(In reply to comment #2)
> Why is portmap/rpcbind not starting earlier?

Presumably this is because portmap has 'use net' in its dependencies. If I understand correctly, this means it must start after at least one of the net services has started.

> Is there a circular loop with it and ypbind?

portmap:
  use net
  before inetd
  before xinetd

ypbind:
  need net portmap
  use ypserv domainname
  (after net.eth0 added by me)

So net.eth0 should satisfy all deps of portmap and portmap should then satisfy all deps of ypbind. There's no obvious cycle. If there was a dependency cycle, wouldn't it be impossible to start the services manually?
Can we prove it's a dep issue directly, please?

1. Edit portmap/rpcbind to have 'use net.lo' instead of 'use net'.
2. Repeat the test to see if the resulting order still has problems.
3. Change the ypbind 'rc_after="net.eth0"' to 'rc_need="net.eth0"'.
4. Repeat the test.
(In reply to comment #4)
> Can we prove it's a dep issue directly, please?
> 1. Edit portmap/rpcbind to have 'use net.lo' instead of 'use net'

It's always a good idea to be specific when possible...

> 3. Change the ypbind 'rc_after="net.eth0"' to 'rc_need="net.eth0"'

I've always been a bit unclear as to where "after" fits in between use and need. (Need is clearly stronger than use, which can be read as "use if available/convenient", meaning start it first if it's in the same runlevel, but after??)

However, I did once have a problem similar to this with "after" that was resolved by switching the dependency to "need" upon Roy's suggestion, so I'd definitely recommend trying that. Need is apparently MUCH stronger than the others and is the only one that REFUSES to start a service until the dependency is met.

So try that.

Meanwhile, the above implies a documentation bug, as the differences between use/after/need should be clear and explicit.

Additionally, while the runscript manpage is a reasonable place for the documentation, I always have to hunt for it and get confused, because the first places I look are the various rc manpages. The correct manpage reference should be added to the comments in rc.conf (under # SERVICE CONFIGURATION VARIABLES) at a minimum, and preferably the rc, rc_config, rc.eselect, rc-service, etc. manpages should be updated to include the runscript manpage in their "related" lists. (FWIW, those are all the manpages I looked at this time before I found runscript. It seems there's so much /developer/ documentation, the C-language calls, etc., that the user documentation gets lost... Maybe /that's/ a bug too, and most of it should be conditional on USE=docs.)

Should I file that as a different bug, or, since this one may resolve to a misunderstanding of the after/need difference, is it enough? (Or is it NOTABUG, don't bother?)
(In reply to comment #4)
> 1. Edit portmap/rpcbind to have 'use net.lo' instead of 'use net'
> 2. Repeat test to see if result order still has problems.

I changed 'net' to 'net.lo' in /etc/init.d/portmap and /etc/init.d/ypbind (the former has a "use" dep, the latter has a "need" dep). The behaviour is unchanged (exactly the same output as the original report).

> 3. Change the ypbind 'rc_after="net.eth0"' to 'rc_need="net.eth0"'
> 4. Repeat test.

I did this on top of the earlier changes. The behaviour remains unchanged.

(In reply to comment #5)
> I've always been a bit unclear as to where "after" fits in between use and
> need. (Need is clearly stronger than use, which can be read as "use if
> available/convenient", meaning start it first if it's in the same runlevel,
> but after??)

At system startup time, I believe there is no difference between after and use. However, the differences are apparent when you run init scripts after the system is booted. If bar has the dependency "use foo", and foo is in the current runlevel, then starting bar will *cause* foo to start as well.

In the case of ypbind, the 'use' semantics are too strong: while on my system ypbind must be started after net.eth0 at boot time, net.eth0 may be stopped (and other interfaces configured) after the fact and ypbind should not force net.eth0 to start.

> However, I did once have a problem similar to this with "after" that was
> resolved by switching the dependency to "need" upon Roy's suggestion, so I'd
> definitely recommend trying that. Need is apparently MUCH stronger than the
> others and is the only one that REFUSES to start a service until the
> dependency is met.
>
> So try that.

I have tried it (see above), but also note that need is _absolutely_ not what we want in the case of ypbind: this means that stopping (or restarting) net.eth0 will stop (or restart) ypbind! Stopping or restarting ypbind causes all user sessions on the system to be killed, which is not cool.
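The three dependency keywords compared in this comment can be put side by side in a depend() function. The following is a minimal sketch using the ypbind dependencies listed earlier in this bug. The need, use and after functions defined at the top are hypothetical stubs that only record their arguments, so that the snippet runs as plain sh; inside a real init script OpenRC supplies them. The comments paraphrase the behaviour described in this thread.

```shell
# Hypothetical stubs standing in for OpenRC's keywords (illustration only).
deps=""
need()  { deps="$deps need:$*"; }
use()   { deps="$deps use:$*"; }
after() { deps="$deps after:$*"; }

depend() {
    # need: ypbind refuses to start until these are up, and stopping or
    # restarting one of them also stops or restarts ypbind.
    need net portmap
    # use: started first if present in the runlevel; starting ypbind by
    # hand also starts these if they are in the current runlevel.
    use ypserv domainname
    # after: ordering only; ypbind never forces net.eth0 to start.
    after net.eth0
}

depend
echo "declared:$deps"
```

This is why rc_after="net.eth0" fits ypbind here: it orders the two services at boot without tying ypbind's lifetime to the interface.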
Nick: can you please capture your full startup boot log for us, and attach it?
Created attachment 257234 [details]
rc.log

Here's the openrc boot log. After logging in, running "rc" with no arguments will successfully start both remaining network services (net.tap0 and net.br0).
One more file: /lib/rc/init.d/deptree Also, when you run rc at the end to start those two services manually, does it recompute the above file? If so, I want that copy too.
Nick: reping for the file I need. If there is no response in 48 hours (7 days from my request), I'm going to close as NEEDINFO.
Created attachment 259634 [details]
/lib/rc/init.d/deptree

(In reply to comment #9)
> Also, when you run rc at the end to start those two services manually, does it
> recompute the above file? If so, I want that copy too.

It does recompute it (I think...): the line "Caching service dependencies" is printed when I run 'rc' and the mtime on /lib/rc/init.d/deptree is updated. However, the file's contents are not changed by this.
Ok, I just added some new functionality to OpenRC's internal rc-depend binary to help debug this. Working with submitted deptree files used to be annoying, but rc-depend now has a new option, -F, to use them instead of the system copy. It's not quite perfect, but it does help a lot; I still need to build some more debug tools.

The interesting thing that has turned up, based on the deptree file you gave me, is a potential conflict in how you're mixing stable vs unstable.

  # rc-depend -F /tmp/user-deptree net.br0
  sysfs udev-mount udev fsck localmount net.lo portmap ypbind net.tap0 net.br0

Can you please upgrade your NIS stack, so that you are using the new net-nds/rpcbind package as your portmap provider? You will need nfs-utils-1.1.6-r1 or newer. Then retest, upload your new deptree file, and tell me what you have in each of your runlevels.
your code here too has style problems. you need to trim trailing whitespace, and you need to use spaces after "if". a quick survey of other commits by you show similar problems. please review your own code and fix the problems you've introduced all over. i guess we'll need to implement a git hook to reject commits with this kind of crap in it.
Also, prior to doing the rpcbind suggestion, I have one more related idea to test:

  rc_net_lo_provide="!net"
  rc_depend_strict="NO"

Then have ypbind just depending on net.
(In reply to comment #14)
> Also, prior to doing the rpcbind suggestion, I have one more related idea to
> test:
>
> rc_net_lo_provide="!net"
> rc_depend_strict="NO"
>
> Then have ypbind just depending on net.

Not sure exactly what you mean by this last bit, but I added rc_net_lo_provide="!net" to rc.conf (I already have rc_depend_strict="NO"), and removed rc_after="net.eth0" from /etc/conf.d/ypbind. The behaviour is unchanged.

(In reply to comment #12)
> Can you please upgrade your NIS stack, so that you are using the new
> net-nds/rpcbind package as your portmap provider. You will need
> nfs-utils-1.1.6-r1 or newer. Then retest, and upload your new deptree file,
> as well as tell me what you have in each of your runlevels.

I reverted the above changes and upgraded net-fs/nfs-utils to version 1.2.3-r1, net-nds/ypbind to version 1.32, and replaced net-nds/portmap with net-nds/rpcbind-0.2.0. The behaviour is exactly the same as before, except s/portmap/rpcbind/ in the error messages. Will follow up with attachments.
Created attachment 260180 [details] deptree with upgraded bits.
Created attachment 260181 [details] Services in each runlevel. This is the output of rc-status -a after boot.
Created attachment 260184 [details] deptree with upgraded bits (v2). I realized that I forgot to run etc-update after upgrading, so I had outdated stuff in init.d. Upgrading those didn't fix anything, but here's the _correct_ deptree file.
Created attachment 260185 [details] Services in each runlevel (v2). Likewise for the runlevels.
This is getting really annoying now. portmap/rpcbind should come up at this point. (In fact, it should have come up in the boot logs when it had net.lo.)

The following should be a perfectly valid order for your system to start:

  net.lo
  rpcbind/portmap
  net.eth0
  ypbind
  net.tap0
  net.br0

Yet it doesn't work, and I cannot figure out why. I'm writing some tools to graph based on your data now.

1. Explicitly set rpcbind/portmap to 'need net.lo' instead of 'need net'.
   Set these:
     rc_net_lo_provide="!net"
     rc_net_tap0_provide="!net"
     rc_net_br0_provide="!net"
     rc_depend_strict="NO"
2. Add net.tap0 to your default runlevel.
3. Upload the deptree+rc-status data again after both.
I'm not quite done with the graphing tool, but it's making good progress.

http://dev.gentoo.org/~robbat2/bug337140-openrc-init-graph.png

  Green: "iuse"
  Red:   "ineed"
  Blue:  "ibefore"
  Cyan:  "iafter" (disabled presently)
  Black: "iprovide"

Boxes are init.d services. Diamonds are virtual "provided" services.

For visibility reasons, I've presently disabled all non-provide lines that would leave the clusters. I need to figure out a better way to integrate the virtual services where they cross runlevels and have multiple providers.
Ok, your system DOES have a circular dependency in it. And my suggestion of making only net.eth0 provide net should fix the problem for you. I'm uploading two graphs now, first one is your existing init deps, with net.lo not providing net. Second one is making ONLY net.eth0 provide net.
Created attachment 260299 [details] bug337140-circular-net-dep-graph.gif
Created attachment 260301 [details] bug337140-net-only-by-net.eth0-fixup.gif
Okay, cool. I put

  rc_provide="!net"
  rc_net_lo_provide="net"

in /etc/rc.conf, and all my problems went away.

So, if I'm understanding things correctly, it seems that openrc is treating "provides" as a strict dependency relationship -- i.e., that "net" depends on all the things which provide it (it appears to be behaving similarly to an "after" relationship, which I suppose explains all my problems?).

This seems counter-intuitive, at least when rc_depend_strict="NO". The documentation for rc_depend_strict says:

  # Do we allow any started service in the runlevel to satisfy the depedency or
  # do we want all of them regardless of state? For example, if net.eth0 and
  # net.eth1 are in the default runlevel then with rc_depend_strict="NO" both
  # will be started, but services that depend on 'net' will work if either one
  # comes up. With rc_depend_strict="YES" we would require them both to come
  # up.

which doesn't suggest that when it's set to "NO" there's an implied "after" relationship (or something like that) between net and the various interfaces.
Can you please test the following combinations:

1. rc_provide="!net"
   rc_net_lo_provide="net"
   rc_net_eth0_provide="net"

2. rc_provide="!net"
   rc_net_eth0_provide="net"

3. rc_net_tap0_provide="!net"
   rc_net_br0_provide="!net"

4. rc_net_eth0_provide="!net"
   rc_net_tap0_provide="!net"
   rc_net_br0_provide="!net"

Setting rc_depend_strict="NO" just means that ANY of the providers can be used to satisfy the dependency. However, in your case, two of the providers (tap+br) introduce a circular dependency if they are used to satisfy the net requirement. Detecting that one of the other providers could avoid the circular dependency looks like it would require a lot more intelligence in processing the dependencies.

Ideally, if testcase #3 (or #4 to a lesser degree) works above, I'd promote that as the official solution, describing it in the documentation along with adding dependencies between interfaces.
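The circular dependency described above can be reproduced outside OpenRC with a small sketch. It assumes GNU coreutils' tsort is available, and it models the virtual 'net' service as a node that is ordered after each of its providers. That model is an assumption made for illustration, based on the behaviour observed in this bug; it is not a description of OpenRC's internal resolver.

```shell
#!/bin/sh
# Each line "A B" means "A must start before B".
# Edges taken from the dependencies described in this bug.
explicit_edges() {
    cat <<'EOF'
net portmap
net ypbind
portmap ypbind
net.eth0 ypbind
ypbind net.tap0
net.tap0 net.br0
EOF
}

# Case 1: only net.eth0 provides the virtual 'net' service.
net_from_eth0_only() {
    explicit_edges
    echo "net.eth0 net"
}

# Case 2 (assumed model): tap0 and br0 also provide 'net',
# so 'net' is ordered after them as well.
net_from_all() {
    net_from_eth0_only
    echo "net.tap0 net"
    echo "net.br0 net"
}

# tsort exits non-zero when the input contains a loop.
if net_from_eth0_only | tsort >/dev/null 2>&1; then eth0_only=ok; else eth0_only=loop; fi
if net_from_all | tsort >/dev/null 2>&1; then all_providers=ok; else all_providers=loop; fi
echo "eth0 only: $eth0_only; all providers: $all_providers"
```

In the second case the loop is net -> portmap -> ypbind -> net.tap0 -> net, which matches the "cannot start net.tap0 as portmap would not start" errors in the original report.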
[Sorry for the delay, I forgot that you had asked me to test things.]

Not surprisingly, all four of the above combinations work fine. Since the dependency resolution cannot handle this kind of "circular" dependency, I guess the rule goes something like:

  If a service (directly or indirectly) depends on 'net', that service must
  not itself provide 'net' (irrespective of the existence of other
  providers).

Replace 'net' with any other virtual service, if any others exist. Hence, the user currently has to add rc_blah_provide="!net" (or equivalent) for each service where this matters. Perhaps openrc could directly implement the rule, as in:

  For each service that depends on 'net'
    Remove that service from the list of providers of 'net'.

thus saving the user some hassle. (The above rule might very well be too simplistic, and I haven't really thought about how it interacts with services in more than one runlevel.)
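The proposed rule can be sketched on toy data as follows. Both lists are written by hand from the setup in this bug (net.tap0 needs ypbind, which needs net; net.br0 needs net.tap0); the variable names are hypothetical, and nothing here reflects how OpenRC stores providers internally.

```shell
#!/bin/sh
# Services that currently provide the virtual 'net' service.
providers="net.lo net.eth0 net.tap0 net.br0"
# Services with 'net' somewhere in their dependency chain (hand-written).
depends_on_net="net.tap0 net.br0"

filtered=""
for svc in $providers; do
    case " $depends_on_net " in
        *" $svc "*)
            # Depends on 'net', so it must not provide 'net' itself.
            ;;
        *)
            filtered="${filtered:+$filtered }$svc"
            ;;
    esac
done

echo "remaining providers of net: $filtered"
```

On this data the remaining providers are net.lo and net.eth0, the same effect as setting rc_net_tap0_provide="!net" and rc_net_br0_provide="!net" by hand.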
(In reply to comment #27)
> For each service that depends on 'net'
>   Remove that service from the list of providers of 'net'.
> thus saving the user some hassle. (The above rule might very well be
> too simplistic, and I haven't really thought about how it interacts with
> services in more than one runlevel).

It's too simplistic I fear.

1. It is possible for net.eth0 to depend on net.lo. With your proposal, this would leave the only provider of 'net' as net.lo, and then services which NEED outside connectivity would try to start before net.eth0 is up.
2. Related, what if the user wants the only provider of net to be the VPN device [1]?

[1] Usage case would be a network with no WAN connection, but just a VPN endpoint that authorized users MUST connect to, and tunnel all their WAN traffic over. That would be: net.tap0 -> net.eth0, with net.tap0 being the correct 'net' provider.

Thanks for the followup, I'm going to put this into the documentation.
(In reply to comment #28)
> It's too simplistic I fear.
> 1. It is possible for net.eth0 to depend on net.lo. With your proposal, this
> would leave the only provider of 'net' as net.lo, and then services which NEED
> outside connectivity would try to start before net.eth0 is up.
> 2. Related, what if the user wants the only provider of net to be the VPN
> device [1]
>
> [1] Usage case would be a network with no WAN connection, but just a VPN
> endpoint that authorized users MUST connect to, and tunnel all their WAN
> traffic over. That would be: net.tap0 -> net.eth0, with net.tap0 being the
> correct 'net' provider.

When I say "depends on 'net'" I mean if there's an actual "need net" (or "use net" or whatever) in the dependency chain, rather than a "need net.lo" or "need net.eth0". So if net.eth0 depends on net.lo, we don't exclude it. Similarly for the VPN, we don't exclude net.tap0 just because it depends on net.eth0.

Nevertheless, I'm happy now that I've made net.lo the sole provider of net. I just add explicit dependencies to other services as required, and things work great.

> Thanks for the followup, I'm going to put this into the documentation.
Moving to openrc documentation tracker bug
I believe this bug is the appropriate place for this.

The following was reported in #gentoo by DusanC to work for KVM and bridging. What follows is his /etc/conf.d/net; it is, of course, static and sets his own routes:

  bridge_br0="eth1"
  config_eth0="192.168.0.3/24"
  routes_eth0="default via 192.168.0.1"
  config_eth1="null"
  config_br0="192.168.1.3/24"
  rc_br0_provide="!net"

Laptops could, and likely do, have networking via at least both wireless and eth0. Does an example for that use case exist?

His rc.conf was about 3 lines, but I didn't save it. ;-(
(In reply to comment #31)
> Believe this bug appropriate.
>
> The following was reported in #gentoo via DusanC to work for KVM and bridging
> What follows is his /etc/conf.d/net of course this is static and setting his
> own routes
>
> bridge_br0="eth1"
> config_eth0="192.168.0.3/24"
> routes_eth0="default via 192.168.0.1"
> config_eth1="null"
> config_br0="192.168.1.3/24"
> rc_br0_provide="!net"

This is clearly wrong. Under this, both net.eth0 and net.eth1 are valid providers of 'net', but eth1, with its configuration of 'null', clearly wouldn't actually work. It should instead be:

  rc_br0_provide="!net"
  rc_eth1_provide="!net"

so that eth0 is the only valid provider.

> Laptops could/likely have networking via at the least both wireless and eth0
> Some example for use case exists?

Most of the time on a laptop, only one of your interfaces will start correctly (you tend to be either wireless XOR wired).
(In reply to comment #32)
> > Laptops could/likely have networking via at the least both wireless and eth0
> > Some example for use case exists?
> Most of the time on the laptop only one of your interfaces will start correctly
> (you tend to be either wireless XOR wired).

Speaking only for myself, I disagree with that statement. For example, when I'm at home, my wifi automatically connects. If I happen to sit at my desk and plug in the cable (sometimes for HD streaming, or whatever), I see no reason to go and actively disable the wifi (the wireless route has a higher metric, so new connections will use the wired interface anyway).

In this picture, the common cases are:

A. Only wifi connected.
B. Both wifi and wired connected.
Guys, the Gentoo handbook has been updated in two ways:

1/ The information pertaining to need/use/before/after is clarified some more. From the looks of it, it was already correctly described, but across two main sections. I have now put the descriptions in one section (bullet list) so that people get the impact of need/use/before/after more clearly.

2/ In the advanced networking chapter, I have added that which interfaces provide net needs to be stated explicitly as soon as you are working with multiple interfaces. The example was with a bridge over eth0 and eth1, so we put rc_net_eth0_provide="!net" and rc_net_eth1_provide="!net" in /etc/conf.d/net.

Does that sound good? Or do you expect some other particular changes?

Changes should show up on the site in an hour or so.
It's been a month since then. I'm going to drop the blocker on our openrc documentation tracker. If you still need documentation updates, don't hesitate to contact me or the GDP in general!
Robin, can we close this at this point? Thanks, William
Closing as resolved thanks to docs.