OpenRC softlevels work very well per se, and they also correctly honour and resolve all dependencies, provided that the softlevel directory (e.g. /etc/runlevels/xen) contains a direct symlink to the startup script in the /etc/init.d directory for every service to be started. Please see the listing below:

==== working example for /etc/runlevels/xen ====
lrwxrwxrwx 1 root root 18 Jun 24 23:08 cronie -> /etc/init.d/cronie
lrwxrwxrwx 1 root root 20 Jun 24 23:08 ebtables -> /etc/init.d/ebtables
lrwxrwxrwx 1 root root 22 Jun 24 23:08 icinga.snd -> /etc/init.d/icinga.snd
lrwxrwxrwx 1 root root 21 Jun 24 23:08 ip6tables -> /etc/init.d/ip6tables
lrwxrwxrwx 1 root root 20 Jun 24 23:08 iptables -> /etc/init.d/iptables
lrwxrwxrwx 1 root root 17 Jun 24 23:08 local -> /etc/init.d/local
lrwxrwxrwx 1 root root 17 Jun 24 23:08 mdadm -> /etc/init.d/mdadm
lrwxrwxrwx 1 root root 23 Jun 24 23:08 net.enp0s25 -> /etc/init.d/net.enp0s25
lrwxrwxrwx 1 root root 22 Jun 24 23:08 net.xenbr0 -> /etc/init.d/net.xenbr0
lrwxrwxrwx 1 root root 20 Jun 24 23:08 netmount -> /etc/init.d/netmount
lrwxrwxrwx 1 root root 15 Jun 24 23:08 nfs -> /etc/init.d/nfs
lrwxrwxrwx 1 root root 16 Jun 24 23:08 nrpe -> /etc/init.d/nrpe
lrwxrwxrwx 1 root root 16 Jun 24 23:08 ntpd -> /etc/init.d/ntpd
lrwxrwxrwx 1 root root 22 Jun 24 23:08 nullmailer -> /etc/init.d/nullmailer
lrwxrwxrwx 1 root root 18 Jun 24 23:08 smartd -> /etc/init.d/smartd
lrwxrwxrwx 1 root root 16 Jun 24 23:08 sshd -> /etc/init.d/sshd
lrwxrwxrwx 1 root root 21 Jun 24 23:08 syslog-ng -> /etc/init.d/syslog-ng
lrwxrwxrwx 1 root root 22 Jun 24 23:08 xencommons -> /etc/init.d/xencommons
lrwxrwxrwx 1 root root 23 Jun 24 23:08 xenconsoled -> /etc/init.d/xenconsoled
lrwxrwxrwx 1 root root 22 Jun 24 23:08 xendomains -> /etc/init.d/xendomains
lrwxrwxrwx 1 root root 22 Jun 24 23:08 xenqemudev -> /etc/init.d/xenqemudev
lrwxrwxrwx 1 root root 21 Jun 24 23:08 xenstored -> /etc/init.d/xenstored
==== end working example /etc/runlevels/xen ====

If, however, the softlevel
directory only contains a few additional services required for the runlevel and, for the rest, refers back to another existing runlevel (e.g. the "default" runlevel) by symlinking to that runlevel's directory, dependencies seem to go haywire and no longer result in the expected startup sequence. Please see the following two listings for a softlevel referring to another runlevel:

==== non-working example /etc/runlevels/xen ====
lrwxrwxrwx 1 root root 10 Dec 23 2013 default -> ../default
lrwxrwxrwx 1 root root 22 Jun 7 02:15 icinga.snd -> /etc/init.d/icinga.snd
lrwxrwxrwx 1 root root 15 Apr 30 22:09 nfs -> /etc/init.d/nfs
lrwxrwxrwx 1 root root 16 Apr 30 20:20 nrpe -> /etc/init.d/nrpe
lrwxrwxrwx 1 root root 22 Dec 23 2013 xencommons -> /etc/init.d/xencommons
lrwxrwxrwx 1 root root 23 Mar 10 21:55 xenconsoled -> /etc/init.d/xenconsoled
lrwxrwxrwx 1 root root 22 Dec 23 2013 xendomains -> /etc/init.d/xendomains
lrwxrwxrwx 1 root root 22 Dec 23 2013 xenqemudev -> /etc/init.d/xenqemudev
lrwxrwxrwx 1 root root 21 Dec 23 2013 xenstored -> /etc/init.d/xenstored
==== end non-working example /etc/runlevels/xen ====

==== referred /etc/runlevels/default ====
lrwxrwxrwx 1 root root 18 May 26 01:46 cronie -> /etc/init.d/cronie
lrwxrwxrwx 1 root root 20 Jun 18 12:33 ebtables -> /etc/init.d/ebtables
lrwxrwxrwx 1 root root 21 Jun 18 12:32 ip6tables -> /etc/init.d/ip6tables
lrwxrwxrwx 1 root root 20 Jun 18 12:32 iptables -> /etc/init.d/iptables
lrwxrwxrwx 1 root root 17 Dec 12 2013 local -> /etc/init.d/local
lrwxrwxrwx 1 root root 17 Dec 24 2013 mdadm -> /etc/init.d/mdadm
lrwxrwxrwx 1 root root 23 Apr 30 22:06 net.enp0s25 -> /etc/init.d/net.enp0s25
lrwxrwxrwx 1 root root 22 Apr 30 22:09 net.xenbr0 -> /etc/init.d/net.xenbr0
lrwxrwxrwx 1 root root 20 Dec 12 2013 netmount -> /etc/init.d/netmount
lrwxrwxrwx 1 root root 16 Dec 22 2013 ntpd -> /etc/init.d/ntpd
lrwxrwxrwx 1 root root 22 Apr 25 21:11 nullmailer -> /etc/init.d/nullmailer
lrwxrwxrwx 1 root root 18 Dec 22 2013 smartd -> /etc/init.d/smartd
lrwxrwxrwx 1 root root 16 Dec 20 2013 sshd -> /etc/init.d/sshd
lrwxrwxrwx 1 root root 21 Dec 20 2013 syslog-ng -> /etc/init.d/syslog-ng
==== end /etc/runlevels/default ====

I was not really able to extract a pattern in the startup sequence, but it differs significantly from the (working) example at the beginning, as can be seen in the following side-by-side comparison. The list is taken from rc-status, and the numbers give the position of each service in the startup sequence for both the working and the non-working case (NOTE: I have deleted the "[ started ]" output in the list below).

==== comparison of startup sequence ====
rc-status         startup sequence
Runlevel: xen     working  non-working
 ebtables               1            2
 ip6tables              2           11
 iptables               3           12
 net.enp0s25            4            3
 net.xenbr0             5            4
 syslog-ng              6           22
 xencommons             8            5
 xenstored              9            6
 nfs                   10            7
 ntpd                  11           18
 xenconsoled           12            8
 xendomains            13            9
 icinga.snd            14           10
 nrpe                  17           13
 xenqemudev            21           14
 cronie                 7            1
 mdadm                 15           16
 netmount              16           17
 nullmailer            18           19
 smartd                19           20
 sshd                  20           21
 local                 22           15
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
 rpcbind
 rpc.pipefs
 rpc.idmapd
 rpc.statd
Dynamic Runlevel: manual
==== end rc-status ====

Please also NOTE: For the working case the startup sequence is pretty much in line with the output of the "rc-status" command within the started environment, although there are a few sequence deviations as well which, if required, I can probably change with a bit of tweaking on my side.
Furthermore, please find the complete output of the rc-status command when started in the non-working example:

==== rc-status non working ====
Runlevel: xen.old
 xencommons       [ started ]
 xenstored        [ started ]
 nfs              [ started ]
 xenconsoled      [ started ]
 xendomains       [ started ]
 xenqemudev       [ started ]
 nrpe             [ started ]
 icinga.snd       [ started ]
Stacked Runlevel: default
 ebtables         [ started ]
 ip6tables        [ started ]
 iptables         [ started ]
 net.enp0s25      [ started ]
 net.xenbr0       [ started ]
 netmount         [ started ]
 syslog-ng        [ started ]
 cronie           [ started ]
 ntpd             [ started ]
 mdadm            [ started ]
 nullmailer       [ started ]
 smartd           [ started ]
 sshd             [ started ]
 local            [ started ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
 net.enp0s25      [ started ]
 net.xenbr0       [ started ]
 rpcbind          [ started ]
 rpc.pipefs       [ started ]
 rpc.idmapd       [ started ]
 rpc.statd        [ started ]
Dynamic Runlevel: manual
 syslog-ng        [ started ]
 mdadm            [ started ]
 netmount         [ started ]
 cronie           [ started ]
 ebtables         [ started ]
 ip6tables        [ started ]
 iptables         [ started ]
 ntpd             [ started ]
 nullmailer       [ started ]
 smartd           [ started ]
 sshd             [ started ]
 local            [ started ]
==== end rc-status ====

Expected behaviour: Identical startup sequence in both cases. If this does not work as expected, it becomes very difficult to manage numerous softlevels that differ only in a few additional services on top of a defined base (the "default" runlevel).
You didn't post your emerge --info, so I am assuming you are using openrc-0.12.4. Yes, there are some issues with stacked runlevels in that version. Can you please test with openrc-9999? This will become openrc-0.13, and there are fixes there for stacked runlevels. Thanks, William
(In reply to William Hubbs from comment #1)

Hello William,

first of all, many thanks for your quick reply.

> You didn't post your emerge --info, so I am assuming you are using
> openrc-0.12.4. Yes, there are some issues with stacked runlevels in that
> version.

You are right, I use openrc-0.12.4, and apologies for not posting emerge --info.

> Can you please test with openrc-9999? This will become openrc-0.13, and
> there are fixes there for stacked runlevels.

I have given it a go and, as expected, received a lot of warnings (in yellow) related to the rename of runscript. There have been no changes to both runlevels (the one called working w/o stacking and the other named non-working using default as a stacked runlevel). Other than that, there are new results as follows:

==== comparision of startup sequence ====
                  OpenRC-0.12.4          OpenRC-9999
rc-status         startup sequence       startup sequence
Runlevel: xen     working  non-working   working  non-working
 ebtables               1            2         1            1
 ip6tables              2           11         2            2
 iptables               3           12         3            3
 net.enp0s25            4            3         4            4
 net.xenbr0             5            4         5            5
 syslog-ng              6           22         6            6
 xencommons             8            5        15            8
 xenstored              9            6        16            9
 nfs                   10            7        17           10
 ntpd                  11           18        10           11
 xenconsoled           12            8        18           12
 xendomains            13            9        19           13
 icinga.snd            14           10        20           14
 nrpe                  17           13        21           17
 xenqemudev            21           14        22           21
 cronie                 7            1         7            7
 mdadm                 15           16         8           15
 netmount              16           17         9           16
 nullmailer            18           19        11           18
 smartd                19           20        12           19
 sshd                  20           21        13           20
 local                 22           15        14           22
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
 rpcbind
 rpc.pipefs
 rpc.idmapd
 rpc.statd
Dynamic Runlevel: manual
==== end comparision startup sequence ====

It appears that the non-working (stacked) runlevel under the new version completely mimics the old working version, so that looks good. On the other hand, if I run the old working version with the new OpenRC-9999 version, I get different results - please compare the two columns named "working" for differences.
To me, that indicates that there is still something wrong somewhere, as it seems reasonable to expect identical startup sequences with the same version of OpenRC, with or without stacked runlevels, as long as the services in both runlevels are identical.

Furthermore, there now seems to be a bug in rc-status, because it loops forever when asked to provide info about the non-working version (regardless of whether booted from the non-working runlevel, or from the working runlevel with the former given as an argument to rc-status). The output is as follows:

==== output of rc-status for non-working ====
Runlevel: xen.old
 xencommons       [ started ]
 xenstored        [ started ]
 nfs              [ started ]
 xenconsoled      [ started ]
 xendomains       [ started ]
 xenqemudev       [ started ]
 nrpe             [ started ]
 icinga.snd       [ started ]
*Stacked Runlevel: default
*ebtables         [ started ]
*ip6tables        [ started ]
*iptables         [ started ]
*net.enp0s25      [ started ]
*net.xenbr0       [ started ]
*netmount         [ started ]
*syslog-ng        [ started ]
*cronie           [ started ]
*ntpd             [ started ]
*mdadm            [ started ]
*nullmailer       [ started ]
*smartd           [ started ]
*sshd             [ started ]
*local            [ started ]
==== end of rc-status ====

NOTE: Those lines marked with "*" (which is not part of the output but rather my manual marker) will be repeated forever.

Best regards
KK

P.S. I hope your hardware issues are resolved ...
Thanks for asking, everything seems to be working now.

I just did the following on my system.

1) mkdir /etc/runlevels/test
2) rc-update add -s default test # This adds the default runlevel to the test runlevel.
3) add a couple of extra services to the test runlevel.

Once I did this, I was able to switch freely between default and test without issues by running:

"openrc default" or "openrc test".

Can you try this with your runlevels?
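The three steps above could be wrapped in a small script. The following is only a sketch under stated assumptions: the helper names run and make_stacked_runlevel are made up for illustration, nfs and nrpe are placeholder services (borrowed from the reporter's runlevel), mkdir -p is used instead of plain mkdir so a rerun does not fail, and by default the script only prints the commands (set DRY_RUN=0 and run as root to actually execute them).

```shell
#!/bin/sh
# Hypothetical helper: print each command instead of running it,
# unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: make_stacked_runlevel NEW BASE [SERVICE...]
make_stacked_runlevel() {
    new=$1
    base=$2
    shift 2
    run mkdir -p "/etc/runlevels/$new"      # step 1: create the runlevel directory
    run rc-update add -s "$base" "$new"     # step 2: stack BASE inside NEW
    for svc in "$@"; do                     # step 3: add the extra services
        run rc-update add "$svc" "$new"
    done
}

# Placeholder invocation; in dry-run mode this only prints four commands.
make_stacked_runlevel test default nfs nrpe
```

Switching between the runlevels afterwards would be done with "openrc default" or "openrc test", as described above.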
(In reply to William Hubbs from comment #3)
> I just did the following on my system.
>
> 1) mkdir /etc/runlevels/test
> 2) rc-update add -s default test # This adds the default runlevel to the test
> runlevel.
> 3) add a couple of extra services to the test runlevel.
>
> Once I did this, I was able to switch freely between default and test without
> issues by running:
>
> "openrc default" or "openrc test".
>
> Can you try this with your runlevels?

I have to admit I am slightly confused and do not really understand what you want to achieve by having me try this. Though this might be due to a misunderstanding of my previous comment (probably the line stating "There have been no changes to both runlevels", which could easily be misread). So please bear with me while I try to reiterate the main points of my last test:

An upfront NOTE: The term 'scenario' used further below is independent of the OpenRC version. A scenario just describes how services are linked to a softlevel/runlevel directory.

1.) With OpenRC-9999 the startup sequence using a stacked runlevel (i.e. including another symlinked runlevel, that is a directory from /etc/runlevels - I have called this scenario "non-working") is *identical* to OpenRC-0.12.4 with all services individually linked (i.e. no stacked runlevels, just symlinked files from /etc/init.d; I have called this scenario "working"). In other words: the startup sequence for scenario "non-working" using OpenRC-9999 _is_ identical to the startup sequence for scenario "working" on OpenRC-0.12.4. That's good and expected, and it resolves my initial problem once OpenRC-0.13 is out. For the remaining points, we can therefore safely disregard the startup sequence for scenario "non-working" using OpenRC-0.12.4, as this is confirmed to be broken.

2.) While doing the tests, I have also observed that with OpenRC-9999 the startup sequence does, however, differ when comparing scenario "non-working" with scenario "working" (NOTE: both using OpenRC-9999; there is no OpenRC-0.12.4 involved in this comparison). I do not think that this is expected behaviour. For details, please see the penultimate table with a heading reading "==== comparision of startup sequence ====" in my previous comment and compare the two rightmost columns describing the startup sequence for scenarios "working" and "non-working", both using OpenRC-9999.

3.) Finally, I also observed that OpenRC-9999's rc-status command loops forever, with information constantly being printed to the screen, in (at least) the following two cases:
a) if booted in softlevel "non-working" and invoked as "rc-status"
b) if booted in softlevel "working" and invoked as "rc-status non-working"
Details about the output and the repetitive pattern of lines are available in the last table of my previous comment.

I hope that clarifies my previous comment. In case I misunderstood you and you still require me to try your steps, I'd very much appreciate a quick explanation of what you are trying to achieve with that.

Many thanks and keep up the excellent work with OpenRC - I do not want to move over to systemd any time at all.

KK
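As a side illustration of the two scenarios described above ("working" = every service linked individually, "non-working" = a few extra services plus a symlink to another runlevel directory), the following shell sketch builds both layouts in a scratch directory and checks that they resolve to the same set of services. It is a toy under stated assumptions: the directory names flat and stacked, the helper list_services and the shortened service list are made up for the example, nothing under /etc is touched, and it says nothing about the order in which OpenRC actually starts the services.

```shell
#!/bin/sh
# Scratch area standing in for /etc/init.d and /etc/runlevels (assumption:
# only the symlink layout matters for this illustration).
root=$(mktemp -d)
mkdir -p "$root/init.d" "$root/runlevels/default" \
         "$root/runlevels/flat" "$root/runlevels/stacked"

# Empty files stand in for the real init scripts.
for svc in sshd ntpd xenstored xendomains; do
    : > "$root/init.d/$svc"
done

# Base runlevel "default".
for svc in sshd ntpd; do
    ln -s "$root/init.d/$svc" "$root/runlevels/default/$svc"
done

# Scenario "working": every service is linked individually.
for svc in sshd ntpd xenstored xendomains; do
    ln -s "$root/init.d/$svc" "$root/runlevels/flat/$svc"
done

# Scenario "non-working": extra services plus a symlink to the base runlevel.
for svc in xenstored xendomains; do
    ln -s "$root/init.d/$svc" "$root/runlevels/stacked/$svc"
done
ln -s ../default "$root/runlevels/stacked/default"

# Print the service names reachable from a runlevel directory, one per line.
# find -L follows symlinks, so a stacked runlevel directory is expanded too.
list_services() {
    find -L "$1" -type f | sed 's|.*/||' | sort -u
}

flat=$(list_services "$root/runlevels/flat")
stacked=$(list_services "$root/runlevels/stacked")

if [ "$flat" = "$stacked" ]; then
    echo "same service set"
else
    echo "service sets differ"
fi
# Remove the scratch directory with: rm -rf "$root"
```

Since both layouts name the same services, it seems reasonable to expect the same startup sequence from both, which is the expectation stated in point 2.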
I am narrowing this bug down to one issue which I was able to reproduce here, along with the steps for that issue. If there is a separate issue, can you please open another bug?

The issue I will tie to this bug is this:

If a runlevel has a stacked runlevel inside it, rc-status goes into an infinite loop when you ask for information about that runlevel.

To reproduce:

1. mkdir /etc/runlevels/test
2. add some services to the test runlevel using rc-update
3. add the default runlevel to the test runlevel using: rc-update add -s default test
4. run rc-status test and observe the loop.
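Step 4 can be checked without watching the terminal by putting a time limit on the command. This is a minimal sketch, assuming timeout(1) from GNU coreutils is available (it exits with status 124 when it had to stop the command); the helper name runs_past_limit and the 5-second limit are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical helper: runs_past_limit SECONDS COMMAND [ARG...]
# Returns 0 if COMMAND was still running after SECONDS and had to be stopped
# by timeout (treated here as "probably looping"), non-zero otherwise.
runs_past_limit() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    [ $? -eq 124 ]
}

# Intended use on an affected system (not executed here):
#   runs_past_limit 5 rc-status test && echo "rc-status did not terminate"
```

A command that is merely slow would also trip the limit, so the result is only a hint that the loop is present.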
I just used git bisect to track this down, and it reports that commit 7716bf31 is the commit that introduced the issue.
It looks like print_stacked_services() in rc-status.c is where the issue is. This function goes into an infinite loop for some reason.
Hello Max,

can you please assist with this bug? It is a regression that was somehow introduced as part of the fix for bug #467368.

Thanks much,

William
Hi William.

No worries. I'm pretty busy at the moment but I'll try to have a look by the end of the week.

All the servers in our cluster are running openrc-0.11.8, which does not seem to exhibit this problem. See test below...

# rc-status
Runlevel: extended
 xenstored        [ started ]
 puppet           [ started ]
 mcollectived     [ started ]
 xenconsoled      [ started ]
 xendomains       [ started ]
 corosync         [ started ]
 pacemaker        [ started ]
Stacked Runlevel: default
 snmpd            [ started ]
 lm_sensors       [ started ]
 sensord          [ started ]
 rpc.pipefs       [ started ]
 ntp-client       [ started ]
 rpcbind          [ started ]
 rpc.statd        [ started ]
 rpc.idmapd       [ started ]
 nfsmount         [ started ]
 ceph             [ started ]
 autofs           [ started ]
 ntpd             [ started ]
 dhcpd            [ started ]
 netmount         [ started ]
 vixie-cron       [ started ]
 local            [ started ]
Stacked Runlevel: core
 iptables         [ started ]
 openib           [ started ]
 net.ib0          [ started ]
 net.eth0         [ started ]
 syslog-ng        [ started ]
 sshd             [ started ]
 opensm           [ started ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
Dynamic Runlevel: manual

# ls /etc/runlevels/extended/ -l
total 0
lrwxrwxrwx 1 root root 20 Sep 19 2013 corosync -> /etc/init.d/corosync
lrwxrwxrwx 1 root root 10 Apr 25 2013 default -> ../default
lrwxrwxrwx 1 root root 24 Aug 19 2013 mcollectived -> /etc/init.d/mcollectived
lrwxrwxrwx 1 root root 21 Sep 19 2013 pacemaker -> /etc/init.d/pacemaker
lrwxrwxrwx 1 root root 18 Aug 15 2013 puppet -> /etc/init.d/puppet
lrwxrwxrwx 1 root root 23 Apr 23 2013 xenconsoled -> /etc/init.d/xenconsoled
lrwxrwxrwx 1 root root 22 Aug 31 2013 xendomains -> /etc/init.d/xendomains
lrwxrwxrwx 1 root root 21 Apr 23 2013 xenstored -> /etc/init.d/xenstored

# ls /etc/runlevels/default/ -l
total 0
lrwxrwxrwx 1 root root 18 Aug 20 2010 autofs -> /etc/init.d/autofs
lrwxrwxrwx 1 root root 16 Sep 2 2013 ceph -> /etc/init.d/ceph
lrwxrwxrwx 1 root root 7 Apr 25 2013 core -> ../core
lrwxrwxrwx 1 root root 17 Sep 12 2013 dhcpd -> /etc/init.d/dhcpd
lrwxrwxrwx 1 root root 22 Mar 23 2010 lm_sensors -> /etc/init.d/lm_sensors
lrwxrwxrwx 1 root root 17 Jun 9 2011 local -> /etc/init.d/local
lrwxrwxrwx 1 root root 20 Jun 9 2011 netmount -> /etc/init.d/netmount
lrwxrwxrwx 1 root root 20 Apr 24 2012 nfsmount -> /etc/init.d/nfsmount
lrwxrwxrwx 1 root root 22 Aug 15 2013 ntp-client -> /etc/init.d/ntp-client
lrwxrwxrwx 1 root root 16 Sep 15 2009 ntpd -> /etc/init.d/ntpd
lrwxrwxrwx 1 root root 19 May 9 2013 rpcbind -> /etc/init.d/rpcbind
lrwxrwxrwx 1 root root 22 Aug 31 2013 rpc.idmapd -> /etc/init.d/rpc.idmapd
lrwxrwxrwx 1 root root 22 Aug 31 2013 rpc.pipefs -> /etc/init.d/rpc.pipefs
lrwxrwxrwx 1 root root 21 Apr 13 2009 rpc.statd -> /etc/init.d/rpc.statd
lrwxrwxrwx 1 root root 19 Mar 23 2010 sensord -> /etc/init.d/sensord
lrwxrwxrwx 1 root root 17 Aug 20 2011 snmpd -> /etc/init.d/snmpd
lrwxrwxrwx 1 root root 22 Oct 28 2011 vixie-cron -> /etc/init.d/vixie-cron

I shall try to build the latest GIT version, install it in a VM and see what happens.
Sorry to reply to myself so soon but I felt I should clarify that our servers are using openrc-0.11.8 ___from our overlay___ (hacking-gentoo in layman) in what we refer to as "replace mode". It would really help to narrow this down if you could tell me if this issue is present in the various versions from our overlay as the patches applied to the GIT version are not exactly as they are in our version (specifically openrc-0.11.8-default_runlevel.patch is not included). Thanks!
(In reply to Max Hacking from comment #10)
> It would really help to narrow this down if you could tell me if this issue
> is present in the various versions from our overlay as the patches applied
> to the GIT version are not exactly as they are in our version (specifically
> openrc-0.11.8-default_runlevel.patch is not included).

Is this the patch I remember seeing a while back that reads the default runlevel from /etc/inittab? If it is, that should not affect this bug.
(In reply to William Hubbs from comment #11)
> (In reply to Max Hacking from comment #10)
> > It would really help to narrow this down if you could tell me if this issue
> > is present in the various versions from our overlay as the patches applied
> > to the GIT version are not exactly as they are in our version (specifically
> > openrc-0.11.8-default_runlevel.patch is not included).
>
> Is this the patch I remember seeing a while back that reads the default
> runlevel from /etc/inittab? If it is, that should not affect this bug.

Yes, although the patches on that ebuild are the ones I submitted before the "cleanup" of rc_get_runlevel_chain() and rc_runlevel_stacks(), have been well tested on a large cluster and, as far as I'm concerned, are working fine.

I'm interested to see if the problem can be replicated with the same ebuild I'm using as so far I can't seem to do so. If I can't then that kind of implies that the patches got broken some how after I submitted them (presumably during the above mentioned "cleanup", assuming that ever happened).

Sorry if that is a pain to test (you don't need to use "replace mode" for everything, just open-rc).
(In reply to Max Hacking from comment #12)
> (In reply to William Hubbs from comment #11)
> > (In reply to Max Hacking from comment #10)
> > > It would really help to narrow this down if you could tell me if this issue
> > > is present in the various versions from our overlay as the patches applied
> > > to the GIT version are not exactly as they are in our version (specifically
> > > openrc-0.11.8-default_runlevel.patch is not included).
> >
> > Is this the patch I remember seeing a while back that reads the default
> > runlevel from /etc/inittab? If it is, that should not affect this bug.
>
> Yes, although the patches on that ebuild are the ones I submitted before the
> "cleanup" of rc_get_runlevel_chain() and rc_runlevel_stacks(), have been
> well tested on a large cluster and, as far as I'm concerned, are working
> fine.

I can tell you that OpenRc-0.12.4, which is current stable, doesn't have the issue; I pointed to the commit in git that introduced it above in comment #6.

Can you please pull that commit and tell me what we changed in comparison to your patch, other than the style changes? That might help me with finding where the issue is. From the commit log, I don't specifically remember rewriting anything, but I'm not sure how much Alexander rewrote.

> I'm interested to see if the problem can be replicated with the same ebuild
> I'm using as so far I can't seem to do so. If I can't then that kind of
> implies that the patches got broken some how after I submitted them
> (presumably during the above mentioned "cleanup", assuming that ever
> happened).
>
> Sorry if that is a pain to test (you don't need to use "replace mode" for
> everything, just open-rc).

I don't have a vm anywhere and I just run on one box, so I tend to hesitate to go back that far with OpenRc.

Thanks for your understanding.

William
This has been fixed in commit 40f42ce and will be included in OpenRc-0.13.