Summary: | sys-apps/openrc-9999: rc-status loops forever when the given runlevel has a stacked runlevel inside it | |
---|---|---|---
Product: | Gentoo Hosted Projects | Reporter: | KK <klaus.kreil>
Component: | OpenRC | Assignee: | OpenRC Team <openrc>
Status: | RESOLVED FIXED | |
Severity: | normal | CC: | klaus.kreil, max.gentoo.bugzilla, whissi
Priority: | Normal | |
Version: | unspecified | |
Hardware: | All | |
OS: | Linux | |
Whiteboard: | | |
Package list: | | Runtime testing required: | ---
Bug Depends on: | | |
Bug Blocks: | 481182 | |
Description (KK, 2014-06-24 23:31:52 UTC)
(Comment by William Hubbs)

You didn't post your emerge --info, so I am assuming you are using openrc-0.12.4. Yes, there are some issues with stacked runlevels in that version.

Can you please test with openrc-9999? This will become openrc-0.13, and there are fixes there for stacked runlevels.

Thanks,

William

(In reply to William Hubbs from comment #1)

Hallo William,

first of all, many thanks for your quick reply.

> You didn't post your emerge --info, so I am assuming you are using
> openrc-0.12.4. Yes, there are some issues with stacked runlevels in that
> version.

You are right, I use openrc-0.12.4, and apologies for not posting emerge --info.

> Can you please test with openrc-9999? This will become openrc-0.13, and
> there are fixes there for stacked runlevels.

I have given it a go and, as expected, received a lot of warnings (in yellow) related to the rename of runscript. No changes were made to either runlevel (the one called "working", without stacking, and the other named "non-working", which uses default as a stacked runlevel). Other than that, the new results are as follows:

==== comparison of startup sequence ====

                 OpenRC-0.12.4            OpenRC-9999
rc-status        startup sequence         startup sequence
Runlevel: xen    working  non-working     working  non-working
 ebtables           1         2              1         1
 ip6tables          2        11              2         2
 iptables           3        12              3         3
 net.enp0s25        4         3              4         4
 net.xenbr0         5         4              5         5
 syslog-ng          6        22              6         6
 xencommons         8         5             15         8
 xenstored          9         6             16         9
 nfs               10         7             17        10
 ntpd              11        18             10        11
 xenconsoled       12         8             18        12
 xendomains        13         9             19        13
 icinga.snd        14        10             20        14
 nrpe              17        13             21        17
 xenqemudev        21        14             22        21
 cronie             7         1              7         7
 mdadm             15        16              8        15
 netmount          16        17              9        16
 nullmailer        18        19             11        18
 smartd            19        20             12        19
 sshd              20        21             13        20
 local             22        15             14        22
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
 rpcbind
 rpc.pipefs
 rpc.idmapd
 rpc.statd
Dynamic Runlevel: manual
==== end comparison of startup sequence ====

It appears that the new version completely mimics the old working version, so that looks good.
On the other hand, if I run the old working scenario with the new OpenRC-9999 version, I get different results; please compare the two columns named "working" for differences. That to me indicates that there's still something wrong somewhere, as it seems reasonable to expect identical startup sequences with the same version of OpenRC if one uses stacked runlevels, as long as the services in both runlevels are identical.

Furthermore, there now seems to be a bug in rc-status, because it now loops forever when asked to provide info about the non-working version (regardless of whether booted from the non-working runlevel, or from the working runlevel with the former provided as an argument to rc-status). The output is as follows:

==== output of rc-status for non-working ====
Runlevel: xen.old
 xencommons                 [ started ]
 xenstored                  [ started ]
 nfs                        [ started ]
 xenconsoled                [ started ]
 xendomains                 [ started ]
 xenqemudev                 [ started ]
 nrpe                       [ started ]
 icinga.snd                 [ started ]
*Stacked Runlevel: default
* ebtables                  [ started ]
* ip6tables                 [ started ]
* iptables                  [ started ]
* net.enp0s25               [ started ]
* net.xenbr0                [ started ]
* netmount                  [ started ]
* syslog-ng                 [ started ]
* cronie                    [ started ]
* ntpd                      [ started ]
* mdadm                     [ started ]
* nullmailer                [ started ]
* smartd                    [ started ]
* sshd                      [ started ]
* local                     [ started ]
==== end of rc-status ====

NOTE: The lines marked with "*" (which is not part of the output but rather my manual marker) are repeated forever.

Best regards
KK

P.S. I hope your hardware issues are resolved ...

(Comment by William Hubbs)

Thanks for asking, everything seems to be working now.

I just did the following on my system.

1) mkdir /etc/runlevels/test
2) rc-update add -s default test # This adds the default runlevel to the test runlevel.
3) add a couple of extra services to the test runlevel.

Once I did this, I was able to switch freely between default and test without issues by running "openrc default" or "openrc test".

Can you try this with your runlevels?
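For reference, the on-disk layout these steps produce can be inspected without touching the live system. The sketch below rebuilds it in a scratch directory in place of /etc/runlevels; per the directory listings later in this report, the stacking step boils down to a runlevel-directory symlink (the init.d targets here are only examples and may be dangling, which is fine for looking at the layout):

```shell
# Rebuild William's test layout in a scratch directory instead of
# /etc/runlevels, so nothing on the live system is touched.
root=$(mktemp -d)
mkdir -p "$root/default" "$root/test"

# Ordinary services in a runlevel are symlinks to init scripts.
ln -s /etc/init.d/sshd  "$root/default/sshd"
ln -s /etc/init.d/local "$root/default/local"

# Stacking ("rc-update add -s default test") results in a relative
# symlink from the test runlevel to the default runlevel directory.
ln -s ../default "$root/test/default"

ls -l "$root/test"
```

Because the stacked runlevel is just a directory symlink, `rc-status` (and anything else walking the runlevel tree) can tell services and stacked runlevels apart by whether the entry resolves to a directory.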
(In reply to William Hubbs from comment #3)

> I just did the following on my system.
>
> 1) mkdir /etc/runlevels/test
> 2) rc-update add -s default test # This adds the default runlevel to the test
> runlevel.
> 3) add a couple of extra services to the test runlevel.
>
> Once I did this, I was able to switch freely between default and test without
> issues by running:
>
> "openrc default" or "openrc test".
>
> Can you try this with your runlevels?

I have to admit, I am slightly confused and do not really understand what you want to arrive at by having me try this. This might be due to a misunderstanding of my previous comment (probably the line stating "There have been no changes to both runlevels", which could easily be misread). So please bear with me while I try to re-iterate the main points of my last test.

An upfront NOTE: The term "scenario" used further below is independent of the OpenRC version. A scenario just describes how services are linked to a softlevel/runlevel directory.

1.) With OpenRC-9999, the startup sequence using a stacked runlevel (i.e. including another symlinked runlevel, that is, a directory from /etc/runlevels; I have called this scenario "non-working") is *identical* to OpenRC-0.12.4 with all services individually linked (i.e. no stacked runlevels, just symlinked files from /etc/init.d; I have called this scenario "working"). In other words: the startup sequence for scenario "non-working" using OpenRC-9999 _is_ identical to the startup sequence for scenario "working" on OpenRC-0.12.4. That's good and expected, and resolves my initial problem once OpenRC-0.13 is out. For the remaining points, we can therefore safely disregard the startup sequence for scenario "non-working" using OpenRC-0.12.4, as this is confirmed to be broken.

2.)
While doing the tests, I have also observed that with OpenRC-9999 the startup sequence does differ when comparing scenario "non-working" with scenario "working" (NOTE: both using OpenRC-9999; there is no OpenRC-0.12.4 involved in this comparison). I do not think that this is expected behaviour. For details please see the penultimate table, headed "==== comparison of startup sequence ====", in my previous comment, and compare the two rightmost columns, which describe the startup sequence for scenarios "working" and "non-working", both using OpenRC-9999.

3.) Finally, I also observed that OpenRC-9999's rc-status command loops forever, with information constantly being printed to the screen, in (at least) the following two cases:

a) if booted in softlevel "non-working" and invoked as "rc-status"
b) if booted in softlevel "working" and invoked as "rc-status non-working"

Details about the output and the repetitive pattern of lines are available in the last table of my previous comment.

I hope that clarifies my previous comment. In case I misunderstood you and you still require me to try your steps, I'd very much appreciate a quick explanation of what you are trying to achieve with that.

Many thanks, and keep up the excellent work with OpenRC; I do not want to move over to systemd at any time at all.

KK

(Comment by William Hubbs)

I am narrowing this bug down to one issue which I was able to reproduce here, along with the steps for that issue. If there is a separate issue, can you please open another bug?

The issue I will tie to this bug is this: if a runlevel has a stacked runlevel inside it, rc-status goes into an infinite loop when you ask for information about that runlevel.

To reproduce:

1. mkdir /etc/runlevels/test
2. add some services to the test runlevel using rc-update
3. add the default runlevel to the test runlevel using: rc-update add -s default test
4. run rc-status test and observe the loop.
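To picture what rc-status has to do here: print a runlevel's own services, then descend into any stacked runlevels. The sketch below is an illustrative shell re-implementation over a scratch directory, not OpenRC's actual C code, and its `seen` guard is just one generic way to keep such a walk finite; it is not a claim about what the real regression is (the function name `list_runlevel` and the layout are hypothetical):

```shell
# Illustrative walk over a runlevel tree (OpenRC's real logic lives
# in C). The `seen` list stops the recursion from visiting any
# runlevel twice, so the walk terminates even if runlevels stack
# each other mutually.
list_runlevel() {
    case " $seen " in
        *" $1 "*) return ;;       # runlevel already printed: stop
    esac
    seen="$seen $1"
    for entry in "$root/$1"/*; do
        name=${entry##*/}
        if [ -d "$entry" ]; then  # a stacked runlevel is a dir symlink
            echo "Stacked Runlevel: $name"
            list_runlevel "$name"
        elif [ -h "$entry" ]; then
            echo "  $name [ started ]"   # status faked for the sketch
        fi
    done
}

# Same layout as the reproduction steps: test stacks default.
root=$(mktemp -d)
mkdir -p "$root/default" "$root/test"
ln -s /etc/init.d/sshd "$root/default/sshd"
ln -s /etc/init.d/nrpe "$root/test/nrpe"
ln -s ../default "$root/test/default"

seen=""
list_runlevel test
```

In William's reproduction there is not even a symlink cycle (only test contains default), so a traversal like this has no reason not to terminate; that is what makes the observed infinite loop a bug rather than a configuration problem.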
(Comment by William Hubbs)

I just used git bisect to track this down, and it reports that commit 7716bf31 is the commit that introduced the issue.

It looks like print_stacked_services() in rc-status.c is where the issue is. This function goes into an infinite loop for some reason.

(Comment by William Hubbs)

Hello Max,

can you please assist with this bug? It is a regression that was introduced somehow as part of the fix for bug #467368.

Thanks much,

William

(Comment by Max Hacking)

Hi William. No worries. I'm pretty busy at the moment, but I'll try to have a look by the end of the week.

All the servers in our cluster are running openrc-0.11.8, which does not seem to exhibit this problem. See the test below...

# rc-status
Runlevel: extended
 xenstored                  [ started ]
 puppet                     [ started ]
 mcollectived               [ started ]
 xenconsoled                [ started ]
 xendomains                 [ started ]
 corosync                   [ started ]
 pacemaker                  [ started ]
Stacked Runlevel: default
 snmpd                      [ started ]
 lm_sensors                 [ started ]
 sensord                    [ started ]
 rpc.pipefs                 [ started ]
 ntp-client                 [ started ]
 rpcbind                    [ started ]
 rpc.statd                  [ started ]
 rpc.idmapd                 [ started ]
 nfsmount                   [ started ]
 ceph                       [ started ]
 autofs                     [ started ]
 ntpd                       [ started ]
 dhcpd                      [ started ]
 netmount                   [ started ]
 vixie-cron                 [ started ]
 local                      [ started ]
Stacked Runlevel: core
 iptables                   [ started ]
 openib                     [ started ]
 net.ib0                    [ started ]
 net.eth0                   [ started ]
 syslog-ng                  [ started ]
 sshd                       [ started ]
 opensm                     [ started ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
Dynamic Runlevel: manual

# ls /etc/runlevels/extended/ -l
total 0
lrwxrwxrwx 1 root root 20 Sep 19  2013 corosync -> /etc/init.d/corosync
lrwxrwxrwx 1 root root 10 Apr 25  2013 default -> ../default
lrwxrwxrwx 1 root root 24 Aug 19  2013 mcollectived -> /etc/init.d/mcollectived
lrwxrwxrwx 1 root root 21 Sep 19  2013 pacemaker -> /etc/init.d/pacemaker
lrwxrwxrwx 1 root root 18 Aug 15  2013 puppet -> /etc/init.d/puppet
lrwxrwxrwx 1 root root 23 Apr 23  2013 xenconsoled -> /etc/init.d/xenconsoled
lrwxrwxrwx 1 root root 22 Aug 31  2013 xendomains -> /etc/init.d/xendomains
lrwxrwxrwx 1 root root 21 Apr 23  2013 xenstored -> /etc/init.d/xenstored

# ls /etc/runlevels/default/ -l
total 0
lrwxrwxrwx 1 root root 18 Aug 20  2010 autofs -> /etc/init.d/autofs
lrwxrwxrwx 1 root root 16 Sep  2  2013 ceph -> /etc/init.d/ceph
lrwxrwxrwx 1 root root  7 Apr 25  2013 core -> ../core
lrwxrwxrwx 1 root root 17 Sep 12  2013 dhcpd -> /etc/init.d/dhcpd
lrwxrwxrwx 1 root root 22 Mar 23  2010 lm_sensors -> /etc/init.d/lm_sensors
lrwxrwxrwx 1 root root 17 Jun  9  2011 local -> /etc/init.d/local
lrwxrwxrwx 1 root root 20 Jun  9  2011 netmount -> /etc/init.d/netmount
lrwxrwxrwx 1 root root 20 Apr 24  2012 nfsmount -> /etc/init.d/nfsmount
lrwxrwxrwx 1 root root 22 Aug 15  2013 ntp-client -> /etc/init.d/ntp-client
lrwxrwxrwx 1 root root 16 Sep 15  2009 ntpd -> /etc/init.d/ntpd
lrwxrwxrwx 1 root root 19 May  9  2013 rpcbind -> /etc/init.d/rpcbind
lrwxrwxrwx 1 root root 22 Aug 31  2013 rpc.idmapd -> /etc/init.d/rpc.idmapd
lrwxrwxrwx 1 root root 22 Aug 31  2013 rpc.pipefs -> /etc/init.d/rpc.pipefs
lrwxrwxrwx 1 root root 21 Apr 13  2009 rpc.statd -> /etc/init.d/rpc.statd
lrwxrwxrwx 1 root root 19 Mar 23  2010 sensord -> /etc/init.d/sensord
lrwxrwxrwx 1 root root 17 Aug 20  2011 snmpd -> /etc/init.d/snmpd
lrwxrwxrwx 1 root root 22 Oct 28  2011 vixie-cron -> /etc/init.d/vixie-cron

I shall try to build the latest GIT version, install it in a VM and see what happens.

(Comment by Max Hacking)

Sorry to reply to myself so soon, but I felt I should clarify that our servers are using openrc-0.11.8 ___from our overlay___ (hacking-gentoo in layman) in what we refer to as "replace mode".

It would really help to narrow this down if you could tell me whether this issue is present in the various versions from our overlay, as the patches applied to the GIT version are not exactly as they are in our version (specifically, openrc-0.11.8-default_runlevel.patch is not included).

Thanks!
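Max's listings show a two-deep stack: extended contains default, and default contains core. As a hypothetical sketch of how such a chain can be resolved by following runlevel-directory symlinks (this mimics what rc_get_runlevel_chain(), mentioned below, has to compute, but it is not OpenRC's actual implementation, and the walker code here is illustrative only):

```shell
# Rebuild Max's two-deep stack in a scratch directory:
# extended -> default -> core, mirroring his `ls -l` output.
root=$(mktemp -d)
mkdir -p "$root/core" "$root/default" "$root/extended"
ln -s /etc/init.d/sshd      "$root/core/sshd"
ln -s /etc/init.d/ntpd      "$root/default/ntpd"
ln -s ../core               "$root/default/core"
ln -s /etc/init.d/xenstored "$root/extended/xenstored"
ln -s ../default            "$root/extended/default"

# Resolve the runlevel chain by following directory symlinks,
# breadth-first; only directory entries count as stacked runlevels.
chain=extended
queue=extended
while [ -n "$queue" ]; do
    level=${queue%% *}                 # pop the first queued runlevel
    case "$queue" in
        *\ *) queue=${queue#* } ;;
        *)    queue="" ;;
    esac
    for entry in "$root/$level"/*; do
        [ -d "$entry" ] || continue
        sub=${entry##*/}
        chain="$chain $sub"
        queue="${queue:+$queue }$sub"
    done
done
echo "$chain"    # extended default core
```

This also makes concrete why `rc-status extended` in Max's output prints the default and core sections exactly once each on openrc-0.11.8: the chain is finite, so a correct traversal of it must terminate.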
(In reply to Max Hacking from comment #10)

> It would really help to narrow this down if you could tell me if this issue
> is present in the various versions from our overlay as the patches applied
> to the GIT version are not exactly as they are in our version (specifically
> openrc-0.11.8-default_runlevel.patch is not included).

Is this the patch I remember seeing a while back that reads the default runlevel from /etc/inittab? If it is, that should not affect this bug.

(In reply to William Hubbs from comment #11)

> Is this the patch I remember seeing a while back that reads the default
> runlevel from /etc/inittab? If it is, that should not affect this bug.

Yes, although the patches on that ebuild are the ones I submitted before the "cleanup" of rc_get_runlevel_chain() and rc_runlevel_stacks(); they have been well tested on a large cluster and, as far as I'm concerned, are working fine.

I'm interested to see if the problem can be replicated with the same ebuild I'm using, as so far I can't seem to do so. If I can't, then that kind of implies that the patches got broken somehow after I submitted them (presumably during the above-mentioned "cleanup", assuming that ever happened).

Sorry if that is a pain to test (you don't need to use "replace mode" for everything, just openrc).
(In reply to Max Hacking from comment #12)

> Yes, although the patches on that ebuild are the ones I submitted before the
> "cleanup" of rc_get_runlevel_chain() and rc_runlevel_stacks(), have been
> well tested on a large cluster and, as far as I'm concerned, are working
> fine.

I can tell you that OpenRC-0.12.4, which is current stable, doesn't have the issue; I pointed to the commit in git that introduced it in comment #6 above. Can you please pull that commit and tell me what we changed in comparison to your patch, other than the style changes? That might help me find where the issue is. From the commit log, I don't specifically remember rewriting anything, but I'm not sure how much Alexander rewrote.

> I'm interested to see if the problem can be replicated with the same ebuild
> I'm using as so far I can't seem to do so. If I can't then that kind of
> implies that the patches got broken some how after I submitted them
> (presumably during the above mentioned "cleanup", assuming that ever
> happened).
>
> Sorry if that is a pain to test (you don't need to use "replace mode" for
> everything, just open-rc).

I don't have a vm anywhere and I just run on one box, so I tend to hesitate to go back that far with OpenRC. Thanks for your understanding.

William

This has been fixed in commit 40f42ce and will be included in OpenRC-0.13.