After installing sys-apps/openrc-0.9.6 and rebooting, the boot process stops after attempting to start /etc/init.d/modules (most likely the other services are waiting for this service to finish).

Reproducible: Always

Steps to Reproduce:
1. emerge sys-apps/openrc-0.9.6
2. reboot
same here
I can confirm this too
Same here; however, the cause of the problem is not modules-related. After updating to 0.9.6, the migrate-run script was added to the boot runlevel, and this is where the problem starts: it wants to run before everything else (after localmount, which in turn depends on further services like modules, fsck, root, mtab and so on), and that is kind of a loop situation, I believe.

To get your system up and running you have to:
1. boot into single
2. remount / to rw
3. remove migrate-run from the boot runlevel
4. reboot
I confirm this too, and the working solution is to remove migrate-run from the runlevel... Another ebuild that has not been tested by devs... how many times will you repeat this?!
Same here. I can't find any kind of openrc logs (yes, I'm a little bit n00b in /var/) to help detect the reason for this problem. Gonna try "remove migrate-run from boot level" later. thanX to Dawid Stawiarski for this advice :)
Took me some hours of work to find out that this bug hit me (I thought I had screwed something up myself, so I didn't check bugzilla earlier). Removing migrate-run from boot works for me. Before that I was also able to boot again by setting rc_parallel=NO in /etc/rc.conf. Maybe this issue only arises with rc_parallel=YES?
Same here.
Workaround: sudo rc-update del migrate-run boot
but some services are "stopped" after boot :(
(In reply to comment #8)
> Same here.
> Workaround: sudo rc-update del migrate-run boot
> but some services are "stopped" after boot :(

All started here
This issue must occur only with rc_parallel set to yes, because I tested here several times before doing the release.

In response to the remark in comment #5 about things not being tested: rc_parallel has never officially been declared a stable feature (see the comments in rc.conf regarding this).

Can you try with migrate-run in the boot runlevel and rc_parallel=no?
Specifically, the following comments from rc.conf apply:

# WARNING: whilst we have improved parallel, it can still potentially lock
# the boot process. Don't file bugs about this unless you can supply
# patches that fix it without breaking other things!
(In reply to comment #10) the /run depends don't work as desired. the "before *" makes it run before localmount. we'll probably need to add code to the dependency logic.
Really funny: It doesn't want to just "return" in parallel? The only work migrate-run does is at shutdown. But there is "unlogic" there too...
> Can you try with migrate_run in the boot runlevel and rc_parallel=no?

Just tried. With the "rc_parallel = yes" line commented out (equivalent to an uncommented rc_parallel=no), it works fine even with migrate-run in the "boot" runlevel:

$ /etc/init.d/migrate-run status
 * status: started
$
rc_parallel has always been considered an unstable feature of openrc. There was a very clear warning in rc.conf that setting rc_parallel=y can lock up your boot process. rc_parallel=y is currently only to be used by developers and users who are willing to test the feature, and bugs against it are not considered release blockers. I have removed the documentation and the setting itself from rc.conf in commit 695f388. I recommend removing any rc_parallel lines from your rc.conf unless you are comfortable with this risk.
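For anyone acting on the recommendation above, here is a minimal sketch of locating and commenting out rc_parallel assignments. The helper names find_rc_parallel and disable_rc_parallel are made up for this example and are not part of OpenRC; only the rc_parallel variable and the /etc/rc.conf path come from this report.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not shipped with OpenRC.

# Print uncommented rc_parallel assignments (with line numbers) in file $1.
find_rc_parallel() {
    grep -n '^[[:space:]]*rc_parallel=' "$1"
}

# Comment out those assignments in file $1, keeping a backup in $1.bak.
disable_rc_parallel() {
    cp "$1" "$1.bak" &&
    sed 's/^\([[:space:]]*\)\(rc_parallel=\)/\1#\2/' "$1.bak" > "$1"
}
```

Assuming the default location, this could be run as root with `disable_rc_parallel /etc/rc.conf`, after checking the output of `find_rc_parallel /etc/rc.conf` first. Commenting out rather than deleting keeps the old value visible for anyone who later wants to test parallel startup again.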
Reading over my last comment, I realized that it might have sounded more harsh than I intended, so sorry about that. In a nutshell though, it is a known issue that rc_parallel=yes can lock up your boot process. Because of this, rc_parallel=yes is not recommended for general use.
In reply to comment #12: Yes, the dependencies for migrate-run are not correct, but that should be a separate bug.
Created attachment 293917
migrate-run-link-fix.patch
Why can't this script be started before mount-ro as /etc/runlevels/shutdown/migrate-run? Why this "start-return" but "stop-do" logic? Wouldn't it be even better to run killprocs before the migrations?

@Alphat-PC, yes, the old "ln -s /var/run /run" points in the wrong direction. But why a relative path to /run (ln -s ../run /var/run)? Do you know for sure where we are?
Created attachment 293959
avoid hang in parallel mode.patch

ln -s /var/run /run is not correct.
ln -s /run /var/run or ln -s ../run /var/run is OK.
In response to comment #19:
We can't run this after killprocs, because if /var is on a separate partition, it will be unmounted by then. It is safe to make the links in /var point to the absolute locations /run and /run/lock. For more information about this, see the article on /run referred to in bug #361349.

Also, the patch on this comment is incorrect, because we do not care about the status of /run/lock. It may not even exist the first time we run this script. /run being a directory and either /var/run or /var/lock not being links is the only thing we need to check to decide whether to perform the migration. However, I will look into breaking up the test so that we migrate each directory separately.

In response to comment #20:
You are correct, and I will make this change along with separating the directory migrations.
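The test described above (/run is a directory, and the entry under /var is not yet a link) can be sketched per directory as follows. This is not the actual migrate-run script: the function names and the ROOT prefix are assumptions made so the logic can be tried in a scratch tree, and moving the old directory aside is a simplification that leaves out copying its contents.

```shell
#!/bin/sh
# Hypothetical sketch; not the real migrate-run code.
# ROOT is an assumed prefix (empty on a live system) used for dry runs.
: "${ROOT:=}"

# Succeeds when $ROOT/run is a directory and $ROOT/var/$1 is not a symlink.
# The state of /run/lock is deliberately not checked.
needs_migration() {
    [ -d "$ROOT/run" ] && [ ! -L "$ROOT/var/$1" ]
}

# Replace $ROOT/var/$1 with a symlink to the absolute target $2.
migrate_dir() {
    needs_migration "$1" || return 0
    # Simplification: move any existing entry aside instead of migrating it.
    if [ -e "$ROOT/var/$1" ]; then
        mv "$ROOT/var/$1" "$ROOT/var/$1.old" || return 1
    fi
    ln -s "$2" "$ROOT/var/$1"
}

# Each directory is handled on its own, for example:
#   migrate_dir run  /run
#   migrate_dir lock /run/lock
```

Using absolute link targets matches the statement that the links in /var may safely point to /run and /run/lock, and it sidesteps the question of where a relative ../run would resolve.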
I'm not sure if comment 21 already covers it (it seems it does), but migrate-run as it was in openrc 0.9.6 was in conflict with acpid init.d script, as (AFAICT) '/var/run' was created in bootmisc, but acpid is starting before it (it only has 'need localmount use logger' in depend()), resulting in a failure to start. Though there still might be an issue of improper shutdown to consider.
All, I have posted a request for comments to the openrc team regarding dropping support for parallel startup. My reasoning is that there is no way we can test all configurations for all services, and it is also very possible for a user to write an init script that will hang their system and not realize it until they attempt a reboot, which is definitely too late. In other words, this feature is very volatile. This was posted to the team approximately 24 hours ago. I will update this bug with the results of any discussion.
We have decided that we need to keep rc_parallel and fix it. However, we have also agreed that we should not advertise it in rc.conf until it is fixed. So, if you are using rc_parallel, be aware that it is a feature that is in development, and definitely should not be used on production systems at this time. The next release will remove the documentation and setting itself from rc.conf.
I use non-parallel because I want to enter a password for my LUKS-encrypted /home. As a simple desktop user I would like to have a simple conf like

/etc/conf.d/mysql:init_nowait=1
/etc/conf.d/dhcpcd:init_nowait=1

Why rely on a buggy openrc parallel algorithm?
How to hang...

# cat /etc/init.d/service
#!/sbin/runscript
description="$RC_SVCNAME"
depend() {
    :
}
start() {
    einfo "starting $RC_SVCNAME"
    return 0
}
stop() {
    einfo "stopping $RC_SVCNAME"
    return 0
}
# ln -s service /etc/init.d/svc1
# ln -s service /etc/init.d/svc2
# ln -s service /etc/init.d/svc3
# ln -s service /etc/init.d/svc4

Remove migrate-run for testing purposes:
# rc-update del migrate-run boot
# rc-update add svc1 default
# rc-update add svc2 default

Set rc_parallel="YES", add the following configuration parameters to /etc/rc.conf, and reboot:

# dep: svc1 < svc2 < svc1 : no hang
rc_svc1_after="svc2"
rc_svc2_after="svc1"

# dep: svc1 > svc2 > svc1 : no hang
rc_svc1_before="svc2"
rc_svc2_before="svc1"

# rc-update add svc3 default

# dep: svc1 < svc2 < svc3 < svc1 : hang
rc_svc1_after="svc2"
rc_svc2_after="svc3"
rc_svc3_after="svc1"

# dep: svc1 > svc2 > svc3 > svc1 : hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc1"

# rc-update add svc4 default

# dep: svc1 < svc2 < svc3 < svc4 < svc1 : hang
rc_svc1_after="svc2"
rc_svc2_after="svc3"
rc_svc3_after="svc4"
rc_svc4_after="svc1"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 : hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc3 > svc2 : no hang
# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc4 > svc3 : no hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"
rc_svc3_before="svc2"
#rc_svc4_before="svc3"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc4 > svc2 :
# svc2, svc3, svc4: hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"
rc_svc4_before="svc2"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc4 < svc2 : hang.
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"
rc_svc4_after="svc2"
Created attachment 294457
remove_wrong_dependencies.patch
Another update for this bug: We are looking at moving the code that handles migrating /var/{lock,run} to /run into bootmisc, which will remove the separate service for doing this. I don't know whether we want to accept the patch in comment #27 or not; I'll let others on the team look it over and add their thoughts. My first thought is that we should catch this when the dependency tree is being constructed and somehow warn the user.
We do rc_waitpid(pid) in non-parallel, so we don't hang with wrong dependencies. Is it right?
(In reply to comment #28)
I think openrc can automatically disable parallel execution when a circular dependency is detected. (Of course a warning should be issued to the user when this happens.)
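As a rough illustration of detecting a circular dependency before boot, the sketch below feeds rc_<service>_before / rc_<service>_after settings (the style used in the earlier reproduction) to the standard tsort utility, which complains when its input contains a loop. This is not how openrc builds or checks its dependency tree; the function names are invented for this example, only one service per setting is handled, and dependencies declared in depend() are ignored.

```shell
#!/bin/sh
# Hypothetical sketch; independent of OpenRC's own dependency code.

# Turn rc_X_after="Y" / rc_X_before="Y" lines from file $1 into
# "earlier later" pairs, one per line. Commented-out lines are skipped.
rc_conf_to_pairs() {
    sed -n \
        -e 's/^rc_\([^=]*\)_after="\([^"]*\)"$/\2 \1/p' \
        -e 's/^rc_\([^=]*\)_before="\([^"]*\)"$/\1 \2/p' "$1"
}

# Read pairs on stdin; succeed if tsort reports a problem with them.
has_dep_loop() {
    # Keep only tsort's diagnostics; the sorted order itself is discarded.
    err=$(tsort 2>&1 >/dev/null) || return 0
    [ -n "$err" ]
}

# Demo: the three-service ring svc1 > svc2 > svc3 > svc1.
if printf '%s\n' 'svc1 svc2' 'svc2 svc3' 'svc3 svc1' | has_dep_loop; then
    echo "dependency loop detected"
fi
```

With a real configuration this could be chained as `rc_conf_to_pairs /etc/rc.conf | has_dep_loop && echo "loop found"`, which would at least flag the ring-shaped cases before a reboot.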
rc_parallel was removed from rc.conf in 0.9.7 though it is still recognized (and currently working, for me at least).
I think that the parallel feature is one of the indicators of a healthy init layout implementation. A lot of effort was invested in making it happen. Giving it up is not the solution.
(In reply to comment #32)
> I think that the parallel feature is one of the indicator of healthy init
> layout implementation.
> A lot of effort was invested in making it happen.
> Giving it up is not the solution.

We're not going to give it up of course! ;)
I'm not sure if I should file another bug report for the hwclock issue. I just added clock_adjfile="YES" to /etc/conf.d/clock. The setting is available but was not listed in the config file; I think the dev tried to hide this bug by not listing it in the default config file as an example.

When set to yes, the following dependencies arise:
hwclock: use root, provide clock
root: before fsck
fsck: use clock

The whole boot is screwed up: the "root" script is run in runlevel 3 while it should be in the "boot" runlevel. Sometimes it fails to remount / in rw mode (or /etc/init.d/root is not even started?), and when rebooting, it hangs after "Deactivating swap devices".

It sure is a circular dependency, and any package can add an init.d script that creates a circular dependency. Please fix openrc to perform better in this situation. If it cannot do the job, it should warn about it when caching dependencies, and there should be a way to list dependencies to see what will happen before running into a fatal boot process.

multiple bugs have been reported relative to dependency issue, parallel booting, maybe openrc should reconsider in a design level, not just by checking the
multiple bugs have been reported relative to dependency issue, parallel booting, maybe openrc should reconsider in the design level beside patching the code, to make it logically sane
(In reply to comment #34)
> i'm not sure if i should file another bug report for hwclock issue. i just
> added clock_adjfile="YES" to /etc/conf.d/clock. the settings is available
> but was not listed in the config file, i think the dev tried to hide this
> bug by not listing it in default config file as an example.

This is probably not documented because it is still considered testing, or Roy just forgot to document it. I'm sure it wasn't a deliberate attempt to hide a bug. To answer your question, yes, I would file a separate bug.

(In reply to comment #35)
> multiple bugs have been reported relative to dependency issue, parallel
> booting, maybe openrc should reconsider in the design level beside patching
> the code, to make it logically sane

As has been said before, patches, proposals, etc. are welcome. :-)
Hello.

It seems I've fixed the bug [1].

[1] https://github.com/xaionaro/openrc/commit/a0899e2fd03cb78ac6c8b084cc6fa80b7c3eca8f

Sorry for my coding style, I've never looked into the OpenRC code before. Tell me what I should polish in the code.
With the patch:

d[16:35:44] [root@imperium /srv/lxc/gentoo/rootfs/etc]# lxc-start -n gentoo
INIT: version 2.88 booting
 * The 'rc' applet is deprecated; please use 'openrc' instead.
OpenRC 0.13.de18640 is starting up Linux 3.12.8 (x86_64) [LXC]
 * /proc is already mounted
 * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * Caching service dependencies ... [ ok ]
 * The 'rc' applet is deprecated; please use 'openrc' instead.
mtab | * Updating /etc/mtab ...tmpfiles.setup | * setting up tmpfiles.d entries ... [ ok ] [ ok ]
INIT: Entering runlevel: 3
 * The 'rc' applet is deprecated; please use 'openrc' instead.
 * * runscript is deprecated; please use openrc-run instead.
 * * runscript is deprecated; please use openrc-run instead.
svc2 | * ERROR: svc2 failed to start. Dependencies loop.
svc4 |runscript is deprecated; please use openrc-run instead. runscript is deprecated; please use openrc-run instead.
svc1 | * ERROR: svc4 failed to start. Dependencies loop. * ERROR: svc1 failed to start. Dependencies loop.
svc3 | * ERROR: svc3 failed to start. Dependencies loop.
local | * Starting local
local | [ ok ]
gentoo login: root (automatic login)
gentoo ~ # rc-status
Runlevel: default
 svc1 [ stopped ]
 svc2 [ stopped ]
 svc3 [ stopped ]
 svc4 [ stopped ]
 local [ started ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
Dynamic Runlevel: manual
gentoo ~ #
*** Bug 498764 has been marked as a duplicate of this bug. ***
I've also added an early loop detector. It tries to solve loops at the "rc-update -u" stage and prints every loop it finds, with ewarn() if the loop was solved and eerror() if it could not be solved. Please review my fixes.
Here's an example of "rc-update -u" at work:

d[13:22:07] [root@imperium /home/xaionaro]# rc-update -u
 * Caching service dependencies ...
 * Found a dependencies loop: bootmisc -> termencoding -> root -> bootmisc. Trying to solve it by removing use/before dependencies of bootmisc on termencoding from the cache.
 * Found a dependencies loop: bootmisc -> localmount -> mtab -> root -> bootmisc. Trying to solve it by removing use/before dependencies of localmount on mtab from the cache.
 * Found a dependencies loop: bootmisc -> localmount -> fsck -> clock -> bootmisc. Trying to solve it by removing use/before dependencies of fsck on clock from the cache.
 * Found a dependencies loop: bootmisc -> keymaps -> termencoding -> root -> bootmisc. Trying to solve it by removing use/before dependencies of bootmisc on keymaps from the cache.
 * Found a dependencies loop: bootmisc -> consolefont -> termencoding -> root -> bootmisc. Trying to solve it by removing use/before dependencies of bootmisc on consolefont from the cache.
 * Found a dependencies loop: localmount -> modules -> localmount. Trying to solve it by removing use/before dependencies of localmount on modules from the cache.
 [ ok ]
d[13:22:09] [root@imperium /home/xaionaro]#
This bug has been turned into a feature request to upstream the boot dependency loop solver implemented for Debian. The present status: in Debian, the loop solver by Okunev has worked perfectly well for more than 5 years.