Bug 391945

Summary: sys-apps/openrc: Dependency loop solver from Debian
Product: Gentoo Linux
Reporter: Dmitry Suloev <SuloevDmitry>
Component: [OLD] Core system
Assignee: OpenRC Team <openrc>
Status: CONFIRMED
Severity: critical
CC: alon.barlev, anton.kochkov, asturm, carlphilippreh, CasperVector, che, christophe.chabanois, cruzki123, email, gentoo-bugs-augustin, heroxbd, jlec, kripton, lists, O01eg, regboxemg, rharwood, rion4ik, ryao, taaroa, till2.schaefer, williamh, xaionaro
Priority: Normal
Version: unspecified
Hardware: All
OS: Linux
URL: https://archives.gentoo.org/gentoo-dev/message/77752cba11a56906131503ab36e2b995
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: migrate-run-link-fix.patch
avoid hang in parallel mode.patch
remove_wrong_dependencies.patch

Description Dmitry Suloev 2011-11-26 09:15:19 UTC
After installing sys-apps/openrc-0.9.6 and rebooting, the boot process stops after attempting to start /etc/init.d/modules (most likely the other services are waiting for this service to finish).

Reproducible: Always

Steps to Reproduce:
1. emerge sys-apps/openrc-0.9.6
2. reboot
Comment 1 Lukas Zavodny 2011-11-26 09:45:32 UTC
same here
Comment 2 I am 2011-11-26 10:15:58 UTC
same here
Comment 3 Chris Wells 2011-11-26 10:38:37 UTC
I can confirm this too
Comment 4 Dawid Stawiarski 2011-11-26 11:35:28 UTC
Same here; however, the cause of the problem is not related to modules. After updating to 0.9.6, the migrate-run script was added to the boot runlevel, and this is where the problem starts: it wants to run before everything else, but also after localmount, which in turn depends on other services such as modules, fsck, root, mtab and so on. I believe this creates a dependency loop.

To get your system up and running you have to:
1. boot into single-user mode
2. remount / read-write
3. remove migrate-run from the boot runlevel
4. reboot
Comment 5 Rafal Kupiec 2011-11-26 13:28:03 UTC
I confirm this too, and the working solution is to remove migrate-run from runlevel...

Another ebuild that has not been tested by devs... how many times will you repeat this?!
Comment 6 Ivan 2011-11-26 14:14:38 UTC
same here.

I can't find any OpenRC logs (yes, I'm a bit of a n00b in /var/) to help identify the cause of this problem.

gonna try "remove migrate-run from boot level" later.
thanX to Dawid Stawiarski for this advice :)
Comment 7 Stefan G. Weichinger 2011-11-26 15:38:49 UTC
Took me some hours of work to find out that this bug hit me (thought I had screwed up something by myself so didn't check bugzilla earlier).

Removing migrate-run from boot works for me.

Before that I was also able to boot again by setting rc_parallel=NO in /etc/rc.conf

Maybe this issue only arises with rc_parallel=YES?
Comment 8 Andrey 2011-11-26 16:18:47 UTC
Same here.
Workaround: sudo rc-update del migrate-run boot 
but some services are "stopped" after boot :(
Comment 9 Rafal Kupiec 2011-11-26 16:21:04 UTC
(In reply to comment #8)
> Same here.
> Workaround: sudo rc-update del migrate-run boot 
> but some services are "stopped" after boot :(

All started here
Comment 10 William Hubbs gentoo-dev 2011-11-26 17:56:09 UTC
This issue must occur only with rc_parallel set to yes, because I tested here several times before doing the release.

In response to comment #5 about things not being tested:

rc_parallel has never officially been declared a stable feature (see the
comments in rc.conf regarding this).

Can you try with migrate-run in the boot runlevel and rc_parallel=no?
Comment 11 William Hubbs gentoo-dev 2011-11-26 18:11:45 UTC
Specifically, the following comments from rc.conf apply:

# WARNING: whilst we have improved parallel, it can still potentially lock
# the boot process. Don't file bugs about this unless you can supply
# patches that fix it without breaking other things!
Comment 12 SpanKY gentoo-dev 2011-11-26 18:27:49 UTC
(In reply to comment #10)

the /run depends don't work as desired.  the "before *" makes it run before localmount.  we'll probably need to add code to the dependency logic.
Comment 13 Ulenrich 2011-11-26 19:16:18 UTC
Really funny: It doesn't want to just "return" in parallel?
The only work migrate-run does is at shutdown. But even that logic seems questionable...
Comment 14 Ivan 2011-11-26 22:44:44 UTC
> Can you try with migrate-run in the boot runlevel and rc_parallel=no?

Just tried.

With the line "rc_parallel = yes" commented out (equivalent to an uncommented rc_parallel=no), it works fine even with migrate-run in the "boot" runlevel.

$ /etc/init.d/migrate-run status
 * status: started
$
Comment 15 William Hubbs gentoo-dev 2011-11-27 00:57:19 UTC
rc_parallel has always been considered an unstable feature of openrc.
There was a very clear warning in rc.conf that setting rc_parallel=y can
lock up your boot process.

rc_parallel=y is only to be used currently by developers and users who are willing
to test the feature, and bugs against it are not considered release blockers.

I have removed the documentation and setting itself from rc.conf in commit
695f388.

I recommend removing any rc_parallel lines from your rc.conf
unless you are comfortable with this risk.
Comment 16 William Hubbs gentoo-dev 2011-11-27 01:57:46 UTC
Reading over my last comment, I realized that it might have sounded more
harsh than I intended, so sorry about that.

In a nutshell though, it is a known issue that rc_parallel=yes can lock
up your boot process.  Because of this, rc_parallel=yes is not
recommended for general use.
Comment 17 William Hubbs gentoo-dev 2011-11-27 05:14:47 UTC
In reply to comment #12:

Yes, the dependencies for migrate-run are not correct, but that should
be a separate bug.
Comment 18 iGentoo 2011-11-27 09:04:35 UTC
Created attachment 293917 [details, diff]
migrate-run-link-fix.patch
Comment 19 Ulenrich 2011-11-27 15:20:15 UTC
Why can't this script be started before mount-ro as
/etc/runlevels/shutdown/migrate-run?
Why does start() simply return while stop() does the actual work?
Wouldn't it be even better to run killprocs before the migration?

@Alphat-PC, yes, the old command pointed in the wrong direction:
"ln -s /var/run /run"
But why a relative path to /run (ln -s ../run /var/run)?
Do you know for sure where we are?
Comment 20 iGentoo 2011-11-27 18:07:31 UTC
Created attachment 293959 [details, diff]
avoid hang in parallel mode.patch

ln -s /var/run /run
is not correct

ln -s /run /var/run
or
ln -s ../run /var/run
is OK.
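
To make the argument order concrete, here is a minimal Python sketch of the "ln -s ../run /var/run" form, run against a throwaway directory rather than the real /. The helper name link_var_run is hypothetical and not part of OpenRC; os.symlink takes (target, link) in the same order as ln -s.

```python
import os
import tempfile


def link_var_run(root):
    """Create <root>/var/run as a relative symlink to <root>/run.

    Mirrors `ln -s ../run /var/run`: the first argument is the link target,
    the second is the name of the link that gets created.
    """
    os.makedirs(os.path.join(root, "run"), exist_ok=True)
    os.makedirs(os.path.join(root, "var"), exist_ok=True)
    link = os.path.join(root, "var", "run")
    os.symlink("../run", link)
    return link


with tempfile.TemporaryDirectory() as scratch:
    link = link_var_run(scratch)
    # The relative target is resolved from the directory holding the link,
    # so var/run ends up pointing at the sibling run/ directory.
    print(os.readlink(link))
    print(os.path.realpath(link) == os.path.realpath(os.path.join(scratch, "run")))
```

Because the target is relative to the directory that holds the link, the link keeps working if the whole tree is mounted somewhere else, for example from a rescue system.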
Comment 21 William Hubbs gentoo-dev 2011-11-27 19:51:03 UTC
In response to comment #19:

We can't run this after killprocs, because if /var is on a separate
partition, it will be unmounted by then.

It is safe to make the links in /var point to the absolute locations
/run and /run/lock. For more information about this, see the article on
/run referred to in bug #361349.

Also, the patch on this comment is incorrect, because we do not care
about the status of /run/lock. It may not even exist the first time we
run this script. /run being a directory and either /var/run or /var/lock
not being links is the only thing we need to worry about to tell us to
perform the migration. However, I will look into breaking up the test so
that we migrate each directory separately.

In response to comment #20:

You are correct, and I will make this change along with separating the
directory migrations.
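
The test described above could look roughly like this once each directory is checked on its own. This is only a sketch in Python, not the actual migrate-run or bootmisc code; the function name paths_to_migrate and the root parameter are assumptions made so the check can be tried on a scratch tree.

```python
import os
import tempfile


def paths_to_migrate(root="/"):
    """Return the /var paths that still need to become links into /run."""
    run = os.path.join(root, "run")
    if not os.path.isdir(run):
        # Without a /run directory there is nothing to migrate to.
        return []
    candidates = [os.path.join(root, "var", name) for name in ("run", "lock")]
    # Each path is judged on its own; the state of /run/lock is never inspected.
    return [path for path in candidates if not os.path.islink(path)]


with tempfile.TemporaryDirectory() as scratch:
    os.makedirs(os.path.join(scratch, "run"))
    os.makedirs(os.path.join(scratch, "var", "run"))           # still a real directory
    os.symlink("/run/lock", os.path.join(scratch, "var", "lock"))  # already a link
    for path in paths_to_migrate(scratch):
        print(os.path.relpath(path, scratch))
```

Note that the dangling /run/lock target in the demo does not matter, which matches the point that /run/lock may not exist the first time the script runs.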
Comment 22 Rafał Mużyło 2011-11-28 08:56:46 UTC
I'm not sure if comment 21 already covers it (it seems it does), but migrate-run as it was in openrc 0.9.6 was in conflict with acpid init.d script, as (AFAICT) '/var/run' was created in bootmisc, but acpid is starting before it (it only has 'need localmount use logger' in depend()), resulting in a failure to start.

Though there still might be an issue of improper shutdown to consider.
Comment 23 William Hubbs gentoo-dev 2011-11-28 15:17:57 UTC
All,

I have posted a request for comments to the openrc team regarding
dropping support for parallel startup. My reasoning is that there is no
way we can test all configurations for all services, and it is also very
possible for a user to write an init script that will hang their system
and not realize it until they attempt a reboot, which is definitely too
late. In other words, this feature is very volatile.

This was posted to the team approximately 24 hours ago. I will update
this bug with the results of any discussion.
Comment 24 William Hubbs gentoo-dev 2011-11-29 01:21:28 UTC
We have decided that we need to keep rc_parallel and fix it. However, we have also agreed that we should not advertise it in rc.conf until it is fixed.

So, if you are using rc_parallel, be aware that it is a feature that is in development, and definitely should not be used on production systems at this time.

The next release will remove the documentation and setting itself from rc.conf.
Comment 25 Ulenrich 2011-11-30 23:27:31 UTC
I use non-parallel, because I want to enter 
a password for my luks-encrypted /home. 

As a simple desktop user I would like to have a simple conf like
/etc/conf.d/mysql:init_nowait=1
/etc/conf.d/dhcpcd:init_nowait=1
Why rely on a buggy openrc parallel algorithm?
Comment 26 iGentoo 2011-12-01 22:02:02 UTC
How to hang...

# cat /etc/init.d/service
#!/sbin/runscript

description="$RC_SVCNAME"

depend()
{
	:
}

start()
{
	einfo "starting $RC_SVCNAME"
	return 0
}

stop()
{
	einfo "stopping $RC_SVCNAME"
	return 0
}

# ln -s service /etc/init.d/svc1
# ln -s service /etc/init.d/svc2
# ln -s service /etc/init.d/svc3
# ln -s service /etc/init.d/svc4

Remove migrate-run for testing purposes:
# rc-update del migrate-run boot
# rc-update add svc1 default
# rc-update add svc2 default

Set rc_parallel="YES", add one of the following configurations to /etc/rc.conf, and reboot:

# dep: svc1 < svc2 < svc1 : no hang
rc_svc1_after="svc2"
rc_svc2_after="svc1"

# dep: svc1 > svc2 > svc1 : no hang
rc_svc1_before="svc2"
rc_svc2_before="svc1"

#rc-update add svc3 default
# dep: svc1 < svc2 < svc3 < svc1 : hang
rc_svc1_after="svc2"
rc_svc2_after="svc3"
rc_svc3_after="svc1"

# dep: svc1 > svc2 > svc3 > svc1 : hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc1"

#rc-update add svc4 default
# dep: svc1 < svc2 < svc3 < svc4 < svc1 : hang
rc_svc1_after="svc2"
rc_svc2_after="svc3"
rc_svc3_after="svc4"
rc_svc4_after="svc1"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 : hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc3 > svc2 : no hang
# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc4 > svc3 : no hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"
rc_svc3_before="svc2" #rc_svc4_before="svc3"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc4 > svc2 :
# svc2, svc3, svc4: hang
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"
rc_svc4_before="svc2"

# dep: svc1 > svc2 > svc3 > svc4 > svc1 && svc4 < svc2 : hang.
rc_svc1_before="svc2"
rc_svc2_before="svc3"
rc_svc3_before="svc4"
rc_svc4_before="svc1"
rc_svc4_after="svc2"
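
A loop like the ones above can be found before rebooting. The following Python sketch is illustrative only: it is not how OpenRC reads rc.conf, the helper names ordering_edges and find_cycle are made up, and it assumes each rc_<service>_before/after variable is assigned once (it accumulates repeated assignments, whereas a shell would keep only the last one).

```python
import re

ASSIGNMENT = re.compile(r'^rc_(\w+)_(before|after)="([^"]*)"')


def ordering_edges(conf_text):
    """Map each service to the set of services it must start before."""
    edges = {}
    for line in conf_text.splitlines():
        match = ASSIGNMENT.match(line.strip())
        if not match:
            continue  # comments and unrelated settings
        service, keyword, others = match.groups()
        for other in others.split():
            first, second = (service, other) if keyword == "before" else (other, service)
            edges.setdefault(first, set()).add(second)
            edges.setdefault(second, set())
    return edges


def find_cycle(edges):
    """Depth-first search; return one loop as [a, ..., a], or None."""
    done, path = set(), []

    def visit(node):
        if node in path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        for nxt in sorted(edges.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for node in sorted(edges):
        cycle = visit(node)
        if cycle:
            return cycle
    return None


conf = '''
rc_svc1_after="svc2"
rc_svc2_after="svc3"
rc_svc3_after="svc1"
'''
loop = find_cycle(ordering_edges(conf))
print(" -> ".join(loop) if loop else "no loop")
```

The check only reports that a loop exists; it says nothing about why the two-service loops above do not hang while the longer ones do.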
Comment 27 iGentoo 2011-12-01 22:03:22 UTC
Created attachment 294457 [details, diff]
remove_wrong_dependencies.patch
Comment 28 William Hubbs gentoo-dev 2011-12-01 22:23:54 UTC
Another update for this bug:

We are looking at moving the code that handles migrating /var/{lock,run}
to /run into bootmisc, which will remove the separate service for doing
this.

I don't know whether we want to accept the patch in comment #27 or not;
I'll let others on the team look it over and add their thoughts.

My first thought is that we should catch this when the dependency tree
is being constructed and somehow warn the user.
Comment 29 iGentoo 2011-12-01 22:30:06 UTC
We do rc_waitpid(pid) in non-parallel, so we don't hang with wrong
dependencies.
Is that right?
Comment 30 Casper Ti. Vector 2011-12-03 12:28:14 UTC
(In reply to comment #28)

I think openrc can automatically disable the parallel execution when circular dependency is detected. (Of course a warning should be issued to the user when this happens.)
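
The fallback suggested here might be sketched as follows. It is Python rather than OpenRC's C, it uses the standard-library graphlib module (Python 3.9+) as a stand-in for OpenRC's own dependency tree, and the function name effective_parallel is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def effective_parallel(requested, needs):
    """Decide whether parallel start is safe.

    `needs` maps each service to the services that must start before it.
    Returns (parallel, warning); warning is None when no loop was found.
    """
    try:
        # prepare() raises CycleError when the graph contains a loop.
        TopologicalSorter(needs).prepare()
    except CycleError as exc:
        loop = " -> ".join(exc.args[1])
        return False, f"dependency loop ({loop}); falling back to sequential start"
    return requested, None


looped = {"svc1": {"svc2"}, "svc2": {"svc3"}, "svc3": {"svc1"}}
print(effective_parallel(True, looped))
print(effective_parallel(True, {"svc1": {"svc2"}, "svc2": set()}))
```

The user's setting is respected whenever the graph is loop-free, and the warning string gives the caller something to print when it is not.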
Comment 31 Andreas Sturmlechner gentoo-dev 2011-12-10 21:39:37 UTC
rc_parallel was removed from rc.conf in 0.9.7 though it is still recognized (and currently working, for me at least).
Comment 32 Alon Bar-Lev 2012-01-25 21:18:37 UTC
I think that the parallel feature is one of the indicators of a healthy init layout implementation.
A lot of effort was invested in making it happen.
Giving it up is not the solution.
Comment 33 Christian Ruppert (idl0r) gentoo-dev 2012-01-25 21:45:44 UTC
(In reply to comment #32)
> I think that the parallel feature is one of the indicators of a healthy init
> layout implementation.
> A lot of effort was invested in making it happen.
> Giving it up is not the solution.

We're not going to give it up of course! ;)
Comment 34 Xuefer 2012-10-12 08:13:14 UTC
I'm not sure if I should file another bug report for the hwclock issue. I just added clock_adjfile="YES" to /etc/conf.d/clock. The setting is available but not listed in the config file; I think the dev tried to hide this bug by not listing it as an example in the default config file.

when set to yes, the following dependency happens:
hwclock: use root, provide clock
root: before fsck
fsck: use clock

The whole boot-up is broken: the "root" script is run in runlevel 3 while it should run in the "boot" runlevel. Sometimes it fails to remount / in rw mode (or /etc/init.d/root is not even started?), and when rebooting, it hangs after "Deactivating swap devices".

It certainly is a circular dependency, and any package can add an init.d script that creates one.

Please fix OpenRC to behave better in this situation. If it cannot do the job, it should warn about it when caching dependencies, and there should be a way to list the dependencies to see what will happen before running into a fatal boot failure.
multiple bugs have been reported relative to dependency issue, parallel booting, maybe openrc should reconsider in a design level, not just by checking the
Comment 35 Xuefer 2012-10-12 08:18:36 UTC
Multiple bugs have been reported related to dependency issues and parallel booting; maybe OpenRC should be reconsidered at the design level, besides patching the code, to make it logically sane.
Comment 36 William Hubbs gentoo-dev 2012-10-12 14:15:41 UTC
(In reply to comment #34)
> I'm not sure if I should file another bug report for the hwclock issue. I just
> added clock_adjfile="YES" to /etc/conf.d/clock. The setting is available
> but not listed in the config file; I think the dev tried to hide this
> bug by not listing it as an example in the default config file.

This is probably not documented because it is still considered testing, or Roy just forgot to document it. I'm sure it wasn't a deliberate attempt to hide a bug.

To answer your question, yes, I would file a separate bug.

(In reply to comment #35)
> Multiple bugs have been reported related to dependency issues and parallel
> booting; maybe OpenRC should be reconsidered at the design level, besides
> patching the code, to make it logically sane.

As has been said before, patches, proposals, etc are welcome. :-)
Comment 37 Dmitry Yu Okunev 2014-01-19 12:41:18 UTC
Hello.

Seems I've fixed the bug [1].

[1] https://github.com/xaionaro/openrc/commit/a0899e2fd03cb78ac6c8b084cc6fa80b7c3eca8f

Sorry for my coding style; I've never looked into the OpenRC code before. Tell me what I should polish in the code.
Comment 38 Dmitry Yu Okunev 2014-01-19 12:42:39 UTC
With the patch:

d[16:35:44] [root@imperium /srv/lxc/gentoo/rootfs/etc]# lxc-start -n gentoo
INIT: version 2.88 booting
 * The 'rc' applet is deprecated; please use 'openrc' instead.

   OpenRC 0.13.de18640 is starting up Linux 3.12.8 (x86_64) [LXC]

 * /proc is already mounted
 * /run/openrc: creating directory
 * /run/lock: creating directory
 * /run/lock: correcting owner
 * Caching service dependencies ... [ ok ]
 * The 'rc' applet is deprecated; please use 'openrc' instead.
mtab             | * Updating /etc/mtab ...tmpfiles.setup   | * setting up tmpfiles.d entries ... [ ok ]
 [ ok ]
INIT: Entering runlevel: 3
 * The 'rc' applet is deprecated; please use 'openrc' instead.
 *  * runscript is deprecated; please use openrc-run instead.
 *  * runscript is deprecated; please use openrc-run instead.
svc2             | * ERROR: svc2 failed to start. Dependencies loop.
svc4             |runscript is deprecated; please use openrc-run instead.
runscript is deprecated; please use openrc-run instead.
svc1             | * ERROR: svc4 failed to start. Dependencies loop.
 * ERROR: svc1 failed to start. Dependencies loop.
svc3             | * ERROR: svc3 failed to start. Dependencies loop.
local            | * Starting local
local            | [ ok ]

gentoo login: root (automatic login)
gentoo ~ # rc-status 
Runlevel: default
 svc1                                                              [  stopped  ]
 svc2                                                              [  stopped  ]
 svc3                                                              [  stopped  ]
 svc4                                                              [  stopped  ]
 local                                                             [  started  ]
Dynamic Runlevel: hotplugged
Dynamic Runlevel: needed
Dynamic Runlevel: manual
gentoo ~ #
Comment 39 William Hubbs gentoo-dev 2014-01-21 16:13:45 UTC
*** Bug 498764 has been marked as a duplicate of this bug. ***
Comment 40 Dmitry Yu Okunev 2014-01-23 08:34:06 UTC
I've also added an early loop detector.

It tries to solve loops at the "rc-update -u" stage and prints every loop it finds, with ewarn() if the loop was solved and eerror() if it could not be solved.

Please review my fixes.
Comment 41 Dmitry Yu Okunev 2014-01-23 09:22:10 UTC
Here's an example of "rc-update -u" at work:

d[13:22:07] [root@imperium /home/xaionaro]# rc-update -u
 * Caching service dependencies ...
 * Found a dependencies loop: bootmisc -> termencoding -> root -> bootmisc. Trying to solve it by removing use/before dependencies of bootmisc on termencoding from the cache.
 * Found a dependencies loop: bootmisc -> localmount -> mtab -> root -> bootmisc. Trying to solve it by removing use/before dependencies of localmount on mtab from the cache.
 * Found a dependencies loop: bootmisc -> localmount -> fsck -> clock -> bootmisc. Trying to solve it by removing use/before dependencies of fsck on clock from the cache.
 * Found a dependencies loop: bootmisc -> keymaps -> termencoding -> root -> bootmisc. Trying to solve it by removing use/before dependencies of bootmisc on keymaps from the cache.
 * Found a dependencies loop: bootmisc -> consolefont -> termencoding -> root -> bootmisc. Trying to solve it by removing use/before dependencies of bootmisc on consolefont from the cache.
 * Found a dependencies loop: localmount -> modules -> localmount. Trying to solve it by removing use/before dependencies of localmount on modules from the cache.                                            [ ok ]
d[13:22:09] [root@imperium /home/xaionaro]#
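
For readers who want the general idea without reading the patch: the sketch below is NOT Okunev's implementation. It is a simplified Python model that assumes only two dependency kinds ("need" as hard, "use" as soft, standing in for use/before) and uses hypothetical helper names find_cycle and solve_loops. It repeatedly finds a loop, drops one soft dependency from it with a warning, and reports an error when a loop consists only of hard dependencies.

```python
def find_cycle(deps):
    """Return one loop as [a, b, ..., a] following dependency edges, or None."""
    done, path = set(), []

    def visit(node):
        if node in path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        for nxt in deps.get(node, {}):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for node in deps:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


def solve_loops(deps):
    """Break loops by dropping soft dependencies; never touch 'need'.

    `deps` maps service -> {dependency: kind}. Returns a cleaned copy plus
    lists of warnings (solved loops) and errors (unsolvable loops).
    """
    deps = {svc: dict(kinds) for svc, kinds in deps.items()}  # keep input intact
    warnings, errors = [], []
    while True:
        cycle = find_cycle(deps)
        if cycle is None:
            return deps, warnings, errors
        loop = " -> ".join(cycle)
        for svc, dep in zip(cycle, cycle[1:]):
            if deps[svc][dep] != "need":
                del deps[svc][dep]
                warnings.append(f"loop {loop}: removed dependency of {svc} on {dep}")
                break
        else:
            errors.append(f"loop {loop}: only 'need' dependencies, cannot solve")
            return deps, warnings, errors


cache = {"localmount": {"modules": "use"}, "modules": {"localmount": "need"}}
solved, warnings, errors = solve_loops(cache)
for message in warnings + errors:
    print(message)
```

Which soft dependency to drop from a loop is a policy decision; this sketch simply takes the first one it meets, which is an arbitrary choice and may differ from what the patch does.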
Comment 42 Benda Xu gentoo-dev 2019-07-09 10:40:44 UTC
This bug has been turned into a feature request to upstream the boot dependency loop solver implemented for Debian.

Present status: in Debian, the loop solver by Okunev has worked perfectly well for more than 5 years.