Bug 481064

Summary: sys-apps/openrc-0.11.8 tries to start services multiple times at boot, even if the first attempt failed
Product: Gentoo Hosted Projects
Reporter: Thomas Deutschmann (RETIRED) <whissi>
Component: OpenRC
Assignee: OpenRC Team <openrc>
Status: UNCONFIRMED ---    
Severity: normal    
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on: 481672    
Bug Blocks:    
Attachments: rc.log without any modifications (text version from first picture)
rc.log without "before net" (text version from second picture)
rc.log from openrc-0.12
/run/openrc/deptree as requested

Description Thomas Deutschmann (RETIRED) gentoo-dev 2013-08-14 16:29:50 UTC
Hi,

While working on an init script and testing its behavior under error conditions, I noticed that Gentoo tried to start the service at least twice:

http://f.666kb.com/i/cgnpw633nbwou3dwt.jpg

This is unexpected, at least for me.

The failing service's depend function:
depend() {
	need localmount
	before net
	after bootmisc ipset tmpfiles.setup ulogd
}

# rc-update show |grep shorewall-init
       shorewall-init |      boot

So the service was only installed in "boot".

# /etc/init.d/shorewall-init ineed
fsck localmount 

# /etc/init.d/shorewall-init needsme

# /etc/init.d/shorewall-init iafter
hwclock sysctl bootmisc tmpfiles.setup 

# /etc/init.d/shorewall-init ibefore
local net.eth0

Well, someone may think this is net.* related. But if you remove the "before net" line Gentoo still tries to start the service twice:

http://f.666kb.com/i/cgnpo1lvnqol9lh65.jpg

So I think it's not.

I guess this time the second start is triggered by the "local" service, which has an "after *" line in its depend function.

I asked in #gentoo-dev-help whether it is intended behavior for Gentoo to start a recently failed service again, but nobody was really sure. I created this bug report to get clarification from the OpenRC team.

It is not a real problem (for me); it is more of a cosmetic problem, because you see a failing service twice. And if you don't expect Gentoo to restart a service, you may think "Huh? What's going on..." ... and remember, I am testing under error conditions: normally, you shouldn't see an error at all, because the service will start on the first attempt.

But this can be a problem for services that clean or move something (like bootmisc): imagine bootmisc failed on boot and Gentoo tries to start it again just before local, but after all the other services were started...

Further observations:
It seems to be runlevel related. If I move the init script from boot to default, I don't see a second start attempt (maybe because Gentoo knows "This service already failed in the current runlevel" and doesn't try it again?).



Reproducible: Always
Comment 1 Thomas Deutschmann (RETIRED) gentoo-dev 2013-08-14 16:42:27 UTC
Created attachment 356010 [details]
rc.log without any modifications (text version from first picture)

I attached a text version of the first picture (rc.log).
Comment 2 Thomas Deutschmann (RETIRED) gentoo-dev 2013-08-14 16:43:41 UTC
Created attachment 356012 [details]
rc.log without "before net" (text version from second picture)

I attached a text version from the second picture (rc.log).
Comment 3 William Hubbs gentoo-dev 2013-08-15 15:46:58 UTC
Can you please upgrade to OpenRC-0.12 and let me know if this is still
an issue?

Thanks,

William
Comment 4 Thomas Deutschmann (RETIRED) gentoo-dev 2013-08-15 15:58:44 UTC
Created attachment 356092 [details]
rc.log from openrc-0.12

Hi,

still the same with openrc-0.12. I attached a new, complete rc.log:

shorewall-init is set to start in the boot runlevel (and only in the boot runlevel), but it isn't configured, so the service refuses to start.

When the system enters the default runlevel, Gentoo tries to start shorewall-init again, because shorewall-init should start before net (see the depend() function in the description). This is still unexpected, at least for me.
Comment 5 Alexander Vershilov (RETIRED) gentoo-dev 2013-08-15 16:05:00 UTC
(In reply to Thomas D. from comment #4)
Can you also attach the deptree (/run/openrc/deptree)?
Comment 6 Thomas Deutschmann (RETIRED) gentoo-dev 2013-08-15 16:18:10 UTC
Created attachment 356094 [details]
/run/openrc/deptree as requested

/run/openrc/deptree as requested
Comment 7 Alexander Vershilov (RETIRED) gentoo-dev 2013-08-16 05:44:50 UTC
Let me summarize the information.

Because the init script fails inside start_pre, its state stays "stopped". So there is no difference between a service that was started and then failed and one that was not started at all.

After a runlevel change, OpenRC tries to start all dependencies; because shorewall-init is in the stopped state, it is started as well.

To work around this, you can mark the service as failed manually with the 'mark_service_failed' function.
Comment 8 Alexander Vershilov (RETIRED) gentoo-dev 2013-08-16 06:09:21 UTC
The other option is to introduce an RC_AUTOMATIC environment variable that is set when a service is started automatically and unset for a manual run. Then, for an automatic start, we can mark a service that does not pass start_pre as failed.
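
A minimal sketch of how an init script could use such a variable. RC_AUTOMATIC is only the proposal from this comment and is not an existing OpenRC variable; checkconfig and FAILED_SERVICES are hypothetical names, and mark_service_failed is replaced by a stub so the snippet runs outside OpenRC. Doing the marking in the script rather than in OpenRC itself is just one possible reading of the proposal.

```shell
#!/bin/sh
# Hypothetical sketch: RC_AUTOMATIC is a proposed variable, not a real one.

SVCNAME="${SVCNAME:-test}"
FAILED_SERVICES=""

# Stub standing in for OpenRC's mark_service_failed helper.
mark_service_failed() {
	FAILED_SERVICES="${FAILED_SERVICES} $1"
}

# Stand-in for a real configuration check; it always fails here.
checkconfig() {
	return 1
}

start_pre() {
	if ! checkconfig; then
		# Record the failure only for automatic (runlevel) starts,
		# so that a manual start can be retried right away.
		if [ -n "${RC_AUTOMATIC:-}" ]; then
			mark_service_failed "${SVCNAME}"
		fi
		return 1
	fi
	return 0
}

# Simulate an automatic start during a runlevel change.
RC_AUTOMATIC=1
start_pre || echo "marked as failed:${FAILED_SERVICES}"
```

With RC_AUTOMATIC unset, start_pre still returns 1, but nothing is recorded as failed.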
Comment 9 Thomas Deutschmann (RETIRED) gentoo-dev 2013-08-19 17:03:45 UTC
I tried what you suggested in comment 7, but it doesn't seem to work; see bug 481672.

For testing I created /etc/init.d/test with the following content:

#!/sbin/runscript

description="Test runscript"

start_pre() {
	einfo "Running start_pre()..."

	# Normally, we would run checkconfig() here,
	# which would fail (because the service isn't configured).
	# Let us mark the service as failed, as suggested by Alexander.
	mark_service_failed "${SVCNAME}"

	# I think we need to return here.
	# Otherwise, our failed status might get overwritten by start().
	return 1
}

start() {
	ebegin "I am starting"
	sleep 3
	eend $?
}

stop() {
	einfo "I am stopping"
	sleep 3
	eend $?
}
Comment 10 William Hubbs gentoo-dev 2013-08-19 20:26:01 UTC
(In reply to Thomas D. from comment #9)
> #!/sbin/runscript
> 
> description="Test runscript"
> 
> start_pre() {
> 	einfo "Running start_pre()..."
> 
> 	# Normally, we would run checkconfig() here,
> 	# which would fail (because the service isn't configured).
> 	# Let us mark the service as failed, as suggested by Alexander.
> 	mark_service_failed "${SVCNAME}"

You shouldn't need to mark the service as failed because of the return line below. When start_pre fails, the service should be marked failed...

> 	# I think we need to return here.
> 	# Otherwise, our failed status might get overwritten by start().
> 	return 1
> }
> 
> start() {
> 	ebegin "I am starting"
> 	sleep 3
> 	eend $?
> }
> 
> stop() {
> 	einfo "I am stopping"
> 	sleep 3
> 	eend $?
> }
Comment 11 Alexander Vershilov (RETIRED) gentoo-dev 2013-08-20 05:45:01 UTC
(In reply to William Hubbs from comment #10)
> When start_pre fails, the service should be marked failed...

The problem is that the failed mark is removed on runlevel change.
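
The behavior discussed so far can be illustrated with a toy model in plain POSIX shell. This is not OpenRC code: the state variable and the try_start and enter_runlevel functions are invented for the illustration. It only assumes what the comments here state, namely that a failed start leaves a failed mark and that the mark is removed on a runlevel change.

```shell
#!/bin/sh
# Toy model only; this is not OpenRC source code.
# state is one of: stopped, started, failed
state="stopped"
attempts=0

# Hypothetical service whose start_pre always fails (unconfigured).
start_pre() {
	return 1
}

# Try to start the service unless it is already started or marked failed.
try_start() {
	[ "$state" = "stopped" ] || return 0
	attempts=$((attempts + 1))
	if start_pre; then
		state="started"
	else
		state="failed"
	fi
}

# Entering a runlevel clears the failed mark, then starts what is needed.
enter_runlevel() {
	[ "$state" = "failed" ] && state="stopped"
	try_start
}

enter_runlevel boot      # first attempt fails
try_start                # same runlevel: skipped because of the failed mark
enter_runlevel default   # mark was cleared, so a second attempt happens
echo "start attempts: $attempts"
```

In this model the service is attempted once per runlevel entered, which matches the two attempts seen in rc.log and the observation that no second attempt shows up while staying in the same runlevel.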
Comment 12 William Hubbs gentoo-dev 2013-08-20 15:29:39 UTC
Does any other init script have a dependency on shorewall-init?
Comment 13 William Hubbs gentoo-dev 2013-10-19 15:45:38 UTC
I checked with Roy on this a while back, and it is working as designed.
So I guess the question is, is this something we should change or just
document?
Comment 14 Thomas Deutschmann (RETIRED) gentoo-dev 2013-10-23 16:32:42 UTC
The problem is, as Alexander said in comment 7, that the failed mark is removed on runlevel change, right?

Is there a reason for not keeping the error state across runlevels or was it done without further intention?

If this is really wanted, I would like to hear why. Currently I cannot imagine why this could be wanted. As long as I cannot imagine why this is wanted, I would vote for changing the current design (e.g. keep information about failed services across runlevels), because the current behavior can be dangerous (see comment 0: imagine bootmisc fails for some reason but is re-run later).
Comment 15 William Hubbs gentoo-dev 2014-01-16 16:27:07 UTC
(In reply to Thomas D. from comment #14)
> The problem is, as Alexander said in comment 7, that the failed mark is
> removed on runlevel change, right?

You are correct about the fail mark being removed on runlevel changes.

> Is there a reason for not keeping the error state across runlevels or was it
> done without further intention?

I'm not really sure, but I do know that this is a risky change, and it does have the potential of breaking backward compatibility, so I don't want to do anything with it until 1.0.

> If this is really wanted, I would like to hear why. Currently I cannot
> imagine why this could be wanted. As long as I cannot imagine why this is
> wanted, I would vote for changing the current design (e.g. keep information
> about failed services across runlevels), because the current behavior can be
> dangerous (see comment 0: imagine bootmisc fails for some reason but is
> re-run later).

Bootmisc will never fail since the start() function always returns 0, so this case will never happen.
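
A small illustration of why an unconditional "return 0" hides failures from the caller. This is not the real bootmisc script; hypothetical_cleanup is an invented stand-in for any step that can fail.

```shell
#!/bin/sh
# Illustration only; this is not the real bootmisc script.

# Invented stand-in for a step that fails.
hypothetical_cleanup() {
	return 1
}

# start() ends with an unconditional "return 0", so the caller
# never sees the failure of the step above.
start() {
	hypothetical_cleanup
	return 0
}

if start; then
	echo "start reported success"
else
	echo "start reported failure"
fi
```

Because the exit status of start() is always 0 here, a caller that only looks at that status has no reason to mark the service as failed or to retry it.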