76761 – net-fs/am-utils - /etc/init.d/amd needs to wait for the daemon to actually stop

Summary: net-fs/am-utils - /etc/init.d/amd needs to wait for the daemon to actually stop

Status:	RESOLVED NEEDINFO

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	High normal
Assignee:	Network Filesystems

Reported:	2005-01-05 07:17 UTC by Sergio Gelato
Modified:	2007-04-02 06:43 UTC
CC List:	2 users

Description Sergio Gelato 2005-01-05 07:17:50 UTC

Running /etc/init.d/amd restart nearly always fails in my environment; the reason is that the daemon needs a few seconds to release resources, and the init script tries to restart it without waiting.
Running /etc/init.d/amd start a few seconds later invariably succeeds.

How to fix:

One might be tempted to use the --retry feature of start-stop-daemon for this, except that the appropriate delay is very dependent on the network environment.
That's probably why the Debian init script loops with "kill -0" once a second until the daemon has exited or 120 seconds have elapsed.
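
For illustration, the polling approach described above could look roughly like the following. This is a minimal sketch and not Debian's actual script; the function name wait_for_exit and its two arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not taken from Debian's script):
# wait until the process with PID $1 has exited, or until $2 seconds
# (default 120) have elapsed.
# Returns 0 if the process is gone, 1 if it is still there at the timeout.
wait_for_exit() {
    pid=$1
    timeout=${2:-120}
    elapsed=0
    # "kill -0" delivers no signal; it only tests whether the PID can
    # still be signalled, i.e. whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

An init script would call it after sending SIGTERM, for example wait_for_exit "$pid" 120, and only start the new daemon once it returns 0.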

Reproducible: Always
Steps to Reproduce:
1. Set up the automounter with a few maps for remote NFS mounts.
2. Use the system normally, triggering some mounts in the process.
3. /etc/init.d/amd restart

Actual Results:  
 * Stopping amd...                                                        [ ok ] 
 * Starting amd...
 * Failed to start amd                                                    [ !! ]


Expected Results:  
 * Stopping amd...                                                        [ ok ] 
[including a pause until the daemon has actually stopped]
 * Starting amd...                                                        [ ok ]

Comment 1 splite 2005-06-07 14:00:28 UTC

I had the same problem, but fixed it by adding "-R 10" to the start-stop-daemon
line.  10 seconds is enough for my particular setup.

Following the Debian example, why not use "--retry TERM/120/KILL" as a more
general solution?
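
A sketch of what the stop function might look like with that option is shown here. It is not the script shipped by the am-utils ebuild: the AMD_PIDFILE variable, its default path, and the messages are assumptions, and ebegin/eend are the helpers that Gentoo init scripts normally get from the init system.

```shell
# Sketch of stop() for /etc/init.d/amd using --retry.
# AMD_PIDFILE and its default path are assumptions for illustration only.
AMD_PIDFILE=${AMD_PIDFILE:-/var/run/amd.pid}

stop() {
    ebegin "Stopping amd"
    # Send SIGTERM and wait up to 120 seconds for the daemon to exit;
    # SIGKILL is sent only if it is still alive after that.
    start-stop-daemon --stop --retry TERM/120/KILL --pidfile "$AMD_PIDFILE"
    eend $? "Failed to stop amd"
}
```

With a fixed timeout such as "-R 10" the structure is the same; only the argument to --retry changes.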

Comment 2 Sergio Gelato 2005-06-17 04:54:25 UTC

(In reply to comment #1)
Debian itself does not use --retry TERM/120/KILL. I think that is because
start-stop-daemon would then wait the full 120 seconds, which is far longer than
necessary in most cases.

Comment 3 splite 2005-09-14 13:27:35 UTC

No, it would wait up to 120 seconds for the daemon to exit after sending
SIGTERM.  If amd exits before then, start-stop-daemon would return immediately.
That seems to me to be the same behavior you describe Debian as having.

While I'm at it, am-utils requires the portmapper to be running, so "need
localmount" in /etc/init.d/amd should be "need localmount portmap".
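
As a sketch, the dependency block with that change would read as follows, assuming the script declares nothing else in depend():

```shell
# depend() for /etc/init.d/amd with the portmapper added.
# Assumes the original block contained only "need localmount".
depend() {
    need localmount portmap
}
```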

Comment 4 Sergio Gelato 2005-09-15 03:51:48 UTC

Now that I have looked at the source code for start-stop-daemon I see that indeed
they have a smart polling algorithm to detect when the daemon has exited. I
don't know why, then, Debian's /etc/init.d/am-utils script doesn't use this;
historical reasons perhaps.

And since you mention the portmap issue, I think Gentoo is still lacking the
patch I contributed to Debian some time ago to tcp-wrap the RPC listener in amd.
(I don't like it when anyone on the network can use amq -x to change the log
options...)

Comment 5 Aron Griffis (RETIRED) gentoo-dev 2006-02-06 15:07:54 UTC

How about posting the patches you'd like to see added?  That might help to see this bug resolved eventually ;-)

Comment 6 Jakub Moc (RETIRED) gentoo-dev 2007-04-01 21:48:59 UTC

Get back to us.

Comment 7 Sergio Gelato 2007-04-02 06:43:20 UTC

If you wish; however, we no longer run Gentoo and we've switched from
amd to autofs, so I'm no longer in a position to test anything.

The request in comment #5 is a red herring. The tcp-wrapping patch is a
feature enhancement, not a fix for this bug; I mentioned it only as an
aside. Comments 1-3 contain all the information needed to fix the
bug.