Bug 566972 - net-misc/chrony-2.2 init script shouldn't specify --background
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Jeroen Roovers (RETIRED)
Reported: 2015-11-27 10:01 UTC by peter@prh.myzen.co.uk
Modified: 2015-12-04 06:05 UTC

Description peter@prh.myzen.co.uk 2015-11-27 10:01:58 UTC
On my 2-core, 32-bit Atom box, starting chronyd with rtcsync in chrony.conf causes these errors:

# /etc/init.d/chronyd start
 * Starting chronyd ...
 * start-stop-daemon: caught an interrupt
 * start-stop-daemon: /usr/sbin/chronyd died
 * Failed to start chronyd                                          [ !! ]
 * ERROR: chronyd failed to start

But a chronyd process is actually running:

20169 ?        S      0:00 /usr/sbin/chronyd -f /etc/chrony/chrony.conf

The problem appears to be caused by /etc/init.d/chronyd including the parameter --background in its call to start-stop-daemon. This conflicts with chronyd's own forking into the background, and sometimes the foreground process exits before start-stop-daemon finishes its work, so it thinks the process has died.

Please change the init script to omit --background. My tests show that chronyd works well without it.

Thanks.
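
For illustration, here is a minimal sketch of the start-stop-daemon call this report asks for, with --background left out so that chronyd's own fork is the only daemonizing step. It only prints the command line rather than running it. The paths come from the process listing above; the function name is hypothetical, and the real init script may pass further options (pidfile handling, for instance) that are not shown here.

```shell
#!/bin/sh
# Sketch only: prints the command line instead of executing it.
# Paths are taken from the process listing in the report.
CHRONYD=/usr/sbin/chronyd
CFGFILE=/etc/chrony/chrony.conf

# Hypothetical helper name. chronyd forks into the background by itself,
# so start-stop-daemon is not asked to background it a second time.
ssd_cmd_without_background() {
    printf '%s\n' "start-stop-daemon --start --exec $CHRONYD -- -f $CFGFILE"
}

ssd_cmd_without_background
```

With this form, start-stop-daemon waits for the foreground chronyd process to return, so any blocking startup work done before chronyd detaches would delay the init script.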
Comment 1 peter@prh.myzen.co.uk 2015-11-27 10:24:10 UTC
(In reply to Peter Humphrey from comment #0)

I forgot to include a reference to my discussion with the chrony maintainer, which is here:

http://news.gmane.org/gmane.comp.time.chrony.user

The thread is "What's changed in chrony-2.2?"
Comment 2 Holger Hoffstätte 2015-11-30 14:18:15 UTC
Removing --background slows down the boot sequence significantly since chrony tries to access remote servers (DNS lookups might fail), get iburst samples etc. before returning. One could argue that this is better than failing - it is! - but maybe we can find a way to have fast startups and a non-racy init script.
I realize that one could also make the argument for intentionally blocking the boot sequence until chronyd has started in order to guarantee somewhat consistent time from that point on. This seems like a lose/lose situation. :/
Comment 3 Holger Hoffstätte 2015-11-30 14:43:58 UTC
Peter, how about adding -n to $ARGS so that chronyd itself does not daemonize?
This works for me in combination with --background without any apparent downsides.
Comment 4 Jeroen Roovers (RETIRED) gentoo-dev 2015-12-01 03:36:27 UTC
(In reply to Peter Humphrey from comment #0)
> On my 2-core, 32-bit Atom box, starting chronyd with rtcsync in chrony.conf
> causes these errors:
> 
> # /etc/init.d/chronyd start
>  * Starting chronyd ...
>  * start-stop-daemon: caught an interrupt
>  * start-stop-daemon: /usr/sbin/chronyd died
>  * Failed to start chronyd                                          [ !! ]
>  * ERROR: chronyd failed to start
> 
> But a chronyd process is actually running:
> 
> 20169 ?        S      0:00 /usr/sbin/chronyd -f /etc/chrony/chrony.conf
> 
> The problem appears to be caused by /etc/init.d/chronyd including the
> parameter --background in its call to start-stop-daemon. This conflicts with
> chronyd's own forking into the background, and sometimes the foreground
> process exits before start-stop-daemon finishes its work, so it thinks the
> process has died.

I can only assume my systems aren't fast enough to see this race.

> Please change the init script to omit --background. My tests show that
> chronyd works well without it.

Well, there is some contention now, it seems.
Comment 5 peter@prh.myzen.co.uk 2015-12-01 09:49:45 UTC
(In reply to Holger Hoffstätte from comment #2)
> Removing --background slows down the boot sequence significantly since
> chrony tries to access remote servers (DNS lookups might fail), get iburst
> samples etc. before returning. One could argue that this is better than
> failing - it is! - but maybe we can find a way to have fast startups and a
> non-racy init script.

I don't see any of that delay here. The boot sequence just whizzes through as usual. It seems to me that the processing you describe must be done by the child process, not the initially called program.
Comment 6 Holger Hoffstätte 2015-12-01 10:48:33 UTC
(In reply to Peter Humphrey from comment #5)
> (In reply to Holger Hoffstätte from comment #2)
> > Removing --background slows down the boot sequence significantly since
> > chrony tries to access remote servers (DNS lookups might fail), get iburst
> > samples etc. before returning. One could argue that this is better than
> > failing - it is! - but maybe we can find a way to have fast startups and a
> > non-racy init script.
> 
> I don't see any of that delay here. The boot sequence just whizzes through
> as usual. It seems to me that the processing you describe must be done by
> the child process, not the initially called program.

I thought it could be due to having -s -r enabled on my side (RTC reading, reloading history), so I just tested again without those two options and with --background removed. It still takes 12-14 seconds to return, despite syncing from an in-house upstream server. I don't reboot often, but that's a long time.

I strace'd starting chronyd directly but didn't learn anything new other than that it waits for the child to start.

When I add -n to $ARGS (unconditionally in setxtrarg() before returning) and keep --background, the init script always finishes fast (obviously) with no noticeable delay. So this seems to be a generally more consistent way to not rely on system/HW/configuration specific effects. The only downside to this is that configuration errors will only be detected after the fact, but that doesn't seem to have been a problem for anyone to date (?).
I don't know what Gentoo's policy is on this, if there is one; other services do this too.

While it's odd that nobody else seems to have encountered this race to date, I agree that double-daemonizing is not good, so this discussion is useful in any case. :)
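
The alternative described in this comment, keeping --background and adding -n so that chronyd stays in the foreground, could look roughly like the sketch below. It only prints the command line rather than running it. The function name is hypothetical, the contents of $ARGS are assumed from the process listing in comment #0, and pidfile handling is deliberately left out.

```shell
#!/bin/sh
# Sketch only: prints the command line instead of executing it.
CHRONYD=/usr/sbin/chronyd
# Assumed contents of $ARGS, based on the process listing in comment #0.
ARGS="-f /etc/chrony/chrony.conf"

# Hypothetical helper name. -n tells chronyd not to detach, which leaves
# --background as the only daemonizing step.
ssd_cmd_foreground_chronyd() {
    printf '%s\n' "start-stop-daemon --start --background --exec $CHRONYD -- $ARGS -n"
}

ssd_cmd_foreground_chronyd
```

Because start-stop-daemon returns as soon as it has backgrounded the process, configuration errors in chrony.conf would only show up after the init script has already reported success, which is the downside noted above.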
Comment 7 Holger Hoffstätte 2015-12-01 12:15:48 UTC
Problem solved! \o/

As I found out and reported on the chrony-user list (http://article.gmane.org/gmane.comp.time.chrony.user/1229), this was due to my use of initstepslew together with the traditional RTC driver, instead of the Linux-specific rtcsync mode.

With those changes, removing --background (as proposed by Peter) works fine.

initstepslew in particular *must* block because that is its purpose. Since it is not a default option, we can go ahead and simply remove --background from the init script.
Comment 8 peter@prh.myzen.co.uk 2015-12-01 12:41:13 UTC
I'm glad Holger's found the real differences between his setup and mine.

My only remaining puzzle is about why no-one else has reported this. Perhaps it's because this Atom box has a fairly fast SSD together with the slow processor (the original spinning disk died a year ago).

It might be a good idea for the postinst to point to the new documents and caution against blindly copying yesterday's config file into today's. In my case rtcsync does the job, as in the installation default, and I have set hwclock and swclock to do nothing; likewise the kernel options to synchronise the clock are switched off. Chrony does a fine job on its own.
Comment 9 Jeroen Roovers (RETIRED) gentoo-dev 2015-12-04 06:05:20 UTC
Fixed in 2.2-r1 and 9999.