109803 – net-misc/ntp: ntp-client init script fails to start in default runlevel, but can start later

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	x86 Linux

Importance:	High normal
Assignee:	SpanKY

URL:
Whiteboard:
Keywords:

Duplicates (2):	139845 140172
Depends on:	94668
Blocks:

Reported:	2005-10-19 06:21 UTC by Eric Brown
Modified:	2007-01-09 17:48 UTC
CC List:	4 users

See Also:
Package list:
Runtime testing required:	---

Attachments
patch for the init script on files/ntp-client.rc (ntp-client.patch, 721 bytes, patch) 2006-03-01 08:52 UTC, Eric Brown
here's the correct patch file... (ntp.patch, 721 bytes, patch) 2006-03-01 08:59 UTC, Eric Brown

Description Eric Brown 2005-10-19 06:21:39 UTC

When I reboot my machine, ntp-client fails to start (something about line 33,
killed).  If I run /etc/init.d/ntp-client after the system is ready, it runs
without any problems.  I suspect this is related to the timing or the
dependencies of the init script.

On a related note, I am also opening a bug about netmount behaving similarly.

Reproducible: Always
Steps to Reproduce:
1. reboot

Actual Results:  
ntp-client fails to start

Expected Results:  
ntp-client should have started

sys-apps/baselayout-1.11.13-r1


Portage 2.0.51.22-r3 (hardened/x86/2.6, gcc-3.3.6, glibc-2.3.5-r2,
2.6.11-hardened-r15 i686)
=================================================================
System uname: 2.6.11-hardened-r15 i686 Intel(R) Xeon(TM) CPU 3.00GHz
Gentoo Base System version 1.6.13
dev-lang/python:     2.3.5-r2, 2.4.2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.15.92.0.2-r10
sys-devel/libtool:   1.5.20
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config
/usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -march=pentium4 -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://gentoo.chem.wisc.edu/gentoo"
MAKEOPTS="-j8"
PKGDIR="/usr/portage//packages/x86/"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage/"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://raptor.magbank.com/gentoo-portage"
USE="x86 berkdb crypt curl doc fam fastcgi gif hardened imap jpeg ldap libclamav
maildir mailwrapper mmx nfsv4 nptl nptl-only pam pcre perl pic png postgres
python readline samba sasl sse ssl tcpd tiff unicode vhosts zlib userland_GNU
kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS

Comment 1 SpanKY gentoo-dev 2005-10-19 07:08:56 UTC

sounds like a timeout issue

use a faster/closer server and/or increase the timeout option in the ntp-client
conf.d file

Comment 2 Eric Brown 2005-11-29 12:12:35 UTC

I don't think this is a timeout issue. The time server is on my local network,
and the command returns immediately when I run /etc/init.d/ntp-client start.
Furthermore, conf.d/ntp-client has the timeout set to 10 seconds, which is well
above the default 1 second timeout for LAN queries, according to the man page
for ntpdate.

It's almost as if the script isn't being run in the default runlevel at all,
even though rc-update -s shows it there.

Comment 3 Eric Brown 2005-11-29 12:27:11 UTC

I just posted another bug related to snort hostname resolution failing in the
default runlevel, but not when I log in to start it (113935).  I think it's
highly likely that the source of this ntp-client bug is a name resolution
problem during startup as well (same machine).  There must be a network-related
dependency that's not fully ready when ntp-client and snort start up.

Comment 4 Eric Brown 2005-11-29 12:40:12 UTC

changing conf.d/rc strict net checking to "yes" did not help.  Neither did
turning off parallel startups...

Comment 5 Donald R. Gray Jr 2005-12-09 20:25:42 UTC

have you tried replacing:

     need net

with: 

     need net dns

on those init scripts, or adding those host names/IP addresses to 
your /etc/hosts?

It sounds to me like they are trying to start before name resolution is 
working properly.

If at all possible you should add those hosts to the config files in numerical 
IP form so you don't need DNS.

Comment 6 Martin d'Anjou 2005-12-24 09:34:54 UTC

I have the same problem. The error message in /var/log/messages is:
Dec 21 16:56:13 tux rc-scripts: Please edit /etc/conf.d/ntp-client
Dec 21 16:56:13 tux rc-scripts: Unable to locate the client command ntpdate!

Adding need net or the other services suggested above to depend() did not help.

The only way I could get it to work reliably was to remove the "which" command from /etc/init.d/ntp-client:

checkconfig() {
   if [ ! -x "${NTPCLIENT_CMD}" ] ; then
...

And to set the NTPCLIENT_CMD in /etc/conf.d/ntp-client to have the full path:
NTPCLIENT_CMD="/usr/sbin/ntpdate"

No other init script has had this problem on my system. People on the forums have said this is a problem with pam login, but I have no idea whether this claim is correct. See http://forums.gentoo.org/viewtopic-t-366490-highlight-unable+locate+client+ntpdate.html
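
The difference between the two lookups described in this comment can be shown in isolation. This is only a minimal sketch, not the actual init script: resolve_cmd is a hypothetical helper name, and the unusable PATH merely simulates the assumption that the directory holding ntpdate is not in PATH when the script runs at boot.

```shell
# resolve_cmd is a hypothetical helper, not part of the ntp-client init script.
# An absolute path is only tested for executability; a bare name needs PATH.
resolve_cmd() {
    case "$1" in
        /*) [ -x "$1" ] && printf '%s\n' "$1" ;;
        *)  command -v "$1" ;;
    esac
}

# An absolute path still resolves when PATH is unusable.
( PATH=/nonexistent; resolve_cmd /bin/sh ) || echo "absolute path not found"

# A bare name cannot be found under the same PATH.
( PATH=/nonexistent; resolve_cmd sh ) || echo "bare name not found"
```

If the assumption holds, setting NTPCLIENT_CMD to a full path sidesteps the lookup entirely, which would match the behaviour reported above.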

Comment 7 Eric Brown 2006-01-04 21:23:10 UTC

I think this could be related to bug 94668.  My machine and the one in that bug are both Dell machines with e1000 NICs.  This is a pretty common NIC, and matt also mentioned that Dell sent out a warning about delays in Intel gigE NIC init times.

If this is a NIC driver error, we might want to at least establish a recommended way to remedy this situation here with some extra lines in start() for those nics...

Someone has already suggested while (ping ***); in start()...
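
The "while ping" idea could be written as a bounded retry loop, so that boot does not hang forever if the network never comes up. A minimal sketch only: wait_until is a hypothetical helper name, not something baselayout provides.

```shell
# wait_until TRIES DELAY CMD [ARGS...]
# Hypothetical helper: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 otherwise.
wait_until() {
    local tries="$1"
    local delay="$2"
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Stand-in commands; a real check would probe the time server instead.
wait_until 3 0 true  && echo "reachable"
wait_until 2 0 false || echo "gave up"
```

In start() this might be called as something like wait_until 10 1 ping -c 1 followed by the server address, assuming a ping that accepts -c, before the NTP client is run.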

Comment 8 SpanKY gentoo-dev 2006-01-04 21:34:49 UTC

not a ntp-specific issue

Comment 9 Eric Brown 2006-03-01 08:51:02 UTC

I just revisited this problem and I found the cause:

The e1000 nic is NOT at fault here. The init script is using a weird way to detect successful completion of the ntp command that looks like this:

    ${NTPCLIENT_CMD} ${NTPCLIENT_OPTS} >/dev/null &
    local pid=$!
    (sleep ${NTPCLIENT_TIMEOUT:-30}; kill -9 ${pid} >&/dev/null) &
    wait ${pid}
    eend $? "Failed to set clock"

To me, it looks like it tries to run ntpdate in the background and grab the pid.  The issue here is that when ntpdate completes, the pid no longer exists, and because ntpdate completes almost instantaneously on my machine, the script has problems because it expects to be able to use the pid a few lines down.

Is the init script designed like this because other ntpclient_cmd's will stay running until killed?  Did the way ntpdate runs change?
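
For reference, the background-plus-watchdog pattern quoted above can be exercised outside the init script. This is a minimal sketch under stated assumptions: run_with_timeout is a hypothetical name, the commands are stand-ins for ntpdate, and unlike the quoted code it stops the watchdog once the command has finished.

```shell
# run_with_timeout SECONDS CMD [ARGS...]
# Hypothetical helper, not taken from the ntp-client init script.
run_with_timeout() {
    local timeout="$1"
    shift
    "$@" &
    local pid=$!
    # Watchdog: kill the command if it is still running after the timeout.
    ( sleep "$timeout"; kill -9 "$pid" ) >/dev/null 2>&1 &
    local watchdog=$!
    local status=0
    # wait reports the command's exit status even if it has already
    # finished; a command killed by SIGKILL yields 128 + 9 = 137.
    wait "$pid" 2>/dev/null || status=$?
    # Stop the watchdog subshell so its kill never fires against a
    # finished (or recycled) PID.
    kill "$watchdog" 2>/dev/null || true
    wait "$watchdog" 2>/dev/null || true
    return "$status"
}

run_with_timeout 2 true     && echo "fast command succeeded"
run_with_timeout 1 sleep 10 || echo "slow command stopped, status $?"
```

Trying the helper with a command that exits immediately and with one that outlives the timeout makes it easier to see which of the two cases a given machine is hitting.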

Anyway, here's my solution in start():

start() {
    checkconfig || return $?

    ebegin "Setting clock via the NTP client '${NTPCLIENT_CMD}'"

    if [ "${NTPCLIENT_CMD}" == "ntpdate" ]
    then
        ${NTPCLIENT_CMD} ${NTPCLIENT_OPTS} >/dev/null
        eend $? "Failed to set clock"
    else
        ${NTPCLIENT_CMD} ${NTPCLIENT_OPTS} >/dev/null &
        local pid=$!
        (sleep ${NTPCLIENT_TIMEOUT:-30}; kill -9 ${pid} >&/dev/null) &
        wait ${pid}
        eend $? "Failed to set clock"
    fi
}

I will also upload the patch and the file.

Comment 10 Eric Brown 2006-03-01 08:52:59 UTC

Created attachment 81042
patch for the init script on files/ntp-client.rc

This is the patch for the init script that adds a check on NTPCLIENT_CMD to use different code for ntpdate...

Comment 11 Eric Brown 2006-03-01 08:58:14 UTC

Comment on attachment 81042
patch for the init script on files/ntp-client.rc

--- ntp-client.rc       2006-03-01 11:57:16.000000000 -0500
+++ ntp-client  2006-03-01 11:57:00.000000000 -0500
@@ -27,9 +27,16 @@
        checkconfig || return $?

        ebegin "Setting clock via the NTP client '${NTPCLIENT_CMD}'"
-       ${NTPCLIENT_CMD} ${NTPCLIENT_OPTS} >/dev/null &
-       local pid=$!
-       (sleep ${NTPCLIENT_TIMEOUT:-30}; kill -9 ${pid} >&/dev/null) &
-       wait ${pid}
-       eend $? "Failed to set clock"
+
+       if [ "${NTPCLIENT_CMD}" == "ntpdate" ]
+       then
+               ${NTPCLIENT_CMD} ${NTPCLIENT_OPTS} >/dev/null
+               eend $? "Failed to set clock"
+       else
+               ${NTPCLIENT_CMD} ${NTPCLIENT_OPTS} >/dev/null &
+               local pid=$!
+               (sleep ${NTPCLIENT_TIMEOUT:-30}; kill -9 ${pid} >&/dev/null) &
+               wait ${pid}
+               eend $? "Failed to set clock"
+       fi
 }

Comment 12 Eric Brown 2006-03-01 08:58:51 UTC

Comment on attachment 81042
patch for the init script on files/ntp-client.rc

Sorry, it's backwards...

Comment 13 Eric Brown 2006-03-01 08:59:53 UTC

Created attachment 81043
here's the correct patch file...

This patch is applied like this: patch -p0 original-ntp-init-script < ntp.patch

Comment 14 SpanKY gentoo-dev 2006-03-01 16:54:36 UTC

Comment on attachment 81043
here's the correct patch file...

either we sleep or we don't, there is no in between

Comment 15 Steve Herber 2006-03-14 12:19:42 UTC

I have two points to make about this problem.
The first is a confirmation: my init script ran once the -u ntp:ntp option was removed from the ntpd.conf file.

The second is about error reporting.  When I ran strace against the ntp startup, I could see the error message saying that -u was not a recognized flag.  Where do such errors go?  Is this a problem with the ntpd init.d script, or is it a problem with all the init.d scripts?  Thank you.

Comment 16 Miroslaw Mieszczak 2006-05-17 10:24:14 UTC

I have the same problem with both baselayout 1.11.xxx and 1.12.xxxx.

The problem is that the system tries to start ntp-client before the network interfaces (ethx or athx or rax or whatever else) have been started.
It seems to use net.lo as its dependency, so as soon as net.lo is started, the system tries to start the client.

Comment 17 Eric Brown 2006-05-17 14:37:13 UTC

It's really just the sleep issue on my system, not the network interface timing...

Comment 18 Jakub Moc (RETIRED) gentoo-dev 2006-07-10 02:53:27 UTC

*** Bug 139845 has been marked as a duplicate of this bug. ***

Comment 19 Jakub Moc (RETIRED) gentoo-dev 2006-07-13 00:00:28 UTC

*** Bug 140172 has been marked as a duplicate of this bug. ***

Comment 20 kavol 2006-07-13 03:04:42 UTC

hi people,

it seems to me that comment #16 here, bug #139845 and bug #140172, which all describe the same thing, are not related to the problem discussed in this bug report, so I believe the other bug reports should not be marked as duplicates of this one. If I am wrong, could somebody please explain the common points to me?

furthermore, I would like to ask somebody more experienced with Gentoo startup scripts to investigate whether the problem of the setting
RC_NET_STRICT_CHECKING="no" being ignored is ntp-client specific?

- thanks

Comment 21 SpanKY gentoo-dev 2007-01-09 17:48:39 UTC

nothing here is ntp-specific