I have long-standing problems with my cluster. The cluster nodes boot up too fast after a power-up, while the Cisco VPN router takes much longer to boot. The nodes run dhclient but probably time out too early.

Feb 5 17:18:47 node007 r8169: eth0: link up
Feb 5 17:18:48 node007 rc-scripts: Configuration not set for eth0 - assuming DHCP
Feb 5 17:18:49 node007 dhclient: option_space_encapsulate: option space agent does not exist, but is configured.
Feb 5 17:18:49 node007 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
Feb 5 17:18:56 node007 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 7
Feb 5 17:19:03 node007 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 15
Feb 5 17:19:07 node007 r8169: eth0: link down
Feb 5 17:19:16 node007 r8169: eth0: link down
Feb 5 17:19:18 node007 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Feb 5 17:19:18 node007 r8169: eth0: link up
Feb 5 17:19:26 node007 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 10
Feb 5 17:19:36 node007 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 11
Feb 5 17:19:47 node007 dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
Feb 5 17:19:50 node007 dhclient: No DHCPOFFERS received.
Feb 5 17:19:50 node007 rc-scripts: ERROR: cannot start ntp-client as net.eth0 could not start

My /etc/conf.d/net is empty. net-misc/dhcpcd is not installed.
# emerge --info
Portage 2.1.4.4 (default/linux/amd64/2008.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24-gentoo-r4-default x86_64)
=================================================================
System uname: 2.6.24-gentoo-r4-default x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Timestamp of tree: Thu, 05 Feb 2009 16:00:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
app-shells/bash: 3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python: 2.4.4-r13, 2.5.2-r8
dev-python/pycrypto: 2.0.1-r6
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.63
sys-devel/automake: 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.2
sys-devel/binutils: 2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 2.2.6a
virtual/os-headers: 2.6.23-r3
ACCEPT_KEYWORDS="amd64 ~amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=nocona -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /var/bind /var/qmail/alias /var/qmail/control /var/spool/torque"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-O2 -march=nocona -fomit-frame-pointer -pipe"
DISTDIR="/nfslarge/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://gentoo.mirror.web4u.cz/"
LDFLAGS="-Wl,-O1"
LINGUAS="en cs cz"
MAKEOPTS="-j1"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/nfslarge/usr/portage"
PORTDIR_OVERLAY="/nfslarge/usr/portage/local/layman/sunrise /nfslarge/usr/portage/local"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="R X Xaw3d acl acpi amd64 apache2 bash-completion bcmath berkdb blas boost bzip2 cblas clamav cli colordiff compress cpio cracklib crypt cscope css ctype curl curlwrappers cxx dbus djbfft emboss enblend encode enscript expat fam fortran ftp gcj gd gdbm gif glibc-compat20 glibc-omitfp glitz glut gmp gnuplot gnutls gpm graphviz gs gtk gtkhtml hal icc iconv ifc inifile innodb isdnlog ithreads java javascript jbig jikes jpeg jpeg2k kdtree lapack lcms libedit libwww lzo lzw maildir mhash midi mime ming mjpeg mmap mmx mng mod_python modperl modplug mozilla moznoirc mpeg mpi mpi_njtree mpich2 mudflap mule multilib mxdatetime mysql mysqli ncurses netcdf netpbm network nntp nptl nptlonly numeric openmp pam pcntl pcre pdf perl plotutils png pnm postproc postscript ppds pppd procmail pymol python rar raw readline reflection reiserfs rpm rtc scp seamonkey server session sftp sift smime sndfile sockets spl srt sse sse2 sse3 ssl svg svgz sysfs sysvipc tcl tcpd threads tiff transcode unicode urandom userlocales uuencode vim-syntax vim-with-x wmf xanim xfs xinetd xml xorg xpm xslt xv xvid zip zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol"
APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse evdev"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
LINGUAS="en cs cz"
USERLAND="GNU"
VIDEO_CARDS="vesa"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
The default timeout in version 3.1.1 (current stable) appears to be 60 seconds. Can't you set up an /etc/dhcp/dhclient.conf that sets a higher timeout?
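A minimal sketch of such a file, assuming the standard `timeout` and `retry` statements documented in dhclient.conf(5). The values are illustrative only and were not tested on the reporter's setup. A longer timeout only extends how long dhclient waits for an offer during startup; it does not by itself make a failed interface come back later.

```
# /etc/dhcp/dhclient.conf
# Illustrative values only; pick them to match how long the router needs to boot.

# Seconds to keep looking for a DHCP server before giving up (default: 60).
timeout 300;

# Seconds to wait after a failed attempt before trying again (default: 300).
retry 60;
```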
Yes, I could raise the 60-second timeout, but the retry option should still kick in after 5 minutes, and it does not. It appears to me that the rc script has already shut the interface down, so nothing is ever retried. At least, that is what the logs suggest.
(In reply to comment #2)
> Yes, I could raise the 60 sec timeout value, but still, the retry option should
> kick in after 5 minutes. It does not. It appears to me that the interface is
> shutdown already by the rc script and nothing is ever retried. At least from
> the logs.
>

Basically, dhclient tries to get a lease; if it does not succeed, it exits, and so does the init.d script, leaving the interface in an unconfigured state. Manual intervention (/etc/init.d/net.xxx start) is needed to force the machine to try to get an address again. This behaviour is unacceptable for any serious server-like machine.

Increasing the timeout or the number of retries does not solve the issue, because the machine then waits in the start script until it gets an address. If that never happens, the machine never finishes booting.

For example, we have a cable-modem-type Internet connection, where we get the address by DHCP from the ISP. If the service is unavailable for any reason at the moment our server boots, dhclient will never ask again until told to by the administrator.
Please retest w/ 3.1.3_p1, and re-open if needed.