While rebooting or shutting down my system, all services shut down properly until it gets to dhcpcd; from there the system hangs indefinitely. Manually stopping the interface (/etc/init.d/net.bond0 stop) prior to shutdown/reboot results in a successful shutdown/reboot. Everything works fine when using a single non-bonded interface, i.e. eth0 only.

Reproducible: Always

Steps to Reproduce:
1. Install baselayout-2 and openrc.
2. Bond two or more network interfaces.
3. Try to reboot or shut down.

Actual Results: The system doesn't reboot or shut down properly.

Expected Results: A clean shutdown.

mike@snafu ~ % emerge --info
Portage 2.1.4.4 (default/linux/amd64/2008.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24.5 x86_64)
=================================================================
System uname: 2.6.24.5 x86_64 AMD Athlon(tm) 64 FX-57 Processor
Timestamp of tree: Sat, 26 Apr 2008 09:30:01 +0000
ccache version 2.4 [enabled]
app-shells/bash: 3.2_p17-r1
dev-lang/python: 2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache: 2.4-r7
sys-apps/baselayout: 2.0.0
sys-apps/openrc: 0.2.2
sys-apps/sandbox: 1.2.18.1-r2
sys-devel/autoconf: 2.13, 2.61-r1
sys-devel/automake: 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils: 2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool: 1.5.26
virtual/os-headers: 2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=athlon-fx -O2 -pipe -msse3"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=athlon-fx -O2 -pipe -msse3"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/mpd /usr/portage/local/layman/sunrise /usr/portage/local/layman/mike"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X acl alsa amd64 avahi berkdb bzip2 cli cracklib crypt dbus dri fortran gdbm gnome gpm gstreamer gtk hal iconv isdnlog jpeg midi mmx mmxext mudflap multilib ncurses nls nptl nptlonly opengl openmp pam pcre perl png pppd python readline reflection session spl sqlite sse sse2 sse3 ssl svg tcpd unicode vim-syntax xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="fglrx"
Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
---
mike@snafu ~ % grep -v # /etc/conf.d/net
dns_domain_lo="setup"
config_eth0="null"
config_eth1="null"
config_eth2="null"
slaves_bond0="eth0 eth1 eth2"
config_bond0="dhcp"
---
mike@snafu ~ % grep -i bonding /etc/conf.d/modules
modules_2_6="${modules_2_6} bonding vboxdrv"
module_bonding_args_2_6="miimon=100 mode=0"
---
eth0 is using the skge (Marvell Yukon) driver. eth1 and eth2 are using the tulip driver. Both drivers are compiled into the kernel.
I tried setting rc_logger="YES" in /etc/rc.conf with the intention of finding out why it hangs while shutting down dhcpcd/net.bond0, and found that the system consistently shuts down properly with the rc logger enabled. However, once it is set back to "NO", the previous behaviour of not shutting down properly returns.
output of rc.log
---
snafu mike # cat /var/log/rc.log
rc boot logging started at Sun Apr 27 00:59:12 2008
* Setting system clock using the hardware clock [UTC] ... [ ok ]
* Loading module bonding ... [ ok ]
* Loading module vboxdrv ... [ ok ]
* Autoloaded 2 module(s)
* Setting up the Logical Volume Manager ...
Locking type 1 initialisation failed. [ ok ]
* Setting up dm-crypt mappings ...
* Checking swap is not LUKS
* dm-crypt map crypt-swap1 ...
* cryptsetup will be called with : -c aes -h sha1 -d /dev/urandom create crypt-swap1 /dev/hdd1 [ ok ]
* Running pre_mount commands for crypt-swap1 ... [ ok ]
* Checking swap is not LUKS
* dm-crypt map crypt-swap2 ...
* cryptsetup will be called with : -c aes -h sha1 -d /dev/urandom create crypt-swap2 /dev/sda1 [ ok ]
* Running pre_mount commands for crypt-swap2 ... [ ok ]
* dm-crypt map crypt-home ...
* cryptsetup will be called with : luksOpen /dev/sys/crhome crypt-home
Command successful. [ ok ]
* dm-crypt map crypt-tmp ...
* cryptsetup will be called with : luksOpen /dev/sys/crtmp crypt-tmp
Command successful. [ ok ]
* Running pre_mount commands for crypt-tmp ...
mke2fs 1.40.6 (09-Feb-2008) [ ok ] [ ok ]
* Checking local filesystems ...
/dev/sys/root: clean, 8775/128016 files, 194564/512000 blocks
If you wish to check the consistency of an XFS filesystem or repair a damaged filesystem, see xfs_check(8) and xfs_repair(8).
/dev/hdc1: clean, 37/26104 files, 10380/104388 blocks [ ok ]
* Remounting root filesystem read/write ... [ ok ]
* Updating /etc/mtab ... [ ok ]
* Mounting local filesystems ... [ ok ]
* Setting hostname to snafu ... [ ok ]
* Configuring kernel parameters ... [ ok ]
* Running hdparm on /dev/hdc ... [ ok ]
* Running hdparm on /dev/hdd ... [ ok ]
* Creating user login records ... [ ok ]
* Cleaning /var/run ... [ ok ]
* Wiping /tmp directory ... [ ok ]
* Restoring Mixer Levels ...
XXX write TLV... [ ok ]
* Setting terminal encoding [UTF-8] ... [ ok ]
* Setting console font [default8x16] ... [ ok ]
* Loading key mappings [us] ... [ ok ]
* Setting keyboard mode [UTF-8] ... [ ok ]
* Bringing up interface lo
* 127.0.0.1/8 ... [ ok ]
* Adding routes
* 127.0.0.0/8 via 127.0.0.1 ... [ ok ]
* Mounting USB device filesystem (usbfs) ... [ ok ]
* Activating swap devices ... [ ok ]
* Initializing random number generator ... [ ok ]
rc boot logging stopped at Sun Apr 27 00:59:21 2008

rc default logging started at Sun Apr 27 00:59:21 2008
* Bringing up interface bond0
* Adding slaves to bond0 ...
* eth0 eth1 eth2
* Removing addresses
* Removing addresses
* Removing addresses [ ok ]
* dhcp ...
* Running dhcpcd ... [ ok ]
* received address 192.168.11.4/24 [ ok ]
* Starting APC UPS daemon ... [ ok ]
* Starting D-BUS system messagebus ... [ ok ]
* Starting avahi-daemon ... [ ok ]
* Starting avahi-dnsconfd ... [ ok ]
* Starting syslog-ng ... [ ok ]
* Starting Hardware Abstraction Layer daemon ... [ ok ]
* Loading iptables state and starting firewall ... [ ok ]
* Mounting network filesystems ... [ ok ]
* Starting Music Player Daemon ... [ ok ]
* Starting vixie-cron ... [ ok ]
* Starting tmp ... [ ok ]
* Starting local ... [ ok ]
rc default logging stopped at Sun Apr 27 00:59:31 2008
This should have been fixed with this commit. http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git;a=commit;h=b6c4a563685270532d8698f824d50b7ddf61eafc
(In reply to comment #3)
> This should have been fixed with this commit.
> http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git;a=commit;h=b6c4a563685270532d8698f824d50b7ddf61eafc
>

Thanks for the reply, Roy. Unfortunately the problem still exists after upgrading; I tested over 10 reboots.

snafu mike # rc --version
rc (OpenRC) git-b6c4a563 (Gentoo Linux)
Roy, any feedback on this issue?
I'm pretty sure this is because the kernel is not de-registering virtual interfaces reliably. I can trivially hang it on my hardened server, so I will probably stop de-registering the virtual interfaces. This applies to the lot: tap, tun, bridge, bond, etc. I'm pretty sure it's a kernel bug, but it hasn't been "fixed" in years.
Kernel: Any feedback?
Can you please test with gentoo-sources-2.6.27-rX?
Roy, how trivially can you reproduce this - what are the commands/events? When the system hangs, does it respond to sysrq combinations? Perhaps we can get a trace from that.
At the time I did this:

create vif
ifconfig vif up
ifconfig vif down
destroy vif

Repeat that loop a few times for tap or bridge and eventually the kernel hangs trying to de-register the interface. This happened nearly 100% of the time with a hardened kernel, less so with gentoo-sources. I did this in single-user mode with no running services (except for udev) to prove to myself it was a kernel issue. I've not tried to actively replicate it since, but I'm sure it still happens: when my server was running Gentoo it sometimes hung while stopping the tap0 interface, because I sometimes accidentally commented out the code in a pre-down function that stops the interface from being destroyed.
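The loop described above could be sketched roughly as follows for a tap device. This is only an illustration, not the exact commands used at the time: it assumes tunctl is the tool used to create and destroy the interface, and the names DRY_RUN, ITERATIONS, run, cycle_tap and stress_tap are hypothetical. By default it only prints the commands; set DRY_RUN=0 and run it as root to actually execute them.

```shell
#!/bin/sh
# Sketch of a create/up/down/destroy stress loop for tap interfaces.
# DRY_RUN and ITERATIONS are illustrative knobs, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"
ITERATIONS="${ITERATIONS:-3}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# One create/up/down/destroy cycle for a single tap interface.
cycle_tap() {
    ifname="$1"
    run tunctl -t "$ifname"      # create the tap interface
    run ifconfig "$ifname" up
    run ifconfig "$ifname" down
    run tunctl -d "$ifname"      # destroy it again
}

# Repeat the cycle the requested number of times.
stress_tap() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "Run #$i"
        cycle_tap "vif$i"
        i=$((i + 1))
    done
}

stress_tap "$ITERATIONS"
```

Running it in single-user mode, as described above, keeps other services from interfering with the result.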
I can't find openrc-0.2.2 in Portage. Can you try to reproduce it with 0.3.0? I can't reproduce it on my Gentoo machine (quick testing, though) with:
1) gentoo-sources-2.6.26-r4
2) openrc-0.3.0-r1
Instead of trying to reproduce with openrc, you could set up a script that does what is described in comment #10, and run it overnight or something.
(In reply to comment #12)
> Instead of trying to reproduce with openrc, you could set up a script that does
> what is described in comment #10, and run it overnight or something
>

I'm currently running this, on 2.6.27-r4:

#! /bin/sh

for run in $(seq 1 100000); do

echo "Run #$run"

brctl addbr vbr$run
tunctl -t vif$run
ifconfig vif$run up
brctl addif vbr$run vif$run
ifconfig vbr$run 30.30.30.30 up
ifconfig vbr$run down
brctl delif vbr$run vif$run
ifconfig vif$run down
tunctl -d vif$run
brctl delbr vbr$run

done

-------

So far, no hang. I will test on 2.6.26-rX after that...
(In reply to comment #13)
> #! /bin/sh
>
> for run in $(seq 1 100000); do
>
> echo "Run #$run"
>
> brctl addbr vbr$run
> tunctl -t vif$run
> ifconfig vif$run up
> brctl addif vbr$run vif$run
> ifconfig vbr$run 30.30.30.30 up
> ifconfig vbr$run down
> brctl delif vbr$run vif$run
> ifconfig vif$run down
> tunctl -d vif$run
> brctl delbr vbr$run
>
> done

Oops, I forgot. You can get tunctl by emerging usermode-utilities or openvpn, and brctl comes with bridge-utils. The kernel must be set up with at least CONFIG_TUN=m (y if you want it in the kernel) for TUN/TAP devices (virtual interfaces) and CONFIG_BRIDGE=m (or y...) for bridging interfaces.
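Before starting a long run, the prerequisites listed above can be checked with something like the following. It is a rough sketch: have_tool and config_state are hypothetical helper names, and reading /proc/config.gz assumes the running kernel was built with CONFIG_IKCONFIG_PROC, which may not be the case; otherwise the kernel .config has to be checked by hand.

```shell
#!/bin/sh
# Sketch: check for the userland tools and kernel options the test script needs.

# Succeeds if the named command is found in PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Reads a kernel config on stdin and prints "y" or "m" for the given
# option; prints nothing if the option is not enabled.
config_state() {
    opt="$1"
    sed -n "s/^${opt}=\([ym]\)$/\1/p" | head -n 1
}

for tool in tunctl brctl ifconfig; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done

if [ -r /proc/config.gz ]; then
    for opt in CONFIG_TUN CONFIG_BRIDGE; do
        state="$(gzip -dc /proc/config.gz | config_state "$opt")"
        echo "$opt=${state:-not set}"
    done
else
    echo "/proc/config.gz not readable; check the kernel .config by hand"
fi
```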
(In reply to comment #14)
> [snip]

After 26 (!) hours of looping, I finally got my script stuck (with vif447429...). It seems really hard to reproduce, but this time it is completely stuck. I am attaching the task dump right away.
Created attachment 174835
Dmesg with task dump
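For anyone wanting to collect a similar dump, one way to get a task dump into the kernel log is the magic SysRq interface. The sketch below is an assumption about a workable approach, not a record of how the attachment above was produced. The names DRY_RUN, OUTFILE, write_to and dump_tasks are hypothetical. By default it only prints what it would do; run it as root with DRY_RUN=0 to really trigger the dump. If no shell is usable any more, the keyboard combination Alt+SysRq+T requests the same dump, provided SysRq is enabled.

```shell
#!/bin/sh
# Sketch: request a task dump via magic SysRq and save the kernel log.
# DRY_RUN and OUTFILE are illustrative knobs, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"
OUTFILE="${OUTFILE:-task-dump.txt}"

# Write a value to a file, or just print the action in dry-run mode.
write_to() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ echo $1 > $2"
    else
        echo "$1" > "$2"
    fi
}

dump_tasks() {
    outfile="$1"
    # Enable all SysRq functions (this matters for the keyboard combinations).
    write_to 1 /proc/sys/kernel/sysrq
    # 't' asks the kernel to log the state of every task.
    write_to t /proc/sysrq-trigger
    # Save the kernel ring buffer, which now contains the task dump.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ dmesg > $outfile"
    else
        dmesg > "$outfile"
    fi
}

dump_tasks "$OUTFILE"
```

On a busy system the dump can be larger than the kernel ring buffer, so the start of the output may be lost.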
Any news here? Does this still occur with the latest version of openrc?
Actually, with the new network script for openrc-0.5.0 (almost ready for release) this should be fixed in a roundabout way: it's up to the user whether they want the interface destroyed or not :) But I'm currently unsure whether Gentoo will go with the new script, or provide a USE flag or something.
Michael, if this is still a problem after the new openrc is released as described by Roy, then feel free to reopen this bug or create a new one. It should then be assigned to baselayout, I believe.
(In reply to comment #19)
> If this is a problem after the new openrc is released as described by Roy then
> feel free to reopen or create a new bug. It then should be assigned to
> baselayout I believe.

Technically it's a kernel bug :P