Bug 219400 - unreliable virtual interface deregistration
Summary: unreliable virtual interface deregistration
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout
Hardware: AMD64 Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2008-04-27 00:54 UTC by Michael Beasley
Modified: 2009-05-01 18:17 UTC
CC List: 3 users

Attachments
Dmesg with task dump (dmesg.out,239.04 KB, text/plain)
2008-12-10 09:00 UTC, Mathieu Segaud

Description Michael Beasley 2008-04-27 00:54:27 UTC
While rebooting or shutting down my system, all services shut down properly until it gets to dhcpcd; from there the system hangs indefinitely. Manually stopping the interfaces (/etc/init.d/net.bond0 stop) prior to shutdown/reboot results in a successful shutdown/reboot. Everything works fine when using a single non-bonded interface, i.e. eth0 only.
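
The manual workaround above could be wrapped in a small helper. This is only a sketch: the names `IFACE`, `DRY_RUN`, `run` and `safe_reboot` are made up for illustration and are not part of OpenRC or baselayout. It assumes the bonded interface is bond0, as in this report.

```shell
#!/bin/sh
# Hypothetical helper: stop the bonded interface first, then reboot.
# IFACE and DRY_RUN are illustrative variables, not OpenRC settings.
IFACE="${IFACE:-bond0}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

safe_reboot() {
    # Only reboot if the interface was stopped successfully.
    run "/etc/init.d/net.${IFACE}" stop && run shutdown -r now
}
```

After sourcing the file, running `DRY_RUN=1 safe_reboot` prints the two commands without executing them; calling `safe_reboot` as root performs them.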

Reproducible: Always

Steps to Reproduce:
1. install baselayout-2 and openrc
2. bond two or more network interfaces
3. try to reboot or shutdown

Actual Results:  
system doesn't reboot or shutdown properly.

Expected Results:  
a clean shutdown.

mike@snafu ~ % emerge --info
Portage 2.1.4.4 (default/linux/amd64/2008.0, gcc-4.1.2, glibc-2.6.1-r0, 2.6.24.5 x86_64)
=================================================================
System uname: 2.6.24.5 x86_64 AMD Athlon(tm) 64 FX-57 Processor
Timestamp of tree: Sat, 26 Apr 2008 09:30:01 +0000
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p17-r1
dev-lang/python:     2.4.4-r9
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 2.0.0
sys-apps/openrc:     0.2.2
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r1
sys-devel/automake:  1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r1
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=athlon-fx -O2 -pipe -msse3"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/terminfo /etc/udev/rules.d"
CXXFLAGS="-march=athlon-fx -O2 -pipe -msse3"
DISTDIR="/usr/portage/distfiles"
FEATURES="ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict unmerge-orphans userfetch"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/layman/mpd /usr/portage/local/layman/sunrise /usr/portage/local/layman/mike"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X acl alsa amd64 avahi berkdb bzip2 cli cracklib crypt dbus dri fortran gdbm gnome gpm gstreamer gtk hal iconv isdnlog jpeg midi mmx mmxext mudflap multilib ncurses nls nptl nptlonly opengl openmp pam pcre perl png pppd python readline reflection session spl sqlite sse sse2 sse3 ssl svg tcpd unicode vim-syntax xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="fglrx"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

---

mike@snafu ~ % grep -v # /etc/conf.d/net
dns_domain_lo="setup"
config_eth0="null"
config_eth1="null"
config_eth2="null"
slaves_bond0="eth0 eth1 eth2"
config_bond0="dhcp"

---

mike@snafu ~ % grep -i bonding /etc/conf.d/modules 
modules_2_6="${modules_2_6} bonding vboxdrv"
module_bonding_args_2_6="miimon=100 mode=0"

---

eth0 is using the skge (Marvell Yukon) driver.
eth1 and eth2 are using the tulip driver.
Both drivers are compiled into the kernel.
Comment 1 Michael Beasley 2008-04-27 05:10:45 UTC
I tried setting rc_logger="YES" in /etc/rc.conf, intending to find out why it hangs while shutting down dhcpcd/net.bond0, and found that the system consistently shuts down properly with the rc logger enabled. Once it's set back to "NO", however, the previous behaviour returns and the system does not shut down properly.
Comment 2 Michael Beasley 2008-04-27 05:27:30 UTC
output of rc.log

---

snafu mike # cat /var/log/rc.log 

rc boot logging started at Sun Apr 27 00:59:12 2008

 * Setting system clock using the hardware clock [UTC] ...
 [ ok ]
 * Loading module bonding ...
 [ ok ]
 * Loading module vboxdrv ...
 [ ok ]
 * Autoloaded 2 module(s)
 * Setting up the Logical Volume Manager ...
  Locking type 1 initialisation failed.
 [ ok ]
 * Setting up dm-crypt mappings ...
 * Checking swap is not LUKS
 * dm-crypt map crypt-swap1 ...
 * cryptsetup will be called with : -c aes -h sha1 -d /dev/urandom create crypt-swap1 /dev/hdd1
 [ ok ]
 *   Running pre_mount commands for crypt-swap1 ...
 [ ok ]
 * Checking swap is not LUKS
 * dm-crypt map crypt-swap2 ...
 * cryptsetup will be called with : -c aes -h sha1 -d /dev/urandom create crypt-swap2 /dev/sda1
 [ ok ]
 *   Running pre_mount commands for crypt-swap2 ...
 [ ok ]
 * dm-crypt map crypt-home ...
 * cryptsetup will be called with :   luksOpen /dev/sys/crhome crypt-home
Command successful.
 [ ok ]
 * dm-crypt map crypt-tmp ...
 * cryptsetup will be called with :   luksOpen /dev/sys/crtmp crypt-tmp
Command successful.
 [ ok ]
 *   Running pre_mount commands for crypt-tmp ...
mke2fs 1.40.6 (09-Feb-2008)
 [ ok ]
 [ ok ]
 * Checking local filesystems ...
/dev/sys/root: clean, 8775/128016 files, 194564/512000 blocks
If you wish to check the consistency of an XFS filesystem or
repair a damaged filesystem, see xfs_check(8) and xfs_repair(8).
/dev/hdc1: clean, 37/26104 files, 10380/104388 blocks
 [ ok ]
 * Remounting root filesystem read/write ...
 [ ok ]
 * Updating /etc/mtab ...
 [ ok ]
 * Mounting local filesystems ...
 [ ok ]
 * Setting hostname to snafu ...
 [ ok ]
 * Configuring kernel parameters ...
 [ ok ]
 * Running hdparm on /dev/hdc ...
 [ ok ]
 * Running hdparm on /dev/hdd ...
 [ ok ]
 * Creating user login records ...
 [ ok ]
 * Cleaning /var/run ...
 [ ok ]
 * Wiping /tmp directory ...
 [ ok ]
 * Restoring Mixer Levels ...
XXX write TLV...
 [ ok ]
 * Setting terminal encoding [UTF-8] ...
 [ ok ]
 * Setting console font [default8x16] ...
 [ ok ]
 * Loading key mappings [us] ...
 [ ok ]
 * Setting keyboard mode [UTF-8] ...
 [ ok ]
 * Bringing up interface lo
 *   127.0.0.1/8 ...
 [ ok ]
 *   Adding routes
 *     127.0.0.0/8 via 127.0.0.1 ...
 [ ok ]
 * Mounting USB device filesystem (usbfs) ...
 [ ok ]
 * Activating swap devices ...
 [ ok ]
 * Initializing random number generator ...
 [ ok ]

rc boot logging stopped at Sun Apr 27 00:59:21 2008


rc default logging started at Sun Apr 27 00:59:21 2008

 * Bringing up interface bond0
 *   Adding slaves to bond0 ...
 *     eth0 eth1 eth2
 *     Removing addresses
 *       Removing addresses
 *         Removing addresses
 [ ok ]
 *   dhcp ...
 *     Running dhcpcd ...
 [ ok ]
 *     received address 192.168.11.4/24
 [ ok ]
 * Starting APC UPS daemon ...
 [ ok ]
 * Starting D-BUS system messagebus ...
 [ ok ]
 * Starting avahi-daemon ...
 [ ok ]
 * Starting avahi-dnsconfd ...
 [ ok ]
 * Starting syslog-ng ...
 [ ok ]
 * Starting Hardware Abstraction Layer daemon ...
 [ ok ]
 * Loading iptables state and starting firewall ...
 [ ok ]
 * Mounting network filesystems ...
 [ ok ]
 * Starting Music Player Daemon ...
 [ ok ]
 * Starting vixie-cron ...
 [ ok ]
 * Starting tmp ...
 [ ok ]
 * Starting local ...
 [ ok ]

rc default logging stopped at Sun Apr 27 00:59:31 2008
Comment 3 Roy Marples 2008-04-27 21:46:01 UTC
This should have been fixed with this commit.
http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git;a=commit;h=b6c4a563685270532d8698f824d50b7ddf61eafc
Comment 4 Michael Beasley 2008-04-28 07:50:34 UTC
(In reply to comment #3)
> This should have been fixed with this commit.
> http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git;a=commit;h=b6c4a563685270532d8698f824d50b7ddf61eafc
>
Thanks for the reply Roy,
Unfortunately the problem still exists after upgrading, tested over 10 reboots.

snafu mike # rc --version
rc (OpenRC) git-b6c4a563 (Gentoo Linux)


Comment 5 Doug Goldstein (RETIRED) gentoo-dev 2008-10-07 15:30:59 UTC
Roy, any feedback on this issue?
Comment 6 Roy Marples 2008-10-07 15:41:16 UTC
I'm pretty sure this is because the kernel is not de-registering virtual interfaces reliably. I can trivially hang it on my hardened server and will probably stop de-registering the virtual interfaces.

This applies to the lot - tap, tun, bridge, bond, etc. Pretty sure it's a kernel bug but it hasn't been "fixed" in years.
Comment 7 Doug Goldstein (RETIRED) gentoo-dev 2008-10-07 15:59:30 UTC
Kernel: Any feedback?
Comment 8 Mike Pagano gentoo-dev 2008-10-28 00:04:25 UTC
Can you please test with gentoo-sources-2.6.27-rX
Comment 9 Daniel Drake (RETIRED) gentoo-dev 2008-10-28 22:43:09 UTC
Roy, how trivially can you reproduce this - what are the commands/events?
When the system hangs, does it respond to sysrq combinations? Perhaps we can get a trace from that.
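
If the keyboard combination is not practical, a task dump can also be requested through procfs, provided a root shell still responds. A rough sketch follows; the function name `dump_tasks` and the overridable `SYSRQ_ENABLE`/`SYSRQ_TRIGGER` variables are illustrative only, and the defaults are the standard procfs locations.

```shell
#!/bin/sh
# Illustrative variables; the defaults are the standard procfs paths.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

dump_tasks() {
    # Enable all SysRq functions, then ask the kernel to dump the
    # state of every task to the kernel log (the 't' command).
    echo 1 > "$SYSRQ_ENABLE"
    echo t > "$SYSRQ_TRIGGER"
}
```

Calling `dump_tasks` as root and then saving the kernel log with `dmesg > dmesg.out` should capture the traces, assuming the kernel was built with SysRq support.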
Comment 10 Roy Marples 2008-10-28 22:59:55 UTC
At the time I did this

create vif
ifconfig vif up
ifconfig vif down
destroy vif

Repeat that loop a few times for tap or bridge and eventually the kernel hangs trying to de-register the interface. This happened nearly 100% of the time with a hardened kernel, less so with gentoo-sources. I did this in single user with no running services (except for udev) to prove to myself it was a kernel issue.

I've not tried to actively replicate it, but I'm sure it still happens: when my server was running Gentoo it did sometimes hang stopping the tap0 interface, because I sometimes accidentally commented out the code to stop the interface from being destroyed in a pre-down function.
Comment 11 Markos Chandras (RETIRED) gentoo-dev 2008-12-08 00:37:24 UTC
I can't find openrc-0.2.2 in portage. Can you try to reproduce it with 0.3.0?

I can't reproduce it on my Gentoo machine (quick testing, though) with

1) gentoo-sources-2.6.26-r4
2) openrc-0.3.0-r1

Comment 12 Daniel Drake (RETIRED) gentoo-dev 2008-12-08 09:32:13 UTC
Instead of trying to reproduce with openrc, you could set up a script that does what is described in comment #10, and run it overnight or something
Comment 13 Mathieu Segaud 2008-12-08 11:10:52 UTC
(In reply to comment #12)
> Instead of trying to reproduce with openrc, you could set up a script that does
> what is described in comment #10, and run it overnight or something
> 

I'm currently running this, on 2.6.27-r4:

#! /bin/sh

for run in $(seq 1 100000); do

echo "Run #$run"

  brctl addbr vbr$run
  tunctl -t vif$run
  ifconfig vif$run up
  brctl addif vbr$run vif$run
  ifconfig vbr$run 30.30.30.30 up
  ifconfig vbr$run down
  brctl delif vbr$run vif$run
  ifconfig vif$run down
  tunctl -d vif$run
  brctl delbr vbr$run

done

-------
so far, no hang, will test on 2.6.26-rX after that...
Comment 14 Mathieu Segaud 2008-12-08 11:38:59 UTC
(In reply to comment #13)

> #! /bin/sh
> 
> for run in $(seq 1 100000); do
> 
> echo "Run #$run"
> 
>   brctl addbr vbr$run
>   tunctl -t vif$run
>   ifconfig vif$run up
>   brctl addif vbr$run vif$run
>   ifconfig vbr$run 30.30.30.30 up
>   ifconfig vbr$run down
>   brctl delif vbr$run vif$run
>   ifconfig vif$run down
>   tunctl -d vif$run
>   brctl delbr vbr$run
> 
> done

Oops, I forgot. You can get tunctl by emerging usermode-utilities or openvpn, and brctl comes with bridge-utils.
The kernel must be set up with at least CONFIG_TUN=m (or y if you want it built in) for TUN/TAP devices (virtual interfaces) and CONFIG_BRIDGE=m (or y) for bridging interfaces.
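
One way to check those two options before starting the loop is to grep the kernel config. This is a minimal sketch: `has_kernel_option` is a made-up helper name, and the config path in the usage note is an assumption about where the running kernel's config lives.

```shell
#!/bin/sh
# has_kernel_option CONFIG_FILE OPTION
# Succeeds if OPTION is set to y or m in a plain-text kernel config.
has_kernel_option() {
    grep -Eq "^$2=(y|m)$" "$1"
}
```

For example, `has_kernel_option /usr/src/linux/.config CONFIG_TUN && has_kernel_option /usr/src/linux/.config CONFIG_BRIDGE` returns success only when both options are enabled, either built in or as modules.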
Comment 15 Mathieu Segaud 2008-12-10 08:58:35 UTC
(In reply to comment #14)

> [snip]

After 26 (!) hours of looping, I finally got my script stuck (with vif447429...)
Seems really hard to reproduce, but this time, it is completely stopped.
I attach the task dump right away.

Comment 16 Mathieu Segaud 2008-12-10 09:00:26 UTC
Created attachment 174835 [details]
Dmesg with task dump
Comment 17 Mike Pagano gentoo-dev 2009-05-01 17:04:40 UTC
Any news here? Does this still occur with the latest version of openrc?
Comment 18 Roy Marples 2009-05-01 17:16:07 UTC
Actually, with the new network script for openrc-0.5.0 (almost ready for release) this should be fixed in a roundabout way: it's up to the user whether they want the interface destroyed or not :)

But I'm currently unsure if Gentoo will go with the new script, or provide a USE flag or something.
Comment 19 Mike Pagano gentoo-dev 2009-05-01 18:02:34 UTC
Michael,

If this is a problem after the new openrc is released as described by Roy then feel free to reopen or create a new bug.  It then should be assigned to baselayout I believe.
Comment 20 Roy Marples 2009-05-01 18:17:27 UTC
(In reply to comment #19)
> If this is a problem after the new openrc is released as described by Roy then
> feel free to reopen or create a new bug.  It then should be assigned to
> baselayout I believe.

Technically it's a kernel bug :P