Summary: | unreliable virtual interface deregistration | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Michael Beasley <youvegotmoxie> |
Component: | [OLD] baselayout | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED UPSTREAM | ||
Severity: | normal | CC: | base-system, hwoarang, roy |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | Dmesg with task dump |
Description
Michael Beasley
2008-04-27 00:54:27 UTC
I tried setting rc_logger="YES" in /etc/rc.conf, intending to find out why shutdown hangs while stopping dhcpcd/net.bond0, and found that the system consistently shuts down properly with the rc logger enabled. Once it is set back to "NO", though, it repeats the previous behaviour of not shutting down properly.

Output of rc.log:

snafu mike # cat /var/log/rc.log
rc boot logging started at Sun Apr 27 00:59:12 2008
* Setting system clock using the hardware clock [UTC] ... [ ok ]
* Loading module bonding ... [ ok ]
* Loading module vboxdrv ... [ ok ]
* Autoloaded 2 module(s)
* Setting up the Logical Volume Manager ...
Locking type 1 initialisation failed. [ ok ]
* Setting up dm-crypt mappings ...
* Checking swap is not LUKS
* dm-crypt map crypt-swap1 ...
* cryptsetup will be called with : -c aes -h sha1 -d /dev/urandom create crypt-swap1 /dev/hdd1 [ ok ]
* Running pre_mount commands for crypt-swap1 ... [ ok ]
* Checking swap is not LUKS
* dm-crypt map crypt-swap2 ...
* cryptsetup will be called with : -c aes -h sha1 -d /dev/urandom create crypt-swap2 /dev/sda1 [ ok ]
* Running pre_mount commands for crypt-swap2 ... [ ok ]
* dm-crypt map crypt-home ...
* cryptsetup will be called with : luksOpen /dev/sys/crhome crypt-home
Command successful. [ ok ]
* dm-crypt map crypt-tmp ...
* cryptsetup will be called with : luksOpen /dev/sys/crtmp crypt-tmp
Command successful. [ ok ]
* Running pre_mount commands for crypt-tmp ...
mke2fs 1.40.6 (09-Feb-2008) [ ok ] [ ok ]
* Checking local filesystems ...
/dev/sys/root: clean, 8775/128016 files, 194564/512000 blocks
If you wish to check the consistency of an XFS filesystem or repair a damaged filesystem, see xfs_check(8) and xfs_repair(8).
/dev/hdc1: clean, 37/26104 files, 10380/104388 blocks [ ok ]
* Remounting root filesystem read/write ... [ ok ]
* Updating /etc/mtab ... [ ok ]
* Mounting local filesystems ... [ ok ]
* Setting hostname to snafu ... [ ok ]
* Configuring kernel parameters ... [ ok ]
* Running hdparm on /dev/hdc ... [ ok ]
* Running hdparm on /dev/hdd ... [ ok ]
* Creating user login records ... [ ok ]
* Cleaning /var/run ... [ ok ]
* Wiping /tmp directory ... [ ok ]
* Restoring Mixer Levels ...
XXX write TLV... [ ok ]
* Setting terminal encoding [UTF-8] ... [ ok ]
* Setting console font [default8x16] ... [ ok ]
* Loading key mappings [us] ... [ ok ]
* Setting keyboard mode [UTF-8] ... [ ok ]
* Bringing up interface lo
* 127.0.0.1/8 ... [ ok ]
* Adding routes
* 127.0.0.0/8 via 127.0.0.1 ... [ ok ]
* Mounting USB device filesystem (usbfs) ... [ ok ]
* Activating swap devices ... [ ok ]
* Initializing random number generator ... [ ok ]
rc boot logging stopped at Sun Apr 27 00:59:21 2008

rc default logging started at Sun Apr 27 00:59:21 2008
* Bringing up interface bond0
* Adding slaves to bond0 ...
* eth0 eth1 eth2
* Removing addresses
* Removing addresses
* Removing addresses [ ok ]
* dhcp ...
* Running dhcpcd ... [ ok ]
* received address 192.168.11.4/24 [ ok ]
* Starting APC UPS daemon ... [ ok ]
* Starting D-BUS system messagebus ... [ ok ]
* Starting avahi-daemon ... [ ok ]
* Starting avahi-dnsconfd ... [ ok ]
* Starting syslog-ng ... [ ok ]
* Starting Hardware Abstraction Layer daemon ... [ ok ]
* Loading iptables state and starting firewall ... [ ok ]
* Mounting network filesystems ... [ ok ]
* Starting Music Player Daemon ... [ ok ]
* Starting vixie-cron ... [ ok ]
* Starting tmp ... [ ok ]
* Starting local ... [ ok ]
rc default logging stopped at Sun Apr 27 00:59:31 2008

This should have been fixed with this commit.
http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git;a=commit;h=b6c4a563685270532d8698f824d50b7ddf61eafc

(In reply to comment #3)
> This should have been fixed with this commit.
> http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git;a=commit;h=b6c4a563685270532d8698f824d50b7ddf61eafc

Thanks for the reply, Roy. Unfortunately the problem still exists after upgrading, tested over 10 reboots.
snafu mike # rc --version
rc (OpenRC) git-b6c4a563 (Gentoo Linux)

Roy, any feedback on this issue?

I'm pretty sure this is because the kernel is not de-registering virtual interfaces reliably. I can trivially hang it on my hardened server and will probably stop de-registering the virtual interfaces. This applies to the lot - tap, tun, bridge, bond, etc. Pretty sure it's a kernel bug but it hasn't been "fixed" in years.

Kernel: Any feedback?

Can you please test with gentoo-sources-2.6.27-rX?

Roy, how trivially can you reproduce this - what are the commands/events? When the system hangs, does it respond to sysrq combinations? Perhaps we can get a trace from that.

At the time I did this:

create vif
ifconfig vif up
ifconfig vif down
destroy vif

Repeat that loop a few times for tap or bridge and eventually the kernel hangs trying to de-register the interface. This happened nearly 100% of the time with a hardened kernel, less so with gentoo-sources. I did this in single user mode with no running services (except for udev) to prove to myself it was a kernel issue. I've not tried to actively replicate it, but I'm sure it still happens: when my server was running Gentoo it did sometimes hang stopping the tap0 interface, because I sometimes accidentally commented out the code to stop the interface from being destroyed in a pre-down function.

I can't find openrc-0.2.2 in portage. Can you try to reproduce it with 0.3.0? I can't reproduce it on my Gentoo machine (quick testing though) with
1) gentoo-sources-2.6.26-r4
2) openrc-0.3.0-r1

Instead of trying to reproduce with openrc, you could set up a script that does what is described in comment #10, and run it overnight or something.

(In reply to comment #12)
> Instead of trying to reproduce with openrc, you could set up a script that does
> what is described in comment #10, and run it overnight or something

I'm currently running this, on 2.6.27-r4:

#!/bin/sh

for run in $(seq 1 100000); do

echo "Run #$run"

brctl addbr vbr$run
tunctl -t vif$run
ifconfig vif$run up
brctl addif vbr$run vif$run
ifconfig vbr$run 30.30.30.30 up
ifconfig vbr$run down
brctl delif vbr$run vif$run
ifconfig vif$run down
tunctl -d vif$run
brctl delbr vbr$run

done

-------
So far, no hang; will test on 2.6.26-rX after that...

(In reply to comment #13)
> #!/bin/sh
>
> for run in $(seq 1 100000); do
>
> echo "Run #$run"
>
> brctl addbr vbr$run
> tunctl -t vif$run
> ifconfig vif$run up
> brctl addif vbr$run vif$run
> ifconfig vbr$run 30.30.30.30 up
> ifconfig vbr$run down
> brctl delif vbr$run vif$run
> ifconfig vif$run down
> tunctl -d vif$run
> brctl delbr vbr$run
>
> done

Oops, I forgot. You can get tunctl by emerging usermode-utilities or openvpn, and brctl comes with bridge-utils. The kernel must be set up with at least CONFIG_TUN=m (y if you want it in the kernel) for TUN/TAP devices (virtual interfaces) and CONFIG_BRIDGE=m (or y...) for bridging interfaces.

(In reply to comment #14)
> [snip]

After 26 (!) hours of looping, I finally got my script stuck (with vif447429...). It seems really hard to reproduce, but this time it is completely stopped. I attach the task dump right away.

Created attachment 174835 [details]
Dmesg with task dump
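Since the hang took about 26 hours to show up, the loop from comment #13 could be left unattended with a small watchdog that notices when it stops making progress and then requests a task dump, along the lines of the sysrq question asked earlier in this thread. What follows is only a minimal sketch and was not posted in this bug: the function names wait_for_stall and dump_tasks, the heartbeat file and the polling interval are all assumptions made for illustration. It relies on /proc/sysrq-trigger, which needs root and a kernel built with CONFIG_MAGIC_SYSRQ.

```shell
#!/bin/sh
# Sketch only: function names, the heartbeat file and the interval are
# illustrative assumptions, not part of the original report.

# wait_for_stall FILE SECONDS
# Poll heartbeat file FILE every SECONDS seconds and return 0 as soon as
# its contents are unchanged across one full interval. Keeps looping for
# as long as the contents keep changing.
wait_for_stall() {
    hb_file=$1
    interval=$2
    last=$(cat "$hb_file" 2>/dev/null)
    while :; do
        sleep "$interval"
        now=$(cat "$hb_file" 2>/dev/null)
        if [ "$now" = "$last" ]; then
            return 0
        fi
        last=$now
    done
}

# dump_tasks
# Ask the kernel to log the state of all tasks (sysrq "t"). The dump ends
# up in the kernel log and can be saved with dmesg afterwards.
dump_tasks() {
    echo t > /proc/sysrq-trigger
}
```

To use it, the reproduction loop would write its run number to the heartbeat file on every iteration (for example `echo "$run" > /tmp/vif.hb`, a hypothetical path), and a second shell would run `wait_for_stall /tmp/vif.hb 60 && dump_tasks && dmesg > /tmp/vif-hang.log`. Watching a heartbeat instead of wrapping each command in a timeout is deliberate: a process stuck inside the kernel may not react to signals, so the watchdog only observes and never tries to kill it. Note that a loop which simply finishes also looks like a stall, so this is meant for long or endless runs.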
Any news here? Does this still occur with the latest version of openrc?

Actually, with the new network script for openrc-0.5.0 (almost ready for release) this should be fixed in a roundabout way - it's up to the user if they want the interface destroyed or not :) But I'm currently unsure if Gentoo will go with the new script, or provide a USE flag or something.

Michael, if this is still a problem after the new openrc is released as described by Roy, then feel free to reopen or create a new bug. It should then be assigned to baselayout, I believe.

(In reply to comment #19)
> If this is a problem after the new openrc is released as described by Roy then
> feel free to reopen or create a new bug. It then should be assigned to
> baselayout I believe.

Technically it's a kernel bug :P