502570 – app-emulation/xen-tools - `/etc/xen/scripts/vif-bridge offline' logs error in syslog upon shutdown of PV domU


Status:	RESOLVED OBSOLETE

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	AMD64 Linux

Importance:	Normal normal
Assignee:	Gentoo Xen Devs

URL:	http://xen.1045712.n5.nabble.com/XEN-...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2014-02-26 22:27 UTC by KK
Modified:	2015-06-06 03:46 UTC
CC List:	1 user

See Also:
Package list:
Runtime testing required:	---


Description KK 2014-02-26 22:27:44 UTC

Upon shutting down a domU under XEN the script "/etc/xen/scripts/vif-bridge" is invoked with an "offline" argument. This is for the recommended setup of connecting domUs to the dom0 through a bridged device named xenbr0. The relevant snippet of code reads as follows:
==========================================
case "$command" in
    online)
        setup_virtual_bridge_port "$dev"
        mtu="`ip link show $bridge | awk '/mtu/ { print $5 }'`"
        if [ -n "$mtu" ] && [ "$mtu" -gt 0 ]
        then
                ip link set $dev mtu $mtu || :
        fi
        add_to_bridge "$bridge" "$dev"
        ;;

    offline)
        do_without_error brctl delif "$bridge" "$dev"
        do_without_error ifconfig "$dev" down
        ;;

    add)
        setup_virtual_bridge_port "$dev"
        add_to_bridge "$bridge" "$dev"
        ;;
esac
==========================================


The function "do_without_error" called from the "offline)" pattern in the "case" statement is defined in /etc/xen/scripts/xen-hotplug-common.sh, which is indirectly sourced through /etc/xen/scripts/vif-common.sh, and reads as follows:
==========================================
do_without_error() {
  "$@" 2>/dev/null || log debug "$@ failed"
}
==========================================
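
To illustrate this behaviour outside of Xen, here is a minimal, self-contained sketch of the wrapper. The "log" function below is only a stand-in that prints to stdout (an assumption for illustration; the real helper in the Xen scripts writes to syslog), and the path passed to "ls" is a made-up name that should not exist.

```shell
#!/bin/sh
# Stand-in for the real log helper (assumption: the real one goes to syslog).
log() {
  level="$1"
  shift
  echo "[$level] $*"
}

# Same definition as quoted above.
do_without_error() {
  "$@" 2>/dev/null || log debug "$@ failed"
}

# A command that succeeds produces no log entry.
do_without_error true

# A command that fails has its stderr discarded, but a message is still logged.
do_without_error ls /nonexistent-path-for-demo
```

With this stand-in, only the second call should produce a line ending in "failed"; the actual error text from "ls" is lost, which is why the root cause is not visible in syslog.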


The call 'do_without_error brctl delif "$bridge" "$dev"' executes
    brctl delif "$bridge" "$dev"
and the call 'do_without_error ifconfig "$dev" down' executes
    ifconfig "$dev" down
- both discarding any error output but, if the command fails (i.e. returns a non-zero exit code), still logging a "failed" message to syslog as follows:
==========================================
Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: brctl delif xenbr0 vif1.0 failed
Feb 26 22:14:29 vm-host logger: /etc/xen/scripts/vif-bridge: ifconfig vif1.0 down failed
==========================================


On investigation, the problem appears to be that the network device (at least for paravirtualized guests using the netfront/netback device model) has already been destroyed by the dom0 kernel by the time the script runs. This is evidenced by the following entries in syslog, which precede the error messages quoted above:
==========================================
Feb 26 22:14:29 vm-host kernel: [ 6169.989895] xenbr0: port 1(vif1.0) entered disabled state
Feb 26 22:14:29 vm-host kernel: [ 6170.007496] xenbr0: port 1(vif1.0) entered disabled state
Feb 26 22:14:29 vm-host kernel: [ 6170.007568] device vif1.0 left promiscuous mode
Feb 26 22:14:29 vm-host kernel: [ 6170.007571] xenbr0: port 1(vif1.0) entered disabled state
==========================================


These findings are further supported by the error messages of the two commands themselves (captured by having "do_without_error" redirect stderr to a file rather than to /dev/null), which are as follows:
==========================================
for brctl: "interface vif1.0 does not exist!"
for ifconfig: "vif1.0: ERROR while getting interface flags: No such device"
==========================================



Suggested fix:
for brctl: check whether the interface still exists and is also still linked to the bridge prior to invoking the brctl command
for ifconfig: check whether the interface still exists and is also still up prior to invoking the ifconfig command as follows:
==========================================
case "$command" in
    online)
        setup_virtual_bridge_port "$dev"
        mtu="`ip link show $bridge | awk '/mtu/ { print $5 }'`"
        if [ -n "$mtu" ] && [ "$mtu" -gt 0 ]
        then
                ip link set $dev mtu $mtu || :
        fi
        add_to_bridge "$bridge" "$dev"
        ;;

    offline)
        brctl show "$bridge" | grep -q "$dev" &&
            do_without_error brctl delif "$bridge" "$dev"
        ifconfig -s | grep -q "$dev" &&
            do_without_error ifconfig "$dev" down
        ;;

    add)
        setup_virtual_bridge_port "$dev"
        add_to_bridge "$bridge" "$dev"
        ;;
esac
==========================================


In terms of functionality, my suggested fix does not change anything: if the interface is still linked to the bridge (or is still up), which might be the case for devices passed through via PCI from dom0 to a domU, it is removed from the bridge (or brought down) exactly as before. It does, however, do away with the nasty error messages in syslog.
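
As an alternative way to express the same two checks, the existence tests could also be done against sysfs rather than by parsing command output. This is only a sketch under assumptions: the helper names and the SYSFS_NET variable are hypothetical and not part of the Xen scripts, and it relies on the Linux layout where each interface appears under /sys/class/net and each bridge port under the bridge's "brif" directory.

```shell
#!/bin/sh
# Hypothetical helpers; SYSFS_NET is overridable so the logic can be tried
# against a scratch directory instead of the live /sys tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# True if a network interface with exactly this name exists.
iface_exists() {
  [ -e "$SYSFS_NET/$1" ]
}

# True if interface $2 is currently a port of bridge $1.
iface_in_bridge() {
  [ -e "$SYSFS_NET/$1/brif/$2" ]
}

# Possible use in the "offline)" branch (not run here):
#   if iface_in_bridge "$bridge" "$dev"; then
#       do_without_error brctl delif "$bridge" "$dev"
#   fi
#   if iface_exists "$dev"; then
#       do_without_error ifconfig "$dev" down
#   fi
```

Note that this variant only tests whether the interface exists, not whether it is up; bringing down an interface that is already down is harmless.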

Comment 1 Yixun Lan archtester

2014-02-27 15:17:02 UTC

@kk, your report looks sane to me, but I feel this fix should go upstream (instead of being kept only in Gentoo); other distros would benefit as well.

Would you kindly report this to Xen upstream?

Also, I'd personally try to avoid the "do something | grep -q xx && do others" style; slightly adjusted as follows (grep's -q and -s options are not portable, see man 1 grep):

    offline)
        if brctl show "$bridge" | grep "$dev" > /dev/null 2>&1 ; then
            do_without_error brctl delif "$bridge" "$dev"
        fi
        if ifconfig -s "$dev" > /dev/null 2>&1 ; then
            do_without_error ifconfig "$dev" down
        fi
        ;;
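
One further point on the grep-based check: grep treats "$dev" as a regular expression and matches substrings, so "vif1.0" would also match a longer name that contains it, such as "vif1.0-emu". A possible way around this, sketched here with a hypothetical helper name that is not part of the Xen scripts, is to compare whole whitespace-separated fields with awk:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if some whitespace-separated field on
# stdin is exactly equal to $1 (no regex, no substring matching).
has_exact_field() {
  awk -v want="$1" '
    # Concatenating "" forces a string (not numeric) comparison.
    { for (i = 1; i <= NF; i++) if (($i "") == (want "")) found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Possible use in the "offline)" branch (not run here):
#   if brctl show "$bridge" | has_exact_field "$dev"; then
#       do_without_error brctl delif "$bridge" "$dev"
#   fi
```

The helper looks at every field of the output, so it would also match the bridge name itself; for interface names of the form vifX.Y that should not matter in practice.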

Comment 2 Ian Delaney (RETIRED) gentoo-dev

2014-02-28 15:31:04 UTC

Yes, in this case reporting upstream seems apt. Do you know where and how? There is the xen-user or xen-devel ML, and Xen afaik has a Bugzilla. The URL text box, which we often use for a link to an upstream bug, can also be used for an e-mail published on a xensource ML page.

Comment 3 KK 2014-02-28 18:37:14 UTC

Yixun & Ian,
thanks for your comments. I'll initially send an e-Mail to the xen-user ML. It's anyway mainly a copy&paste thing from my report here - albeit I'll be amending the fix I had suggested with what Yixun has provided.

If appropriate I am sure Ian Campbell on there will redirect/resend my mail to the xen-devel mailing list.

Yixun or Ian - would anyone of you like to be on CC?

Thanks,

KK

Comment 4 Ian Delaney (RETIRED) gentoo-dev

2014-03-01 01:47:11 UTC

(In reply to KK from comment #3)
> Yixun & Ian,
> thanks for your comments. I'll initially send an e-Mail to the xen-user ML.
> It's anyway mainly a copy&paste thing from my report here - albeit I'll be
> amending the fix I had suggested with what Yixun has provided.
> 
> If appropriate I am sure Ian Campbell on there will redirect/resend my mail
> to the xen-devel mailing list.
> 

Just tell him fellow Ian @ gentoo sent you. I've been in contact with him extensively lately. Adding a link to this bug would be apt.

> Yixun or Ian - would anyone of you like to be on CC?
> 
> Thanks,
> 
> KK
hmm, well, add me if you care to.  Won't hurt.

Comment 5 Ian Delaney (RETIRED) gentoo-dev

2015-06-06 03:46:27 UTC

No further news