Bug 271896 - sys-apps/baselayout-1 support for building custom bonds via sysfs
Summary: sys-apps/baselayout-1 support for building custom bonds via sysfs
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] baselayout
Hardware: All Linux
Importance: High normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-30 18:33 UTC by Mike Williams
Modified: 2009-12-17 18:28 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
sysfs support for bonding (bonding.sh.diff, 3.57 KB, patch)
2009-05-30 18:35 UTC, Mike Williams
revised sysfs support for bonding patch (bonding.sh.diff2, 4.82 KB, patch)
2009-10-14 10:30 UTC, Lorand Kelemen
revised sysfs support for bonding patch (bonding.sh.diff3, 3.66 KB, patch)
2009-10-14 17:22 UTC, Lorand Kelemen
revised sysfs support for bonding patch (bonding.sh.diff4, 4.03 KB, patch)
2009-10-16 16:45 UTC, Lorand Kelemen
emerge info (emerge_info.txt, 3.86 KB, text/plain)
2009-10-19 18:23 UTC, Lorand Kelemen

Description Mike Williams 2009-05-30 18:33:35 UTC
Hi,

Currently, to create a bond in Gentoo you have to load the bonding module with the settings you want. That has a severe limitation: you cannot create multiple bonds with different configurations.
The bonding module has long had a sysfs interface for creating and modifying bonds. What follows is a patch to bonding.sh that adds support for that interface.

I've been using this on multiple servers for nearly a year, over dozens of reboots. It should not impact the current syntax at all.
Note that back in early 2007 there was a bug in the sysfs support that caused a kernel panic when bringing down bonds:
http://lkml.org/lkml/2007/6/19/279

Mike
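
For reference, a minimal sketch of the sysfs interface the description refers to, assuming the bonding driver is loaded or built in. Writing "+name" to bonding_masters is the kernel's interface for creating a bond; the bond_create helper and the SYSFS_ROOT variable are hypothetical names used only for illustration, and this is not the code from the attached patch.

```shell
# Sketch only: bond_create and SYSFS_ROOT are hypothetical names.
# On a real system SYSFS_ROOT is /sys and the write needs root.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Create a bond by writing "+name" to bonding_masters, unless it is
# already listed there.
bond_create() {
    local iface="$1"
    local masters="${SYSFS_ROOT}/class/net/bonding_masters"

    if [ ! -e "${masters}" ]; then
        echo "bonding driver not available" >&2
        return 1
    fi

    # bonding_masters lists the existing bonds, separated by spaces
    case " $(cat "${masters}") " in
        *" ${iface} "*) return 0 ;;
    esac

    echo "+${iface}" > "${masters}"
}

# bond_create bond0 && bond_create bond1   # two independent bonds
```

Because each bond is created by name, several bonds can exist side by side, which is what module parameters alone do not allow.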
Comment 1 Mike Williams 2009-05-30 18:35:31 UTC
Created attachment 192998 [details, diff]
sysfs support for bonding

bonding_bond0="mode=802.3ad miimon=300 xmit_hash_policy=layer2+3 lacp_rate=fast"
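
A rough sketch of how an option string like the one above could be applied, assuming each key maps to a file of the same name under /sys/class/net/<bond>/bonding/. The bond_apply_options helper and SYSFS_ROOT are hypothetical names; the attached patch may do this differently.

```shell
# Sketch only: bond_apply_options and SYSFS_ROOT are hypothetical names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Write each key=value pair of an option string to the bond's sysfs
# attributes, in the order given. Returns non-zero if any option is
# unknown or any write fails.
bond_apply_options() {
    local iface="$1" opts="$2" opt key value rc=0
    local dir="${SYSFS_ROOT}/class/net/${iface}/bonding"

    for opt in ${opts}; do      # unquoted on purpose: split on whitespace
        key="${opt%%=*}"
        value="${opt#*=}"
        if [ ! -e "${dir}/${key}" ]; then
            echo "unknown bonding option: ${key}" >&2
            rc=1
            continue
        fi
        echo "${value}" > "${dir}/${key}" || rc=1
    done
    return ${rc}
}

bonding_bond0="mode=802.3ad miimon=300 xmit_hash_policy=layer2+3 lacp_rate=fast"
# bond_apply_options bond0 "${bonding_bond0}"   # needs root and an existing bond0
```

The options are applied strictly in the order written, so the ordering of the string is left to whoever writes the config.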
Comment 2 Wormo (RETIRED) gentoo-dev 2009-06-01 05:45:15 UTC
Sounds like a cool addition, assigning to maintainers

This was an interesting line btw ;)
#rmmod modprobe >& /dev/null
Comment 3 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2009-06-01 18:21:06 UTC
This is already supported in openrc/baselayout2.

Baselayout1 is only getting fixes at this point, so it'd be up to vapier to decide if he wants to include this.
Comment 4 SpanKY gentoo-dev 2009-10-11 09:29:34 UTC
the net modules scare me, so unless it's an important bug fix, i'm not going to bother ...
Comment 5 Lorand Kelemen 2009-10-14 10:30:40 UTC
Created attachment 207068 [details, diff]
revised sysfs support for bonding patch
Comment 6 Lorand Kelemen 2009-10-14 10:37:12 UTC
This is a very useful feature backport, if you still want to use the stable baselayout1 and bonding with a monolithic kernel.

I tinkered a bit with the patch and set up bonding (+ ipaliases over the bond) on over 6 test servers, which soon will become prod environments.

Please review the patch for inclusion!

(Note that the order of the bonding options is important in the config!

A working config:

config_eth0=( "null" )
config_eth1=( "null" )
slaves_bond0="eth0 eth1"
config_bond0=( "10.0.0.12 netmask 255.255.255.0" "10.0.0.14 netmask 255.255.255.0" "10.0.0.16 netmask 255.255.255.0" )
bonding_bond0="mode=active-backup arp_validate=3 arp_interval=100 arp_ip_target=+10.0.0.251 arp_ip_target=+10.0.0.252"
routes_bond0=( "default via 10.0.0.254" )
RC_NEED_bond0="net.eth0 net.eth1"

)

Comment 7 Lorand Kelemen 2009-10-14 17:22:17 UTC
Created attachment 207118 [details, diff]
revised sysfs support for bonding patch

Corrected tabs for a refined patching experience.
Comment 8 Lorand Kelemen 2009-10-16 16:45:20 UTC
Created attachment 207326 [details, diff]
revised sysfs support for bonding patch

Fixing a regression introduced in bonding.sh.diff2 (http://bugs.gentoo.org/attachment.cgi?id=207068), now the monster is alive, no more modifications from my side!

The problem: after a reboot not all bonding options were applied. 

This also affects the current openrc/bl2 script (http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git;a=blob;f=net/bonding.sh;h=793280bb8f330f70b1b7e28fd281647b0c04c908;hb=43f6c2196eaa1600f1816e1081e35fd588806047), because any automatically created bonds have to be brought down before the bond is created! (mode and arp_validate are examples of options that are otherwise not applied)

Also, in my case the stop timeout is an important addition, because without it simply restarting the bond via the initscript sometimes fails:

 * Stopping bond0
 *   Bringing down bond0
 *     Removing slaves from bond0 ...
 *       eth2 eth3                                                                                          [ ok ]
 * Starting bond0
 *   Adding slaves to bond0 ...
 *     eth2 eth3
/lib/rcscripts/net/bonding.sh: line 133: echo: write error: Operation not permitted
/lib/rcscripts/net/bonding.sh: line 133: echo: write error: Operation not permitted                         [ !! ]
 *   Bringing up bond0
 *     10.0.0.251
 *     network interface bond0 does not exist
 *     Please verify hardware or kernel module (driver)                                                     [ !! ]
Comment 9 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2009-10-18 00:58:35 UTC
Lorand, following this from your email, you never answered as to why the existing BL2 bonding (oldnet) support wasn't suitable for you. All of your bonding_IFACE stuff seems to duplicate the existing support that starts at "# Configure the bond". Bond creation is further up (look for /sys/class/net/bonding_masters, line 33).

Your config:
config_eth0=( "null" )
config_eth1=( "null" )
slaves_bond0="eth0 eth1"
config_bond0=( "10.0.0.12 netmask 255.255.255.0" "10.0.0.14 netmask 255.255.255.0" "10.0.0.16 netmask 255.255.255.0" )
bonding_bond0="mode=active-backup arp_validate=3 arp_interval=100 arp_ip_target=+10.0.0.251 arp_ip_target=+10.0.0.252"
routes_bond0=( "default via 10.0.0.254" )
RC_NEED_bond0="net.eth0 net.eth1"

Existing working config:
config_eth0="null"
config_eth1="null"
slaves_bond0="eth0 eth1"
config_bond0="10.0.0.12/24 10.0.0.14/24 10.0.0.16/24"
routes_bond0="default via 10.0.0.254"
RC_NEED_bond0="net.eth0 net.eth1"
mode_bond0="active-backup"
arp_validate_bond0=3
arp_interval_bond0=100
arp_ip_target_bond0="+10.0.0.251 +10.0.0.252"

(I am concerned that there may be a namespace conflict in there in the future; maybe it's time to start doing them as bonding_$VAR_$IFACE=$VAL.)
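
A small sketch of what such a namespaced lookup could look like, assuming the interface name is already a valid shell variable suffix (no dots or dashes). The bond_option helper is hypothetical and is not an existing openrc function.

```shell
# Sketch only: bond_option is a hypothetical helper.
# Look up bonding_<option>_<iface> first, then fall back to the
# existing <option>_<iface> form.
bond_option() {
    local opt="$1" iface="$2" value
    eval "value=\${bonding_${opt}_${iface}-}"
    if [ -z "${value}" ]; then
        eval "value=\${${opt}_${iface}-}"
    fi
    printf '%s\n' "${value}"
}

bonding_mode_bond0="active-backup"   # proposed namespaced form
arp_interval_bond0=100               # existing form

bond_option mode bond0           # prints "active-backup"
bond_option arp_interval bond0   # prints "100"
```

Preferring the prefixed name while still reading the old one would keep existing configs working during a transition.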
Comment 10 Lorand Kelemen 2009-10-18 13:04:03 UTC
I'd like to stick to baselayout-1 -- for now. If we accept that, then primarily this patch is a feature backport (for monolithic kernels, also I don't use ifenslave).

It does more or less the same as the openrc-0.5.0 method, but let me point out the differences:

- forcing bonds down before applying bonding options

	Some are not applied when the bond was created previously, say after a reboot with a monolithic kernel (e.g. mode).

	From http://lwn.net/Articles/142330/ 

	"Some caveats:
	    - slaves can only be assigned when the interface is up
	    - mode can only be changed when the interface is down
	    - Xmit hash policy can be changed only when interface is down"
	
- more failsafe for options: they are applied before and after the interface is brought up (see reasons above)

- implementing timeouts when creating / destroying bonds - sysfs needs some time to propagate the changes

I don't know how relevant these differences are, but bonding only works as expected for me with these extra changes, so I guess it wouldn't hurt to consider merging them into openrc. If these are not an issue with a bl2/openrc setup, then I'd love to hear why; I did not dig that deep into the core of sysfs/bonding.
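
The ordering implied by the caveats quoted above can be sketched as follows, assuming a freshly created bond with no slaves yet. The bond_configure helper, SYSFS_ROOT and the IP variable are hypothetical names used for illustration; this is not the code from the patch or from openrc.

```shell
# Sketch only: bond_configure, SYSFS_ROOT and IP are hypothetical names.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
IP="${IP:-ip}"    # iproute2 by default; overridable for a dry run

# usage: bond_configure <bond> <mode> <slave>...
bond_configure() {
    local iface="$1" mode="$2" slave
    shift 2
    local dir="${SYSFS_ROOT}/class/net/${iface}/bonding"

    # mode can only be changed while the bond is down
    ${IP} link set dev "${iface}" down || return 1
    echo "${mode}" > "${dir}/mode" || return 1

    # slaves can only be assigned while the bond is up
    ${IP} link set dev "${iface}" up || return 1
    for slave in "$@"; do
        echo "+${slave}" > "${dir}/slaves" || return 1
    done
}

# bond_configure bond0 active-backup eth0 eth1   # needs root
```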
Comment 11 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2009-10-19 02:24:17 UTC
(In reply to comment #10)
> I'd like to stick to baselayout-1 -- for now. If we accept that, then primarily
> this patch is a feature backport (for monolithic kernels, also I don't use
> ifenslave).
We're not doing feature additions to BL1 (ideally not even backports), simply bugfixes only. The push is on getting BL2 to be sane.

> It does more or less the same as the openrc-0.5.0 method, but let me point out
> the differences:
> - forcing bonds down before applying bonding options
>         Some are not applied when the bond was created previously, say after a
> reboot with a monolithic kernel (e.g. mode).
Ok, so we need an additional down around line 41 of bonding_pre_start, just before the configure loop.

>         "Some caveats:
>             - slaves can only be assigned when the interface is up
>             - mode can only be changed when the interface is down
>             - Xmit hash policy can be changed only when interface is down"
These are respected already normally (with the addition of the one down statement). Are there any settings OTHER than slaves that require the interface to be up?

> - more failsafe for options: they are applied before and after the interface is
> brought up (see reasons above)
I think it's overkill; just separate the options by where they need to be applied.

> - implementing timeouts when creating / destroying bonds - sysfs needs some
> time to propagate the changes
The constant delays for settling suck IMO. We're adding nearly 10 seconds to some boot paths. We're changing the device state immediately, so I don't believe we need the settling at all for the down calls (beyond waiting for the kernel to ack that it's down, maybe via udev)
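
One alternative to a fixed sleep is a bounded poll that returns as soon as the expected state is visible. A minimal sketch, assuming that the appearance or disappearance of /sys/class/net/<bond> is an adequate signal and that sleep accepts fractional seconds (GNU coreutils does); wait_for_path is a hypothetical helper.

```shell
# Sketch only: wait_for_path is a hypothetical helper.
# usage: wait_for_path <path> present|absent [tries]
# Polls every 0.1s and gives up after "tries" checks (default 20).
wait_for_path() {
    local path="$1" want="$2" tries="${3:-20}" state
    while :; do
        if [ -e "${path}" ]; then state=present; else state=absent; fi
        if [ "${state}" = "${want}" ]; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "${tries}" -le 0 ]; then
            return 1
        fi
        sleep 0.1
    done
}

# wait_for_path /sys/class/net/bond0 absent  || echo "bond0 still present" >&2
# wait_for_path /sys/class/net/bond0 present || echo "bond0 did not appear" >&2
```

In the common case this costs a single check and no delay; the timeout only matters when the kernel really is slow to reflect the change.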

> I don't know how relevant these differences are, bonding as expected only works
> for me with these extra changes, I guess it wouldn't hurt to consider merging
> them to openrc. If these are not an issue with a bl2/openrc setup, then I'd
> love to hear why, I did not dig so deep into the core of sysfs/bonding.
BL2 already does nearly pure sysfs by default. Aside from that, can you isolate "this exact change makes it work" from your patch?

Comment 12 Lorand Kelemen 2009-10-19 18:09:13 UTC
> We're not doing feature additions (ideally even backports to BL1, simply
> bugfixes only). The push is on getting BL2 to be sane.

Ok, I can live with that, I have to apply other patches too...

> Ok, so we need an additional down around line 41 of bonding_pre_start, just
> before the configure loop.

No, you need it around line 29, before the creation of the bond.

> Are there any settings OTHER than slaves that require the interface
> to be up?
>
> I think it's more of overkill, just separate the options into where they need
> to be applied.

I will look into the options part. I noticed that certain options must precede others.
(Which is logical: think of adding arp ip targets before specifying the usage of the arp monitor. The mode used also affects a lot of settings, so it's quite complex.) It would be nice to find precise documentation about this (other than the bonding.txt in the kernel and the kernel code ...)

Overkill indeed, but it could be much worse to maintain a separate list of options.
I will cycle through all known options and report back as I have the time, maybe this is not an issue, or only in special cases.

> The constant delays for settling suck IMO. We're adding nearly 10 seconds to
> some boot paths. We're changing the device state immediately, so I don't
> believe we need the settling at all for the down calls (beyond waiting for the
> kernel to ack that it's down, maybe via udev)

It sucks indeed. However, when I remove the sleep at the stop, 3 bond restarts are peachy, but at the fourth:

 * Stopping bond0
 *   Bringing down bond0
 *     Removing slaves from bond0 ...
 *       eth2 eth3                                                                                          [ ok ]
 * Starting bond0
 *   Adding slaves to bond0 ...
 *     eth2 eth3
/lib/rcscripts/net/bonding.sh: line 133: echo: write error: Operation not permitted
/lib/rcscripts/net/bonding.sh: line 133: echo: write error: Operation not permitted                         [ !! ]
 *   Bringing up bond0
 *     10.0.0.251
!!! add address begin !!!
 *     network interface bond0 does not exist
 *     Please verify hardware or kernel module (driver)

It dies here (module=ifconfig), /lib/rcscripts/net/ifconfig.sh:

# bool ifconfig_add_address(char *iface, char *options ...)
#
# Adds the given address to the interface
ifconfig_add_address() {
        local iface="$1" i=0 r= e= real_iface=$(interface_device "$1")

        echo "!!! add address begin !!!"
        ifconfig_exists "${real_iface}" true || return 1
        echo "!!! add address end !!!"

        # Extract the config
        local -a config=( "$@" )
        config=( ${config[@]:1} )

If it counts, I use

[U] sys-fs/udev
     Available versions:  114 115-r1 119 124-r1 124-r2 141 ~141-r1 ~145 **9999 {devfs-compat doc extras selinux}
     Installed versions:  124-r2(09:53:24 05/08/09)(-selinux)

Maybe you have an idea why this is happening, I wouldn't like to dive into the code more. It affects both the ifconfig and iproute2 operation modes.

Also note that the current BL2 script does not implement sleep cycles (which reeeeally suck, yes)

> BL2 already does nearly pure sysfs by default. Aside from that, can you 
> isolate "this exact change makes it work" from your patch?

Of course. Let's try to sort out the above issues and fix up the BL2 code. It's then fairly certain that BL1 won't get these features, which is sad, but I can understand the motive.
Comment 13 Lorand Kelemen 2009-10-19 18:23:25 UTC
Created attachment 207575 [details]
emerge info

Updated udev, no change in behaviour.

Attaching my current emerge info.

Here is another scenario that occurs when the sleep is removed from stop:

 * Stopping bond0
 *   Bringing down bond0
 *     Removing slaves from bond0 ...
 *       eth2 eth3                                                                                          [ ok ]
 * Starting bond0
 *   Adding slaves to bond0 ...
 *     eth2 eth3
/lib/rcscripts/net/bonding.sh: line 133: echo: write error: Operation not permitted
/lib/rcscripts/net/bonding.sh: line 133: echo: write error: Operation not permitted                         [ !! ]
 *   Bringing up bond0
 *     10.0.0.251
RTNETLINK answers: File exists                                                                              [ !! ]
Comment 14 William Hubbs gentoo-dev 2009-12-17 18:28:53 UTC
Since this feature will not be added to baselayout-1 and since you want
to stay with baselayout-1 for now, I am closing this with "needinfo"
status.

Once you upgrade to baselayout-2 and openrc, please feel free to re-open
this bug and submit a patch for changes to openrc.

Thanks much.

William