Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 428604 - sys-apps/openrc: [oldnet] allow a bond to subsume another interface's characteristics
Summary: sys-apps/openrc: [oldnet] allow a bond to subsume another interface's charact...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Hosted Projects
Classification: Unclassified
Component: OpenRC (show other bugs)
Hardware: All Linux
: Normal critical (vote)
Assignee: OpenRC Team
URL:
Whiteboard: openrc:oldnet
Keywords:
Depends on:
Blocks: 417391
  Show dependency tree
 
Reported: 2012-07-30 05:42 UTC by Walter
Modified: 2013-04-25 23:08 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
screenshot of fail (Screen shot 2012-07-30 at 4.00.01 PM.png,3.65 KB, image/png)
2012-07-30 06:02 UTC, Walter
Details
screenshot of fail at boot time (Screen shot 2012-07-30 at 4.19.27 PM.png,11.72 KB, image/png)
2012-07-30 06:20 UTC, Walter
Details
support subsuming kernel autoconfigured interface + warn on nfs root (bonding.sh-0.10.5-nfsroot-subsume.patch,1.37 KB, patch)
2012-07-31 02:23 UTC, Walter
Details | Diff
screenshot of example nfs warning applied before likely crash (bonding.sh-0.10.5-nfsroot-subsume.patch-nfsroot-warning.png,14.93 KB, image/png)
2012-07-31 02:26 UTC, Walter
Details
support subsuming kernel autoconfigured interface + warn on nfs root (v2) (bonding.sh-0.10.5-nfsroot-subsume-2.patch,1.89 KB, patch)
2012-07-31 08:39 UTC, Walter
Details | Diff
support subsuming kernel autoconfigured interface + warn on nfs root (v3) (patch.diff,2.55 KB, patch)
2012-08-01 10:19 UTC, Walter
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Walter 2012-07-30 05:42:51 UTC
I have a 20 node cluster running Gentoo (amd64 arch, custom kernel) with 100% nfs root nodes.

For high availability at layer2, I am using the bonding driver.

Each node has eth0, eth1.  These merge to bond0 in active-backup mode.

I can boot the systems fine with nfsroot, and if manually executing the following commands all goes well:

 vconfig add bond0 51
 ifconfig bond0 x.x.x.x netmask y.y.y.y
 ping x.x.x.y

(Where x.x.x.x is the IP address already assigned to eth0 by kernel-level autoconfiguration, over which the nfsroot is mounted on the port's native VLAN)

This clears the eth0 config, puts up eth1, creates and configures bond0, and failover and everything then works fine between the two physical interfaces.

Unfortunately, when I rewrite the above configuration in /etc/conf.d/net as follows:

config_eth0="null"
config_eth1="null"
slaves_both0="eth0 eth1"
config_bond0="dhcp"      # DHCP on native VLAN should return same as kernel autoconfigured IP, already present on eth0, but this could be null or manual - it doesn't get that far.
config_vlan51="x.x.x.x netmask y.y.y.y"

I can boot normally with /etc/init.d/net.eth0 and net.eth1 in the default runlevel, but if I add net.bond0 the system crashes on boot right after displaying the 'Removing addresses...' status message found in _delete_addresses() in openrc's net/ifconfig.sh.Linux.in file, which is installed to /lib/rc/net/ifconfig.sh

I suppose this is because in removing the address on eth0, openrc renders its own root nfs server unavailable.  The manual sequence of commands does not have this problem.

I cannot find a way or any documentation on a way to disable _delete_addresses() in openrc.

Could you please either:
 A) tell me how to hack openrc to avoid this step on a specific interface
 B) add detection of nfsroot-critical interface configurations and avoid 'Removing addresses...' on these
 C) avoid calling 'Removing addresses...' code on bondX interfaces (the bonding driver seems to handle all this itself, anyway)
 D) add a feature that lets users remove this themselves on a per-interface basis

Very urgent for me, $$$ of hardware sitting idle.

Thanks!
Comment 1 Walter 2012-07-30 05:52:08 UTC
Sorry, manual lines are:

ifconfig bond x.x.x.x netmask y.y.y.y
ifenslave bond0 eth0
ifenslave bon0 eth1
ping x.x.x.y
Comment 2 Walter 2012-07-30 06:02:20 UTC
Created attachment 319672 [details]
screenshot of fail

this one run manually from prompt after booting (removed net.bond0 from default runlevel via chroot to filesystem on nfs boot server), but exactly the same behaviour as this is observed on boot if net.bond0 is in the default runlevel.
Comment 3 Walter 2012-07-30 06:20:38 UTC
Created attachment 319674 [details]
screenshot of fail at boot time
Comment 4 Walter 2012-07-30 06:21:41 UTC
# cat /diskless/core-setup/fs/etc/conf.d/net
#------- raw physical port config ---

config_eth0="null"
config_eth1="null"

#------- bond0 bonded interface -----

# enslave which?
slaves_bond0="eth0 eth1"

# ip config
config_bond0="dhcp"

#------- vlans on bond0 -------------

# bond0 vlans
#  - specify vlan ids
vlans_bond0="51 52"
#  - specify naming convention ('bond0.51'...)
vconfig_bond0="set_name_type DEV_PLUS_VID_NO_PAD"

# vlan #51 config
#  - set_flag 1 = tcpdump and similar show no vlan headers
vconfig_vlan51="set_flag 1"
#  - ip config
config_vlan51="10.0.51.101 netmask 255.255.255.0"
Comment 5 Walter 2012-07-31 02:17:46 UTC
OK, so I tried to move forward alone.  First I attempted an upgrade to openrc-0.10.5 - no difference in behaviour.

Then with hyper-advanced greppology I located an explicit call to _delete_addresses in bonding.sh with an apparently spurious comment and removed it...

        # Must force the slaves to a particular state before adding them
        for IFACE in ${slaves}; do
                _delete_addresses
                _down
        done
        )

Unfortunately, later in the process there were issues as well.  Specifically, the immediately subsequent part where slaves are added to the interface, either by sysfs (openrc's preference) or by ifenslave.

My working command line process was:
 ifconfig bond0 x.x.x.x netmask y.y.y.y
 ifenslave bond0 eth0
 ifenslave bond0 eth1

The command line process preconfigures the bond0 interface to the same IP address as the primary slave interface (eth0). I guess this happens while it is down, before putting it up, and before enslaving the existing interface. This apparently means that the route could switch over without affecting nfsroot.

I replaced the call to '_up' in bonding.sh with:
 'ifconfig bond0 <eth0-ip> netmask <eth0-netmask> up'

Then I noticed that while forcing the bonding.sh code to use ifenslave, I could have 'dhcp' in the /etc/conf.d/net for bond0 and boot worked fine, whereas if I let it use sysfs, boot would fail. Setting config_bond0 to null resolved this.

In short, I got it working for me, but your code is broken for bonding driver + nfsroot users at present and should be resolved.
Comment 6 Walter 2012-07-31 02:23:59 UTC
Created attachment 319804 [details, diff]
support subsuming kernel autoconfigured interface + warn on nfs root

The patch implements a new /etc/conf.d/net option, 'subsume_${IFACE}="<interface>"', for bonding interfaces.  For example, 'subsume_bond0="eth0"'.

This option copies (only) basic interface characteristics (IPv4 address, netmask) from the subsumed interface.  

Note that it does not attempt to duplicate additional routes or other interface characteristics. It has not been tested on IPv6, FreeBSD, or NetBSD.

However, an NFS-root warning is added to advertise the feature where most relevant.  This feature has been carefully developed using openrc's internal 'mountinfo' command ('mountinfo -s /' == 'nfs'?) in order to preserve portability, since FreeBSD's laggX interfaces and NetBSD's agr interfaces may also use (now or in future) the openrc bonding code.

In order to function, the patch also removes the downing of interfaces prior to setup, which breaks things for me and does not appear to be necessary in any of my own testing.  A more conservative change would simply ignore this in the case that the 'subsume' option was in use, but I do think it's safe to apply globally. (The history of the interface downing code snippet seems to be Roy remembering it as needed 2+ years ago @ http://dev.gentoo.org/~vapier/openrc/projects/openrc/ticket/105.html)
Comment 7 Walter 2012-07-31 02:26:34 UTC
Created attachment 319806 [details]
screenshot of example nfs warning applied before likely crash
Comment 8 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2012-07-31 04:34:38 UTC
Removing that _down is going to be problematic, and hardware dependent.
Specifically, some NICs cannot join a bond while they are up. There is no way to test either, forcing the NIC down was the most reliable way.

Can you please test in your nfsroot environment, explicitly doing
config_bond0="$config_eth0" or whatever IP is meant for a node.

I do in generate think that the subsume concept is good, but your config_bond0 DHCP should be sending an DHCP INFORM, rather than doing the DISCOVER/REQUEST cycle. That would also help prevent the address for being dropped.
Comment 9 Walter 2012-07-31 07:34:41 UTC
> Removing that _down is going to be problematic, and hardware dependent.
> Specifically, some NICs cannot join a bond while they are up. There is no
> way to test either, forcing the NIC down was the most reliable way.

Maybe it's possible to work around this with something like:

 # first kick relevant code and libs to be cached locally
 #  (incidentally, i hear busybox has one built-in? unsure if
 #   gentoo comples it in default or not, i suspect not)
 ifconfig --help 1>/dev/null 2>/dev/null
 ifenslave 1>/dev/null 2>/dev/null 
 # down other interfaces here
 # subsume existing ip here and force bonded interface up

Let me give that a try and post back shortly.

> Can you please test in your nfsroot environment, explicitly doing
> config_bond0="$config_eth0" or whatever IP is meant for a node.

There are some issues with that.
 1. The eth0 interface is kernel autoconfigured (diskless node / nfsroot), so there is no $config_eth0 set (it is set to "null"). Of course, openrc *could* generate one (instead of doing subsume) but then that sort of achieves the same thing, with the following drawbacks:
   1A) falsely leaves the impression that it is a permanent, statically defined IP address instead of a dynamic one
   1B) on many diskless nodes /etc/ is mounted from nfs and is read-only
   1C) a subsume-style interface specifier would still be required

> I do in generate think that the subsume concept is good, but your
> config_bond0 DHCP should be sending an DHCP INFORM, rather than doing the
> DISCOVER/REQUEST cycle. That would also help prevent the address for being
> dropped.

True, though from memory:
 1. Some(?) DHCP daemons or clients will ARP an address before re-issue
 2. Most(?) DHCP daemons will re-issue the same IP to a given MAC if there's one still leased.

Therefore this issue probably only affects a minority of deployments. (FWIW, it's definitely irrelevant for my deployment, which has ample addresses, and doesn't care about the IP address changing before services are started a few layers up...)

Not quite sure what you're asking here - is there any existing openrc way to convince the various DHCP daemons to DHCP INFORM in a daemon-neutral way? Or are you suggesting that I try to write one?  I don't really have so much time to delve right now, unfortunately.
Comment 10 Walter 2012-07-31 07:44:48 UTC
Hang on - it hangs when using sysfs to manipulate the interfaces, so the caching idea goes out the window.

I will try hacking my /etc/conf.d/net to test what you requested, though I don't think it's a good path to take for the reasons outlined above.
Comment 11 Walter 2012-07-31 08:06:17 UTC
I added some debug code right after the line inside _delete_addresses() in ifconfig.sh reading:

 ifconfig "${IFACE}" 0.0.0.0 || break
 einfo "made it past ifconfig"

... system dies before it gets to einfo.

If I comment out the call to _delete_addresses() and allow the _down() call, then exactly the same behaviour (death before executing another line) is observed.

It looks like I absolutely have to disable both.
Comment 12 Walter 2012-07-31 08:21:57 UTC
If I disable both, and test your config_bond0="{$config_eth0}" idea, where the config is artificially constructed from the DHCP-supplied IP the kernel autoconfigured with at boot, no change is observed vs. using the subsume patch.

Which is to say, since I know the IP that will be assigned, I set it in the /etc/conf.d/net file exported to the single test node presently booting from the NFS server.

Not sure what the point of that test was, to be honest, but there you go :)

So... I will resubmit the patch with the _delete_addresses() and _down() calls disabled only when subsume_${IFACE} is specified.  I hope this is acceptable.
Comment 13 Walter 2012-07-31 08:39:56 UTC
Created attachment 319828 [details, diff]
support subsuming kernel autoconfigured interface + warn on nfs root (v2)

patch v2 - only skips interface address removal and downing if subsume selected. also forcibly cleans subsumed interface's old IP, fixed spacing & minor changes.
Comment 14 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2012-07-31 16:31:05 UTC
dhcp client survey:
dhcpcd: DISCOVER, INFORM, explicit REQUEST
dhclient: DISCOVER
udhcpc (busybox): DISCOVER, explicit REQUEST
Comment 15 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2012-07-31 16:37:41 UTC
Things that need to be looked at:
1. Hardcoding ifconfig instead of iproute2 concerns me.
2. What do we do if the interface being subsumed has multiple addresses?
3. The IPv6 autoconfig addresses based on EUI64 should NOT be subsumed, since they might not be on the same interface anymore after bonding, but other addresses should. Where does this leave IPv6 NFS-boot?
4. In your case, we temporarily have the same IP on two interfaces in a system, is there any way we can ensure the wrong one isn't used.
4.1. I'm concerned about lockups in case when IP is on two interfaces, that routing picks the wrong interface and you cause NFS to lock up.
5. As a general case of subsume, would it be useful to have a function in conf.d/net that can build the variables from current settings?

Minor nits:
Please use unified diffs. You may find it very useful to just checkout the openrc git repo (see http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git), and have your own local branch of that to generate diffs.
Comment 16 Walter 2012-07-31 23:09:54 UTC
(In reply to comment #15)
> Things that need to be looked at:
> 1. Hardcoding ifconfig instead of iproute2 concerns me.

Oh, that's probably just my not being familiar with the codebase.  If there is a neutral way to both (1) set an address and (2) mark the interface as up that is already built in to openrc, I'm all for using it.  Please point me in the right direction!

> 2. What do we do if the interface being subsumed has multiple addresses?

They are lost but could theoretically all be collected and recreated, though I don't see this as a likely use case.  I don't know how to do that within openrc at present.

There is a function _get_addresses() but I was unsure how to apply them once received, so didn't use it and instead used _get_address().  I did look for a _set_address() but didn't see it, so just used 'ifconfig' as per ifconfig.sh.  

If openrc has some iproute2.sh/ifconfig.sh-neutral way to do all this and support multiple addresses, I agree that would be preferable.

Note that, in addition to multiple addresses, a potential problem exists here for multiple routes (eg: gateways access via the subsumed interface's existing local network).  I have not tested this scenario, perhaps they do migrate gracefully?

> 3. The IPv6 autoconfig addresses based on EUI64 should NOT be subsumed,
> since they might not be on the same interface anymore after bonding, but
> other addresses should. Where does this leave IPv6 NFS-boot?

Actually, bonding in the default mode (active-backup) does explicitly set the same MAC across all enslaved interfaces that the first or 'primary slave' had, so perhaps this is a non-issue.  Non-default configurations are many and varied and openrc should in my view facilitate and stay out of the way rather than attempt to second-guess every potential configuration.

> 4. In your case, we temporarily have the same IP on two interfaces in a
> system, is there any way we can ensure the wrong one isn't used.

Explicit route metrics, perhaps.  However, I didn't see any unified route query and manipulation mechanism built in to openrc, and the right thing seems to happen with the current process, so I decided this was a bit too window-dressing to bother thinking about.  If you think it's worth doing explicitly, that's fine.  My philosophy is generally one of minimalism and laziness - "don't fix what aint broke". Just as we have discovered that the apparent bug workaround (despite no comments to that effect in the file) downing the slave interfaces before bringing up the bonded interface is extra code that was perceived as helpful but has actually broken things for unconsidered use cases, so too adding more code here is potentially going to cause problems for someone.

> 4.1. I'm concerned about lockups in case when IP is on two interfaces, that
> routing picks the wrong interface and you cause NFS to lock up.

It seems to work fine right now. Perhaps the kernel itself marks autoconfigured interface routes as somehow poorer metric than manually configured ones post-boot, or some other magic. I am not the sort to look it up explicitly - "don't fix what aint broke". Anyway, the present method seems totally reliable on gentoo's 3.2.16-hardened.

In the normal case of subsuming the primary slave, which has been kernel autoconfigured, remember that exactly the same packet is leaving the subsumed interface with the same layer2 and layer3 identifiers both before and after the bonding interface is initialized. Therefore using the 'wrong' interface probably matters less.

I am less convinced about the inverse of this transition - ie. moving back from bonding to explicit single interfaces.  Interface _down() behaviour is possibly worth looking at, though this is not relevant for my deployment and probably broadly less useful than under most configurations.

> 5. As a general case of subsume, would it be useful to have a function in
> conf.d/net that can build the variables from current settings?

Not sure what you mean here.

I guess you are saying "how about having something like: 'config_bond0=subsume("eth0")'?" (I suppose with the idea that then all other openrc code could magically function unchanged against the interface)

This doesn't work because, if openrc seems the configuration as a standard configuration, then the default _delete_addresses and _down process on the slave interfaces will occur, crashing NFS root systems. The input it needs to avoid a crash is simply a single interface to subsume, which in my understanding of /etc/conf.d/net is best served with a 'keyword_${IFACE}=' style specifier, as per the current suggestion.

I did think about, instead, having a flag to subsume without necessarily specifying a specific interface, wherein some form of autodetection or default (eg. first slave, try to autodetect primary slave (maybe possible via sysfs? usually specified as a kernel/module argument) could be used.  However, I decided that given the type of configuration to which subsume was relevant and the fact that there's probably some use case out there for which the primary slave was not the correct interface to subsume, then it is probably more appropriate to simply issue a warning where problems may occur and provide only an explicit subsume interface specifier.

> Minor nits:
> Please use unified diffs. You may find it very useful to just checkout the
> openrc git repo (see
> http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git), and have your own
> local branch of that to generate diffs.

OK, checked out via git://git.overlays.gentoo.org/proj/openrc.git

So, as far as I can see, the outstanding items stand as follows:

 0. DHCP INFORM: Seems dead on arrival as only one client daemon implements it and there appears to be no existing openrc support for requesting this behaviour in a daemon-neutral way.

 1 and 2. ifconfig/iproute2-neutrality AND multiple addresses: Unsure if or how openrc supports setting addresses in an ifconfig/iproute2 neutral way and whether setting multiple addresses is even possible in a single call.  As someone who is more experienced with openrc, please advise. I cannot afford the time right now, but would be happy to make modifications based upon your comments/findings here.

 3. IPv6 NFS root: EUI64 address issue raised is actually a non-issue under default bonding driver mode ('active-backup').

 4. IP temporarily on dual interfaces: Metrics could be possible to be more explicit about routing preferences during transition, however I am opposed to adding code here that doesn't fix anything, citing the _down and _delete_address example as a notional bugfix (not commented as such, mind you) that has later gone awry.

 5. Useful to have "function in conf.d/net that can build the variables from current settings?" Not sure I understand the suggestion and/or reasoning but based on what I assume you meant, apparently not.
Comment 17 Walter 2012-08-01 10:19:28 UTC
Created attachment 319964 [details, diff]
support subsuming kernel autoconfigured interface + warn on nfs root (v3)

same as v2 patch but regenerated with more context
Comment 18 Walter 2012-08-02 03:13:00 UTC
Also requested pull to funtoo @ https://github.com/funtoo/openrc/pull/1
Comment 19 Walter 2012-09-26 19:34:09 UTC
*nudge*
Comment 20 William Hubbs gentoo-dev 2012-10-11 01:39:47 UTC
@robbat2:
What is the status of this bug?

Thanks,

William
Comment 21 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2012-10-11 04:00:32 UTC
Walter:
why was your git tree missing two very important commits to bonding.sh?
fb00b10669a0b191ba0467f035d45b77bedd9f09
dd45506a40f3332cef96cc8ca7e258561a200ec0
Which dealt with downing the slaves & which interface the commits got added to.

Your '# Wipe subsumed interface' block in the git tree was also right in the MIDDLE of a multi-line sed command.

It's very difficult to do a native git pull from github. The attached patch is better.

You added an 'eoutdent' line for a block that was supposed to indented still, so I removed that as well.

Other than that, it boots my box fine in a non-subsume config. I don't have a box suitable to test that the subsume side works, but I'll take your word for it until I get bug reports that you've broken bonding ;-).
Comment 22 Walter 2012-10-11 04:10:50 UTC
> why was your git tree missing two very important commits to bonding.sh?
> fb00b10669a0b191ba0467f035d45b77bedd9f09
> dd45506a40f3332cef96cc8ca7e258561a200ec0
> Which dealt with downing the slaves & which interface the commits got added to.

No idea, AFAIK I just did a straight git pull on the day in question, that is all... no funny business!

> Your '# Wipe subsumed interface' block in the git tree was also right
> in the MIDDLE of a multi-line sed command.
> [... snip ...]
> You added an 'eoutdent' line for a block that was supposed to indented
> still, so I removed that as well.

Sounds like something went horribly wrong. Don't know how, don't know why. But it sounds like you've fixed it. Thanks for that.

> Other than that, it boots my box fine in a non-subsume config. I
> don't have a box suitable to test that the subsume side works, but
> I'll take your word for it until I get bug reports that you've
> broken bonding ;-).

Lovely.
Comment 23 William Hubbs gentoo-dev 2012-10-11 18:03:27 UTC
Walter,

I have a couple of concerns about this patch.

First, there is no documentation in net.example.Linux.in showing how to
use it. Can you please attach a patch that provides this?

Also, I am concerned about the use of ifconfig. I would rather see the
ifconfig commands in the patch replaced with the iproute2 equivalents,
because that is where the development of network functionality is going
in Linux these days.

What are your thoughts?
Comment 24 Walter 2012-10-11 21:48:22 UTC
About iproute2 ... the problem is, it's simply not tested with iproute2.

I am happy to do that, but can't do it immediately.

On the matter of documentation, it really is straightforward, so if you really want something in the net file I'll come back with that patch once I get time to look at iproute2.  However, documentation basically appears automatically in the case where someone requires it (ie. NFS root + bonding = fail), so it could be considered intrinsically documented for the relevant use case already.
Comment 25 William Hubbs gentoo-dev 2012-10-11 23:00:55 UTC
(In reply to comment #24)
> On the matter of documentation, it really is straightforward, so if you
> really want something in the net file I'll come back with that patch once I
> get time to look at iproute2.  However, documentation basically appears
> automatically in the case where someone requires it (ie. NFS root + bonding
> = fail), so it could be considered intrinsically documented for the relevant
> use case already.

Depending on an error message for documentation is a bug. It means that the user will have to set things up wrong first to see the documentation, and it particularly doesn't work well for someone administering a remote box. That is why we need a quick explanation in the net.example.Linux.in file. I would put it somewhere in the section about bonding.
If it is really quick, you can just write it here in the bug and I'll put it in the documentation and give you credit for it.

Thanks,

William
Comment 26 Walter 2012-10-11 23:16:23 UTC
# Bonding subsume support (prevents crashes for root-on-NFS)
#  - Only tested in the default bonding mode ('active-backup') with IPv4
#  - Only subsumes basic interface characteristics (IP, netmask) and
#    excludes additional routes, interface properties such as MTU,
#    interface-associated netfilter rules, etc.
# In the example below, the (usually kernel-autoconfigured)
# 'eth0' interface is a member of bond0, which subsumes the
# existing interface configuration without upsetting NFS.
#slaves_bond0="eth0 eth1"
#subsume_bond0="eth0"
Comment 27 William Hubbs gentoo-dev 2012-10-16 21:42:45 UTC
The documentation has been added as commit 1e7c696.
Since the initial support is in, I am going to go ahead and close this
bug. However, can you please open a bug and submit a patch that converts
this code to iproute2?