I have a 20 node cluster running Gentoo (amd64 arch, custom kernel) with 100% nfs root nodes. For high availability at layer2, I am using the bonding driver. Each node has eth0, eth1. These merge to bond0 in active-backup mode. I can boot the systems fine with nfsroot, and if manually executing the following commands all goes well: vconfig add bond0 51 ifconfig bond0 x.x.x.x netmask y.y.y.y ping x.x.x.y (Where x.x.x.x is the IP address already assigned to eth0 by kernel-level autoconfiguration, over which the nfsroot is mounted on the port's native VLAN) This clears the eth0 config, puts up eth1, creates and configures bond0, and failover and everything then works fine between the two physical interfaces. Unfortunately, when I rewrite the above configuration in /etc/conf.d/net as follows: config_eth0="null" config_eth1="null" slaves_both0="eth0 eth1" config_bond0="dhcp" # DHCP on native VLAN should return same as kernel autoconfigured IP, already present on eth0, but this could be null or manual - it doesn't get that far. config_vlan51="x.x.x.x netmask y.y.y.y" I can boot normally with /etc/init.d/net.eth0 and net.eth1 in the default runlevel, but if I add net.bond0 the system crashes on boot right after displaying the 'Removing addresses...' status message found in _delete_addresses() in openrc's net/ifconfig.sh.Linux.in file, which is installed to /lib/rc/net/ifconfig.sh I suppose this is because in removing the address on eth0, openrc renders its own root nfs server unavailable. The manual sequence of commands does not have this problem. I cannot find a way or any documentation on a way to disable _delete_addresses() in openrc. Could you please either: A) tell me how to hack openrc to avoid this step on a specific interface B) add detection of nfsroot-critical interface configurations and avoid 'Removing addresses...' on these C) avoid calling 'Removing addresses...' code on bondX interfaces (the bonding driver seems to handle all this itself, anyway) D) add a feature that lets users remove this themselves on a per-interface basis Very urgent for me, $$$ of hardware sitting idle. Thanks!
Sorry, manual lines are: ifconfig bond x.x.x.x netmask y.y.y.y ifenslave bond0 eth0 ifenslave bon0 eth1 ping x.x.x.y
Created attachment 319672 [details] screenshot of fail this one run manually from prompt after booting (removed net.bond0 from default runlevel via chroot to filesystem on nfs boot server), but exactly the same behaviour as this is observed on boot if net.bond0 is in the default runlevel.
Created attachment 319674 [details] screenshot of fail at boot time
# cat /diskless/core-setup/fs/etc/conf.d/net #------- raw physical port config --- config_eth0="null" config_eth1="null" #------- bond0 bonded interface ----- # enslave which? slaves_bond0="eth0 eth1" # ip config config_bond0="dhcp" #------- vlans on bond0 ------------- # bond0 vlans # - specify vlan ids vlans_bond0="51 52" # - specify naming convention ('bond0.51'...) vconfig_bond0="set_name_type DEV_PLUS_VID_NO_PAD" # vlan #51 config # - set_flag 1 = tcpdump and similar show no vlan headers vconfig_vlan51="set_flag 1" # - ip config config_vlan51="10.0.51.101 netmask 255.255.255.0"
OK, so I tried to move forward alone. First I attempted an upgrade to openrc-0.10.5 - no difference in behaviour. Then with hyper-advanced greppology I located an explicit call to _delete_addresses in bonding.sh with an apparently spurious comment and removed it... # Must force the slaves to a particular state before adding them for IFACE in ${slaves}; do _delete_addresses _down done ) Unfortunately, later in the process there were issues as well. Specifically, the immediately subsequent part where slaves are added to the interface, either by sysfs (openrc's preference) or by ifenslave. My working command line process was: ifconfig bond0 x.x.x.x netmask y.y.y.y ifenslave bond0 eth0 ifenslave bond0 eth1 The command line process preconfigures the bond0 interface to the same IP address as the primary slave interface (eth0). I guess this happens while it is down, before putting it up, and before enslaving the existing interface. This apparently means that the route could switch over without affecting nfsroot. I replaced the call to '_up' in bonding.sh with: 'ifconfig bond0 <eth0-ip> netmask <eth0-netmask> up' Then I noticed that while forcing the bonding.sh code to use ifenslave, I could have 'dhcp' in the /etc/conf.d/net for bond0 and boot worked fine, whereas if I let it use sysfs, boot would fail. Setting config_bond0 to null resolved this. In short, I got it working for me, but your code is broken for bonding driver + nfsroot users at present and should be resolved.
Created attachment 319804 [details, diff] support subsuming kernel autoconfigured interface + warn on nfs root The patch implements a new /etc/conf.d/net option, 'subsume_${IFACE}="<interface>"', for bonding interfaces. For example, 'subsume_bond0="eth0"'. This option copies (only) basic interface characteristics (IPv4 address, netmask) from the subsumed interface. Note that it does not attempt to duplicate additional routes or other interface characteristics. It has not been tested on IPv6, FreeBSD, or NetBSD. However, an NFS-root warning is added to advertise the feature where most relevant. This feature has been carefully developed using openrc's internal 'mountinfo' command ('mountinfo -s /' == 'nfs'?) in order to preserve portability, since FreeBSD's laggX interfaces and NetBSD's agr interfaces may also use (now or in future) the openrc bonding code. In order to function, the patch also removes the downing of interfaces prior to setup, which breaks things for me and does not appear to be necessary in any of my own testing. A more conservative change would simply ignore this in the case that the 'subsume' option was in use, but I do think it's safe to apply globally. (The history of the interface downing code snippet seems to be Roy remembering it as needed 2+ years ago @ http://dev.gentoo.org/~vapier/openrc/projects/openrc/ticket/105.html)
Created attachment 319806 [details] screenshot of example nfs warning applied before likely crash
Removing that _down is going to be problematic, and hardware dependent. Specifically, some NICs cannot join a bond while they are up. There is no way to test either, forcing the NIC down was the most reliable way. Can you please test in your nfsroot environment, explicitly doing config_bond0="$config_eth0" or whatever IP is meant for a node. I do in generate think that the subsume concept is good, but your config_bond0 DHCP should be sending an DHCP INFORM, rather than doing the DISCOVER/REQUEST cycle. That would also help prevent the address for being dropped.
> Removing that _down is going to be problematic, and hardware dependent. > Specifically, some NICs cannot join a bond while they are up. There is no > way to test either, forcing the NIC down was the most reliable way. Maybe it's possible to work around this with something like: # first kick relevant code and libs to be cached locally # (incidentally, i hear busybox has one built-in? unsure if # gentoo comples it in default or not, i suspect not) ifconfig --help 1>/dev/null 2>/dev/null ifenslave 1>/dev/null 2>/dev/null # down other interfaces here # subsume existing ip here and force bonded interface up Let me give that a try and post back shortly. > Can you please test in your nfsroot environment, explicitly doing > config_bond0="$config_eth0" or whatever IP is meant for a node. There are some issues with that. 1. The eth0 interface is kernel autoconfigured (diskless node / nfsroot), so there is no $config_eth0 set (it is set to "null"). Of course, openrc *could* generate one (instead of doing subsume) but then that sort of achieves the same thing, with the following drawbacks: 1A) falsely leaves the impression that it is a permanent, statically defined IP address instead of a dynamic one 1B) on many diskless nodes /etc/ is mounted from nfs and is read-only 1C) a subsume-style interface specifier would still be required > I do in generate think that the subsume concept is good, but your > config_bond0 DHCP should be sending an DHCP INFORM, rather than doing the > DISCOVER/REQUEST cycle. That would also help prevent the address for being > dropped. True, though from memory: 1. Some(?) DHCP daemons or clients will ARP an address before re-issue 2. Most(?) DHCP daemons will re-issue the same IP to a given MAC if there's one still leased. Therefore this issue probably only affects a minority of deployments. (FWIW, it's definitely irrelevant for my deployment, which has ample addresses, and doesn't care about the IP address changing before services are started a few layers up...) Not quite sure what you're asking here - is there any existing openrc way to convince the various DHCP daemons to DHCP INFORM in a daemon-neutral way? Or are you suggesting that I try to write one? I don't really have so much time to delve right now, unfortunately.
Hang on - it hangs when using sysfs to manipulate the interfaces, so the caching idea goes out the window. I will try hacking my /etc/conf.d/net to test what you requested, though I don't think it's a good path to take for the reasons outlined above.
I added some debug code right after the line inside _delete_addresses() in ifconfig.sh reading: ifconfig "${IFACE}" 0.0.0.0 || break einfo "made it past ifconfig" ... system dies before it gets to einfo. If I comment out the call to _delete_addresses() and allow the _down() call, then exactly the same behaviour (death before executing another line) is observed. It looks like I absolutely have to disable both.
If I disable both, and test your config_bond0="{$config_eth0}" idea, where the config is artificially constructed from the DHCP-supplied IP the kernel autoconfigured with at boot, no change is observed vs. using the subsume patch. Which is to say, since I know the IP that will be assigned, I set it in the /etc/conf.d/net file exported to the single test node presently booting from the NFS server. Not sure what the point of that test was, to be honest, but there you go :) So... I will resubmit the patch with the _delete_addresses() and _down() calls disabled only when subsume_${IFACE} is specified. I hope this is acceptable.
Created attachment 319828 [details, diff] support subsuming kernel autoconfigured interface + warn on nfs root (v2) patch v2 - only skips interface address removal and downing if subsume selected. also forcibly cleans subsumed interface's old IP, fixed spacing & minor changes.
dhcp client survey: dhcpcd: DISCOVER, INFORM, explicit REQUEST dhclient: DISCOVER udhcpc (busybox): DISCOVER, explicit REQUEST
Things that need to be looked at: 1. Hardcoding ifconfig instead of iproute2 concerns me. 2. What do we do if the interface being subsumed has multiple addresses? 3. The IPv6 autoconfig addresses based on EUI64 should NOT be subsumed, since they might not be on the same interface anymore after bonding, but other addresses should. Where does this leave IPv6 NFS-boot? 4. In your case, we temporarily have the same IP on two interfaces in a system, is there any way we can ensure the wrong one isn't used. 4.1. I'm concerned about lockups in case when IP is on two interfaces, that routing picks the wrong interface and you cause NFS to lock up. 5. As a general case of subsume, would it be useful to have a function in conf.d/net that can build the variables from current settings? Minor nits: Please use unified diffs. You may find it very useful to just checkout the openrc git repo (see http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git), and have your own local branch of that to generate diffs.
(In reply to comment #15) > Things that need to be looked at: > 1. Hardcoding ifconfig instead of iproute2 concerns me. Oh, that's probably just my not being familiar with the codebase. If there is a neutral way to both (1) set an address and (2) mark the interface as up that is already built in to openrc, I'm all for using it. Please point me in the right direction! > 2. What do we do if the interface being subsumed has multiple addresses? They are lost but could theoretically all be collected and recreated, though I don't see this as a likely use case. I don't know how to do that within openrc at present. There is a function _get_addresses() but I was unsure how to apply them once received, so didn't use it and instead used _get_address(). I did look for a _set_address() but didn't see it, so just used 'ifconfig' as per ifconfig.sh. If openrc has some iproute2.sh/ifconfig.sh-neutral way to do all this and support multiple addresses, I agree that would be preferable. Note that, in addition to multiple addresses, a potential problem exists here for multiple routes (eg: gateways access via the subsumed interface's existing local network). I have not tested this scenario, perhaps they do migrate gracefully? > 3. The IPv6 autoconfig addresses based on EUI64 should NOT be subsumed, > since they might not be on the same interface anymore after bonding, but > other addresses should. Where does this leave IPv6 NFS-boot? Actually, bonding in the default mode (active-backup) does explicitly set the same MAC across all enslaved interfaces that the first or 'primary slave' had, so perhaps this is a non-issue. Non-default configurations are many and varied and openrc should in my view facilitate and stay out of the way rather than attempt to second-guess every potential configuration. > 4. In your case, we temporarily have the same IP on two interfaces in a > system, is there any way we can ensure the wrong one isn't used. Explicit route metrics, perhaps. However, I didn't see any unified route query and manipulation mechanism built in to openrc, and the right thing seems to happen with the current process, so I decided this was a bit too window-dressing to bother thinking about. If you think it's worth doing explicitly, that's fine. My philosophy is generally one of minimalism and laziness - "don't fix what aint broke". Just as we have discovered that the apparent bug workaround (despite no comments to that effect in the file) downing the slave interfaces before bringing up the bonded interface is extra code that was perceived as helpful but has actually broken things for unconsidered use cases, so too adding more code here is potentially going to cause problems for someone. > 4.1. I'm concerned about lockups in case when IP is on two interfaces, that > routing picks the wrong interface and you cause NFS to lock up. It seems to work fine right now. Perhaps the kernel itself marks autoconfigured interface routes as somehow poorer metric than manually configured ones post-boot, or some other magic. I am not the sort to look it up explicitly - "don't fix what aint broke". Anyway, the present method seems totally reliable on gentoo's 3.2.16-hardened. In the normal case of subsuming the primary slave, which has been kernel autoconfigured, remember that exactly the same packet is leaving the subsumed interface with the same layer2 and layer3 identifiers both before and after the bonding interface is initialized. Therefore using the 'wrong' interface probably matters less. I am less convinced about the inverse of this transition - ie. moving back from bonding to explicit single interfaces. Interface _down() behaviour is possibly worth looking at, though this is not relevant for my deployment and probably broadly less useful than under most configurations. > 5. As a general case of subsume, would it be useful to have a function in > conf.d/net that can build the variables from current settings? Not sure what you mean here. I guess you are saying "how about having something like: 'config_bond0=subsume("eth0")'?" (I suppose with the idea that then all other openrc code could magically function unchanged against the interface) This doesn't work because, if openrc seems the configuration as a standard configuration, then the default _delete_addresses and _down process on the slave interfaces will occur, crashing NFS root systems. The input it needs to avoid a crash is simply a single interface to subsume, which in my understanding of /etc/conf.d/net is best served with a 'keyword_${IFACE}=' style specifier, as per the current suggestion. I did think about, instead, having a flag to subsume without necessarily specifying a specific interface, wherein some form of autodetection or default (eg. first slave, try to autodetect primary slave (maybe possible via sysfs? usually specified as a kernel/module argument) could be used. However, I decided that given the type of configuration to which subsume was relevant and the fact that there's probably some use case out there for which the primary slave was not the correct interface to subsume, then it is probably more appropriate to simply issue a warning where problems may occur and provide only an explicit subsume interface specifier. > Minor nits: > Please use unified diffs. You may find it very useful to just checkout the > openrc git repo (see > http://git.overlays.gentoo.org/gitweb/?p=proj/openrc.git), and have your own > local branch of that to generate diffs. OK, checked out via git://git.overlays.gentoo.org/proj/openrc.git So, as far as I can see, the outstanding items stand as follows: 0. DHCP INFORM: Seems dead on arrival as only one client daemon implements it and there appears to be no existing openrc support for requesting this behaviour in a daemon-neutral way. 1 and 2. ifconfig/iproute2-neutrality AND multiple addresses: Unsure if or how openrc supports setting addresses in an ifconfig/iproute2 neutral way and whether setting multiple addresses is even possible in a single call. As someone who is more experienced with openrc, please advise. I cannot afford the time right now, but would be happy to make modifications based upon your comments/findings here. 3. IPv6 NFS root: EUI64 address issue raised is actually a non-issue under default bonding driver mode ('active-backup'). 4. IP temporarily on dual interfaces: Metrics could be possible to be more explicit about routing preferences during transition, however I am opposed to adding code here that doesn't fix anything, citing the _down and _delete_address example as a notional bugfix (not commented as such, mind you) that has later gone awry. 5. Useful to have "function in conf.d/net that can build the variables from current settings?" Not sure I understand the suggestion and/or reasoning but based on what I assume you meant, apparently not.
Created attachment 319964 [details, diff] support subsuming kernel autoconfigured interface + warn on nfs root (v3) same as v2 patch but regenerated with more context
Also requested pull to funtoo @ https://github.com/funtoo/openrc/pull/1
*nudge*
@robbat2: What is the status of this bug? Thanks, William
Walter: why was your git tree missing two very important commits to bonding.sh? fb00b10669a0b191ba0467f035d45b77bedd9f09 dd45506a40f3332cef96cc8ca7e258561a200ec0 Which dealt with downing the slaves & which interface the commits got added to. Your '# Wipe subsumed interface' block in the git tree was also right in the MIDDLE of a multi-line sed command. It's very difficult to do a native git pull from github. The attached patch is better. You added an 'eoutdent' line for a block that was supposed to indented still, so I removed that as well. Other than that, it boots my box fine in a non-subsume config. I don't have a box suitable to test that the subsume side works, but I'll take your word for it until I get bug reports that you've broken bonding ;-).
> why was your git tree missing two very important commits to bonding.sh? > fb00b10669a0b191ba0467f035d45b77bedd9f09 > dd45506a40f3332cef96cc8ca7e258561a200ec0 > Which dealt with downing the slaves & which interface the commits got added to. No idea, AFAIK I just did a straight git pull on the day in question, that is all... no funny business! > Your '# Wipe subsumed interface' block in the git tree was also right > in the MIDDLE of a multi-line sed command. > [... snip ...] > You added an 'eoutdent' line for a block that was supposed to indented > still, so I removed that as well. Sounds like something went horribly wrong. Don't know how, don't know why. But it sounds like you've fixed it. Thanks for that. > Other than that, it boots my box fine in a non-subsume config. I > don't have a box suitable to test that the subsume side works, but > I'll take your word for it until I get bug reports that you've > broken bonding ;-). Lovely.
Walter, I have a couple of concerns about this patch. First, there is no documentation in net.example.Linux.in showing how to use it. Can you please attach a patch that provides this? Also, I am concerned about the use of ifconfig. I would rather see the ifconfig commands in the patch replaced with the iproute2 equivalents, because that is where the development of network functionality is going in Linux these days. What are your thoughts?
About iproute2 ... the problem is, it's simply not tested with iproute2. I am happy to do that, but can't do it immediately. On the matter of documentation, it really is straightforward, so if you really want something in the net file I'll come back with that patch once I get time to look at iproute2. However, documentation basically appears automatically in the case where someone requires it (ie. NFS root + bonding = fail), so it could be considered intrinsically documented for the relevant use case already.
(In reply to comment #24) > On the matter of documentation, it really is straightforward, so if you > really want something in the net file I'll come back with that patch once I > get time to look at iproute2. However, documentation basically appears > automatically in the case where someone requires it (ie. NFS root + bonding > = fail), so it could be considered intrinsically documented for the relevant > use case already. Depending on an error message for documentation is a bug. It means that the user will have to set things up wrong first to see the documentation, and it particularly doesn't work well for someone administering a remote box. That is why we need a quick explanation in the net.example.Linux.in file. I would put it somewhere in the section about bonding. If it is really quick, you can just write it here in the bug and I'll put it in the documentation and give you credit for it. Thanks, William
# Bonding subsume support (prevents crashes for root-on-NFS) # - Only tested in the default bonding mode ('active-backup') with IPv4 # - Only subsumes basic interface characteristics (IP, netmask) and # excludes additional routes, interface properties such as MTU, # interface-associated netfilter rules, etc. # In the example below, the (usually kernel-autoconfigured) # 'eth0' interface is a member of bond0, which subsumes the # existing interface configuration without upsetting NFS. #slaves_bond0="eth0 eth1" #subsume_bond0="eth0"
The documentation has been added as commit 1e7c696. Since the initial support is in, I am going to go ahead and close this bug. However, can you please open a bug and submit a patch that converts this code to iproute2?