I have a multi-node cluster that hosts virtual machines (at the moment using libvirt). In case you're wondering, this is the project: https://hackaday.io/project/10529-solar-powered-cloud-computing

I'm using `netifrc` to:
1. set up a bonded network interface with two GbE interfaces for the trunk link to the outside world.
2. set up VLAN ports on that bonded interface for all the VLANs (about 73 of them at present)
3. set up bridges for each VLAN

libvirt then makes use of the bridges to connect VM instances to the outside world.

This works pretty well, except for one problem. A couple of those instances happen to be routers (OpenBSD VMs), with IPv6 router advertisements enabled. This means that the VM host *can hear* the IPv6 RAs from the guest.

However, I DO NOT WANT the VM host to act on those RAs. The hosts have a fixed IPv4 and IPv6 address, and a fixed default route, that they are to use. Bear in mind that I do not wish to disable IPv6, as I often have to manage the hosts over an IPv6-only VPN tunnel.
To counteract this, I have tried the following in `/etc/sysctl.conf`:

net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.all.accept_dad = 0

My `/etc/conf.d/net` config file looks like this:

##############################################################################
# Private VLANs
config_stuartl="null"
config_bond0_8="null"
rc_net_stuartl_need="net.bond0.8"

config_david="null"
config_bond0_10="null"
rc_net_david_need="net.bond0.10"

config_guest="null"
config_bond0_5="null"
rc_net_guest_need="net.bond0.5"

# Instance VLANs
config_vlan128="null"
config_bond0_128="null"
rc_net_vlan128_need="net.bond0.128"

config_vlan129="null"
config_bond0_129="null"
rc_net_vlan129_need="net.bond0.129"

config_vlan130="null"
config_bond0_130="null"
rc_net_vlan130_need="net.bond0.130"

config_vlan131="null"
# …
# snip a lot of VLANs
# …

# DMZ VLAN
config_dmz="null"
config_bond0_249="null"
rc_net_dmz_need="net.bond0.249"

# Host management VLAN
config_hostmgmt="
    10.20.22.3/24
    2001:db8:123:fc::3/64
"
config_bond0_252="null"
rc_net_hostmgmt_need="net.bond0.252"

# Virtual instance management
config_virtmgmt="
    10.20.20.3/24
    2001:db8:123:fa::3/64
"
config_bond0_250="null"
rc_net_virtmgmt_need="net.bond0.250"

# Storage public interface VLAN
config_storagepub="
    10.20.23.3/24
    2001:db8:123:fd::3/64
"
config_bond1_253="null"
rc_net_storagepub_need="net.bond1.253"

routes_hostmgmt="
    default via 10.20.22.254
    default via 2001:db8:123:fc::fe
"

Whatever I do though, I observe two things:

1. netifrc, *always*, pauses for 5 seconds for each interface with "Waiting for IPv6 addresses (5 seconds)". On my desktop, sure, do wait for SLAAC, but on this box… 5 * 73 seconds… think about how long that takes… and for what benefit?
2. After a while, the host hears the RAs on the guest, and decides it wants in on the fun. Upshot: if I try to SSH to the host via IPv6, it tries to reply with its own made-up IP address, not the one I'm trying to reach it by.
I can only restore comms by SSHing in via IPv4, and running an `ip -6 route flush` / `ip -6 addr flush` on the offending interfaces.

I would expect config_INT="null" to mean "no configuration". Not even SLAAC. In the interests of preserving existing behaviour for others, I can live with having a flag somewhere to disable SLAAC on those interfaces.

Is there a way to have netifrc just bring the interface up, not wait for an IPv6 address and just leave it "unconfigured"?
1. "Waiting for IPv6 addresses": It's NOT waiting for SLAAC, it's waiting for tentative address resolution. I'll improve the message to clarify that.
   This codepath is: if there is a tentative address, wait for $dad_timeout seconds.
   It's possible for link-local addresses to run the DAD algorithm as well, which is why you usually want it.
   Set "dad_timeout=0" or "dad_timeout_bond0=0" etc.
   I'm wondering why your kernel reports a tentative address despite "net.ipv6.conf.all.accept_dad=0". See _iproute2_ipv6_tentative in netifrc iproute2.

2. It's NOT netifrc that does SLAAC. SLAAC is 100% kernel-space. "null" just brings up the interface with no netifrc configuration.
   By the sound of it, one of your interfaces is not getting accept_ra=0.
   When the problem happens, can you dump all of your non-zero ipv6 sysctls?
   # sudo sysctl -a |sed -rn '/= 0$/d; /ipv6.*(accept_ra|autoconf|accept_dad)/p'
   netifrc doesn't touch any of the IPv6 sysctls on its own, but I have seen the kernel get the wrong value for interfaces created BEFORE net.ipv6.conf.all.accept_ra was set.

3. Separately, I think this is a very good use case for openvswitch, because you don't generally want the hypervisors to participate in IPv4/IPv6 traffic with the guests at all, and it's a LOT cleaner to do bonded trunk w/ vlan to instance in openvswitch. libvirt even provides very good support for it.
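For readers following along, the dad_timeout suggestion in point 1 might look like the fragment below in /etc/conf.d/net. This is only a sketch of the two spellings named in the comment; the file is sourced as shell, so these are plain variable assignments. Which of the two you want, and whether per-VLAN names follow the same underscore convention as the config_bond0_8 entries above, is something to confirm against net.example for your netifrc version.

```shell
# /etc/conf.d/net fragment (sketch; the variable names are the ones quoted
# in the comment above, the choice of values is an illustration).

# Do not wait for DAD on tentative IPv6 addresses on any interface:
dad_timeout=0

# Or set it for a single interface only:
dad_timeout_bond0=0
```

With either setting the kernel still performs DAD as configured by its own sysctls; the variable only controls how long netifrc waits for it.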
redhatter: ping
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/netifrc.git/commit/?id=5496033be61f97755627ba1da45421a2a635c09e

commit 5496033be61f97755627ba1da45421a2a635c09e
Author:     Robin H. Johnson <robbat2@gentoo.org>
AuthorDate: 2017-11-14 20:41:27 +0000
Commit:     Robin H. Johnson <robbat2@gentoo.org>
CommitDate: 2017-11-14 20:41:27 +0000

    net/iproute2: clarify "waiting for IPv6 addresses"

    Per bug 636846, a user thought that "waiting for IPv6 addresses" was
    SLAAC (Stateless Autoconfiguration). In Linux, SLAAC is entirely
    kernel-side, and the waiting is actually for DAD (duplicate address
    detection) on link-local IPv6 addresses.

    - Improve the message to include both DAD & tentative.
    - If --verbose is used, print the tentative addresses.

    If either of the accept_dad sysctls are set to zero, then the kernel
    should NOT mark any addresses as tentative.
    - net.ipv6.conf.all.accept_dad=0
    - net.ipv6.conf.$IFACE.accept_dad=0

    Bug: https://bugs.gentoo.org/636846
    Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

 net/iproute2.sh | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=56c4748dce0925382fc68822034d76f1634348c7

commit 56c4748dce0925382fc68822034d76f1634348c7
Author:     Robin H. Johnson <robbat2@gentoo.org>
AuthorDate: 2017-11-27 20:29:43 +0000
Commit:     Robin H. Johnson <robbat2@gentoo.org>
CommitDate: 2017-11-27 21:00:29 +0000

    net-misc/netifrc: bump.

    New functionality & improvements:
    - Wireless: 'iw' module to replace older 'iwconfig' module. (Brian Evans <grknight@gentoo.org>)
    - iproute2: VXLAN & GRETAP support (iplink_$IFVAR) (Sergey Popov <pinkbyte@gentoo.org>)
    - Bonding: ARP IP targets (Marc Schiffbauer <mschiff@gentoo.org>)
    - wpa_supplicant: better matching of wired connections (Henning Schild <henning@hennsch.de>)
    - IPv6: clearer message for Tentative duplicate address detection (DAD).
    - Refactor veinfo printing of iproute2 commands.

    Fixes:
    - Avoid moduleslist race condition (Hagbard Celine <hagbardcelin@gmail.com>)
    - Delete IPv6 tunnel correctly (stkchp <s@tkch.net>)

    Bug: https://bugs.gentoo.org/638836
    Bug: https://bugs.gentoo.org/637474
    Bug: https://bugs.gentoo.org/636846
    Bug: https://github.com/gentoo/netifrc/pull/24
    Bug: https://github.com/gentoo/netifrc/pull/26
    Bug: https://github.com/gentoo/netifrc/pull/25
    Bug: https://github.com/gentoo/netifrc/pull/27
    Package-Manager: Portage-2.3.16, Repoman-2.3.6

 net-misc/netifrc/Manifest             | 17 +++++----
 net-misc/netifrc/netifrc-0.6.0.ebuild | 71 +++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 8 deletions(-)
stuartl/redhatter: reping
Hi, I only just saw your note… I'll give it a try as soon as I can.
Okay, I've given it a shot, as well as verifying that I have disabled DAD…

net.ipv6.conf.all.autoconf = 0
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.all.accept_dad = 0

It still sat there waiting for DAD at every single interface.

I tried setting dad_timeout, but it seems there's no such file in /proc/sys/net/ipv6/conf/all. I've since located the option as part of /etc/conf.d/net, along with the 'nodad' option which is supposedly equivalent to setting this to zero; so I've tried setting the latter.

This has no effect, it still waits 5 seconds for each interface.

I'll wait for the machine to boot up and see if I can configure dad_timeout on all the relevant interfaces.
Created attachment 515540
Screenshot of boot sequence, showing failed network interface init.

Well, that works, it comes up and shows everything configured as it should be.

However, it has the annoying effect that things are wrongly flagged as "failed". I've attached a screenshot that was captured via the IPMI console. The interfaces are bridged correctly however, and things seem to work. So I can live with that.

I might have a closer look at openvswitch, however one issue I see is that there doesn't seem to be any documentation in net.example.bz2 shipped with netifrc that explains how that's set up. Bridges are fairly simple to set up and understand.
(In reply to Stuart Longland from comment #7)
> I tried setting dad_timeout, but it seems there's no such file in
> /proc/sys/net/ipv6/conf/all. I've since located the option as part of
> /etc/conf.d/net, along with the 'nodad' option which is supposedly
> equivalent to setting this to zero; so I've tried setting the latter.
>
> This has no effect, it still waits 5 seconds for each interface.
>
> I'll wait for the machine to boot up and see if I can configure dad_timeout
> on all the relevant interfaces.

If you had dad_timeout=0 in conf.d/net, then the sleep 1 inside that loop did not fire, and the 5 second wait was somewhere else.

(In reply to Stuart Longland from comment #8)
> Created attachment 515540 [details]
> Screenshot of boot sequence, showing failed network interface init.
>
> Well, that works, it comes up and shows everything configured as it should
> be.
>
> However, it has the annoying effect that things are wrongly flagged as
> "failed". I've attached a screenshot that was captured via the IPMI console.

Weird things here:
- After your other sysctl settings, the kernel SHOULD not have reported any addresses as tentative. Could you show it with '/etc/init.d/net.bond0.150 --verbose restart'? It should print out all of the tentative addresses.
- It's the DAD timeout that is flagged as failed, which is correct in this case because your kernel still claims there are tentative addresses. It only prints !! to say that the DAD timeout did not complete. iproute2_post_start always returns 0 and the interface does start. I'll change the print output to be even clearer about this.

> I might have a closer look at openvswitch, however one issue I see is that
> there doesn't seem to be any documentation in net.example.bz2 shipped with
> netifrc that explains how that's set up. Bridges are fairly simple to set
> up and understand.
openvswitch would be done entirely underneath the netifrc layer: it would handle all of the L2 devices, and netifrc just sets up L3 afterwards.
lithium ~ #
 * Executing: /lib64/rc/sh/openrc-run.sh /lib64/rc/sh/openrc-run.sh /etc/init.d/net.vlan150 start
 * Bringing up interface vlan150
 * Skipping module adsl due to missing program: /usr/sbin/adsl-start /usr/sbin/pppoe-start
 * Skipping module br2684ctl due to missing program: br2684ctl
 * Skipping module clip due to missing program: /usr/sbin/atmsigd
 * Skipping module ethtool due to missing program: ethtool
 * Skipping module netplugd due to missing program: /sbin/netplugd
 * Skipping module ifplugd due to missing program: /usr/sbin/ifplugd
 * Skipping module ipppd due to missing program: /usr/sbin/ipppd
 * Skipping module iwconfig due to missing program: /sbin/iwconfig
 * Skipping module iw due to missing program: /usr/sbin/iw
 * Skipping module pppd due to missing program: /usr/sbin/pppd
 * Skipping module pump due to missing program: /sbin/pump
 * Loaded modules: apipa arping bonding l2tp tuntap bridge ccwgroup dummy hsr macvlan macchanger macnet wpa_supplicant ssidnet iproute2 firewalld system vlan dhcpcd ip6rd ip6to4
 * ip link set dev vlan150 up
 * Creating bridge vlan150 ...
 * Adding ports to vlan150
 * bond0.150 ...
 * ip link set dev bond0.150 up [ ok ]
 * ip link set dev vlan150 promisc on
 * ip link set dev vlan150 up
 * Configuring vlan150 for MAC address 0C:C4:7A:A9:2B:98 ... [ ok ]
 * ip -4 route flush table cache dev vlan150
 * ip -6 route flush table cache dev vlan150
 * Executing: /lib64/rc/sh/openrc-run.sh /lib64/rc/sh/openrc-run.sh /etc/init.d/named start
 * Executing: /lib64/rc/sh/openrc-run.sh /lib64/rc/sh/openrc-run.sh /etc/init.d/distccd start
 * Starting distccd ...
 * Starting named ...
 * Checking named configuration ... [ ok ]
 * start-stop-daemon: fopen `/run/named/named.pid': No such file or directory
 * Detaching to start `/usr/sbin/named' ... [ ok ]
 * Executing: /lib64/rc/sh/openrc-run.sh /lib64/rc/sh/openrc-run.sh /etc/init.d/libvirtd start
 * Executing: /lib64/rc/sh/openrc-run.sh /lib64/rc/sh/openrc-run.sh /etc/init.d/netmount start
 * Starting libvirtd ...
 * start-stop-daemon: fopen `/var/run/libvirtd.pid': No such file or directory
 * Mounting network filesystems ...
 * Detaching to start `/usr/sbin/libvirtd' ... [ ok ]
 * Executing: /lib64/rc/sh/openrc-run.sh /lib64/rc/sh/openrc-run.sh /etc/init.d/apache2 start [ ok ]
 * Starting apache2 ...
 * Detaching to start `/usr/sbin/apache2' ...
 * Detaching to start `/usr/bin/distccd' ...
(In reply to Stuart Longland from comment #7)
> Okay, I've given it a shot, as well as verifying that I have disabled DAD…
>
> net.ipv6.conf.all.autoconf = 0
> net.ipv6.conf.all.accept_ra = 0
> net.ipv6.conf.all.accept_dad = 0

autoconf & accept_ra don't impact DAD. They control L3 address generation & routing only. autoconf can use DAD, but turning off autoconf doesn't turn off DAD at all.

The relevant sysctls are dad_transmits, accept_dad, enhanced_dad, optimistic_dad.

dad_timeout is the netifrc knob in conf.d/net for how long to wait for DAD to complete.
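To check what the kernel actually has for the four DAD sysctls named above on a given interface, something like the following sketch could be used. The function name show_dad_sysctls and its optional second argument (an alternate sysctl root, only there so the function can be tried against a scratch directory) are hypothetical and not part of netifrc. It reads the files under /proc/sys directly, which sidesteps the question of how to spell an interface name containing a dot (such as bond0.150) on the sysctl command line.

```shell
#!/bin/sh
# Print the DAD-related IPv6 sysctls for one interface (sketch).
#   $1: interface directory name, e.g. "all", "default", "bond0.150"
#   $2: optional sysctl root, defaults to /proc/sys (hypothetical parameter,
#       added for illustration and testing)
show_dad_sysctls() {
    iface=$1
    root=${2:-/proc/sys}
    for knob in accept_dad dad_transmits enhanced_dad optimistic_dad; do
        f="$root/net/ipv6/conf/$iface/$knob"
        # Some knobs only exist on some kernels/configurations; skip
        # silently when the file is absent.
        if [ -r "$f" ]; then
            printf '%s/%s = %s\n' "$iface" "$knob" "$(cat "$f")"
        fi
    done
}
```

Running `show_dad_sysctls all` and then `show_dad_sysctls bond0.150` side by side would show whether an interface created after sysctl.conf was loaded carries different values from "all".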
(In reply to Stuart Longland from comment #10) > lithium ~ # * Executing: /lib64/rc/sh/openrc-run.sh > /lib64/rc/sh/openrc-run.sh /etc/init.d/net.vlan150 start I don't see any tentative addresses on the interface here at all. It didn't go into the DAD loop at all (but did in your IPMI screenshot).
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/netifrc.git/commit/?id=1b52dd84770eab44b5590ecb5cd386eca36f9b34

commit 1b52dd84770eab44b5590ecb5cd386eca36f9b34
Author:     Robin H. Johnson <robbat2@gentoo.org>
AuthorDate: 2018-01-21 21:50:38 +0000
Commit:     Robin H. Johnson <robbat2@gentoo.org>
CommitDate: 2018-01-21 21:50:42 +0000

    net/iproute2: improve DAD tentative wait/output.

    If an interface had dad_timeout=0 set, then the wait loop output is
    confusing. Skip it entirely, printing a useful message:
    > Not waiting for DAD timeout on tentative IPv6 addresses (per conf.d/net dad_timeout)

    Refactor the DAD tentantive conditionals for ease of debugging. Bug
    636846 suggests that some kernels are still showing tentative addresses
    despite sysctls being set.

    Bug: https://bugs.gentoo.org/636846
    Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

 net/iproute2.sh | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)
Indeed, and I do have accept_dad set… the annoying thing is that the critical point is during boot. Thus the earlier command that was suggested, I cannot run as there's no shell at that point. I'm not sure if interactive mode would give us what we're after. Perhaps I need to migrate some VMs off one host (I have two of them) and do some experiments. The funny thing is the sysctl service that loads /etc/sysctl.conf is in the boot runlevel while the network scripts are in default. So sysctl ought to have loaded first. I'm wondering if perhaps sysctl needs to be re-loaded when a new interface is created in case it references interfaces that weren't there before?
(In reply to Stuart Longland from comment #14)
> Indeed, and I do have accept_dad set… the annoying thing is that the
> critical point is during boot. Thus the earlier command that was suggested,
> I cannot run as there's no shell at that point.
>
> I'm not sure if interactive mode would give us what we're after. Perhaps I
> need to migrate some VMs off one host (I have two of them) and do some
> experiments.

You can set "rc_verbose=yes" in /etc/conf.d/net to always have the net services print verbose output.

> The funny thing is the sysctl service that loads /etc/sysctl.conf is in the
> boot runlevel while the network scripts are in default. So sysctl ought to
> have loaded first.
>
> I'm wondering if perhaps sysctl needs to be re-loaded when a new interface
> is created in case it references interfaces that weren't there before?

Yes, but I think that is probably out of scope for netifrc, and I don't think it's done by ANY other distro either. The Debian interfaces system explicitly shows setting it w/ sysctl when bringing up the interface.

Say your /etc/sysctl.conf contains:
net.ipv6.conf.all.dad_transmits=1
net.ipv6.conf.default.dad_transmits=2
net.ipv6.conf.bond9.dad_transmits=3

And you do this:
1. load sysctl.conf
2. create bond9
3. read net.ipv6.conf.bond9.dad_transmits
4. load sysctl.conf
5. read net.ipv6.conf.bond9.dad_transmits

#3 will say: net.ipv6.conf.bond9.dad_transmits=2
#5 will say: net.ipv6.conf.bond9.dad_transmits=3
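In the bond9 walk-through above, step 3 sees the "default" value because bond9 did not exist when sysctl.conf was first loaded, so its own line could not be applied. One way to close that gap is to re-apply just the lines for an interface once it exists. The helper below is a rough sketch of the selection step only; iface_sysctl_lines is a hypothetical name, and it only recognises keys written in dotted form (net.ipv6.conf.IFACE.knob).

```shell
#!/bin/sh
# Print the lines of a sysctl.conf-style file that set IPv6 per-interface
# knobs for one specific interface (sketch; hypothetical helper).
#   $1: interface name, e.g. "bond9"
#   $2: optional config file, defaults to /etc/sysctl.conf
iface_sysctl_lines() {
    iface=$1
    conf=${2:-/etc/sysctl.conf}
    # Escape dots in the interface name so they match literally.
    esc=$(printf '%s' "$iface" | sed 's/\./\\./g')
    # Lines starting with '#' never match because the key must come first.
    grep -E "^[[:space:]]*net\.ipv6\.conf\.${esc}\.[a-z0-9_]+[[:space:]]*=" "$conf"
}
```

The selected lines could then be fed back to sysctl from a hook that runs after the interface is created, for example `iface_sysctl_lines "${IFACE}" | sysctl -p -` (procps-ng documents `-` as reading from standard input; treat that, and the choice of hook, as assumptions to verify on your system).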
Hi,

I was today looking at the same/similar problem with bridge interfaces.
I have a few bridge interfaces configured on my desktop which I'm using for getting certain vlans for virtual machines, thus these interfaces usually don't need an IP address. Now when enabling IPv6 I'm getting an IPv6 address on these interfaces via SLAAC.
Usually, as already mentioned here, I would disable IPv6 autoconf/accept_ra with sysctl. Unfortunately this doesn't work as expected. At the time sysctl is starting the bridge interfaces don't exist yet, which means they don't get the setting.

I played around a bit with certain configurations and the only way I could find so far to not get IPv6 addresses was via postup() looking like this:

postup() {
    if [ "${IFACE}" = "brdmz" ]; then
        sysctl net.ipv6.conf.brdmz.autoconf=0
        sysctl net.ipv6.conf.brdmz.accept_ra=0
        ip link set brdmz down
        ip link set brdmz up
    fi
    return 0
}

Now I know that the SLAAC behavior can't be controlled via net scripts (and as far as I could see, there is also no simple solution with the newnet script), and the netifrc scripts are out of scope (for now), but maybe this could be considered for further improvements.
(In reply to Michael Mair-Keimberger (iamnr3) from comment #16)
> Hi,
>
> I was today looking at the same/similar problem with bridge interfaces.
...
> I played around a bit with certain configurations and the only way i could
> find so far to not get IPv6 addresses was via postup() looking like this:
>
> postup() {
> if [ "${IFACE}" = "brdmz" ]; then
> sysctl net.ipv6.conf.brdmz.autoconf=0
> sysctl net.ipv6.conf.brdmz.accept_ra=0
> ip link set brdmz down
> ip link set brdmz up
> fi
> return 0
> }
>
> Now i know that the SLAAC behavior can't be controlled via net scripts (and
> as far as i could see, there is also no simple solution with the newnet
> script), and the netifrc scripts are out of scope (for now), but maybe this
> could be considered for further improvements.

This sounds like a case where you should probably set
net.ipv6.conf.default.autoconf=0
net.ipv6.conf.default.accept_ra=0

very early in your system boot; possibly as early as your kernel cmdline depending when your network modules are being loaded.

And then explicitly turn up autoconf/RA for the interfaces where you DO want it.
(In reply to Robin Johnson from comment #17)
> (In reply to Michael Mair-Keimberger (iamnr3) from comment #16)
> > Hi,
> >
> > I was today looking at the same/similar problem with bridge interfaces.
> ...
> > I played around a bit with certain configurations and the only way i could
> > find so far to not get IPv6 addresses was via postup() looking like this:
> >
> > postup() {
> > if [ "${IFACE}" = "brdmz" ]; then
> > sysctl net.ipv6.conf.brdmz.autoconf=0
> > sysctl net.ipv6.conf.brdmz.accept_ra=0
> > ip link set brdmz down
> > ip link set brdmz up
> > fi
> > return 0
> > }
> >
> > Now i know that the SLAAC behavior can't be controlled via net scripts (and
> > as far as i could see, there is also no simple solution with the newnet
> > script), and the netifrc scripts are out of scope (for now), but maybe this
> > could be considered for further improvements.
>
> This sounds like a case where you should probably set
> net.ipv6.conf.default.autoconf=0
> net.ipv6.conf.default.accept_ra=0
>
> very early in your system boot; possibly as early as your kernel cmdline
> depending when your network modules are being loaded.
>
> And then explicitly turn up autoconf/RA for the interfaces where you DO want
> it.

Thanks for the idea. This definitely sounds better than my "hack". And I actually didn't even know that you can set sysctl settings via cmdline... really nice to know.
I appear to be having a similar problem to this bug. My situation is different and fairly simple, though, which might make it easier to debug.

I have a Linode VPS. My /etc/conf.d/net file looks like this (with certain parts of the IP starred out):

> config_eth0="72.14.***.***/24"
> routes_eth0="default via 72.14.***.1"

That's all that's in the file, but when I run "/etc/init.d/net.eth0 -D restart", I see the following output:

> # /etc/init.d/net.eth0 -D restart
> * Bringing down interface eth0
> * Bringing up interface eth0
> * 72.14.***.***/24 ... [ ok ]
> * Adding routes
> * default via 72.14.***.1 ... [ ok ]
> * Waiting for tentative IPv6 addresses to complete DAD (5 seconds) . [ ok ]

and ifconfig shows that the interface has an IPv6 address as well as an IPv4 address:

> eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
> inet 72.14.***.*** netmask 255.255.255.0 broadcast 72.14.***.255
> inet6 2600:3c00::**** prefixlen 64 scopeid 0x0<global>
> inet6 fe80::fcfd:48ff:fe0e:b955 prefixlen 64 scopeid 0x20<link>
> ether fe:fd:**:**:**:** txqueuelen 1000 (Ethernet)
> RX packets 584038818 bytes 306151911321 (285.1 GiB)
> RX errors 0 dropped 0 overruns 0 frame 0
> TX packets 516938587 bytes 619987668480 (577.4 GiB)
> TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

net.ipv6.conf.all.accept_dad is set to 0:

> # sysctl net.ipv6.conf.all.accept_dad
> net.ipv6.conf.all.accept_dad = 0

I'm not sure how to fix this, but if there's any debugging I can do, let me know.
(note that the "2600:3c00::****" address is actually longer than that, I just condensed the different parts down into one.)
Another observation: When I try to run "ifconfig eth0 del 2600:3c00::****/64" manually, it deletes the address for a limited time, but a few seconds later it'll show back up in the output. So I assume that it is, indeed, actively receiving advertisements.
(In reply to Sophie Hamilton from comment #19)
> I appear to be having a similar problem to this bug. My situation is
> different and fairly simple, though, which might make it easier to debug.
...
> net.ipv6.conf.all.accept_dad is set to 0:
>
> > # sysctl net.ipv6.conf.all.accept_dad
> net.ipv6.conf.all.accept_dad = 0
>
> I'm not sure how to fix this, but if there's any debugging I can do, let me
> know.

"DAD" is just the detection of address collisions on the network.

Did you disable the accept_ra & autoconf options as mentioned in this bug? That will stop it from getting the 2600:../64 address, but it will still have the fe80::.../64 address unless you entirely disable IPv6.

Ideally, you should have these sysctls set BEFORE the kernel loads the module for the network device, or at least before anything marks the interface as "up".

net.ipv6.conf.default.autoconf=0
net.ipv6.conf.default.accept_ra=0
net.ipv6.conf.all.autoconf=0
net.ipv6.conf.all.accept_ra=0

If you have them populated before the interface is created, the default value should populate the interface-specific sysctl.
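To see whether every interface really picked up those values, a quick scan of the per-interface entries can help. The sketch below is in the spirit of the `sysctl -a | sed` one-liner from earlier in this bug, but walks /proc/sys directly; list_slaac_enabled is a hypothetical name, and the optional root argument exists only so the function can be exercised against a scratch directory.

```shell
#!/bin/sh
# List every interface whose accept_ra or autoconf sysctl is non-zero
# (sketch; hypothetical helper).
#   $1: optional sysctl root, defaults to /proc/sys
list_slaac_enabled() {
    root=${1:-/proc/sys}
    for dir in "$root"/net/ipv6/conf/*/; do
        # If the glob matched nothing, $dir is the literal pattern.
        [ -d "$dir" ] || continue
        iface=$(basename "$dir")
        for knob in accept_ra autoconf; do
            f="$dir$knob"
            [ -r "$f" ] || continue
            val=$(cat "$f")
            # Report only entries that are still enabled.
            [ "$val" = "0" ] || printf '%s %s=%s\n' "$iface" "$knob" "$val"
        done
    done
}
```

Any interface this prints was either created before the "default" values were set or was changed afterwards, so it is the one to fix explicitly.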
Thank you! I apologise - I thought that this bug was one I was experiencing too, but I guess it was just my misunderstanding. I'll back out of this bug now - thank you again.