The kernel configuration works on 4.14; on both 4.15 and 4.16 the bond interface exists but doesn't actually work (tested with 4.15.17 and 4.16.2).

eno1 and eno2 are ixgbe ports in an 802.3ad bonding configuration; eno3 is a plain network connection for the management network.

dmesg for 4.15.17 says:

    [ 27.181948] ixgbe 0000:82:00.0: registered PHC device on eno1
    [ 27.315503] bond0: Enslaving eno1 as a backup interface with an up link
    [ 27.523576] pps pps1: new PPS source ptp5
    [ 27.523581] ixgbe 0000:82:00.1: registered PHC device on eno2
    [ 27.657394] bond0: Enslaving eno2 as a backup interface with an up link
    [ 27.765925] ixgbe 0000:82:00.0 eno1: changing MTU from 1500 to 9000
    [ 28.439232] ixgbe 0000:82:00.1 eno2: changing MTU from 1500 to 9000
    [ 33.241460] ixgbe 0000:82:00.0 eno1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    [ 33.270140] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
    [ 33.850613] ixgbe 0000:82:00.1 eno2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    [ 34.710075] ixgbe 0000:82:00.0 eno1: NIC Link is Down
    [ 34.710344] ixgbe 0000:82:00.0 eno1: speed changed to 0 for port eno1
    [ 35.350070] ixgbe 0000:82:00.1 eno2: NIC Link is Down
    [ 35.381561] IPv6: ADDRCONF(NETDEV_UP): eno3: link is not ready
    [ 35.720199] ixgbe 0000:82:00.1 eno2: speed changed to 0 for port eno2
    [ 36.084363] ixgbe 0000:82:00.0 eno1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    [ 36.700059] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
    [ 37.684368] ixgbe 0000:82:00.1 eno2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    [ 38.898510] tg3 0000:01:00.0 eno3: Link is up at 1000 Mbps, full duplex
    [ 38.898519] tg3 0000:01:00.0 eno3: Flow control is on for TX and on for RX
    [ 38.898523] tg3 0000:01:00.0 eno3: EEE is disabled
    [ 38.898544] IPv6: ADDRCONF(NETDEV_CHANGE): eno3: link becomes ready

ip a / ifconfig suggest that all devices are up and connected, but no data is transferred over the bonded interface.
/proc/net/bonding/bond0 looks "ok", except that both slave devices report MII Status: down, for example:

    Slave Interface: eno2
    MII Status: down
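To check this symptom quickly on several machines, a small parser over the /proc/net/bonding/<bond> text can list the slaves whose MII Status is not "up". This is a minimal Python sketch that assumes the usual layout (a "Slave Interface: <name>" header followed by a "MII Status: <state>" line); the function names are hypothetical and not part of any tool.

```python
from typing import Dict, List


def slave_mii_status(text: str) -> Dict[str, str]:
    """Map each 'Slave Interface' in /proc/net/bonding/<bond> output to its MII Status.

    The bond's own 'MII Status' line (before the first slave header) is ignored.
    """
    status: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Slave Interface:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("MII Status:") and current is not None:
            # Keep only the first MII Status seen after each slave header.
            status.setdefault(current, line.split(":", 1)[1].strip())
    return status


def down_slaves(text: str) -> List[str]:
    """Return the sorted names of slaves whose MII Status is anything but 'up'."""
    return sorted(name for name, st in slave_mii_status(text).items() if st != "up")


def read_down_slaves(bond: str = "bond0") -> List[str]:
    """Read the live procfs file (Linux only, bonding module loaded)."""
    with open(f"/proc/net/bonding/{bond}") as f:
        return down_slaves(f.read())
```

On the affected 4.15/4.16 machines, read_down_slaves("bond0") would be expected to return both eno1 and eno2.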
I haven't found a similar bug report yet. Are you able to do a bisect?
Yes, we'll try to bisect it
Success:

    # git bisect good
    4d2c0cda07448ea6980f00102dc3964eb25e241c is the first bad commit
    commit 4d2c0cda07448ea6980f00102dc3964eb25e241c
    Author: Mahesh Bandewar <maheshb@google.com>
    Date: Wed Sep 27 18:03:49 2017 -0700

        bonding: speed/duplex update at NETDEV_UP event

        Some NIC drivers don't have correct speed/duplex settings at the
        time they send NETDEV_UP notification and that messes up the
        bonding state. Especially 802.3ad mode which is very sensitive
        to these settings. In the current implementation we invoke
        bond_update_speed_duplex() when we receive NETDEV_UP, however,
        ignore the return value. If the values we get are invalid
        (UNKNOWN), then slave gets removed from the aggregator with
        speed and duplex set to UNKNOWN while link is still marked as UP.

        This patch fixes this scenario. Also 802.3ad mode is sensitive to
        these conditions while other modes are not, so making sure that
        it doesn't change the behavior for other modes.

        Signed-off-by: Mahesh Bandewar <maheshb@google.com>
        Signed-off-by: David S. Miller <davem@davemloft.net>

    :040000 040000 d8c0cdd0d36e0360d0dea20417fdd690fd9db57e 0c78a15116c4f157e7eaa418888e3e7c54146a76 M drivers
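The commit message describes the mechanism only loosely. One reading that is consistent with both the bisect result and the miimon workaround is this: after the commit, an 802.3ad slave that reports an unknown speed/duplex at NETDEV_UP is kept link-down, and only a later link-monitor (miimon) poll brings it back up. With miimon=0 there is no such poll, so the slave would stay down. The toy Python model below captures that assumption. It is not kernel code, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SPEED_UNKNOWN = None  # stand-in for the kernel's "unknown" speed/duplex


@dataclass
class Slave:
    name: str
    speed: Optional[int] = SPEED_UNKNOWN  # Mb/s as read from the driver
    link_up: bool = False


def on_netdev_up(slave: Slave, reported_speed: Optional[int], mode_8023ad: bool) -> None:
    """Assumed post-commit NETDEV_UP handling: in 802.3ad mode an invalid
    speed keeps the slave link-down; other modes are unaffected."""
    slave.speed = reported_speed
    slave.link_up = not (mode_8023ad and reported_speed is SPEED_UNKNOWN)


def miimon_tick(slave: Slave, current_speed: Optional[int]) -> None:
    """One link-monitor poll: re-read the speed and mark the slave up if it is valid."""
    slave.speed = current_speed
    if current_speed is not SPEED_UNKNOWN:
        slave.link_up = True


def simulate(miimon_ms: int, polls: int = 5) -> bool:
    """Return whether eno1 ends up link-up. The NIC reports an unknown speed at
    NETDEV_UP (link still negotiating) and 10000 Mb/s afterwards."""
    eno1 = Slave("eno1")
    on_netdev_up(eno1, SPEED_UNKNOWN, mode_8023ad=True)
    if miimon_ms > 0:  # miimon=0 means no periodic monitor runs at all
        for _ in range(polls):
            miimon_tick(eno1, 10000)
    return eno1.link_up
```

Under this model, simulate(0) leaves the slave down and simulate(100) brings it up. This would match the "MII Status: down" symptom and the workaround, but it is an interpretation rather than something confirmed in this thread.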
A configuration workaround works:

    modprobe bonding miimon=100

The default is 0 (MII link monitoring disabled); any non-zero value should work.
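The bonding driver also exposes the interval per bond in sysfs (/sys/class/net/<bond>/bonding/miimon), so an existing bond can be checked, or changed at runtime, without reloading the module. A minimal Python sketch, assuming that sysfs layout and root privileges for the write. The helper names are hypothetical. The sysfs_root parameter exists only so the functions can be exercised against a scratch directory.

```python
import os


def bond_miimon(bond: str = "bond0", sysfs_root: str = "/sys/class/net") -> int:
    """Read a bond's MII monitoring interval in ms; 0 means monitoring is disabled."""
    path = os.path.join(sysfs_root, bond, "bonding", "miimon")
    with open(path) as f:
        return int(f.read().strip())


def set_bond_miimon(interval_ms: int, bond: str = "bond0",
                    sysfs_root: str = "/sys/class/net") -> None:
    """Write a non-zero interval for one bond (root required on a real system)."""
    if interval_ms <= 0:
        raise ValueError("use a non-zero interval; 0 disables MII monitoring")
    path = os.path.join(sysfs_root, bond, "bonding", "miimon")
    with open(path, "w") as f:
        f.write(f"{interval_ms}\n")
```

A runtime write only affects the running system. For a persistent setting, the module option (for example an "options bonding miimon=100" line in a modprobe.d file) or the distribution's network configuration is the more natural place.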
Resolving as fixed with the identified workaround.