Bug 653668 - >=gentoo-sources-4.15: network bonding doesn't work
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://bugs.launchpad.net/ubuntu/+so...
Reported: 2018-04-21 08:29 UTC by Patrick Lauer
Modified: 2018-07-21 14:12 UTC


Description Patrick Lauer gentoo-dev 2018-04-21 08:29:38 UTC
The same kernel configuration works on 4.14; on both 4.15 and 4.16 (tested with 4.15.17 and 4.16.2) the bond interface exists but doesn't actually work.

eno1 and eno2 are ixgbe ports in an 802.3ad bonding config; eno3 is a simple network connection for the management network.

dmesg for 4.15.17 says:
[   27.181948] ixgbe 0000:82:00.0: registered PHC device on eno1
[   27.315503] bond0: Enslaving eno1 as a backup interface with an up link
[   27.523576] pps pps1: new PPS source ptp5
[   27.523581] ixgbe 0000:82:00.1: registered PHC device on eno2
[   27.657394] bond0: Enslaving eno2 as a backup interface with an up link
[   27.765925] ixgbe 0000:82:00.0 eno1: changing MTU from 1500 to 9000
[   28.439232] ixgbe 0000:82:00.1 eno2: changing MTU from 1500 to 9000
[   33.241460] ixgbe 0000:82:00.0 eno1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   33.270140] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[   33.850613] ixgbe 0000:82:00.1 eno2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   34.710075] ixgbe 0000:82:00.0 eno1: NIC Link is Down
[   34.710344] ixgbe 0000:82:00.0 eno1: speed changed to 0 for port eno1
[   35.350070] ixgbe 0000:82:00.1 eno2: NIC Link is Down
[   35.381561] IPv6: ADDRCONF(NETDEV_UP): eno3: link is not ready
[   35.720199] ixgbe 0000:82:00.1 eno2: speed changed to 0 for port eno2
[   36.084363] ixgbe 0000:82:00.0 eno1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   36.700059] bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
[   37.684368] ixgbe 0000:82:00.1 eno2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   38.898510] tg3 0000:01:00.0 eno3: Link is up at 1000 Mbps, full duplex
[   38.898519] tg3 0000:01:00.0 eno3: Flow control is on for TX and on for RX
[   38.898523] tg3 0000:01:00.0 eno3: EEE is disabled
[   38.898544] IPv6: ADDRCONF(NETDEV_CHANGE): eno3: link becomes ready
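The link flapping in the log above is easier to see once the ixgbe link events are grouped per interface. A minimal Python sketch, assuming the log lines keep exactly the format shown (the regular expression and the function name `link_events` are illustrative, not part of any tool):

```python
import re
from collections import defaultdict

# Matches ixgbe "NIC Link is Up/Down" lines in the dmesg format quoted above.
LINK_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+ixgbe \S+ (\S+): NIC Link is (Up|Down)")

def link_events(log_text):
    """Return {interface: [(timestamp, state), ...]} in log order."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        m = LINK_RE.match(line.strip())
        if m:
            ts, iface, state = m.groups()
            events[iface].append((float(ts), state))
    return dict(events)

# Link lines copied from the dmesg excerpt in this report.
LOG = """\
[   33.241460] ixgbe 0000:82:00.0 eno1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   33.850613] ixgbe 0000:82:00.1 eno2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   34.710075] ixgbe 0000:82:00.0 eno1: NIC Link is Down
[   35.350070] ixgbe 0000:82:00.1 eno2: NIC Link is Down
[   36.084363] ixgbe 0000:82:00.0 eno1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[   37.684368] ixgbe 0000:82:00.1 eno2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
"""

if __name__ == "__main__":
    for iface, evs in link_events(LOG).items():
        print(iface, [state for _, state in evs])
```

Both ports go Up, Down, Up within a few seconds, and the 802.3ad warning from bond0 is logged in that window.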

ip a / ifconfig suggest all devices are up and connected, but no data transfer works over the bonded interface.

/proc/net/bonding/bond0 looks "ok", except that both slave devices are reported as down, e.g.:

Slave Interface: eno2
MII Status: down
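The per-slave status can be pulled out of /proc/net/bonding/bond0 with a few lines of Python. This is a rough sketch that assumes only the two field names quoted above; the function name `slave_mii_status` and the sample text are illustrative, and the sample is not a full capture from the affected machine.

```python
import re

def slave_mii_status(text):
    """Map each slave interface name to its reported MII status."""
    status = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"MII Status:\s*(\S+)", line)
        if m and current is not None:
            # Keep only the first MII Status line after each slave header.
            status.setdefault(current, m.group(1))
    return status

# Illustrative input: the leading bond-level line is hypothetical and is
# skipped by the parser because no slave header has been seen yet.
SAMPLE = """\
MII Status: up

Slave Interface: eno1
MII Status: down

Slave Interface: eno2
MII Status: down
"""

if __name__ == "__main__":
    for name, state in slave_mii_status(SAMPLE).items():
        print(name, state)
```

On a live system the input would be the contents of /proc/net/bonding/bond0 instead of SAMPLE.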
Comment 1 Thomas Deutschmann (RETIRED) gentoo-dev 2018-04-23 17:22:28 UTC
I cannot find a similar bug report yet. Are you able to do a bisect?
Comment 2 Patrick Lauer gentoo-dev 2018-04-28 10:31:07 UTC
Yes, we'll try to bisect it.
Comment 3 Patrick Lauer gentoo-dev 2018-05-21 14:12:10 UTC
Success:

# git bisect good
4d2c0cda07448ea6980f00102dc3964eb25e241c is the first bad commit
commit 4d2c0cda07448ea6980f00102dc3964eb25e241c
Author: Mahesh Bandewar <maheshb@google.com>
Date:   Wed Sep 27 18:03:49 2017 -0700

    bonding: speed/duplex update at NETDEV_UP event

    Some NIC drivers don't have correct speed/duplex settings at the
    time they send NETDEV_UP notification and that messes up the
    bonding state. Especially 802.3ad mode which is very sensitive
    to these settings. In the current implementation we invoke
    bond_update_speed_duplex() when we receive NETDEV_UP, however,
    ignore the return value. If the values we get are invalid
    (UNKNOWN), then slave gets removed from the aggregator with
    speed and duplex set to UNKNOWN while link is still marked as UP.

    This patch fixes this scenario. Also 802.3ad mode is sensitive to
    these conditions while other modes are not, so making sure that it
    doesn't change the behavior for other modes.

    Signed-off-by: Mahesh Bandewar <maheshb@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

:040000 040000 d8c0cdd0d36e0360d0dea20417fdd690fd9db57e 0c78a15116c4f157e7eaa418888e3e7c54146a76 M      drivers
Comment 4 Patrick Lauer gentoo-dev 2018-05-21 14:44:34 UTC
A config fix works: modprobe bonding miimon=100
The default is 0, which disables MII link monitoring; any non-zero value should work.
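To check whether a running bond actually has link monitoring enabled, the current value can be read back from sysfs. A minimal sketch, assuming the bond is named bond0 and the bonding driver exposes its parameters under /sys/class/net/<bond>/bonding/; the helper names `read_miimon` and `monitoring_enabled` are made up for this example.

```python
from pathlib import Path

def read_miimon(bond="bond0", sysfs_root="/sys/class/net"):
    """Return the bond's miimon interval in ms, or None if it cannot be read."""
    path = Path(sysfs_root) / bond / "bonding" / "miimon"
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None

def monitoring_enabled(miimon):
    """MII link monitoring is active only for a non-zero interval."""
    return miimon is not None and miimon > 0

if __name__ == "__main__":
    value = read_miimon()
    print("miimon:", value, "enabled:", monitoring_enabled(value))
```

To keep the setting across reboots, the usual mechanism is a line such as "options bonding miimon=100" in a file under /etc/modprobe.d/ (the exact file name is up to the admin).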
Comment 5 Mike Pagano gentoo-dev 2018-07-21 14:12:08 UTC
Resolving as fixed with the identified workaround.