Bug 910442 - net-misc/dhcpcd race assigns addresses to incorrect interfaces
Summary: net-misc/dhcpcd race assigns addresses to incorrect interfaces
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: William Hubbs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-07-17 02:33 UTC by Nick Bastin
Modified: 2023-07-18 10:45 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
Example of race with request, dmesg and /var/log/messages (wnje,7.30 KB, text/plain)
2023-07-17 02:38 UTC, Nick Bastin

Description Nick Bastin 2023-07-17 02:33:26 UTC
In conditions where interfaces create link where none exists (Broadcom NetXtreme firmware loads, at least), dhcpcd may (this is highly timing dependent) attempt to get an address.  A later _response_ arriving on a different interface can then create an address on the wrong interface.  This will resolve itself if the link is actually dead, but will never resolve itself if the link is live (but not connected to a DHCP server).

Reproducible: Sometimes

Steps to Reproduce:
You need a set of interfaces connected to the same chip - 4 is probably best, but with 2 it would likely still happen at some point.  If only one interface is live then the issue will resolve itself, but if two or more are live and only one can reach the DHCP server, you will eventually find the address on the wrong interface.
Actual Results:  
IP address assigned to wrong interface

Expected Results:  
IP address assigned to correct interface (this works fine with isc-dhcp, as an aside).
Comment 1 Nick Bastin 2023-07-17 02:38:34 UTC
Created attachment 865629
Example of race with request, dmesg and /var/log/messages

This is a weak example - the event happens but nothing bad results.  That being said, bad results are _easy_ to see from here, and they _do_ happen.
Comment 2 Nick Bastin 2023-07-17 02:41:56 UTC
Just to be clear, this _really_ screws up installs on machines with wide interface controllers, since the live cd uses this package.
Comment 3 Roy Marples 2023-07-17 08:29:39 UTC
I don't see anything wrong with the example log attached, nor do I see any kind of race or wrong address assignment. You'll have to point that out please :)

What I do see is this flow:

* kernel announces interface
* dhcpcd sees this and marks it IFF_UP
* kernel takes IFF_UP and then announces carrier is available
* dhcpcd sees this and starts auto configuration
* kernel realises carrier isn't really available and announces it's down
* dhcpcd sees this and removes any configuration attempted so far
* kernel finally says one interface really has a carrier and announces it
* dhcpcd sees this and successfully completes auto configuration

If my guess is correct, then any fixes must happen in the kernel driver because dhcpcd is just reacting to the announced kernel state.
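
The flow listed above can be modelled as a small state machine. The following is only an illustrative sketch in C, not code from dhcpcd: the event names, state names and the functions `next_state` and `replay` are all hypothetical, and the model assumes that a carrier-down event always discards any configuration in progress.

```c
#include <stddef.h>

/* Hypothetical event and state names for illustration; not taken from dhcpcd. */
enum link_event { EV_CARRIER_UP, EV_CARRIER_DOWN, EV_LEASE_ACQUIRED };
enum if_state { ST_IDLE, ST_CONFIGURING, ST_BOUND };

/* React to one announced event and return the resulting interface state. */
enum if_state next_state(enum if_state cur, enum link_event ev)
{
    switch (ev) {
    case EV_CARRIER_UP:
        /* Carrier announced: start auto configuration if nothing is active. */
        return cur == ST_IDLE ? ST_CONFIGURING : cur;
    case EV_CARRIER_DOWN:
        /* Carrier withdrawn: drop any configuration attempted so far. */
        return ST_IDLE;
    case EV_LEASE_ACQUIRED:
        /* A lease only completes a configuration that is in progress. */
        return cur == ST_CONFIGURING ? ST_BOUND : cur;
    }
    return cur;
}

/* Replay a sequence of events against an interface that starts idle. */
enum if_state replay(const enum link_event *evs, size_t n)
{
    enum if_state st = ST_IDLE;
    for (size_t i = 0; i < n; i++)
        st = next_state(st, evs[i]);
    return st;
}
```

In this model, replaying carrier up, carrier down, carrier up, lease acquired ends in `ST_BOUND`, which matches the flow in the log: the early, short-lived carrier announcement is undone before the real one arrives.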

You might want to enable debug in dhcpcd.conf to get more detail as well from the dhcpcd side.
Comment 4 Nick Bastin 2023-07-17 20:53:55 UTC
Sorry, I'm sure I wasn't clear - the attachment isn't actually a problem, it was just one that I happened to capture that showed the foundations of a problem based on the luck of the order of initialization (which does occur, but sadly not in that case, and quite intermittently).

There are two cases that occur, only one is the fault (maybe) of dhcpcd.  The dhcpcd issue (as I tried to explain in the initial report) is that it can receive a response and bind it to the wrong interface, at which point you are stuck until you fix it manually (it will lose the IP at T2, but the proper interface doesn't ever seem to pick one up unless you intervene, although this may be a configuration artifact of the live cd).  The good news is because this problem persists I should be able to get some logs when I can reproduce it.  In fairness it's only conjecture that this is related to a separate DISCOVER.

I am traveling this week so unfortunately I can't run as many tests as I'd like, but I'll try to run a bunch of reboot cycles when I get back.

As an aside, I'll note that I've observed some fairly weird effects when the BCM57800 chip is initialized, and it's also possible that the problem is it's capable of delivering packets on multiple rings (as it has an internal L2 switch) while the driver is initializing before IFF_LOWER_UP, in which case it's entirely possible that the userspace is simply being lied to if it acts before the driver setup is complete.
Comment 5 Roy Marples 2023-07-18 10:45:37 UTC
I have created a patch here (not yet on master branch) https://github.com/NetworkConfiguration/dhcpcd/commit/da7cc24cff0b74357106d232921b89cd253c29a5

This changes the carrier check on Linux from just IFF_RUNNING to IFF_RUNNING and IFF_LOWER_UP and !IFF_DORMANT. According to the kernel documentation this should not be needed but it might fix your case of carrier appearing and then disappearing just after firmware initialisation.
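
The difference between the two carrier checks described above can be illustrated with a minimal C sketch. This is not the content of the linked commit: the function names `carrier_old` and `carrier_new` are hypothetical, and the flag values are redefined under sketch-local `SK_` names (mirroring the values in the Linux uapi header `<linux/if.h>`) so that the block compiles without kernel headers.

```c
#include <stdbool.h>

/*
 * Sketch-local copies of the Linux interface flag values.
 * Assumption: these match IFF_RUNNING, IFF_LOWER_UP and IFF_DORMANT
 * as defined in <linux/if.h>.
 */
#define SK_IFF_RUNNING  0x40u
#define SK_IFF_LOWER_UP 0x10000u
#define SK_IFF_DORMANT  0x20000u

/* Previous behaviour as described: carrier is up when IFF_RUNNING is set. */
bool carrier_old(unsigned int flags)
{
    return (flags & SK_IFF_RUNNING) != 0;
}

/*
 * Stricter check as described: IFF_RUNNING and IFF_LOWER_UP must both be
 * set, and IFF_DORMANT must be clear.
 */
bool carrier_new(unsigned int flags)
{
    return (flags & SK_IFF_RUNNING) != 0 &&
           (flags & SK_IFF_LOWER_UP) != 0 &&
           (flags & SK_IFF_DORMANT) == 0;
}
```

With flags equal to `SK_IFF_RUNNING` alone, `carrier_old` reports a carrier while `carrier_new` does not, which is the kind of transient state the stricter check is meant to ignore.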

If any @Gentoo people or anyone else can test this as well in case it breaks anything I'd appreciate any feedback.