| Summary: | net-misc/dhcpcd race assigns addresses to incorrect interfaces | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Nick Bastin <nbastin> |
| Component: | Current packages | Assignee: | William Hubbs <williamh> |
| Status: | RESOLVED NEEDINFO | ||
| Severity: | normal | CC: | base-system, roy |
| Priority: | Normal | ||
| Version: | unspecified | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
| Attachments: | Example of race with request, dmesg and /var/log/messages | ||
|
Description
Nick Bastin
2023-07-17 02:33:26 UTC
Created attachment 865629
Example of race with request, dmesg and /var/log/messages
This is a weak example - the event happens but nothing bad results. That being said, bad results are _easy_ to see from here, and they _do_ happen.
Just to be clear, this _really_ screws up installs on machines with wide interface controllers, since the live cd uses this package.

I don't see anything wrong with the example log attached, nor do I see any kind of race or wrong address assignment. You'll have to point that out please :)

What I do see is this flow:

* kernel announces interface
* dhcpcd sees this and marks it IFF_UP
* kernel takes IFF_UP and then announces carrier is available
* dhcpcd sees this and starts auto configuration
* kernel realises carrier isn't really available and announces it's down
* dhcpcd sees this and removes any configuration attempted so far
* kernel finally says one interface really has a carrier and announces it
* dhcpcd sees this and successfully completes auto configuration

If my guess is correct, then any fixes must happen in the kernel driver because dhcpcd is just reacting to the announced kernel state.

You might want to enable debug in dhcpcd.conf to get more detail as well from the dhcpcd side.

Sorry, I'm sure I wasn't clear - the attachment isn't actually a problem, it was just one that I happened to capture that showed the foundations of a problem based on the luck of the order of initialization (which does occur, but sadly not in that case, and quite intermittently). There are two cases that occur, only one of which is (maybe) the fault of dhcpcd.

The dhcpcd issue (as I tried to explain in the initial report) is that it can receive a response and bind it to the wrong interface, at which point you are stuck until you fix it manually (it will lose the IP at T2, but the proper interface doesn't ever seem to pick one up unless you intervene, although this may be a configuration artifact of the live cd). The good news is that because this problem persists I should be able to get some logs when I can reproduce it. In fairness, it's only conjecture that this is related to a separate DISCOVER.
I am traveling this week so unfortunately I can't run as many tests as I'd like, but I'll try to run a bunch of reboot cycles when I get back.

As an aside, I'll note that I've observed some fairly weird effects when the BCM57800 chip is initialized, and it's also possible that the problem is that it's capable of delivering packets on multiple rings (as it has an internal L2 switch) while the driver is initializing before IFF_LOWER_UP, in which case it's entirely possible that userspace is simply being lied to if it acts before the driver setup is complete.

I have created a patch here (not yet on master branch): https://github.com/NetworkConfiguration/dhcpcd/commit/da7cc24cff0b74357106d232921b89cd253c29a5

This changes the carrier check on Linux from just IFF_RUNNING to IFF_RUNNING and IFF_LOWER_UP and !IFF_DORMANT. According to the kernel documentation this should not be needed, but it might fix your case of carrier appearing and then disappearing just after firmware initialisation.

If any @Gentoo people or anyone else can test this as well, in case it breaks anything, I'd appreciate any feedback. |