Use six e1000 cards in a server with the latest baselayout, set up so that each pair of cards is put into bonding mode. Notice that several of the bonds will complain or fail to initialize properly because the cards were not active long enough before bonding was called (the latest e1000 drivers don't power on the card until the module is loaded). Rebooting the server multiple times results in different working configurations (this behavior has been seen across multiple machines and switches). I've also seen ntp-client fail in simpler configs because the e1000 isn't fully set up by the time the script runs. I'm tempted to think we need to provide some place for a user-specified delay after module loading. Yes, I'm aware that portfast on some switches helps, but it seems we need to modify the startup scripts generically somehow for people who don't have it.
And have you tried tweaking RC_NET_STRICT_CHECKING in /etc/conf.d/rc?
Are you depending the bonds correctly? Sample /etc/conf.d/net:
    depend_bond0() {
        need net.eth0 net.eth1
    }
    depend_bond1() {
        need net.eth2 net.eth3
    }
    depend_bond2() {
        need net.eth4 net.eth5
    }
Closing as WORKSFORME
The problem still exists. I discussed the bug tonight on the gentoo-server IRC channel, and the consensus seems to be that a bug does exist. I also believe Dell sent out a notice about a year ago saying that certain Intel GigE NICs would have long initialization times. Strict net checking doesn't really help, because the net startup script succeeds; the NICs just aren't set up yet. It takes a few seconds more. Bonding dependencies are correct.
So would you say that the real problem is with the kernel driver then as it's returning too fast?
You could do this to delay init:
    preup() {
        # Sleep 5 seconds before bringing up a bonded interface
        [[ ${IFACE} == "bond"* ]] && sleep 5
        # A non-zero return from preup aborts configuration, so always return 0
        return 0
    }
But otherwise this sounds like a kernel bug.
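As an alternative to a fixed sleep, here is a rough sketch that polls the link state and gives up after a timeout. It assumes the kernel exposes a carrier flag at /sys/class/net/<iface>/carrier, which reads 1 once the link is up and cannot be read while the interface is still down. The names wait_for_carrier and SYSFS_NET are made up for this illustration and are not part of baselayout.

```shell
# Hypothetical helper, not part of baselayout: poll the carrier flag of an
# interface instead of sleeping for a fixed time.
# SYSFS_NET is an illustrative override; it defaults to the usual sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_carrier() {
    # $1 = interface name, $2 = timeout in seconds (default 10)
    local iface="$1" timeout="${2:-10}" waited=0 state
    while [ "$waited" -lt "$timeout" ]; do
        # The read fails (state stays empty) if the file is missing or the
        # interface is not up yet, so that case is simply retried.
        state="$(cat "${SYSFS_NET}/${iface}/carrier" 2>/dev/null)"
        [ "$state" = "1" ] && return 0
        sleep 1
        waited=$(( waited + 1 ))
    done
    # Timed out without seeing link
    return 1
}
```

Something like `wait_for_carrier eth0 10` could then stand in for the `sleep 5`. Whether the slave interfaces are already up at the point where preup runs for the bond is an assumption here, so the fixed sleep may still be the safer workaround.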
Another note: I have found that some switches, especially Cisco, have a portfast option that allows ports to be manually designated as 'server only'. It skips the spanning tree loop checks and lets the link come up much faster. Currently, I have to enable this option on my switches to work around the e1000 bug.
Usual procedure is to get this reported upstream. Is it reproducible on 2.6.15?
Try a recent -git release of Linus' tree (e.g. git-sources). There have been many e1000 fixes committed over the last few days.
Please reopen when you respond to comment #8 or #9.