
Bug 33272

Summary: baselayout 1.8.6.12 can't start net.eth0 anymore
Product: Gentoo Linux
Reporter: JoWilly <jowilly>
Component: New packages
Assignee: Martin Schlemmer (RETIRED) <azarah>
Status: RESOLVED FIXED
Severity: critical
CC: andy.dalton, bevand_m, blademan, bugs.gentoo.org, cm, dan.dickey, deviantgeek, divided.mind, gentoo-bugs, gentoo, gentoo, giovanni.bobbio, gurligebis, jochen.eisinger, keith, m.debruijne, seemant, simon, swtaylor
Priority: High
Version: unspecified
Hardware: All
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Attachments:
  Patch to fix net.eth0 carrier detection
  patch against net.ethX
  net_eth0-fix-carrier-detection.patch
  This workaround is needed because the behaviour of some ethernet drivers breaks the carrier autodection stuff in net.eth0
  A little revised patch that works for me (none of the others did)
  $(LC_MESSAGES=C ifconfig ...)

Description JoWilly 2003-11-11 19:49:31 UTC
Hi,

eth0 is connected to an ADSL router.

I have just updated baselayout to 1.8.6.12, this is what I get:

 * Bringing eth0 up...
 * eth0 is not plugged in or has no carrier signal                        [ !! ]

When I comment out these lines in /etc/init.d/net.eth0, it works:

---CUT----
else
		# Check that eth0 was not brough up by the kernel ...
	#	if [ "${status_IFACE}" != "up" ]
	#	then
			# Check that the interface has a carrier
	#		if [ "${carrier_IFACE}" = "running" ]
	#		then
				/sbin/dhcpcd ${dhcpcd_IFACE} ${IFACE} >/dev/null || {
					retval=$?
					eend ${retval} "Failed to bring ${IFACE} up"
					return ${retval}
				}
	#		else
	#			eend 1 "${IFACE} is not plugged in or has no carrier signal"
	#			return 1
	#		fi
	#	fi
	fi
	eend 0
---CUT---


One strange thing still happens, though (it is the reason I tried updating baselayout, to check whether it was fixed): eth0 takes a long time to come up (maybe 1 or 2 minutes).

A few months ago it came up quickly. At some point I updated baselayout and it became slow.
... and now with 1.8.6.12 it doesn't come up at all unless I comment out the lines above.

The eth0 driver is compiled into the kernel, not built as a module.
Comment 1 ferret 2003-11-11 21:20:34 UTC
It would be better to change:

carrier_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/RUNNING/ { if ($1 IFACE) print "running" }')"

to:

carrier_IFACE="$(ifconfig | grep -A4 ${iface} | gawk '/RUNNING/ { print "running" }')"

By the way, I don't have the taking-two-minutes problem.
Comment 2 Arve Knudsen 2003-11-12 00:35:24 UTC
Glad I keep backups: I diffed the previous version of net.eth0 against this one and promptly removed the carrier signal test. I wish these scripts were tested a little better before making their way into Portage; it seems that with every other version of baselayout something breaks. At least .12 seems to correct a problem I brought up quite some time ago, where the return value of fsck was treated incorrectly (the maintainer must've finally understood it *was* a problem).
Comment 3 jack_mort 2003-11-12 02:43:49 UTC
I had the same problem here...

Found a solution by replacing :

status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/Link/ { if ($1 == IFACE) print "up" }')"

with

status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '"/Link/" { if ($1 == IFACE) print "up" }')"

and

carrier_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/RUNNING/ { if ($1 == IFACE) print "running" }')"

with

carrier_IFACE="$(ifconfig -a | gawk -v IFACE="${iface}" '"/RUNNING/" { if ($1 == IFACE) print "running" }')"


Now eth0 comes up as before :-)

Hope this can help...
Comment 4 Stuart Bouyer 2003-11-12 16:57:45 UTC
Same problem here, running development-sources 2.6.0-test9 with the NIC driver compiled into the kernel. After boot, or if I stop net.eth0, ifconfig shows only lo, whereas ifconfig -a shows both lo and eth0.

By following Jack's instructions, net.eth0 works again.

I'm running net-tools-1.60-r7 if that's of any consequence
Comment 5 Brad 2003-11-12 18:02:56 UTC
I ran into the same problem. When I look at my eth0, RUNNING is not displayed in the output of ifconfig eth0, so the carrier detection seems broken. If I comment out the check (or just change the condition from = to !=), the interface comes up fine (with dhcpcd).  Should the interface be reported as running?  I'm using the nvnet driver.
Comment 6 Bjarke Istrup Pedersen (RETIRED) gentoo-dev 2003-11-13 04:10:02 UTC
I just removed the carrier test, but this could really be a problem for people running systems that are not easy to get to, since they would have problems after a reboot.
Comment 7 Mario Vazquez 2003-11-13 08:08:57 UTC
I have a laptop with a built-in natsemi LAN adapter, which connects through DHCP.  I tried with natsemi support built into the kernel and as a module, and eth0 fails to come up in both configurations. After the system is up, I was able to start eth0 manually with the command dhcpcd eth0.
To work around the problem I kept baselayout 1.8.6.12 and used a previous copy of net.eth0 from baselayout 1.8.6.10 (or 11).

BTW, my kernel is ck-sources 2.4.22-r2.
Comment 8 Romain GAILLEGUE 2003-11-13 08:57:27 UTC
Same problem with mm-sources-2.6.0-test9-mm1 and the NIC driver built as a module.

With the modifications from jack_mort (comment 3) it works again.
Comment 9 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-13 09:37:54 UTC
Please try below patch.

--
Index: init.d/net.eth0
===================================================================
RCS file: /home/cvsroot/gentoo-src/rc-scripts/init.d/net.eth0,v
retrieving revision 1.32
diff -u -r1.32 net.eth0
--- init.d/net.eth0     11 Nov 2003 19:37:24 -0000      1.32
+++ init.d/net.eth0     13 Nov 2003 17:37:22 -0000
@@ -32,8 +32,8 @@
        dhcpcd_IFACE="$(eval echo \$\{dhcpcd_${iface}\})"
        inet6_IFACE="$(eval echo \$\{inet6_${iface}\})"
        alias_IFACE="$(eval echo \$\{alias_${iface}\})"
-       status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/Link/ { if ($1 == IFACE) print "up" }')"
-       carrier_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/RUNNING/ { if ($1 == IFACE) print "running" }')"
+       status_IFACE="$(ifconfig "${iface}" | gawk '/Link/ { print "up" }')"
+       carrier_IFACE="$(ifconfig "${iface}" | gawk '/RUNNING/ { print "running" }')"
        vlans="$(eval echo \$\{iface_${IFACE}_vlans\})"
 }
  
Comment 10 Pat Suwalski 2003-11-13 15:56:30 UTC
Yeah, this RUNNING check is a terrible idea. You get chicken-egg with wireless cards. For my orinoco to show RUNNING, dhcpcd eth1 has to be executed... dhcpcd is executed by net.eth1.
Comment 11 Pat Suwalski 2003-11-13 16:08:15 UTC
I should really be more specific. Here's the output of 'ifconfig eth1' at system startup:

eth1      Link encap:Ethernet  HWaddr 11:11:11:11:11:11
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:11 Base address:0x100

I've removed my MAC address.

Line 66:
if [ "${carrier_IFACE}" = "running" ]
then
/sbin/dhcpcd ${dhcpcd_IFACE} ${IFACE} >/dev/null

This condition can never be reached.
Comment 12 Derk W te Bokkel 2003-11-13 16:23:56 UTC
The patch does not work for me: the net seems to come up, but it really does not.

I first thought it had worked because "it came up", but eth0 was not working.

The old version still works properly after it is substituted back in.

derk
Comment 13 Derk W te Bokkel 2003-11-13 17:01:33 UTC
NOTE: my net.eth0 is also configured for 'dhcp' configuration so comment #11 applies to me also ... (by the way my net.eth1 was still old style and therefore came up properly. In order to get internet access back I copied it over top of the problematic net.eth0)

derk

Comment 14 Jason Rhinelander 2003-11-13 18:34:21 UTC
In reply to the patch in comment 9, that patch makes things worse - `ifconfig eth0` will ALWAYS contain "eth0     Link", whether or not the interface is actually up, so not only does it not bring the interface up, it SAYS it IS bringing the interface up.

The "fix" here is to remove all references to the carrier_IFACE, and to _not_ apply the comment 9 patch for status_IFACE.  After a bit of testing, my ifconfig eth0 output does not change at all when the cable is plugged in vs. not plugged in.  mii-tool does not help either, as it won't report on a down interface.
Comment 15 Dead Schorsch 2003-11-14 00:54:42 UTC
Comment #9 changed the "No link" message to "SCIOADDR: Network Unreachable", but does not solve the problem at all. Can somebody tell me how to extract that init script from the previous baselayout version? It worked fine.
Comment 16 jack_mort 2003-11-14 02:17:05 UTC
Hi,

I've just made a patch, as it seems the problem is solved for people who tried my modifications :-)

Here it is :

--- /etc/init.d/net.eth0        2003-11-14 11:03:40.000000000 +0100
+++ /etc/init.d/net.eth0        2003-11-14 11:06:57.000000000 +0100
@@ -32,8 +32,8 @@
        dhcpcd_IFACE="$(eval echo \$\{dhcpcd_${iface}\})"
        inet6_IFACE="$(eval echo \$\{inet6_${iface}\})"
        alias_IFACE="$(eval echo \$\{alias_${iface}\})"
-       status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/Link/ { if ($1 == IFACE) print "up" }')"
-       carrier_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/RUNNING/ { if ($1 == IFACE) print "running" }')"
+       status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '"/Link/" { if ($1 == IFACE) print "up" }')"
+       carrier_IFACE="$(ifconfig -a | gawk -v IFACE="${iface}" '"/RUNNING/" { if ($1 == IFACE) print "running" }')"
        vlans="$(eval echo \$\{iface_${iface}_vlans\})"
 }

Good luck :-)
Comment 17 Klaus S. Madsen 2003-11-14 18:30:34 UTC
Neither of the two patches works for me (I use dhcp for my network card).

The problem is that upon boot (or after running /etc/init.d/net.eth0 stop), eth0 is not listed in the output of ifconfig. 

So ${status_IFACE} contains the empty string, as does ${carrier_IFACE}. Therefore the script outputs that no cable is plugged into my card, instead of starting dhcpcd.
Comment 18 Troy Dack 2003-11-15 04:52:45 UTC
For Comment #15:
simply re-merge baselayout-1.8.6.11, eg:

emerge =baselayout-1.8.6.11

Then do the etc-update dance and review the changes, most you can probably
just delete (in which case you'll keep the 1.8.6.12 version), but the net.eth0
changes you should look at carefully and either merge them or overwrite the
file (in which case you'll go from a .12 net.eth0 to a .11 net.eth0)

BTW, I got bitten by this too, had to play musical monitors and keyboards to find out why my headless box in the corner was not responding to any network requests :)
Comment 19 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-15 06:17:22 UTC
In reply to comment #14: sure, the Link change was not really thought
through, or tested.

As for 'ifconfig eth0' not showing 'RUNNING' when the cable is in ...
broken driver ?
Comment 20 Simon Cooper 2003-11-15 07:37:23 UTC
I don't think it's a broken driver; I use the 8139too driver, which is pretty thoroughly tested by now. I think the carrier test should be removed, but there must be some way of telling the script that no cable is connected, so it doesn't spend 5 minutes waiting for a DHCPOFFER that will never come. Until that is figured out, just remove the offending carrier test lines.
Comment 21 Simon Cooper 2003-11-15 07:40:52 UTC
just noticed a typo in it as well:

# Check that eth0 was not brough up by the kernel ...

s/brough/brought
Comment 22 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-15 08:02:55 UTC
*** Bug 33472 has been marked as a duplicate of this bug. ***
Comment 23 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-15 12:51:08 UTC
Could those of you that have issues with carrier detection merge 'ethtool',
and do:

  # ethtool eth0 | grep Link
        Link detected: yes
  #


Thanks.
Comment 24 Simon Watson 2003-11-15 15:34:03 UTC
Results on my - carrier detection broken - machine:

root@tomsk swat # ethtool eth0
Settings for eth0:
No data available
root@tomsk swat # 
Comment 25 Jason Rhinelander 2003-11-16 00:34:25 UTC
In reply to comment 19, RUNNING is displayed if the IFF_RUNNING bit is set -
however very few drivers actually SET this bit, so, as you put it, most drivers
are "broken".  If you don't believe me, grep through the kernel source for
IFF_RUNNING - only a very small number of drivers set it directly.  Most call
netif_carrier_on/_off, which doesn't set it.  linux/net/core/dev.c will set
IFF_RUNNING when you retrieve interface information - but only if both the
carrier is ok AND the interface is running.
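
To make the two bits concrete, here is a minimal sketch that decodes them from a hex flags word. IFF_UP (0x1) and IFF_RUNNING (0x40) are the values from the kernel's if.h; the helper name decode_flags is made up for illustration, and reading the word from /sys/class/net/eth0/flags assumes a kernel that exposes sysfs, which nobody in this thread has checked.

```shell
# Decode the UP and RUNNING bits from an interface flags word.
# decode_flags is a hypothetical helper, not part of baselayout.
decode_flags() {
	flags=$(( $1 ))
	out=""
	# IFF_UP: the interface is administratively up.
	[ $(( flags & 0x1 )) -ne 0 ] && out="${out}UP "
	# IFF_RUNNING: the interface is operational (carrier present while up).
	[ $(( flags & 0x40 )) -ne 0 ] && out="${out}RUNNING "
	echo "${out% }"
}
```

For example, decode_flags 0x1002 (BROADCAST MULTICAST only, as in the "down" outputs in this bug) prints an empty line, while decode_flags 0x1043 prints both names.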

Regarding the ethtool information, the below is from a Broadcom Gigabit network
adapter (using the tg3 kernel driver):

[root@hades root]# ethtool eth0 && ifconfig eth0 down && ethtool eth0 && ifconfig eth0 up
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 10Mb/s
        Duplex: Half
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes
Settings for eth0:
Cannot get device settings: Resource temporarily unavailable
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: no


Running the same on another machine with a 3Com card gives similar results.


The current carrier logic needs a bit of work, IFF_RUNNING _is_ useful, but is
only reliable when the interface is up.  For example, if I run ``ifconfig eth0
up'' (but don't specify any network address) the RUNNING shows up as you would
expect - it's there if the cable is plugged in, not there if the cable is
unplugged.

So, the logic should be like this:

- run ifconfig eth0 up
- check ``ifconfig eth0'' for 'RUNNING'
- if not there, display warning, otherwise
- run dhcpcd (dhcpcd doesn't care if the interface is already up)
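
The steps above can be sketched roughly as follows. This is not the attached patch: start_dhcp_iface is a hypothetical name, and the IFCONFIG and DHCPCD variables are only there so the logic can be dry-run with stand-ins.

```shell
# Sketch of the sequence listed above, under the assumptions stated in the text.
IFCONFIG="${IFCONFIG:-/sbin/ifconfig}"
DHCPCD="${DHCPCD:-/sbin/dhcpcd}"

start_dhcp_iface() {
	iface="$1"
	# Bring the link up first; RUNNING is only meaningful on an up interface.
	"${IFCONFIG}" "${iface}" up || return 1
	# No RUNNING flag means no carrier: report it and skip dhcpcd.
	if ! "${IFCONFIG}" "${iface}" | grep -q 'RUNNING'; then
		echo "${iface} is not plugged in or has no carrier signal" >&2
		return 1
	fi
	# Carrier present: hand over to dhcpcd.
	"${DHCPCD}" "${iface}"
}
```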


I'm attaching (so as not to lose tabs, etc.) a /etc/init.d/net.eth0 patch that
fixes the carrier detection logic without removing the feature.  I quite like the feature, in fact, as it now means I can start up my notebook without an active connection and won't have to wait for dhcpcd to timeout.



There's also a minor bug in the baselayout ebuild - the ``chown root.uucp''
needs to change to root:uucp for updated POSIX compliance with newer glibc's,
along the same vein as the `head -1' => `head -n 1' changes that have been
needed all over the place recently.
Comment 26 Jason Rhinelander 2003-11-16 00:35:15 UTC
Created attachment 20807 [details, diff]
Patch to fix net.eth0 carrier detection
Comment 27 Tim Adelt 2003-11-16 04:19:53 UTC
Created attachment 20812 [details, diff]
patch against net.ethX

the bug is due to a typo in line 66

comparison won't work: there is a missing '='.
Comment 28 Jeld The Dark Elf 2003-11-16 04:58:43 UTC
Hi, I am using the 3c59x driver with my 3Com card.
Unless I run either dhcpcd or ifconfig eth0 up,
my eth0 doesn't show up in ifconfig at all, so I would say:
either disable the link check completely or do ifconfig up first.
Now, about the actual check procedure. When my interface is down, ifconfig
will not show it in the list at all, so the check fails because it thinks there is no link. ifconfig -a, however, does list the interface. The problem is that the word Link will be there and RUNNING will not.

Here is the output for a "down" interface:

ifconfig:
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:380 (380.0 b)  TX bytes:380 (380.0 b)

ifconfig -a

eth0      Link encap:Ethernet  HWaddr 00:60:08:A9:DD:67  
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:31494 errors:0 dropped:0 overruns:3 frame:0
          TX packets:302 errors:0 dropped:0 overruns:0 carrier:0
          collisions:5 txqueuelen:1000 
          RX bytes:2215009 (2.1 Mb)  TX bytes:38780 (37.8 Kb)
          Interrupt:11 Base address:0xd000 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:380 (380.0 b)  TX bytes:380 (380.0 b)

so, I had to swap status and carrier vars, and change ifconfig to ifconfig -a to get it to work.
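
To illustrate the difference, here is a minimal sketch that checks the flags of one interface in output of the form shown above. iface_has_flag is a hypothetical helper, and it assumes the old net-tools layout where the interface name starts in column 0 and the flags share a line with "MTU:".

```shell
# Reads `ifconfig -a`-style output on stdin and succeeds if the stanza for
# interface $1 lists flag $2 (for example UP or RUNNING).
iface_has_flag() {
	awk -v IFACE="$1" -v FLAG="$2" '
		/^[^ \t]/ { current = $1 }        # a stanza starts at column 0
		current == IFACE && /MTU:/ {      # the flags line also carries MTU:
			for (i = 1; i <= NF; i++) if ($i == FLAG) found = 1
		}
		END { exit (found ? 0 : 1) }
	'
}
```

Something like `ifconfig -a | iface_has_flag eth0 RUNNING` would then fail for the eth0 stanza above and succeed for lo.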

P.S. Not to mention the fact that the stupid dependency checking promptly brought down all the services that depended on net, but since the restart of the interface failed, it didn't bring any of them back up. After fixing the script and starting net.eth0 I had to manually start half a dozen services, since it wouldn't pick them up. Sucks.
Comment 29 Jason Rhinelander 2003-11-16 11:32:33 UTC
In reply to comment 27, = works just as well (and exactly the same) as == in a bash [ comparison ]:

$ if [ "foo" = "foo" ]; then echo "bar"; fi
bar
$ if [ "foo2" = "foo" ]; then echo "bar"; fi
$ if [ "foo" == "foo" ]; then echo "bar"; fi
bar
$ if [ "foo2" == "foo" ]; then echo "bar"; fi

In reply to comment 28, try applying my patch - it does an ifconfig up before checking for RUNNING.  Also, for next time, it might be useful to remember the ``rc'' command - without arguments, it starts any services needed for your current runlevel, and stops any that shouldn't be running.
Comment 30 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-16 12:05:48 UTC
Created attachment 20832 [details, diff]
net_eth0-fix-carrier-detection.patch

Jason has the answer, thanks!  Please try attached patch with a bit
more error checking.

As for the other 'fix' in comment #3:

--
Found a solution by replacing :

status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/Link/ { if ($1 == IFACE)
print "up" }')"

with

status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '"/Link/" { if ($1 ==
IFACE) print "up" }')"

--

I do not like the '"/Link/"' constructs in gawk; as far as I can tell they are not
correct (I could be wrong).  Unfortunately I did not check status_IFACE at
the time I added the patch (can't remember which bug), but the 'more correct'
way is to do "gawk '$0 ~ /Link/ ....".
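
A small sketch of the difference, using made-up sample input: in awk a quoted string used as a pattern is a string constant, and a non-empty string is always true, so '"/Link/"' no longer filters anything. That would explain why the change in comment #3 made the check pass: only the $1 == IFACE test remains.

```shell
# Three lines of input, only one of which contains "Link".
sample='eth0      Link encap:Ethernet
          BROADCAST MULTICAST  MTU:1500
          RX packets:0 errors:0'

# Regex pattern: the action runs only for lines matching /Link/.
regex_hits=$(printf '%s\n' "$sample" | awk '/Link/ { n++ } END { print n + 0 }')

# String constant as pattern: a non-empty string is always true,
# so the action runs for every line regardless of its content.
string_hits=$(printf '%s\n' "$sample" | awk '"/Link/" { n++ } END { print n + 0 }')

echo "regex: ${regex_hits}, string: ${string_hits}"
```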
Comment 31 Seemant Kulleen (RETIRED) gentoo-dev 2003-11-16 12:11:02 UTC
Az, that patch seems to work for me
Comment 32 Ernst Herzberg 2003-11-16 15:57:07 UTC
Hm, I'm not happy with the patches so far. My understanding is that the new
net.ethX tries to avoid the long delay when the network cable is not plugged
in and the interface wants an IP address via DHCP. That is a problem if you use
a laptop 'in fresh air' ;-)

But that is not real life. DHCP is designed to minimize admin work. In most
cases, finding no network connection is an error condition. That is true not
only on a server, but also when you plug the laptop in at a
customer site and the network connection is not working properly. What you
have to do in that case is bring up the interface manually and start
every dependent service by hand. Or, if you are a typical Windows user, restart
the machine.

dhcpcd has the option -t <timeout>, default 60 sec, and I think that can be
a better solution. Use a short timeout for laptops and a long one for servers.
This solution not only fixes this ugly carrier-detect problem, it also
covers the cases where a DHCP server is unavailable due to a reboot or
simple network overload/problems, or where there is no DHCP server in a working
network at all. It is also possible to fork a script or dhcpcd into the
background to bring up the interface once a connection/DHCP server becomes
available.

Pls forget the carrier detection.
Comment 33 Jason Rhinelander 2003-11-16 17:36:26 UTC
Alright, let's see if I can address this and save the devs some work:

> Hm, I'm not happy with the patches so far. My understanding is that the new
> net.ethX tries to avoid the long delay when the network cable is not plugged
> in and the interface wants an IP address via DHCP. That is a problem if you use
> a laptop 'in fresh air' ;-)
> 
> But that is not real life. DHCP is designed to minimize admin work. In most
> cases, finding no network connection is an error condition. That is true not
> only on a server, but also when you plug the laptop in at a
> customer site and the network connection is not working properly. What you
> have to do in that case is bring up the interface manually and start
> every dependent service by hand. Or, if you are a typical Windows user, restart
> the machine.

Yes, that is precisely the point.  If no cable is plugged in, it isn't possible
that a connection can be established - there's no cable to establish the
connection on.  The carrier detection aids in that it displays a message on
boot if no network cable is plugged in.  An error is going to be displayed, you
can't change that fact, but an error such as "No cable connected" is going to
help you fix the problem faster than a generic "Couldn't start eth0" error
message.

With regards to your point about manually starting the interface, well, yes
you will have to do this - but the carrier detect doesn't factor in here
because the only time it will show up (and bypass dhcpcd) is if there is no
cable connected.  Once you've figured out the problem, and have brought the
network connection up manually, it's simply a matter of running ``rc'' - you
certainly don't have to manually start each service.

> dhcpcd has the option -t <timeout>, default 60 sec, and I think that can be
> a better solution. Use a short timeout for laptops and a long one for servers.
> This solution not only fixes this ugly carrier-detect problem, it also
> covers the cases where a DHCP server is unavailable due to a reboot or
> simple network overload/problems, or where there is no DHCP server in a working
> network at all. It is also possible to fork a script or dhcpcd into the
> background to bring up the interface once a connection/DHCP server becomes
> available.

I've done that on my own laptop; I lowered it from 60 to 15.  I would have
lowered it to 10 or even 5, except those didn't work - I ended up with
occasional dhcpcd timeouts on my home connection.  It was still a nuisance,
because if I wanted to start up my notebook without a connection I'd have to
wait an additional 15 seconds for something that I _know_ is going to fail.
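
For reference, a sketch of how such a timeout could be set. It assumes that /etc/conf.d/net is where dhcpcd_eth0 lives and that the init script passes it to dhcpcd unchanged, as the dhcpcd_${iface} lookup in setup_env() and the dhcpcd call quoted in this bug suggest.

```shell
# /etc/conf.d/net (fragment; variable placement is an assumption)
# Extra arguments handed to dhcpcd for eth0; -t is the timeout in seconds.
dhcpcd_eth0="-t 15"
```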

I fail to see what is "ugly" about the detection (at least, after applying the
final patch in comment 30).  Forking it off into the background isn't a
realistic option - many, many services depend on the network interface being
up, and can't be started until the network interface is up.

Nothing in your objection actually has anything to do with the carrier
detection - it seems to be more of a rant about not liking Gentoo's init
scripts for bringing up the network interface.  If you have a well thought out
suggestion on how to improve it in such a way that it doesn't introduce other
problems, please, by all means, file a bug!  I may even help you with it, and
I'm sure several devs would come on board if it was a really promising idea.

But that isn't what THIS bug is about.  This bug is about using carrier
detection to immediately return an informative error rather than waiting for a
dhcpcd timeout that is _GUARANTEED_ to happen.  It's a nice shortcut, and I
expect many users will find it useful - I know I've run into situations in the
past where a "network cable unplugged"-type error would have saved me time.  If
there are situations where the carrier detection breaks (for example, some
exotic network card that doesn't properly set the kernel's network interface
carrier bit), then please report them - in such a case it may be necessary to
make the code optional.

> Pls forget the carrier detection.

If you have some legitimate objections to the carrier detection rather than a
rant about Gentoo's boot design, please add your comments and I'll attempt to
help get them resolved.  Otherwise, the carrier detection is a useful feature
that, once patched, should help users diagnose network startup issues without
adverse effect.






That reminds me of something else: Azarah, perhaps the carrier detection should
go into the non-DHCP network setup as well?  You could check RUNNING
immediately after running ifconfig up.  In such a case it should probably be a
warning rather than a failure since the machine _does_ have an IP and the other
services _can_ start, even if they won't be accessible.  But even just a
warning might be useful to many.
Comment 34 Ernst Herzberg 2003-11-16 21:08:52 UTC
OK, last post here; the discussion should be carried on elsewhere.
gentoo-dev?

> This bug is about using carrier detection to immediately return an
> informative error rather than waiting for a dhcpcd timeout that is 
> _GUARANTEED_ to happen. 

That is exactly what doesn't happen every time.

Imagine you have some servers running Gentoo far, far away.
At that location you have a power failure. You lose the machines if, in
the millisecond in which you ask for the carrier, the switch is unavailable for
whatever reason. Services like sshd won't start up even if the network becomes
available a second later. If you need this server from your laptop, you are lost.
Comment 35 Jason Rhinelander 2003-11-17 00:36:44 UTC
Hmm, you've got a point (re: comment 34).  I'm all in favour of making this an option in /etc/conf.d/net.  It _is_ a nice feature for users with laptops, and a useful debugging tool for desktop systems, but it is a potential problem with headless machines (i.e. servers), as mentioned, since the check is an instantaneous thing, while dhcpcd by default allows up to 60 seconds for the network connection to be established.  However, the approach I am suggesting is not "drop it, it's stupid, it can be bad sometimes so should never be used" but rather "do the carrier check by default, but provide an easy option to turn it off for those cases where it isn't wanted."
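
A rough sketch of what such an opt-out could look like. The variable name carrier_check_<iface> and the helper want_carrier_check are both invented here for illustration; nothing like them exists in baselayout as discussed in this bug.

```shell
# Hypothetical per-interface switch, e.g. in /etc/conf.d/net:
#   carrier_check_eth0="no"
# The check stays on unless the variable is explicitly set to "no".
want_carrier_check() {
	# Look up carrier_check_<iface>; assumes the interface name is a
	# valid shell identifier suffix (eth0, wlan0, ...).
	eval "setting=\${carrier_check_$1}"
	[ "${setting}" != "no" ]
}
```

The init script could then wrap the RUNNING test in `if want_carrier_check "${IFACE}"`, so headless servers can turn it off while laptops keep the default.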
Comment 36 Jason Rhinelander 2003-11-17 00:38:02 UTC
This is a repost of comment 35, prompted by Bugzilla's wrapping (or lack thereof).  I apologize.

Hmm, you've got a point (re: comment 34).  I'm all in favour of making this an
option in /etc/conf.d/net.  It _is_ a nice feature for users with laptops, and
a useful debugging tool for desktop systems, but it is a potential problem with
headless machines (i.e. servers), as mentioned, since the check is an
instantaneous thing, while dhcpcd by default allows up to 60 seconds for the
network connection to be established.  However, the approach I am suggesting is
not "drop it, it's stupid, it can be bad sometimes so should never be used" but
rather "do the carrier check by default, but provide an easy option to turn it
off for those cases where it isn't wanted."
Comment 37 Simon Watson 2003-11-17 01:27:37 UTC
Can we get back to the point please?

The simple problem that I am having is that carrier detection does not work on my machine. For one reason or another my network card does not appear to support it (see comment 24). This has caused problems for me, and also it seems for other people. It returned that there was no carrier - when there blatantly was! Surely if carrier detection is that unreliable - then there is no way we should allow it to be a default option?
Comment 38 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-17 12:08:40 UTC
Simon, did you even try Jason's original patch, or my final version ??
Comment 39 Imad R. Faiad 2003-11-17 13:44:09 UTC
Martin, many thanks, your latest patch fixed it :-)
Comment 40 Marc Bevand 2003-11-18 04:19:58 UTC
Created attachment 20905 [details, diff]
This workaround is needed because the behaviour of some ethernet drivers breaks the carrier autodection stuff in net.eth0

The new carrier autodetection stuff in net.eth0 has introduced a problem: after
an 'ifconfig up', some ethernet drivers (e.g. e100) do not immediately mark the
interface as 'RUNNING' (and, as Jason pointed out in comment #25, some
drivers never do).

For those lucky enough to have a driver that does set the IFF_RUNNING
bit (but only after some delay), I propose this patch.
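
The idea can be sketched as a short polling loop. This is not the code in the attachment: wait_for_carrier is a hypothetical helper, and the one-second interval and default of three tries are arbitrary choices.

```shell
# $1 = interface, $2 = maximum number of one-second tries (default 3).
# IFCONFIG may be overridden for a dry run.
wait_for_carrier() {
	tries="${2:-3}"
	while [ "${tries}" -gt 0 ]; do
		# Succeed as soon as the driver has marked the interface RUNNING.
		if "${IFCONFIG:-/sbin/ifconfig}" "$1" | grep -q 'RUNNING'; then
			return 0
		fi
		tries=$(( tries - 1 ))
		sleep 1
	done
	# Still no RUNNING flag after all tries: treat as no carrier.
	return 1
}
```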
Comment 41 SpanKY gentoo-dev 2003-11-18 09:00:38 UTC
*** Bug 33761 has been marked as a duplicate of this bug. ***
Comment 42 Jason Rhinelander 2003-11-18 10:53:48 UTC
I noticed the same thing as Marc (comment 40) yesterday on my broadcom (tg3)
card, but only when the system first comes up.  `ifconfig down; ifconfig up'
would show RUNNING immediately, but as Marc described, I also found a 2 second
timeout was needed when the system first booted for my tg3 adapter.
Comment 43 Scott Taylor (RETIRED) gentoo-dev 2003-11-18 12:00:18 UTC
Even comment #3 got things running on my set of mostly-e100 machines.
Checking for a carrier does sound like a decent idea. I might even suggest firing
up dhcp but *not waiting* to see if it succeeded if there was no carrier, but
watching for success if there already was a carrier. So leaving it on for laptops
and such would not slow down booting but allow them to hopefully come to life as
soon as they are plugged in. And get around the whole problem of cards not being
properly recognized by this logic.

In any case, what are the chances of getting something - anything - applied to
baselayout soon? I had to go patch this file on one of my machines here that I'd
forgotten to fix... I was hoping it would've solved itself by now.
Comment 44 Robert Thorneycroft 2003-11-18 18:13:59 UTC
OK, I could be wrong on this, but I believe that looking for Link in the script is incorrect. The only time Link will be absent is when the interface is not present at all, and that is not what the script appears to be looking for; it is just trying to check whether the interface is up.

I have therefore made the following modifications to my script, which appear to work properly in all situations apart from a complete card failure (if we want to check for that, it will require some additional code).

Changes to net.eth0 script are as follows:

<status_IFACE="$(ifconfig ${iface} | gawk '/UP/ { print "up" }')"
<carrier_IFACE="$(ifconfig ${iface} | gawk '/RUNNING/ { print "running" }')"
---
>status_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/Link/ { if ($1 == IFACE) print "up" }')"
>carrier_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/RUNNING/ { if ($1 == IFACE) print "running" }')"

<if [ "${carrier_IFACE}" != "running" ]
---
>if [ "${carrier_IFACE}" = "running" ]

Kind regards,

Robert Thorneycroft
Comment 45 Simon Watson 2003-11-18 23:39:02 UTC
Re #38 - Sorry, yes I tried the final patch and it does seem better.
Comment 46 Andreas Vinsander 2003-11-19 01:03:22 UTC
Created attachment 20935 [details, diff]
A little revised patch that works for me (none of the others did)

This is a revised version of Robert's script (comment 44).
It works for me (none of the others did), please give it a try!

/Andreas
(I use the eepro100 driver; that may be useful to know when deciding
what works and what does not.)
Comment 47 Robert Thorneycroft 2003-11-19 11:26:18 UTC
Re #46  and latest patch.

Some parts of this patch seem to be unnecessary with the new entries suggested 
in my earlier post.

Specifically the following sections make the code more difficult to read and do 
not perform anything that was not covered before:

-	carrier_IFACE="$(ifconfig | gawk -v IFACE="${iface}" '/RUNNING/ { if ($1 == IFACE) print "running" }')"

and

 			# Check that the interface has a carrier
-			if [ "${carrier_IFACE}" = "running" ]
+			/sbin/ifconfig ${IFACE} |grep -q 'RUNNING' >/dev/null
+			if [ $? -eq 0 ]

If the line:
carrier_IFACE="$(ifconfig ${iface} | gawk '/RUNNING/ { print "running" }')"
is used instead of removing the carrier_IFACE definition from setup_env(), then
there is no need to run ifconfig commands and check return codes in the middle 
of iface_start(). Doing so only makes the code more complex and difficult to 
understand.

It should still be noted that the following modification also needs to be made:

- if [ "${carrier_IFACE}" = "running" ]
+ if [ "${carrier_IFACE}" != "running" ]

This is because if the status is already running then dhcpcd does not need to be
started. In your example you appear to be starting dhcpcd only if the 
interface is already up and running?

I apologise if I did not understand your code correctly.

Kind regards,

Robert Thorneycroft
Comment 48 Ernst Herzberg 2003-11-19 19:30:49 UTC
Some thoughts:

A dhcp interface is like a pcmcia card. If it is not plugged in at boot time, it
is a bad idea to wait for a timeout in case somebody plugs it in. But you are able
to plug in this card at any time. And all services will immediately listen
on this interface too, magically, if not otherwise configured.

You can load a module for a network card at any time, even if it is not
connected to a network. If it is loaded successfully, you can see it in
/proc/net/dev. A 'dhcpcd -t 99999 &' will wait a long time to bring that interface UP ;-) Carrier detection is not really necessary, but it will speed
things up. Especially when it is built into dhcpcd.

Some services should start even if no (remote) network is available. sshd is a 
good example, from a server-side view. And sometimes you can abuse a laptop
as a server. Or think about PCI hotplug.
Comment 49 SpanKY gentoo-dev 2003-11-20 14:34:01 UTC
*** Bug 33961 has been marked as a duplicate of this bug. ***
Comment 50 Martin Holzer (RETIRED) gentoo-dev 2003-11-20 15:13:42 UTC
*** Bug 33970 has been marked as a duplicate of this bug. ***
Comment 51 Martin Holzer (RETIRED) gentoo-dev 2003-11-20 15:13:50 UTC
*** Bug 33969 has been marked as a duplicate of this bug. ***
Comment 52 Andreas Vinsander 2003-11-21 07:22:02 UTC
Re: comment 47
The problem is, as Jason stated in comment 25, that the 'RUNNING' info is not shown if the device isn't up (at least for my device). That's why we/I need to do the check later on.

I also added an attempt to take the device down when no carrier is found, which makes it behave better if I restart the net.eth0 script later; otherwise I end up with a message that everything is OK when it isn't.

More comments? Has anybody else tried my little patch? Results?
Comment 53 Paul Taylor 2003-11-22 14:37:01 UTC
Can baselayout 1.8.6.12 be pulled from the portage tree until these issues 
are thrashed out?
There's a good chance that anyone who installs 1.8.6.12 will break their 
network connection (as I did) - that well and truly cancels any potential 
gains it may have over 1.8.6.11 (and which works fine for me.)
Comment 54 Siegbert Baude 2003-11-23 09:13:46 UTC
Just to add some data: net.eth0 failed on my system, too.
NIC: SMSC EPIC/100 83c170

Martin's patch from comment 30 solved the problem.

ethtool's output is independent of ifconfig up|down status:
With cable plugged in:
bash-2.05b# ethtool eth0|grep Link
        Link detected: yes

With cable unplugged:
bash-2.05b# ethtool eth0|grep Link
        Link detected: no

But all ifconfig output is independent of the cable plugging, so there is no way to detect carrier with this NIC simply by looking at the ifconfig output. You would need the routines of ethtool to achieve this.

If the interface is down:
bash-2.05b# ifconfig|grep eth0
bash-2.05b# ifconfig eth0
eth0      Protokoll:Ethernet  Hardware Adresse 00:E0:29:28:0C:A1
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:314937 errors:0 dropped:0 overruns:0 frame:3
          TX packets:59428 errors:0 dropped:0 overruns:0 carrier:0
          Kollisionen:0 Sendewarteschlangenlänge:100
          RX bytes:86662192 (82.6 Mb)  TX bytes:3285136 (3.1 Mb)
          Interrupt:10 Basisadresse:0xd400

If the interface is up:
bash-2.05b# ifconfig
eth0      Protokoll:Ethernet  Hardware Adresse 00:E0:29:28:0C:A1
          inet Adresse:134.60.106.64  Bcast:134.60.106.127  Maske:255.255.255.128
          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:315476 errors:0 dropped:0 overruns:0 frame:3
          TX packets:59438 errors:0 dropped:0 overruns:0 carrier:0
          Kollisionen:0 Sendewarteschlangenlänge:100
          RX bytes:86701793 (82.6 Mb)  TX bytes:3286730 (3.1 Mb)
          Interrupt:10 Basisadresse:0xd400
Comment 56 Paul Slinski 2003-11-24 07:32:32 UTC
I hate to sound harsh, but this needs to be pulled and rolled back until a better solution is implemented. It took down 9 of 20 servers here and 4 workstations. I can deal with most issues; however, this is a bigger problem than a buggy application or ebuild. Please roll it back.
Comment 57 Matt Eaton 2003-11-24 07:45:30 UTC
As per comment #54->
This works great for me (2.6 8139too driver).

(replacing just the carrier_IFACE= line):

carrier_IFACE="$(ethtool ${iface} | gawk '/Link detected: yes/ { print "running" }')"

Though I'm not sure it would work for all network drivers:

From ethtool manpage:
BUGS
       Not supported (in part or whole) on all ethernet drivers.
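
Because ethtool is not supported by every driver, a check built on it probably wants to tell "no link" apart from "the driver gave no answer". A minimal sketch, assuming the "Link detected:" output format quoted in this bug; link_state is a hypothetical helper name, not something baselayout provides, and plain awk stands in for gawk:

```shell
# Hypothetical helper (not part of baselayout): classify ethtool output
# read from stdin as "yes", "no" or "unknown".
link_state() {
    awk '
        /Link detected: yes/ { state = "yes" }
        /Link detected: no/  { state = "no" }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Assumed usage inside net.eth0, with ${iface} as in the existing script:
#   case "$(LC_ALL=C ethtool "${iface}" 2>/dev/null | link_state)" in
#       yes)     carrier_IFACE="running" ;;
#       no)      carrier_IFACE="" ;;
#       unknown) carrier_IFACE="running" ;;  # driver cannot tell: do not block the interface
#   esac
```

Treating "unknown" as "running" is only one possible policy; it keeps an unsupported driver from failing the way this bug describes.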
Comment 58 Christian Axelsson 2003-11-25 04:32:37 UTC
How about giving up this current net.X and make some good ifplugd replacements?
Comment 59 Martin Schlemmer (RETIRED) gentoo-dev 2003-11-26 11:49:37 UTC
*** Bug 34370 has been marked as a duplicate of this bug. ***
Comment 60 Darryl Bleau 2003-11-28 12:14:27 UTC
Just thought I'd add a 'me-too', v1.32 of net.eth0 broke all our network cards in the office on every gentoo box.

Going back to the 1.31 version (or, I would suspect, 1.34 currently in cvs) will fix the issue. Perhaps this should be made live asap if not done already?
Comment 61 didier Belot 2003-11-29 01:57:18 UTC
Created attachment 21458 [details, diff]
$(LC_MESSAGES=C ifconfig ...)

On a localized system, you can't rely on messages displayed by the software,
unless you reset LC_MESSAGES when running it.

On my French Gentoo, ifconfig says Lien instead of Link! ;-)

Hope this helps.
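
A minimal sketch of the effect, assuming net-tools style output. iface_listed is a hypothetical helper wrapping the gawk test from comment 44 (plain awk here), and the two sample lines are illustrative rather than captured from a real system:

```shell
# Hypothetical helper: prints "up" if ifconfig output on stdin has a
# "Link" line whose first field is the interface named in $1.
iface_listed() {
    awk -v IFACE="$1" '/Link/ { if ($1 == IFACE) print "up" }'
}

# Illustrative first lines of ifconfig output in the C and French locales.
c_line='eth0      Link encap:Ethernet  HWaddr 00:E0:29:28:0C:A1'
fr_line='eth0      Lien encap:Ethernet  HWaddr 00:E0:29:28:0C:A1'

printf '%s\n' "$c_line"  | iface_listed eth0   # prints: up
printf '%s\n' "$fr_line" | iface_listed eth0   # prints nothing

# Assumed fix inside net.eth0: force the C locale for this one call.
#   status_IFACE="$(LC_ALL=C ifconfig | iface_listed "${iface}")"
```

LC_ALL=C is shown instead of LC_MESSAGES=C only because a set LC_ALL takes precedence over LC_MESSAGES; on a system where LC_ALL is unset, either form should work.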
Comment 62 Dan A. Dickey 2003-12-01 06:23:36 UTC
I thought I'd add my two cents as well... since everyone else seems to be doing so.

I'm in favor of not checking for a link state in net.eth0.

I was unaware of ethtool.  I've been using mii-tool for years.
It seems, at least for me, that mii-tool works a bit better for the cards
that support it.  I can ifconfig eth0 down, and ethtool will not show a link.
mii-tool, however, will properly show what is going on at the "plug" level.
In my case, after a slight pause for autonegotiation to occur, mii-tool shows
that there is a link even though eth0 is down.  ethtool still shows no link.
I suspect that when one uses ifconfig to up or down an interface, it resets
the connection and auto-negotiation needs to occur before settling to link on.
I've done some network driver programming myself, and have been using mii-tool
for a while.  Ethtool is new to me.  It must operate on a somewhat higher level,
perhaps that of ethX (hence the name!).  mii-tool talks more or less directly
to the driver, which in turn takes a look at the MII registers for the card.
Not all cards support MII (a card would have to be pretty old for this to be
the case).  I don't think I even have one like that... I've got about 20 old NICs
in a box in my basement (various brands) and I can't remember ONE that does not
support MII.

Anyways, that's my two cents.
Comment 63 Imad R. Faiad 2003-12-10 08:58:10 UTC
None of the fixes provided thus far works on all NICs.
One fix, without downgrading baselayout, is to grab the net.eth0 script
from rc-scripts-1.4.3.11p2.tar.bz2 and use it until this problem is solved.
Comment 64 Dan A. Dickey 2004-02-23 05:11:20 UTC
Wow, this bug is still "new".  :)

After checking out the latest addition, I'd like to amend my comment above.

It has since become apparent to me that mii-tool is "old" and on the way out,
while ethtool is "new" and on the way in.
Any use of mii-tool should be migrated to ethtool.
Any planned use of checking for link state in net.ethX scripts should
use ethtool.
Comment 65 Jason Rhinelander 2004-02-23 11:47:05 UTC
Re: comment 63, ethtool won't solve this problem.  In particular, see 'man ethtool':

BUGS
        Not supported (in part or whole) on all ethernet devices

Furthermore, try this:
# mii-tool eth0
# ethtool eth0
# ifconfig eth0 down
# mii-tool eth0
# ethtool eth0




In reply to this bug in general, here are the outputs of mii-tool and ethtool given four different network cards I have quick access to, which are all fairly common:

On my 'tg3' (Broadcom gigabit) on my Dell notebook, I get:

# mii-tool # Same whether up or down:
eth0: negotiated 100baseTx-FD flow-control, link ok
# ethtool eth0 # when eth0 down:
Settings for eth0:
Cannot get device settings: Resource temporarily unavailable
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: no
# ethtool eth0 # when eth0 up:
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes




Now, switching over to my desktop system, which uses 'forcedeth' (the reverse-engineered nForce Ethernet driver):

# mii-tool eth0 # same whether up or down:
SIOCGMIIPHY on 'eth0' failed: Operation not supported
# ethtool eth0 # same whether up or down:
Settings for eth0:
No data available



Here's a via-rhine:
# mii-tool eth0 # eth0 up:
eth0: negotiated 100baseTx-FD flow-control, link ok
# mii-tool eth0 # eth0 down:
SIOCGMIIPHY on 'eth0' failed: Operation not supported
# ethtool eth0 # same output whether up or down:
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Current message level: 0x00000001 (1)
        Link detected: yes


And finally, here's a 3c59x:
# mii-tool eth0 # eth0 up:
eth0: negotiated 100baseTx-FD, link ok
# mii-tool eth0 # eth0 down:
  No MII transceiver present!.
# ethtool eth0 # Whether up or down:
Settings for eth0:
No data available



So, I've given 4 network cards, which are all reasonably common (I've got a Realtek around here somewhere, but it's not as easily accessible as it isn't currently in a machine), none of which work in both mii-tool and ethtool.  It doesn't appear that either mii-tool or ethtool can get the job done, so depending on either would appear to be out of the question.

In the interests of getting this bug fixed, how about adding some new configuration options to /etc/conf.d/net - perhaps along the lines of:

# To enable link-detection (does not work on all network cards):
#checklink_eth0="mii-tool"
#checklink_eth1="ethtool"
#checklink_eth2="ifconfig"

That would give the capability to people with cards that support it, and not force a significant problem onto those with cards that don't support it.

Furthermore, as has already been mentioned about 50 comments ago, not bringing up a network interface because the cable is unplugged is often a Bad Thing - if a cable is unplugged, to get the system back on the network all I should have to do is plug the cable back in.  The above-mentioned configuration option would give people that ability simply by leaving it commented out.
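
A rough sketch of how such an option might be consumed, purely as an illustration: check_link and the way it reads checklink_<iface> are hypothetical and not existing baselayout code, and the strings being matched are taken from the tool output quoted above.

```shell
# Hypothetical dispatcher for the proposed checklink_<iface> option.
# Returns 0 when the interface may be brought up, non-zero when the
# selected tool reports no link. The interface name must be usable as
# part of a shell variable name (eth0 is fine, eth0:1 is not).
check_link() {
    _cl_iface="$1"
    eval "_cl_method=\"\${checklink_${_cl_iface}:-}\""
    case "${_cl_method}" in
        ethtool)
            LC_ALL=C ethtool "${_cl_iface}" 2>/dev/null | grep -q 'Link detected: yes' ;;
        mii-tool)
            LC_ALL=C mii-tool "${_cl_iface}" 2>/dev/null | grep -q 'link ok' ;;
        ifconfig)
            LC_ALL=C ifconfig "${_cl_iface}" 2>/dev/null | grep -q 'RUNNING' ;;
        *)
            # Option unset or unrecognised: skip link detection entirely.
            return 0 ;;
    esac
}

# Assumed usage before starting dhcpcd:
#   check_link "${IFACE}" || { eend 1 "${IFACE} has no carrier signal"; return 1; }
```

With the option left commented out, the function always succeeds, so nothing changes for cards that neither tool supports.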
Comment 66 Aron Griffis (RETIRED) gentoo-dev 2004-05-05 19:52:57 UTC
As Jason mentioned in the previous comment, mii-tool and ethtool support are spotty in the drivers.  mii-tool only works up to 100 Mbit adapters (though tg3 seems to support it for link detection).  I have two 10 Gbit adapters that only support ethtool.

Since this bug was opened, net.eth0 was completely rewritten.  It now has preup() and predown() functions which can be defined in /etc/conf.d/net.  The functions are called with the interface as the first parameter, for example you can do:

preup() {
  if [[ $1 == eth0 ]]; then
    # hey, I *know* this card supports ethtool
    ethtool "$1" | grep -q 'Link detected: yes'
    return $?
  fi
  return 0
}

net.eth0 will abort if the return value from preup() is non-zero.  Yes, this requires more work on your part, but it means you get to choose unequivocally what kind of link detection you want to do.  That's what Gentoo is about, right?  Choice!  :-)

I don't foresee using mii-tool or ethtool or any other kind of link detection directly in net.* any time in the future, so I'm closing this bug at this point FIXED since you have preup() available.
Comment 67 Aron Griffis (RETIRED) gentoo-dev 2004-05-07 15:39:25 UTC
*** Bug 25480 has been marked as a duplicate of this bug. ***
Comment 68 Ciaran McCreesh 2004-07-27 14:33:23 UTC
*** Bug 58586 has been marked as a duplicate of this bug. ***