Bug 359671 - r8169 stopped working in gentoo-sources-2.6.38
Summary: r8169 stopped working in gentoo-sources-2.6.38
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: Normal major
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://bugzilla.kernel.org/show_bug....
Whiteboard: linux-3.0
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-20 21:49 UTC by Sannin
Modified: 2011-08-19 14:02 UTC (History)
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
dmesg (dmesg,41.62 KB, text/plain)
2011-03-20 21:50 UTC, Sannin
.config (config,64.74 KB, text/plain)
2011-03-20 21:50 UTC, Sannin
Bisect log (bisect.log,2.81 KB, text/plain)
2011-03-29 13:13 UTC, Sannin

Description Sannin 2011-03-20 21:49:40 UTC
I recently updated the kernel to 2.6.38. I had many problems (mostly ACPI-related), but the reason I am opening this bug is that my NIC stopped working. With the 2.6.37-r1 kernel it works just fine. r8169 was built in, but I get the same result when it is built as a module. Some more info:

dmesg:

r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

ethtool: 

Settings for eth0:
	Supported ports: [ TP MII ]
	Supported link modes:   10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Supports auto-negotiation: Yes
	Advertised link modes:  10baseT/Half 10baseT/Full 
	                        100baseT/Half 100baseT/Full 
	                        1000baseT/Half 1000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Speed: 10Mb/s
	Duplex: Half
	Port: MII
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: pumbg
	Wake-on: g
	Current message level: 0x00000033 (51)
			       drv probe ifdown ifup
	Link detected: no

Reproducible: Always
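
The fields that matter most in the ethtool output above are Speed, Duplex and Link detected. While Link detected is "no", the Speed and Duplex values are typically not meaningful. Below is a minimal sketch for pulling just those three fields out of saved `ethtool` output. The function name `link_summary` is made up for illustration, and it only reads text from standard input:

```shell
# link_summary: hypothetical helper, not part of ethtool.
# Reads `ethtool <iface>` output on stdin and prints the three
# fields as "speed=... duplex=... link=...".
link_summary() {
    awk -F': *' '
        # Strip leading whitespace from the field name before matching.
        { sub(/^[ \t]+/, "", $1) }
        $1 == "Speed"         { speed = $2 }
        $1 == "Duplex"        { duplex = $2 }
        $1 == "Link detected" { link = $2 }
        END { printf "speed=%s duplex=%s link=%s\n", speed, duplex, link }
    '
}
```

Typical use would be `ethtool eth0 | link_summary`, run once on the working kernel and once on the broken one so the two results can be compared side by side.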
Comment 1 Sannin 2011-03-20 21:50:17 UTC
Created attachment 266647
dmesg
Comment 2 Sannin 2011-03-20 21:50:37 UTC
Created attachment 266649
.config
Comment 3 Sannin 2011-03-29 13:12:39 UTC
I did a git bisect to track down the problem. It took me several tries to do it correctly, but never mind that :P. It seems that the guilty commit is this:

eee3a96c6368f47df8df5bd4ed1843600652b337 is the first bad commit
commit eee3a96c6368f47df8df5bd4ed1843600652b337
Author: françois romieu <romieu@fr.zoreil.com>
Date:   Sat Jan 8 02:17:26 2011 +0000

    r8169: delay phy init until device opens.
    
    It workarounds the 60s firmware load failure timeout for the
    non-modular case.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>


There is one difference between my current gentoo-sources-2.6.37-r1 kernel and the working git kernels I built. With the former I never got any message that eth0 is down, but with the working git kernels I get this:

r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
Adding 1060284k swap on /dev/sda2.  Priority:-1 extents:1 across:1060284k 
r8169 0000:02:00.0: eth0: link up
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

Link is down at first, and it becomes ready some seconds later. If you need any more information from me, please tell me what to do.
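
The difference between a working and a broken boot comes down to whether a `link up` line ever follows the initial `link down` lines. Below is a rough sketch for checking that from a saved log. `last_link_state` is a hypothetical name, and the match assumes the r8169 message format shown in this report:

```shell
# last_link_state IFACE: hypothetical helper.
# Reads kernel log lines on stdin and prints the most recent link
# state reported for IFACE ("up", "down", or "unknown" if none seen).
last_link_state() {
    iface=$1
    awk -v tag="$iface: link " '
        {
            pos = index($0, tag)
            if (pos > 0) {
                rest = substr($0, pos + length(tag))
                # Only accept the two states this report shows;
                # ADDRCONF lines ("link is not ready") are ignored.
                if (rest ~ /^up/)   state = "up"
                if (rest ~ /^down/) state = "down"
            }
        }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

For example, `dmesg | last_link_state eth0` a minute or so after boot should show whether the link ever came up.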
Comment 4 Sannin 2011-03-29 13:13:46 UTC
Created attachment 267691
Bisect log
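
For anyone repeating the bisect, the usual manual loop looks roughly like the sketch below. The tags `v2.6.37` and `v2.6.38` are assumed endpoints for a mainline tree (the report does not say which refs were used), and both function names are invented for this example. Each step needs a kernel build and a reboot, so nothing is automated beyond starting the session and reading the result:

```shell
# start_nic_bisect [GOOD] [BAD]: hypothetical wrapper, to be run
# inside a kernel git tree. The default tags are assumptions.
start_nic_bisect() {
    good=${1:-v2.6.37}
    bad=${2:-v2.6.38}
    git bisect start "$bad" "$good"
    # Then, for each commit git checks out:
    #   build and boot it, test the NIC, and run
    #   `git bisect good` or `git bisect bad`.
    # When done: `git bisect log > bisect.log; git bisect reset`.
}

# first_bad_commit: hypothetical helper.
# Reads git bisect output on stdin and prints the hash from the
# "<sha1> is the first bad commit" line, if present.
first_bad_commit() {
    sed -n 's/^\([0-9a-f]\{40\}\) is the first bad commit$/\1/p'
}
```

Saving the output of the final `git bisect good` or `git bisect bad` and piping it through `first_bad_commit` gives a hash that can be pasted straight into an upstream report.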
Comment 5 Arkadiusz Miskiewicz 2011-03-31 13:51:40 UTC
Please also report this at https://bugzilla.kernel.org/, because it's not a Gentoo-specific problem.

Here it works fine on a 100Mbps switch but fails on a 1Gbps switch.
Comment 6 Stratos Psomadakis (RETIRED) gentoo-dev 2011-04-21 11:20:58 UTC
(In reply to comment #3)
> There is one difference with my current gentoo-sources-2.6.37-r1 kernel and the
> working git kernels i built. In the first case i never had any message that
> eth0 is down, but in the working git kernels i get this:
> 
> r8169 0000:02:00.0: eth0: link down
> r8169 0000:02:00.0: eth0: link down
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> Adding 1060284k swap on /dev/sda2.  Priority:-1 extents:1 across:1060284k 
> r8169 0000:02:00.0: eth0: link up
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> 
> Link is down at first, and it becomes ready some seconds later. If you need any
> more information from me, please tell me what to do.
Is this the dmesg output with the latest git kernels?
The log is the same as in a bug 'reported' on the Gentoo Forums [1], which was solved by upgrading to a newer version of dhcpcd.
It's also different from the original dmesg log you posted with a 2.6.38 kernel.

[1] http://forums.gentoo.org/viewtopic-p-6648853.html
Comment 7 Sannin 2011-04-22 20:01:47 UTC
(In reply to comment #6)

> This is the dmesg output with the latest git kernels?
> The log is the same with a bug 'reported' at Gentoo Forums [1], which was
> solved by upgrading to a newer version of dhcpcd.
> And it's different from the original dmesg log you posted, with a 2.6.38
> kernel.
> 
> [1] http://forums.gentoo.org/viewtopic-p-6648853.html


That dmesg output was from a working git kernel somewhere between 2.6.37 and 2.6.38. The NIC stopped working after the commit I reported in my previous post, and since then I get this dmesg output:

r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

I am not using DHCP in my configuration, so I guess it's a different bug.
Comment 8 Leho Kraav (:macmaN @lkraav) 2011-05-05 07:48:46 UTC
it looks like i'm getting complete system hangs with 2.6.38-r3 r8169 when doing high-activity network stuff over a 1Gbps link.

i'm seeing this in syslog during a transfer; after a little while the machine locks up:

May  5 10:42:57 kernel: [  507.990569] net_ratelimit: 6 callbacks suppressed
May  5 10:42:57 kernel: [  507.990575] r8169 0000:04:00.0: eth0: link up
May  5 10:42:58 kernel: [  509.009880] r8169 0000:04:00.0: eth0: link up
May  5 10:42:58 kernel: [  509.083173] r8169 0000:04:00.0: eth0: link up
May  5 10:42:59 kernel: [  509.212986] r8169 0000:04:00.0: eth0: link up
May  5 10:42:59 kernel: [  509.263041] r8169 0000:04:00.0: eth0: link up
...
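
Repeated `link up` messages within a second or two, as in the log above, suggest the link is flapping rather than coming up once. Below is a quick way to quantify that from a saved syslog excerpt, as a sketch only. `count_link_up` is a hypothetical name and the pattern assumes the message format shown here:

```shell
# count_link_up IFACE: hypothetical helper.
# Reads log lines on stdin and prints how many "IFACE: link up"
# messages they contain. Messages dropped by net_ratelimit are
# never logged, so treat the result as a lower bound.
count_link_up() {
    grep -c -F "$1: link up"
}
```

Comparing the count over the same time window on a good and a bad kernel would show whether the flapping is new in 2.6.38.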
Comment 9 Leho Kraav (:macmaN @lkraav) 2011-05-05 07:51:41 UTC
(In reply to comment #8)
> it looks like i'm getting completing system hangs with 2.6.38-r3 r8169 when
> doing high activity network stuff over 1Gbps link.

forgot to mention, 2.6.38-tuxonice-r4 used to be rock solid on the same hardware.
Comment 10 Leho Kraav (:macmaN @lkraav) 2011-05-05 09:04:23 UTC
for me, it could be this: https://bugzilla.kernel.org/show_bug.cgi?id=32962
Comment 11 Leho Kraav (:macmaN @lkraav) 2011-05-05 13:11:55 UTC
i've installed realtek's out-of-tree r8168 driver and have a working nic again for now. more details at https://bugzilla.kernel.org/show_bug.cgi?id=32962#c22
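
When switching between the in-kernel r8169 and Realtek's r8168, it helps to confirm which driver actually claimed the interface. One way is to follow the `driver` symlink under sysfs. This is a minimal sketch: `bound_driver` is a hypothetical name, and the optional second argument (a sysfs root) exists only so the function can be exercised without real hardware:

```shell
# bound_driver IFACE [SYSFS_ROOT]: hypothetical helper.
# Prints the name of the kernel driver bound to IFACE by resolving
# SYSFS_ROOT/class/net/IFACE/device/driver (SYSFS_ROOT defaults to
# /sys). Returns non-zero if the symlink is missing.
bound_driver() {
    iface=$1
    root=${2:-/sys}
    link=$root/class/net/$iface/device/driver
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

`ethtool -i eth0` reports the same information on its `driver:` line. If both modules are installed, r8169 may still bind first, so checking after each reboot is worthwhile.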
Comment 12 Stratos Psomadakis (RETIRED) gentoo-dev 2011-05-06 12:11:07 UTC
Can anyone try davem's net-next tree? [1] 
According to upstream bugs [2][3], it could fix your problem (support for new chips). If it works, maybe we can try to port the changes from the net-next tree to 2.6.39 (when it gets released). Backporting to 2.6.38 would probably be overkill.

[1] http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=summary
[2] https://bugzilla.kernel.org/show_bug.cgi?id=32962
[3] https://bugzilla.kernel.org/show_bug.cgi?id=34172
Comment 13 Sannin 2011-05-07 01:08:15 UTC
It doesn't fix my problem. I still get this:

r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

It may work for the other guys, because the bug they are facing seems different from mine.
Comment 14 Leho Kraav (:macmaN @lkraav) 2011-05-07 07:26:08 UTC
@Sannin: i have now done a couple of hundred GBs backup runs and all my normal work stuff with the r8168 out of tree driver. no errors anywhere, performance has been as expected. other than a bit of a maintenance headache i'm not sure i see a reason not to use this driver. although nowhere in your logs is it visible what network card model you actually have.
Comment 15 Sannin 2011-05-12 18:38:02 UTC
I think I am hitting a wall here... Not even the driver from Realtek works for me:

~ $ dmesg | grep -i eth0
eth0: Identified chip type is 'RTL8168C/8111C'.
r8168: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

I thought it might be something in my .config, but I tried a new .config from kernel-seeds.org and it didn't work either.
Comment 16 Mike Pagano gentoo-dev 2011-06-15 20:26:03 UTC
Sannin, can I see the output of lspci -vv?

Can you try to turn Auto-negotiation off with something like:

/usr/sbin/ethtool -s eth0 autoneg off
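
With autonegotiation off, the driver generally needs to be told which speed and duplex to use, so the command is usually given with explicit values. Since comment 5 reports that a 100Mbps switch works, forcing 100Mb/s full duplex is one thing to try. Below is a sketch that builds such a command. `force_link` and the `DRY_RUN` variable are made up for illustration, and whether a given r8169 chip accepts forced settings is not something this report establishes:

```shell
# force_link IFACE SPEED DUPLEX: hypothetical wrapper around ethtool.
# SPEED is in Mb/s (e.g. 10 or 100; gigabit over copper normally
# relies on autonegotiation). DUPLEX is "half" or "full".
# With DRY_RUN=1 the command is printed instead of executed.
force_link() {
    iface=$1 speed=$2 duplex=$3
    case $duplex in
        half|full) ;;
        *) echo "duplex must be half or full" >&2; return 2 ;;
    esac
    set -- ethtool -s "$iface" speed "$speed" duplex "$duplex" autoneg off
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"    # needs root and a driver that supports forced modes
    fi
}
```

Running `DRY_RUN=1 force_link eth0 100 full` first shows the exact command before anything is changed. `ethtool -s eth0 autoneg on` restores the default afterwards.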
Comment 17 Sannin 2011-07-22 18:10:47 UTC
Sorry for taking forever to post a reply, I had some health issues.

I found out by coincidence where the problem was. It seems that this NIC does not work well with my Zyxel 660HW after kernel 2.6.37. So the bug appears under these conditions:

- RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
- Zyxel 660HW
- kernel >=2.6.38

The funny thing is that I discovered the cause when I connected a Windows 7 PC with the same NIC to my router and it didn't work correctly either.
I bought another router, so case closed for me, but if you need any other info please tell me.

Thank you all for your efforts.
Comment 18 Leho Kraav (:macmaN @lkraav) 2011-08-19 14:02:19 UTC
i should note here that my network controller is still incompatible with r8169 in 3.0.2. r8168 requires regex changes in Makefile, otherwise seems to work fine with 3+.