I updated the kernel to 2.6.38 recently. I had many problems (mostly ACPI related), but the reason I am opening this bug is that my NIC stopped working. With the 2.6.37-r1 kernel it works just fine. r8169 was built-in, but I get the same results with it built as a module.

Some more info:

dmesg:
r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

ethtool:
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 10Mb/s
        Duplex: Half
        Port: MII
        PHYAD: 0
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: pumbg
        Wake-on: g
        Current message level: 0x00000033 (51)
                               drv probe ifdown ifup
        Link detected: no

Reproducible: Always
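For anyone comparing a working and a broken kernel, the fields of the ethtool output above that matter are Speed, Duplex, Auto-negotiation and Link detected. A minimal sketch that pulls them out is below. The helper name link_summary is hypothetical, and it assumes the "Field: value" layout shown in this report.

```shell
# Summarize the ethtool fields relevant to this report.
# Reads `ethtool <iface>` output on stdin; link_summary is a hypothetical name.
link_summary() {
    awk -F': *' '
        # Anchored at the start of the line, so the "Supports" and
        # "Advertised" auto-negotiation lines are not matched.
        /^[ \t]*Speed:/            { speed = $2 }
        /^[ \t]*Duplex:/           { duplex = $2 }
        /^[ \t]*Auto-negotiation:/ { autoneg = $2 }
        /^[ \t]*Link detected:/    { link = $2 }
        END {
            printf "speed=%s duplex=%s autoneg=%s link=%s\n",
                   speed, duplex, autoneg, link
        }
    '
}
```

Typical use would be `ethtool eth0 | link_summary`, run once on each kernel so the two summaries can be compared side by side.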
Created attachment 266647 dmesg
Created attachment 266649 .config
I did a git bisect to track down the problem. It took me several tries to do that correctly, but never mind that :P. It seems that the guilty commit is this:

eee3a96c6368f47df8df5bd4ed1843600652b337 is the first bad commit
commit eee3a96c6368f47df8df5bd4ed1843600652b337
Author: françois romieu <romieu@fr.zoreil.com>
Date:   Sat Jan 8 02:17:26 2011 +0000

    r8169: delay phy init until device opens.

    It workarounds the 60s firmware load failure timeout for the non-modular
    case.

    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

There is one difference between my current gentoo-sources-2.6.37-r1 kernel and the working git kernels I built. With the former I never got any message that eth0 is down, but with the working git kernels I get this:

r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready
Adding 1060284k swap on /dev/sda2. Priority:-1 extents:1 across:1060284k
r8169 0000:02:00.0: eth0: link up
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

The link is down at first, and it becomes ready some seconds later. If you need any more information from me, please tell me what to do.
Created attachment 267691 Bisect log
This should also be reported at https://bugzilla.kernel.org/ because it is not a Gentoo-specific problem. Here it works fine on a 100Mbps switch but fails to work on a 1Gbps switch.
(In reply to comment #3)
> There is one difference with my current gentoo-sources-2.6.37-r1 kernel and the
> working git kernels i built. In the first case i never had any message that
> eth0 is down, but in the working git kernels i get this:
>
> r8169 0000:02:00.0: eth0: link down
> r8169 0000:02:00.0: eth0: link down
> ADDRCONF(NETDEV_UP): eth0: link is not ready
> Adding 1060284k swap on /dev/sda2. Priority:-1 extents:1 across:1060284k
> r8169 0000:02:00.0: eth0: link up
> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>
> Link is down at first, and it becomes ready some seconds later. If you need any
> more information from me, please tell me what to do.

Is this the dmesg output with the latest git kernels?
The log is the same as in a bug 'reported' at Gentoo Forums [1], which was solved by upgrading to a newer version of dhcpcd.
And it's different from the original dmesg log you posted, with a 2.6.38 kernel.

[1] http://forums.gentoo.org/viewtopic-p-6648853.html
(In reply to comment #6)
> This is the dmesg output with the latest git kernels?
> The log is the same with a bug 'reported' at Gentoo Forums [1], which was
> solved by upgrading to a newer version of dhcpcd.
> And it's different from the original dmesg log you posted, with a 2.6.38
> kernel.
>
> [1] http://forums.gentoo.org/viewtopic-p-6648853.html

That dmesg output was from a working git kernel somewhere between 2.6.37 and 2.6.38. The NIC stopped working after the commit I reported in my previous post, with this dmesg output:

r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

I am not using DHCP in my configuration, so I guess it's a different bug.
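The two logs in this thread differ only in whether a "link up" line ever follows the "link down" lines. A small sketch for checking that on a saved dmesg is below. The helper name last_link_state is hypothetical, and the pattern assumes the r8169/r8168 message format quoted in these comments.

```shell
# Print the last link state the r8169/r8168 driver logged for an interface.
# Reads dmesg text on stdin; last_link_state is a hypothetical helper name.
last_link_state() {
    iface=${1:-eth0}
    awk -v pat="r816[89].* ${iface}: link (up|down)" '
        # Remember the most recent driver message; ADDRCONF lines do not match.
        $0 ~ pat { state = ($0 ~ /link up/) ? "up" : "down" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

For example, `dmesg | last_link_state eth0` a minute after boot should tell the working case (link comes up after a few seconds) apart from the broken one (link stays down).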
It looks like I'm getting complete system hangs with 2.6.38-r3 r8169 when doing high-activity network stuff over a 1Gbps link. I'm seeing this in syslog during the transfer; after a little while the machine locks up:

May 5 10:42:57 kernel: [ 507.990569] net_ratelimit: 6 callbacks suppressed
May 5 10:42:57 kernel: [ 507.990575] r8169 0000:04:00.0: eth0: link up
May 5 10:42:58 kernel: [ 509.009880] r8169 0000:04:00.0: eth0: link up
May 5 10:42:58 kernel: [ 509.083173] r8169 0000:04:00.0: eth0: link up
May 5 10:42:59 kernel: [ 509.212986] r8169 0000:04:00.0: eth0: link up
May 5 10:42:59 kernel: [ 509.263041] r8169 0000:04:00.0: eth0: link up
...
(In reply to comment #8)
> it looks like i'm getting completing system hangs with 2.6.38-r3 r8169 when
> doing high activity network stuff over 1Gbps link.

Forgot to mention, 2.6.38-tuxonice-r4 used to be rock solid on the same hardware.
For me, it could be this: https://bugzilla.kernel.org/show_bug.cgi?id=32962
I've installed Realtek's out-of-tree r8168 driver and have a working NIC again for now. More details at https://bugzilla.kernel.org/show_bug.cgi?id=32962#c22
Can anyone try davem's net-next tree? [1]
According to upstream bugs [2][3], it could fix your problem (support for new chips).
If it works, maybe we can try to port the changes from the net-next tree to 2.6.39 (when it gets released). Backporting to 2.6.38 would probably be overkill.

[1] http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=summary
[2] https://bugzilla.kernel.org/show_bug.cgi?id=32962
[3] https://bugzilla.kernel.org/show_bug.cgi?id=34172
It doesn't fix my problem. I still get this:

r8169 0000:02:00.0: eth0: link down
r8169 0000:02:00.0: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

It may work for the other guys, because the bug they are facing seems different from mine.
@Sannin: I have now done a couple of hundred GB of backup runs and all my normal work stuff with the out-of-tree r8168 driver. No errors anywhere, and performance has been as expected. Other than a bit of a maintenance headache, I'm not sure I see a reason not to use this driver. That said, nowhere in your logs is it visible which network card model you actually have.
I think I am hitting a wall here... Not even the driver from Realtek works for me:

~ $ dmesg | grep -i eth0
eth0: Identified chip type is 'RTL8168C/8111C'.
r8168: eth0: link down
ADDRCONF(NETDEV_UP): eth0: link is not ready

Maybe it was something in my .config, but I tried a new .config from kernel-seeds.org and it didn't work either.
Sannin,

Can I see the output of lspci -vv?

Can you try to turn auto-negotiation off with something like:

/usr/sbin/ethtool -s eth0 autoneg off
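With auto-negotiation off it usually helps to pin speed and duplex as well. A minimal sketch is below, assuming the interface is eth0 and that ethtool is on the PATH. The wrapper name force_link and the APPLY switch are hypothetical; by default it only prints the command so it can be checked before running it as root. Gigabit over copper normally depends on auto-negotiation, so a forced mode is typically 10 or 100 Mb/s.

```shell
# Print (or, with APPLY=1, run) the ethtool command that pins the link
# to a fixed speed and duplex with auto-negotiation disabled.
# force_link and APPLY are hypothetical names used for this sketch.
force_link() {
    iface=$1 speed=$2 duplex=$3
    set -- ethtool -s "$iface" speed "$speed" duplex "$duplex" autoneg off
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"          # needs root and a real interface
    else
        echo "$*"     # dry run: just show the command
    fi
}
```

For instance, `force_link eth0 100 full` prints the command for 100 Mb/s full duplex, and `APPLY=1 force_link eth0 100 full` would actually apply it. `ethtool -s eth0 autoneg on` restores the default.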
Sorry for taking forever to post a reply, I had some health issues. I found out by coincidence where the problem was. It seems that this NIC does not work well with my Zyxel 660HW after kernel 2.6.37. So the bug appears under these conditions:

- RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
- Zyxel 660HW
- kernel >=2.6.38

The funny thing is that I discovered the cause when I connected a Windows 7 PC with the same NIC to my router and it didn't work correctly either. I bought another router, so case closed for me, but if you need any other info please tell me. Thank you all for your efforts.
I should note here that my network controller is still incompatible with r8169 in 3.0.2. r8168 requires regex changes in its Makefile, but otherwise seems to work fine with 3+.