Bug 716286

Summary: sys-kernel/gentoo-sources-5.4.28: r8169 is unstable.
Product: Gentoo Linux Reporter: anonymous <fakih18716>
Component: Current packages Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status: RESOLVED FIXED    
Severity: normal Keywords: UPSTREAM
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
URL: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/drivers/net/ethernet/realtek/r8169_main.c
See Also: https://bugzilla.kernel.org/show_bug.cgi?id=207205
Whiteboard: Unknown hardware erratum, Linux 5.8
Package list:
Runtime testing required: ---
Attachments: On 4.19.97, r8169 works fine.
On 5.4, r8169 stops working after a while.

Description anonymous 2020-04-05 09:34:10 UTC
I upgraded from 4.19.97 to 5.4.28.
With 5.4.28, r8169 is unstable. It stops working after a few minutes.
Comment 1 Mike Pagano gentoo-dev 2020-04-05 11:29:38 UTC
Please attach dmesg output covering the transition from working to non-working.
Please attach your .config.

Try the latest 5.4.X, which is 5.4.30 as of this writing.
Try the latest 5.5.X, which is 5.5.15 as of this writing.
Comment 2 Maxim P. Dementiev 2020-04-07 16:05:05 UTC
Stable sys-kernel/gentoo-sources-5.4.28 contains this issue:
https://gitlab.freedesktop.org/drm/intel/issues/827
The system became unstable.
Comment 3 anonymous 2020-04-09 11:53:25 UTC
I couldn't test Linux 5.5 because zfs-kmod and virtualbox-modules don't support Linux 5.5. Instead, I have test results from 4.19 and 5.4.
Comment 4 anonymous 2020-04-09 11:54:02 UTC
Created attachment 631596 [details]
On 4.19.97, r8169 works fine.
Comment 5 anonymous 2020-04-09 11:54:43 UTC
Created attachment 631598 [details]
On 5.4, r8169 stops working after a while.
Comment 6 Mike Pagano gentoo-dev 2020-05-21 17:16:46 UTC
There are a lot of r8169 patches queued up in the net-next.git tree, around 10 or so. Not sure if adding all of those will fix your issues or make new ones.

For sure, I expect these patches to make it to a future kernel.

Do you want to grab all 10, apply them to 5.6.X and test?

If so, apply this one and all newer ones:

"r8169: add helper r8168g_wait_ll_share_fifo_ready"

Link in URL field.
Comment 7 anonymous 2020-05-22 05:37:50 UTC
zfs-kmod breaks after 5.4
Comment 8 Mike Pagano gentoo-dev 2020-05-22 11:49:12 UTC
(In reply to crocket from comment #7)
> zfs-kmod breaks after 5.4

Is there an upstream bug on that?
Comment 9 anonymous 2020-05-22 12:06:43 UTC
I don't know, but zfs-kmod ebuild says it is only compatible with linux 5.4 and below.
Comment 10 Mike Pagano gentoo-dev 2020-05-22 22:19:34 UTC
(In reply to crocket from comment #9)
> I don't know, but zfs-kmod ebuild says it is only compatible with linux 5.4
> and below.

You can try applying those to 5.4. I'm running out of ideas for you.
Comment 11 anonymous 2020-05-23 12:44:54 UTC
I think I will let https://bugzilla.kernel.org/show_bug.cgi?id=207205 fix it.
Comment 12 Richard Yao (RETIRED) gentoo-dev 2020-05-24 02:24:46 UTC
(In reply to crocket from comment #9)
> I don't know, but zfs-kmod ebuild says it is only compatible with linux 5.4
> and below.

Use 0.8.4 from ~amd64 to get Linux 5.6 support. It will likely be stabilized in the near future. Alternatively, you can use the 9999 ebuild if you are running a bleeding edge kernel. It has no version check.
Comment 13 anonymous 2020-05-24 07:07:15 UTC
The new patches will become available in 5.8. I will wait for a version of zfs-kmod that works with 5.8.
Comment 14 Richard Yao (RETIRED) gentoo-dev 2020-05-25 00:12:20 UTC
Today I installed sys-kernel/gentoo-sources-5.4.38 on a system with an ASRock B450 Pro4 motherboard that has the following NIC:

09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: ASRock Incorporation Motherboard (one of many)
        Flags: bus master, fast devsel, latency 0, IRQ 24
        I/O ports at d000 [size=256]
        Memory at f7504000 (64-bit, non-prefetchable) [size=4K]
        Memory at f7500000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169

It has been a few hours. So far, I am unable to reproduce this issue. I will try using Google Stadia to do further testing after a new chromium build finishes.
Comment 15 Richard Yao (RETIRED) gentoo-dev 2020-05-25 00:15:08 UTC
By the way, it looks like Linux 5.4.32 included a fix for this issue:

commit 74107d56d1e8e6ac5a061059941b7e2d03522df6
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sat Apr 4 23:48:45 2020 +0200

    r8169: change back SG and TSO to be disabled by default
    
    [ Upstream commit 95099c569a9fdbe186a27447dfa8a5a0562d4b7f ]
    
    There has been a number of reports that using SG/TSO on different chip
    versions results in tx timeouts. However for a lot of people SG/TSO
    works fine. Therefore disable both features by default, but allow users
    to enable them. Use at own risk!
    
    Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.4.32
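
Since that commit leaves SG and TSO off by default but user-switchable, it can help to confirm what state they are actually in on the affected machine. Here is a rough sketch that reads the output of `ethtool -k`. The function names and the interface name are placeholders of mine, and the parsing assumes the usual `name: on|off [fixed]` layout that ethtool prints.

```python
import subprocess
from typing import Dict


def parse_offload_features(ethtool_output: str) -> Dict[str, bool]:
    """Map feature name -> enabled, from the text printed by `ethtool -k`."""
    features: Dict[str, bool] = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip the "Features for <iface>:" header and blank lines
        # Values look like "on", "off" or "off [fixed]"; keep only the state.
        features[name] = value.split()[0] == "on"
    return features


def sg_tso_state(interface: str) -> Dict[str, bool]:
    """Run `ethtool -k <interface>` and report only SG and TSO (hypothetical helper)."""
    out = subprocess.run(
        ["ethtool", "-k", interface],
        capture_output=True, text=True, check=True,
    ).stdout
    features = parse_offload_features(out)
    wanted = ("scatter-gather", "tcp-segmentation-offload")
    return {name: features.get(name, False) for name in wanted}
```

Calling `sg_tso_state("eth0")` (with the real interface name substituted) should show both as off on a kernel that carries the commit, unless they were turned back on by hand, for example with `ethtool -K <iface> sg on tso on`.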
Comment 16 Richard Yao (RETIRED) gentoo-dev 2020-05-25 03:19:41 UTC
Looking at the upstream issue, it seems that crocket is still affected by this even on Linux 5.4.40, so the patch in 5.4.32 is not a solution for his issue.

It seems like our hardware is slightly different. From dmesg, he has the RTL8168f/8111f while I have the RTL8168h/8111h. My guess is that the hardware revisions fixed at least one erratum that a change to the kernel driver triggered.
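
For anyone else comparing hardware against this report, the chip name can be pulled out of dmesg mechanically. The sketch below is only an illustration: it assumes the r8169 probe line has the shape "r8169 <pci address> <iface>: <chip name>, <MAC>, ..." (the trailing fields differ between kernel versions), and the helper name is made up.

```python
import re
from typing import Optional

# Assumed shape of the r8169 probe line; only the chip name is captured,
# since the fields after it vary between kernel versions.
_CHIP_RE = re.compile(r"r8169 \S+ \S+: (RTL[^,\s]+),")


def chip_name(dmesg_text: str) -> Optional[str]:
    """Return the chip name from the first r8169 probe line, or None if absent."""
    match = _CHIP_RE.search(dmesg_text)
    return match.group(1) if match else None
```

Feeding it the saved dmesg text should yield a string such as `RTL8168f/8111f` or `RTL8168h/8111h`, which is enough to tell the two setups in this bug apart.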

There are roughly 13 months' worth of changes to r8169, which makes this particularly difficult to track down, as there appear to be more than 100 changes (with a refactoring of the kernel source tree midway) to consider.

If crocket would be willing to spend some time building and testing kernels to help narrow this down, it would be really useful if he could try bisecting this. A normal git bisect should give us the precise patch where things went wrong, but I won't ask an end user to do that on their system, so I will suggest an alternative.

It is possible to manually get the tarballs for older kernels, extract them to /usr/src, use eselect to change the /usr/src/linux symlink and then build either manually or with genkernel:

https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.3.18.tar.xz
https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.2.21.tar.xz
https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.1.21.tar.xz
https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.0.21.tar.xz
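
The links above all follow the same pattern, so the URL and the directory the tarball unpacks to can be derived from the version string. This is a small sketch with hypothetical helper names. It assumes the tarball is extracted directly under /usr/src and that the cdn.kernel.org layout stays as it is in the links.

```python
def tarball_url(version: str) -> str:
    """Build the cdn.kernel.org URL for a release such as "5.3.18"."""
    major = version.split(".")[0]  # "5" selects the v5.x directory
    return (
        f"https://cdn.kernel.org/pub/linux/kernel/v{major}.x/"
        f"linux-{version}.tar.xz"
    )


def source_dir(version: str) -> str:
    """Directory the tarball unpacks to when extracted in /usr/src (assumed)."""
    return f"/usr/src/linux-{version}"
```

After extracting, something like `eselect kernel list` followed by `eselect kernel set linux-5.3.18` should point the /usr/src/linux symlink at the new tree.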

The idea is to build Linux 5.1.y first and see if it is affected. If it is affected, try Linux 5.0.y; if not, try Linux 5.2.y. If 5.0.y is the one tested, its result tells us which version introduced the bug. If 5.2.y is the one tested and it is affected, we know that 5.2.y is where the bug started. If 5.2.y is not affected, then we need a test of 5.3.y.

Anyway, if we know which major kernel version introduced the issue, it would greatly narrow down what we need to consider in terms of finding the bug. Bisecting Linus' tree would end up giving us the precise commit, but I won't suggest that an end user do that.
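
The procedure is really a binary search over release series, and it can be sketched as below. This is a generic version, not the exact order of tests spelled out above: it assumes the last series in the list is known to be affected (5.4 here), that the series before the first entry is known good (4.19), and that once a series is affected every later one is too. `is_affected` stands in for "build it, boot it, and watch whether the NIC stops working", so it is a manual step in practice.

```python
from typing import Callable, Sequence

# Candidate series between the known-good 4.19 and the known-bad 5.4.
SERIES = ["4.20", "5.0", "5.1", "5.2", "5.3", "5.4"]


def first_affected(
    series: Sequence[str], is_affected: Callable[[str], bool]
) -> str:
    """Return the earliest series for which is_affected() is true.

    The last entry is assumed affected and is never probed.
    """
    if not series:
        raise ValueError("need at least one candidate series")
    lo, hi = 0, len(series) - 1  # invariant: the answer lies in series[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_affected(series[mid]):
            hi = mid       # bug already present here, look at older series
        else:
            lo = mid + 1   # still fine here, bug arrived later
    return series[lo]
```

With the six candidates in `SERIES`, the first probe lands on 5.1 and the search needs at most three kernel builds, which is the main reason to go by series before attempting a commit-level bisect.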

By the way, one possible workaround would be to switch to the vendor driver:

https://packages.gentoo.org/packages/net-misc/r8168

It presumably isn't affected by this. Note that the comment in the ebuild says that it is only up to 4.15, but that is outdated because the upstream source says that it supports up to 5.6:

https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software

I will do a commit to portage to fix the comment, although that could take up to a day before it is reflected in what end users see on their machines.
Comment 17 Richard Yao (RETIRED) gentoo-dev 2020-05-25 14:30:58 UTC
I forgot to include a tarball link for Linux 4.20.y last night:

https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.20.17.tar.xz
Comment 18 anonymous 2020-05-26 05:41:03 UTC
I already bisected once.

Look at https://bugzilla.kernel.org/show_bug.cgi?id=207205#c9
Comment 19 anonymous 2020-09-29 13:20:57 UTC
This issue was still not fixed in gentoo-sources-5.8.7.
However, it is now somehow fixed in gentoo-sources-5.8.12.

A maintainer of the r8169 driver says he didn't change anything between 5.8.7 and 5.8.12.

I suspect Gentoo's own patches to the Linux kernel introduced the issue.
Comment 20 Mike Pagano gentoo-dev 2021-02-13 20:14:17 UTC
Ok, sounds like it's working in later kernels. My system has a wireless card that uses this driver, but I don't use wireless. If you still have issues, please re-open this and I will test on my own system.