Summary: | sys-kernel/gentoo-sources-5.4.28: r8169 is unstable. | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | anonymous <fakih18716> |
Component: | Current packages | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED FIXED | ||
Severity: | normal | Keywords: | UPSTREAM |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/log/drivers/net/ethernet/realtek/r8169_main.c | ||
See Also: | https://bugzilla.kernel.org/show_bug.cgi?id=207205 | ||
Whiteboard: | Unknown hardware erratum, Linux 5.8 | ||
Package list: | Runtime testing required: | --- | |
Attachments: |
On 4.19.97, r8169 works fine.
On 5.4, r8169 stops working after a while. |
Description
anonymous
2020-04-05 09:34:10 UTC
Attach dmesg output from both the working and the non-working kernel. Attach your .config. Try the latest 5.4.x, which is 5.4.30 as of this writing. Try the latest 5.5.x, which is 5.5.15 as of this writing.

Stable sys-kernel/gentoo-sources-5.4.28 contains this issue:
https://gitlab.freedesktop.org/drm/intel/issues/827

The system became unstable.

I couldn't test Linux 5.5 because zfs-kmod and virtualbox-modules don't support Linux 5.5. Instead, I have test results from 4.19 and 5.4.

Created attachment 631596 [details]
On 4.19.97, r8169 works fine.
Created attachment 631598 [details]
On 5.4, r8169 stops working after a while.
There are a lot of r8169 patches queued up in the net-next.git tree, around 10 or so. I'm not sure whether adding all of them will fix your issues or create new ones, but I expect these patches to make it into a future kernel. Do you want to grab all 10, apply them to 5.6.x and test? If so, apply "r8169: add helper r8168g_wait_ll_share_fifo_ready" and all newer ones; the link is in the URL field.

zfs-kmod breaks after 5.4.

(In reply to crocket from comment #7)
> zfs-kmod breaks after 5.4.

Is there an upstream bug on that?

I don't know, but the zfs-kmod ebuild says it is only compatible with Linux 5.4 and below.

(In reply to crocket from comment #9)
> I don't know, but the zfs-kmod ebuild says it is only compatible with Linux 5.4
> and below.

You can try applying those to 5.4. I'm running out of ideas for you.

I think I will let https://bugzilla.kernel.org/show_bug.cgi?id=207205 fix it.

(In reply to crocket from comment #9)
> I don't know, but the zfs-kmod ebuild says it is only compatible with Linux 5.4
> and below.

Use 0.8.4 from ~amd64 to get Linux 5.6 support. It will likely be stabilized in the near future. Alternatively, you can use the 9999 ebuild if you are running a bleeding-edge kernel; it has no version check.

The new patches will become available in 5.8. I will wait for a version of zfs-kmod that works with 5.8.

Today I installed sys-kernel/gentoo-sources-5.4.38 on a system with an ASRock B450 Pro4 motherboard that has the following NIC:

    09:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: ASRock Incorporation Motherboard (one of many)
        Flags: bus master, fast devsel, latency 0, IRQ 24
        I/O ports at d000 [size=256]
        Memory at f7504000 (64-bit, non-prefetchable) [size=4K]
        Memory at f7500000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: r8169
        Kernel modules: r8169

It has been a few hours. So far, I am unable to reproduce this issue.
I will try using Google Stadia to do further testing after a new Chromium build finishes.

By the way, it looks like Linux 5.4.32 included a fix for this issue:

    commit 74107d56d1e8e6ac5a061059941b7e2d03522df6
    Author: Heiner Kallweit <hkallweit1@gmail.com>
    Date:   Sat Apr 4 23:48:45 2020 +0200

        r8169: change back SG and TSO to be disabled by default

        [ Upstream commit 95099c569a9fdbe186a27447dfa8a5a0562d4b7f ]

        There has been a number of reports that using SG/TSO on different chip
        versions results in tx timeouts. However for a lot of people SG/TSO
        works fine. Therefore disable both features by default, but allow
        users to enable them. Use at own risk!

        Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
        Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
        Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.4.32

Looking at the upstream issue, it seems that crocket is still affected even on Linux 5.4.40, so the patch in 5.4.32 does not solve his issue. Our hardware appears to differ slightly: according to dmesg, he has the RTL8168f/8111f while I have the RTL8168h/8111h. My guess is that the newer hardware revision fixed at least one erratum that a change to the kernel driver triggered.

There are roughly 13 months' worth of changes to r8169, more than 100 of them (with a refactoring of the kernel source tree midway). That makes this particularly difficult to track down. If crocket is willing to spend some time building and testing kernels to help narrow this down, it would be really useful if he could try bisecting this. A normal git bisect should identify the precise patch where things went wrong. I won't ask an end user to do that on their system, so I will suggest an alternative.
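A bisect is a binary search for the first affected item in an ordered list, whether it runs over git commits or over whole kernel releases. The sketch below illustrates that idea for the release-level search. It is not git bisect itself, and it rests on assumptions taken partly from the report and partly added here:

- 4.19 is known good and 5.4 is known bad.
- The bug, once introduced, stays in every later release.
- `find_first_affected` and `pretend_test` are hypothetical names. `pretend_test` stands in for "build, boot, and wait for the NIC to drop".

A plain midpoint search may pick a different next version than a hand-made plan. It still needs at most three builds for the six candidates 4.20 through 5.4.

```python
from typing import Callable, List, Sequence, Tuple


def find_first_affected(
    releases: Sequence[str], is_affected: Callable[[str], bool]
) -> Tuple[str, List[str]]:
    """Binary-search ordered kernel releases for the first affected one.

    Assumptions (not from the bug report): releases are in release order,
    the release just before releases[0] is known good, releases[-1] is
    already known to be affected, and once the bug appears it stays.
    Returns the first affected release and the releases actually tested.
    """
    tested: List[str] = []
    # Invariant: the first affected release lies in releases[lo..hi].
    lo, hi = 0, len(releases) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        tested.append(releases[mid])
        if is_affected(releases[mid]):
            hi = mid  # mid is bad: the first bad release is mid or earlier
        else:
            lo = mid + 1  # mid is good: the first bad release is after mid
    return releases[lo], tested


if __name__ == "__main__":
    # 4.19 works and 5.4 fails per the report, so these are the candidates.
    releases = ["4.20", "5.0", "5.1", "5.2", "5.3", "5.4"]

    # Hypothetical stand-in for a real build-and-test cycle; here we simply
    # pretend that 5.2 introduced the bug.
    def pretend_test(version: str) -> bool:
        return releases.index(version) >= releases.index("5.2")

    culprit, tested = find_first_affected(releases, pretend_test)
    print(f"first affected: {culprit}, builds tested: {tested}")
```

With the pretend test above, the search builds 5.1 (good), then 5.3 (bad), then 5.2 (bad), and reports 5.2. Because 5.4 is taken as known bad, the search never spends a build on it.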
It is possible to manually download the tarballs for older kernels, extract them to /usr/src, use eselect to change the /usr/src/linux symlink and then build either manually or with genkernel:

https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.3.18.tar.xz
https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.2.21.tar.xz
https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.1.21.tar.xz
https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.0.21.tar.xz

The idea is to build Linux 5.1.y and see if it is affected. If it is, try Linux 5.0.y; if not, try Linux 5.2.y.

- If 5.0.y was picked, its result would tell us which version introduced the bug.
- If 5.2.y was picked and it is affected, we know that 5.2.y is where the bug started.
- If 5.2.y is not affected, we need a test of 5.3.y.

Either way, knowing which major kernel version introduced the issue would greatly narrow down the changes we need to consider when looking for the bug. Bisecting Linus' tree would give us the precise commit, but I won't suggest that an end user do that.

By the way, one possible workaround would be to switch to the vendor driver:
https://packages.gentoo.org/packages/net-misc/r8168

It presumably isn't affected by this. The comment in the ebuild says it only supports kernels up to 4.15, but that is outdated. The upstream source says it supports up to 5.6:
https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software

I will do a commit to Portage to fix the comment. It could take up to a day before the change reaches end users' machines.

I forgot to include a tarball link for Linux 4.20.y last night:
https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.20.17.tar.xz

I already bisected once.
Look at https://bugzilla.kernel.org/show_bug.cgi?id=207205#c9

This issue was still not fixed in gentoo-sources-5.8.7. However, it is now somehow fixed in gentoo-sources-5.8.12. A maintainer of the r8169 driver says he didn't change anything between 5.8.7 and 5.8.12. I suspect Gentoo's own patches to the Linux kernel introduced the issue.

OK, it sounds like it's working in later kernels. My system has a wireless card that uses this driver, but I don't use wireless. If you still have issues, please reopen this and I will test on my own system.