Bug 914071 - kernel 6.1.46 fails with IO error
Summary: kernel 6.1.46 fails with IO error
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Distribution Kernel Project
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-09-12 23:36 UTC by Xi
Modified: 2023-10-10 20:54 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
kernel 6.1.46 (6.1.46.config,148.62 KB, application/xml)
2023-09-12 23:36 UTC, Xi
bisect.log (bisect.log,2.82 KB, text/plain)
2023-09-23 03:44 UTC, Xi
new bisect.log (bisect.log,2.28 KB, text/plain)
2023-09-30 12:43 UTC, Xi
revert patch (0001-Revert-misc-rtsx-judge-ASPM-Mode-to-set-PETXCFG-Reg.patch,6.00 KB, application/mbox)
2023-09-30 12:43 UTC, Xi

Description Xi 2023-09-12 23:36:30 UTC
Created attachment 870430
kernel 6.1.46

re. https://forums.gentoo.org/viewtopic-t-1165014.html?sid=999b17c58b748d3b987315b9f3c9c549

In summary, my system was fine with kernel 6.1.41. After upgrading to 6.1.46 I started getting I/O errors, although I could still read and write my root partition in emergency mode. I eventually found this bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184. None of the "nvme_core" values suggested there fixed my issue: the system is usable for only a few minutes before the same I/O error recurs. A few people in that report experienced the same thing.

I wonder whether this is a regression. What changed in APST handling between 6.1.41 and 6.1.46? I also tried 6.5.1 and got the same I/O errors.
Comment 1 Mike Pagano gentoo-dev 2023-09-13 17:48:07 UTC
Which kernel is this? What error are you referring to?
Comment 2 Xi 2023-09-13 22:18:03 UTC
The gentoo-kernel.

6.1.41 is the working one, and 6.1.46 is the broken one.

The errors look like this:

```
[ 746.341551] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #525023: comm NetworkManager: reading directory iblock 0
[ 746.343318] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524289: comm pool: reading directory iblock 0
[ 746.356125] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272213: comm systemd-udevd: reading directory iblock 0
[ 746.356139] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.356332] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272193: comm systemd-udevd: reading directory iblock 0
[ 746.356338] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272825: comm systemd-udevd: reading directory iblock 0
[ 746.356400] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
```
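For triaging a log like the one above, a minimal Python sketch can tally the errors per device and per process. The regular expression is an assumption inferred only from the lines quoted in this comment, and `summarize_ext4_errors` is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted above (an assumption, not a
# kernel-defined format): timestamp, device, function:line, inode, comm.
EXT4_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+EXT4-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): inode #(?P<inode>\d+): "
    r"comm (?P<comm>[^:]+): (?P<msg>.*)"
)


def summarize_ext4_errors(log_text):
    """Count EXT4-fs errors per device and per reporting process."""
    by_device = Counter()
    by_comm = Counter()
    for raw in log_text.splitlines():
        match = EXT4_ERROR.search(raw)
        if match is None:
            continue  # skip lines that do not look like EXT4-fs errors
        by_device[match["dev"]] += 1
        by_comm[match["comm"]] += 1
    return by_device, by_comm


if __name__ == "__main__":
    sample = (
        "[ 746.341551] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: "
        "inode #525023: comm NetworkManager: reading directory iblock 0\n"
        "[ 746.356125] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: "
        "inode #11272213: comm systemd-udevd: reading directory iblock 0\n"
    )
    devices, comms = summarize_ext4_errors(sample)
    print(devices.most_common())
    print(comms.most_common())
```

If every error lands on the same device while the reporting processes vary, as in the quoted lines, that suggests a problem at the device or bus level rather than in one application.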
Comment 3 Xi 2023-09-16 00:11:38 UTC
(In reply to Mike Pagano from comment #1)
> Which kernel is this? What error are you referring to?

Oh, sorry. I meant to say '=sys-kernel/gentoo-sources-6.1.46'. I did not know there was a package called "gentoo-kernel" until today.
Comment 4 Xi 2023-09-16 00:13:06 UTC
BTW, I just emerged '=sys-kernel/gentoo-kernel-6.1.53', and it still has the issue. I did not change any kernel configuration.
Comment 5 Xi 2023-09-17 02:35:30 UTC
I have tried '=sys-kernel/gentoo-kernel-6.1.41' and it works on my system.

I have used the kernels listed below in the past and only started experiencing this issue with 6.1.46. I suspect it was introduced by an upstream kernel change.


```
Jul 29  2021 kernel-config-5.10.52-gentoo-x86_64
Sep  4  2021 kernel-config-5.10.61-gentoo-x86_64
Nov 24  2021 kernel-config-5.10.76-gentoo-r1-x86_64
Mar 25 14:13 kernel-config-5.15.102-gentoo-x86_64
Jan 20  2022 kernel-config-5.15.11-gentoo-x86_64
Sep  9 10:34 kernel-config-5.15.127-gentoo-x86_64
Feb  6  2022 kernel-config-5.15.16-gentoo-x86_64
Mar 12  2022 kernel-config-5.15.26-gentoo-x86_64
Apr  9  2022 kernel-config-5.15.32-gentoo-r1-x86_64
Jul 10  2022 kernel-config-5.15.41-gentoo-x86_64
Jul 25  2022 kernel-config-5.15.52-gentoo-x86_64
Sep 18  2022 kernel-config-5.15.59-gentoo-x86_64
Oct  2  2022 kernel-config-5.15.69-gentoo-x86_64
Oct 15  2022 kernel-config-5.15.72-gentoo-x86_64
Oct 23  2022 kernel-config-5.15.74-gentoo-x86_64
Nov 27  2022 kernel-config-5.15.75-gentoo-x86_64
Dec  9  2022 kernel-config-5.15.80-gentoo-x86_64
Feb  4  2023 kernel-config-5.15.88-gentoo-x86_64
Mar 19 11:19 kernel-config-6.1.12-gentoo-x86_64
Apr 18 10:44 kernel-config-6.1.19-gentoo-x86_64
May 28 10:50 kernel-config-6.1.28-gentoo-x86_64
Jul  1 13:57 kernel-config-6.1.31-gentoo-x86_64
Jul  8 12:14 kernel-config-6.1.38-gentoo-x86_64
Sep 15 21:51 kernel-config-6.1.41-gentoo-x86_64
Sep 17 10:41 kernel-config-6.1.46-gentoo-x86_64
Sep  9 11:46 kernel-config-6.1.50-gentoo-x86_64
Sep 15 18:44 kernel-config-6.1.53-gentoo-x86_64
Sep 17 10:10 kernel-config-6.1.53-x86_64
Sep 14 21:03 kernel-config-6.5.1-gentoo-x86_64
```
Comment 6 Xi 2023-09-23 03:43:29 UTC
I tested with vanilla-sources and still hit the problem, so I bisected the kernel. The bisect identified the following commit as the cause:

```
e61f0ad73668912feef345e35beeefcce5bbbd63 is the first bad commit
commit e61f0ad73668912feef345e35beeefcce5bbbd63
Author: Alvin Lee <Alvin.Lee2@amd.com>
Date:   Fri Aug 11 16:07:05 2023 -0500

    drm/amd/display: Disable phantom OTG after enable for plane disable

    commit dc55b106ad477c67f969f3432d9070c6846fb557 upstream

    [Description]
    - Need to disable phantom OTG after it's enabled
      in order to restore it to it's original state.
    - If it's enabled and then an MCLK switch comes in
      we may not prefetch the correct data since the phantom
      OTG could already be in the middle of the frame.

    Reviewed-by: Jun Lei <Jun.Lei@amd.com>
    Acked-by: Alan Liu <HaoPing.Liu@amd.com>
    Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/gpu/drm/amd/display/dc/core/dc.c                 | 14 +++++++++++++-
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_optc.c        |  8 ++++++++
 drivers/gpu/drm/amd/display/dc/inc/hw/timing_generator.h |  1 +
 3 files changed, 22 insertions(+), 1 deletion(-)
```

I have also attached the bisect.log file.
Comment 7 Xi 2023-09-23 03:44:03 UTC
Created attachment 871148
bisect.log
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-23 13:28:41 UTC
It feels unlikely that the result is accurate here. Consider redoing the bisect, but test each commit with at least two boots before marking it as good or bad.

If you revert that commit, does the issue go away?
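One way to read the suggestion in this comment, sketched in Python: with an intermittent failure, a single clean boot proves little, so a commit is marked bad only after two failing boots and good only after several clean ones. The function name `classify_commit` and the `required_good=3` threshold are hypothetical choices for illustration; only the two-bad-boots rule comes from the comment.

```python
def classify_commit(boot_results, required_bad=2, required_good=3):
    """Decide how to mark a commit from repeated boot tests.

    boot_results: iterable of booleans, True meaning the I/O error
    appeared during that boot.
    Returns "bad", "good", or "retest".
    """
    results = list(boot_results)
    bad_boots = sum(1 for failed in results if failed)
    if bad_boots >= required_bad:
        return "bad"  # the failure reproduced often enough to trust it
    if bad_boots == 0 and len(results) >= required_good:
        return "good"  # enough clean boots with no failure at all
    return "retest"  # mixed or too few results: boot again before marking


if __name__ == "__main__":
    print(classify_commit([True, False]))
    print(classify_commit([True, False, True]))
    print(classify_commit([False, False, False]))
```

A "retest" result means booting the same commit again rather than running `git bisect good` or `git bisect bad` on thin evidence.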
Comment 9 Xi 2023-09-23 23:02:55 UTC
I reverted commit e61f0ad73668912feef345e35beeefcce5bbbd63 on top of "=sys-kernel/vanilla-sources-6.1.46", but that did not resolve the issue.

I will give it another try. Maybe I marked the wrong commit.
Comment 10 Xi 2023-09-30 12:42:39 UTC
I think I found it. Commit 8ee39ec479147e29af704639f8e55fce246ed2d9 is the one that introduced this issue.

I created a revert patch and applied it to "linux-6.1.53-gentoo-r1" and "linux-6.5.5-gentoo", and both kernels worked.
Comment 11 Xi 2023-09-30 12:43:20 UTC
Created attachment 871833
new bisect.log
Comment 12 Xi 2023-09-30 12:43:58 UTC
Created attachment 871834
revert patch
Comment 13 Xi 2023-10-07 01:17:27 UTC
https://wiki.archlinux.org/title/Dell_XPS_15_(9560)

The issue is indeed in the rtsx driver module. The easiest fix for me is simply to disable that driver since I don't use it at all.
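If "disable that driver" means turning it off in the kernel configuration (an assumption; the comment does not say how it was done), a small Python sketch can check a config file for the relevant options. The option names `CONFIG_MISC_RTSX_PCI` and `CONFIG_MISC_RTSX_USB` are my reading of the upstream Kconfig and are not named anywhere in this thread; `rtsx_option_states` is a hypothetical helper.

```python
import re

# Option names are an assumption based on the upstream Kconfig; this
# thread never names them explicitly.
RTSX_OPTIONS = ("CONFIG_MISC_RTSX_PCI", "CONFIG_MISC_RTSX_USB")

CONFIG_LINE = re.compile(r"^(CONFIG_\w+)=(\S+)$")


def rtsx_option_states(config_text, options=RTSX_OPTIONS):
    """Return {option: value} for the given options in a kernel config.

    Options that are absent or commented out ("# ... is not set")
    are reported as "n".
    """
    states = {name: "n" for name in options}
    for raw in config_text.splitlines():
        match = CONFIG_LINE.match(raw.strip())
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
    return states


if __name__ == "__main__":
    sample = (
        "CONFIG_MISC_RTSX_PCI=m\n"
        "# CONFIG_MISC_RTSX_USB is not set\n"
    )
    print(rtsx_option_states(sample))
```

In kernel config files a value of `y` means built in and `m` means built as a module; anything reported as `n` here is not built at all.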
Comment 14 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-10-10 20:52:11 UTC
Thank you!
Comment 15 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-10-10 20:54:11 UTC
I think this is fixed in 6.1.56?


commit 37435ddfadc6ece211415970af44866e2f695ee2
Author: Ricky WU <ricky_wu@realtek.com>
Date:   Wed Sep 20 09:11:19 2023 +0000

    misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe
    
    commit 0e4cac557531a4c93de108d9ff11329fcad482ff upstream.