Created attachment 870430: kernel 6.1.46

Re: https://forums.gentoo.org/viewtopic-t-1165014.html?sid=999b17c58b748d3b987315b9f3c9c549 In summary, my system was fine with kernel 6.1.41. After upgrading to 6.1.46, I started getting strange I/O errors, although I could still read and write my root partition in emergency mode. I eventually found this bug report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184. However, none of the suggested "nvme_core" parameter values fixed my issue. I can only use my system for a few minutes before the same I/O errors come back. A few people in that report experienced the same thing. I wonder whether this is a regression. What changed regarding APST between 6.1.41 and 6.1.46? I also tried 6.5.1 and got the same I/O errors.
Which kernel is this? What error are you referring to?
The gentoo-kernel. 6.1.41 is the working one, and 6.1.46 is the broken one. The errors look like this:
```
[ 746.341551] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #525023: comm NetworkManager: reading directory iblock 0
[ 746.343318] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #524289: comm pool: reading directory iblock 0
[ 746.356125] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272213: comm systemd-udevd: reading directory iblock 0
[ 746.356139] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
[ 746.356332] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272193: comm systemd-udevd: reading directory iblock 0
[ 746.356338] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272825: comm systemd-udevd: reading directory iblock 0
[ 746.356400] EXT4-fs error (device nvme0n1p7): ext4_find_entry:1463: inode #11272210: comm systemd-udevd: reading directory iblock 0
```
(In reply to Mike Pagano from comment #1)
> which kernel is this? What error are you referencing ?

Oh, sorry. I meant '=sys-kernel/gentoo-sources-6.1.46'. I did not know there was a package called "gentoo-kernel" until today.
BTW, I just emerged '=sys-kernel/gentoo-kernel-6.1.53', and it still has the issue. I did not change any kernel configuration.
I have tried '=sys-kernel/gentoo-kernel-6.1.41', and it works on my system. I have used these kernels in the past and only started experiencing this issue with 6.1.46, so I suspect it was introduced by upstream kernel changes.
```
Jul 29  2021 kernel-config-5.10.52-gentoo-x86_64
Sep  4  2021 kernel-config-5.10.61-gentoo-x86_64
Nov 24  2021 kernel-config-5.10.76-gentoo-r1-x86_64
Mar 25 14:13 kernel-config-5.15.102-gentoo-x86_64
Jan 20  2022 kernel-config-5.15.11-gentoo-x86_64
Sep  9 10:34 kernel-config-5.15.127-gentoo-x86_64
Feb  6  2022 kernel-config-5.15.16-gentoo-x86_64
Mar 12  2022 kernel-config-5.15.26-gentoo-x86_64
Apr  9  2022 kernel-config-5.15.32-gentoo-r1-x86_64
Jul 10  2022 kernel-config-5.15.41-gentoo-x86_64
Jul 25  2022 kernel-config-5.15.52-gentoo-x86_64
Sep 18  2022 kernel-config-5.15.59-gentoo-x86_64
Oct  2  2022 kernel-config-5.15.69-gentoo-x86_64
Oct 15  2022 kernel-config-5.15.72-gentoo-x86_64
Oct 23  2022 kernel-config-5.15.74-gentoo-x86_64
Nov 27  2022 kernel-config-5.15.75-gentoo-x86_64
Dec  9  2022 kernel-config-5.15.80-gentoo-x86_64
Feb  4  2023 kernel-config-5.15.88-gentoo-x86_64
Mar 19 11:19 kernel-config-6.1.12-gentoo-x86_64
Apr 18 10:44 kernel-config-6.1.19-gentoo-x86_64
May 28 10:50 kernel-config-6.1.28-gentoo-x86_64
Jul  1 13:57 kernel-config-6.1.31-gentoo-x86_64
Jul  8 12:14 kernel-config-6.1.38-gentoo-x86_64
Sep 15 21:51 kernel-config-6.1.41-gentoo-x86_64
Sep 17 10:41 kernel-config-6.1.46-gentoo-x86_64
Sep  9 11:46 kernel-config-6.1.50-gentoo-x86_64
Sep 15 18:44 kernel-config-6.1.53-gentoo-x86_64
Sep 17 10:10 kernel-config-6.1.53-x86_64
Sep 14 21:03 kernel-config-6.5.1-gentoo-x86_64
```
I tested on vanilla-sources and still experienced this problem. I did a kernel bisect and found the commit that caused this issue.
```
e61f0ad73668912feef345e35beeefcce5bbbd63 is the first bad commit
commit e61f0ad73668912feef345e35beeefcce5bbbd63
Author: Alvin Lee <Alvin.Lee2@amd.com>
Date:   Fri Aug 11 16:07:05 2023 -0500

    drm/amd/display: Disable phantom OTG after enable for plane disable

    commit dc55b106ad477c67f969f3432d9070c6846fb557 upstream

    [Description]
    - Need to disable phantom OTG after it's enabled in order to restore it to it's original state.
    - If it's enabled and then an MCLK switch comes in we may not prefetch the correct data since the phantom OTG could already be in the middle of the frame.

    Reviewed-by: Jun Lei <Jun.Lei@amd.com>
    Acked-by: Alan Liu <HaoPing.Liu@amd.com>
    Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

 drivers/gpu/drm/amd/display/dc/core/dc.c                 | 14 +++++++++++++-
 drivers/gpu/drm/amd/display/dc/dcn32/dcn32_optc.c        |  8 ++++++++
 drivers/gpu/drm/amd/display/dc/inc/hw/timing_generator.h |  1 +
 3 files changed, 22 insertions(+), 1 deletion(-)
```
I also attached the bisect.log file.
Created attachment 871148: bisect.log
This result seems unlikely to be accurate. Consider redoing the bisect, but require two boots to confirm each commit as bad or good. Also, if you revert that commit, does the issue go away?
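The two-boot rule suggested above can be written down as a small decision helper so that every bisect step is marked the same way. This is a hypothetical sketch, not part of git: `bisect_verdict` and its inputs are assumptions. Each boolean records whether the EXT4/I/O errors appeared during one boot of the commit under test (how long to watch each boot is up to the tester, since here the errors took a few minutes to show up). It returns the word to pass to `git bisect` (`good` or `bad`), or `None` when another boot is needed.

```python
from typing import Optional, Sequence


def bisect_verdict(boots: Sequence[bool], confirmations: int = 2) -> Optional[str]:
    """Hypothetical helper: decide how to mark the commit under test.

    boots: one entry per test boot, True if the I/O errors appeared.
    Returns "bad" or "good" (the `git bisect` subcommand), or None if
    there is not yet enough consistent evidence.
    """
    bad = sum(1 for errored in boots if errored)
    good = len(boots) - bad
    # Enough boots reproduced the errors: the commit is bad.
    if bad >= confirmations:
        return "bad"
    # Enough clean boots, and no boot ever showed the errors: the commit is good.
    if good >= confirmations and bad == 0:
        return "good"
    # Too few boots, or a single error seen alongside clean boots: boot again.
    return None


if __name__ == "__main__":
    for boots in ([False, True, True], [False, False], [True, False]):
        verdict = bisect_verdict(boots)
        print(boots, "->", f"git bisect {verdict}" if verdict else "boot again")
```

Because a clean boot is weak evidence for a problem that takes minutes to appear, any boot with errors blocks a `good` verdict; mixed results call for more boots rather than a guess.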
I applied a revert of commit "e61f0ad73668912feef345e35beeefcce5bbbd63" to "=sys-kernel/vanilla-sources-6.1.46", but it did not resolve the issue. I will redo the bisect; maybe I marked a commit incorrectly.
I think I found it. Commit 8ee39ec479147e29af704639f8e55fce246ed2d9 is the one that introduced this issue. I created a revert patch and applied it to "linux-6.1.53-gentoo-r1" and "linux-6.5.5-gentoo", and both kernels worked.
Created attachment 871833: new bisect.log
Created attachment 871834: revert patch
https://wiki.archlinux.org/title/Dell_XPS_15_(9560)

The issue is indeed in the rtsx (Realtek card reader) driver module. The easiest fix for me is simply to disable that driver, since I don't use it at all.
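A minimal sketch of disabling the driver, assuming it is built as a module named `rtsx_pci` (the Realtek PCI-E card reader driver). The file name below is arbitrary. A `blacklist` line only stops automatic loading by alias, and if the module is also included in an initramfs, that initramfs has to be regenerated for the change to take effect at early boot. If the driver is built into the kernel (`CONFIG_MISC_RTSX_PCI=y`), a blacklist has no effect; in that case the option has to be disabled in the kernel configuration instead.

```
# /etc/modprobe.d/blacklist-rtsx.conf (file name is arbitrary)
# Keep the Realtek PCI-E card reader driver from being loaded automatically.
blacklist rtsx_pci
```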
Thank you!
I think this is fixed in 6.1.56?
```
commit 37435ddfadc6ece211415970af44866e2f695ee2
Author: Ricky WU <ricky_wu@realtek.com>
Date:   Wed Sep 20 09:11:19 2023 +0000

    misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe

    commit 0e4cac557531a4c93de108d9ff11329fcad482ff upstream.
```