Created attachment 875513 [details] Kernel-Log of error trace. Greetings, I use Gentoo Unstable on two laptops. Upon upgrading to 6.6.2, the rfkill switch on one of my laptops quit working correctly. The switch on the affected laptop is an actual hardblock rfkill switch. Upon trying to disconnect the Wifi, everything freezes up and the only solution is to forcibly power off the machine. It just hangs; I can't check logs, can't issue an unblock via rfkill, can't do anything at all related to the network, etc. The laptop that doesn't have this issue uses a softblocking rfkill switch, and it works like normal. I also octuple boot this machine with Arch, Artix and Void (amongst others); Gentoo is my "main" flavor though. All that have been upgraded to 6.6.2 have the same issue (Arch, Artix, Gentoo). Downgrading to 6.6.1 fixed the issue on all three distros. The interesting thing is that Void's issue is between kernels 6.5.11 (OK) and 6.5.12 (BROKE); downgrading to 6.5.11 fixed the issue. This leads me to think it may be a driver change in iwlwifi, or possibly a patch of some sort that has been very recently added to the newest kernels. I have been searching bug reports, forums, etc. and haven't seen anything on this. I have also tried to decipher the changes in 6.6.2 and 6.5.12 at kernel.org, but I'm not a C developer. It does look to me like there are some common changes in the intel/iwlwifi driver section, but it is kind of Greek to me. I will attach a text file with a capture of what happens when the rfkill switch is pressed to turn off the Wifi. The laptop uses an Intel 7260 (rev bb) card. It is an old Dell laptop but highly upgraded. ;) Please don't berate my potato of a laptop. This is my first bug report in many years of using Linux, so please be patient. I am willing to get more info or test, etc. to the best of my ability.
Can you do a git bisect between 6.6.1 and 6.6.2 ? See: https://wiki.gentoo.org/wiki/Kernel_git-bisect
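
For anyone following along who has not bisected a kernel before, the procedure on the linked wiki page boils down to roughly the following. This is only a minimal sketch, not the wiki's exact text: it assumes a git checkout of the stable kernel tree in which the tags v6.6.1 and v6.6.2 exist, and bisect_begin is a made-up helper name, not a git command.

```shell
# Start a bisect in a kernel git checkout.
# Arguments: path to the checkout, known-good ref, known-bad ref.
# bisect_begin is an illustrative helper, not part of git.
bisect_begin() {
    repo="$1"; good="$2"; bad="$3"
    git -C "$repo" bisect start "$bad" "$good"
}

# Typical session (assumes the stable tree is checked out in ./linux):
#   bisect_begin ./linux v6.6.1 v6.6.2
#   ... build, install and boot the revision git checked out, test rfkill ...
#   git -C ./linux bisect good      # if the switch works
#   git -C ./linux bisect bad       # if the machine hangs
#   ... repeat until git reports the first bad commit ...
#   git -C ./linux bisect log > bisect.log
#   git -C ./linux bisect reset
```

Each round halves the remaining range, so the number of builds grows with the logarithm of the number of commits between the two tags rather than with the number itself.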
I have never done this before, but I'll see what I can do.
Okay, I have the first kernel build running for the git-bisect. Kernel builds take me about 1h45m, so this might take a few days. By then 6.6.3 might have come out and fixed it... I will keep trudging along, as one can always learn.
I have finished the Kernel git-bisect and this appears to be the bad patch: ----------------------------------------------------------------- f1f2e068bbe7783eff75ab85ea8566084b138aed is the first bad commit commit f1f2e068bbe7783eff75ab85ea8566084b138aed Author: Johannes Berg <johannes.berg@intel.com> Date: Tue Oct 17 12:16:43 2023 +0300 wifi: iwlwifi: pcie: synchronize IRQs before NAPI [ Upstream commit 37fb29bd1f90f16d1abc95c0e9f0ff8eec9829ad ] When we want to synchronize the NAPI, which was added in commit 5af2bb3168db ("wifi: iwlwifi: call napi_synchronize() before freeing rx/tx queues"), we also need to make sure we can't actually reschedule the NAPI. Yes, this happens while interrupts are disabled, but interrupts may still be running or pending. Also call iwl_pcie_synchronize_irqs() to ensure we won't reschedule the NAPI. Fixes: 4cf2f5904d97 ("iwlwifi: queue: avoid memory leak in reset flow") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20231017115047.a0f4104b479a.Id5c50a944f709092aa6256e32d8c63b2b8d8d3ac@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> drivers/net/wireless/intel/iwlwifi/pcie/trans-gen2.c | 1 + drivers/net/wireless/intel/iwlwifi/pcie/trans.c | 1 + 2 files changed, 2 insertions(+) --------------------------------------------------------------- I am currently running the final git-bisect kernel and everything is working as expected. Attaching full bisect.log.
Created attachment 875711 [details] Full bisect.log
Great job. There is a lot of work in this space and 6.6.3 will, of course, be worth testing. Can you attach the output of the following two commands: lspci -k, and iw event -t? Can you also recreate the problem with the following options set? CONFIG_MAC80211_HT_DEBUG=y CONFIG_MAC80211_VERBOSE_PS_DEBUG=y CONFIG_MAC80211_VERBOSE_DEBUG=y
Created attachment 875733 [details] Requested lspci -v
Created attachment 875734 [details] iw event -t Sanitized for SSID and MAC Addresses.
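
Since the attachment above was sanitized by hand, here is one possible way to mask MAC addresses in a saved capture before attaching it. It is a sketch under assumptions: sanitize_macs is a hypothetical helper, the capture is assumed to have been written to a file first, and SSIDs are not touched because they have no fixed pattern and still need replacing manually.

```shell
# Replace anything that looks like a MAC address (six hex pairs separated
# by colons) with a fixed placeholder. Reads stdin, writes stdout.
# sanitize_macs is an illustrative name, not an existing tool.
sanitize_macs() {
    sed -E 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g'
}

# Example use (file names are placeholders):
#   iw event -t > iw-event-raw.txt      # stop with Ctrl-C after the test
#   sanitize_macs < iw-event-raw.txt > iw-event-clean.txt
```

Capturing to a file first and filtering afterwards avoids losing buffered output when the capture is interrupted.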
Hey thanks! Pretty cool actually... I provided the requested output with the exception of: ------ Can you also recreate with the following set ? CONFIG_MAC80211_HT_DEBUG=y CONFIG_MAC80211_VERBOSE_PS_DEBUG=y CONFIG_MAC80211_VERBOSE_DEBUG=y ------ I'm a bit confused. Are we wanting a complete new kernel-bisect with those options turned on? OR Are we wanting 6.6.2 rebuilt with the debugging turned on and then re-do the requested lspci/iw event commands? I'm hoping for option 2 as it makes sense. Option 1 might kill my laptop. I'm pretty sure it is option 2 but want to make sure....
(In reply to linux.fanatic from comment #9) Sorry, I should have been more explicit. Can you recreate the error on a kernel with those options turned on? This is to potentially get more info. No new bisect needed.
No worries... I was pretty sure that's what you meant so I recompiled 6.6.2 with those options on already. Admittedly, I am not a great debugger so I may need guidance. I have already produced the problem and checked the logs and they are all the same as posted here, including the kernel log. However, I may not be looking at what you need. Again, not a lot of debugging experience :) Let me know how you would like to proceed.
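
If the logs look unchanged after a rebuild, one thing worth ruling out is that the debug options did not survive the configuration step, since options whose dependencies are not met are dropped without much notice. The following is a rough sketch for checking a .config; check_kconfig is a hypothetical helper, and whether all three symbols exist under exactly these names in a given kernel version is an assumption that is not confirmed here.

```shell
# Report, for each option name given without the CONFIG_ prefix,
# whether it is set to "y" in the given kernel config file.
# check_kconfig is an illustrative helper, not a kernel script.
check_kconfig() {
    cfg="$1"; shift
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=y\$" "$cfg"; then
            echo "CONFIG_${opt}=y"
        else
            echo "CONFIG_${opt} is not set to y"
        fi
    done
}

# Example use (path is a placeholder for the kernel source directory):
#   check_kconfig /usr/src/linux/.config \
#       MAC80211_HT_DEBUG MAC80211_VERBOSE_PS_DEBUG MAC80211_VERBOSE_DEBUG
```

An option reported as not set may simply not exist in that kernel, or may depend on another option that has to be enabled first.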
Quick note: I just tried for the first time using nm-applet to disable wireless, and it still works for softblocking and unblocking. I normally use the radio key on the laptop so it didn't even dawn on me. It doesn't change the issue but I should have mentioned/tried that days ago.
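
To make the hard-block versus soft-block distinction visible, the kernel exposes both flags per device under /sys/class/rfkill. The sketch below just prints them; rfkill_states is a hypothetical helper and the optional directory argument exists only so it can be pointed at a test directory.

```shell
# Print name, type, and soft/hard block flags for every rfkill device.
# Optional argument: base directory (defaults to the real sysfs location).
# rfkill_states is an illustrative helper, not an existing command.
rfkill_states() {
    base="${1:-/sys/class/rfkill}"
    for dev in "$base"/rfkill*; do
        [ -d "$dev" ] || continue
        printf '%s type=%s soft=%s hard=%s\n' \
            "$(cat "$dev/name")" "$(cat "$dev/type")" \
            "$(cat "$dev/soft")" "$(cat "$dev/hard")"
    done
}

# Example use:
#   rfkill_states
```

The rfkill tool itself (rfkill list, rfkill block wifi, rfkill unblock wifi) reports and changes the soft block; a hard block can only be changed by the physical switch.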
Hello, Moved to 6.6.3 today and the issue is still there, but it was worth checking. Just adding a quick note to show I tested.
See also: https://bugzilla.kernel.org/show_bug.cgi?id=218205
The bug has been closed via the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=21915e473788b8008e1574ae55601098f676c848 commit 21915e473788b8008e1574ae55601098f676c848 Author: Mike Pagano <mpagano@gentoo.org> AuthorDate: 2023-12-01 14:22:50 +0000 Commit: Mike Pagano <mpagano@gentoo.org> CommitDate: 2023-12-01 14:22:50 +0000 sys-kernel/gentoo-sources: Fix __randomize_layout crash in struct neighbour Closes: https://bugs.gentoo.org/918128 Signed-off-by: Mike Pagano <mpagano@gentoo.org> sys-kernel/gentoo-sources/Manifest | 3 +++ .../gentoo-sources/gentoo-sources-6.6.3-r1.ebuild | 28 ++++++++++++++++++++++ 2 files changed, 31 insertions(+) Additionally, it has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=b0d3fdd42bd0642c28185cf9474e9dc670cef1a0 commit b0d3fdd42bd0642c28185cf9474e9dc670cef1a0 Author: Mike Pagano <mpagano@gentoo.org> AuthorDate: 2023-12-01 14:22:28 +0000 Commit: Mike Pagano <mpagano@gentoo.org> CommitDate: 2023-12-01 14:22:28 +0000 sys-kernel/gentoo-sources: Fix __randomize_layout crash in struct neighbour Bug: https://bugs.gentoo.org/918128 Signed-off-by: Mike Pagano <mpagano@gentoo.org> sys-kernel/gentoo-sources/Manifest | 3 +++ .../gentoo-sources/gentoo-sources-6.5.13-r1.ebuild | 28 ++++++++++++++++++++++ 2 files changed, 31 insertions(+) https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c765287b2d7590750700f1198c1f8e203f0ae0ab commit c765287b2d7590750700f1198c1f8e203f0ae0ab Author: Mike Pagano <mpagano@gentoo.org> AuthorDate: 2023-12-01 14:22:07 +0000 Commit: Mike Pagano <mpagano@gentoo.org> CommitDate: 2023-12-01 14:22:07 +0000 sys-kernel/gentoo-sources: Fix __randomize_layout crash in struct neighbour Bug: https://bugs.gentoo.org/918128 Signed-off-by: Mike Pagano <mpagano@gentoo.org> sys-kernel/gentoo-sources/Manifest | 3 +++ .../gentoo-sources/gentoo-sources-6.1.64-r1.ebuild | 28 ++++++++++++++++++++++ 2 files changed, 31 insertions(+) 
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7114c836e4b0424592c994d30312c05c8ae450e8 commit 7114c836e4b0424592c994d30312c05c8ae450e8 Author: Mike Pagano <mpagano@gentoo.org> AuthorDate: 2023-12-01 14:21:09 +0000 Commit: Mike Pagano <mpagano@gentoo.org> CommitDate: 2023-12-01 14:21:09 +0000 sys-kernel/gentoo-sources: Fix __randomize_layout crash in struct neighbour Bug: https://bugs.gentoo.org/918128 Signed-off-by: Mike Pagano <mpagano@gentoo.org> sys-kernel/gentoo-sources/Manifest | 3 +++ .../gentoo-sources-5.15.140-r1.ebuild | 28 ++++++++++++++++++++++ 2 files changed, 31 insertions(+)
Sorry for the delay, was traveling this week. You should be ok with one of the -r1's above.
No Problem Mike, I just tried 6.6.3-r1 and unfortunately this did not fix my issue. :( I ran all tests we did before and the logs look identical. Same symptoms and same results. If you would like me to poke around further I can. I also have another laptop that I use more like a server running stable kernel (6.1.57). It has the exact same Wifi card, but it is a different model laptop. I have not had to update that kernel yet, but it has a "hardblocking" rfkill switch. Or I could just live with it. I bet there are others out there though as this Intel 7260 card is probably pretty popular. Typically works for Linux right out of the box.
thanks for reporting, reopening
Mr. Fanatic, Sorry to ask you to do this, as I see your kernel build times are quite high. Upstream has asked if you could test with the latest release sources from upstream after reverting the commit you found. You can do this a couple of ways; here's one. 1. Install the latest git-sources. 2. Revert the commit you found: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=37fb29bd1f90 You can do this easily by just deleting the two lines it adds, one in each of these files: drivers/net/wireless/intel/iwlwifi/pcie/trans-gen2.c and drivers/net/wireless/intel/iwlwifi/pcie/trans.c 3. Build / test. Does that make sense?
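
As an alternative to deleting the lines by hand, a commit that is available as a patch file can be reverse-applied. The sketch below assumes GNU patch and assumes the commit has already been saved to a file (for example with git format-patch -1 <commit> --stdout in a git checkout); revert_patch is a hypothetical helper name.

```shell
# Reverse-apply a patch to a source tree, but only if a dry run succeeds.
# Arguments: source directory, absolute path to the patch file.
# The patch path must be absolute because the function changes directory.
# revert_patch is an illustrative helper; -f stops GNU patch from prompting.
revert_patch() {
    src="$1"; patchfile="$2"
    ( cd "$src" &&
      patch -p1 -R -f --dry-run < "$patchfile" >/dev/null &&
      patch -p1 -R -f < "$patchfile" )
}

# Example use (paths are placeholders):
#   revert_patch /usr/src/linux /tmp/sync-irqs-before-napi.patch
```

In a git checkout, git revert with the commit id achieves the same thing and records the revert as a commit.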
Makes perfect sense, no worries. Already building and will get back when finished. sys-kernel/git-sources-6.7_rc3. For the new kernel changes, I just used the defaults to be safe. For the fun of it, the other day I tried copying the driver module itself from a good kernel to a bad one and that failed miserably, which was expected. Quit right there. I'm not worried about one compilation time, used to it. I was afraid you might ask for another bisect.... Fun, but @2hrs per adds up (would probably do it). I'll let you know a bit later.
Well, unfortunately, THAT WORKED!!! I can enable/disable the rfkill switch like I normally would. The iw event output is back to normal. I'm betting if I recompiled without reverting that commit, it would break again. Wondering why this didn't work in 6.6.3_r1? I just looked and it appears that 6.6.3_r1 does not have that commit reversed either. Interesting all the way around. I guess it validates we're looking at the right commit at least.... Let me know if you would like more info.
(In reply to linux.fanatic from comment #21) That should be good. Thanks ! I'll report to upstream
(In reply to Mike Pagano from comment #22) I'll also carry a revert patch in genpatches going forward.
(In reply to Mike Pagano from comment #23) Thanks Mike for the assistance! Quick question and no hurry, next time you are here: How and when will the final fixes be applied upstream and how will we know? I mentioned way above the other affected distros I run; just curious how that works. I think I'm going to manually revert and recompile back on gentoo-sources so as to get off the git-sources kernel, simply for normalcy here. HAGD!
(In reply to linux.fanatic from comment #24) Couple things. Hang in there, I've got the revert coming out in 6.6.4 gentoo-sources and other gentoo kernels that carry genpatches (dist-kernel, etc). I'll leave this bug open until the real fix comes down from upstream; that way you'll know when it's officially been fixed upstream, and others who encounter the same issue can find this bug more easily. As for other distros, this is the benefit of a rolling distro. We move fast and can do these quick fixes/updates since we don't have a few large releases. I'm not sure what other distros will do, as I've been on this one for 20 years.
The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=aafebe69e7ab3283de9e394b3657b1f421254fd0 commit aafebe69e7ab3283de9e394b3657b1f421254fd0 Author: Mike Pagano <mpagano@gentoo.org> AuthorDate: 2023-12-03 12:23:49 +0000 Commit: Mike Pagano <mpagano@gentoo.org> CommitDate: 2023-12-03 12:23:49 +0000 sys-kernel/gentoo-sources: add 6.6.4, iwlwifi revert Reverted: wifi: iwlwifi: pcie: synchronize IRQs before NAPI Bug: https://bugs.gentoo.org/918128 Signed-off-by: Mike Pagano <mpagano@gentoo.org> sys-kernel/gentoo-sources/Manifest | 3 +++ .../gentoo-sources/gentoo-sources-6.6.4.ebuild | 28 ++++++++++++++++++++++ 2 files changed, 31 insertions(+)
(In reply to Mike Pagano from comment #25) Yeah, understand. Arch and Void usually catch up pretty quick so I'll just see how this all works. When you run potato laptops, sometimes a binary distro is nice to have. I keep saying I need a new maxed out System76 (not an advert.), but keep failing to actually procure one. 6.6.4 works for me as expected, as does 6.6.3_r1 with the manual revert, for the record. I was already compiling 6.6.3 when I saw your note, and then updated and saw 6.6.4, HA! I always keep the last kernel as my backup so it was not in vain...
Created attachment 877977 [details, diff] Upstream patch Would you consider applying this patch to a broken kernel to see if it works? It's from the upstream developer.
(In reply to Mike Pagano from comment #28) Sure can. I think I'd go back to sys-kernel/gentoo-sources-6.6.2. Is there a proper method to do this or just vimdiff it? I've done a couple patches via portage, but few. I don't want to do it incorrectly and have to re-compile. I don't necessarily need explicit instructions, just a "hint" to get me going... Thx!
Ignore the vimdiff part, as I don't have a new file to diff. I think I'll trust the Portage patching method, as the patch is already created.
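
For reference, the Portage patching method mentioned here usually means dropping the patch into the user patches directory and re-emerging the package. A minimal sketch follows, assuming the package's ebuild applies user patches and the patch applies at -p1; install_user_patch is a hypothetical helper and its third argument exists only so the target directory can be overridden for testing.

```shell
# Copy a patch into Portage's user patch directory for a package.
# Arguments: patch file, package (default sys-kernel/gentoo-sources),
# patches root (default /etc/portage/patches).
# install_user_patch is an illustrative helper, not a Portage tool.
install_user_patch() {
    patchfile="$1"
    pkg="${2:-sys-kernel/gentoo-sources}"
    root="${3:-/etc/portage/patches}"
    mkdir -p "$root/$pkg" && cp "$patchfile" "$root/$pkg/"
}

# Example use (as root), followed by re-installing the sources:
#   install_user_patch iwlwifi-rfkill-fix.patch
#   emerge --oneshot sys-kernel/gentoo-sources
```

The patch is only applied when the sources are unpacked again, so sources that are already installed stay unpatched until the package is re-emerged.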
For ease of life, you can just install the kernel sources like normal, go into the directory and type: patch -p1 < iwlwifi-rfkill-fix.patch
(In reply to Mike Pagano from comment #31) Well, I messed that up. I think I needed to re-emerge, now that I think of it. Got ahead of myself. At any rate, I am using your advice and re-compiling. Will follow up later.
Well, being a bit hard headed, I wanted to try again with Portage applying the patch to verify what I did wrong. Fortunately, it appears to have worked properly, as I am on sys-kernel/gentoo-sources-6.6.3 with the patches applied (changed mind on version). Everything is working properly! I manually verified a couple of lines of code visually to make sure, and it looks good to me. If you would like more substantive verification, I would need to know a better way to do that, however (like is it listed somewhere I'm not looking?). Let me know!
(In reply to linux.fanatic from comment #33) Nope, that's perfect. Thanks for your help!
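
On the question of checking more rigorously whether a patch is present in a source tree: one common trick is to see whether the patch reverse-applies cleanly in a dry run. The sketch below assumes GNU patch; patch_is_applied is a hypothetical helper, and a patch that has only partly been applied will be reported as not applied.

```shell
# Return 0 if the patch appears to be applied to the tree, non-zero otherwise.
# Arguments: source directory, absolute path to the patch file.
# Nothing is modified: the reverse application is a dry run only.
# patch_is_applied is an illustrative helper; -f suppresses prompts.
patch_is_applied() {
    src="$1"; patchfile="$2"
    ( cd "$src" &&
      patch -p1 -R -f -s --dry-run < "$patchfile" >/dev/null 2>&1 )
}

# Example use (paths are placeholders):
#   if patch_is_applied /usr/src/linux /tmp/iwlwifi-rfkill-fix.patch; then
#       echo "patch is present"
#   else
#       echo "patch is missing"
#   fi
```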
The bug has been closed via the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4f4fe7eee836bd7bc882053c8f8f3769693baf74 commit 4f4fe7eee836bd7bc882053c8f8f3769693baf74 Author: Mike Pagano <mpagano@gentoo.org> AuthorDate: 2023-12-08 12:37:07 +0000 Commit: Mike Pagano <mpagano@gentoo.org> CommitDate: 2023-12-08 12:37:07 +0000 sys-kernel/gentoo-sources: add 6.6.5, iwlwifi fix and revert Remove redundant patch 2010_Fix_randomize_layout_crash_in_struct_neigh.patch Remove revert, add upstream proposed fix 2400_rvrt-iwlwifi-pcie-sycn-IRQs-before-NAPI.patch 2410_iwlwifi-rfkill-fix.patch Closes: https://bugs.gentoo.org/918128 Signed-off-by: Mike Pagano <mpagano@gentoo.org> sys-kernel/gentoo-sources/Manifest | 3 +++ .../gentoo-sources/gentoo-sources-6.6.5.ebuild | 28 ++++++++++++++++++++++ 2 files changed, 31 insertions(+)
Emerged 6.6.5 on two laptops. Looks good on the laptop I was having issues with. I hate to bring this up, but a different laptop I have now has the same issue. It just showed up in this kernel. It is a soft blocking rfkill switch this time, with the exact same symptoms as previously reported on the hard blocking rfkill laptop. This is not a continuation of this bug: FYI only, as it is totally different hardware/driver, etc. I will live with it on this one, as it's really a spare/testing machine that has no business running Gentoo. Starting to wonder why this is all happening though. Are older machines getting obsoleted? (I did hear about some pruning going on, but that was supposed to be on really ancient hardware.) Thanks for your help on this bug, though. Here if needed.
Last entry for documentation (I think): Upgraded laptop running stable gentoo-sources kernel 6.1.57 to 6.1.66. Laptop is different model than initial laptop with the hardblock rfkill switch, but has a hardblocking rfkill switch and same Intel wifi card nonetheless. 6.1.57 had no issues prior to upgrade. 6.1.66 stopped working properly. Verified that patch (above) hadn't been applied. Manually applied patch shown above to 6.1.66 and recompiled. Laptop rfkill switch works like it should again.