I noticed my PC speaker beeping every 10-15 seconds. On investigating with dmesg, I found the message from the title in the kernel log. Here is the output from /var/log/syslog:

Apr 27 05:02:57 server kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 15098s! [WEB[28]:29220]
Apr 27 05:02:57 server kernel: Modules linked in: macvtap macvlan rfcomm udp_diag tcp_diag inet_diag nvidia_uvm(POE) nf_conntrack_netlink xt_nat veth ip6table_nat ip6table_filter ip6_tables xt_set ip_set nfnetlink overlay nls_utf8 algif_hash algif_skcipher af_alg bnep hid_logitech_hidpp hid_logitech_dj cfg80211 8021q garp mrp iptable_raw xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter xt_MASQUERADE xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables sg vhost_net vhost vhost_iotlb tap snd_usb_audio snd_usbmidi_lib joydev snd_rawmidi snd_seq_device mc snd_ctl_led cdc_acm nvidia_drm(POE) nvidia_modeset(POE) dm_multipath dm_mod btusb btrtl btintel btbcm amd_atl intel_rapl_msr uas snd_hda_codec_realtek nct6775 intel_rapl_common nvidia(POE) usb_storage bluetooth nct6775_core snd_hda_codec_generic snd_hda_codec_hdmi hwmon_vid snd_hda_scodec_component edac_mce_amd snd_hda_intel snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec sha512_ssse3 sha256_ssse3 snd_hda_core eeepc_wmi
Apr 27 05:02:57 server kernel: asus_wmi sha1_ssse3 snd_hwdep sparse_keymap rapl sp5100_tco platform_profile snd_pcm rfkill i2c_piix4 snd_timer ccp snd efi_pstore wmi_bmof pcspkr i2c_smbus soundcore mac_hid efivarfs sd_mod xhci_pci nvme ahci xhci_hcd nvme_core libahci aesni_intel crypto_simd cryptd
Apr 27 05:02:57 server kernel: CPU: 7 UID: 201 PID: 29220 Comm: WEB[28] Tainted: P OEL 6.14.4-gentoo-x86_64 #1
Apr 27 05:02:57 server kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [L]=SOFTLOCKUP
Apr 27 05:02:57 server kernel: Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS (WI-FI), BIOS 5021 09/29/2024
Apr 27 05:02:57 server kernel: RIP: 0010:_raw_write_unlock_irq+0xe/0x40
Apr 27 05:02:57 server kernel: Code: c3 65 e3 fe 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 fb 0f 1f 44 00 00 <bf> 01 00 00 00 e8 58 3b 11 ff 65 8b 05 c9 74 d6 60 85 c0 74 05 e9
Apr 27 05:02:57 server kernel: RSP: 0018:ffff9d011138be00 EFLAGS: 00000202
Apr 27 05:02:57 server kernel: RAX: 0000000000000001 RBX: ffff897801c16710 RCX: ffff897801c166e8
Apr 27 05:02:57 server kernel: RDX: ffff897801c166e8 RSI: ffff897801c166e8 RDI: ffff897801c16720
Apr 27 05:02:57 server kernel: RBP: ffff897812478000 R08: 0000000000000008 R09: 0000000000000b90
Apr 27 05:02:57 server kernel: R10: 0000000000000000 R11: ffff8996ae5b488c R12: ffff897801c166c0
Apr 27 05:02:57 server kernel: R13: ffff9d011138be88 R14: ffff897801c166e8 R15: ffff897801c16720
Apr 27 05:02:57 server kernel: FS: 00007f832fdd46c0(0000) GS:ffff8996ae580000(0000) knlGS:0000000000000000
Apr 27 05:02:57 server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 27 05:02:57 server kernel: CR2: 0000748b28000020 CR3: 0000000979ea0000 CR4: 0000000000750ef0
Apr 27 05:02:57 server kernel: PKRU: 55555554
Apr 27 05:02:57 server kernel: Call Trace:
Apr 27 05:02:57 server kernel: <TASK>
Apr 27 05:02:57 server kernel: do_epoll_wait+0x6a3/0x850
Apr 27 05:02:57 server kernel: ? __pfx_ep_autoremove_wake_function+0x10/0x10
Apr 27 05:02:57 server kernel: __x64_sys_epoll_wait+0x5b/0xf0
Apr 27 05:02:57 server kernel: do_syscall_64+0x62/0x180
Apr 27 05:02:57 server kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 27 05:02:57 server kernel: RIP: 0033:0x7f83b55f5ee6
Apr 27 05:02:57 server kernel: Code: 10 89 7c 24 0c 89 4c 24 1c e8 56 c9 f7 ff 44 8b 54 24 1c 8b 54 24 18 41 89 c0 48 8b 74 24 10 8b 7c 24 0c b8 e8 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 a6 c9 f7 ff 8b 44
Apr 27 05:02:57 server kernel: RSP: 002b:00007f832fdd18b0 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
Apr 27 05:02:57 server kernel: RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f83b55f5ee6
Apr 27 05:02:57 server kernel: RDX: 0000000000000064 RSI: 00005607a7f4e244 RDI: 0000000000000204
Apr 27 05:02:57 server kernel: RBP: 00007f832fdd1b00 R08: 0000000000000000 R09: 00005607a7f4e240
Apr 27 05:02:57 server kernel: R10: 0000000000000064 R11: 0000000000000293 R12: 0000000000000002
Apr 27 05:02:57 server kernel: R13: 000000000667d4d9 R14: 0000000000000000 R15: 0000000000002000
Apr 27 05:02:57 server kernel: </TASK>
Apr 27 05:03:09 server kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 15672s! [WEB[5]:29173]
Apr 27 05:03:09 server kernel: Modules linked in: macvtap macvlan rfcomm udp_diag tcp_diag inet_diag nvidia_uvm(POE) nf_conntrack_netlink xt_nat veth ip6table_nat ip6table_filter ip6_tables xt_set ip_set nfnetlink overlay nls_utf8 algif_hash algif_skcipher af_alg bnep hid_logitech_hidpp hid_logitech_dj cfg80211 8021q garp mrp iptable_raw xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 xt_conntrack iptable_filter xt_MASQUERADE xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables sg vhost_net vhost vhost_iotlb tap snd_usb_audio snd_usbmidi_lib joydev snd_rawmidi snd_seq_device mc snd_ctl_led cdc_acm nvidia_drm(POE) nvidia_modeset(POE) dm_multipath dm_mod btusb btrtl btintel btbcm amd_atl intel_rapl_msr uas snd_hda_codec_realtek nct6775 intel_rapl_common nvidia(POE) usb_storage bluetooth nct6775_core snd_hda_codec_generic snd_hda_codec_hdmi hwmon_vid snd_hda_scodec_component edac_mce_amd snd_hda_intel snd_intel_dspcfg ghash_clmulni_intel snd_hda_codec sha512_ssse3 sha256_ssse3 snd_hda_core eeepc_wmi
Apr 27 05:03:09 server kernel: asus_wmi sha1_ssse3 snd_hwdep sparse_keymap rapl sp5100_tco platform_profile snd_pcm rfkill i2c_piix4 snd_timer ccp snd efi_pstore wmi_bmof pcspkr i2c_smbus soundcore mac_hid efivarfs sd_mod xhci_pci nvme ahci xhci_hcd nvme_core libahci aesni_intel crypto_simd cryptd
Apr 27 05:03:09 server kernel: CPU: 1 UID: 201 PID: 29173 Comm: WEB[5] Tainted: P OEL 6.14.4-gentoo-x86_64 #1
Apr 27 05:03:09 server kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [L]=SOFTLOCKUP
Apr 27 05:03:09 server kernel: Hardware name: System manufacturer System Product Name/TUF GAMING X570-PLUS (WI-FI), BIOS 5021 09/29/2024
Apr 27 05:03:09 server kernel: RIP: 0010:srso_alias_safe_ret+0x0/0x7
Apr 27 05:03:09 server kernel: Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <48> 8d 64 24 08 c3 cc e8 f4 ff ff ff 0f 0b cc cc cc cc cc cc cc cc
Apr 27 05:03:09 server kernel: RSP: 0018:ffff9d0117fb7df8 EFLAGS: 00000246
Apr 27 05:03:09 server kernel: RAX: 0000000000000000 RBX: ffff897a319a1a90 RCX: 0000000000000000
Apr 27 05:03:09 server kernel: RDX: 0000000000000001 RSI: ffff9d0117fb7e50 RDI: 00000000ffffffff
Apr 27 05:03:09 server kernel: RBP: ffff9d0117fb7e38 R08: 0000000000000008 R09: 0000000000000001
Apr 27 05:03:09 server kernel: R10: 0000000000000000 R11: ffff8996ae2b488c R12: ffff897a319a1a40
Apr 27 05:03:09 server kernel: R13: ffff9d0117fb7e88 R14: ffff9d0117fb7e50 R15: ffff9d0117fb7e38
Apr 27 05:03:09 server kernel: FS: 00007f8344dfe6c0(0000) GS:ffff8996ae280000(0000) knlGS:0000000000000000
Apr 27 05:03:09 server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 27 05:03:09 server kernel: CR2: 0000722e71b0b030 CR3: 0000000979ea0000 CR4: 0000000000750ef0
Apr 27 05:03:09 server kernel: PKRU: 55555554
Apr 27 05:03:09 server kernel: Call Trace:
Apr 27 05:03:09 server kernel: <TASK>
Apr 27 05:03:09 server kernel: srso_alias_return_thunk+0x5/0xfbef5

Reproducible: Always

Steps to Reproduce:
1. Install sys-kernel/gentoo-sources-6.14.4
2. Reboot into sys-kernel/gentoo-sources-6.14.4
3. Wait

Actual Results:
The message from the Summary and Description starts appearing in the logs repeatedly, every 10-15 seconds, and a single CPU core is fully occupied. The first affected application was Netdata (running in Docker). After I stopped it, Portainer's agent took up a full CPU core and the messages started again later.

Expected Results:
No soft lockup messages.

I was able to resolve this by reverting to sys-kernel/gentoo-sources-6.14.3.

The following will be added as attachments:
- lshw
- kernel config
- emerge --info (including it inline failed with an error on submission that it was too long)
Created attachment 926355: emerge --info
Created attachment 926356: lshw
Created attachment 926357: kernel config (same config as used for 6.14.4; I diff the config on each kernel update)
Created attachment 926358: docker system info (the only change is the kernel, back to 6.14.3)
Can you do a git bisect between 6.14.3 and 6.14.4?
(In reply to Mike Pagano from comment #5)
> Can you do a git bisect between 6.14.3 and 6.14.4 ?

Hi Mike, I should be able to work on bisecting tomorrow. I found instructions for how to do this in the wiki.
I expect to have bisect results in the next couple of hours. The soft lockup message appears within 15 minutes of boot, so when no message is shown I wait 30+ minutes before considering that iteration good (kernel looks good) and continuing.
Created attachment 926997: git bisect results
git bisect results attached.

The error typically started showing within 15 minutes of booting. During this time two VMs are started (KVM/QEMU), one Arch and the other Windows 11. The Windows 11 VM shuts back down before the error occurs, but I'm unsure whether that is related.

The first step was "bad"; all remaining steps were "good". Whenever a step was good, I waited 30+ minutes to make sure the error didn't occur before marking it good, recompiling, and rebooting again.
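For readers unfamiliar with bisecting: git bisect is a binary search over the commits between a known-good and a known-bad release, so each good/bad verdict (here, a build, a reboot, and a 30+ minute wait) roughly halves the remaining candidates. The C sketch below models only the simplified case of a linear history. The predicate type `is_bad_fn`, the demo predicate `bad_from`, and the function `first_bad` are hypothetical names for illustration; git's real implementation handles merge commits and chooses midpoints differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test predicate: returns true if the build at commit index i
 * is "bad" (shows the soft lockup). In the real bisect this step is
 * build + reboot + wait 30+ minutes. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/* Binary search for the first bad commit among commits 0..n-1, assuming a
 * linear history where commits before the culprit are good, the culprit and
 * everything after it are bad, and commit n-1 is known bad. The number of
 * predicate calls (test builds) is stored in *steps. */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx, size_t *steps)
{
    assert(n > 0);
    size_t lo = 0, hi = n - 1;          /* invariant: first bad is in [lo, hi] */
    *steps = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*steps;
        if (is_bad(mid, ctx))
            hi = mid;                   /* culprit is mid or earlier */
        else
            lo = mid + 1;               /* culprit is after mid */
    }
    return lo;
}

/* Demo predicate for illustration only: every index >= *ctx is bad. */
static bool bad_from(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

With 8 candidate commits and the culprit at index 3, `first_bad(8, bad_from, &culprit, &steps)` returns 3 after 3 test builds. The step count grows with log2 of the range size, so a range of about 1,000 commits would need roughly 10 reboot cycles.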
(In reply to Josh Solanes from comment #9)
> git bisect attached

Great job, and thank you. I talked to the author and he asked if you could test with this patch:

https://lore.kernel.org/linux-fsdevel/aA-xutxtw3jd00Bz@LQ3V64L9R2/
Any time, Mike! I'll be able to test tomorrow morning.
(In reply to Mike Pagano from comment #10)
> I talked to the author and he asked if you could test with this patch:
>
> https://lore.kernel.org/linux-fsdevel/aA-xutxtw3jd00Bz@LQ3V64L9R2/

I'm having trouble spotting a patch file in the thread. I tried creating one with this:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4bc264b854c4..1a5d1147f082 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2111,7 +2111,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,

 		write_unlock_irq(&ep->lock);

-		if (!eavail && ep_schedule_timeout(to))
+		if (!ep_schedule_timeout(to))
+			timed_out = 1;
+		else if (!eavail)
 			timed_out = !schedule_hrtimeout_range(to, slack,
 							     HRTIMER_MODE_ABS);
 		__set_current_state(TASK_RUNNING);

...but I get this message when applying it:

server /usr/src/linux-stable # patch -p1 < patch.c
patching file fs/eventpoll.c
Hunk #1 FAILED at 2111.
1 out of 1 hunk FAILED -- saving rejects to file fs/eventpoll.c.rej

I haven't tried patching a Linux kernel before, so it's very likely something I'm doing wrong.
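To illustrate what the proposed change does, here is a simplified C model of the wait decision in `ep_poll()`. It assumes, based on the hunk above, that `ep_schedule_timeout(to)` reports whether the deadline is still in the future, and that the retry loop only exits via `timed_out` when no events are delivered. The function names and the bounded loop are hypothetical stand-ins, not kernel code. Under those assumptions, the old condition skips the sleep for an already-expired deadline but never sets `timed_out`, so the loop can keep going round without sleeping, which would be consistent with a CPU stuck in `do_epoll_wait`; the patched condition treats an expired deadline as a timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the wait decision in ep_poll(), not kernel code.
 *   eavail          - events were already available
 *   deadline_ahead  - stand-in for ep_schedule_timeout(to), assumed true
 *                     when the deadline has not passed yet
 *   sleep_timed_out - stand-in for !schedule_hrtimeout_range(...) */
static bool timed_out_before_fix(bool eavail, bool deadline_ahead,
                                 bool sleep_timed_out)
{
    bool timed_out = false;
    if (!eavail && deadline_ahead)
        timed_out = sleep_timed_out;
    /* Expired deadline: no sleep, and timed_out stays false. */
    return timed_out;
}

static bool timed_out_after_fix(bool eavail, bool deadline_ahead,
                                bool sleep_timed_out)
{
    bool timed_out = false;
    if (!deadline_ahead)
        timed_out = true;               /* expired deadline counts as a timeout */
    else if (!eavail)
        timed_out = sleep_timed_out;
    return timed_out;
}

/* Bounded model of the retry loop when the deadline has already passed
 * and no events are ever delivered. Returns the iteration on which the
 * loop exits, or -1 if it is still spinning after max_iters iterations. */
static int iterations_until_exit(bool fixed, int max_iters)
{
    assert(max_iters > 0);
    bool timed_out = false;
    for (int i = 1; i <= max_iters; i++) {
        if (timed_out)
            return i;                   /* models the early return on timeout */
        timed_out = fixed ? timed_out_after_fix(false, false, false)
                          : timed_out_before_fix(false, false, false);
    }
    return -1;
}
```

In this model, `iterations_until_exit(false, 1000)` returns -1 (still spinning), while `iterations_until_exit(true, 1000)` exits on the second iteration. When the deadline is still ahead, both versions produce the same `timed_out` value, so the change only affects the expired-deadline case.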
(In reply to Josh Solanes from comment #12)
> I haven't attempted to patch a linux kernel so it's very likely something
> I'm doing wrong

That's the one, can you just edit the file by hand and compile/install since it's such a small change?
(In reply to Mike Pagano from comment #13)
> That's the one, can you just edit the file by hand and compile/install since
> it's such a small change?

Sure, working on it now; I should start testing soon. Thanks Mike!
(In reply to Mike Pagano from comment #13)
> That's the one, can you just edit the file by hand and compile/install since
> it's such a small change?

Looks good now! 1h34m and no issues on the modified kernel. Thank you!
(In reply to Josh Solanes from comment #15)
> Looks good now! 1h34m and no issues on the modified kernel. Thank you!

Fantastic, I'll let the upstream author know.
Queued up for the next release (included with 6.14.5)