Bug 706572 - sys-kernel/gentoo-sources-4.19.86 - WARNING: CPU: 1 PID: 20926 at net/ipv4/tcp_output.c:911 tcp_wfree+0x29/0xe2
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://lkml.org/lkml/2020/2/24/130
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-01-26 22:10 UTC by Vieri
Modified: 2021-02-16 10:12 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
kernel syslog (kernel_log.txt,26.46 KB, text/plain)
2020-01-26 22:10 UTC, Vieri
kernel .config (config,127.59 KB, text/plain)
2020-02-04 11:07 UTC, Vieri
kernel syslog (syslog.txt,81.01 KB, text/plain)
2020-02-04 12:51 UTC, Vieri

Description Vieri 2020-01-26 22:10:34 UTC
Created attachment 605028 [details]
kernel syslog

The same hardware ran older gentoo-sources for years without issues. I recently upgraded to 4.19.86-gentoo-x86_64, and after a week or so I got a kernel panic and a system freeze.
It seems to be network-related.

Kernel panic log:

[see attached file]

Even when not reaching system freeze, I see the following messages in syslog every now and then:

Jan 26 17:21:34 kernel: ------------[ cut here ]------------
Jan 26 17:21:34 kernel: WARNING: CPU: 1 PID: 20926 at net/ipv4/tcp_output.c:911 tcp_wfree+0x29/0xe2
Jan 26 17:21:34 kernel: Modules linked in: arc4 ecb md4 sha512_ssse3 sha512_generic cmac cifs ccm fscache nfnetlink_queue autofs4 xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_statistic xt_connmark xt_TARPIT(O) xt_comment xt_iprange xt_geoip(O) xt_set xt_NFQUEUE ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark xt_TCPMSS xt_hashlimit xt_tcpudp xt_CT xt_multiport nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp pppoe pppox
Jan 26 17:21:34 kernel:  ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port ip_set_hash_net ip_set_hash_ip ip_set nfnetlink l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel ip6table_filter ip6_tables sha256_ssse3 sha256_generic mcryptd sha1_ssse3 sha1_generic ipv6 arptable_filter arp_tables xt_iface(O) xt_conntrack iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_filter ip_tables x_tables sch_fq_codel bpfilter sch_fq snd_hda_codec_analog snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_pcm k8temp snd_timer parport_pc floppy ohci_pci parport fan snd asus_atk0110 ohci_hcd soundcore thermal ehci_pci button ehci_hcd ata_generic i2c_nforce2 pata_amd pata_acpi msdos configfs fuse f2fs jfs btrfs zstd_decompress zstd_compress xxhash lzo_compress
Jan 26 17:21:34 kernel:  zlib_deflate sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise ata_piix ahci libahci libata nvme nvme_core virtio_crypto crypto_engine virtio_pci virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio
Jan 26 17:21:34 kernel: CPU: 1 PID: 20926 Comm: W#01 Tainted: G           O      4.19.86-gentoo-x86_64 #1
Jan 26 17:21:34 kernel: Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 5001 03/23/2010
Jan 26 17:21:34 kernel: RIP: 0010:tcp_wfree+0x29/0xe2
Jan 26 17:21:34 kernel: Code: c3 55 53 8b 87 e0 00 00 00 48 8b 6f 18 ff c8 f0 29 85 44 01 00 00 0f 88 0f 4e 08 00 75 0e 48 c7 c7 83 3b d8 81 e8 b6 30 9c ff <0f> 0b 8b 85 44 01 00 00 3d 40 02 00 00 76 1a 65 48 8b 05 cc a8 95
Jan 26 17:21:34 kernel: RSP: 0000:ffff88811fc83ee8 EFLAGS: 00010246
Jan 26 17:21:34 kernel: RAX: 0000000000000024 RBX: ffff88807ffceee8 RCX: 0000000000000000
Jan 26 17:21:34 kernel: RDX: 0000000000000000 RSI: ffff88811fc952d8 RDI: ffff88811fc952d8
Jan 26 17:21:34 kernel: RBP: ffff8880376e2600 R08: 0000000000000001 R09: 0000000000009c00
Jan 26 17:21:34 kernel: R10: 0000000000000000 R11: 0000000000000044 R12: ffff88807ffceee8
Jan 26 17:21:34 kernel: R13: 0000000000000000 R14: 0000000000000002 R15: 0000000000000002
Jan 26 17:21:34 kernel: FS:  00007f35b6f15700(0000) GS:ffff88811fc80000(0000) knlGS:0000000000000000
Jan 26 17:21:34 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 26 17:21:34 kernel: CR2: 000055ba3dcdf848 CR3: 000000010f2dc000 CR4: 00000000000006e0
Jan 26 17:21:34 kernel: Call Trace:
Jan 26 17:21:34 kernel:  <IRQ>
Jan 26 17:21:34 kernel:  skb_release_head_state+0x74/0xa4
Jan 26 17:21:34 kernel:  skb_release_all+0xa/0x20
Jan 26 17:21:34 kernel:  __kfree_skb+0xa/0x14
Jan 26 17:21:34 kernel:  net_tx_action+0xff/0x1bc
Jan 26 17:21:34 kernel:  __do_softirq+0x114/0x267
Jan 26 17:21:34 kernel:  irq_exit+0x58/0x64
Jan 26 17:21:34 kernel:  do_IRQ+0xaa/0xc8
Jan 26 17:21:34 kernel:  common_interrupt+0xf/0xf
Jan 26 17:21:34 kernel:  </IRQ>
Jan 26 17:21:34 kernel: RIP: 0033:0x564f07ea47e7
Jan 26 17:21:34 kernel: Code: 49 89 fc 55 48 89 cd 53 48 89 d3 4c 8b b2 a0 00 00 00 eb 12 0f 1f 80 00 00 00 00 41 80 7e 01 00 75 41 49 83 c6 10 41 0f b6 16 <49> 8b 4e 08 48 89 ee 4c 89 e7 48 8d 04 d5 00 00 00 00 48 29 d0 48
Jan 26 17:21:34 kernel: RSP: 002b:00007f35b6f13fb0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffd7
Jan 26 17:21:34 kernel: RAX: 0000564f093d71f0 RBX: 0000564f093d6880 RCX: 00007f359c009bc0
Jan 26 17:21:34 kernel: RDX: 0000000000000004 RSI: 0000564f0ffc7f90 RDI: 00007f35940d3110
Jan 26 17:21:34 kernel: RBP: 00007f359c009bc0 R08: 00007f35b6f14100 R09: 00007f35b6f14100
Jan 26 17:21:34 kernel: R10: 0000000001080007 R11: 0000000000005885 R12: 00007f35940d3110
Jan 26 17:21:34 kernel: R13: 0000564f08319400 R14: 0000564f0ffc7f50 R15: 00007f3574328aa0
Jan 26 17:21:34 kernel: ---[ end trace f5d35299bace3ecb ]---

It seems to be related to IRQs and networking.
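
For anyone tracking how often these warnings recur, the following is a minimal sketch (not part of the original report) that tallies kernel WARNING lines by source location and function. The function name `summarize_warnings` and the `sample` list are hypothetical; the regular expression assumes the line format shown in the trace above.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: WARNING: CPU: 1 PID: 20926 at net/ipv4/tcp_output.c:911 tcp_wfree+0x29/0xe2"
# Assumption: the "WARNING: CPU: N PID: N at file:line func+off/len" layout seen in this report.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[^\s+]+)\+"
)


def summarize_warnings(lines):
    """Count WARNING lines, keyed by (source file, line number, function)."""
    counts = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            key = (match["file"], int(match["line"]), match["func"])
            counts[key] += 1
    return counts


# Hypothetical input: three lines copied from the trace in this report.
sample = [
    "Jan 26 17:21:34 kernel: ------------[ cut here ]------------",
    "Jan 26 17:21:34 kernel: WARNING: CPU: 1 PID: 20926 at net/ipv4/tcp_output.c:911 tcp_wfree+0x29/0xe2",
    "Jan 26 17:21:34 kernel: RIP: 0010:tcp_wfree+0x29/0xe2",
]

if __name__ == "__main__":
    for (path, lineno, func), n in summarize_warnings(sample).items():
        print(f"{n} x {func} at {path}:{lineno}")
```

To run it against a real log, pass an open file object instead of `sample`, since the function only iterates over lines.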

Is this a bug?
I haven't tried vanilla-sources, which is why I'm filing a report here and not on the kernel mailing list.

Thanks
Comment 1 Mike Pagano gentoo-dev 2020-01-26 22:34:25 UTC
This could be fixed in later kernels.
I would try upgrading to the latest gentoo-sources, which is 4.19.98 as of this writing.

You could also try the vanilla version of this kernel.

Let us know if the oops still happens.
Comment 2 Jeroen Roovers (RETIRED) gentoo-dev 2020-01-27 00:35:23 UTC
A kernel panic halts the kernel. Your kernel does not halt. There is no panic.
Comment 3 Vieri 2020-01-27 10:01:33 UTC
(In reply to Jeroen Roovers from comment #2)
> A kernel panic halts the kernel. Your kernel does not halt. There is no
> panic.

Why do you say that?
As clearly reported in my first post, my system halted because there was a kernel panic. The system was useless. It did not respond to anything. I had to hard-reboot it.

The attached file shows the log just before the kernel halted.

The other log snippet shows up once in a while (at irregular intervals), but it does not halt the system. However, it shows that there's something to worry about, and it also *seems* related to the kernel panic I experienced (related to network IRQs).

Anyway, I am currently in the process of updating to the latest stable gentoo-sources.

In any case, this bug report *is* about a kernel panic.
Comment 4 Vieri 2020-01-28 17:18:22 UTC
This morning I rebooted the system with the new kernel. So this is day 1, and I've already spotted 2 glitches:

Jan 28 14:05:50 kernel: ------------[ cut here ]------------
Jan 28 14:05:50 kernel: WARNING: CPU: 0 PID: 5410 at net/ipv4/tcp_output.c:915 tcp_wfree+0x29/0xe2
Jan 28 14:05:50 kernel: Modules linked in: arc4 ecb md4 sha512_ssse3 sha512_generic cmac cifs ccm fscache nfnetlink_queue autofs4 xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_statistic xt_connmark xt_TARPIT(O) xt_comment xt_iprange xt_geoip(O) xt_set xt_NFQUEUE ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark xt_TCPMSS xt_hashlimit xt_tcpudp xt_CT xt_multiport nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp pppoe pppox
Jan 28 14:05:50 kernel:  ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port ip_set_hash_net ip_set_hash_ip ip_set nfnetlink l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel ip6table_filter ip6_tables sha256_ssse3 sha256_generic mcryptd sha1_ssse3 sha1_generic ipv6 arptable_filter arp_tables xt_iface(O) xt_conntrack iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_filter ip_tables x_tables bpfilter sch_fq_codel sch_fq snd_hda_codec_analog snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_pcm snd_timer snd k8temp ohci_pci parport_pc soundcore floppy ohci_hcd parport asus_atk0110 thermal ehci_pci fan ehci_hcd button i2c_nforce2 ata_generic pata_amd pata_acpi msdos configfs fuse f2fs jfs btrfs zstd_decompress zstd_compress xxhash lzo_compress
Jan 28 14:05:50 kernel:  zlib_deflate sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise ata_piix ahci libahci libata nvme nvme_core virtio_crypto crypto_engine virtio_pci virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio
Jan 28 14:05:50 kernel: CPU: 0 PID: 5410 Comm: proftpd Tainted: G           O      4.19.97-gentoo-x86_64 #1
Jan 28 14:05:50 kernel: Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 5001 03/23/2010
Jan 28 14:05:50 kernel: RIP: 0010:tcp_wfree+0x29/0xe2
Jan 28 14:05:50 kernel: Code: c3 55 53 8b 87 e0 00 00 00 48 8b 6f 18 ff c8 f0 29 85 44 01 00 00 0f 88 b5 4e 08 00 75 0e 48 c7 c7 c3 3e d8 81 e8 2c 22 9c ff <0f> 0b 8b 85 44 01 00 00 3d 40 02 00 00 76 1a 65 48 8b 05 9a 99 95
Jan 28 14:05:50 kernel: RSP: 0000:ffff88811fc03df0 EFLAGS: 00010246
Jan 28 14:05:50 kernel: RAX: 0000000000000024 RBX: ffff88805c4ddc00 RCX: 0000000000000000
Jan 28 14:05:50 kernel: RDX: 0000000000000000 RSI: ffff88811fc152d8 RDI: ffff88811fc152d8
Jan 28 14:05:50 kernel: RBP: ffff888005e9f440 R08: 0000000000000001 R09: 000000000000fc00
Jan 28 14:05:50 kernel: R10: 0000000000000000 R11: 0000000000000044 R12: 0000000000000000
Jan 28 14:05:50 kernel: R13: ffff88811af107c0 R14: 000000000000003e R15: ffff88811af10000
Jan 28 14:05:50 kernel: FS:  00007f99b34c7740(0000) GS:ffff88811fc00000(0000) knlGS:0000000000000000
Jan 28 14:05:50 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 28 14:05:50 kernel: CR2: 00007f99b3d20000 CR3: 0000000074504000 CR4: 00000000000006f0
Jan 28 14:05:50 kernel: Call Trace:
Jan 28 14:05:50 kernel:  <IRQ>
Jan 28 14:05:50 kernel:  skb_release_head_state+0x74/0xa4
Jan 28 14:05:50 kernel:  skb_release_all+0xa/0x20
Jan 28 14:05:50 kernel:  __kfree_skb+0xa/0x14
Jan 28 14:05:50 kernel:  e1000_put_txbuf+0x73/0x86
Jan 28 14:05:50 kernel:  e1000_clean_tx_irq+0xb4/0x23f
Jan 28 14:05:50 kernel:  e1000e_poll+0x5a/0x223
Jan 28 14:05:50 kernel:  net_rx_action+0x12e/0x305
Jan 28 14:05:50 kernel:  __do_softirq+0x114/0x267
Jan 28 14:05:50 kernel:  irq_exit+0x58/0x64
Jan 28 14:05:50 kernel:  do_IRQ+0xaa/0xc8
Jan 28 14:05:50 kernel:  common_interrupt+0xf/0xf
Jan 28 14:05:50 kernel:  </IRQ>
Jan 28 14:05:50 kernel: RIP: 0033:0x7f99b3d39917
Jan 28 14:05:50 kernel: Code: 00 66 90 8b b5 f4 02 00 00 85 f6 0f 84 c2 00 00 00 48 8b 45 70 c7 44 24 74 00 00 00 00 48 c7 44 24 78 00 00 00 00 48 8b 40 08 <48> 89 44 24 18 48 8b 45 68 48 8b 40 08 48 89 44 24 10 48 8b 85 00
Jan 28 14:05:50 kernel: RSP: 002b:00007ffc803179b0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffffde
Jan 28 14:05:50 kernel: RAX: 00007f99b3ce32d0 RBX: 0000000000000001 RCX: 0000000000000000
Jan 28 14:05:50 kernel: RDX: 0000000000000000 RSI: 000000000000000f RDI: 0000564024f736c6
Jan 28 14:05:50 kernel: RBP: 00007f99b3d1c000 R08: 0000000000000001 R09: 00007f99b3d593f0
Jan 28 14:05:50 kernel: R10: 00007f99b3d59130 R11: 00007ffc80317b88 R12: 0000564024f75ada
Jan 28 14:05:50 kernel: R13: 0000000000000018 R14: 00007ffc80317ae0 R15: 00007f99b351d8b8
Jan 28 14:05:50 kernel: ---[ end trace b7d8a2809485a990 ]---

Jan 28 14:54:00 kernel: ------------[ cut here ]------------
Jan 28 14:54:00 kernel: WARNING: CPU: 0 PID: 0 at net/ipv4/tcp_output.c:915 tcp_wfree+0x29/0xe2
Jan 28 14:54:00 kernel: Modules linked in: arc4 ecb md4 sha512_ssse3 sha512_generic cmac cifs ccm fscache nfnetlink_queue autofs4 xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_statistic xt_connmark xt_TARPIT(O) xt_comment xt_iprange xt_geoip(O) xt_set xt_NFQUEUE ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark xt_TCPMSS xt_hashlimit xt_tcpudp xt_CT xt_multiport nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp pppoe pppox
Jan 28 14:54:00 kernel:  ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port ip_set_hash_net ip_set_hash_ip ip_set nfnetlink l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel ip6table_filter ip6_tables sha256_ssse3 sha256_generic mcryptd sha1_ssse3 sha1_generic ipv6 arptable_filter arp_tables xt_iface(O) xt_conntrack iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw iptable_filter ip_tables x_tables bpfilter sch_fq_codel sch_fq snd_hda_codec_analog snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_pcm snd_timer snd k8temp ohci_pci parport_pc soundcore floppy ohci_hcd parport asus_atk0110 thermal ehci_pci fan ehci_hcd button i2c_nforce2 ata_generic pata_amd pata_acpi msdos configfs fuse f2fs jfs btrfs zstd_decompress zstd_compress xxhash lzo_compress
Jan 28 14:54:00 kernel:  zlib_deflate sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise ata_piix ahci libahci libata nvme nvme_core virtio_crypto crypto_engine virtio_pci virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio
Jan 28 14:54:00 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  O      4.19.97-gentoo-x86_64 #1
Jan 28 14:54:00 kernel: Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 5001 03/23/2010
Jan 28 14:54:00 kernel: RIP: 0010:tcp_wfree+0x29/0xe2
Jan 28 14:54:00 kernel: Code: c3 55 53 8b 87 e0 00 00 00 48 8b 6f 18 ff c8 f0 29 85 44 01 00 00 0f 88 b5 4e 08 00 75 0e 48 c7 c7 c3 3e d8 81 e8 2c 22 9c ff <0f> 0b 8b 85 44 01 00 00 3d 40 02 00 00 76 1a 65 48 8b 05 9a 99 95
Jan 28 14:54:00 kernel: RSP: 0018:ffff88811fc03df0 EFLAGS: 00010246
Jan 28 14:54:00 kernel: RAX: 0000000000000024 RBX: ffff8881171be000 RCX: 0000000000000000
Jan 28 14:54:00 kernel: RDX: 0000000000000000 RSI: ffff88811fc152d8 RDI: ffff88811fc152d8
Jan 28 14:54:00 kernel: RBP: ffff88808a4604c0 R08: 0000000000000001 R09: 0000000000005300
Jan 28 14:54:00 kernel: R10: 0000000000000000 R11: 0000000000000044 R12: 0000000000000000
Jan 28 14:54:00 kernel: R13: ffff88811af107c0 R14: 000000000000003e R15: ffff88811af10000
Jan 28 14:54:00 kernel: FS:  0000000000000000(0000) GS:ffff88811fc00000(0000) knlGS:0000000000000000
Jan 28 14:54:00 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 28 14:54:00 kernel: CR2: 00007f654a52f000 CR3: 0000000117384000 CR4: 00000000000006f0
Jan 28 14:54:00 kernel: Call Trace:
Jan 28 14:54:00 kernel:  <IRQ>
Jan 28 14:54:00 kernel:  skb_release_head_state+0x74/0xa4
Jan 28 14:54:00 kernel:  skb_release_all+0xa/0x20
Jan 28 14:54:00 kernel:  __kfree_skb+0xa/0x14
Jan 28 14:54:00 kernel:  e1000_put_txbuf+0x73/0x86
Jan 28 14:54:00 kernel:  e1000_clean_tx_irq+0xb4/0x23f
Jan 28 14:54:00 kernel:  e1000e_poll+0x5a/0x223
Jan 28 14:54:00 kernel:  net_rx_action+0x12e/0x305
Jan 28 14:54:00 kernel:  __do_softirq+0x114/0x267
Jan 28 14:54:00 kernel:  irq_exit+0x58/0x64
Jan 28 14:54:00 kernel:  do_IRQ+0xaa/0xc8
Jan 28 14:54:00 kernel:  common_interrupt+0xf/0xf
Jan 28 14:54:00 kernel:  </IRQ>
Jan 28 14:54:00 kernel: RIP: 0010:default_idle+0x9b/0x122
Jan 28 14:54:00 kernel: Code: 3b 00 eb e0 e8 07 5b 94 ff 89 ee 48 c7 c7 20 c2 03 82 e8 dc 1a 94 ff 8b 05 5a 27 d2 00 85 c0 7e 07 0f 00 2d b9 e4 4b 00 fb f4 <65> 44 8b 25 5d 9c 8c 7e 8b 05 8f a9 9b 00 85 c0 7e 70 65 8b 05 4c
Jan 28 14:54:00 kernel: RSP: 0018:ffffffff82003ea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde
Jan 28 14:54:00 kernel: RAX: 0000000000000000 RBX: ffffffff82011780 RCX: ffffffff82035950
Jan 28 14:54:00 kernel: RDX: 00000000239745ba RSI: 0000000000000000 RDI: 0000000000000000
Jan 28 14:54:00 kernel: RBP: 0000000000000000 R08: 00000ffc5f8d457c R09: 0000000000000400
Jan 28 14:54:00 kernel: R10: ffff88808249f500 R11: 0000000000000002 R12: 0000000000000000
Jan 28 14:54:00 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jan 28 14:54:00 kernel:  do_idle+0xb2/0x172
Jan 28 14:54:00 kernel:  cpu_startup_entry+0x6a/0x6c
Jan 28 14:54:00 kernel:  start_kernel+0x480/0x49e
Jan 28 14:54:00 kernel:  secondary_startup_64+0xa4/0xb0
Jan 28 14:54:00 kernel: ---[ end trace b7d8a2809485a991 ]---

The kernel hasn't panicked *yet*, but it will probably behave like the previous one: after a week or so, I may get a system freeze with a message similar to the one attached to this report.
Even without a system freeze, having these messages show up in the log is a showstopper.

What do you suggest I try next?
Should I try vanilla-sources, or should I go for gentoo-sources 5.5.0?
Comment 5 Thomas Deutschmann (RETIRED) gentoo-dev 2020-01-28 18:16:13 UTC
No need to test vanilla sources: gentoo-sources only adds stuff, we usually don't patch existing code.

So I would recommend testing v5.5 to see whether this is already fixed. Either way, you will have to bisect the kernel in the end: if it's fixed in 5.5, we probably want to identify the fix so it can be backported to the LTS kernels. If it isn't fixed yet, we need to identify the first bad commit causing the problem in order to find a fix.
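
As an illustration of what bisecting means here, the sketch below shows the underlying binary search over an ordered commit history. It is only a conceptual example and not the tooling itself: in practice `git bisect` drives this process, and each "test" is a kernel build plus a boot under real load. The function `first_bad`, the commit labels, and the `is_bad` predicate are all hypothetical.

```python
def first_bad(versions, is_bad):
    """Return the index of the first bad entry in `versions`.

    Assumptions: `versions` is ordered oldest to newest, versions[0] is
    known good, versions[-1] is known bad, and every entry after the
    first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return hi


# Hypothetical history of eight commits where the regression lands in "c5".
history = [f"c{i}" for i in range(8)]


def shows_warning(commit):
    # Stand-in for "build this commit, boot it, and watch the log".
    return int(commit[1:]) >= 5


if __name__ == "__main__":
    index = first_bad(history, shows_warning)
    print(f"first bad commit: {history[index]}")
```

Each step halves the remaining range, so even a range of thousands of commits needs only a dozen or so test builds. The catch with this particular bug is that each test may take days, because the warning appears at irregular intervals.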
Comment 6 Vieri 2020-01-29 10:03:49 UTC
Booted 5.5 today.

Getting similar behavior to what I described in my previous message. The log messages are not the same, but they still seem to be related to networking:

Jan 29 09:11:14 kernel: ------------[ cut here ]------------
Jan 29 09:11:14 kernel: refcount_t: addition on 0; use-after-free.
Jan 29 09:11:14 kernel: WARNING: CPU: 0 PID: 25403 at lib/refcount.c:25 refcount_warn_saturate+0x88/0xe8
Jan 29 09:11:14 kernel: Modules linked in: nfnetlink_queue autofs4 xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_statistic xt_connmark xt_TARPIT(O) xt_comment xt_iprange xt_geoip(O) xt_set xt_NFQUEUE ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark xt_TCPMSS xt_hashlimit xt_tcpudp xt_CT xt_multiport nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp pppoe pppox ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port ip_set_hash_net ip_set_hash_ip ip_set nfnetlink l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel ip6table_filter ip6_tables sha1_ssse3 sha1_generic ipv6 arptable_filter arp_tables xt_iface(O) xt_conntrack iptable_mangle iptable_nat nf_nat
Jan 29 09:11:14 kernel:  nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_raw iptable_filter ip_tables x_tables sch_fq_codel sch_fq bpfilter snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore k8temp parport_pc ohci_pci ohci_hcd floppy parport ehci_pci thermal ehci_hcd asus_atk0110 fan ata_generic i2c_nforce2 button pata_amd pata_acpi msdos configfs fuse f2fs jfs btrfs zstd_decompress zstd_compress xxhash lzo_compress zlib_deflate sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise ata_piix ahci libahci libata nvme nvme_core virtio_crypto crypto_engine virtio_pci virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio
Jan 29 09:11:14 kernel: CPU: 0 PID: 25403 Comm: TX#01 Tainted: G           O      5.5.0-gentoo-x86_64 #1
Jan 29 09:11:14 kernel: Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 5001 03/23/2010
Jan 29 09:11:14 kernel: RIP: 0010:refcount_warn_saturate+0x88/0xe8
Jan 29 09:11:14 kernel: Code: 05 4b c7 d7 00 01 e8 5e da ca ff 0f 0b c3 80 3d 3b c7 d7 00 00 75 72 48 c7 c7 14 6f df 81 c6 05 2b c7 d7 00 01 e8 3f da ca ff <0f> 0b c3 80 3d 1b c7 d7 00 00 75 53 48 c7 c7 40 6f df 81 c6 05 0b
Jan 29 09:11:14 kernel: RSP: 0018:ffffc900002bf888 EFLAGS: 00010282
Jan 29 09:11:14 kernel: RAX: 0000000000000000 RBX: ffff888118373500 RCX: 0000000000000007
Jan 29 09:11:14 kernel: RDX: 0000000000001b14 RSI: ffffc900002bf774 RDI: ffff88811fc18620
Jan 29 09:11:14 kernel: RBP: ffffc900002bf908 R08: 0000000000000001 R09: 000000000000fd00
Jan 29 09:11:14 kernel: R10: 0000000000000000 R11: 000000000000004c R12: ffff888118373500
Jan 29 09:11:14 kernel: R13: ffff888117d5b100 R14: ffffffffa06b4300 R15: 0000000000000068
Jan 29 09:11:14 kernel: FS:  00007fd65d0eb700(0000) GS:ffff88811fc00000(0000) knlGS:0000000000000000
Jan 29 09:11:14 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 09:11:14 kernel: CR2: 00007ff1e9b1f000 CR3: 00000001180e6000 CR4: 00000000000006f0
Jan 29 09:11:14 kernel: Call Trace:
Jan 29 09:11:14 kernel:  nf_queue_entry_get_refs+0x60/0xa0
Jan 29 09:11:14 kernel:  nf_queue+0xcf/0x202
Jan 29 09:11:14 kernel:  ? dst_mtu+0xd/0xd
Jan 29 09:11:14 kernel:  nf_reinject+0x187/0x194
Jan 29 09:11:14 kernel:  nfqnl_recv_verdict+0x37f/0x3a5 [nfnetlink_queue]
Jan 29 09:11:14 kernel:  nfnetlink_rcv_msg+0x164/0x20a [nfnetlink]
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? nfnetlink_net_init+0x8c/0x8c [nfnetlink]
Jan 29 09:11:14 kernel:  netlink_rcv_skb+0x7d/0xd1
Jan 29 09:11:14 kernel:  nfnetlink_rcv+0x10f/0x130 [nfnetlink]
Jan 29 09:11:14 kernel:  netlink_unicast+0x10c/0x1a5
Jan 29 09:11:14 kernel:  netlink_sendmsg+0x29d/0x2d3
Jan 29 09:11:14 kernel:  sock_sendmsg_nosec+0x20/0x2a
Jan 29 09:11:14 kernel:  ____sys_sendmsg+0xe6/0x14f
Jan 29 09:11:14 kernel:  ? copy_msghdr_from_user+0xfe/0x128
Jan 29 09:11:14 kernel:  ___sys_sendmsg+0x7a/0xb2
Jan 29 09:11:14 kernel:  ? do_futex+0x208/0x940
Jan 29 09:11:14 kernel:  ? common_interrupt+0xa/0xf
Jan 29 09:11:14 kernel:  __sys_sendmsg+0x4c/0x7f
Jan 29 09:11:14 kernel:  do_syscall_64+0x15d/0x189
Jan 29 09:11:14 kernel:  ? __up_read+0x12/0x3b
Jan 29 09:11:14 kernel:  ? __do_page_fault+0x2f6/0x38a
Jan 29 09:11:14 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 29 09:11:14 kernel: RIP: 0033:0x7fd6641162e1
Jan 29 09:11:14 kernel: Code: 00 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 26 e9 ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2c 44 89 c7 48 89 44 24 08 e8 5a e9 ff ff 48
Jan 29 09:11:14 kernel: RSP: 002b:00007fd65d0e9820 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Jan 29 09:11:14 kernel: RAX: ffffffffffffffda RBX: 00007fd65d0e9900 RCX: 00007fd6641162e1
Jan 29 09:11:14 kernel: RDX: 0000000000000000 RSI: 00007fd65d0e9870 RDI: 0000000000000005
Jan 29 09:11:14 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000301
Jan 29 09:11:14 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jan 29 09:11:14 kernel: R13: 00007fd650268dd8 R14: 000000000a000000 R15: 0000000000000001
Jan 29 09:11:14 kernel: ---[ end trace b074762294df7e7d ]---
Jan 29 09:11:14 kernel: ------------[ cut here ]------------
Jan 29 09:11:14 kernel: refcount_t: underflow; use-after-free.
Jan 29 09:11:14 kernel: WARNING: CPU: 1 PID: 25394 at lib/refcount.c:28 refcount_warn_saturate+0xa7/0xe8
Jan 29 09:11:14 kernel: Modules linked in: nfnetlink_queue autofs4 xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_statistic xt_connmark xt_TARPIT(O) xt_comment xt_iprange xt_geoip(O) xt_set xt_NFQUEUE ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark xt_TCPMSS xt_hashlimit xt_tcpudp xt_CT xt_multiport nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp pppoe pppox ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port ip_set_hash_net ip_set_hash_ip ip_set nfnetlink l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel ip6table_filter ip6_tables sha1_ssse3 sha1_generic ipv6 arptable_filter arp_tables xt_iface(O) xt_conntrack iptable_mangle iptable_nat nf_nat
Jan 29 09:11:14 kernel:  nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_raw iptable_filter ip_tables x_tables sch_fq_codel sch_fq bpfilter snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore k8temp parport_pc ohci_pci ohci_hcd floppy parport ehci_pci thermal ehci_hcd asus_atk0110 fan ata_generic i2c_nforce2 button pata_amd pata_acpi msdos configfs fuse f2fs jfs btrfs zstd_decompress zstd_compress xxhash lzo_compress zlib_deflate sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise ata_piix ahci libahci libata nvme nvme_core virtio_crypto crypto_engine virtio_pci virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio
Jan 29 09:11:14 kernel: CPU: 1 PID: 25394 Comm: RX-NFQ#0 Tainted: G        W  O      5.5.0-gentoo-x86_64 #1
Jan 29 09:11:14 kernel: Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 5001 03/23/2010
Jan 29 09:11:14 kernel: RIP: 0010:refcount_warn_saturate+0xa7/0xe8
Jan 29 09:11:14 kernel: Code: 05 2b c7 d7 00 01 e8 3f da ca ff 0f 0b c3 80 3d 1b c7 d7 00 00 75 53 48 c7 c7 40 6f df 81 c6 05 0b c7 d7 00 01 e8 20 da ca ff <0f> 0b c3 80 3d fb c6 d7 00 00 75 34 48 c7 c7 68 6f df 81 c6 05 eb
Jan 29 09:11:14 kernel: RSP: 0018:ffffc90000817900 EFLAGS: 00010286
Jan 29 09:11:14 kernel: RAX: 0000000000000000 RBX: ffff888118373500 RCX: 0000000000000007
Jan 29 09:11:14 kernel: RDX: 0000000000001b52 RSI: ffffc900008177ec RDI: ffff88811fc98620
Jan 29 09:11:14 kernel: RBP: ffff888118373500 R08: 0000000000000001 R09: 0000000000001500
Jan 29 09:11:14 kernel: R10: 0000000000000000 R11: 0000000000000048 R12: 0000000000000001
Jan 29 09:11:14 kernel: R13: ffff888117d5b100 R14: ffff8881161bd9c0 R15: ffff888118373500
Jan 29 09:11:14 kernel: FS:  00007fd6618f4700(0000) GS:ffff88811fc80000(0000) knlGS:0000000000000000
Jan 29 09:11:14 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 09:11:14 kernel: CR2: 00007fd657172000 CR3: 00000001180e6000 CR4: 00000000000006e0
Jan 29 09:11:14 kernel: Call Trace:
Jan 29 09:11:14 kernel:  nf_queue_entry_release_refs+0x62/0xa2
Jan 29 09:11:14 kernel:  nf_reinject+0x5d/0x194
Jan 29 09:11:14 kernel:  nfqnl_recv_verdict+0x37f/0x3a5 [nfnetlink_queue]
Jan 29 09:11:14 kernel:  nfnetlink_rcv_msg+0x164/0x20a [nfnetlink]
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x40/0x70
Jan 29 09:11:14 kernel:  ? __switch_to_asm+0x34/0x70
Jan 29 09:11:14 kernel:  ? nfnetlink_net_init+0x8c/0x8c [nfnetlink]
Jan 29 09:11:14 kernel:  netlink_rcv_skb+0x7d/0xd1
Jan 29 09:11:14 kernel:  nfnetlink_rcv+0x10f/0x130 [nfnetlink]
Jan 29 09:11:14 kernel:  netlink_unicast+0x10c/0x1a5
Jan 29 09:11:14 kernel:  netlink_sendmsg+0x29d/0x2d3
Jan 29 09:11:14 kernel:  sock_sendmsg_nosec+0x20/0x2a
Jan 29 09:11:14 kernel:  ____sys_sendmsg+0xe6/0x14f
Jan 29 09:11:14 kernel:  ? copy_msghdr_from_user+0xfe/0x128
Jan 29 09:11:14 kernel:  ___sys_sendmsg+0x7a/0xb2
Jan 29 09:11:14 kernel:  ? netlink_recvmsg+0x2b2/0x2e0
Jan 29 09:11:14 kernel:  __sys_sendmsg+0x4c/0x7f
Jan 29 09:11:14 kernel:  do_syscall_64+0x15d/0x189
Jan 29 09:11:14 kernel:  ? copy_kernel_to_fpregs+0x21/0x2a
Jan 29 09:11:14 kernel:  ? switch_fpu_return+0x54/0x6b
Jan 29 09:11:14 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jan 29 09:11:14 kernel: RIP: 0033:0x7fd6641162e1
Jan 29 09:11:14 kernel: Code: 00 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 26 e9 ff ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2c 44 89 c7 48 89 44 24 08 e8 5a e9 ff ff 48
Jan 29 09:11:14 kernel: RSP: 002b:00007fd6618f1e80 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
Jan 29 09:11:14 kernel: RAX: ffffffffffffffda RBX: 00007fd6618f1f60 RCX: 00007fd6641162e1
Jan 29 09:11:14 kernel: RDX: 0000000000000000 RSI: 00007fd6618f1ed0 RDI: 0000000000000005
Jan 29 09:11:14 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000301
Jan 29 09:11:14 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jan 29 09:11:14 kernel: R13: 00007fd650268dd8 R14: 0000000000000000 R15: 0000000000000000
Jan 29 09:11:14 kernel: ---[ end trace b074762294df7e7e ]---

No system freeze yet.

Should I try downgrading to gentoo-sources 4.9.203 or even all the way down to 4.4.203?
Comment 7 Vieri 2020-02-04 08:06:58 UTC
I was wondering if these errors could have something to do with old failing hardware. So I installed Gentoo on a brand new high-end server, and got similar error messages in syslog. The system hasn't panicked yet, but I'm getting messages such as:

Feb  3 15:41:55 kernel: ------------[ cut here ]------------
Feb  3 15:41:55 kernel: WARNING: CPU: 14 PID: 0 at net/ipv4/tcp_output.c:915 tcp_wfree+0x29/0xe2
Feb  3 15:41:55 kernel: Modules linked in: arc4 ecb md4 sha512_ssse3 sha512_generic cmac cifs ccm fscache autofs4 nfnetlink_queue xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_iface(O) xt_statistic xt_connmark xt_TARPIT(O) xt_comment xt_iprange xt_geoip(O) xt_set xt_NFQUEUE arptable_filter arp_tables ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc iptable_nat nf_nat_ipv4 xt_mark iptable_mangle xt_TCPMSS xt_hashlimit xt_tcpudp xt_CT iptable_raw xt_multiport xt_conntrack nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink
Feb  3 15:41:55 kernel:  nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppoe pppox ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port ip_set_hash_net ip_set_hash_ip ip_set nfnetlink sch_fq_codel sch_fq iptable_filter ip_tables x_tables bpfilter mlx5_core mlxfw tls strparser sha1_mb mcryptd sha1_ssse3 sha1_generic ipv6 crct10dif_pclmul bnxt_en ghash_clmulni_intel ixgbe i2c_piix4 ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq button aesni_intel crypto_simd cryptd glue_helper aes_x86_64 algif_rng algif_aead algif_hash algif_skcipher af_alg xts crc32c_intel crc32_pclmul crc32_generic sha256_generic msdos configfs fuse f2fs jfs btrfs zstd_decompress zstd_compress xxhash
Feb  3 15:41:55 kernel:  lzo_compress zlib_deflate multipath dm_zero dm_verity dm_thin_pool dm_persistent_data dm_snapshot dm_raid dm_mirror dm_region_hash dm_log dm_flakey dm_delay dm_crypt dm_bufio dm_bio_prison dm_mod dax hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd ohci_hcd uhci_hcd usb_storage xhci_pci xhci_hcd ehci_pci ehci_hcd pata_sl82c105 pata_via pata_jmicron pata_marvell pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_platform pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar
Feb  3 15:41:55 kernel:  pata_rz1000 pata_sil680 pata_radisys pata_pdc2027x pata_mpiix aic94xx libsas lpfc crc_t10dif crct10dif_common qla2xxx megaraid_mbox megaraid_mm aacraid sx8 DAC960 hpsa 3w_9xxx 3w_xxxx 3w_sas mptsas mptfc scsi_transport_fc atp870u dc395x qla1280 dmx3191d sym53c8xx gdth initio BusLogic arcmsr aic7xxx aic79xx sg mpt3sas raid_class scsi_transport_sas megaraid megaraid_sas mptspi mptscsih mptbase scsi_transport_spi pdc_adma sata_inic162x sata_mv sata_qstor sata_vsc sata_uli sata_sis pata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise ata_piix ahci libahci libata nvme nvme_core virtio_crypto crypto_engine virtio_pci virtio_balloon virtio_rng virtio_console virtio_blk virtio_ring virtio
Feb  3 15:41:55 kernel: CPU: 14 PID: 0 Comm: swapper/14 Tainted: G           O      4.19.97-gentoo-x86_64 #1
Feb  3 15:41:55 kernel: Hardware name: Supermicro AS -1114S-WTRT/H12SSW-NT, BIOS 1.0b 11/15/2019
Feb  3 15:41:55 kernel: RIP: 0010:tcp_wfree+0x29/0xe2
Feb  3 15:41:55 kernel: Code: c3 55 53 8b 87 e0 00 00 00 48 8b 6f 18 ff c8 f0 29 85 44 01 00 00 0f 88 d9 4e 08 00 75 0e 48 c7 c7 c3 3e d8 81 e8 e4 1f 9c ff <0f> 0b 8b 85 44 01 00 00 3d 40 02 00 00 76 1a 65 48 8b 05 be 95 95
Feb  3 15:41:55 kernel: RSP: 0018:ffff88884ed83e28 EFLAGS: 00010246
Feb  3 15:41:55 kernel: RAX: 0000000000000024 RBX: ffff888804e218e8 RCX: 0000000000000000
Feb  3 15:41:55 kernel: RDX: 0000000000000000 RSI: ffff88884ed952d8 RDI: ffff88884ed952d8
Feb  3 15:41:55 kernel: RBP: ffff88877b8f6ac0 R08: 0000000000000001 R09: 0000000000025400
Feb  3 15:41:55 kernel: R10: 0000000000000000 R11: 0000000000000044 R12: 00000000ffffff07
Feb  3 15:41:55 kernel: R13: 000000000000004a R14: ffffc90002595150 R15: ffff888805d34070
Feb  3 15:41:55 kernel: FS:  0000000000000000(0000) GS:ffff88884ed80000(0000) knlGS:0000000000000000
Feb  3 15:41:55 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb  3 15:41:55 kernel: CR2: 00007fe6880c4cbc CR3: 000000080dc80000 CR4: 0000000000340ee0
Feb  3 15:41:55 kernel: Call Trace:
Feb  3 15:41:55 kernel:  <IRQ>
Feb  3 15:41:55 kernel:  skb_release_head_state+0x74/0xa4
Feb  3 15:41:55 kernel:  skb_release_all+0xa/0x20
Feb  3 15:41:55 kernel:  __kfree_skb+0xa/0x14
Feb  3 15:41:55 kernel:  igb_poll+0xbe/0xbf3
Feb  3 15:41:55 kernel:  net_rx_action+0x12e/0x305
Feb  3 15:41:55 kernel:  __do_softirq+0x114/0x267
Feb  3 15:41:55 kernel:  irq_exit+0x58/0x64
Feb  3 15:41:55 kernel:  do_IRQ+0xaa/0xc8
Feb  3 15:41:55 kernel:  common_interrupt+0xf/0xf
Feb  3 15:41:55 kernel:  </IRQ>
Feb  3 15:41:55 kernel: RIP: 0010:cpuidle_enter_state+0x245/0x297
Feb  3 15:41:55 kernel: Code: ff 31 ff e8 0e 4a a4 ff 45 84 ed 74 12 9c 58 0f ba e0 09 73 03 0f 0b fa 31 ff e8 7c 72 a7 ff fb 48 ba ff ff ff ff f3 01 00 00 <48> 2b 2c 24 b8 ff ff ff 7f 48 39 d5 7f 0d 48 89 e8 b9 e8 03 00 00
Feb  3 15:41:55 kernel: RSP: 0018:ffffc90000143e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
Feb  3 15:41:55 kernel: RAX: ffff88884ed9ddc0 RBX: ffff88882c2e8e00 RCX: 000000000000001f
Feb  3 15:41:55 kernel: RDX: 000001f3ffffffff RSI: 000000002c235072 RDI: 0000000000000000
Feb  3 15:41:55 kernel: RBP: 00000e678104be06 R08: 00000e678104be06 R09: 0000000000000000
Feb  3 15:41:55 kernel: R10: 0000000000000000 R11: ffff88884ed9de40 R12: 0000000000000002
Feb  3 15:41:55 kernel: R13: 0000000000000000 R14: ffffffff820b4d38 R15: 0000000000000000
Feb  3 15:41:55 kernel:  do_idle+0x104/0x172
Feb  3 15:41:55 kernel:  cpu_startup_entry+0x6a/0x6c
Feb  3 15:41:55 kernel:  start_secondary+0x187/0x1a2
Feb  3 15:41:55 kernel:  secondary_startup_64+0xa4/0xb0
Feb  3 15:41:55 kernel: ---[ end trace a6f6b996e449997f ]---

It's always about net/ipv4/tcp_output.c.
Using 5.5 doesn't fix this.

What can I try now?
Should it be reported upstream?
Comment 8 Michael 'veremitz' Everitt 2020-02-04 08:33:40 UTC
What old version of your kernel seems to work?

Can you paste your .config ?

Have you run 'make oldconfig' to port the options across to the new versions, and have you checked that all dependencies are being pulled in?

Does a plain 'defconfig' work, and/or does a genkernel-build kernel work OK?

Something doesn't check out right here...
Comment 9 Vieri 2020-02-04 11:02:49 UTC
(In reply to Michael 'veremitz' Everitt from comment #8)
> What old version of your kernel seems to work?

For the older hardware (for which I opened this bug report -- I can't test the new hardware with that older kernel):

4.9.34-gentoo amd64 gentoo-sources
 
> Can you paste your .config ?

Will attach file.
 
> Have you run 'make oldconfig' to port the options across to the new versions, and
> have you checked that all dependencies are being pulled in?

# make oldconfig
scripts/kconfig/conf  --oldconfig Kconfig
#
# configuration written to .config
#

How do I check that all dependencies are pulled in?

> Does a plain 'defconfig' work, and/or does a genkernel-build kernel work OK?

I can't use defconfig as my system requires features that are disabled by default.

I use genkernel to build the kernel and modules.
Comment 10 Vieri 2020-02-04 11:07:26 UTC
Created attachment 611552 [details]
kernel .config

kernel config file I used to build the kernel with genkernel
Comment 11 Vieri 2020-02-04 12:50:40 UTC
A lot more cases today (will attach file).

The thing each "trace" has in common is that there are calls to automount right before (not necessarily with errors). I don't know if it's just a coincidence, or if automount (which tries to access shares on the network) is responsible for this.
Comment 12 Vieri 2020-02-04 12:51:57 UTC
Created attachment 611556 [details]
kernel syslog
Comment 13 Vieri 2020-02-04 17:53:39 UTC
# cat /proc/sys/kernel/tainted 
4608

which I presume is the ORed value of:

512 (W): A kernel warning has occurred.
4096 (O): An out-of-tree module has been loaded.
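To double-check that reading of the bitmask, here is a minimal Python sketch that splits a taint value into its flag letters. The table is deliberately partial (only P, W, O and E are listed), and the names `TAINT_FLAGS` and `decode_taint` are made up for this illustration:

```python
# Partial table of kernel taint bits: bit number -> (letter, meaning).
# Only a few well-known flags are listed; any other set bit is ignored here.
TAINT_FLAGS = {
    0: ("P", "a proprietary module has been loaded"),
    9: ("W", "a kernel warning has occurred"),
    12: ("O", "an out-of-tree module has been loaded"),
    13: ("E", "an unsigned module has been loaded"),
}


def decode_taint(value):
    """Return (letter, meaning) for every known taint bit set in value."""
    return [TAINT_FLAGS[bit] for bit in sorted(TAINT_FLAGS) if value & (1 << bit)]


# 4608 is the value read from /proc/sys/kernel/tainted above.
for letter, meaning in decode_taint(4608):
    print(letter, "-", meaning)
```

For 4608 this reports W and O, which agrees with 512 + 4096.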

What does "out-of-tree" mean exactly?

Does it refer to any kernel module provided by a package other than sys-kernel/*-sources (e.g. xtables-addons)?

Here's an example of what shows up in syslog right before the dreaded kernel message.
It's not always the same, but it's always about automount (used by proftpd).

proftpd[5651]: pam_unix(ftp:session): session opened for user myftpuser by (uid=0)
automount[17294]: handle_packet: type = 3
automount[17294]: handle_packet_missing_indirect: token 37708, name .pam_environment, request pid 5651
automount[17294]: dev_ioctl_send_fail: token = 37708
automount[17294]: handle_packet: type = 3
automount[17294]: handle_packet_missing_indirect: token 37709, name etc, request pid 5651
automount[17294]: dev_ioctl_send_fail: token = 37709
automount[17294]: handle_packet: type = 3
automount[17294]: handle_packet_missing_indirect: token 37710, name etc, request pid 5651
automount[17294]: dev_ioctl_send_fail: token = 37710
automount[17294]: handle_packet: type = 3
automount[17294]: handle_packet_missing_indirect: token 37711, name etc, request pid 5651
automount[17294]: dev_ioctl_send_fail: token = 37711
Comment 14 Vieri 2020-02-05 09:16:19 UTC
Expanding on comments 8 and 9, I know for sure that I didn't have these messages in 4.12.12-gentoo amd64 gentoo-sources.
Comment 15 Vieri 2020-02-07 22:01:25 UTC
Hey, I'm really desperate now...

I re-installed everything from scratch on perfectly new enterprise-grade hardware.
I used genkernel to install gentoo-sources (stable).

It's even worse now.

Just so you get an idea of what's happening:

# grep -i taint messages
Feb  7 22:20:07 gw2 kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:09 gw2 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:11 gw2 kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:12 gw2 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:13 gw2 kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:14 gw2 kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:15 gw2 kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:15 gw2 kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:16 gw2 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:17 gw2 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:18 gw2 kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:18 gw2 kernel: CPU: 2 PID: 19519 Comm: RX-NFQ#2 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:20 gw2 kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:20 gw2 kernel: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:24 gw2 kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:28 gw2 kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:34 gw2 kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:34 gw2 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:37 gw2 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:37 gw2 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:38 gw2 kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:39 gw2 kernel: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:39 gw2 kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:40 gw2 kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:42 gw2 kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:42 gw2 kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:44 gw2 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:45 gw2 kernel: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:46 gw2 kernel: CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:49 gw2 kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:50 gw2 kernel: CPU: 7 PID: 0 Comm: swapper/7 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:50 gw2 kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1
Feb  7 22:20:50 gw2 kernel: CPU: 4 PID: 19519 Comm: RX-NFQ#2 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1

... and there are many, many more...
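With this many traces, a per-task count is easier to read than the raw grep output. The following is a rough Python sketch, assuming the syslog line layout shown above; `count_by_task` and the regular expression are illustrative, not part of any existing tool:

```python
import re
from collections import Counter

# Matches the trace header line, e.g.
# "... kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: G  W  OE  4.19.97-gentoo-x86_64 #1"
TRACE_HEADER = re.compile(r"kernel: CPU: (\d+) PID: (\d+) Comm: (\S+)")


def count_by_task(lines):
    """Count trace headers per task name, folding swapper/N into 'swapper'."""
    counts = Counter()
    for line in lines:
        match = TRACE_HEADER.search(line)
        if match:
            comm = match.group(3).split("/")[0]
            counts[comm] += 1
    return counts


# Three lines copied from the grep output above.
sample = [
    "Feb  7 22:20:07 gw2 kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1",
    "Feb  7 22:20:09 gw2 kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1",
    "Feb  7 22:20:18 gw2 kernel: CPU: 2 PID: 19519 Comm: RX-NFQ#2 Tainted: G        W  OE     4.19.97-gentoo-x86_64 #1",
]
print(count_by_task(sample))
```

Feeding it the whole `messages` file instead of the sample list would show how many traces come from the idle task and how many from other tasks such as RX-NFQ#2.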

What in the world is wrong?
What can I try?

Sure, the system hasn't frozen yet, but you must agree that this is totally unusual.

Thanks!
Comment 16 Vieri 2020-02-07 22:05:40 UTC
The trace always starts with the following line every 2 seconds approximately!

kernel: WARNING: CPU: 6 PID: 0 at net/ipv4/tcp_output.c:915 tcp_wfree.cold+0xc/0x13
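To confirm that every trace really points at the same source location, the WARNING line can be split into its fields. This is a small Python sketch; the function name `parse_warning` and the dictionary keys are made up here, and the pattern only assumes the `file:line symbol+offset/size` layout visible in the line above:

```python
import re

# Layout: "WARNING: CPU: <n> PID: <n> at <file>:<line> <symbol>+<offset>/<size>"
WARNING_LINE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>\S+?)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_warning(text):
    """Return the fields of a kernel WARNING line as a dict, or None if no match."""
    match = WARNING_LINE.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the decimal fields to int; offset and size stay as hex strings.
    for key in ("cpu", "pid", "line"):
        fields[key] = int(fields[key])
    return fields


example = "kernel: WARNING: CPU: 6 PID: 0 at net/ipv4/tcp_output.c:915 tcp_wfree.cold+0xc/0x13"
print(parse_warning(example))
```

Collecting the `(file, line, symbol)` triples over a whole log makes it easy to see whether anything other than net/ipv4/tcp_output.c ever shows up.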
Comment 17 Vieri 2020-02-11 17:20:54 UTC
This time the kernel is not tainted:

Feb 11 16:53:40 kernel: ------------[ cut here ]------------
Feb 11 16:53:40 kernel: WARNING: CPU: 6 PID: 0 at net/ipv4/tcp_output.c:915 tcp_wfree.cold+0xc/0x13
Feb 11 16:53:40 kernel: Modules linked in: autofs4 nfnetlink_queue xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_statistic xt_connmark xt_comment xt_iprange l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel xt_set xt_NFQUEUE xt_AUDIT ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark xt_TCPMSS xt_hashlimit xt_CT xt_multiport nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp pppoe pppox ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port
Feb 11 16:53:40 kernel:  ip_set_hash_net ip_set_hash_ip ip_set nfnetlink ip6table_filter ip6_tables arptable_filter arp_tables xt_conntrack iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw sch_fq tcp_cdg tcp_bbr iptable_filter ip_tables bpfilter mlx5_ib ipmi_ssif ib_uverbs edac_mce_amd kvm_amd kvm ast ttm irqbypass crct10dif_pclmul efi_pstore ghash_clmulni_intel drm_kms_helper pcspkr efivars ixgbe igb sp5100_tco mlx5_core drm joydev bnxt_en i2c_algo_bit mdio i2c_piix4 mlxfw ccp dca i2c_core ipmi_si ipmi_devintf ipmi_msghandler pinctrl_amd pcc_cpufreq acpi_cpufreq mac_hid efivarfs aesni_intel crypto_simd cryptd glue_helper aes_x86_64 algif_rng algif_aead algif_hash algif_skcipher af_alg crc32c_intel crc32_pclmul crc32_generic msdos fat cramfs overlay squashfs
Feb 11 16:53:40 kernel:  loop fuse f2fs xfs nfs lockd grace sunrpc fscache jfs reiserfs btrfs ext4 mbcache jbd2 multipath linear raid10 raid1 raid0 dm_zero dm_verity reed_solomon dm_thin_pool dm_switch dm_snapshot dm_raid raid456 md_mod async_raid6_recov async_memcpy async_pq raid6_pq dm_mirror dm_region_hash dm_log_writes dm_log_userspace dm_log dm_integrity async_xor async_tx xor dm_flakey dm_delay dm_crypt dm_cache_smq dm_cache dm_persistent_data libcrc32c dm_bufio dm_bio_prison dm_mod firewire_core crc_itu_t hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech_dj hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd ohci_hcd uhci_hcd uas usb_storage xhci_plat_hcd pata_sl82c105 pata_via pata_jmicron
Feb 11 16:53:40 kernel:  pata_marvell pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_oldpiix pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_sil680 pata_pdc2027x pata_mpiix lpfc nvmet_fc qla2xxx megaraid_mbox megaraid_mm aacraid sx8 hpsa 3w_9xxx 3w_xxxx 3w_sas mptsas mptfc scsi_transport_fc atp870u dc395x qla1280 dmx3191d sym53c8xx gdth initio BusLogic arcmsr aic7xxx aic79xx sr_mod cdrom sg sd_mod mpt3sas raid_class scsi_transport_sas megaraid megaraid_sas mptspi mptscsih mptbase scsi_transport_spi pdc_adma sata_inic162x sata_mv sata_qstor sata_vsc sata_uli sata_sis pata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24
Feb 11 16:53:40 kernel:  sata_sil sata_promise ata_piix ahci libahci nvme_fc nvme_loop nvmet nvme_rdma rdma_cm iw_cm ib_cm ib_core configfs ipv6 crc_ccitt nvme_fabrics nvme nvme_core
Feb 11 16:53:40 kernel: CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.19.97-gentoo-x86_64 #1
Feb 11 16:53:40 kernel: Hardware name: Supermicro AS -1114S-WTRT/H12SSW-NT, BIOS 1.0b 11/15/2019
Feb 11 16:53:40 kernel: RIP: 0010:tcp_wfree.cold+0xc/0x13
Feb 11 16:53:40 kernel: Code: 9d 04 00 00 00 5b c6 85 9b 04 00 00 00 5d c3 48 c7 c7 70 93 06 b0 e8 f7 f7 94 ff 0f 0b c3 48 c7 c7 70 93 06 b0 e8 e8 f7 94 ff <0f> 0b e9 46 a5 ff ff 48 c7 c7 70 93 06 b0 e8 d5 f7 94 ff 0f 0b b8
Feb 11 16:53:40 kernel: RSP: 0018:ffff9e9c2b183d90 EFLAGS: 00010246
Feb 11 16:53:40 kernel: RAX: 0000000000000024 RBX: ffff9e9bc099cee8 RCX: 0000000000000000
Feb 11 16:53:40 kernel: RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 0000000000000300
Feb 11 16:53:40 kernel: RBP: ffff9e9bbef09980 R08: ffff9e9c2b1968b8 R09: 0000000000000001
Feb 11 16:53:40 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9e9bc099cee8
Feb 11 16:53:40 kernel: R13: ffff9e9a900100a8 R14: ffff9e9c0155a8c0 R15: 0000000000000026
Feb 11 16:53:40 kernel: FS:  0000000000000000(0000) GS:ffff9e9c2b180000(0000) knlGS:0000000000000000
Feb 11 16:53:40 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 11 16:53:40 kernel: CR2: 00007f90cb702820 CR3: 00000007d41b2000 CR4: 0000000000340ee0
Feb 11 16:53:40 kernel: Call Trace:
Feb 11 16:53:40 kernel:  <IRQ>
Feb 11 16:53:40 kernel:  skb_release_head_state+0x64/0xb0
Feb 11 16:53:40 kernel:  skb_release_all+0xe/0x30
Feb 11 16:53:40 kernel:  consume_skb+0x27/0x80
Feb 11 16:53:40 kernel:  bnxt_tx_int+0xd0/0x360 [bnxt_en]
Feb 11 16:53:40 kernel:  bnxt_poll+0x20f/0x870 [bnxt_en]
Feb 11 16:53:40 kernel:  net_rx_action+0x148/0x3b0
Feb 11 16:53:40 kernel:  __do_softirq+0xe8/0x2f1
Feb 11 16:53:40 kernel:  irq_exit+0x100/0x110
Feb 11 16:53:40 kernel:  do_IRQ+0x81/0xe0
Feb 11 16:53:40 kernel:  common_interrupt+0xf/0xf
Feb 11 16:53:40 kernel:  </IRQ>
Feb 11 16:53:40 kernel: RIP: 0010:cpuidle_enter_state+0xc3/0x320
Feb 11 16:53:40 kernel: Code: e8 82 68 a0 ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 30 02 00 00 31 ff e8 84 55 a6 ff fb 66 0f 1f 44 00 00 <48> ba cf f7 53 e3 a5 9b c4 20 4c 29 f5 48 89 e8 48 c1 fd 3f 48 f7
Feb 11 16:53:40 kernel: RSP: 0018:ffffb4440021fe80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd6
Feb 11 16:53:40 kernel: RAX: ffff9e9c2b1a2200 RBX: ffff9e9c02dfe800 RCX: 000000000000001f
Feb 11 16:53:40 kernel: RDX: 0000000000000000 RSI: 000000002c234d74 RDI: 0000000000000000
Feb 11 16:53:40 kernel: RBP: 000002de50bf72ea R08: 000002de50bf72ea R09: 0000000000000035
Feb 11 16:53:40 kernel: R10: 00000000ffffffff R11: ffff9e9c2b1a12e8 R12: 0000000000000002
Feb 11 16:53:40 kernel: R13: ffffffffb03954a0 R14: 000002de4de20c75 R15: ffff9e95044bcc80
Feb 11 16:53:40 kernel:  do_idle+0x1dc/0x270
Feb 11 16:53:40 kernel:  cpu_startup_entry+0x6f/0x80
Feb 11 16:53:40 kernel:  start_secondary+0x1a7/0x200
Feb 11 16:53:40 kernel:  secondary_startup_64+0xb6/0xc0
Feb 11 16:53:40 kernel: ---[ end trace 828aa59c66af655f ]---
Comment 18 Vieri 2020-02-13 17:16:00 UTC
Hi,

I think I've found the root cause for this issue, or at least how to reproduce it.

The warning messages I reported (which *could* lead to a system hang after a long period running) disappear if I stop using NFQUEUE.

In my specific case I use NFQUEUE balance 0:5 with iptables-1.6.1.

As an IPS I'm using suricata 5.0.1 with the following arguments (among others):
 -q 0 -q 1 -q 2 -q 3 -q 4 -q 5
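The queue range on the iptables side and the list of `-q` options on the Suricata side have to describe the same queues. A minimal Python sketch of that correspondence follows; the helper names are hypothetical, and the only assumption is that a balance range `first:last` covers both end points:

```python
def expand_queue_balance(spec):
    """Expand an NFQUEUE balance range such as '0:5' into the list of queue numbers."""
    first_text, last_text = spec.split(":")
    first, last = int(first_text), int(last_text)
    if first > last:
        raise ValueError("range start must not exceed range end: %r" % spec)
    return list(range(first, last + 1))


def suricata_queue_args(spec):
    """Build the matching run of '-q N' options for the given balance range."""
    return " ".join("-q %d" % queue for queue in expand_queue_balance(spec))


print(suricata_queue_args("0:5"))
```

For the range 0:5 this produces the same six `-q` options listed above, so the two sides are consistent in this setup.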

I've reproduced this behavior in several recent Linux kernel versions.

A reminder of the kernel warning message:

Feb 13 17:10:01 kernel: ------------[ cut here ]------------
Feb 13 17:10:01 kernel: WARNING: CPU: 5 PID: 0 at net/ipv4/tcp_output.c:915 tcp_wfree.cold+0xc/0x13
Feb 13 17:10:01 kernel: Modules linked in: autofs4 nfnetlink_queue l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel xt_mac xt_REDIRECT xt_limit xt_nat xt_recent xt_statistic xt_connmark xt_comment xt_iprange xt_set xt_NFQUEUE xt_AUDIT ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark xt_TCPMSS xt_hashlimit xt_CT xt_multiport nfnetlink_log xt_NFLOG nf_log_ipv4 nf_log_common xt_LOG nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp pppoe pppox ppp_generic slhc ip_set_hash_mac ip_set_bitmap_port
Feb 13 17:10:01 kernel:  ip_set_hash_net ip_set_hash_ip ip_set nfnetlink ip6table_filter ip6_tables arptable_filter arp_tables xt_conntrack iptable_mangle iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw sch_fq tcp_cdg tcp_bbr iptable_filter ip_tables bpfilter mlx5_ib ipmi_ssif ib_uverbs edac_mce_amd ast kvm_amd ttm kvm drm_kms_helper igb irqbypass efi_pstore crct10dif_pclmul ghash_clmulni_intel sp5100_tco efivars pcspkr mlx5_core drm ixgbe bnxt_en joydev i2c_algo_bit i2c_piix4 mdio ccp mlxfw dca i2c_core ipmi_si ipmi_devintf ipmi_msghandler pinctrl_amd pcc_cpufreq mac_hid acpi_cpufreq efivarfs aesni_intel crypto_simd cryptd glue_helper aes_x86_64 algif_rng algif_aead algif_hash algif_skcipher af_alg crc32c_intel crc32_pclmul crc32_generic msdos fat cramfs overlay squashfs
Feb 13 17:10:01 kernel:  loop fuse f2fs xfs nfs lockd grace sunrpc fscache jfs reiserfs btrfs ext4 mbcache jbd2 multipath linear raid10 raid1 raid0 dm_zero dm_verity reed_solomon dm_thin_pool dm_switch dm_snapshot dm_raid raid456 md_mod async_raid6_recov async_memcpy async_pq raid6_pq dm_mirror dm_region_hash dm_log_writes dm_log_userspace dm_log dm_integrity async_xor async_tx xor dm_flakey dm_delay dm_crypt dm_cache_smq dm_cache dm_persistent_data libcrc32c dm_bufio dm_bio_prison dm_mod firewire_core crc_itu_t hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey hid_microsoft hid_logitech_dj hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech sl811_hcd ohci_hcd uhci_hcd uas usb_storage xhci_plat_hcd pata_sl82c105 pata_via pata_jmicron
Feb 13 17:10:01 kernel:  pata_marvell pata_netcell pata_pdc202xx_old pata_triflex pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia pcmcia pcmcia_core pata_ns87415 pata_ns87410 pata_serverworks pata_oldpiix pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x pata_hpt366 pata_cmd64x pata_efar pata_sil680 pata_pdc2027x pata_mpiix lpfc nvmet_fc qla2xxx megaraid_mbox megaraid_mm aacraid sx8 hpsa 3w_9xxx 3w_xxxx 3w_sas mptsas mptfc scsi_transport_fc atp870u dc395x qla1280 dmx3191d sym53c8xx gdth initio BusLogic arcmsr aic7xxx aic79xx sr_mod cdrom sg sd_mod mpt3sas raid_class scsi_transport_sas megaraid megaraid_sas mptspi mptscsih mptbase scsi_transport_spi pdc_adma sata_inic162x sata_mv sata_qstor sata_vsc sata_uli sata_sis pata_sis sata_sx4 sata_nv sata_via sata_svw sata_sil24
Feb 13 17:10:01 kernel:  sata_sil sata_promise ata_piix ahci libahci nvme_fc nvme_loop nvmet nvme_rdma rdma_cm iw_cm ib_cm ib_core configfs ipv6 crc_ccitt nvme_fabrics nvme nvme_core
Feb 13 17:10:01 kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.97-gentoo-x86_64 #1
Feb 13 17:10:01 kernel: Hardware name: Supermicro AS -1114S-WTRT/H12SSW-NT, BIOS 1.0b 11/15/2019
Feb 13 17:10:01 kernel: RIP: 0010:tcp_wfree.cold+0xc/0x13
Feb 13 17:10:01 kernel: Code: 9d 04 00 00 00 5b c6 85 9b 04 00 00 00 5d c3 48 c7 c7 70 93 06 a2 e8 f7 f7 94 ff 0f 0b c3 48 c7 c7 70 93 06 a2 e8 e8 f7 94 ff <0f> 0b e9 46 a5 ff ff 48 c7 c7 70 93 06 a2 e8 d5 f7 94 ff 0f 0b b8
Feb 13 17:10:01 kernel: RSP: 0018:ffff9e15eb143d90 EFLAGS: 00010246
Feb 13 17:10:01 kernel: RAX: 0000000000000024 RBX: ffff9e15787094e8 RCX: 0000000000000000
Feb 13 17:10:01 kernel: RDX: 0000000000000000 RSI: ffff9e15eb1568b8 RDI: ffff9e15eb1568b8
Feb 13 17:10:01 kernel: RBP: ffff9e15011f1100 R08: ffff9e15eb1568b8 R09: 0000000000000001
Feb 13 17:10:01 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9e15787094e8
Feb 13 17:10:01 kernel: R13: ffff9e0ec3ab10a8 R14: ffff9e15e39de8c0 R15: 000000000000008e
Feb 13 17:10:01 kernel: FS:  0000000000000000(0000) GS:ffff9e15eb140000(0000) knlGS:0000000000000000
Feb 13 17:10:01 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 13 17:10:01 kernel: CR2: 00007f711864a690 CR3: 0000000804968000 CR4: 0000000000340ee0
Feb 13 17:10:01 kernel: Call Trace:
Feb 13 17:10:01 kernel:  <IRQ>
Feb 13 17:10:01 kernel:  skb_release_head_state+0x64/0xb0
Feb 13 17:10:01 kernel:  skb_release_all+0xe/0x30
Feb 13 17:10:01 kernel:  consume_skb+0x27/0x80
Feb 13 17:10:01 kernel:  bnxt_tx_int+0xd0/0x360 [bnxt_en]
Feb 13 17:10:01 kernel:  bnxt_poll+0x20f/0x870 [bnxt_en]
Feb 13 17:10:01 kernel:  net_rx_action+0x148/0x3b0
Feb 13 17:10:01 kernel:  __do_softirq+0xe8/0x2f1
Feb 13 17:10:01 kernel:  irq_exit+0x100/0x110
Feb 13 17:10:01 kernel:  do_IRQ+0x81/0xe0
Feb 13 17:10:01 kernel:  common_interrupt+0xf/0xf
Feb 13 17:10:01 kernel:  </IRQ>
Feb 13 17:10:01 kernel: RIP: 0010:cpuidle_enter_state+0xc3/0x320
Feb 13 17:10:01 kernel: Code: e8 82 68 a0 ff 80 7c 24 0b 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 30 02 00 00 31 ff e8 84 55 a6 ff fb 66 0f 1f 44 00 00 <48> ba cf f7 53 e3 a5 9b c4 20 4c 29 f5 48 89 e8 48 c1 fd 3f 48 f7
Feb 13 17:10:01 kernel: RSP: 0018:ffffbfbac0217e80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd6
Feb 13 17:10:01 kernel: RAX: ffff9e15eb162200 RBX: ffff9e15c4491c00 RCX: 000000000000001f
Feb 13 17:10:01 kernel: RDX: 0000000000000000 RSI: 000000002c234c74 RDI: 0000000000000000
Feb 13 17:10:01 kernel: RBP: 000000653d4d3728 R08: 000000653d4d3728 R09: 0000000000002707
Feb 13 17:10:01 kernel: R10: 0000000000003268 R11: ffff9e15eb1612e8 R12: 0000000000000002
Feb 13 17:10:01 kernel: R13: ffffffffa23954a0 R14: 000000653d200b61 R15: ffff9e0ec44a2640
Feb 13 17:10:01 kernel:  do_idle+0x1dc/0x270
Feb 13 17:10:01 kernel:  cpu_startup_entry+0x6f/0x80
Feb 13 17:10:01 kernel:  start_secondary+0x1a7/0x200
Feb 13 17:10:01 kernel:  secondary_startup_64+0xb6/0xc0
Feb 13 17:10:01 kernel: ---[ end trace 70699422f7793e3b ]---

# ethtool -a isp1
Pause parameters for isp1:
Autonegotiate:on
RX:on
TX:on
RX negotiated:on
TX negotiated:on

# ethtool -c isp1
Coalesce parameters for isp1:
Adaptive RX: off  TX: off
stats-block-usecs: 1000000
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 14
rx-frames: 15
rx-usecs-irq: 1
rx-frames-irq: 1

tx-usecs: 28
tx-frames: 30
tx-usecs-irq: 2
tx-frames-irq: 2

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

# ethtool -g isp1
Ring parameters for isp1:
Pre-set maximums:
RX:2047
RX Mini:0
RX Jumbo:8191
TX:2047
Current hardware settings:
RX:511
RX Mini:0
RX Jumbo:2044
TX:511

# ethtool -i isp1
driver: bnxt_en
version: 1.9.2
firmware-version: 214.0.191.0
expansion-rom-version: 
bus-info: 0000:c6:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

# ethtool -k isp1
Features for isp1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: on
tls-hw-record: off [fixed]

Regards,

Vieri
Comment 19 Vieri 2020-03-11 22:44:41 UTC
The problem is reproducible when using Suricata (or similar program) in NFQ repeat mode.
It goes away if I stop using repeat mode.
It seems to be a netfilter issue.
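For readers unfamiliar with repeat mode: as far as I understand it, the packet is marked and handed back to netfilter for another pass through the rules, and the NFQUEUE rule is expected to skip packets that already carry the mark. The Python model below only illustrates that mark check. The mark and mask value of 1 are hypothetical defaults, not my actual configuration, and the functions are not part of Suricata or netfilter:

```python
def carries_repeat_mark(nfmark, repeat_mark=1, repeat_mask=1):
    """True if the masked packet mark equals the (masked) repeat mark."""
    return (nfmark & repeat_mask) == (repeat_mark & repeat_mask)


def apply_repeat_mark(nfmark, repeat_mark=1, repeat_mask=1):
    """Set the repeat mark inside the masked bits, leaving the other bits alone."""
    return (nfmark & ~repeat_mask) | (repeat_mark & repeat_mask)


def queue_passes(nfmark, max_passes=10):
    """Count how often a packet is queued before the mark rule lets it through."""
    passes = 0
    while not carries_repeat_mark(nfmark) and passes < max_passes:
        nfmark = apply_repeat_mark(nfmark)  # verdict: mark the packet and repeat
        passes += 1
    return passes


print(queue_passes(0x10))
```

In this model an unmarked packet is queued exactly once and then bypasses the queue on its second pass. The model says nothing about why the kernel warning is triggered.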
Comment 20 Mike Pagano gentoo-dev 2020-03-20 22:54:28 UTC
Did you contact the netfilter team as advised by upstream?
Comment 21 Vieri 2020-03-21 11:52:30 UTC
(In reply to Mike Pagano from comment #20)
> Did you contact the netfilter team as advised by upstream?

Yes -- https://marc.info/?l=netfilter&m=158214108315073&w=2

I previously contacted the Suricata ML, and they told me to contact the netfilter ML too:
https://lists.openinfosecfoundation.org/pipermail/oisf-users/2020-February/017411.html
Comment 22 Mike Pagano gentoo-dev 2020-05-21 17:18:41 UTC
From the email list:

"I'll ask the Suricata ML what they think about that."

Any response from upstream in that ML ?
Comment 23 Vieri 2020-05-21 23:41:28 UTC
(In reply to Mike Pagano from comment #22)
> From the email list:
> 
> "I'll ask the Suricata ML what they think about that."
> 
> Any response from upstream in that ML ?

Yes, that netfilter should take care of it:

https://lists.openinfosecfoundation.org/pipermail/oisf-users/2020-February/017411.html

Same advice here:

https://lkml.org/lkml/2020/2/24/130

The netfilter team has been notified.
Comment 24 Mike Pagano gentoo-dev 2021-02-13 20:18:35 UTC
Is this still an issue with later kernels?
Comment 25 Vieri 2021-02-16 10:12:06 UTC
I haven't had this problem anymore.
I guess it has been fixed, or now it just "works for me" after updating my systems.