Created attachment 897402 [details]
kernel panic log

Hi,

I'm trying to troubleshoot an issue that cropped up on some arm boards that I have. I can run 6.6.21 just fine, but after upgrading to 6.6.30, they seem to hit a bug/panic after running for a few days.

skb_panic+0x6c/0x78
skb_put+0xa4/0xb0
mvpp2_rx+0x604/0xbe8
mvpp2_poll+0x100/0x220

From what I can tell, the code hasn't changed in 3+ years, and 6.6.x is only a year old, and both kernels were compiled with the same gcc (13.2.1_p20240210). Even if I were somehow receiving malformed packets that weren't being caught by iptables/etc., I'm at a loss as to why that would panic 6.6.30 and not 6.6.21.

I was wondering if I could get a second set of eyes before I try reporting it upstream... I'll attach the full crash, since I was able to pull it off the serial console.

Thanks in advance
Created attachment 897403 [details] kernel config
Hello, can you try a few things, all without the proprietary modules that your panic shows as loaded?

1. Try the latest 6.6.x to see if something is fixed (6.6.38 as of this writing).
2. Do a git bisect from 6.6.21 to 6.6.30 to see if there is an offending commit.
3. Increase your logging: you can add ignore_loglevel to your kernel parameters.

If you get a more verbose panic, can you attach the full dmesg?
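Since each test here means running a kernel for days, it may help to see how few kernels a bisection needs compared with walking the releases in order. The following is only a rough Python sketch of the idea at point-release granularity (git bisect itself works on individual commits). The function name `first_bad`, the `is_bad` callback, and the pretend result are all hypothetical, and it assumes that once a release is bad every later release is bad too.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_bad() is true.

    Assumptions (not verified by this code): the last entry is known bad,
    and badness is monotonic, i.e. every release after the first bad one
    is bad as well.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid        # first bad release is at mid or earlier
        else:
            lo = mid + 1    # first bad release is after mid
    return versions[lo]


# Candidate releases between the known-good 6.6.21 and the suspect 6.6.30.
candidates = [f"6.6.{n}" for n in range(22, 31)]


def pretend_is_bad(version: str) -> bool:
    """Hypothetical stand-in for 'boot it and wait for a panic'.

    The cut-off of 27 is made up purely so the sketch can run.
    """
    return int(version.rsplit(".", 1)[1]) >= 27


if __name__ == "__main__":
    print(first_bad(candidates, pretend_is_bad))
```

With nine candidate releases this needs at most four test runs instead of up to nine. A panic that only shows up after a few days makes every "good" verdict uncertain, so the result is only as reliable as the soak time given to each kernel.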
Hi, unfortunately I don't have the means to move these systems back to ext4, but I'll try walking up the 6.6.x tree and see where things start going wrong. I'll also try 6.6.38 (or 6.6.39 since I just saw that go up). In the meantime, I'll give you the full dmesg from a running 6.6.21 system, although it looks pretty normal to me.

A little backstory: I have three of these boards. One is my router, one is my mail server, and the last one is my web server. I did have some memory issues on the web server (bug 907766), but they were resolved by reseating the memory. I only saw this panic on the router and the mail server before I rolled all three back to 6.6.21.
Created attachment 897445 [details] 6.6.21 kernel dmesg
22/23/24 seems to have lasted a week, moving up to 25/26/27...
Moving up to 28/29... I'll try 30 again later, then 38 and whatever the latest 6.6.x kernel winds up being after next week... Starting to wish I started at 29 and walked backwards instead.
I can't replicate this anymore on 6.6.30.
A small addendum... While I haven't seen the crashes again from a month ago, I just saw this message on one of these arm boards...

[1113648.270846] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised.

If I had to guess, the crashes might have been caused by random garbage packets that got merged and offloaded by GRO. I'm going to disable it across all interfaces and see where that gets me.

Cheers
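For anyone wanting to try the same workaround, here is a minimal Python sketch of disabling GRO on every interface. It assumes `ethtool` is installed and that interfaces appear as directories under `/sys/class/net`. The helper names are hypothetical, and it defaults to a dry run that only prints the commands. The setting does not persist across reboots.

```python
import os
import subprocess
from typing import Iterable, List


def gro_off_command(iface: str) -> List[str]:
    """Build the ethtool invocation that turns GRO off for one interface."""
    return ["ethtool", "-K", iface, "gro", "off"]


def list_interfaces(sysfs_net: str = "/sys/class/net") -> List[str]:
    """List network interfaces from sysfs, skipping loopback.

    Non-directory entries (for example bonding_masters) are ignored.
    """
    try:
        names = sorted(os.listdir(sysfs_net))
    except FileNotFoundError:
        return []
    return [
        name for name in names
        if name != "lo" and os.path.isdir(os.path.join(sysfs_net, name))
    ]


def disable_gro(ifaces: Iterable[str], dry_run: bool = True) -> List[List[str]]:
    """Return the commands for each interface; run them unless dry_run is set."""
    commands = [gro_off_command(iface) for iface in ifaces]
    if not dry_run:
        for command in commands:
            # Needs root; a failure on one interface should not stop the rest.
            subprocess.run(command, check=False)
    return commands


if __name__ == "__main__":
    for command in disable_gro(list_interfaces()):
        print(" ".join(command))
```

Running `ethtool -k eth0` afterwards should show the `generic-receive-offload` line, which is a quick way to confirm the change took effect.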