I just had to reset my machine, as its monitor wouldn't come to life after pressing a key after I had been away for a while. The kernel log indicates this as the first problem:

sky2 eth0: rx error, status 0x69f069f0 length 0
eth0: hw csum failure.
Pid: 4146, comm: hadcm3transum_5 Tainted: P 2.6.29-gentoo-r5 #1
Call Trace:
 [<c02f9f3d>] __skb_checksum_complete_head+0x58/0x5e
 [<f8554568>] tcp_error+0xac/0x24c [nf_conntrack]
 [<fa417591>] ipt_do_table+0x1e8/0x50b [ip_tables]
 [<c037215b>] _read_lock_bh+0x8/0x22
 [<f85514d3>] __nf_conntrack_find+0xd9/0xdf [nf_conntrack]
 [<f85544bc>] tcp_error+0x0/0x24c [nf_conntrack]
 [<f855162d>] nf_conntrack_in+0xd3/0x4f2 [nf_conntrack]
 [<c03315a9>] tcp_rcv_established+0x2f0/0x5b5
 [<c0337a50>] tcp_v4_do_rcv+0x94/0x191
 [<c0317202>] genl_register_mc_group+0xc7/0x115
 [<c03172bb>] nf_iterate+0x6b/0x7e
 [<c031d1e0>] ip_rcv_finish+0x0/0x311
 [<c0317467>] nf_hook_slow+0xaa/0xed
 [<c031d1e0>] ip_rcv_finish+0x0/0x311
 [<c031d6f0>] ip_rcv+0x1ff/0x248
 [<c031d1e0>] ip_rcv_finish+0x0/0x311
 [<c030058c>] netif_receive_skb+0x29a/0x538
 [<f8106218>] sky2_poll+0x407/0xd15 [sky2]
 [<c02fe5fb>] net_rx_action+0xef/0x1a8
 [<c01291a0>] __do_softirq+0x89/0x120
 [<f82aba8d>] sym53c8xx_intr+0x3f/0x64 [sym53c8xx]
 [<c0113ccd>] ack_apic_level+0x73/0x26b
 [<c012926e>] do_softirq+0x37/0x3b
 [<c012945c>] irq_exit+0x42/0x44
 [<c01052a8>] do_IRQ+0x48/0x90
 [<c012945c>] irq_exit+0x42/0x44
 [<c011214e>] smp_apic_timer_interrupt+0x5c/0x87
 [<c0103827>] common_interrupt+0x27/0x2c

After that, the hw csum failures get repeated a lot, see the full log which I'll attach.

It seems to me that this might be a problem reported by other people in other places against older kernel releases like 2.6.22, where only the network became unusable, or 2.6.23, where lockups like mine were observed.
http://thread.gmane.org/gmane.linux.network/69593
https://bugs.launchpad.net/linux/+bug/138611
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/320808

All these cases seem to have succeeded in reproducing the issue on a regular basis, while I've been using that kernel version for a while now and experienced it for the first time today. So I can't simply test different versions, and I don't expect much help here if this remains the only incident. I'd like to report it in any case, though, so that if others experience the same, the issue will become more visible.

Some more information: this is an ASUS P5GDC-V Deluxe mainboard with the following eth0 NIC, according to lspci -vv:

Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
        Subsystem: ASUSTeK Computer Inc. Marvell 88E8053 Gigabit Ethernet controller PCIe (Asus)
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at cbefc000 (64-bit, non-prefetchable) [size=16K]
        I/O ports at c800 [size=256]
        Expansion ROM at cbec0000 [disabled] [size=128K]
        Capabilities: [48] Power Management version 2
        Capabilities: [50] Vital Product Data
        Capabilities: [5c] MSI: Mask- 64bit+ Count=1/2 Enable-
        Capabilities: [e0] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: sky2
        Kernel modules: sky2

I'll attach a file with the lspci -vvv log just in case.

The running commands hadcm3transum_5 and astropulse_5.03 are BOINC applications, while firefox and X were running in my KDE session. The kernel is tainted by the nvidia driver modules.

# uname -a
Linux server 2.6.29-gentoo-r5 #1 SMP PREEMPT Thu Jun 4 09:37:06 CEST 2009 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
Created attachment 194264 [details] kernel log
Created attachment 194265 [details] lspci -vvv of eth0
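For anyone comparing their own logs against the attached one, here is a minimal sketch that counts the "hw csum failure" lines per interface and lists the sky2 rx errors and the commands named in the "Pid: ..., comm: ..." lines. The function name summarize and the three regular expressions are my own assumptions, modelled only on the message shapes quoted in this report; other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Assumed patterns, derived from the lines quoted in this report only.
CSUM_RE = re.compile(r"(\w+): hw csum failure")
RX_ERR_RE = re.compile(
    r"sky2 (\w+): rx error, status (0x[0-9a-fA-F]+) length (\d+)"
)
COMM_RE = re.compile(r"Pid: (\d+), comm: (\S+)")

# The first three log lines from the original report, used as sample input.
SAMPLE = """sky2 eth0: rx error, status 0x69f069f0 length 0
eth0: hw csum failure.
Pid: 4146, comm: hadcm3transum_5 Tainted: P 2.6.29-gentoo-r5 #1
"""


def summarize(log_text):
    """Return counts of checksum failures, rx errors and affected commands."""
    # Number of "hw csum failure" messages per network interface.
    csum_failures = Counter(CSUM_RE.findall(log_text))
    # Each rx error as an (interface, status, length) tuple, in log order.
    rx_errors = RX_ERR_RE.findall(log_text)
    # How often each command shows up in a "Pid: ..., comm: ..." line.
    comms = Counter(comm for _pid, comm in COMM_RE.findall(log_text))
    return {
        "csum_failures": dict(csum_failures),
        "rx_errors": rx_errors,
        "comms": dict(comms),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

To run it on a real log, read the file into a string and pass that to summarize instead of SAMPLE. The patterns use search semantics, so timestamp prefixes in front of each message should not matter.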
Please post your "emerge --info".
Created attachment 195017 [details]
emerge --info

(In reply to comment #3)
> Please post your "emerge --info".

Sorry I forgot that.
Has this error recurred? Have you tested with gentoo-sources-2.6.30-r2?
Created attachment 196919 [details]
kernel log - take 2

(In reply to comment #5)
> Has this error recurred?

Yes, I was about to write a follow-up. It happened again once, on 2.6.30-r1. I'm attaching a kernel log again.

It was slightly different this time around. For one, there is only a single hw csum failure, caused by UDP this time. It seems to lead more or less directly to the line mentioning a paging request and declaring it a BUG.

I was in front of my machine this time, and I'll try to recapitulate what happened. I had been away for a while, and had been using gqview in fullscreen mode on one desktop while postfix/procmail/spamassassin was busy working through the backlog in the background. I had a "watch 'postqueue -p | tail -n1'" command in a shell on another desktop, showing me the number of messages left in the queue. At some point, I noticed that there was no more output from that command. I canceled watch and typed the postqueue command manually. It hung and didn't respond to Ctrl+C either. I ran pstree in a different shell and found a branch from some postfix command through two instances of procmail to spamassassin, each process the only child of the one mentioned before. I killed the child procmail instance, using SIGTERM iirc. postqueue still didn't respond.

At some point in all that, I don't recall exactly where, I noticed the system becoming unresponsive in other areas. The gqview fullscreen window I had on one desktop didn't respond any more, nor did it repaint, giving me one completely black desktop. Desktop switching still worked, however. As two seemingly unrelated applications (gqview and postqueue) now showed problems, I decided to reboot my system. Some unclosable Firefox windows might have been involved as well, but I'm not sure, I might be mixing things up. In any case, I closed as many windows as I could. I think the rxvt-unicode shell windows didn't close properly either.

I told my KDE session to shut down the computer. It terminated the task bar, then hung. I couldn't switch to a text console, and there was seemingly no response to the magic SysRq key. Reset was my only remaining option.

> Have you tested with gentoo-sources-2.6.30-r2?

No, I just installed that version. Will be testing from tomorrow onwards...
This patch seems related, although it refers to PowerPC:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b9389796fa4c87fbdff33816e317cdae5f36dd0b

Either patch a 2.6.30 kernel manually, or use a 2.6.31-rc2+ kernel, and see if the patch fixes this bug.

It will probably be backported to 2.6.27-stable:
http://lkml.org/lkml/2009/7/28/475
feel free to reopen the bug, after you've tested the patch or a newer kernel