I am getting very frequent 'hw csum failure' messages (430k in 10 days, so about one every 2 seconds on average) on a VLAN created on an Ethernet port of a Sun HME quad Ethernet card. Each 'hw csum failure' message is followed by a line such as 'Pid: 0, comm: swapper Not tainted 2.6.31-gentoo-r6' (the pid/comm values change with no apparent pattern). That line is followed by a stack trace, included below.

There are no reported errors on the underlying interface. Before creating the VLAN there were no problems on the same physical link on this port. The link is not very heavily used, peaking at about 10Mbit/s or roughly 10% capacity. Traffic is flowing, with no apparent packet loss at the IP level. Rebooting the machine didn't resolve the issue, nor did removing/reinserting the cable.

Call Trace:
 [<c11fea91>] netdev_rx_csum_fault+0x31/0x40
 [<c11f9e49>] __skb_checksum_complete_head+0x59/0x60
 [<c11f9e5b>] __skb_checksum_complete+0xb/0x10
 [<c12745a4>] nf_ip_checksum+0xa4/0x110
 [<c1274500>] ? nf_ip_checksum+0x0/0x110
 [<f8265ddb>] tcp_error+0xcb/0x240 [nf_conntrack]
 [<c10076d9>] ? nommu_map_page+0x39/0x70
 [<f8265d10>] ? tcp_error+0x0/0x240 [nf_conntrack]
 [<f826266f>] nf_conntrack_in+0xdf/0x4b0 [nf_conntrack]
 [<c10076a0>] ? nommu_map_page+0x0/0x70
 [<c11fef31>] ? dev_hard_start_xmit+0x241/0x380
 [<c12150ed>] ? __qdisc_run+0x12d/0x1b0
 [<f828d2e0>] ? ipv4_conntrack_in+0x0/0x20 [nf_conntrack_ipv4]
 [<f828d2fa>] ipv4_conntrack_in+0x1a/0x20 [nf_conntrack_ipv4]
 [<c1232b87>] nf_iterate+0x57/0x80
 [<c123a690>] ? ip_rcv_finish+0x0/0x2c0
 [<c1232dcd>] nf_hook_slow+0x4d/0xc0
 [<c123a690>] ? ip_rcv_finish+0x0/0x2c0
 [<c123ade6>] ip_rcv+0x1f6/0x280
 [<c123a690>] ? ip_rcv_finish+0x0/0x2c0
 [<c123abf0>] ? ip_rcv+0x0/0x280
 [<c11fe2a2>] netif_receive_skb+0x2a2/0x520
 [<c1200cf9>] process_backlog+0x69/0x90
 [<c12011e7>] net_rx_action+0x97/0x110
 [<c1022b83>] __do_softirq+0x73/0x100
 [<c103f695>] ? handle_IRQ_event+0x35/0xc0
 [<c1022c3a>] do_softirq+0x2a/0x30
 [<c1022eca>] irq_exit+0x2a/0x40
 [<c1004922>] do_IRQ+0x42/0x90
 [<c11f6edc>] ? __kfree_skb+0x3c/0x90
 [<c1003189>] ? common_interrupt+0x29/0x30
 [<c1003189>] common_interrupt+0x29/0x30
 [<c1040000>] ? synchronize_irq+0x80/0xc0
 [<c107fbc1>] ? __mnt_is_readonly+0x1/0x20
 [<c107fc3b>] ? mnt_clone_write+0xb/0x20
 [<c107fc8e>] mnt_want_write_file+0x3e/0x50
 [<c107e328>] file_update_time+0x38/0xd0
 [<c10499fa>] __generic_file_aio_write_nolock+0x20a/0x4e0
 [<c1128507>] ? do_con_write+0x367/0x1aa0
 [<c103277a>] ? atomic_notifier_call_chain+0x1a/0x20
 [<c1126042>] ? notify_update+0x22/0x30
 [<c1049f74>] generic_file_aio_write+0x54/0xc0
 [<c10b5abd>] ext3_file_write+0x2d/0xc0
 [<c106cc5c>] do_sync_write+0xcc/0x110
 [<c102ef30>] ? autoremove_wake_function+0x0/0x50
 [<c111d7f8>] ? tty_ldisc_deref+0x8/0x10
 [<c11182d9>] ? tty_write+0x1a9/0x1d0
 [<c106d3b9>] vfs_write+0x99/0x150
 [<c1101255>] ? copy_to_user+0x35/0x50
 [<c106cb90>] ? do_sync_write+0x0/0x110
 [<c106d92d>] sys_write+0x3d/0x70
 [<c1002b48>] sysenter_do_call+0x12/0x26

Reproducible: Always

Steps to Reproduce:
1. Configure a VLAN on a Sun HME card
2. Generate traffic
3. Watch dmesg

Actual Results:
Errors in dmesg/log.

Expected Results:
Nothing.
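For anyone who wants to check the message rate on their own machine, here is a rough Python sketch of how a figure like "430k in 10 days, about one every 2 seconds" could be derived from saved log text. It only assumes that each occurrence contains the literal string 'hw csum failure'. The function names, the sample log lines and the interface name eth1.100 are made up for illustration and are not taken from the report.

```python
import re

# Only the fixed part of the message is matched; the log prefix
# (timestamp, interface name) differs between systems.
CSUM_RE = re.compile(r"hw csum failure")


def count_csum_failures(lines):
    """Return how many lines contain the 'hw csum failure' message."""
    return sum(1 for line in lines if CSUM_RE.search(line))


def average_interval_seconds(count, period_days):
    """Average number of seconds between messages over the period."""
    if count == 0:
        return float("inf")
    return period_days * 86400 / count


if __name__ == "__main__":
    # Hypothetical log excerpt; real lines will look different.
    sample = [
        "eth1.100: hw csum failure.",
        "Pid: 0, comm: swapper Not tainted 2.6.31-gentoo-r6",
        "Call Trace:",
        "eth1.100: hw csum failure.",
    ]
    print(count_csum_failures(sample))
    # Figures from the report: 430k messages in 10 days.
    print(round(average_interval_seconds(430_000, 10), 2))
```

To use it on a real system, the lines could come from a saved copy of the dmesg output or the kernel log file, with the period set to however long that log covers.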
Greetings, could you try reporting this issue upstream (http://bugzilla.kernel.org/)? If you do so, could you paste us the link to the upstream bug report here?
This does indeed seem to be an upstream bug with Sun HME cards and VLANs in general. http://bugzilla.kernel.org/show_bug.cgi?id=9270 I hope I am marking the change of status correctly.
Can you attach the full dmesg with the error? Last working kernel? Please test with 2.6.32 and git 2.6.33_rcX if 2.6.32 fails.