Bug 480722 - =sys-kernel/gentoo-sources-3.8.13 in a domU crashes - invalid opcode: 0000 [#1] SMP - kernel BUG at net/core/skbuff.c:1040 in pskb_expand_head
Summary: =sys-kernel/gentoo-sources-3.8.13 in a domU crashes - invalid opcode: 0000 [#...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://thread.gmane.org/gmane.comp.em...
Whiteboard: linux-3.10.17
Keywords:
Duplicates: 466200
Depends on:
Blocks:
 
Reported: 2013-08-12 12:47 UTC by Konstantin Agouros
Modified: 2013-12-24 22:32 UTC

See Also:
Package list:
Runtime testing required: ---


Description Konstantin Agouros 2013-08-12 12:47:34 UTC
I recently upgraded to Xen 4.2.2 and kernel 3.8.13 in Dom0 and DomU.
One DomU runs as a firewall, and it has now panicked on me for the second time:

  ------------[ cut here ]------------
[409191.113858] kernel BUG at net/core/skbuff.c:1040!
[409191.113864] invalid opcode: 0000 [#1] SMP 
[409191.113872] Modules linked in: nfsv4 nfnetlink_log nfnetlink nfsd exportfs auth_rpcgss nfs_acl tun af_packet xt_TCPMSS xt_mark xt_connmark iptable_mangle xt_limit xt_policy ipt_REJECT iptable_filter ipt_MASQUERADE xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_raw ip_tables xt_LOG ip6t_REJECT xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack ip6table_filter ip6_tables x_tables nf_conntrack_tftp nf_conntrack_ftp nf_conntrack_sip nf_conntrack ipv6 dm_zero dm_round_robin dm_multipath dm_flakey dm_bufio xts gf128mul aes_x86_64 cbc sha256_generic scsi_transport_iscsi fuse nfs fscache lockd sunrpc reiserfs btrfs libcrc32c zlib_deflate ext2 multipath linear raid0 dm_raid raid10 raid1 raid456 md_mod async_pq async_xor xor async_memcpy async_raid6_recov async_tx raid6_pq dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod
[409191.114112] CPU 0 
[409191.114118] Pid: 0, comm: swapper/0 Not tainted 3.8.13-gentoodomU #1  
[409191.114124] RIP: e030:[<ffffffff812d1221>]  [<ffffffff812d1221>] pskb_expand_head+0x30/0x239
[409191.114143] RSP: e02b:ffff88003f8036d0  EFLAGS: 00010202
[409191.114149] RAX: 0000000000000001 RBX: ffff880038db61c0 RCX: 0000000000000020
[409191.114155] RDX: 00000000000003d4 RSI: 0000000000000000 RDI: 00000000000002c0
[409191.114162] RBP: ffff88003f803720 R08: 0000000000000000 R09: ffff88003d3ff000
[409191.114168] R10: 000000000000ffff R11: ffff880038db61c0 R12: 0000000000000000
[409191.114174] R13: 0000000000000020 R14: ffff880038da8400 R15: ffffffff813b0250
[409191.114184] FS:  00007ff438dd4700(0000) GS:ffff88003f800000(0000) knlGS:0000000000000000
[409191.114190] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[409191.114196] CR2: 00007ffca7dc6000 CR3: 0000000038c94000 CR4: 0000000000000660
[409191.114203] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[409191.114210] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[409191.117783] Process swapper/0 (pid: 0, threadinfo ffffffff81526000, task ffffffff8153b400)
[409191.117783] Stack:
[409191.117783]  ffff88003f8036f0 ffff88003f8036f0 ffffffff8136ef07 ffff880015601cf8
[409191.117783]  ffff88003f803740 ffff880038db61c0 00000000000004d4 0000000060054800
[409191.117783]  ffff880038da8400 ffffffff813b0250 ffff88003f803770 ffffffff812d14bc
[409191.117783] Call Trace:
[409191.117783]  <IRQ> 
[409191.117783]  [<ffffffff8136ef07>] ? _raw_spin_unlock_irqrestore+0x19/0x1c
[409191.117783]  [<ffffffff812d14bc>] __pskb_pull_tail+0x59/0x28d
[409191.117783]  [<ffffffff81037d9d>] ? local_bh_enable_ip+0x22/0x8b
[409191.117783]  [<ffffffff812dc42a>] dev_hard_start_xmit+0x242/0x3ba
[409191.117783]  [<ffffffff812f15d9>] sch_direct_xmit+0x72/0x1a1
[409191.117783]  [<ffffffff812dc924>] dev_queue_xmit+0x19e/0x39f
[409191.117783]  [<ffffffffa02b62ee>] ? ip6_fragment+0x93a/0x93a [ipv6]
[409191.117783]  [<ffffffffa02b48f7>] ip6_finish_output2+0x246/0x2d2 [ipv6]
[409191.117783]  [<ffffffffa02b6383>] ip6_finish_output+0x95/0x9a [ipv6]
[409191.117783]  [<ffffffffa02b63f3>] ip6_output+0x6b/0x9c [ipv6]
[409191.117783]  [<ffffffffa02b5925>] ip6_forward+0x625/0x6b4 [ipv6]
[409191.117783]  [<ffffffffa02bfdb6>] ? ip6_route_input+0x92/0xb1 [ipv6]
[409191.117783]  [<ffffffffa02b6745>] ? ip6_input_finish+0x321/0x321 [ipv6]
[409191.117783]  [<ffffffffa02b67aa>] ip6_rcv_finish+0x65/0x69 [ipv6]
[409191.117783]  [<ffffffffa031f783>] __ipv6_conntrack_in+0xf5/0x148 [nf_conntrack_ipv6]
[409191.117783]  [<ffffffff8105813c>] ? __enqueue_entity+0x64/0x66
[409191.117783]  [<ffffffffa031f7f6>] ipv6_conntrack_in+0x20/0x22 [nf_conntrack_ipv6]
[409191.117783]  [<ffffffff812fc4b8>] nf_iterate+0x44/0x9e
[409191.117783]  [<ffffffffa02b6745>] ? ip6_input_finish+0x321/0x321 [ipv6]
[409191.117783]  [<ffffffff812fc581>] nf_hook_slow+0x6f/0x106
[409191.117783]  [<ffffffffa02b6745>] ? ip6_input_finish+0x321/0x321 [ipv6]
[409191.117783]  [<ffffffffa02b6745>] ? ip6_input_finish+0x321/0x321 [ipv6]
[409191.117783]  [<ffffffffa031abe7>] nf_ct_frag6_output+0x9e/0xdf [nf_defrag_ipv6]
[409191.117783]  [<ffffffffa031a0d0>] ipv6_defrag+0xca/0xda [nf_defrag_ipv6]
[409191.117783]  [<ffffffffa02b6745>] ? ip6_input_finish+0x321/0x321 [ipv6]
[409191.117783]  [<ffffffff812fc4b8>] nf_iterate+0x44/0x9e
[409191.117783]  [<ffffffffa02b6745>] ? ip6_input_finish+0x321/0x321 [ipv6]
[409191.117783]  [<ffffffff812fc581>] nf_hook_slow+0x6f/0x106
[409191.117783]  [<ffffffffa02b6745>] ? ip6_input_finish+0x321/0x321 [ipv6]
[409191.117783]  [<ffffffff8105b304>] ? enqueue_task_fair+0x397/0x416
[409191.117783]  [<ffffffff81052462>] ? resched_task+0x25/0x69
[409191.117783]  [<ffffffffa02b6ac4>] ipv6_rcv+0x316/0x31d [ipv6]
[409191.117783]  [<ffffffff812da5f9>] __netif_receive_skb+0x5dd/0x683
[409191.117783]  [<ffffffff812da856>] netif_receive_skb+0x46/0x76
[409191.117783]  [<ffffffff8129a728>] xennet_poll+0x992/0xaef
[409191.117783]  [<ffffffff8107055b>] ? tick_program_event+0x1f/0x21
[409191.117783]  [<ffffffff8104d56d>] ? hrtimer_interrupt+0x113/0x1bc
[409191.117783]  [<ffffffff8100641c>] ? xen_clocksource_read+0x20/0x22
[409191.117783]  [<ffffffff812daa77>] net_rx_action+0x9f/0x1b1
[409191.117783]  [<ffffffff8108dcdf>] ? handle_irq_event+0x49/0x5e
[409191.117783]  [<ffffffff810381ef>] __do_softirq+0xa1/0x14b
[409191.117783]  [<ffffffff81090218>] ? handle_edge_irq+0xd8/0xea
[409191.117783]  [<ffffffff812708fd>] ? __xen_evtchn_do_upcall+0x1a4/0x1e1
[409191.117783]  [<ffffffff81370fdc>] call_softirq+0x1c/0x30
[409191.117783]  [<ffffffff8100b493>] do_softirq+0x41/0x7e
[409191.117783]  [<ffffffff81038383>] irq_exit+0x44/0x9c
[409191.117783]  [<ffffffff812720c9>] xen_evtchn_do_upcall+0x2c/0x39
[409191.117783]  [<ffffffff8137103e>] xen_do_hypervisor_callback+0x1e/0x30
[409191.117783]  <EOI> 
[409191.117783]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[409191.117783]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[409191.117783]  [<ffffffff81006144>] ? xen_safe_halt+0x10/0x18
[409191.117783]  [<ffffffff810108bd>] ? default_idle+0x27/0x44
[409191.117783]  [<ffffffff810110d8>] ? cpu_idle+0xae/0xf1
[409191.117783]  [<ffffffff81356be9>] ? rest_init+0x6d/0x6f
[409191.117783]  [<ffffffff815a3b8e>] ? start_kernel+0x3b6/0x3c3
[409191.117783]  [<ffffffff815a35e1>] ? repair_env_string+0x56/0x56
[409191.117783]  [<ffffffff815a32d3>] ? x86_64_start_reservations+0xae/0xb2
[409191.117783]  [<ffffffff815a592b>] ? xen_start_kernel+0x4b3/0x4b5
[409191.117783] Code: 57 41 56 41 55 41 89 cd 41 54 41 89 f4 53 48 89 fb 48 83 ec 28 85 f6 8b bf d4 00 00 00 79 02 0f 0b 8b 83 ec 00 00 00 ff c8 74 02 <0f> 0b 01 f7 89 c8 48 8b 4d 08 8d 54 3a 3f 80 cc 20 83 e2 c0 f6 
[409191.117783] RIP  [<ffffffff812d1221>] pskb_expand_head+0x30/0x239
[409191.117783]  RSP <ffff88003f8036d0>
[409191.117783] ---[ end trace 371b0bf63c68d12b ]---
[409191.117783] Kernel panic - not syncing: Fatal exception in interrupt
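
When triaging a trace like the one above, it helps to separate the frames the unwinder considers part of the call chain from those prefixed with "?", which are addresses found on the stack but not verified as callers. The following is a minimal sketch of that filtering, not part of any kernel tooling; the function name `reliable_frames` and the regular expression are illustrative assumptions, and the sample lines are copied from the trace above.

```python
import re

# Matches one call-trace entry: "[<address>]", an optional "?" that marks
# an unverified frame, then symbol+offset/size and an optional "[module]".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(log_text):
    """Return (symbol, module) pairs for frames not marked with '?'.

    module is None for symbols built into the kernel image.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append((match.group("symbol"), match.group("module")))
    return frames


# Sample lines taken from the call trace in this report.
sample = """\
[409191.117783]  [<ffffffff8136ef07>] ? _raw_spin_unlock_irqrestore+0x19/0x1c
[409191.117783]  [<ffffffff812d14bc>] __pskb_pull_tail+0x59/0x28d
[409191.117783]  [<ffffffff81037d9d>] ? local_bh_enable_ip+0x22/0x8b
[409191.117783]  [<ffffffff812dc42a>] dev_hard_start_xmit+0x242/0x3ba
[409191.117783]  [<ffffffffa02b48f7>] ip6_finish_output2+0x246/0x2d2 [ipv6]
"""

for symbol, module in reliable_frames(sample):
    print(symbol, f"[{module}]" if module else "")
```

Note that the "RIP" lines use the same `[<address>] symbol+offset/size` notation, so feeding the whole log to this sketch would also list `pskb_expand_head` from those lines.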
Comment 1 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-10-14 16:06:53 UTC
Found something very closely related (see URL field), but as the person was instructed to contact the maintainers, I'm not sure what happened after that. It might be the case that this has been fixed, so please consider running the latest kernel(s) to see whether it is fixed. If so, please let us know.

If it is not fixed, I suggest you file this upstream at https://bugzilla.kernel.org so that the Linux networking maintainers can look into it; if you do that, please let us know the URL of the upstream bug.

Good luck and thank you in advance.
Comment 2 Konstantin Agouros 2013-10-14 16:10:22 UTC
Well, I tried git-sources 3.11-something: same result.

At the moment I am running standard gentoo-sources-3.10.7-r1 with an uptime of 5 days, which is a record for anything > 3.6.11.

However, there is an additional change: Dom0 is running 3.10.7-r1 as well. I don't know whether this has an influence or not, but since Dom0 hosts the physical interfaces (I had no luck with PCI passthrough), it might. I would give a thumbs up only after a month of uptime.

Konstantin
Comment 3 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-10-14 17:04:50 UTC
*** Bug 466200 has been marked as a duplicate of this bug. ***
Comment 4 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-10-14 17:06:01 UTC
Okay, I have closed the other bug as it appears to have been a duplicate.

Let us know how the new kernel goes.

If it still breaks, please file this upstream at http://bugzilla.kernel.org/ and provide us a URL to the upstream bug report; thank you in advance.
Comment 5 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-11-16 16:33:44 UTC
How is it going? :)
Comment 6 Konstantin Agouros 2013-11-16 22:33:56 UTC
running 3.10.7-r1 on the Dom0:

# uptime
 23:32:24 up 31 days,  9:27, 49 users,  load average: 0.36, 0.25, 0.18

Same on the troubled DomU:

# uptime
 23:33:02 up 31 days,  9:20,  3 users,  load average: 0.08, 0.04, 0.05


However, next week I have to reboot due to HW problems, so the upgrade to 3.10.17 will happen then as well. Otherwise it is looking good.
Comment 7 Konstantin Agouros 2013-11-21 17:40:26 UTC
I have no complete crashes anymore.
However, I found the following (3.10.17) in the dmesg of Dom0:

[267021.868226] ------------[ cut here ]------------
[267021.868236] WARNING: at net/ipv4/tcp_input.c:2776 tcp_fastretrans_alert+0x17c/0x9e1()
[267021.868237] Modules linked in: crc32c xen_blkback tcm_loop iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod w83627ehf hwmon_vid af_packet tun autofs4 nfsd auth_rpcgss oid_registry nfs_acl bridge stp llc xt_TCPMSS iptable_mangle iptable_raw xt_physdev iptable_filter ipt_MASQUERADE xt_nat xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables configfs xen_gntdev loop ipv6 radeon usblp fbcon bitblit softcursor font mperf igb drm_kms_helper ptp k10temp st ttm button pps_core drm agpgart snd_hda_codec_realtek 8250 processor floppy i2c_piix4 serial_core i2c_algo_bit rtc_cmos snd_hda_intel snd_hda_codec r8169 mii snd_pcm snd_page_alloc snd_timer snd xts gf128mul aes_x86_64 cbc sha256_generic iscsi_tcp libiscsi_tcp
[267021.868274]  libiscsi scsi_transport_iscsi e1000 fuse exportfs nfs fscache lockd sunrpc reiserfs zlib_deflate lzo_compress ext4 jbd2 crc16 ext2 multipath linear raid10 raid456 async_pq async_xor xor async_raid6_recov raid6_pq async_memcpy async_tx raid1 raid0 md_mod dm_snapshot dm_crypt dm_mirror dm_region_hash dm_log dm_mod hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx usbhid xhci_hcd ohci_hcd uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common sx8 mptsas scsi_transport_sas mptfc scsi_transport_fc mptspi mptscsih mptbase sym53c8xx aic7xxx scsi_transport_spi sr_mod cdrom sg ahci libahci sata_via pata_atiixp pata_amd libata
[267021.868308] CPU: 4 PID: 1035 Comm: netback/4 Tainted: G        W    3.10.17-gentoo-64bit #1
[267021.868310] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./880GXH/USB3, BIOS P1.20 06/24/2010
[267021.868312]  ffffffff815590c4 ffff880211103758 ffffffff8139ce6a ffff880211103798
[267021.868315]  ffffffff810416d6 ffff880160334380 0000000000000000 0000000000004320
[267021.868317]  0000000000000001 0000000000000001 ffff880160334040 ffff8802111037a8
[267021.868320] Call Trace:
[267021.868321]  <IRQ>  [<ffffffff8139ce6a>] dump_stack+0x19/0x1b
[267021.868329]  [<ffffffff810416d6>] warn_slowpath_common+0x62/0x7b
[267021.868331]  [<ffffffff81041704>] warn_slowpath_null+0x15/0x17
[267021.868333]  [<ffffffff81347560>] tcp_fastretrans_alert+0x17c/0x9e1
[267021.868336]  [<ffffffff81348825>] tcp_ack+0x9c1/0xb9c
[267021.868338]  [<ffffffff8134916c>] tcp_rcv_established+0x466/0x55b
[267021.868342]  [<ffffffff8119a448>] ? security_sock_rcv_skb+0x11/0x13
[267021.868356]  [<ffffffffa0bd44d7>] tcp_v6_do_rcv+0x24e/0x5b1 [ipv6]
[267021.868365]  [<ffffffffa0bd4b25>] tcp_v6_rcv+0x2eb/0x6bb [ipv6]
[267021.868373]  [<ffffffffa0bde0dc>] ? fib6_rule_lookup+0x37/0x5c [ipv6]
[267021.868376]  [<ffffffff810f944c>] ? kfree+0x1f5/0x235
[267021.868383]  [<ffffffffa0bb5942>] ip6_input_finish+0x292/0x3c3 [ipv6]
[267021.868389]  [<ffffffffa0bb5e4a>] ip6_input+0x25/0x56 [ipv6]
[267021.868396]  [<ffffffffa0bb5ad8>] ip6_rcv_finish+0x65/0x69 [ipv6]
[267021.868402]  [<ffffffffa0bb5dbf>] ipv6_rcv+0x2e3/0x349 [ipv6]
[267021.868406]  [<ffffffff81309ca3>] __netif_receive_skb_core+0x621/0x69b
[267021.868409]  [<ffffffff81309d6b>] __netif_receive_skb+0x4e/0x60
[267021.868411]  [<ffffffff81309f3c>] netif_receive_skb+0x50/0x82
[267021.868418]  [<ffffffffa0cabcf5>] br_handle_frame_finish+0x275/0x2e0 [bridge]
[267021.868423]  [<ffffffffa0cb10dd>] br_nf_pre_routing_finish_ipv6+0xde/0x10e [bridge]
[267021.868427]  [<ffffffffa0cb18e2>] br_nf_pre_routing+0x3cf/0x5a5 [bridge]
[267021.868431]  [<ffffffff81068c13>] ? ttwu_do_activate.constprop.82+0x57/0x5c
[267021.868433]  [<ffffffff8132fa9e>] nf_iterate+0x44/0x81
[267021.868437]  [<ffffffffa0caba80>] ? br_handle_local_finish+0x48/0x48 [bridge]
[267021.868440]  [<ffffffff8132fb43>] nf_hook_slow+0x68/0xfd
[267021.868444]  [<ffffffffa0caba80>] ? br_handle_local_finish+0x48/0x48 [bridge]
[267021.868448]  [<ffffffffa0cabf6c>] br_handle_frame+0x20c/0x230 [bridge]
[267021.868452]  [<ffffffffa0cabd60>] ? br_handle_frame_finish+0x2e0/0x2e0 [bridge]
[267021.868454]  [<ffffffff81309b4e>] __netif_receive_skb_core+0x4cc/0x69b
[267021.868457]  [<ffffffff8106cc75>] ? account_steal_ticks+0x9/0xb
[267021.868460]  [<ffffffff81309d6b>] __netif_receive_skb+0x4e/0x60
[267021.868462]  [<ffffffff81309e24>] process_backlog+0xa7/0x16f
[267021.868465]  [<ffffffff8130a16f>] net_rx_action+0xac/0x1ec
[267021.868468]  [<ffffffff8104806f>] __do_softirq+0xf2/0x20d
[267021.868471]  [<ffffffff813a351c>] call_softirq+0x1c/0x30
[267021.868472]  <EOI>  [<ffffffff8101253d>] do_softirq+0x40/0x7f
[267021.868477]  [<ffffffff81307fdd>] netif_rx_ni+0x21/0x26
[267021.868480]  [<ffffffff812b5b69>] xenvif_receive_skb+0xc/0xe
[267021.868483]  [<ffffffff812b4bff>] xen_netbk_kthread+0x753/0x85d
[267021.868486]  [<ffffffff8105f1bb>] ? abort_exclusive_wait+0x89/0x89
[267021.868488]  [<ffffffff812b44ac>] ? xen_netbk_tx_build_gops+0xcbf/0xcbf
[267021.868491]  [<ffffffff8105e822>] kthread+0xb5/0xbd
[267021.868493]  [<ffffffff8105e76d>] ? kthread_freezable_should_stop+0x43/0x43
[267021.868496]  [<ffffffff813a1fbc>] ret_from_fork+0x7c/0xb0
[267021.868498]  [<ffffffff8105e76d>] ? kthread_freezable_should_stop+0x43/0x43
[267021.868500] ---[ end trace 84ad879484aae2bd ]---
Comment 8 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-12-24 15:52:44 UTC
That warning seems unrelated. Does it have any other symptoms, or does the rest look like it is working fine? I suggest you file the warning upstream.

https://bugzilla.kernel.org
Comment 9 Konstantin Agouros 2013-12-24 22:32:37 UTC
Dom0 and DomU are running stable, apart from some i2c nagging.

So I guess we can close this case.