Bug 545192 - =sys-kernel/hardened-sources-3.19.3: crash: PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:190 cicus.113_120 min, count: 10
Summary: =sys-kernel/hardened-sources-3.19.3: crash: PAX: size overflow detected in fu...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Hardened
Hardware: All Linux
Importance: Normal normal
Assignee: Anthony Basile
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-01 00:50 UTC by satmd
Modified: 2015-10-04 14:23 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
0001-xfrm6-Fix-ICMPv6-and-MH-header-checks-in-_decode_ses.patch (1.83 KB, patch)
2015-09-10 08:16 UTC, Mathias Krause

Description satmd 2015-04-01 00:50:39 UTC
This crash report was recovered from the ACPI/EFI pstore on a Gentoo Hardened ~amd64 machine running hardened-sources 3.19.3.

I've been getting these crashes since 3.19, but wasn't able to capture the previous crashes.

About the setup:
This is a laptop with e1000e and iwlwifi, on a network using IPv6 and IPsec transport mode for all of its IPv6 communication (minus the unencryptable bits).

The report only mentions iwlwifi, but I suspect the bug is independent of wireless, since it has also happened with wifi deactivated in the past.

I've seen similar crashes in earlier kernels after various uptimes, but rarely.

But since kernel 3.19.x, I'm able to get reproducible results. The kernel always crashes within minutes - when connected to my network.

Additional information can be supplied later, I'm just not sure what to mention right now.

<3>[  120.709605] PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:190 cicus.113_120 min, count: 10
Oops#1 Part8
<4>[  120.709631] CPU: 2 PID: 1109 Comm: irq/31-iwlwifi Not tainted 3.19.3-hardened #4
<4>[  120.709635] Hardware name: LENOVO 2394CTO/2394CTO, BIOS G4ETA3WW (2.63 ) 01/21/2015
<4>[  120.709639]  0000000000000000 ffffc900043338a0 ffffffff8160b923 ffff8803e0ad0200
<4>[  120.709647]  ffffffff81172022 ffffffffffffffff ffffffffc03d5bc7 ffff8800d15482c0
<4>[  120.709653]  00000000c03be8b4 ffff8803d9d95500 ffff8803fe119110 ffff8803e0ad0200
<4>[  120.709661] Call Trace:
<4>[  120.709675]  [<ffffffff8160b923>] ? dump_stack+0x4a/0x78
<4>[  120.709684]  [<ffffffff81172022>] ? report_size_overflow+0x21/0x2b
<4>[  120.709742]  [<ffffffffc03d5bc7>] ? _decode_session6+0x1fd/0x367 [ipv6]
<4>[  120.709751]  [<ffffffff815ef4e0>] ? __xfrm_decode_session+0x35/0x49
<4>[  120.709756]  [<ffffffff815f3705>] ? __xfrm_policy_check+0x56/0x51c
<4>[  120.709776]  [<ffffffffc03bad59>] ? ip6_pol_route+0x339/0x366 [ipv6]
<4>[  120.709794]  [<ffffffffc03da790>] ? inet6_set_link_af.part.36+0x2163/0x11823 [ipv6]
<4>[  120.709813]  [<ffffffffc03c8b9e>] ? icmpv6_rcv+0x68/0x796 [ipv6]
<4>[  120.709819]  [<ffffffff81599dd5>] ? fib_rules_lookup+0x134/0x148
<4>[  120.709827]  [<ffffffff8160f8ae>] ? _raw_read_unlock+0x10/0x26
<4>[  120.709847]  [<ffffffffc03c7cc0>] ? raw6_local_deliver+0x1b6/0x204 [ipv6]
<4>[  120.709853]  [<ffffffff81081f2e>] ? __local_bh_enable_ip+0x69/0x7f
<4>[  120.709868]  [<ffffffffc03da790>] ? inet6_set_link_af.part.36+0x2163/0x11823 [ipv6]
<4>[  120.709882]  [<ffffffffc03b0fdb>] ? ip6_input_finish+0x39c/0x4f0 [ipv6]
<4>[  120.709897]  [<ffffffffc03b173a>] ? ip6_mc_input+0xba/0xcb [ipv6]
<4>[  120.709905]  [<ffffffff815811fd>] ? __netif_receive_skb_core+0x47e/0x506
<4>[  120.709910]  [<ffffffff81581dd8>] ? netif_receive_skb_internal+0x46/0x8e
Oops#1 Part7
<4>[  120.709915]  [<ffffffff8158276f>] ? napi_gro_receive+0x47/0xc1
<4>[  120.709941]  [<ffffffffc02a5985>] ? ieee80211_deliver_skb+0xe7/0x151 [mac80211]
<4>[  120.709966]  [<ffffffffc02a74c9>] ? ieee80211_rx_handlers+0x149f/0x1db2 [mac80211]
<4>[  120.709973]  [<ffffffff813df179>] ? dma_pte_clear_level+0x102/0x160
<4>[  120.709980]  [<ffffffff81038e7b>] ? clflush_cache_range+0x30/0x3a
<4>[  120.710004]  [<ffffffffc02a86ba>] ? ieee80211_prepare_and_rx_handle+0x8de/0x9b2 [mac80211]
<4>[  120.710029]  [<ffffffffc02a8e02>] ? ieee80211_rx+0x674/0x6ae [mac80211]
<4>[  120.710036]  [<ffffffff81573ebb>] ? __kmalloc_reserve.isra.20+0x23/0x64
<4>[  120.710042]  [<ffffffff81160b9d>] ? virt_to_head_page+0x9/0x5d
<4>[  120.710058]  [<ffffffffc035ec2a>] ? iwlagn_rx_reply_rx+0x3bb/0x43a [iwldvm]
<4>[  120.710076]  [<ffffffffc0139e4f>] ? iwl_pcie_irq_handler+0x6d4/0x7f7 [iwlwifi]
<4>[  120.710083]  [<ffffffff810ac217>] ? pick_next_task_rt+0xea/0xfc
<4>[  120.710089]  [<ffffffff810bffba>] ? irq_finalize_oneshot+0x93/0x93
<4>[  120.710094]  [<ffffffff810bffd6>] ? irq_thread_fn+0x1c/0x3a
<4>[  120.710099]  [<ffffffff810bffba>] ? irq_finalize_oneshot+0x93/0x93
<4>[  120.710104]  [<ffffffff810c0981>] ? irq_thread+0x10d/0x18f
<4>[  120.710109]  [<ffffffff810c007c>] ? wake_threads_waitq+0x33/0x33
<4>[  120.710114]  [<ffffffff810c0874>] ? free_irq+0x86/0x86
<4>[  120.710121]  [<ffffffff81097b17>] ? kthread+0xb4/0xbc
<4>[  120.710150]  [<ffffffff81090000>] ? SyS_prctl+0x41/0x409
<4>[  120.710157]  [<ffffffff81097a63>] ? __kthread_parkme+0x71/0x71
<4>[  120.710166]  [<ffffffff8160fde4>] ? ret_from_fork+0x44/0x70
<4>[  120.710187]  [<ffffffff81097a63>] ? __kthread_parkme+0x71/0x71
<0>[  120.710193] Kernel panic - not syncing: Aiee, killing interrupt handler!
Oops#1 Part6
<4>[  120.710381] CPU: 2 PID: 1109 Comm: irq/31-iwlwifi Not tainted 3.19.3-hardened #4
<4>[  120.710530] Hardware name: LENOVO 2394CTO/2394CTO, BIOS G4ETA3WW (2.63 ) 01/21/2015
<4>[  120.710716]  0000000000000000 0000000000000009 ffffffff8160b923 ffffffff817f07ea
<4>[  120.710967]  ffffffff816088ca ffff88041e283fc0 ffffffff00000008 ffffc900043336c8
<4>[  120.711171]  ffffc90004333668 ffffc90004333740 69b2d35a45e07b19 ffff88040afc6d48
<4>[  120.711332] Call Trace:
<4>[  120.711390]  [<ffffffff8160b923>] ? dump_stack+0x4a/0x78
<3>[  120.711480] PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:190 cicus.113_120 min, count: 10
<4>[  120.711748]  [<ffffffff816088ca>] ? panic+0xc2/0x1f2
<4>[  120.711844]  [<ffffffff8107fbb1>] ? do_exit+0x92/0x9c2
<4>[  120.711942]  [<ffffffff810812b1>] ? do_group_exit+0x3f/0xba
<4>[  120.712045]  [<ffffffff8117202c>] ? report_size_overflow+0x2b/0x2b
<4>[  120.712178]  [<ffffffffc03d5bc7>] ? _decode_session6+0x1fd/0x367 [ipv6]
<4>[  120.712303]  [<ffffffff815ef4e0>] ? __xfrm_decode_session+0x35/0x49
<4>[  120.712419]  [<ffffffff815f3705>] ? __xfrm_policy_check+0x56/0x51c
<4>[  120.712545]  [<ffffffffc03bad59>] ? ip6_pol_route+0x339/0x366 [ipv6]
<4>[  120.712673]  [<ffffffffc03da790>] ? inet6_set_link_af.part.36+0x2163/0x11823 [ipv6]
<4>[  120.712826]  [<ffffffffc03c8b9e>] ? icmpv6_rcv+0x68/0x796 [ipv6]
<4>[  120.712939]  [<ffffffff81599dd5>] ? fib_rules_lookup+0x134/0x148
<4>[  120.713049]  [<ffffffff8160f8ae>] ? _raw_read_unlock+0x10/0x26
<4>[  120.713168]  [<ffffffffc03c7cc0>] ? raw6_local_deliver+0x1b6/0x204 [ipv6]
<4>[  120.713295]  [<ffffffff81081f2e>] ? __local_bh_enable_ip+0x69/0x7f
<4>[  120.713417]  [<ffffffffc03da790>] ? inet6_set_link_af.part.36+0x2163/0x11823 [ipv6]
Oops#1 Part5
<4>[  120.713563]  [<ffffffffc03b0fdb>] ? ip6_input_finish+0x39c/0x4f0 [ipv6]
<4>[  120.713693]  [<ffffffffc03b173a>] ? ip6_mc_input+0xba/0xcb [ipv6]
<4>[  120.713808]  [<ffffffff815811fd>] ? __netif_receive_skb_core+0x47e/0x506
<4>[  120.713934]  [<ffffffff81581dd8>] ? netif_receive_skb_internal+0x46/0x8e
<4>[  120.714055]  [<ffffffff8158276f>] ? napi_gro_receive+0x47/0xc1
<4>[  120.714180]  [<ffffffffc02a5985>] ? ieee80211_deliver_skb+0xe7/0x151 [mac80211]
<4>[  120.714332]  [<ffffffffc02a74c9>] ? ieee80211_rx_handlers+0x149f/0x1db2 [mac80211]
<4>[  120.714475]  [<ffffffff813df179>] ? dma_pte_clear_level+0x102/0x160
<4>[  120.714592]  [<ffffffff81038e7b>] ? clflush_cache_range+0x30/0x3a
<4>[  120.714722]  [<ffffffffc02a86ba>] ? ieee80211_prepare_and_rx_handle+0x8de/0x9b2 [mac80211]
<4>[  120.714888]  [<ffffffffc02a8e02>] ? ieee80211_rx+0x674/0x6ae [mac80211]
<4>[  120.715012]  [<ffffffff81573ebb>] ? __kmalloc_reserve.isra.20+0x23/0x64
<4>[  120.715135]  [<ffffffff81160b9d>] ? virt_to_head_page+0x9/0x5d
<4>[  120.715252]  [<ffffffffc035ec2a>] ? iwlagn_rx_reply_rx+0x3bb/0x43a [iwldvm]
<4>[  120.715391]  [<ffffffffc0139e4f>] ? iwl_pcie_irq_handler+0x6d4/0x7f7 [iwlwifi]
<4>[  120.715525]  [<ffffffff810ac217>] ? pick_next_task_rt+0xea/0xfc
<4>[  120.715636]  [<ffffffff810bffba>] ? irq_finalize_oneshot+0x93/0x93
<4>[  120.715750]  [<ffffffff810bffd6>] ? irq_thread_fn+0x1c/0x3a
<4>[  120.715853]  [<ffffffff810bffba>] ? irq_finalize_oneshot+0x93/0x93
<4>[  120.715967]  [<ffffffff810c0981>] ? irq_thread+0x10d/0x18f
<4>[  120.716070]  [<ffffffff810c007c>] ? wake_threads_waitq+0x33/0x33
<4>[  120.716179]  [<ffffffff810c0874>] ? free_irq+0x86/0x86
<4>[  120.716277]  [<ffffffff81097b17>] ? kthread+0xb4/0xbc
<4>[  120.716374]  [<ffffffff81090000>] ? SyS_prctl+0x41/0x409
Oops#1 Part4
<4>[  120.716473]  [<ffffffff81097a63>] ? __kthread_parkme+0x71/0x71
<4>[  120.720579]  [<ffffffff8160fde4>] ? ret_from_fork+0x44/0x70
<4>[  120.724606]  [<ffffffff81097a63>] ? __kthread_parkme+0x71/0x71
<4>[  120.728575] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.3-hardened #4
<0>[  120.728585] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
<3>[  120.737209] drm_kms_helper: panic occurred, switching back to text console
<4>[  120.741637] ------------[ cut here ]------------
<2>[  120.745966] kernel BUG at drivers/gpu/drm/drm_crtc.c:536!
<4>[  120.750317] invalid opcode: 0000 [#1] PREEMPT SMP 
<4>[  120.754574] Modules linked in: tun esp6 xfrm6_mode_transport ccm autofs4 nfsd auth_rpcgss nfs_acl deflate ctr twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common cmac xcbc rmd160 crypto_null af_key xfrm_algo cachefiles nfnetlink_log nf_tables nfnetlink vfat fat uas usb_storage nfsv4 dns_resolver nfs lockd grace sunrpc fscache ipv6 ecb snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb bluetooth iwldvm mac80211 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev cdc_ncm usbnet mii cdc_wdm cdc_acm x86_pkg_temp_thermal coretemp kvm_intel kvm snd_hda_intel iwlwifi snd_hda_controller snd_hda_codec i2c_i801 snd_hwdep snd_pcm thinkpad_acpi cfg80211 e1000e snd_timer nvram snd ptp soundcore wmi rfkill pps_core algif_skcipher algif_hash af_alg crc32_pclmul crc32c_intel sr_mod sdhci_pci cdrom sdhci led_class mmc_core
<4>[  120.788021] CPU: 2 PID: 1109 Comm: irq/31-iwlwifi Not tainted 3.19.3-hardened #4
Oops#1 Part3
<4>[  120.793537] Hardware name: LENOVO 2394CTO/2394CTO, BIOS G4ETA3WW (2.63 ) 01/21/2015
<4>[  120.799086] task: ffff88040afc6680 ti: ffff88040afc6c48 task.ti: ffff88040afc6c48
<4>[  120.804654] RIP: 0010:[<ffffffff813fb706>]  [<ffffffff813fb706>] drm_framebuffer_free_bug+0x0/0x2
<4>[  120.810318] RSP: 0000:ffffc90004333540  EFLAGS: 00010086
<4>[  120.815981] RAX: 0000000000000000 RBX: ffff88040acab000 RCX: 0000000000000008
<4>[  120.821710] RDX: 0000000000000000 RSI: ffffffff813fb706 RDI: ffff8800c9bde908
<4>[  120.827420] RBP: ffff8800c9bde900 R08: 0000000000000000 R09: 0000000000000000
<4>[  120.833152] R10: 0000000000000003 R11: 0000000000000001 R12: 8000000000000000
<4>[  120.838863] R13: ffff88040bd56000 R14: ffff88040bd56378 R15: ffffffff81a8743a
<4>[  120.844576] FS:  0000000000000000(0000) GS:ffff88041e280000(0000) knlGS:0000000000000000
<4>[  120.850380] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  120.856164] CR2: 00000362ba782000 CR3: 000000000261b000 CR4: 00000000001607f0
<4>[  120.862006] Stack:
<4>[  120.867771]  ffffffff813fc7c2 0000000000000000 ffffffff813fce84 0000000000000000
<4>[  120.873681]  ffff88040acab000 ffff88040a8f2a00 ffffffff813ed2cc 0000000000000000
<4>[  120.879585]  ffff88040a8f2a00 0000000000000000 ffff88040bd56000 00000000fffffffc
<4>[  120.885480] Call Trace:
<4>[  120.891314]  [<ffffffff813fc7c2>] ? kref_put+0x17/0x22
<4>[  120.897161]  [<ffffffff813fce84>] ? drm_plane_force_disable+0x8b/0xb4
<4>[  120.903036]  [<ffffffff813ed2cc>] ? restore_fbdev_mode+0x42/0xcb
<4>[  120.908878]  [<ffffffff813ed472>] ? drm_fb_helper_force_kernel_mode+0x55/0x85
<4>[  120.914768]  [<ffffffff813edd52>] ? drm_fb_helper_panic+0x1d/0x26
<4>[  120.920597]  [<ffffffff81098622>] ? notifier_call_chain+0x39/0x64
Oops#1 Part2
<4>[  120.926426]  [<ffffffff81098691>] ? __atomic_notifier_call_chain+0x3a/0x4f
<4>[  120.932262]  [<ffffffff816088fd>] ? panic+0xf5/0x1f2
<4>[  120.938065]  [<ffffffff8107fbb1>] ? do_exit+0x92/0x9c2
<4>[  120.943875]  [<ffffffff810812b1>] ? do_group_exit+0x3f/0xba
<4>[  120.949690]  [<ffffffff8117202c>] ? report_size_overflow+0x2b/0x2b
<4>[  120.955549]  [<ffffffffc03d5bc7>] ? _decode_session6+0x1fd/0x367 [ipv6]
<4>[  120.961405]  [<ffffffff815ef4e0>] ? __xfrm_decode_session+0x35/0x49
<4>[  120.967254]  [<ffffffff815f3705>] ? __xfrm_policy_check+0x56/0x51c
<4>[  120.973134]  [<ffffffffc03bad59>] ? ip6_pol_route+0x339/0x366 [ipv6]
<4>[  120.979029]  [<ffffffffc03da790>] ? inet6_set_link_af.part.36+0x2163/0x11823 [ipv6]
<4>[  120.984990]  [<ffffffffc03c8b9e>] ? icmpv6_rcv+0x68/0x796 [ipv6]
<4>[  120.990927]  [<ffffffff81599dd5>] ? fib_rules_lookup+0x134/0x148
<4>[  120.996887]  [<ffffffff8160f8ae>] ? _raw_read_unlock+0x10/0x26
<4>[  121.002802]  [<ffffffffc03c7cc0>] ? raw6_local_deliver+0x1b6/0x204 [ipv6]
<4>[  121.008648]  [<ffffffff81081f2e>] ? __local_bh_enable_ip+0x69/0x7f
<4>[  121.014409]  [<ffffffffc03da790>] ? inet6_set_link_af.part.36+0x2163/0x11823 [ipv6]
<4>[  121.020171]  [<ffffffffc03b0fdb>] ? ip6_input_finish+0x39c/0x4f0 [ipv6]
<4>[  121.025937]  [<ffffffffc03b173a>] ? ip6_mc_input+0xba/0xcb [ipv6]
<4>[  121.031657]  [<ffffffff815811fd>] ? __netif_receive_skb_core+0x47e/0x506
<4>[  121.037235]  [<ffffffff81581dd8>] ? netif_receive_skb_internal+0x46/0x8e
<4>[  121.042613]  [<ffffffff8158276f>] ? napi_gro_receive+0x47/0xc1
<4>[  121.047808]  [<ffffffffc02a5985>] ? ieee80211_deliver_skb+0xe7/0x151 [mac80211]
<4>[  121.052860]  [<ffffffffc02a74c9>] ? ieee80211_rx_handlers+0x149f/0x1db2 [mac80211]
<4>[  121.057710]  [<ffffffff813df179>] ? dma_pte_clear_level+0x102/0x160
Oops#1 Part1
<4>[  121.062365]  [<ffffffff81038e7b>] ? clflush_cache_range+0x30/0x3a
<4>[  121.066889]  [<ffffffffc02a86ba>] ? ieee80211_prepare_and_rx_handle+0x8de/0x9b2 [mac80211]
<4>[  121.071440]  [<ffffffffc02a8e02>] ? ieee80211_rx+0x674/0x6ae [mac80211]
<4>[  121.075914]  [<ffffffff81573ebb>] ? __kmalloc_reserve.isra.20+0x23/0x64
<4>[  121.080332]  [<ffffffff81160b9d>] ? virt_to_head_page+0x9/0x5d
<4>[  121.084680]  [<ffffffffc035ec2a>] ? iwlagn_rx_reply_rx+0x3bb/0x43a [iwldvm]
<4>[  121.089035]  [<ffffffffc0139e4f>] ? iwl_pcie_irq_handler+0x6d4/0x7f7 [iwlwifi]
<4>[  121.093388]  [<ffffffff810ac217>] ? pick_next_task_rt+0xea/0xfc
<4>[  121.097719]  [<ffffffff810bffba>] ? irq_finalize_oneshot+0x93/0x93
<4>[  121.102060]  [<ffffffff810bffd6>] ? irq_thread_fn+0x1c/0x3a
<4>[  121.106343]  [<ffffffff810bffba>] ? irq_finalize_oneshot+0x93/0x93
<4>[  121.110559]  [<ffffffff810c0981>] ? irq_thread+0x10d/0x18f
<4>[  121.114710]  [<ffffffff810c007c>] ? wake_threads_waitq+0x33/0x33
<4>[  121.118827]  [<ffffffff810c0874>] ? free_irq+0x86/0x86
<4>[  121.122892]  [<ffffffff81097b17>] ? kthread+0xb4/0xbc
<4>[  121.126893]  [<ffffffff81090000>] ? SyS_prctl+0x41/0x409
<4>[  121.130889]  [<ffffffff81097a63>] ? __kthread_parkme+0x71/0x71
<4>[  121.134881]  [<ffffffff8160fde4>] ? ret_from_fork+0x44/0x70
<4>[  121.138815]  [<ffffffff81097a63>] ? __kthread_parkme+0x71/0x71
<4>[  121.142739] Code: 0a 39 28 48 8d 50 e0 48 0f 44 da 4c 89 ef e8 14 25 21 00 48 89 d8 5b 5d 41 5d 4c 09 24 24 c3 48 89 f2 48 8b 76 08 e9 7e 94 f2 ff <0f> 0b 41 55 55 53 48 8b 2f 48 89 fb 4c 8d ad e8 02 00 00 4c 89 
<1>[  121.151986] RIP  [<ffffffff813fb706>] drm_framebuffer_free_bug+0x0/0x2
<4>[  121.156467]  RSP <ffffc90004333540>
<4>[  121.189129] ---[ end trace a80b1cbf9e2034e8 ]---
Comment 1 satmd 2015-04-01 00:53:10 UTC
I'm on freenode if you want to catch me.
Comment 2 satmd 2015-04-01 02:08:19 UTC
After inserting debugging with pipacs, there's a new oops documented at https://lain.at/dump/crash/20150331/oops~2.txt, using the kernel stored at https://lain.at/dump/crash/20150331/ (will later name them ~2, too)

There's also more debug information from xfrm6_policy.c at https://lain.at/dump/crash/20150331/xfrm6_policy.c~2/

Instead of spamming the bug report, I'll silently continue to number the revisions of kernel compiles.
Comment 3 Anthony Basile gentoo-dev 2015-04-01 09:03:04 UTC
Okay bouncing this one by upstream.
Comment 4 PaX Team 2015-04-01 10:29:52 UTC
> PAX: nh:ffff8803d7cc5e28 off:28 data:ffff8803d7cc5e58 len:18 data_len:0

the above is the state of affairs when the size overflow (well, actually underflow here) detection triggers. the expression computed is nh+off+2-data which underflows with the above values (...e28+28+2 = ...e52 < ...e58). this code was last fixed for bug #529352 which made 'offset' a constant (0x28 above), so either that fix is still not correct or something else goes wrong with nh or data. at this point an upstream report to netdev is in order to let them figure it out again ;).

PS: add Emese too for size overflow related bugs please ;)
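
The arithmetic in the comment above can be reproduced outside the kernel. The following is only a sketch under stated assumptions: addresses are modelled as plain 64-bit integers, "off:28" is read as hex 0x28 (as the comment itself states), and the helper names `pull_len_signed` and `comment4_example` are made up for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Signed result of nh + off + 2 - data, with addresses modelled as plain
 * 64-bit integers (hypothetical helper, not kernel code).
 */
static int64_t pull_len_signed(uint64_t nh, uint64_t off, uint64_t data)
{
    assert(nh <= UINT64_MAX - off - 2);   /* the sum itself must not wrap */
    uint64_t end = nh + off + 2;

    /* Branch so that every intermediate value stays representable. */
    return end >= data ? (int64_t)(end - data) : -(int64_t)(data - end);
}

/* Values from the debug line quoted above; "off:28" is read as hex 0x28. */
static int64_t comment4_example(void)
{
    return pull_len_signed(0xffff8803d7cc5e28ULL, 0x28,
                           0xffff8803d7cc5e58ULL);
}
```

With these inputs `comment4_example()` evaluates to -6: the computed end (...e52) lies six bytes before data (...e58), which is the negative value the size overflow check complains about.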
Comment 5 jack_mort 2015-04-04 09:50:30 UTC
Before posting a new bug report, I'm posting another size overflow problem here.
Could it be related, or should I create a new bug?

More info here: like satmd, I've been getting crashes since the 3.19 series and I was able to capture the crash today on 3.19.3. The kernel boots fine, and after a few minutes it throws a size overflow error and can no longer access my RAID array.
A hard reboot then has to be done.

[avril 4 11:38] PAX: size overflow detected in function async_copy_data.isra.38 drivers/md/raid5.c:946 cicus.1056_137 min, count: 60
[  +0,000012] CPU: 0 PID: 2210 Comm: md127_raid5 Tainted: G           O   3.19.3-hardened #1
[  +0,000004] Hardware name: MSI MS-7592/G41M-P33 Combo(MS-7592), BIOS V32.12 09/13/2013
[  +0,000003]  2e62af56a4220b55 ffffffffa011f51e 0000000000000000 ffffffffa011f51e
[  +0,000007]  ffffffff81609dfc ffffffffa011f61e ffffffff8114a055 0000000000080000
[  +0,000006]  00000000dedfac08 ffff8800c6b17180 ffff8800c6b175f0 0000000000000002
[  +0,000006] Call Trace:
[  +0,000030]  [<ffffffffa011f51e>] ? raid5_exit+0x51e/0x2bd8 [raid456]
[  +0,000011]  [<ffffffffa011f51e>] ? raid5_exit+0x51e/0x2bd8 [raid456]
[  +0,000008]  [<ffffffff81609dfc>] ? dump_stack+0x40/0x56
[  +0,000010]  [<ffffffffa011f61e>] ? raid5_exit+0x61e/0x2bd8 [raid456]
[  +0,000007]  [<ffffffff8114a055>] ? report_size_overflow+0x35/0x40
[  +0,000011]  [<ffffffffa0117ca5>] ? async_copy_data.isra.38+0x405/0x470 [raid456]
[  +0,000011]  [<ffffffffa00f8141>] ? async_xor+0x141/0x180 [async_xor]
[  +0,000010]  [<ffffffffa01183e3>] ? raid_run_ops+0x6d3/0xfa0 [raid456]
[  +0,000010]  [<ffffffffa0115670>] ? release_stripe+0x100/0x100 [raid456]
[  +0,000010]  [<ffffffffa011bdd8>] ? handle_stripe+0xbf8/0x2170 [raid456]
[  +0,000007]  [<ffffffff8109be15>] ? sched_clock_local+0x15/0x80
[  +0,000006]  [<ffffffff8109c068>] ? sched_clock_cpu+0x88/0xb0
[  +0,000006]  [<ffffffff810a30ab>] ? pick_next_task_fair+0x33b/0x480
[  +0,000010]  [<ffffffffa011d4ae>] ? handle_active_stripes.isra.39+0x15e/0x3d0 [raid456]
[  +0,000010]  [<ffffffffa011dace>] ? raid5d+0x30e/0x4d0 [raid456]
[  +0,000015]  [<ffffffffa00a5c29>] ? md_thread+0x139/0x140 [md_mod]
[  +0,000006]  [<ffffffff810a7de0>] ? wait_woken+0xa0/0xa0
[  +0,000012]  [<ffffffffa00a5af0>] ? md_start_sync+0xf0/0xf0 [md_mod]
[  +0,000007]  [<ffffffff810905ff>] ? kthread+0xdf/0x100
[  +0,000005]  [<ffffffff81090520>] ? kthread_create_on_node+0x170/0x170
[  +0,000007]  [<ffffffff8160f219>] ? ret_from_fork+0x49/0x80
[  +0,000006]  [<ffffffff81090520>] ? kthread_create_on_node+0x170/0x170
Comment 6 satmd 2015-04-04 10:07:30 UTC
Hi,

your bug seems to be different enough from mine to be a separate bug.

I've been talking to pipacs and my bug is related to https://forums.grsecurity.net/viewtopic.php?f=1&t=4083 . 

The problem seems to lie within the networking code for me and I was asked to forward my problem to netdev.

I will continue to work on my bug after the holidays.

Your bug doesn't reference any networking functions and probably is related to something different.

(In reply to jack_mort from comment #5)
> Before posting a new bug report, I post here another size overflow problem.
> Can it be related or do I create a new bug ?
> 
> More info here: as satmd, I'm getting crashes since 3.19 series and I was
> able to get the crash today on 3.19.3. Kernel boots fine, and after few
> minutes, throws a size overflow error and cannot access my raid array
> anymore.
> A hard reboot has then to be done.
>
Comment 7 satmd 2015-04-04 10:18:54 UTC
I've searched through the history of related bugs and come up with some links:

http://marc.info/?l=linux-netdev&m=141768340108789&w=2

The suggested patch is already included with the kernel, but obviously isn't sufficient.

Bug 529352: That bug tracks the above-mentioned mailing list thread and does not contain the fix for this bug.

Shall I reply to the original mailing list thread with my (new) issue? Or shall I send a new mail to the list (without In-Reply-To)?
Comment 8 PaX Team 2015-04-04 12:37:30 UTC
(In reply to jack_mort from comment #5)
> Before posting a new bug report, I post here another size overflow problem.
> Can it be related or do I create a new bug ?

this is a different bug so please file it as such. while you're at it, please enable frame pointers to have a better backtrace. we'll also need the resulting files (drivers/md/raid5.c.*) of the following command:

 make drivers/md/raid5.o EXTRA_CFLAGS="-fdump-tree-all -fdump-ipa-all"

to help us gather runtime data, you should also apply the following patch and post the results along with the backtrace next time it triggers:

--- a/drivers/md/raid5.c  2015-03-18 15:21:50.408349253 +0100
+++ b/drivers/md/raid5.c  2015-04-04 14:26:03.230450669 +0200
@@ -954,6 +954,7 @@
        struct async_submit_ctl submit;
        enum async_tx_flags flags = 0;

+printk("PAX: bi_iter.bi_sector:%lx sector:%lx\n", bio->bi_iter.bi_sector, sector);
        if (bio->bi_iter.bi_sector >= sector)
                page_offset = (signed)(bio->bi_iter.bi_sector - sector) * 512;
        else
Comment 9 PaX Team 2015-04-04 12:41:45 UTC
(In reply to satmd from comment #7)
> Shall I reply to the original mailing list thread with my (new) issue? Or
> shall I send a new mail to the list (without In-Reply-To)?

good question, i'd say these two bugs are related so you might as well continue that thread (that way it'll also be easier to find them in the future). in any case make sure you CC the same people that were on the original report (and Emese/me too).
Comment 10 PaX Team 2015-07-30 08:16:43 UTC
satmd: i'm wondering, did you manage to follow up on this with Steffen Klassert on netdev? i.e., is this bug fixed now or are you guys still investigating?
Comment 11 Marcin Jurkowski 2015-08-31 21:59:34 UTC
Subscribing myself for updates.

This bug occurs in a recent hardened kernel (4.1.6). Very annoying :-/
Comment 12 Mathias Krause 2015-09-07 08:02:44 UTC
Can you please try reverting commit cd3bafc73d11eb51cb2d3691629718431e1768ce, i.e. <https://git.kernel.org/linus/cd3bafc7>?
Comment 13 Marcin Jurkowski 2015-09-09 20:07:29 UTC
(In reply to Mathias Krause from comment #12)
> Can you please try reverting commit
> cd3bafc73d11eb51cb2d3691629718431e1768ce, i.e.
> <https://git.kernel.org/linus/cd3bafc7>?
Unfortunately, it didn't help and I'm not surprised. The overflow occurs in the IPPROTO_ICMPV6 branch, whereas the commit alters the offset calculation in the IPPROTO_MH case.

Just for the record, here is the oops message with the above-mentioned commit reverted:

PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:188 cicus.107_211 min, count: 14
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.6-hardened #4
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q1900-ITX, BIOS P1.40 10/31/2014
 ffffffffa06081b3 0000000000000000 ffffffffa06081c4 ffff88013fc03948
 ffffffff814df088 0000000000000001 ffffffffa06081b3 ffff88013fc03978
 ffffffff8111d1a6 ffff8800a74d2ece ffff88013fc03a60 ffff8800a761d800
Call Trace:
 <IRQ>  [<ffffffffa06081b3>] ? ipv6_proc_exit_net+0x8133/0x12e29 [ipv6]
 [<ffffffffa06081c4>] ? ipv6_proc_exit_net+0x8144/0x12e29 [ipv6]
 [<ffffffff814df088>] dump_stack+0x45/0x5b
 [<ffffffffa06081b3>] ? ipv6_proc_exit_net+0x8133/0x12e29 [ipv6]
 [<ffffffff8111d1a6>] report_size_overflow+0x36/0x40
 [<ffffffffa05fc1ce>] _decode_session6+0x59e/0x6f0 [ipv6]
 [<ffffffff814b7a70>] __xfrm_decode_session+0x40/0x60
 [<ffffffff814bc576>] __xfrm_policy_check+0x56/0x5f0
 [<ffffffffa06025a0>] ? ipv6_proc_exit_net+0x2520/0x12e29 [ipv6]
 [<ffffffffa05e7181>] icmpv6_rcv+0x1d1/0xa20 [ipv6]
 [<ffffffffa05d44f0>] ? ip6_pol_route.isra.47+0x530/0x530 [ipv6]
 [<ffffffffa05fe6a8>] ? fib6_rule_action+0xc8/0x210 [ipv6]
 [<ffffffffa000e6d1>] ? xhci_queue_bulk_tx+0x2a1/0x700 [xhci_hcd]
 [<ffffffff814e458c>] ? _raw_read_unlock_bh+0x2c/0x40
 [<ffffffffa05ec208>] ? ipv6_chk_mcast_addr+0x128/0x150 [ipv6]
 [<ffffffffa06025a0>] ? ipv6_proc_exit_net+0x2520/0x12e29 [ipv6]
 [<ffffffffa05c7006>] ip6_input_finish+0x1e6/0x550 [ipv6]
 [<ffffffffa05c78c6>] ip6_input+0x26/0x80 [ipv6]
 [<ffffffffa05ec16e>] ? ipv6_chk_mcast_addr+0x8e/0x150 [ipv6]
 [<ffffffffa05c79c7>] ip6_mc_input+0xa7/0x210 [ipv6]
 [<ffffffffa05c6dac>] ip6_rcv_finish+0x2c/0xa0 [ipv6]
 [<ffffffffa05c7624>] ipv6_rcv+0x2b4/0x530 [ipv6]
 [<ffffffff813b0bf2>] ? usb_submit_urb+0x302/0x560
 [<ffffffff81421678>] __netif_receive_skb_core+0x608/0xa10
 [<ffffffff8142415f>] __netif_receive_skb+0x1f/0x80
 [<ffffffff814241de>] netif_receive_skb_internal+0x1e/0x90
 [<ffffffff81424a48>] napi_gro_receive+0x78/0xa0
 [<ffffffffa04945fc>] rtl8169_poll+0x2ec/0x680 [r8169]
 [<ffffffff81425435>] net_rx_action+0x125/0x2f0
 [<ffffffff8104fa9f>] __do_softirq+0xdf/0x240
 [<ffffffff8104fe8e>] irq_exit+0xee/0x110
 [<ffffffff81004ad6>] do_IRQ+0x56/0xf0
 [<ffffffff814e592b>] common_interrupt+0xab/0xab
 <EOI>  [<ffffffff813e4201>] ? cpuidle_enter_state+0x81/0x140
 [<ffffffff813e4314>] cpuidle_enter+0x24/0x40
 [<ffffffff810824bb>] cpu_startup_entry+0x24b/0x2c0
 [<ffffffff814d7b72>] rest_init+0x72/0x80
 [<ffffffff81a140d8>] 0xffffffff81a140d8
 [<ffffffff81a139a9>] ? 0xffffffff81a139a9
 [<ffffffff81a13120>] ? 0xffffffff81a13120
 [<ffffffff81a13120>] ? 0xffffffff81a13120
 [<ffffffff81a134f4>] 0xffffffff81a134f4
 [<ffffffff81a135f2>] 0xffffffff81a135f2
Comment 14 Mathias Krause 2015-09-10 08:16:25 UTC
Created attachment 411484
0001-xfrm6-Fix-ICMPv6-and-MH-header-checks-in-_decode_ses.patch

(In reply to Marcin Jurkowski from comment #13)
> (In reply to Mathias Krause from comment #12)
> > Can you please try reverting commit
> > cd3bafc73d11eb51cb2d3691629718431e1768ce, i.e.
> > <https://git.kernel.org/linus/cd3bafc7>?
> Unfortunately, it didn't help and I'm not surprised. The overflow occurs in
> IPPROTO_ICMPV6 branch whereas the commit alters offset calculation in
> IPPROTO_MH case.

*D'oh!* You're correct! ;)

It looks like there are ICMPv6 packets received by your system that lack the actual ICMP data. Therefore the calculation 'nh + offset + 2 - skb->data' underflows. That negative value will be passed to pskb_may_pull() which formally takes an 'unsigned int', implicitly converting the negative value to an unsigned one -- making the size_overflow plugin catch that bug and generate the report.

Can you please test the attached patch instead? It should prevent the underflow from happening by testing it beforehand.
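
To illustrate the two points made above (the implicit signed-to-unsigned conversion, and testing before subtracting), here is a small userspace sketch. It is not the attached patch and makes no claim about how that patch handles the underflow case: `struct fake_skb`, `unguarded_len`, `guarded_len` and the `demo_*` helpers are hypothetical names, and the buffer layout merely mimics the values from comment 4.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer; not the kernel's struct sk_buff. */
struct fake_skb {
    const uint8_t *data;   /* current start of packet data */
    unsigned int len;      /* bytes available from data onwards */
};

/* Unguarded: the signed pointer difference is converted straight to unsigned. */
static unsigned int unguarded_len(const struct fake_skb *skb,
                                  const uint8_t *nh, size_t offset)
{
    return (unsigned int)(nh + offset + 2 - skb->data);
}

/* Guarded: compare pointers first; only subtract when the result is >= 0. */
static bool guarded_len(const struct fake_skb *skb, const uint8_t *nh,
                        size_t offset, unsigned int *len)
{
    const uint8_t *end = nh + offset + 2;

    assert(len != NULL);
    if (end < skb->data)
        return false;              /* would underflow: no conversion done */
    *len = (unsigned int)(end - skb->data);
    return true;
}

/* Layout resembling comment 4: data is 0x30 bytes past nh, offset is 0x28. */
static const uint8_t demo_buf[64] = {0};

static unsigned int demo_unguarded(void)
{
    struct fake_skb skb = { demo_buf + 0x30, 18 };
    return unguarded_len(&skb, demo_buf, 0x28);
}

static bool demo_guarded(void)
{
    struct fake_skb skb = { demo_buf + 0x30, 18 };
    unsigned int len = 0;
    return guarded_len(&skb, demo_buf, 0x28, &len);
}
```

In this layout `demo_unguarded()` returns `UINT_MAX - 5` (the wrapped form of -6), while `demo_guarded()` returns false without ever performing the conversion. What the caller should do in that case is a policy decision this sketch leaves open.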
Comment 15 Marcin Jurkowski 2015-09-10 19:45:47 UTC
(In reply to Mathias Krause from comment #14)
> It looks like there are ICMPv6 packets received by your system that lack the
> actual ICMP data. Therefore the calculation 'nh + offset + 2 - skb->data'
> underflows. That negative value will be passed to pskb_may_pull() which
> formally takes an 'unsigned int', implicitly converting the negative value
> to an unsigned one -- making the size_overflow catch that bug and generate
> the report.
> 
> Can you please test the following patch instead? It should prevent the
> underflow from happening by testing it beforehand.
It did the trick. PaX no longer reports an overflow.

By the way, is this part of the IPv6 code really maintained? A similar issue was fixed in https://git.kernel.org/linus/59cae00 and this one should have been addressed back then.
There were more bugs like this in the IPv6 XFRM code in the past, unnoticed until PaX detected an overflow, illegal assignment etc. Every single case I encountered could have been spotted by carefully reading the code, yet no one did it!
Comment 16 Anthony Basile gentoo-dev 2015-09-10 20:59:07 UTC
@pageexec and spender, i'm confused about the new workflow upstream with stable no longer being available.  will this fix be out in the next testing patchset?
Comment 17 Anthony Basile gentoo-dev 2015-09-10 21:00:06 UTC
(In reply to PaX Team from comment #8)
> (In reply to jack_mort from comment #5)
> > Before posting a new bug report, I post here another size overflow problem.
> > Can it be related or do I create a new bug ?
> 

@jack_mort, did you open another bug report? because i don't see it
Comment 18 PaX Team 2015-09-10 21:18:43 UTC
(In reply to Anthony Basile from comment #16)
> @pageexec and spender, i'm confused about the new workflow upstream with
> stable no longer being available.  will this fix be out in the next testing
> patchset?

of course we'll fix it in 4.1.x as well, nothing changed for that series.
Comment 19 jack_mort 2015-09-10 21:22:51 UTC
(In reply to Anthony Basile from comment #17)
> (In reply to PaX Team from comment #8)
> > (In reply to jack_mort from comment #5)
> > > Before posting a new bug report, I post here another size overflow problem.
> > > Can it be related or do I create a new bug ?
> > 
> 
> @jack_mort, did you open another bug report because i dont' see it

Sorry it was a long time ago xD
And yes I opened a dedicated report at that time.
Comment 20 Mathias Krause 2015-09-11 08:08:30 UTC
(In reply to Marcin Jurkowski from comment #15)
> (In reply to Mathias Krause from comment #14)
> > It looks like there are ICMPv6 packets received by your system that lack the
> > actual ICMP data. Therefore the calculation 'nh + offset + 2 - skb->data'
> > underflows. That negative value will be passed to pskb_may_pull() which
> > formally takes an 'unsigned int', implicitly converting the negative value
> > to an unsigned one -- making the size_overflow catch that bug and generate
> > the report.
> > 
> > Can you please test the following patch instead? It should prevent the
> > underflow from happening by testing it beforehand.
> It did the trick. PAX no longer reports overflow.

Thanks for testing!

> By the way, is this part of IPv6 code really maintained? Similar issue was
> fixed in https://git.kernel.org/linus/59cae00 and this one should have been
> addressed back then.

It had better be. There's an entry for this code in the MAINTAINERS file, at least ;)

> There were more bugs like this in IPv6 XFRM code in the past, unnoticed
> until PAX detected overflow, illegal assignment etc. Every single case I
> encountered could be spotted by carefully reading code, yet no one did it!

Unfortunately, those kinds of bugs "silently fail" on vanilla as the underflow goes unnoticed -- besides dropped packets, maybe.

But you're welcome to review the code and send patches to netdev... ;)
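
A minimal sketch of why such a bug can "silently fail" without instrumentation: the wrapped length is simply too large for any availability check to pass. The helper names below are hypothetical, and the numbers (a shortfall of -6, 18 bytes available) are borrowed from the debug line in comment 4.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical availability check: true if `want` bytes fit in `avail`. */
static bool enough_bytes(unsigned int avail, unsigned int want)
{
    return want <= avail;
}

/* Converting a negative int to unsigned int wraps modulo UINT_MAX + 1. */
static unsigned int wrap_negative(int diff)
{
    assert(diff < 0);              /* only meant for the underflow case */
    return (unsigned int)diff;
}

/* No overflow instrumentation: the check just reports "not enough data". */
static bool vanilla_outcome(void)
{
    return enough_bytes(18, wrap_negative(-6));
}
```

Here `wrap_negative(-6)` is `UINT_MAX - 5`, so `vanilla_outcome()` is false. Nothing crashes; the caller just sees a failed length check, which matches the "dropped packets, maybe" remark above.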
Comment 21 Alexander Miroshnichenko 2015-09-20 07:28:45 UTC
I have the issue too.

Steps to reproduce:
1. Hardened kernel [workstation profile]
2. ipsec vpn connection (I use Shrew VPN)
3. Kernel panic

[ 1771.738191] PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:190 cicus.110_217 min, count: 14
[ 1771.738797] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 1771.738906] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.8-hardened #1
[ 1771.738995] Hardware name: Hewlett-Packard HP Compaq 6720s/30D8, BIOS 68MDU Ver. F.0D 11/04/2008
[ 1771.739106]  0000000000000009 ffff88007f5037b8 ffffffff8188ffbb 0000000000004992
[ 1771.739235]  ffffffff87a82410 ffff88007f503848 ffffffff81889fe7 ffff88007f503818
[ 1771.739362]  ffff880000000008 ffff88007f503858 ffff88007f5037e8 0000000000000000
[ 1771.739487] Call Trace:
[ 1771.739525]  <IRQ>  [<ffffffff8188ffbb>] dump_stack+0x45/0x5d
[ 1771.739625]  [<ffffffff81889fe7>] panic+0xc8/0x20d
[ 1771.739696]  [<ffffffff810be175>] do_exit+0xa15/0xc00
[ 1771.739769]  [<ffffffff810be3f1>] do_group_exit+0x41/0xc0
[ 1771.739846]  [<ffffffff81215783>] report_size_overflow+0x33/0x40
[ 1771.739945]  [<ffffffffa05805d3>] _decode_session6+0x5b3/0x700 [ipv6]
[ 1771.740036]  [<ffffffff81862171>] __xfrm_decode_session+0x31/0x50
[ 1771.740122]  [<ffffffff81866bb5>] __xfrm_policy_check+0x65/0x600
[ 1771.740220]  [<ffffffffa05571d0>] ? ip6_pol_route.isra.42+0x520/0x520 [ipv6]
[ 1771.740330]  [<ffffffffa05680fa>] rawv6_rcv+0x4a/0x330 [ipv6]
[ 1771.740411]  [<ffffffff817afb13>] ? skb_clone+0x63/0xb0
[ 1771.740498]  [<ffffffffa05684fe>] raw6_local_deliver+0x11e/0x2c0 [ipv6]
[ 1771.740598]  [<ffffffffa054a03a>] ip6_input_finish+0x11a/0x570 [ipv6]
[ 1771.740695]  [<ffffffffa054aa0f>] ip6_input+0x2f/0x70 [ipv6]


The kernel crashes with hardened-sources-4.*.
I tried 4.0.8-hardened and 4.1.4-hardened. Both of them crash.
Comment 22 Alexander Miroshnichenko 2015-09-20 07:39:38 UTC
4.1.6-hardened crashes too.
Comment 23 Mathias Krause 2015-09-20 07:54:18 UTC
(In reply to Alexander Miroshnichenko from comment #22)
> 4.1.6-hardened crashes too.

It should be fixed as of grsecurity-3.1-4.1.6-201509112213.patch. Which version is in 4.1.6-hardened? If it's this version or newer, can you please provide a backtrace for that kernel version?
Comment 24 Alexander Miroshnichenko 2015-09-20 08:09:35 UTC
(In reply to Mathias Krause from comment #23)
> (In reply to Alexander Miroshnichenko from comment #22)
> > 4.1.6-hardened crashes too.
> 
> It should be fixed as of grsecurity-3.1-4.1.6-201509112213.patch. Which
> version is in 4.1.6-hardened? If it's this version or newer, can you please
> provide a backtrace for that kernel version?

I tried the stable version:
# qlist -ICv hardened-sources
sys-kernel/hardened-sources-4.1.6


I found 4.1.6/4420_grsecurity-3.1-4.1.6-201509112213.patch in the hardened-sources-4.1.6-r2 version, which is '~amd64'.

I will try the hardened-sources-4.1.6-r2 version.
Comment 25 Alexander Miroshnichenko 2015-10-04 07:29:46 UTC
(In reply to Alexander Miroshnichenko from comment #24)
> I will try hardened-sources-4.1.6-r2 version.

With this version the bug is really fixed. There have been no crashes for two weeks.
Comment 26 Anthony Basile gentoo-dev 2015-10-04 14:23:57 UTC
(In reply to Alexander Miroshnichenko from comment #25)
> (In reply to Alexander Miroshnichenko from comment #24)
> > I will try hardened-sources-4.1.6-r2 version.
> 
> With this version bug realy fixed. There are no crashes for two weeks.

i'm going to stabilize hardened-sources-4.1.7-r1.  please reopen this bug if it's still an issue on that kernel.  seeing as this is fixed with 4.1.6-r2 it should be okay in 4.1.7-r1.