Summary: | =sys-kernel/hardened-sources-3.19.3: crash: PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:190 cicus.113_120 min, count: 10 | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | satmd <satmd> |
Component: | Hardened | Assignee: | Anthony Basile <blueness> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | alex, hardened, jackmort37, kernel, marcin1j, minipli, pageexec, re.emese, spender |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
Attachments: | 0001-xfrm6-Fix-ICMPv6-and-MH-header-checks-in-_decode_ses.patch |
Description
satmd
2015-04-01 00:50:39 UTC
I'm on freenode if you want to catch me.

After inserting debugging with pipacs, there's a new oops documented at https://lain.at/dump/crash/20150331/oops~2.txt, using the kernel stored at https://lain.at/dump/crash/20150331/ (will later name them ~2, too).
There's also more debug information from xfrm_policy.c at https://lain.at/dump/crash/20150331/xfrm6_policy.c~2/

Instead of spamming the bug report, I'll silently continue to number the revisions of kernel compiles.

Okay bouncing this one by upstream.

> PAX: nh:ffff8803d7cc5e28 off:28 data:ffff8803d7cc5e58 len:18 data_len:0

the above is the state of affairs when the size overflow (well, actually underflow here) detection triggers. the expression computed is nh+off+2-data which underflows with the above values (...e28+28+2 = ...e52 < ...e58). this code was last fixed for bug #529352 which made 'offset' a constant (0x28 above), so either that fix is still not correct or something else goes wrong with nh or data. at this point an upstream report to netdev is in order to let them figure it out again ;).

PS: add Emese too for size overflow related bugs please ;)

Before posting a new bug report, I post here another size overflow problem.
Can it be related or do I create a new bug ?

More info here : as satmd, I'm getting crashes since 3.19 series and I was able to get the crash today on 3.19.3. Kernel boots fine, and after a few minutes, throws a size overflow error and cannot access my raid array anymore.
A hard reboot has then to be done.

[avril 4 11:38] PAX: size overflow detected in function async_copy_data.isra.38 drivers/md/raid5.c:946 cicus.1056_137 min, count: 60
[ +0,000012] CPU: 0 PID: 2210 Comm: md127_raid5 Tainted: G O 3.19.3-hardened #1
[ +0,000004] Hardware name: MSI MS-7592/G41M-P33 Combo(MS-7592), BIOS V32.12 09/13/2013
[ +0,000003] 2e62af56a4220b55 ffffffffa011f51e 0000000000000000 ffffffffa011f51e
[ +0,000007] ffffffff81609dfc ffffffffa011f61e ffffffff8114a055 0000000000080000
[ +0,000006] 00000000dedfac08 ffff8800c6b17180 ffff8800c6b175f0 0000000000000002
[ +0,000006] Call Trace:
[ +0,000030] [<ffffffffa011f51e>] ? raid5_exit+0x51e/0x2bd8 [raid456]
[ +0,000011] [<ffffffffa011f51e>] ? raid5_exit+0x51e/0x2bd8 [raid456]
[ +0,000008] [<ffffffff81609dfc>] ? dump_stack+0x40/0x56
[ +0,000010] [<ffffffffa011f61e>] ? raid5_exit+0x61e/0x2bd8 [raid456]
[ +0,000007] [<ffffffff8114a055>] ? report_size_overflow+0x35/0x40
[ +0,000011] [<ffffffffa0117ca5>] ? async_copy_data.isra.38+0x405/0x470 [raid456]
[ +0,000011] [<ffffffffa00f8141>] ? async_xor+0x141/0x180 [async_xor]
[ +0,000010] [<ffffffffa01183e3>] ? raid_run_ops+0x6d3/0xfa0 [raid456]
[ +0,000010] [<ffffffffa0115670>] ? release_stripe+0x100/0x100 [raid456]
[ +0,000010] [<ffffffffa011bdd8>] ? handle_stripe+0xbf8/0x2170 [raid456]
[ +0,000007] [<ffffffff8109be15>] ? sched_clock_local+0x15/0x80
[ +0,000006] [<ffffffff8109c068>] ? sched_clock_cpu+0x88/0xb0
[ +0,000006] [<ffffffff810a30ab>] ? pick_next_task_fair+0x33b/0x480
[ +0,000010] [<ffffffffa011d4ae>] ? handle_active_stripes.isra.39+0x15e/0x3d0 [raid456]
[ +0,000010] [<ffffffffa011dace>] ? raid5d+0x30e/0x4d0 [raid456]
[ +0,000015] [<ffffffffa00a5c29>] ? md_thread+0x139/0x140 [md_mod]
[ +0,000006] [<ffffffff810a7de0>] ? wait_woken+0xa0/0xa0
[ +0,000012] [<ffffffffa00a5af0>] ? md_start_sync+0xf0/0xf0 [md_mod]
[ +0,000007] [<ffffffff810905ff>] ? kthread+0xdf/0x100
[ +0,000005] [<ffffffff81090520>] ? kthread_create_on_node+0x170/0x170
[ +0,000007] [<ffffffff8160f219>] ? ret_from_fork+0x49/0x80
[ +0,000006] [<ffffffff81090520>] ? kthread_create_on_node+0x170/0x170

Hi, your bug seems to be different enough from mine to be a separate bug. I've been talking to pipacs and my bug is related to https://forums.grsecurity.net/viewtopic.php?f=1&t=4083 . The problem seems to lie within the networking code for me and I was asked to forward my problem to netdev. I will continue to work on my bug after the holidays. Your bug doesn't reference any networking functions and probably is related to something different.

(In reply to jack_mort from comment #5)
> Before posting a new bug report, I post here another size overflow problem.
> Can it be related or do I create a new bug ?
>
> More info here : as satmd, I'm getting crashes since 3.19 series and I was
> able to get the crash today on 3.19.3. Kernel boots fine, and after a few
> minutes, throws a size overflow error and cannot access my raid array
> anymore.
> A hard reboot has then to be done.
>

I've searched through the history of related bugs and come up with some links:

http://marc.info/?l=linux-netdev&m=141768340108789&w=2
The suggested patch is already included with the kernel, but obviously isn't sufficient.

Bug 529352:
That bug tracks the above-mentioned mailing list thread and does not contain the fix for this bug.

Shall I reply to the original mailing list thread with my (new) issue? Or shall I send a new mail to the list (without In-Reply-To)?

(In reply to jack_mort from comment #5)
> Before posting a new bug report, I post here another size overflow problem.
> Can it be related or do I create a new bug ?

this is a different bug so please file it as such. while you're at it, please enable frame pointers to have a better backtrace.
we'll also need the resulting files (drivers/md/raid5.c.*) of the following command:

  make drivers/md/raid5.o EXTRA_CFLAGS="-fdump-tree-all -fdump-ipa-all"

to help us gather runtime data, you should also apply the following patch and post the results along with the backtrace next time it triggers:

--- a/drivers/md/raid5.c 2015-03-18 15:21:50.408349253 +0100
+++ b/drivers/md/raid5.c 2015-04-04 14:26:03.230450669 +0200
@@ -954,6 +954,7 @@
 	struct async_submit_ctl submit;
 	enum async_tx_flags flags = 0;
 
+printk("PAX: bi_iter.bi_sector:%lx sector:%lx\n", bio->bi_iter.bi_sector, sector);
 	if (bio->bi_iter.bi_sector >= sector)
 		page_offset = (signed)(bio->bi_iter.bi_sector - sector) * 512;
 	else

(In reply to satmd from comment #7)
> Shall I reply to the original mailing list thread with my (new) issue? Or
> shall I send a new mail to the list (without In-Reply-To)?

good question, i'd say these two bugs are related so you might as well continue that thread (that way it'll also be easier to find them in the future). in any case make sure you CC the same people that were on the original report (and Emese/me too).

satmd: i'm wondering, did you manage to follow up on this with Steffen Klassert on netdev? i.e., is this bug fixed now or are you guys still investigating?

Subscribe myself for updates. This bug occurs in recent hardened kernel (4.1.6). Very annoying :-/

Can you please try reverting commit cd3bafc73d11eb51cb2d3691629718431e1768ce, i.e. <https://git.kernel.org/linus/cd3bafc7>?

(In reply to Mathias Krause from comment #12)
> Can you please try reverting commit
> cd3bafc73d11eb51cb2d3691629718431e1768ce, i.e.
> <https://git.kernel.org/linus/cd3bafc7>?

Unfortunately, it didn't help and I'm not surprised. The overflow occurs in the IPPROTO_ICMPV6 branch whereas the commit alters the offset calculation in the IPPROTO_MH case.
Just for the record, oops message with above-mentioned commit reverted:

PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:188 cicus.107_211 min, count: 14
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.6-hardened #4
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q1900-ITX, BIOS P1.40 10/31/2014
ffffffffa06081b3 0000000000000000 ffffffffa06081c4 ffff88013fc03948
ffffffff814df088 0000000000000001 ffffffffa06081b3 ffff88013fc03978
ffffffff8111d1a6 ffff8800a74d2ece ffff88013fc03a60 ffff8800a761d800
Call Trace:
<IRQ> [<ffffffffa06081b3>] ? ipv6_proc_exit_net+0x8133/0x12e29 [ipv6]
[<ffffffffa06081c4>] ? ipv6_proc_exit_net+0x8144/0x12e29 [ipv6]
[<ffffffff814df088>] dump_stack+0x45/0x5b
[<ffffffffa06081b3>] ? ipv6_proc_exit_net+0x8133/0x12e29 [ipv6]
[<ffffffff8111d1a6>] report_size_overflow+0x36/0x40
[<ffffffffa05fc1ce>] _decode_session6+0x59e/0x6f0 [ipv6]
[<ffffffff814b7a70>] __xfrm_decode_session+0x40/0x60
[<ffffffff814bc576>] __xfrm_policy_check+0x56/0x5f0
[<ffffffffa06025a0>] ? ipv6_proc_exit_net+0x2520/0x12e29 [ipv6]
[<ffffffffa05e7181>] icmpv6_rcv+0x1d1/0xa20 [ipv6]
[<ffffffffa05d44f0>] ? ip6_pol_route.isra.47+0x530/0x530 [ipv6]
[<ffffffffa05fe6a8>] ? fib6_rule_action+0xc8/0x210 [ipv6]
[<ffffffffa000e6d1>] ? xhci_queue_bulk_tx+0x2a1/0x700 [xhci_hcd]
[<ffffffff814e458c>] ? _raw_read_unlock_bh+0x2c/0x40
[<ffffffffa05ec208>] ? ipv6_chk_mcast_addr+0x128/0x150 [ipv6]
[<ffffffffa06025a0>] ? ipv6_proc_exit_net+0x2520/0x12e29 [ipv6]
[<ffffffffa05c7006>] ip6_input_finish+0x1e6/0x550 [ipv6]
[<ffffffffa05c78c6>] ip6_input+0x26/0x80 [ipv6]
[<ffffffffa05ec16e>] ? ipv6_chk_mcast_addr+0x8e/0x150 [ipv6]
[<ffffffffa05c79c7>] ip6_mc_input+0xa7/0x210 [ipv6]
[<ffffffffa05c6dac>] ip6_rcv_finish+0x2c/0xa0 [ipv6]
[<ffffffffa05c7624>] ipv6_rcv+0x2b4/0x530 [ipv6]
[<ffffffff813b0bf2>] ? usb_submit_urb+0x302/0x560
[<ffffffff81421678>] __netif_receive_skb_core+0x608/0xa10
[<ffffffff8142415f>] __netif_receive_skb+0x1f/0x80
[<ffffffff814241de>] netif_receive_skb_internal+0x1e/0x90
[<ffffffff81424a48>] napi_gro_receive+0x78/0xa0
[<ffffffffa04945fc>] rtl8169_poll+0x2ec/0x680 [r8169]
[<ffffffff81425435>] net_rx_action+0x125/0x2f0
[<ffffffff8104fa9f>] __do_softirq+0xdf/0x240
[<ffffffff8104fe8e>] irq_exit+0xee/0x110
[<ffffffff81004ad6>] do_IRQ+0x56/0xf0
[<ffffffff814e592b>] common_interrupt+0xab/0xab
<EOI> [<ffffffff813e4201>] ? cpuidle_enter_state+0x81/0x140
[<ffffffff813e4314>] cpuidle_enter+0x24/0x40
[<ffffffff810824bb>] cpu_startup_entry+0x24b/0x2c0
[<ffffffff814d7b72>] rest_init+0x72/0x80
[<ffffffff81a140d8>] 0xffffffff81a140d8
[<ffffffff81a139a9>] ? 0xffffffff81a139a9
[<ffffffff81a13120>] ? 0xffffffff81a13120
[<ffffffff81a13120>] ? 0xffffffff81a13120
[<ffffffff81a134f4>] 0xffffffff81a134f4
[<ffffffff81a135f2>] 0xffffffff81a135f2

Created attachment 411484
0001-xfrm6-Fix-ICMPv6-and-MH-header-checks-in-_decode_ses.patch

(In reply to Marcin Jurkowski from comment #13)
> (In reply to Mathias Krause from comment #12)
> > Can you please try reverting commit
> > cd3bafc73d11eb51cb2d3691629718431e1768ce, i.e.
> > <https://git.kernel.org/linus/cd3bafc7>?
> Unfortunately, it didn't help and I'm not surprised. The overflow occurs in
> IPPROTO_ICMPV6 branch whereas the commit alters offset calculation in
> IPPROTO_MH case.

*D'oh!* You're correct! ;)

It looks like there are ICMPv6 packets received by your system that lack the actual ICMP data. Therefore the calculation 'nh + offset + 2 - skb->data' underflows. That negative value will be passed to pskb_may_pull() which formally takes an 'unsigned int', implicitly converting the negative value to an unsigned one -- making the size_overflow plugin catch that bug and generate the report.

Can you please test the following patch instead? It should prevent the underflow from happening by testing it beforehand.
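
The wrap-around described above can be reproduced in userspace. What follows is only a minimal sketch and not the attached patch: it treats the addresses from the debug line earlier in this report (nh ending in ...5e28, off 0x28, data ending in ...5e58) as plain 64-bit integers, and the helper names `unchecked_pull_len` and `pull_len_is_sane` are hypothetical, chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the "PAX: nh:... off:28 data:..." debug line in this
 * report, handled as integers so the arithmetic is well defined here. */
#define NH_ADDR   UINT64_C(0xffff8803d7cc5e28)
#define DATA_ADDR UINT64_C(0xffff8803d7cc5e58)
#define OFFSET    UINT64_C(0x28)

/* Hypothetical helper: the length that nh + offset + 2 - data yields once
 * it is squeezed into an unsigned int parameter. If the end address lies
 * before data, the subtraction wraps and the result is a huge value. */
static unsigned int unchecked_pull_len(uint64_t nh, uint64_t offset,
                                       uint64_t data)
{
    return (unsigned int)(nh + offset + 2 - data);
}

/* Hypothetical guard: true only when nh + offset + 2 is not below data,
 * i.e. when the subtraction above cannot underflow. */
static bool pull_len_is_sane(uint64_t nh, uint64_t offset, uint64_t data)
{
    return nh + offset + 2 >= data;
}
```

With the reported values the end address is ...5e52, six bytes before data, so `unchecked_pull_len` returns 0xfffffffa where `unsigned int` is 32 bits, while `pull_len_is_sane` returns false. Testing the ordering first, as the guard does, is one way to avoid ever forming the wrapped length.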

(In reply to Mathias Krause from comment #14)
> It looks like there are ICMPv6 packets received by your system that lack the
> actual ICMP data. Therefore the calculation 'nh + offset + 2 - skb->data'
> underflows. That negative value will be passed to pskb_may_pull() which
> formally takes an 'unsigned int', implicitly converting the negative value
> to an unsigned one -- making the size_overflow catch that bug and generate
> the report.
>
> Can you please test the following patch instead? It should prevent the
> underflow from happening by testing it beforehand.

It did the trick. PAX no longer reports overflow.

By the way, is this part of IPv6 code really maintained? Similar issue was fixed in https://git.kernel.org/linus/59cae00 and this one should have been addressed back then.

There were more bugs like this in IPv6 XFRM code in the past, unnoticed until PAX detected overflow, illegal assignment etc. Every single case I encountered could be spotted by carefully reading code, yet no one did it!

@pageexec and spender, i'm confused about the new workflow upstream with stable no longer being available. will this fix be out in the next testing patchset?

(In reply to PaX Team from comment #8)
> (In reply to jack_mort from comment #5)
> > Before posting a new bug report, I post here another size overflow problem.
> > Can it be related or do I create a new bug ?
>

@jack_mort, did you open another bug report because i don't see it

(In reply to Anthony Basile from comment #16)
> @pageexec and spender, i'm confused about the new workflow upstream with
> stable no longer being available. will this fix be out in the next testing
> patchset?

of course we'll fix it in 4.1.x as well, nothing changed for that series.

(In reply to Anthony Basile from comment #17)
> (In reply to PaX Team from comment #8)
> > (In reply to jack_mort from comment #5)
> > > Before posting a new bug report, I post here another size overflow problem.
> > > Can it be related or do I create a new bug ?
> >
>
> @jack_mort, did you open another bug report because i don't see it

Sorry it was a long time ago xD
And yes I opened a dedicated report at that time.

(In reply to Marcin Jurkowski from comment #15)
> (In reply to Mathias Krause from comment #14)
> > It looks like there are ICMPv6 packets received by your system that lack the
> > actual ICMP data. Therefore the calculation 'nh + offset + 2 - skb->data'
> > underflows. That negative value will be passed to pskb_may_pull() which
> > formally takes an 'unsigned int', implicitly converting the negative value
> > to an unsigned one -- making the size_overflow catch that bug and generate
> > the report.
> >
> > Can you please test the following patch instead? It should prevent the
> > underflow from happening by testing it beforehand.
> It did the trick. PAX no longer reports overflow.

Thanks for testing!

> By the way, is this part of IPv6 code really maintained? Similar issue was
> fixed in https://git.kernel.org/linus/59cae00 and this one should have been
> addressed back then.

It better is. There's an entry for this code in the MAINTAINERS file, at least ;)

> There were more bugs like this in IPv6 XFRM code in the past, unnoticed
> until PAX detected overflow, illegal assignment etc. Every single case I
> encountered could be spotted by carefully reading code, yet no one did it!

Unfortunately, those kinds of bugs "silently fail" on vanilla as the underflow goes unnoticed -- beside dropped packets, maybe. But you're welcome to review the code and send patches to netdev... ;)

I have the issue too.

Steps to reproduce:
1. Hardened kernel [workstation profile]
2. ipsec vpn connection (I use Shrew VPN)
3. Kernel panic

[ 1771.738191] PAX: size overflow detected in function _decode_session6 net/ipv6/xfrm6_policy.c:190 cicus.110_217 min, count: 14
[ 1771.738797] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 1771.738906] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.0.8-hardened #1
[ 1771.738995] Hardware name: Hewlett-Packard HP Compaq 6720s/30D8, BIOS 68MDU Ver. F.0D 11/04/2008
[ 1771.739106] 0000000000000009 ffff88007f5037b8 ffffffff8188ffbb 0000000000004992
[ 1771.739235] ffffffff87a82410 ffff88007f503848 ffffffff81889fe7 ffff88007f503818
[ 1771.739362] ffff880000000008 ffff88007f503858 ffff88007f5037e8 0000000000000000
[ 1771.739487] Call Trace:
[ 1771.739525] <IRQ> [<ffffffff8188ffbb>] dump_stack+0x45/0x5d
[ 1771.739625] [<ffffffff81889fe7>] panic+0xc8/0x20d
[ 1771.739696] [<ffffffff810be175>] do_exit+0xa15/0xc00
[ 1771.739769] [<ffffffff810be3f1>] do_group_exit+0x41/0xc0
[ 1771.739846] [<ffffffff81215783>] report_size_overflow+0x33/0x40
[ 1771.739945] [<ffffffffa05805d3>] _decode_session6+0x5b3/0x700 [ipv6]
[ 1771.740036] [<ffffffff81862171>] __xfrm_decode_session+0x31/0x50
[ 1771.740122] [<ffffffff81866bb5>] __xfrm_policy_check+0x65/0x600
[ 1771.740220] [<ffffffffa05571d0>] ? ip6_pol_route.isra.42+0x520/0x520 [ipv6]
[ 1771.740330] [<ffffffffa05680fa>] rawv6_rcv+0x4a/0x330 [ipv6]
[ 1771.740411] [<ffffffff817afb13>] ? skb_clone+0x63/0xb0
[ 1771.740498] [<ffffffffa05684fe>] raw6_local_deliver+0x11e/0x2c0 [ipv6]
[ 1771.740598] [<ffffffffa054a03a>] ip6_input_finish+0x11a/0x570 [ipv6]
[ 1771.740695] [<ffffffffa054aa0f>] ip6_input+0x2f/0x70 [ipv6]

The kernel crashes from hardened-sources-4.*
I tried 4.0.8-hardened, 4.1.4-hardened. All of them crash.

4.1.6-hardened crashes too.

(In reply to Alexander Miroshnichenko from comment #22)
> 4.1.6-hardened crashes too.

It should be fixed as of grsecurity-3.1-4.1.6-201509112213.patch. Which version is in 4.1.6-hardened? If it's this version or newer, can you please provide a backtrace for that kernel version?

(In reply to Mathias Krause from comment #23)
> (In reply to Alexander Miroshnichenko from comment #22)
> > 4.1.6-hardened crashes too.
>
> It should be fixed as of grsecurity-3.1-4.1.6-201509112213.patch. Which
> version is in 4.1.6-hardened? If it's this version or newer, can you please
> provide a backtrace for that kernel version?

I tried stable versions:
# qlist -ICv hardened-sources
sys-kernel/hardened-sources-4.1.6

I found 4.1.6/4420_grsecurity-3.1-4.1.6-201509112213.patch in the hardened-sources-4.1.6-r2 version, which is '~amd64'.
I will try hardened-sources-4.1.6-r2 version.

(In reply to Alexander Miroshnichenko from comment #24)
> I will try hardened-sources-4.1.6-r2 version.

With this version the bug is really fixed. There have been no crashes for two weeks.

(In reply to Alexander Miroshnichenko from comment #25)
> (In reply to Alexander Miroshnichenko from comment #24)
> > I will try hardened-sources-4.1.6-r2 version.
>
> With this version bug realy fixed. There are no crashes for two weeks.

i'm going to stabilize hardened-sources-4.1.7-r1. please reopen this bug if it's still an issue on that kernel. seeing as this is fixed with 4.1.6-r2 it should be okay in 4.1.7-r1.