Bug 470738 - =sys-kernel/vanilla-sources-{3.7.5-r1, 3.8.3, 3.8.6} - Oops when ping host through ipsec
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-20 11:18 UTC by Veovis
Modified: 2013-10-14 17:44 UTC (History)
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
my current kernel config with debugging enabled (config,60.80 KB, text/plain)
2013-05-20 11:22 UTC, Veovis

Description Veovis 2013-05-20 11:18:37 UTC
I start the IPsec daemon (strongSwan).
Almost every time I ping a remote host just after IPsec comes up, I get an Oops.

I have seen this on 3 different VMs, with at least these kernels: 3.7.5-r1, 3.8.3, 3.8.6.
They are full VMs running under VMware ESXi.

Reproducible: Sometimes

Steps to Reproduce:
1. Run a hardened Gentoo kernel.
2. Configure IPsec with strongSwan, using a network-to-network tunnel.
3. Test IPsec connectivity with a simple ping.
Actual Results:  
The ping process cannot be killed. htop shows it in D state.
I get an oops.

Expected Results:  
No oops.

The Oops:
May 20 12:51:06 sargeras kernel: PAX: please report this to pageexec@freemail.hu
May 20 12:51:06 sargeras kernel: BUG: unable to handle kernel NULL pointer dereference at 00000000000002a0
May 20 12:51:06 sargeras kernel: IP: [<ffffffff813bd7b6>] xfrm_output_one+0xa7/0x230
May 20 12:51:06 sargeras kernel: PGD 7ca5f000
May 20 12:51:06 sargeras kernel: Thread overran stack, or stack corrupted
May 20 12:51:06 sargeras kernel: Oops: 0000 [#1] SMP
May 20 12:51:06 sargeras kernel: Modules linked in: xfrm_user vsock(O) vmsync(O) coretemp processor thermal_sys microcode vmci(O)
May 20 12:51:06 sargeras kernel: CPU 0
May 20 12:51:06 sargeras kernel: Pid: 2274, comm: ping Tainted: G           O 3.8.6-hardened #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
May 20 12:51:06 sargeras kernel: RIP: 0010:[<ffffffff813bd7b6>]  [<ffffffff813bd7b6>] xfrm_output_one+0xa7/0x230
May 20 12:51:06 sargeras kernel: RSP: 0018:ffff88007b4b98e8  EFLAGS: 00010286
May 20 12:51:06 sargeras kernel: RAX: 000000000000021c RBX: ffff88007b400d80 RCX: 0000000000000000
May 20 12:51:06 sargeras kernel: RDX: 00000000fffffde4 RSI: 0000000000000000 RDI: ffff88007b400d80
May 20 12:51:06 sargeras kernel: RBP: ffff88007b4b9918 R08: 00000000d97586c6 R09: 0000000000000600
May 20 12:51:06 sargeras kernel: R10: ffff88007b4b9718 R11: ffff88007ada90f0 R12: 0000000000000000
May 20 12:51:06 sargeras kernel: R13: 8000000000000000 R14: 000000000203a8c0 R15: 0000000000000000
May 20 12:51:06 sargeras kernel: FS:  0000032bc14a9700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
May 20 12:51:06 sargeras kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 20 12:51:06 sargeras kernel: CR2: 00000000000002a0 CR3: 0000000001434000 CR4: 00000000000007f0
May 20 12:51:06 sargeras kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 20 12:51:06 sargeras kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 20 12:51:06 sargeras kernel: Process ping (pid: 2274, threadinfo ffff88007cb1dc40, task ffff88007cb1d850)
May 20 12:51:06 sargeras kernel: Stack:
May 20 12:51:06 sargeras kernel: ffff88007b4b9948 ffffffff00000001 0000000000000001 ffff88007b400d80
May 20 12:51:06 sargeras kernel: 8000000000000000 000000000203a8c0 ffff88007b4b9958 ffffffff813bda44
May 20 12:51:06 sargeras kernel: ffffffff813ed84c ffff88007b400d80 0000000000000004 ffff88007b400d80
May 20 12:51:06 sargeras kernel: Call Trace:
May 20 12:51:06 sargeras kernel: [<ffffffff813bda44>] xfrm_output_resume+0x105/0x131
May 20 12:51:06 sargeras kernel: [<ffffffff813ed84c>] ? xfrm6_extract_output+0x3d/0x3d
May 20 12:51:06 sargeras kernel: [<ffffffff813ed84c>] ? xfrm6_extract_output+0x3d/0x3d
May 20 12:51:06 sargeras kernel: [<ffffffff813bda8a>] xfrm_output2+0x1a/0x22
May 20 12:51:06 sargeras kernel: [<ffffffff8136e65d>] ? ip_setup_cork+0xfb/0xfb
May 20 12:51:06 sargeras kernel: [<ffffffff813bdb4b>] xfrm_output+0xb9/0xca
May 20 12:51:06 sargeras kernel: [<ffffffff813ed84c>] ? xfrm6_extract_output+0x3d/0x3d
May 20 12:51:06 sargeras kernel: [<ffffffff813ed86c>] xfrm6_output_finish+0x20/0x28
May 20 12:51:06 sargeras kernel: [<ffffffff813b4ea1>] xfrm4_output+0x78/0x87
May 20 12:51:06 sargeras kernel: [<ffffffff8136fcd7>] ip_local_out+0x31/0x3b
May 20 12:51:06 sargeras kernel: [<ffffffff81370cff>] ip_send_skb+0x15/0x41
May 20 12:51:06 sargeras kernel: [<ffffffff81370d68>] ip_push_pending_frames+0x3d/0x4a
May 20 12:51:06 sargeras kernel: [<ffffffff81390ab1>] raw_sendmsg+0x365/0x401
May 20 12:51:06 sargeras kernel: [<ffffffff8139b811>] inet_sendmsg+0x97/0xa6
May 20 12:51:06 sargeras kernel: [<ffffffff8130aaa7>] sock_sendmsg+0x9e/0xc5
May 20 12:51:06 sargeras kernel: [<ffffffff81319973>] ? verify_iovec+0x168/0x1e9
May 20 12:51:06 sargeras kernel: [<ffffffff8130afc4>] __sys_sendmsg+0x3d5/0x4cf
May 20 12:51:06 sargeras kernel: [<ffffffff8130ab61>] ? sockfd_lookup_light+0x2a/0x73
May 20 12:51:06 sargeras kernel: [<ffffffff8130e912>] sys_sendmsg+0x43/0x6a
May 20 12:51:06 sargeras kernel: [<ffffffff81427cfa>] system_call_fastpath+0x18/0x1d
May 20 12:51:06 sargeras kernel: Code: 85 f6 7f 08 31 f6 85 d2 7f 0c eb 1f 85 d2 b8 00 00 00 00 0f 48 d0 b9 20 00 00 00 48 89 df e8 6a 94 f5 ff 85 c0 0f 85 79 01 00 00 <49> 8b 84 24 a0 02 00 00 49 be 00 00 00 00 00 00 00 80 48 89 de
May 20 12:51:06 sargeras kernel: RIP  [<ffffffff813bd7b6>] xfrm_output_one+0xa7/0x230
May 20 12:51:06 sargeras kernel: RSP <ffff88007b4b98e8>
May 20 12:51:06 sargeras kernel: CR2: 00000000000002a0
May 20 12:51:06 sargeras kernel: ---[ end trace 15eb41c127dbce11 ]---

I recompiled the kernel with debugging info so that I could find the line where the bug occurred:
sargeras ~ # addr2line -e /usr/src/linux/vmlinux ffffffff813bd7b6
/usr/src/linux/net/xfrm/xfrm_output.c:57.
This refers to this line: err = x->outer_mode->output(x, skb);
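The fault address can be cross-checked against the register dump. What follows is only a minimal sketch: it assumes the bytes starting at the `<49>` marker in the `Code:` line encode a 64-bit `mov` with a SIB base register and a 32-bit displacement. The helper name `decode_mov_disp32` is hypothetical, and this is not a general x86 decoder.

```python
# Bytes of the faulting instruction, starting at the <49> marker in the Code: line.
FAULTING = bytes.fromhex("49 8b 84 24 a0 02 00 00")

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_disp32(code):
    """Decode `REX.W 8B /r` with mod=10 and a SIB byte that has no index.

    Returns (destination register, base register, displacement).
    Only this single encoding is handled; anything else raises ValueError.
    """
    rex, opcode, modrm, sib = code[0], code[1], code[2], code[3]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08) or opcode != 0x8B:
        raise ValueError("not a 64-bit mov r64, r/m64")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm != 0b100:
        raise ValueError("expected SIB addressing with a 32-bit displacement")
    index, base = (sib >> 3) & 7, sib & 7
    if index != 0b100 or (rex & 0x02):
        raise ValueError("an index register is present")
    reg |= (rex & 0x04) << 1   # REX.R extends the destination register
    base |= (rex & 0x01) << 3  # REX.B extends the base register
    disp = int.from_bytes(code[4:8], "little", signed=True)
    return REG64[reg], REG64[base], disp


dest, base, disp = decode_mov_disp32(FAULTING)
r12 = 0x0  # value of R12 in the register dump above
print(f"mov {disp:#x}(%{base}),%{dest} -> fault address {r12 + disp:#x}")
```

With R12 = 0 in the dump, the computed address equals CR2 (0x2a0). That is consistent with a NULL pointer being dereferenced on line 57, possibly `x` itself if R12 held `x` at that point, which is an assumption the dump alone does not prove.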

My emerge-info:
sargeras ~ # emerge --info
Portage 2.1.11.62 (hardened/linux/amd64, gcc-4.6.3, glibc-2.15-r3, 3.8.6-hardened x86_64)
=================================================================
System uname: Linux-3.8.6-hardened-x86_64-Intel-R-_Core-TM-_i7_CPU_920_@_2.67GHz-with-gentoo-2.2
KiB Mem:     2058088 total,   1550160 free
KiB Swap:          0 total,         0 free
Timestamp of tree: Wed, 15 May 2013 15:45:01 +0000
ld GNU ld (GNU Binutils) 2.22
app-shells/bash:          4.2_p45
dev-lang/python:          2.7.3-r3, 3.2.3-r2
dev-util/cmake:           2.8.9
dev-util/pkgconfig:       0.28
sys-apps/baselayout:      2.2
sys-apps/openrc:          0.11.8
sys-apps/sandbox:         2.5
sys-devel/autoconf:       2.69
sys-devel/automake:       1.11.6, 1.12.6
sys-devel/binutils:       2.22-r1
sys-devel/gcc:            4.6.3
sys-devel/gcc-config:     1.7.3
sys-devel/libtool:        2.4-r1
sys-devel/make:           3.82-r4
sys-kernel/linux-headers: 3.7 (virtual/os-headers)
sys-libs/glibc:           2.15-r3
Repositories: gentoo kveer
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=native -mtune=native -mmmx -msse -msse2 -msse3 -msse4.1 -msse4.2 -maes -mpclmul -mpopcnt -mcx16 -mfpmath=sse -fomit-frame-pointer -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/apache2-php5.4/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cgi-php5.4/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/php/cli-php5.4/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=native -mtune=native -mmmx -msse -msse2 -msse3 -msse4.1 -msse4.2 -maes -mpclmul -mpopcnt -mcx16 -mfpmath=sse -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="ftp://mirror.ovh.net/gentoo-distfiles/ ftp://ftp.free.fr/mirrors/ftp.gentoo.org/"
LANG="fr_FR.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/var/lib/layman/Kveer"
SYNC="rsync://rsync.fr.gentoo.org/gentoo-portage"
USE="acl amd64 bash-completion berkdb bzip2 caps cli cracklib crypt cxx dri fam gdbm gpm hardened iconv ipv6 justify mmx modules mudflap multilib ncurses nls nptl openmp pam pax_kernel pcre readline session sse sse2 ssl tcpd threads unicode urandom vim-syntax xattr zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation setenvif speling status unique_id userdir usertrack vhost_alias" APACHE2_MPMS="prefork" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="fr" NGINX_MODULES_HTTP="access autoindex browser charset dav empty_gif fancyindex fastcgi flv geo geoip gzip headers_more mp4 passenger proxy referer rewrite scgi secure_link spdy stub_status sub upload_progress" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-3 php5-4" 
PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="geoip gradm ipp2p ipset pknock"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON

Tested with strongswan 5.0.0 and 5.0.4.
IPSec uses ECC certificates for mutual auth.

I can run some tests if asked, although I'm not a kernel developer at all.
Comment 1 Veovis 2013-05-20 11:22:56 UTC
Created attachment 348726 [details]
my current kernel config with debugging enabled
Comment 2 Anthony Basile gentoo-dev 2013-05-20 11:30:07 UTC
pipacs, hardened-sources-3.8.6 = grsecurity-2.9.1-3.8.6-201304052305.  I have not pushed any of the latest because of the low mem issue on x86.
Comment 3 PaX Team 2013-05-20 18:02:36 UTC
this is a null ptr deref in the kernel. does vanilla work better?
Comment 4 Veovis 2013-05-20 18:57:55 UTC
No, it doesn't. (vanilla 3.8.6, same .config)
Comment 5 Veovis 2013-05-26 11:03:01 UTC
I finally managed to make the bug reproducible.

This is the situation:
- 2 servers, A and B, each with 2 interfaces: one public with IPv6 and IPv4, the other private with IPv4 only.
- an IPsec tunnel over IPv6 linking the two private IPv4 networks
- this iptables rule on A: iptables -t nat -A POSTROUTING -o enp2s0 -s 192.168.14.32/27 -j MASQUERADE --random

enp2s0 is the public interface on A.
This iptables rule provides Internet access to the computers on the private network.

A ping from A to B triggers the oops: ping 192.168.3.2.

If the tunnel runs over IPv4 instead, the ping just prints nothing, as if the packet were dropped internally. I think this is what I should get instead of the oops.

The oops seems in fact to be triggered by a misconfiguration of iptables: I must not masquerade packets that are going to be processed by IPsec.

If I change the rule to: iptables -t nat -A POSTROUTING -s 192.168.14.32/27 -o enp2s0 -m policy --dir out --pol none -j MASQUERADE --random, everything works properly.
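The difference between the two rules can be illustrated with a toy model. This is only a sketch of the matching logic, not how netfilter implements it. The tunneled network 192.168.3.0/24, the source address 192.168.14.33 and the destination 198.51.100.7 are assumptions made for the example, and all function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

PRIVATE_NET = ipaddress.ip_network("192.168.14.32/27")
# Assumption: the remote private network reached through the IPsec tunnel.
TUNNELED_NETS = [ipaddress.ip_network("192.168.3.0/24")]
PUBLIC_IFACE = "enp2s0"


@dataclass
class Packet:
    src: str
    dst: str
    out_iface: str


def has_outbound_ipsec_policy(pkt):
    """Stand-in for `-m policy --dir out`; simplified to a destination check."""
    dst = ipaddress.ip_address(pkt.dst)
    return any(dst in net for net in TUNNELED_NETS)


def masquerade_original(pkt):
    """Original rule: match on source network and outgoing interface only."""
    return (pkt.out_iface == PUBLIC_IFACE
            and ipaddress.ip_address(pkt.src) in PRIVATE_NET)


def masquerade_fixed(pkt):
    """Changed rule: additionally require `--pol none` (no IPsec policy)."""
    return masquerade_original(pkt) and not has_outbound_ipsec_policy(pkt)


to_tunnel = Packet("192.168.14.33", "192.168.3.2", PUBLIC_IFACE)
to_internet = Packet("192.168.14.33", "198.51.100.7", PUBLIC_IFACE)
for name, pkt in [("tunnel", to_tunnel), ("internet", to_internet)]:
    print(name, masquerade_original(pkt), masquerade_fixed(pkt))
```

In this model the original rule rewrites the source address of tunnel-bound packets as well, while the changed rule leaves them alone and still masquerades ordinary Internet traffic.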
Comment 6 PaX Team 2013-05-26 22:07:23 UTC
you should also check 3.9 and perhaps some 3.10-rc and if the problem is still there, report this to the kernel devs.
Comment 7 Anthony Basile gentoo-dev 2013-06-24 21:05:42 UTC
(In reply to Veovis from comment #4)
> No, it doesn't. (vanilla 3.8.6, same .config)

I'm pushing this to the people handling the vanilla kernel.
Comment 8 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-06-24 22:40:48 UTC
(In reply to PaX Team from comment #6)
> you should also check 3.9 and perhaps some 3.10-rc and if the problem is
> still there, report this to the kernel devs.

Please do this first (3.9.7 / 3.10-rc7) so we know if upstream has already fixed this; if not fixed, I will look into the code in two days from now.
Comment 9 Veovis 2013-06-26 09:30:19 UTC
Thanks for correcting my report.
I am on vacation right now, so I cannot test the newest kernel until July 8.

I reported the bug on the netdev ML, but it has not received any comments so far.
I saw a bug very close to mine a couple of weeks ago that received some patches.
Comment 10 Veovis 2013-07-16 14:42:41 UTC
I'm testing sys-kernel/hardened-sources-3.9.9 on my buggy systems.
It seems this particular bug is fixed, at least as of this version.

I still get kernel panics caused by the IPsec kernel code on recent kernels (3.8 and 3.9 branches, up to at least 3.9.5), but this is a different bug. I'll report back if I get the kernel trace.
Comment 11 Agostino Sarubbo gentoo-dev 2013-09-15 11:56:06 UTC
No longer in the tree.

@Veovis: does the latest version of 3.10/3.11 work for you?
Comment 12 Tom Wijsman (TomWij) (RETIRED) gentoo-dev 2013-10-14 17:30:02 UTC
(In reply to Agostino Sarubbo from comment #11)
> @Veovis: does the latest version of 3.10/3.11 work for you?

@Veovis: Please test the latest versions.
Comment 13 Veovis 2013-10-14 17:44:49 UTC
Hi,
Right now, I have 3 systems on 3.10.1-hardened-r1, but with a short uptime (only 48 hours).
No kernel panic to report yet.

Probably unrelated (or maybe specific to my hosting provider), but if I use IPsec over IPv6 for tunneling, I have a lot of connection problems (ping is OK, but the connection between the Exchange server <=> Active Directory is not: lots of 0-length packets).
With IPsec over IPv4, everything is fine.