Bug 428576 - sys-kernel/hardened-sources-3.4*: early x86 kernel crash when SMP && RELOCATABLE
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Hardened
Hardware: x86 Linux
Importance: Normal major
Assignee: The Gentoo Linux Hardened Kernel Team (OBSOLETE)
Reported: 2012-07-30 00:47 UTC by Maxim Kammerer
Modified: 2012-08-02 19:08 UTC

Attachments
minimal hardened-sources-3.4.2 configuration (linux-3.4.2-hardened-clean.config,18.28 KB, text/plain)
2012-07-30 00:50 UTC, Maxim Kammerer
minimal gentoo-sources-3.4.2 configuration (linux-3.4.2-gentoo-clean.config,18.04 KB, text/plain)
2012-07-30 00:53 UTC, Maxim Kammerer
output from gentoo-sources-3.4.2 (serial-ok.log,3.49 KB, text/plain)
2012-07-30 00:57 UTC, Maxim Kammerer
Failing bzImage for config in comment 1 (bzImage,704.83 KB, application/octet-stream)
2012-07-30 03:12 UTC, Anthony Basile
Failing System.map for config in comment 1 (System.map-genkernel-x86-3.4.6-hardened-r2,227.84 KB, text/plain)
2012-07-30 03:15 UTC, Anthony Basile
Failing vmlinux image for config in comment 1, lzma compressed (vmlinux.lzma,709.15 KB, application/octet-stream)
2012-07-30 03:18 UTC, Anthony Basile
minimal hardened-sources-3.4.6-r1 configuration, with KERNEXEC (linux-3.4.6-hardened-clean-kernexec.config,22.72 KB, text/plain)
2012-07-30 15:23 UTC, Maxim Kammerer

Description Maxim Kammerer 2012-07-30 00:47:29 UTC
There is apparently a problem with the hardened-sources patchset for kernel 3.4. The issue is present in both the stable (3.4.2) and the latest unstable (3.4.6-r1) hardened-sources with all hardened features turned off, and is *not* present in the corresponding gentoo-sources.

A symptom of the problem is that the kernel hangs or reboots immediately after:
  Decompressing Linux... Parsing ELF... done.
  Booting the kernel.
(even with "earlyprintk", "loglevel=7", "debug", etc.)
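When the console output is captured to a file, the symptom above can be checked mechanically. The following is a minimal sketch, assuming the captured text contains the decompressor messages quoted above; the function name and the three result labels are hypothetical and chosen only for illustration.

```python
# Hypothetical helper: classify captured console text by whether anything
# from the kernel proper appears after the decompressor's last message.
BOOT_MARKER = "Booting the kernel."
DECOMPRESSOR_PREFIX = "Decompressing Linux"


def classify_console_log(text):
    """Return 'no-marker', 'stopped-after-marker', or 'continued'."""
    seen_marker = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == BOOT_MARKER:
            seen_marker = True
        elif seen_marker and not line.startswith(DECOMPRESSOR_PREFIX):
            # Some other output followed the marker, so the kernel got further.
            return "continued"
    # Repeated decompressor output (a reboot loop) still counts as stopped.
    return "stopped-after-marker" if seen_marker else "no-marker"
```

A result of 'stopped-after-marker' corresponds to the hang or reboot described here; 'continued' corresponds to the "good" case where kernel messages follow.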

The problem is quite robust and resistant to kernel settings changes, except for those below. I have managed to narrow down the kernel configuration to the following key features, where changing any one of them makes the problem disappear:

1. CONFIG_SMP=y (to change: disable)
2. CONFIG_RELOCATABLE=y (to change: disable)
3. CONFIG_PHYSICAL_ALIGN=0x400000 (to change: set to 0x1000000)
4. CONFIG_PHYSICAL_START=0x1000000 (to change: set to 0x400000)

Note that simply adding "nosmp", "nolapic", etc. to the kernel command line has no effect on the problem. Also, setting PHYSICAL_START to the same value as PHYSICAL_ALIGN, for example, is not a good workaround: I see much more serious problems with UEFI boot, where this doesn't help (and where I haven't yet narrowed down the kernel config), and I suspect that the problems are related (or even caused by the same bug).
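To check whether a given .config has the combination listed above, something like the following could be used. It is only a sketch of a heuristic: it encodes nothing more than the four settings reported in this comment (SMP and RELOCATABLE enabled, PHYSICAL_START differing from PHYSICAL_ALIGN), and the function names are hypothetical.

```python
# Hypothetical helpers: detect the reported option combination in a .config.
def parse_kconfig(text):
    """Parse 'CONFIG_FOO=value' lines; comments such as
    '# CONFIG_FOO is not set' are skipped, so unset options are absent."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            opts[key] = value
    return opts


def matches_reported_trigger(opts):
    """True if SMP and RELOCATABLE are enabled and START != ALIGN."""
    if opts.get("CONFIG_SMP") != "y" or opts.get("CONFIG_RELOCATABLE") != "y":
        return False
    try:
        start = int(opts["CONFIG_PHYSICAL_START"], 16)
        align = int(opts["CONFIG_PHYSICAL_ALIGN"], 16)
    except (KeyError, ValueError):
        return False
    return start != align
```

For example, `matches_reported_trigger(parse_kconfig(open(".config").read()))` would flag the attached minimal configuration, while changing any one of the four settings as described makes it return False.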

The kernel is started as:
  qemu-kvm -nodefaults -sdl -monitor vc -m 512M -vga cirrus -cdrom cdrom.iso -serial file:serial.log
and cdrom.iso boots the kernel using ISOLINUX with, e.g., "earlyprintk=serial,keep".

Minimal kernel configurations and "good" output from a gentoo-sources kernel are attached below.
Comment 1 Maxim Kammerer 2012-07-30 00:50:01 UTC
Created attachment 319658 [details]
minimal hardened-sources-3.4.2 configuration

This is a minimal configuration for hardened-sources-3.4.2 that results in the problem described (immediate reboot or hang).
Comment 2 Maxim Kammerer 2012-07-30 00:53:03 UTC
Created attachment 319660 [details]
minimal gentoo-sources-3.4.2 configuration

This is essentially the same configuration for gentoo-sources-3.4.2. It can be obtained by running "make oldconfig" in the gentoo-sources tree on the previous attachment.
Comment 3 Maxim Kammerer 2012-07-30 00:57:25 UTC
Created attachment 319662 [details]
output from gentoo-sources-3.4.2

This is the "good" output from running the gentoo-sources-3.4.2 kernel. As expected, the kernel reaches the point where it tries to run init, and panics.

The same output can be achieved by changing one of the settings above for a hardened-sources-3.4* kernel.
Comment 4 Maxim Kammerer 2012-07-30 01:01:15 UTC
The toolchain is the latest one from the hardened profile:

sys-devel/gcc-4.5.3-r2 was built with the following:
USE="cxx hardened nls nptl openmp (-altivec) -bootstrap -build -doc (-fixed-point) -fortran -gcj -graphite -gtk (-libssp) -lto -mudflap (-multilib) -multislot -nocxx -nopie -nossp -objc -objc++ -objc-gc -test -vanilla"

sys-devel/binutils-2.21.1-r1 was built with the following:
USE="cxx nls zlib -multislot -multitarget -static-libs -test -vanilla"

CFLAGS="-O2 -march=pentium3 -mtune=core2 -pipe"
CXXFLAGS="-O2 -march=pentium3 -mtune=core2 -pipe"

The problem appears in VMware as well (didn't try other environments).
Comment 5 Maxim Kammerer 2012-07-30 01:13:48 UTC
An easier way to run the kernel (without an ISO):

qemu-system-x86_64 -cpu kvm64 -nodefaults -sdl -monitor vc -m 512M -vga cirrus -kernel .../bzImage -serial file:serial.log -append "earlyprintk=serial,keep debug"
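For repeated test runs, the invocation above can be assembled programmatically. This is a sketch only: the flags are copied verbatim from the command line in this comment (newer QEMU releases may spell some of them differently), and the function name, parameter names, and defaults are hypothetical.

```python
# Hypothetical helper: build the argv list for the QEMU run shown above.
import shlex


def qemu_command(bzimage, serial_log="serial.log", memory="512M"):
    """Return the argument list; the caller supplies the bzImage path."""
    return [
        "qemu-system-x86_64",
        "-cpu", "kvm64",
        "-nodefaults",
        "-sdl",
        "-monitor", "vc",
        "-m", memory,
        "-vga", "cirrus",
        "-kernel", bzimage,
        "-serial", "file:" + serial_log,
        # One argument: the kernel command line passed via -append.
        "-append", "earlyprintk=serial,keep debug",
    ]


def qemu_command_string(bzimage, **kwargs):
    """Shell-quoted form of the same command, for logging or copy-paste."""
    return shlex.join(qemu_command(bzimage, **kwargs))
```

The list form can be handed to `subprocess.run()` directly, which avoids quoting problems with the space inside the `-append` value.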

I am using (on amd64):
app-emulation/qemu-kvm-1.0.1 was built with the following:
USE="aio alsa bluetooth curl (multilib) ncurses opengl sdl spice threads vhost-net xattr -brltty -debug -fdt -pulseaudio -qemu-ifup (-rbd) -sasl -smartcard -static -test -tls -usbredir -vde -xen" QEMU_SOFTMMU_TARGETS="x86_64 (-arm) -cris -i386 (-m68k) -microblaze (-mips) -mips64 -mips64el -mipsel (-ppc) (-ppc64) -ppcemb -sh4 -sh4eb (-sparc) -sparc64" QEMU_USER_TARGETS="(-alpha) (-arm) -armeb -cris -i386 (-m68k) -microblaze (-mips) -mipsel (-ppc) (-ppc64) -ppc64abi32 -sh4 -sh4eb (-sparc) -sparc32plus -sparc64 -x86_64"
Comment 6 Anthony Basile gentoo-dev 2012-07-30 03:02:45 UTC
(In reply to comment #2)
> Created attachment 319660 [details]
> minimal gentoo-sources-3.4.2 configuration
> 
> This is essentially the same configuration for gentoo-sources-3.4.2. It is
> easy to get it by running "make oldconfig" in gentoo-sources tree for the
> previous attachment.

I'm testing with the very latest patches from upstream, not 3.4.2, because upstream will ask for that right away.

When I compile with gcc-4.5.4 it compiles fine, but when I compile with gcc-4.6.3, it fails with:


(cat /dev/null; ) > arch/x86/vdso/modules.order
   ld -m elf_i386   -r -o arch/x86/built-in.o arch/x86/kernel/built-in.o arch/x86/mm/built-in.o arch/x86/crypto/built-in.o arch/x86/vdso/built-in.o arch/x86/platform/built-in.o arch/x86/net/built-in.o 
(cat /dev/null;   cat arch/x86/kernel/modules.order;   cat arch/x86/mm/modules.order;   cat arch/x86/crypto/modules.order;   cat arch/x86/vdso/modules.order;   cat arch/x86/platform/modules.order;   cat arch/x86/net/modules.order;) > arch/x86/modules.order
make -f scripts/Makefile.build obj=kernel
  gcc -Wp,-MD,kernel/.time.o.d  -nostdinc -isystem /usr/lib/gcc/i686-pc-linux-gnu/4.6.3/include -I/usr/src/linux-3.4.6-hardened-r2/arch/x86/include -Iarch/x86/include/generated -Iinclude  -include /usr/src/linux-3.4.6-hardened-r2/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=pentium3 -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/constify_plugin.so -DCONSTIFY_PLUGIN -fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/colorize_plugin.so    -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(time)"  -D"KBUILD_MODNAME=KBUILD_STR(time)" -c -o kernel/time.o kernel/time.c
  gcc -Wp,-MD,kernel/.capability.o.d  -nostdinc -isystem /usr/lib/gcc/i686-pc-linux-gnu/4.6.3/include -I/usr/src/linux-3.4.6-hardened-r2/arch/x86/include -Iarch/x86/include/generated -Iinclude  -include /usr/src/linux-3.4.6-hardened-r2/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=pentium3 -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/constify_plugin.so -DCONSTIFY_PLUGIN -fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/colorize_plugin.so    -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(capability)"  -D"KBUILD_MODNAME=KBUILD_STR(capability)" -c -o kernel/capability.o kernel/capability.c
In file included from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/uaccess.h:636:0,
                 from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/sections.h:5,
                 from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/hw_irq.h:26,
                 from include/linux/irq.h:369,
                 from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/hardirq.h:5,
                 from include/linux/hardirq.h:7,
                 from include/linux/ftrace_event.h:7,
                 from include/trace/syscall.h:6,
                 from include/linux/syscalls.h:78,
                 from kernel/capability.c:15:
In function 'copy_to_user',
    inlined from 'sys_capget' at kernel/capability.c:208:19:
/usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/uaccess_32.h:244:24: error: call to 'copy_to_user_overflow' declared with attribute error: copy_to_user() buffer size is not provably correct
make[1]: *** [kernel/capability.o] Error 1
make: *** [kernel] Error 2
Comment 7 Anthony Basile gentoo-dev 2012-07-30 03:10:56 UTC
Okay I confirmed this using gcc-4.5.4 and binutils-2.21.1-r1.  I'll attach the bzImage, System.map and vmlinux files in my next posts.
Comment 8 Anthony Basile gentoo-dev 2012-07-30 03:12:48 UTC
Created attachment 319664 [details]
Failing bzImage for config in comment 1
Comment 9 Anthony Basile gentoo-dev 2012-07-30 03:15:57 UTC
Created attachment 319666 [details]
Failing System.map for config in comment 1
Comment 10 Anthony Basile gentoo-dev 2012-07-30 03:18:35 UTC
Created attachment 319668 [details]
Failing vmlinux image for config in comment 1, lzma compressed
Comment 11 Maxim Kammerer 2012-07-30 03:25:11 UTC
(In reply to comment #7)
> Okay I confirmed this using gcc-4.5.4 and binutils-2.21.1-r1.

Great! I will gladly test any patches, and see if they fix issues in a more complex UEFI+OVMF setup, where the output is (in case it is related and may hint at the culprit):

Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=15, HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
BUG: unable to handle kernel NULL pointer dereference at 0000001c
IP: [<c144477d>] set_task_rq+0xd/0x4e
*pdpt = 0000000000000000 *pde = 00000b0548c68948 
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c1421125>] no_context+0x15f/0x1c5
*pdpt = 0000000000000000 *pde = 00000b0548c68948 
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c1421125>] no_context+0x15f/0x1c5
...
(many repetitions)
Comment 12 PaX Team 2012-07-30 11:01:36 UTC
hmm, very interesting, i see what's happening (some statically initialized pcpu data don't get relocated) but i don't see how that would happen yet (i've been testing relocatable kernels myself for many years now, with the same align/start relationship to force an actual relocation on boot). one question: does enabling various PaX features (esp. KERNEXEC) change the situation?
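The align/start relationship mentioned here can be illustrated with a little arithmetic. The sketch below reflects one reading of how the 32-bit relocatable boot path of that era picks its run address (round the load address up to the alignment and compare with the link-time address); it is not taken from this bug. The 1 MiB load address is an assumption about where the boot loader places a bzImage, and all function names are hypothetical.

```python
# Hypothetical illustration of why START=0x1000000 with ALIGN=0x400000
# forces a relocation, while equal values do not.
def align_up(addr, align):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def link_address(physical_start, physical_align):
    """Address the kernel is linked for: PHYSICAL_START rounded up to ALIGN."""
    return align_up(physical_start, physical_align)


def needs_relocation(loaded_at, physical_start, physical_align):
    """True if the run address differs from the link-time address."""
    runs_at = align_up(loaded_at, physical_align)
    return runs_at != link_address(physical_start, physical_align)


ASSUMED_LOAD_ADDR = 0x100000  # assumption: bzImage loaded at 1 MiB
```

Under these assumptions, `needs_relocation(ASSUMED_LOAD_ADDR, 0x1000000, 0x400000)` is true for the reported configuration, and false once either value is changed to match the other, which is consistent with the workarounds listed in the description.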
Comment 13 Maxim Kammerer 2012-07-30 14:09:24 UTC
(In reply to comment #12)
> one question: does enabling various PaX features (esp. KERNEXEC) change the
> situation?

Yes! This is indeed interesting. Enabling KERNEXEC in an otherwise full-blown (GRKERNSEC, PAX) x86 3.4.6-hardened-r1 kernel with all the features mentioned in comment #1 (the configuration I started with before narrowing it down) makes the problem (hang after "Booting the kernel.") disappear for normal (BIOS) boot.

For UEFI (QEMU OVMF), however, instead of behavior described in comment #11, I now observe something weirder: GRUB2's x86_64-efi image now fails to load the kernel with:
  "couldn't find suitable memory target"
-- a message in grub-core/lib/relocator.c.
Comment 14 Maxim Kammerer 2012-07-30 15:13:43 UTC
(In reply to comment #13)

And exactly the same results with a minimal kernel with KERNEXEC enabled, configuration to be attached.
Comment 15 Maxim Kammerer 2012-07-30 15:23:06 UTC
Created attachment 319728 [details]
minimal hardened-sources-3.4.6-r1 configuration, with KERNEXEC

This x86 hardened-sources-3.4.6-r1 configuration has all the features from comment #1, but PAX_KERNEXEC is enabled. Note that PAX_PER_CPU_PGD is not enabled, since PAE is disabled in this x86 kernel.

BIOS boot proceeds fine, but GRUB-EFI is unable to load the kernel, as described in the previous comment.

By the way, one of the enabled options should select PROC_FS, because otherwise:
  grsecurity/gracl.c: In function ‘gr_handle_proc_create’:
  grsecurity/gracl.c:2838:73: error: ‘struct pid_namespace’ has no member named ‘proc_mnt’
Comment 16 Maxim Kammerer 2012-07-30 15:52:17 UTC
(In reply to comment #15)
> Note that PAX_PER_CPU_PGD is not enabled, since PAE is disabled in this x86 kernel.

Enabling X86_PAE (and consequently PAX_PER_CPU_PGD) results in immediate reboot after:

initial memory mapped : 0 - 00e00000
Base memory trampoline at [c009e000] 9e000 size 4096
init_memory_mapping: 0000000000000000-000000000fffd000
 0000000000 - 0000200000 page 4k
 0000200000 - 000fe00000 page 2M
 000fe00000 - 000fffd000 page 4k
kernel direct mapping tables up to fffd000 @ dfb000-e00000

when booting via

qemu-system-x86_64 -cpu kvm64 -nodefaults -sdl -monitor vc -m 256M -vga cirrus -kernel .../bzImage -serial file:serial.log -append "earlyprintk=serial,keep debug"

Results for OVMF boot via GRUB-UEFI are the same as previously ("couldn't find suitable memory target").
Comment 17 PaX Team 2012-07-30 19:02:58 UTC
ok, i figured it out, it seems that my percpu data approach never worked with relocatable kernels as the needed relocation handling code was under KERNEXEC. will be fixed in the next patch, thanks for your help guys!
Comment 18 PaX Team 2012-07-30 19:04:13 UTC
as for the UEFI only problem, can you move that to a new bug please?
Comment 19 Maxim Kammerer 2012-07-30 20:30:46 UTC
(In reply to comment #18)
> as for the UEFI only problem, can you move that to a new bug please?

Sure, but are you sure it's an unrelated problem? Should I wait for a new patchset first?
Comment 20 PaX Team 2012-07-30 22:02:38 UTC
(In reply to comment #19)
> Sure, but are you sure it's an unrelated problem? Should I wait for a new
> patchset first?

you can try the next patch i'll release soon, but from my reading of the grub code, it's not finding a large enough GRUB_MEMORY_AVAILABLE memory chunk, so i don't see how that's related to this relocatable kernel bug per se.
Comment 21 Maxim Kammerer 2012-08-01 22:27:56 UTC
@blueness: will you release new hardened-patches? Not sure how to properly test the newly released grsecurity / PaX patches on grsecurity.net.
Comment 22 Anthony Basile gentoo-dev 2012-08-02 12:15:52 UTC
(In reply to comment #21)
> @blueness: will you release new hardened-patches? Not sure how to properly
> test the newly released grsecurity / PaX patches on grsecurity.net.

Please test hardened-sources-3.4.7 which uses grsecurity-2.9.1-3.4.7-201208011850.
Comment 23 Maxim Kammerer 2012-08-02 15:44:44 UTC
I have tested hardened-sources-3.4.7 both with a full-blown configuration, and with the configuration in comment #1, and the problem seems to be gone. Moreover, the NULL deref issues on OVMF (comment #11) are also gone.

There is, however, an issue that appears in QEMU only sometimes, maybe 1 in 5 runs (tested with configuration from comment #1):

Freeing SMP alternatives: 8k freed
Enabling APIC mode:  Flat.  Using 1 I/O APICs
------------[ cut here ]------------
WARNING: at arch/x86/kernel/apic/apic.c:1334 setup_local_APIC+0x2d2/0x3c7()
Pid: 1, comm: swapper/0 Not tainted 3.4.7-hardened #3
Call Trace:
 [<c041db95>] ? warn_slowpath_common+0x65/0x90
 [<c053ae40>] ? setup_local_APIC+0x2d2/0x3c7
 [<c053ae40>] ? setup_local_APIC+0x2d2/0x3c7
 [<c041dc79>] ? warn_slowpath_null+0x19/0x20
 [<c053ae40>] ? setup_local_APIC+0x2d2/0x3c7
 [<c052b762>] ? native_smp_prepare_cpus+0x2c2/0x373
 [<c04d4b26>] ? ret_from_fork+0x6/0x20
 [<c0525714>] ? do_one_initcall+0x125/0x125
 [<c052575c>] ? kernel_init+0x48/0x180
 [<c0525714>] ? do_one_initcall+0x125/0x125
 [<c04d5156>] ? kernel_thread_helper+0x6/0xd
---[ end trace 4eaa2a86a8e2da22 ]---
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel QEMU Virtual CPU version 1.0,1 stepping 03

This could be unrelated to hardened -- didn't check on gentoo-sources yet.
Comment 24 Maxim Kammerer 2012-08-02 16:34:06 UTC
(In reply to comment #23)
> This could be unrelated to hardened -- didn't check on gentoo-sources yet.

Just observed the same warning in gentoo-sources-3.4.7, so probably not related. Is it enough testing to close the bug?

By the way, backporting the patches to the stable hardened-sources-3.4.2 is not possible?
Comment 25 Anthony Basile gentoo-dev 2012-08-02 18:03:29 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > This could be unrelated to hardened -- didn't check on gentoo-sources yet.
> 
> Just observed the same warning in gentoo-sources-3.4.7, so probably not
> related. Is it enough testing to close the bug?
> 
> By the way, backporting the patches to the stable hardened-sources-3.4.2 is
> not possible?

Not unless upstream is willing and I doubt that.
Comment 26 Anthony Basile gentoo-dev 2012-08-02 18:41:50 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > This could be unrelated to hardened -- didn't check on gentoo-sources yet.
> 
> Just observed the same warning in gentoo-sources-3.4.7, so probably not
> related. Is it enough testing to close the bug?
> 
> By the way, backporting the patches to the stable hardened-sources-3.4.2 is
> not possible?

I'll open a bug for this against gentoo-sources-3.4.7 so we don't lose track of this issue.  But the original bug in the title is resolved.
Comment 27 Anthony Basile gentoo-dev 2012-08-02 19:08:12 UTC
(In reply to comment #26)
> (In reply to comment #24)
> > (In reply to comment #23)
> > > This could be unrelated to hardened -- didn't check on gentoo-sources yet.
> > 
> > Just observed the same warning in gentoo-sources-3.4.7, so probably not
> > related. Is it enough testing to close the bug?
> > 
> > By the way, backporting the patches to the stable hardened-sources-3.4.2 is
> > not possible?
> 
> I'll open a bug for this against gentoo-sources-3.4.7 so we don't lose
> track of this issue.  But the original bug in the title is resolved.

I've opened bug #429562 to track the oops in comment 23.