There appears to be a problem with the hardened-sources patchset for kernel 3.4. The issue is present in both the stable (3.4.2) and the latest unstable (3.4.6-r1) hardened-sources with all hardened features turned off, and is *not* present in the corresponding gentoo-sources.

The symptom is that the kernel hangs or reboots immediately after:

Decompressing Linux... Parsing ELF... done.
Booting the kernel.

(even with "earlyprintk", "loglevel=7", "debug", etc.)

The problem is robust against changes to kernel settings, except for those below. I have managed to narrow down the kernel configuration to the following key features, where changing any one of them makes the problem disappear:

1. CONFIG_SMP=y (to change: disable)
2. CONFIG_RELOCATABLE=y (to change: disable)
3. CONFIG_PHYSICAL_ALIGN=0x400000 (to change: set to 0x1000000)
4. CONFIG_PHYSICAL_START=0x1000000 (to change: set to 0x400000)

Note that simply adding "nosmp", "nolapic", etc. to the kernel command line has no effect on the problem. Also, changing PHYSICAL_START to the same value as PHYSICAL_ALIGN, for example, is not a good workaround: I see much more serious problems with UEFI boot, where this doesn't help (and where I haven't yet narrowed down the kernel config), and I suspect that the problems are related (or even caused by the same bug).

The kernel is started as:

qemu-kvm -nodefaults -sdl -monitor vc -m 512M -vga cirrus -cdrom cdrom.iso -serial file:serial.log

and cdrom.iso boots the kernel using ISOLINUX, with e.g., "earlyprintk=serial,keep".

Minimal kernel configurations and "good" output from a gentoo-sources kernel are attached below.
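To check quickly whether a given .config falls into the combination reported above, something like the following could be used. This is only a sketch: the function names `parse_config` and `matches_trigger` are made up for illustration, and the four option values are simply the ones listed in the report.

```python
import re

# The four settings reported above as jointly triggering the early hang.
TRIGGER = {
    "CONFIG_SMP": "y",
    "CONFIG_RELOCATABLE": "y",
    "CONFIG_PHYSICAL_ALIGN": "0x400000",
    "CONFIG_PHYSICAL_START": "0x1000000",
}


def parse_config(text):
    """Parse kernel .config text into a dict.

    Lines such as '# CONFIG_SMP is not set' do not match and are
    therefore simply absent from the result.
    """
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2).strip('"')
    return options


def matches_trigger(text):
    """Return True if all four reported settings are present with the reported values."""
    options = parse_config(text)
    return all(options.get(key, "").lower() == value for key, value in TRIGGER.items())
```

Typical use would be `matches_trigger(open(".config").read())` in the kernel source tree. A True result only means the configuration matches the report, not that the kernel will necessarily fail to boot.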
Created attachment 319658 [details] minimal hardened-sources-3.4.2 configuration This is a minimal configuration for hardened-sources-3.4.2 that results in the problem described (immediate reboot or hang).
Created attachment 319660 [details] minimal gentoo-sources-3.4.2 configuration This is essentially the same configuration for gentoo-sources-3.4.2. It is easy to get it by running "make oldconfig" in gentoo-sources tree for the previous attachment.
Created attachment 319662 [details] output from gentoo-sources-3.4.2 This is the "good" output from running the gentoo-sources-3.4.2 kernel. As expected, the kernel reaches the point where it tries to run init, and panics. Same output can be achieved by changing one of the settings above for a hardened-sources-3.4* kernel.
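When many configurations are tested in a row, the serial log can be classified automatically. The sketch below assumes that the log was captured with "-serial file:serial.log" and "earlyprintk=serial,keep", that a "good" run ends in a kernel panic (as it does here, since no init is available), and that a "bad" run stops right after "Booting the kernel.". The function name and the returned labels are hypothetical.

```python
def classify_serial_log(text):
    """Classify a captured serial log as 'booted-to-panic', 'early-hang' or 'unknown'."""
    # Assumption: a kernel that got far enough to panic counts as "good" here,
    # because the minimal configuration has no init to run.
    if "Kernel panic - not syncing" in text:
        return "booted-to-panic"
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # The reported symptom: nothing at all is printed after the decompressor hands over.
    if lines and lines[-1].endswith("Booting the kernel."):
        return "early-hang"
    return "unknown"
```

An empty or truncated log is reported as 'unknown' rather than guessed at.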
The toolchain is the latest one from the hardened profile:

sys-devel/gcc-4.5.3-r2 was built with the following:
USE="cxx hardened nls nptl openmp (-altivec) -bootstrap -build -doc (-fixed-point) -fortran -gcj -graphite -gtk (-libssp) -lto -mudflap (-multilib) -multislot -nocxx -nopie -nossp -objc -objc++ -objc-gc -test -vanilla"

sys-devel/binutils-2.21.1-r1 was built with the following:
USE="cxx nls zlib -multislot -multitarget -static-libs -test -vanilla"

CFLAGS="-O2 -march=pentium3 -mtune=core2 -pipe"
CXXFLAGS="-O2 -march=pentium3 -mtune=core2 -pipe"

The problem appears in VMware as well (I didn't try other environments).
An easier way to run the kernel (without an ISO):

qemu-system-x86_64 -cpu kvm64 -nodefaults -sdl -monitor vc -m 512M -vga cirrus -kernel .../bzImage -serial file:serial.log -append "earlyprintk=serial,keep debug"

I am using (on amd64):

app-emulation/qemu-kvm-1.0.1 was built with the following:
USE="aio alsa bluetooth curl (multilib) ncurses opengl sdl spice threads vhost-net xattr -brltty -debug -fdt -pulseaudio -qemu-ifup (-rbd) -sasl -smartcard -static -test -tls -usbredir -vde -xen"
QEMU_SOFTMMU_TARGETS="x86_64 (-arm) -cris -i386 (-m68k) -microblaze (-mips) -mips64 -mips64el -mipsel (-ppc) (-ppc64) -ppcemb -sh4 -sh4eb (-sparc) -sparc64"
QEMU_USER_TARGETS="(-alpha) (-arm) -armeb -cris -i386 (-m68k) -microblaze (-mips) -mipsel (-ppc) (-ppc64) -ppc64abi32 -sh4 -sh4eb (-sparc) -sparc32plus -sparc64 -x86_64"
(In reply to comment #2)
> Created attachment 319660 [details]
> minimal gentoo-sources-3.4.2 configuration
>
> This is essentially the same configuration for gentoo-sources-3.4.2. It is
> easy to get it by running "make oldconfig" in gentoo-sources tree for the
> previous attachment.

I'm testing with the very latest patches from upstream, not 3.4.2, because upstream will ask for that right away. When I compile with gcc-4.5.4 it compiles fine, but when I compile with gcc-4.6.3, it fails with:

(cat /dev/null; ) > arch/x86/vdso/modules.order
ld -m elf_i386 -r -o arch/x86/built-in.o arch/x86/kernel/built-in.o arch/x86/mm/built-in.o arch/x86/crypto/built-in.o arch/x86/vdso/built-in.o arch/x86/platform/built-in.o arch/x86/net/built-in.o
(cat /dev/null; cat arch/x86/kernel/modules.order; cat arch/x86/mm/modules.order; cat arch/x86/crypto/modules.order; cat arch/x86/vdso/modules.order; cat arch/x86/platform/modules.order; cat arch/x86/net/modules.order;) > arch/x86/modules.order
make -f scripts/Makefile.build obj=kernel
gcc -Wp,-MD,kernel/.time.o.d -nostdinc -isystem /usr/lib/gcc/i686-pc-linux-gnu/4.6.3/include -I/usr/src/linux-3.4.6-hardened-r2/arch/x86/include -Iarch/x86/include/generated -Iinclude -include /usr/src/linux-3.4.6-hardened-r2/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=pentium3 -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO 
-fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/constify_plugin.so -DCONSTIFY_PLUGIN -fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/colorize_plugin.so -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(time)" -D"KBUILD_MODNAME=KBUILD_STR(time)" -c -o kernel/time.o kernel/time.c gcc -Wp,-MD,kernel/.capability.o.d -nostdinc -isystem /usr/lib/gcc/i686-pc-linux-gnu/4.6.3/include -I/usr/src/linux-3.4.6-hardened-r2/arch/x86/include -Iarch/x86/include/generated -Iinclude -include /usr/src/linux-3.4.6-hardened-r2/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m32 -msoft-float -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 -mtune=pentium3 -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/constify_plugin.so -DCONSTIFY_PLUGIN -fplugin=/usr/src/linux-3.4.6-hardened-r2/tools/gcc/colorize_plugin.so -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(capability)" -D"KBUILD_MODNAME=KBUILD_STR(capability)" -c -o kernel/capability.o kernel/capability.c In file included from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/uaccess.h:636:0, from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/sections.h:5, from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/hw_irq.h:26, from include/linux/irq.h:369, from /usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/hardirq.h:5, from include/linux/hardirq.h:7, from include/linux/ftrace_event.h:7, from 
include/trace/syscall.h:6,
from include/linux/syscalls.h:78,
from kernel/capability.c:15:
In function 'copy_to_user',
inlined from 'sys_capget' at kernel/capability.c:208:19:
/usr/src/linux-3.4.6-hardened-r2/arch/x86/include/asm/uaccess_32.h:244:24: error: call to 'copy_to_user_overflow' declared with attribute error: copy_to_user() buffer size is not provably correct
make[1]: *** [kernel/capability.o] Error 1
make: *** [kernel] Error 2
Okay I confirmed this using gcc-4.5.4 and binutils-2.21.1-r1. I'll attach the bzImage, System.map and vmlinux files in my next posts.
Created attachment 319664 [details] Failing bzImage for config in comment 1
Created attachment 319666 [details] Failing System.map for config in comment 1
Created attachment 319668 [details] Failing vmlinux image for config in comment 1, lzma compressed
(In reply to comment #7)
> Okay I confirmed this using gcc-4.5.4 and binutils-2.21.1-r1.

Great! I will gladly test any patches and see whether they fix the issues in a more complex UEFI+OVMF setup, where the output is as follows (in case it is related and hints at the culprit):

Checking if this processor honours the WP bit even in supervisor mode...Ok.
SLUB: Genslabs=15, HWalign=128, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
BUG: unable to handle kernel NULL pointer dereference at 0000001c
IP: [<c144477d>] set_task_rq+0xd/0x4e
*pdpt = 0000000000000000 *pde = 00000b0548c68948
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c1421125>] no_context+0x15f/0x1c5
*pdpt = 0000000000000000 *pde = 00000b0548c68948
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c1421125>] no_context+0x15f/0x1c5
... (many repetitions)
hmm, very interesting, i see what's happening (some statically initialized pcpu data don't get relocated) but i don't see how that would happen yet (i've been testing relocatable kernels myself for many years now, with the same align/start relationship to force an actual relocation on boot). one question: does enabling various PaX features (esp. KERNEXEC) change the situation?
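The remark about the align/start relationship forcing an actual relocation can be checked with a little arithmetic. The sketch below rests on two assumptions that are not stated in this thread: that the protected-mode part of a bzImage is loaded at 0x100000, and that a relocatable 32-bit kernel ends up running at that load address rounded up to PHYSICAL_ALIGN, while it is linked for PHYSICAL_START rounded up to PHYSICAL_ALIGN (modeled on the kernel's LOAD_PHYSICAL_ADDR macro). The function names are made up for illustration.

```python
# Assumption: usual load address of the protected-mode bzImage payload.
BZIMAGE_LOAD_ADDR = 0x100000


def align_up(addr, align):
    """Round addr up to the next multiple of align (align must be a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def link_address(physical_start, physical_align):
    """Physical address the kernel is linked for (PHYSICAL_START rounded up to PHYSICAL_ALIGN)."""
    return align_up(physical_start, physical_align)


def run_address(physical_align, load_addr=BZIMAGE_LOAD_ADDR):
    """Assumed run address of a relocatable kernel: load address rounded up to PHYSICAL_ALIGN."""
    return align_up(load_addr, physical_align)


def relocation_delta(physical_start, physical_align, load_addr=BZIMAGE_LOAD_ADDR):
    """Non-zero result means the kernel really has to be relocated at boot."""
    return run_address(physical_align, load_addr) - link_address(physical_start, physical_align)
```

Under these assumptions, the failing configuration (PHYSICAL_START=0x1000000, PHYSICAL_ALIGN=0x400000) gives a delta of -0xC00000, while both workarounds from the original report (ALIGN=0x1000000, or START=0x400000) give a delta of 0, which would explain why they hide the problem instead of fixing it.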
(In reply to comment #12)
> one question: does enabling various PaX features (esp. KERNEXEC) change the
> situation?

Yes! This is indeed interesting. Enabling KERNEXEC in an otherwise fully blown (GRKERNSEC, PAX) x86 3.4.6-hardened-r1 kernel with all the features mentioned in comment #1 (the configuration I started with before narrowing it down) makes the problem (hang after "Booting the kernel.") disappear for normal (BIOS) boot.

For UEFI (QEMU OVMF), however, instead of the behavior described in comment #11, I now observe something weirder: GRUB2's x86_64-efi image now fails to load the kernel with: "couldn't find suitable memory target" -- a message in grub-core/lib/relocator.c.
(In reply to comment #13) And exactly the same results with a minimal kernel with KERNEXEC enabled, configuration to be attached.
Created attachment 319728 [details] minimal hardened-sources-3.4.6-r1 configuration, with KERNEXEC

This x86 hardened-sources-3.4.6-r1 configuration has all the features from comment #1, but PAX_KERNEXEC is enabled. Note that PAX_PER_CPU_PGD is not enabled, since PAE is disabled in this x86 kernel. BIOS boot proceeds fine, but GRUB-EFI is unable to load the kernel, as described in the previous comment.

By the way, one of the enabled options should select PROC_FS, because otherwise:

grsecurity/gracl.c: In function ‘gr_handle_proc_create’:
grsecurity/gracl.c:2838:73: error: ‘struct pid_namespace’ has no member named ‘proc_mnt’
(In reply to comment #15)
> Note that PAX_PER_CPU_PGD is not enabled, since PAE is disabled in this x86 kernel.

Enabling X86_PAE (and consequently PAX_PER_CPU_PGD) results in an immediate reboot after:

initial memory mapped : 0 - 00e00000
Base memory trampoline at [c009e000] 9e000 size 4096
init_memory_mapping: 0000000000000000-000000000fffd000
0000000000 - 0000200000 page 4k
0000200000 - 000fe00000 page 2M
000fe00000 - 000fffd000 page 4k
kernel direct mapping tables up to fffd000 @ dfb000-e00000

when booting via

qemu-system-x86_64 -cpu kvm64 -nodefaults -sdl -monitor vc -m 256M -vga cirrus -kernel .../bzImage -serial file:serial.log -append "earlyprintk=serial,keep debug"

Results for OVMF boot via GRUB-UEFI are the same as previously ("couldn't find suitable memory target").
ok, i figured it out, it seems that my percpu data approach never worked with relocatable kernels as the needed relocation handling code was under KERNEXEC. will be fixed in the next patch, thanks for your help guys!
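To illustrate the general failure mode described here (absolute addresses that are left untouched when the kernel is moved), the following toy model may help. It is not the PaX code and says nothing about how the per-cpu relocation is actually implemented. Memory is a dict of 32-bit slots, the relocation table is a list of slot indices, and the `skip` parameter is a hypothetical stand-in for "relocation handling that is compiled out".

```python
def apply_relocations(memory, reloc_table, delta, skip=frozenset()):
    """Add delta to every absolute address stored in the slots listed in reloc_table.

    Slots listed in `skip` model entries whose handling code is missing:
    they keep their link-time value and become stale after the move.
    """
    relocated = dict(memory)
    for slot in reloc_table:
        if slot in skip:
            continue
        relocated[slot] = (relocated[slot] + delta) & 0xFFFFFFFF
    return relocated


# Toy image: slot 0 is an ordinary pointer, slot 4 a pointer in statically
# initialized per-cpu data. Both hold link-time virtual addresses.
image = {0: 0xC1000010, 4: 0xC1000020}
delta = -0xC00000  # kernel moved down by 12 MiB

fixed = apply_relocations(image, [0, 4], delta)
broken = apply_relocations(image, [0, 4], delta, skip={4})
```

In `fixed`, both slots point into the relocated image. In `broken`, slot 4 still holds 0xC1000020, an address that no longer refers to the intended object, which is the kind of stale pointer that would make an SMP kernel crash very early.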
as for the UEFI only problem, can you move that to a new bug please?
(In reply to comment #18)
> as for the UEFI only problem, can you move that to a new bug please?

Sure, but are you sure it's an unrelated problem? Should I wait for a new patchset first?
(In reply to comment #19) > Sure, but are you sure it's an unrelated problem? Should I wait for a new > patchset first? you can try the next patch i'll release soon, but from my reading of the grub code, it's not finding a large enough GRUB_MEMORY_AVAILABLE memory chunk, so i don't see how that's related to this relocatable kernel bug per se.
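The kind of failure described here, no available memory chunk being large enough, can be pictured with a simple first-fit search over a memory map. This is a sketch only: it is not GRUB's relocator algorithm, and the memory map, the "available"/"reserved" labels and the function name are invented for illustration.

```python
def find_target(memory_map, size, align=0x1000):
    """Return the first suitably aligned address in an available region that fits `size`.

    memory_map is a list of (start, length, kind) tuples. Returns None when
    no available region can hold `size` bytes at the requested alignment.
    """
    for start, length, kind in memory_map:
        if kind != "available":
            continue
        aligned = (start + align - 1) & ~(align - 1)
        if aligned + size <= start + length:
            return aligned
    return None


# Hypothetical firmware memory map, fragmented by a reserved region.
example_map = [
    (0x000000, 0x09F000, "available"),
    (0x100000, 0x300000, "available"),
    (0x400000, 0x100000, "reserved"),
]
```

With this map, a 2 MiB request aligned to 1 MiB fits at 0x100000, while the same request aligned to 4 MiB, or an 8 MiB request, finds no target. A failure of this sort depends on the firmware's memory layout and the kernel's size and alignment requirements, not on whether the kernel's relocations are applied correctly.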
@blueness: will you release new hardened-patches? Not sure how to properly test the newly released grsecurity / PaX patches on grsecurity.net.
(In reply to comment #21) > @blueness: will you release new hardened-patches? Not sure how to properly > test the newly released grsecurity / PaX patches on grsecurity.net. Please test hardened-sources-3.4.7 which uses grsecurity-2.9.1-3.4.7-201208011850.
I have tested hardened-sources-3.4.7 both with a full-blown configuration and with the configuration in comment #1, and the problem seems to be gone. Moreover, the NULL deref issues on OVMF (comment #11) are also gone.

There is, however, an issue that appears in QEMU only sometimes, maybe 1 in 5 runs (tested with the configuration from comment #1):

Freeing SMP alternatives: 8k freed
Enabling APIC mode: Flat. Using 1 I/O APICs
------------[ cut here ]------------
WARNING: at arch/x86/kernel/apic/apic.c:1334 setup_local_APIC+0x2d2/0x3c7()
Pid: 1, comm: swapper/0 Not tainted 3.4.7-hardened #3
Call Trace:
[<c041db95>] ? warn_slowpath_common+0x65/0x90
[<c053ae40>] ? setup_local_APIC+0x2d2/0x3c7
[<c053ae40>] ? setup_local_APIC+0x2d2/0x3c7
[<c041dc79>] ? warn_slowpath_null+0x19/0x20
[<c053ae40>] ? setup_local_APIC+0x2d2/0x3c7
[<c052b762>] ? native_smp_prepare_cpus+0x2c2/0x373
[<c04d4b26>] ? ret_from_fork+0x6/0x20
[<c0525714>] ? do_one_initcall+0x125/0x125
[<c052575c>] ? kernel_init+0x48/0x180
[<c0525714>] ? do_one_initcall+0x125/0x125
[<c04d5156>] ? kernel_thread_helper+0x6/0xd
---[ end trace 4eaa2a86a8e2da22 ]---
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel QEMU Virtual CPU version 1.0,1 stepping 03

This could be unrelated to hardened -- I didn't check on gentoo-sources yet.
(In reply to comment #23)
> This could be unrelated to hardened -- didn't check on gentoo-sources yet.

I just observed the same warning in gentoo-sources-3.4.7, so it is probably not related. Is this enough testing to close the bug?

By the way, is backporting the patches to the stable hardened-sources-3.4.2 not possible?
(In reply to comment #24)
> (In reply to comment #23)
> > This could be unrelated to hardened -- didn't check on gentoo-sources yet.
>
> Just observed the same warning in gentoo-sources-3.4.7, so probably not
> related. Is it enough testing to close the bug?
>
> By the way, backporting the patches to the stable hardened-sources-3.4.2 is
> not possible?

Not unless upstream is willing, and I doubt that.
(In reply to comment #24)
> (In reply to comment #23)
> > This could be unrelated to hardened -- didn't check on gentoo-sources yet.
>
> Just observed the same warning in gentoo-sources-3.4.7, so probably not
> related. Is it enough testing to close the bug?
>
> By the way, backporting the patches to the stable hardened-sources-3.4.2 is
> not possible?

I'll open a bug for this against gentoo-sources-3.4.7 so we don't lose track of this issue. But the original bug in the title is resolved.
(In reply to comment #26)
> (In reply to comment #24)
> > (In reply to comment #23)
> > > This could be unrelated to hardened -- didn't check on gentoo-sources yet.
> >
> > Just observed the same warning in gentoo-sources-3.4.7, so probably not
> > related. Is it enough testing to close the bug?
> >
> > By the way, backporting the patches to the stable hardened-sources-3.4.2 is
> > not possible?
>
> I'll open a bug for this against gentoo-sources-3.4.7 so we don't lose
> track of this issue. But the original bug in the title is resolved.

I've opened bug #429562 to track the oops in comment 23.