When debugging any program (tried ldapvi and /bin/ls) using gdb, I get this:

[ 260.840914] kernel BUG at mm/mmap.c:2207!
[ 260.840943] invalid opcode: 0000 [#3]
[ 260.840972] SMP
[ 260.841045] Modules linked in:
[ 260.841098] CPU: 1
[ 260.841099] EIP: 0060:[<00052b97>] Not tainted VLI
[ 260.841100] EFLAGS: 00010202 (2.6.19-hardened-r6-possum #6)
[ 260.841187] EIP is at exit_mmap+0x102/0x112
[ 260.841216] eax: 00000000 ebx: c180c740 ecx: dfeedd40 edx: c16ff700
[ 260.841246] esi: 00000000 edi: c1aba040 ebp: 00000001 esp: f7277f0c
[ 260.841277] ds: 0068 es: 0068 ss: 0068
[ 260.841306] Process ls (pid: 6480, ti=f7276000 task=dffc0a90 task.ti=f7276000)
[ 260.841336] Stack: f7fb8ac4 0000008e 00000000 00000000 f7277f28 00000000 c180c740 00000094
[ 260.841580] c1aba040 c1aba088 00000000 00020245 c1aba040 00000000 dffc0a90 00025097
[ 260.841822] c1aba040 00000000 c06bc31f 00000004 00000000 00000000 00000000 f7276000
[ 260.842065] Call Trace:
[ 260.842119] =======================
[ 260.842148] Code: 44 24 04 8d 43 10 89 04 24 e8 af 36 00 00 c7 43 04 00 00 00 00 85 f6 74 0c 89 34 24 e8 23 e3 ff ff 89 c6 eb f0 83 7f 7c 00 74 09 <0f> 0b ea 7f f0 6b c0 9f 08 83 c4 20 5b 5e 5f c3 56 53 83 ec 20
[ 260.844348] EIP: [<00052b97>] exit_mmap+0x102/0x112 SS:ESP 0068:f7277f0c
[ 260.844426] <1>Fixing recursive fault but reboot is needed!

Line 2207 of mmap.c is:

BUG_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);

This is reproducible on two different machines, one with an AMD Opteron and the other an Intel Core 2 Duo. The oops is not reproducible on vanilla-sources-2.6.19.5. Alexander Gabert <pappy@g.o> suspects it being a bug in PaX. If you need further tests, I can conduct them.

Relevant info from emerge --info:

Portage 2.1.2-r9 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r5, 2.6.19-hardened-r6-possum i686)
sys-devel/binutils: 2.16.1-r3
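To make the quoted BUG_ON condition easier to read, here is a minimal sketch that evaluates its right-hand side in plain C. The function names are hypothetical, and the input values are assumptions rather than anything stated in this report: FIRST_USER_ADDRESS taken as 0 and PMD_SHIFT as 22 (non-PAE) or 21 (PAE) on i386.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Right-hand side of the check quoted from mm/mmap.c:2207:
 *   (FIRST_USER_ADDRESS + PMD_SIZE - 1) >> PMD_SHIFT
 * with PMD_SIZE taken as 1UL << PMD_SHIFT. Both inputs are parameters
 * because their real values are assumptions, not taken from the report.
 */
static unsigned long nr_ptes_limit(unsigned long first_user_address,
                                   unsigned int pmd_shift)
{
    assert(pmd_shift < 32); /* keep the shift well-defined on 32-bit longs */
    unsigned long pmd_size = 1UL << pmd_shift;
    return (first_user_address + pmd_size - 1) >> pmd_shift;
}

/* True when the BUG_ON condition holds, i.e. the check would fire. */
static bool exit_mmap_would_bug(unsigned long nr_ptes,
                                unsigned long first_user_address,
                                unsigned int pmd_shift)
{
    return nr_ptes > nr_ptes_limit(first_user_address, pmd_shift);
}
```

Under those assumed values the limit works out to 0, so the check would fire whenever any page-table page is still accounted in nr_ptes when exit_mmap() finishes tearing down the address space.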
Forgot to mention this, but I'm using sys-devel/gdb-6.6.
pipacs, I guess I/we need your help on this, as it's waaay out of my scope.
(In reply to comment #2)
> pipacs, I guess I/we need your help on this, as it's waaay out of my scope.

i'll need the relevant bits from .config and also a confirmation with the latest 2.6.20-pax (standalone) patch.
Created attachment 112329
Kernel configuration provoking link error

(In reply to comment #3)
> i'll need the relevant bits from .config and also a confirmation with the
> latest 2.6.20-pax (standalone) patch.

I wanted to try with pax-linux-2.6.20.1-test5.patch applied to linux-2.6.20.tar.bz2. Unfortunately, it already fails at the linking stage. Configuration is attached.

arch/i386/mm/built-in.o: In function `free_initmem':
: undefined reference to `__init_end'
arch/i386/mm/built-in.o: In function `mem_init':
: undefined reference to `__init_end'
arch/i386/mm/built-in.o: In function `mem_init':
: undefined reference to `__init_end'
fs/built-in.o: In function `load_elf_binary':
binfmt_elf.c:(.text+0x25f64): undefined reference to `pax_set_initial_flags'

Before I try looking into the code, do you already have a newer patchset available?
(In reply to comment #4)
> arch/i386/mm/built-in.o: In function `free_initmem':
> : undefined reference to `__init_end'
> arch/i386/mm/built-in.o: In function `mem_init':
> : undefined reference to `__init_end'
> arch/i386/mm/built-in.o: In function `mem_init':
> : undefined reference to `__init_end'

fixed in test6 hopefully, give it a try.

> fs/built-in.o: In function `load_elf_binary':
> binfmt_elf.c:(.text+0x25f64): undefined reference to `pax_set_initial_flags'

that's because you set CONFIG_PAX_HAVE_ACL_FLAGS=y whereas PaX itself doesn't actually provide such a hook. i guess it's a leftover from a previous grsec config, set it to 'none' under plain PaX.
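The link failure described above follows a common pattern: a config switch enables a call to a function that some other component is expected to define. The sketch below only illustrates that pattern. DEMO_HAVE_ACL_FLAGS, demo_set_initial_flags and demo_initial_flags are hypothetical stand-ins, not the actual PaX code.

```c
#include <stddef.h>

/* Hypothetical hook that only an external ACL system would define. */
#ifdef DEMO_HAVE_ACL_FLAGS
extern void demo_set_initial_flags(unsigned long *flags);
#endif

/* Mirrors the shape of a loader that calls the hook only when configured. */
static unsigned long demo_initial_flags(unsigned long defaults)
{
    unsigned long flags = defaults;
#ifdef DEMO_HAVE_ACL_FLAGS
    /*
     * This compiles, because the declaration above is visible, but the
     * final link fails with "undefined reference" unless some object
     * file actually defines demo_set_initial_flags().
     */
    demo_set_initial_flags(&flags);
#endif
    return flags;
}
```

With the switch left undefined the call disappears and the defaults are returned unchanged, which corresponds to setting the option to 'none' as suggested.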
(In reply to comment #5)
> fixed in test6 hopefully, give it a try.

Yes, it links fine now. Unfortunately, it now panics really early in the boot process, after less than 0.002 secs according to printk's output. netconsole isn't enabled at that time and, although I haven't checked, I suppose a serial console wouldn't help either. So here's a partial backtrace, typed by hand:

[cut off because only 80x24 display]
deactivate_task
__sched_text_start
do_fork
wait_for_completion
default_wake_function
default_wake_function
keventd_create_kthread
kthread_create
migration_thread
keventd_create_kthread
migration_call
migration_thread
ret_from_fork
migration_init
do_pre_smp_initcalls
init
kernel_thread_helper
EIP: dequeue_task
Kernel panic: Attempted to kill the idle task!

Is this of any help for you? Anything more I could provide? Configuration is the same except for CONFIG_PAX_HAVE_ACL_FLAGS.

> whereas PaX itself doesn't actually provide such a hook. i guess it's a
> leftover from a previous grsec config, set it to 'none' under plain PaX.

Yes, it was a PaX+Grsecurity config before. Normally I'm using Gentoo's hardened-sources which include both.
More info (thanks to vga=ask :-)):

general protection fault: 0000 [#1]
Process swapper (pid: 0, …)

> [cut off because only 80x24 display]
> deactivate_task
> __sched_text_start
> …
(In reply to comment #6)
> EIP: dequeue_task
> Kernel panic: Attempted to kill the idle task!
>
> Is this of any help for you? Anything more I could provide?

interesting place to crash, i'd need all the info you can get, best is to boot with vga=ext (or some other mode with 50+ lines) and take a picture.
Created attachment 112857
Photo of panic message

Here's the image with the full backtrace. I might not have access to bugs until Wednesday, but will respond afterwards.
(In reply to comment #9)
> Created an attachment (id=112857)
> Photo of panic message
>
> Here's the image with the full backtrace.

thanks, it's more helpful but unfortunately i still have no clue what could go so wrong as the oops indicates. could you get a serial console and also capture the very first oops (hoping that it'd actually make more sense than the last one reported)?
Created attachment 113396
Full boot log with backtrace

(In reply to comment #10)
> could you get a serial console and also capture the very first oops?

You're lucky, serial console worked. Hopefully this is more useful.
(In reply to comment #11)
> (In reply to comment #10)
> > could you get a serial console and also capture the very first oops?
>
> You're lucky, serial console worked. Hopefully this is more useful.

thanks, it helped in that it showed that it wasn't an oops per se, but an earlier problem (scheduling from the idle thread). now why that happens is a good question, my guess is that it's something else failing elsewhere and this is just a side-effect... one more thing you could try that would clean up the backtrace a bit is to enable KERNEXEC (and try the latest 2.6.20.3 while you're at it).
Created attachment 113484
2.6.20.3 without CONFIG_KERNEXEC

(In reply to comment #12)
> one more thing you could try that would clean up the backtrace a bit
> is to enable KERNEXEC (and try the latest 2.6.20.3 while you're
> at it).

Here's a log of 2.6.20.3-pax-test8 without KERNEXEC.
Created attachment 113486
2.6.20.3 with CONFIG_KERNEXEC

And here's one with KERNEXEC.
thanks for your help and after lots of debugging under qemu, i finally fixed it, can you give test9 a try?
Sorry for the late answer, I was busy with work.

(In reply to comment #15)
> thanks for your help and after lots of debugging under qemu, i finally fixed
> it, can you give test9 a try?

Now it fails here:

In file included from arch/i386/kernel/ldt.c:21:
include/asm/mmu_context.h: In function `switch_mm':
include/asm/mmu_context.h:58: error: structure has no member named `user_cs_base'
include/asm/mmu_context.h:58: error: structure has no member named `user_cs_limit'
include/asm/mmu_context.h:76: error: structure has no member named `user_cs_base'
include/asm/mmu_context.h:76: error: structure has no member named `user_cs_limit'

The reason is that I have neither CONFIG_PAX_PAGEEXEC nor CONFIG_PAX_SEGMEXEC defined. It compiled after enabling both. However, this should be fixed either in the code or in Kconfig.

Linux 2.6.20.3 with test9 boots and gdb doesn't cause it to crash.
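The compile error above is what one would expect when structure members exist only under certain config options while the code that reads them is compiled unconditionally. Here is a minimal sketch of that pattern and of a code-side fix. DEMO_PAGEEXEC, DEMO_SEGMEXEC, struct demo_context and demo_cs_changed are hypothetical names, and how PaX actually declares these fields is an assumption.

```c
/* Hypothetical switches standing in for the two PaX config options. */
/* #define DEMO_PAGEEXEC 1 */
/* #define DEMO_SEGMEXEC 1 */

struct demo_context {
    unsigned long ldt_size;
#if defined(DEMO_PAGEEXEC) || defined(DEMO_SEGMEXEC)
    /* These members exist only when one of the options is enabled. */
    unsigned long user_cs_base;
    unsigned long user_cs_limit;
#endif
};

/* Returns 1 if the code segment bounds differ between two contexts. */
static int demo_cs_changed(const struct demo_context *prev,
                           const struct demo_context *next)
{
#if defined(DEMO_PAGEEXEC) || defined(DEMO_SEGMEXEC)
    return prev->user_cs_base != next->user_cs_base ||
           prev->user_cs_limit != next->user_cs_limit;
#else
    /* Without this guard, the member accesses above would not compile. */
    (void)prev;
    (void)next;
    return 0;
#endif
}
```

Guarding the access with the same condition as the declaration is the code-side fix. The Kconfig-side alternative would be to make the two definitions impossible to disable independently of the code that uses them.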
(In reply to comment #16)
> Now it fails here:
> In file included from arch/i386/kernel/ldt.c:21:
> include/asm/mmu_context.h: In function `switch_mm':
> include/asm/mmu_context.h:58: error: structure has no member named
> `user_cs_base'

should be fixed in test10 (also updated for .20.4).

> Linux 2.6.20.3 with test9 boots and gdb doesn't cause it to crash.

ok, that's good news, but you should verify it with your previous config (w/o PAGEEXEC/SEGMEXEC) as well.
(In reply to comment #17)
> should be fixed in test10 (also updated for .20.4).
> ok, that's good news, but you should verify it with your previous config (w/o
> PAGEEXEC/SEGMEXEC) as well.

Everything works again. Thanks a lot!