Bug 169121 - gdb causes "kernel BUG at mm/mmap.c:2207"
Summary: gdb causes "kernel BUG at mm/mmap.c:2207"
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages
Hardware: x86 Linux
Importance: High critical
Assignee: The Gentoo Linux Hardened Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-03 13:48 UTC by Michael Hanselmann (hansmi) (RETIRED)
Modified: 2007-03-25 11:59 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
Kernel configuration provoking link error (dot-config-2.6.20-paxtest,36.28 KB, text/plain)
2007-03-06 21:13 UTC, Michael Hanselmann (hansmi) (RETIRED)
Photo of panic message (panic.jpg,120.09 KB, image/jpeg)
2007-03-10 22:22 UTC, Michael Hanselmann (hansmi) (RETIRED)
Full boot log with backtrace (panic,11.82 KB, text/plain)
2007-03-15 20:29 UTC, Michael Hanselmann (hansmi) (RETIRED)
2.6.20.3 without CONFIG_KERNEXEC (panic,9.67 KB, text/plain)
2007-03-16 17:01 UTC, Michael Hanselmann (hansmi) (RETIRED)
2.6.20.3 with CONFIG_KERNEXEC (panic2,7.81 KB, text/plain)
2007-03-16 17:02 UTC, Michael Hanselmann (hansmi) (RETIRED)

Description Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-03 13:48:41 UTC
When debugging any program (tried ldapvi and /bin/ls) using gdb, I get this:

[  260.840914] kernel BUG at mm/mmap.c:2207!
[  260.840943] invalid opcode: 0000 [#3]
[  260.840972] SMP 
[  260.841045] Modules linked in:
[  260.841098] CPU:    1
[  260.841099] EIP:    0060:[<00052b97>]    Not tainted VLI
[  260.841100] EFLAGS: 00010202   (2.6.19-hardened-r6-possum #6)
[  260.841187] EIP is at exit_mmap+0x102/0x112
[  260.841216] eax: 00000000   ebx: c180c740   ecx: dfeedd40   edx: c16ff700
[  260.841246] esi: 00000000   edi: c1aba040   ebp: 00000001   esp: f7277f0c
[  260.841277] ds: 0068   es: 0068   ss: 0068
[  260.841306] Process ls (pid: 6480, ti=f7276000 task=dffc0a90 task.ti=f7276000)
[  260.841336] Stack: f7fb8ac4 0000008e 00000000 00000000 f7277f28 00000000 c180c740 00000094 
[  260.841580]        c1aba040 c1aba088 00000000 00020245 c1aba040 00000000 dffc0a90 00025097 
[  260.841822]        c1aba040 00000000 c06bc31f 00000004 00000000 00000000 00000000 f7276000 
[  260.842065] Call Trace:
[  260.842119]  =======================
[  260.842148] Code: 44 24 04 8d 43 10 89 04 24 e8 af 36 00 00 c7 43 04 00 00 00 00 85 f6 74 0c 89 34 24 e8 23 e3 ff ff 89 c6 eb f0 83 7f 7c 00 74 09 <0f> 0b ea 7f f0 6b c0 9f 08 83 c4 20 5b 5e 5f c3 56 53 83 ec 20 
[  260.844348] EIP: [<00052b97>] exit_mmap+0x102/0x112 SS:ESP 0068:f7277f0c
[  260.844426]  <1>Fixing recursive fault but reboot is needed!

Line 2207 of mmap.c is:
    BUG_ON(mm->nr_ptes > (FIRST_USER_ADDRESS+PMD_SIZE-1)>>PMD_SHIFT);
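
The check quoted above can be read as "no page tables may be left once exit_mmap has torn down the address space". The following user-space sketch evaluates the threshold under stated assumptions: FIRST_USER_ADDRESS is taken to be 0 and PMD_SHIFT to be 22 (non-PAE) or 21 (PAE), which are the usual i386 values but are not stated in this report. The helper names max_leftover_ptes and would_bug are hypothetical and exist only for illustration; this is not kernel code.

```c
/* Assumed i386 value for illustration; not taken from this bug report. */
#define FIRST_USER_ADDRESS 0UL

/* Largest nr_ptes value the BUG_ON expression tolerates for a given
 * PMD_SHIFT, with PMD_SIZE defined as 1 << PMD_SHIFT. */
unsigned long max_leftover_ptes(unsigned int pmd_shift)
{
    unsigned long pmd_size = 1UL << pmd_shift;
    return (FIRST_USER_ADDRESS + pmd_size - 1) >> pmd_shift;
}

/* Mirrors the BUG_ON condition: returns nonzero when the check would fire. */
int would_bug(unsigned long nr_ptes, unsigned int pmd_shift)
{
    return nr_ptes > max_leftover_ptes(pmd_shift);
}
```

Under these assumptions the threshold is 0 for both shift values, so a single leftover page table (nr_ptes of 1) is enough to trigger the BUG.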


This is reproducible on two different machines, one with an AMD Opteron and the other with an Intel Core 2 Duo. The oops is not reproducible on vanilla-sources-2.6.19.5. Alexander Gabert <pappy@g.o> suspects a bug in PaX.

If you need further tests, I can conduct them.

Relevant info from emerge --info:
Portage 2.1.2-r9 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r5, 2.6.19-hardened-r6-possum i686)
sys-devel/binutils:  2.16.1-r3
Comment 1 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-03 14:03:52 UTC
Forgot to mention this, but I'm using sys-devel/gdb-6.6.
Comment 2 Christian Heim (RETIRED) gentoo-dev 2007-03-03 14:52:46 UTC
pipacs, I guess I/we need your help on this, as it's waaay out of my scope.
Comment 3 PaX Team 2007-03-06 14:40:53 UTC
(In reply to comment #2)
> pipacs, I guess I/we need your help on this, as it's waaay out of my scope.

i'll need the relevant bits from .config and also a confirmation with the latest 2.6.20-pax (standalone) patch.
Comment 4 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-06 21:13:52 UTC
Created attachment 112329 [details]
Kernel configuration provoking link error

(In reply to comment #3)
> i'll need the relevant bits from .config and also a confirmation with the
> latest 2.6.20-pax (standalone) patch.

I wanted to try with pax-linux-2.6.20.1-test5.patch applied to linux-2.6.20.tar.bz2. Unfortunately, it already fails at the link stage. The configuration is attached.

arch/i386/mm/built-in.o: In function `free_initmem':
: undefined reference to `__init_end'
arch/i386/mm/built-in.o: In function `mem_init':
: undefined reference to `__init_end'
arch/i386/mm/built-in.o: In function `mem_init':
: undefined reference to `__init_end'
fs/built-in.o: In function `load_elf_binary':
binfmt_elf.c:(.text+0x25f64): undefined reference to `pax_set_initial_flags'

Before I try looking into the code, do you already have a newer patchset available?
Comment 5 PaX Team 2007-03-06 23:28:01 UTC
(In reply to comment #4)
> arch/i386/mm/built-in.o: In function `free_initmem':
> : undefined reference to `__init_end'
> arch/i386/mm/built-in.o: In function `mem_init':
> : undefined reference to `__init_end'
> arch/i386/mm/built-in.o: In function `mem_init':
> : undefined reference to `__init_end'

fixed in test6 hopefully, give it a try.

> fs/built-in.o: In function `load_elf_binary':
> binfmt_elf.c:(.text+0x25f64): undefined reference to `pax_set_initial_flags'

that's because you set

    CONFIG_PAX_HAVE_ACL_FLAGS=y

whereas PaX itself doesn't actually provide such a hook. i guess it's a leftover from a previous grsec config, set it to 'none' under plain PaX.
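
The link failure explained above follows a common pattern: a configuration symbol compiles in a call to a hook that some other component is expected to define. A minimal C sketch of that pattern, assuming a hypothetical macro HAVE_ACL_FLAGS and hypothetical functions set_initial_flags_hook and initial_flags (stand-ins for illustration, not the actual PaX or grsecurity code):

```c
/* Hypothetical stand-in for the config symbol. It is left undefined here,
 * which corresponds to choosing 'none' in the kernel configuration. */
/* #define HAVE_ACL_FLAGS 1 */

#define DEFAULT_FLAGS 0x1UL /* arbitrary illustrative value */

#ifdef HAVE_ACL_FLAGS
/* Only declared here. Another component must supply the definition;
 * if none does, the linker reports an undefined reference. */
void set_initial_flags_hook(unsigned long *flags);
#endif

unsigned long initial_flags(void)
{
    unsigned long flags = DEFAULT_FLAGS;
#ifdef HAVE_ACL_FLAGS
    /* This call exists only when the symbol is set. */
    set_initial_flags_hook(&flags);
#endif
    return flags;
}
```

With the macro undefined, the code builds and initial_flags returns the default. Defining it without linking in a definition of the hook produces the same kind of "undefined reference" error as quoted.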
Comment 6 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-07 20:40:00 UTC
(In reply to comment #5)
> fixed in test6 hopefully, give it a try.

Yes, linked fine now.

Unfortunately, it now panics really early in the boot process, after less than 0.002 seconds according to printk's output. netconsole isn't enabled at that point and, although I haven't checked, I suppose a serial console wouldn't help either. So here's a partial backtrace, typed by hand:

[cut off because only 80x24 display]
deactivate_task
__sched_text_start
do_fork
wait_for_completion
default_wake_function
default_wake_function
keventd_create_kthread
kthread_create
migration_thread
keventd_create_kthread
migration_call
migration_thread
ret_from_fork
migration_init
do_pre_smp_initcalls
init
kernel_thread_helper
EIP: dequeue_task
Kernel panic: Attempted to kill the idle task!

Is this of any help for you? Anything more I could provide?

Configuration is the same except for CONFIG_PAX_HAVE_ACL_FLAGS.

> whereas PaX itself doesn't actually provide such a hook. i guess it's a
> leftover from a previous grsec config, set it to 'none' under plain PaX.

Yes, it was a PaX+Grsecurity config before. Normally I'm using Gentoo's hardened-sources which include both.
Comment 7 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-07 20:51:13 UTC
More info (thanks to vga=ask :-)):

general protection fault: 0000 [#1]
Process swapper (pid: 0, …)

> [cut off because only 80x24 display]
> deactivate_task
> __sched_text_start
> …
Comment 8 PaX Team 2007-03-09 07:20:38 UTC
(In reply to comment #6)
> EIP: dequeue_task
> Kernel panic: Attempted to kill the idle task!
> 
> Is this of any help for you? Anything more I could provide?

interesting place to crash, i'd need all the info you can get, best is to boot with vga=ext (or some other mode with 50+ lines) and take a picture.
Comment 9 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-10 22:22:22 UTC
Created attachment 112857 [details]
Photo of panic message

Here's the image with the full backtrace.

I might not have access to bugs until Wednesday, but will respond afterwards.
Comment 10 PaX Team 2007-03-11 18:25:31 UTC
(In reply to comment #9)
> Created an attachment (id=112857) [edit]
> Photo of panic message
> 
> Here's the image with the full backtrace.

thanks, it's more helpful but unfortunately i still have no clue what could go so wrong as the oops indicates. could you get a serial console and also capture the very first oops (hoping that it'd actually make more sense than the last one reported)?
Comment 11 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-15 20:29:23 UTC
Created attachment 113396 [details]
Full boot log with backtrace

(In reply to comment #10)
> could you get a serial console and also capture the very first oops?

You're lucky, serial console worked. Hopefully this is more useful.
Comment 12 PaX Team 2007-03-16 00:31:21 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > could you get a serial console and also capture the very first oops?
> 
> You're lucky, serial console worked. Hopefully this is more useful.

thanks, it helped in that it showed that it wasn't an oops per se, but an earlier problem (scheduling from the idle thread). now why that happens is a good question, my guess is that it's something else failing elsewhere and this is just a side-effect... one more thing you could try that would clean up the backtrace a bit is to enable KERNEXEC (and try the latest 2.6.20.3 while you're at it).
Comment 13 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-16 17:01:29 UTC
Created attachment 113484 [details]
2.6.20.3 without CONFIG_KERNEXEC

(In reply to comment #12)
> one more thing you could try that would clean up the backtrace a bit
> is to enable KERNEXEC (and try the latest 2.6.20.3 while you're
> at it).

Here's a log of 2.6.20.3-pax-test8 without KERNEXEC.
Comment 14 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-16 17:02:01 UTC
Created attachment 113486 [details]
2.6.20.3 with CONFIG_KERNEXEC

And here's one with KERNEXEC.
Comment 15 PaX Team 2007-03-19 01:36:49 UTC
thanks for your help and after lots of debugging under qemu, i finally fixed it, can you give test9 a try?
Comment 16 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-23 21:59:29 UTC
Sorry for the late answer, I was busy with work.

(In reply to comment #15)
> thanks for your help and after lots of debugging under qemu, i finally fixed
> it, can you give test9 a try?

Now it fails here:
In file included from arch/i386/kernel/ldt.c:21:
include/asm/mmu_context.h: In function `switch_mm':
include/asm/mmu_context.h:58: error: structure has no member named `user_cs_base'
include/asm/mmu_context.h:58: error: structure has no member named `user_cs_limit'
include/asm/mmu_context.h:76: error: structure has no member named `user_cs_base'
include/asm/mmu_context.h:76: error: structure has no member named `user_cs_limit'

The reason is that I don't have CONFIG_PAX_PAGEEXEC or CONFIG_PAX_SEGMEXEC defined. It compiled after enabling both. However, this should be fixed either in the code or in Kconfig. Linux 2.6.20.3 with test9 boots and gdb doesn't cause it to crash.
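
This class of compile error appears when a structure member exists only under certain options but is used unconditionally. One way to avoid it "in code" is to guard the use with the same condition as the declaration. The sketch below assumes hypothetical macros FEATURE_A and FEATURE_B (stand-ins for the two PaX options) and a hypothetical struct mm_context and accessor context_cs_base; it is not the actual PaX code and not necessarily how the problem was fixed upstream.

```c
/* Hypothetical stand-ins for the two options; both left undefined here. */
/* #define FEATURE_A 1 */
/* #define FEATURE_B 1 */

struct mm_context {
    int size;
#if defined(FEATURE_A) || defined(FEATURE_B)
    /* These members exist only when at least one feature is enabled. */
    unsigned long user_cs_base;
    unsigned long user_cs_limit;
#endif
};

unsigned long context_cs_base(const struct mm_context *ctx)
{
#if defined(FEATURE_A) || defined(FEATURE_B)
    return ctx->user_cs_base; /* guarded by the same condition as the member */
#else
    (void)ctx;                /* member absent: fall back to 0 */
    return 0;
#endif
}
```

Removing the guard inside context_cs_base while leaving both macros undefined yields a "structure has no member named" error like the ones above.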
Comment 17 PaX Team 2007-03-23 23:34:00 UTC
(In reply to comment #16)
> Now it fails here:
> In file included from arch/i386/kernel/ldt.c:21:
> include/asm/mmu_context.h: In function `switch_mm':
> include/asm/mmu_context.h:58: error: structure has no member named
> `user_cs_base'

should be fixed in test10 (also updated for .20.4).

> Linux 2.6.20.3 with test9 boots and gdb doesn't cause it to crash.

ok, that's good news, but you should verify it with your previous config (w/o PAGEEXEC/SEGMEXEC) as well.
Comment 18 Michael Hanselmann (hansmi) (RETIRED) gentoo-dev 2007-03-25 11:59:27 UTC
(In reply to comment #17)
> should be fixed in test10 (also updated for .20.4).

> ok, that's good news, but you should verify it with your previous config (w/o
> PAGEEXEC/SEGMEXEC) as well.

Everything works again. Thanks a lot!