Bug 91262

Summary:	"Unable to handle kernel paging request at..." error at boot. 2.6.11-gentoo-r6 & r7
Product:	Gentoo Linux	Reporter:	Alex <alex323>
Component:	[OLD] Core system	Assignee:	Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel>
Status:	RESOLVED NEEDINFO
Severity:	critical
Priority:	Lowest
Version:	unspecified
Hardware:	x86
OS:	Linux
Whiteboard:
Package list:		Runtime testing required:	---
Attachments:	Kernel config file [2.6.11-gentoo-r6] (/usr/src/linux/.config) Kernel config file [2.6.12-r3] (/usr/src/linux/.config)

Description Alex 2005-05-02 20:43:18 UTC

I am having a serious problem with 2.6.11-gentoo-r6/r7. When I try to boot, I get an error along the lines of "Unable to handle kernel paging request at...", followed by a hex number, and then it just dies. I have not changed the kernel. This happened suddenly. In addition, it's reproducible 100% of the time. I've tried a distclean followed by a reconfigure (menuconfig) and recompile. I have also tried removing `uname -r` from /lib/modules. Nothing seems to work.
So I downgraded to r6, but the problem is STILL occurring. All I did was reboot. The error is something like: "Unable to handle kernel paging request at...". Just above it are some lines that go something like this:

[0xsomething] elf_binary_something

Reproducible: Always
Steps to Reproduce:
Kind of hard to explain really.
Actual Results:  
Crash.. bam.. boom

Expected Results:  
Booted

Comment 1 Alex 2005-05-03 04:46:52 UTC

I just did 2 passes of memtest86 from http://memtest86.com/. 0 errors. I do wonder what the problem could be now.

Comment 2 Alex 2005-05-03 05:18:37 UTC

Ohh. I also recall something else.. if it helps at all:

Oops 0000 [#2]

Comment 3 Daniel Drake (RETIRED) gentoo-dev 2005-05-10 14:44:25 UTC

Which is the last good kernel which worked? How far into the boot sequence does the error occur?

Are you able to write down or photograph the error? If writing, the interesting bits are the lines beginning with "EIP", "EIP is at", "Process", and the call trace. You can miss out the hex values which look like [<c025fe0f>] at the beginning of every line in the call trace.
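If the oops has been photographed or typed up in full, the addresses that can be left out could be stripped with a few lines of Python. This is only a sketch: the function name `strip_addresses` and the sample line are made up for illustration, and it assumes every call trace line starts with an address in the `[<c025fe0f>]` form described above.

```python
import re

# Matches a leading bracketed address such as "[<c025fe0f>]" and the
# whitespace that follows it. Assumption: addresses are plain hex digits.
ADDR_PREFIX = re.compile(r"^\s*\[<[0-9a-fA-F]+>\]\s*")


def strip_addresses(lines):
    """Return the call trace lines without their leading [<address>] part."""
    return [ADDR_PREFIX.sub("", line).rstrip() for line in lines]


if __name__ == "__main__":
    # Hypothetical sample line, not taken from a real oops.
    sample = ["[<c025fe0f>] example_function+0x1f/0x40"]
    for line in strip_addresses(sample):
        print(line)
```

Lines that do not start with a bracketed address (for example "EIP is at") pass through unchanged, so the whole transcript can be fed in at once.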

Comment 4 Alex 2005-05-10 18:00:03 UTC

The last working one was r6. I upgraded to r7 with the usual "problems." Then it was working fine for a while. I will try and get you that trace.

Comment 5 Daniel Drake (RETIRED) gentoo-dev 2005-05-12 10:39:44 UTC

Contradiction? The original comment says you are having a "serious problem with 2.6.11-gentoo-r6" but in comment #4 you say -r6 was the last one that worked fine?

Comment 6 Alex 2005-05-13 04:50:00 UTC

Kernel r6 was working. Then I upgraded to r7. r7 was working and then I get this. I downgraded to r6 and the problem still happens.

Comment 7 Alex 2005-05-15 12:10:08 UTC

Here is what I was able to write down:

[<c018e68f>] load_elf_binary+0x434/0xc40
[<c016bcee>] kernel_read+0x47e/0x60
[<c018e250>] load_elf_binary+0x0/0xc40
[<c016cace>] search_binary_handler+0x18e/0x2f0
[<c018d755>] load_script+0x215/0x250
[<c0146423>] __alloc_pages+0x2e3/0x420
[<c023a806>] copy_from_user+0x46/0x80
[<c016b7d8>] copy_strings+0x188/0x200
[<c018d540>] load_script+0x0/0x250
[<c016cace>] search_binary_handler+0x18e/0x2f0
[<c016cdbb>] do_execve+0x18b/0x720
[<c0101d36>] sys_execve+0x46/0xc0
[<c01032af>] syscall_call+0x7/0xb
[<c01303cf>] ____call_user_mode_helper+0x9f/0xc0
[<c0130330>] ____call_user_mode_helper+0x0/0xc0
[<c0101375>] kernel_thread_helper+0x5/0x10
==========================================
Unable to handle kernel paging request at virtual address 00008b0f.
printing eip:
c0103fde
*pde=00000000
Oops: 0000 [#2]
PREEMPT SMP
Modules linked in:
CPU: 0
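
Since the trace above was copied down by hand, a quick sanity check on each entry may help spot transcription slips. The sketch below is an illustration only: `check_entry` is a made-up helper, and it assumes each entry has the form `[<address>] symbol+0xOFFSET/0xSIZE`, where the offset would normally be smaller than the function size.

```python
import re

# Assumed entry layout: [<address>] symbol+0xOFFSET/0xSIZE
ENTRY = re.compile(
    r"^\[<([0-9a-fA-F]+)>\]\s+(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)$"
)


def check_entry(line):
    """Classify one call trace entry as 'ok', 'unparseable',
    or 'offset outside function'."""
    match = ENTRY.match(line.strip())
    if match is None:
        return "unparseable"
    offset = int(match.group(3), 16)
    size = int(match.group(4), 16)
    # An offset at or beyond the function size suggests a copying error.
    if offset >= size:
        return "offset outside function"
    return "ok"


if __name__ == "__main__":
    # Two entries as written down in the trace above.
    for entry in (
        "[<c018e68f>] load_elf_binary+0x434/0xc40",
        "[<c016bcee>] kernel_read+0x47e/0x60",
    ):
        print(entry, "->", check_entry(entry))
```

A flagged entry does not say what the correct value was, only that the line is worth re-reading from the screen or a photograph.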

Comment 8 Alex 2005-05-15 14:16:06 UTC

Created attachment 58971 [details]
Kernel config file [2.6.11-gentoo-r6] (/usr/src/linux/.config)

This is the .config file I am using. Generic x86 support IS enabled in it. The
only thing that changed when I added it was the error message. It said, "... a
NULL pointer at..." If I take it out, I get the error I originally reported.

Comment 9 Daniel Drake (RETIRED) gentoo-dev 2005-05-15 14:40:29 UTC

Odd. Are you able to enable a framebuffer at a high resolution so that you can see more of the error?

You could also try taking a backup of /sbin/init and re-merging sys-apps/sysvinit.
Don't lose the backup, as it would be good to investigate why that is causing an oops, if it is.

Comment 10 Daniel Drake (RETIRED) gentoo-dev 2005-05-15 14:42:57 UTC

Also try booting with the kernel parameter "init=/bin/sash". It won't boot up normally; it will drop you to a console, but it would be interesting to see if the error still occurs.

Comment 11 Alex 2005-05-15 20:26:15 UTC

I tried init=/bin/sash and re-emerging sysvinit. The same error occurs. I didn't do the framebuffer thing yet, but I'll get around to it.
Any thoughts? :S

Comment 12 Alex 2005-05-18 20:08:28 UTC

Since I haven't gotten an answer in a while, I'll try a vanilla kernel this
weekend and see if it works.

Comment 13 Alex 2005-05-22 18:53:04 UTC

The vanilla kernel worked. I still can't fathom why it works and the gentoo one
doesn't. I will be posting the vanilla kernel .config now.

Comment 14 Alex 2005-05-22 19:16:04 UTC

Created attachment 59599 [details]
Kernel config file [2.6.12-r3] (/usr/src/linux/.config)

This is the vanilla config file. It works. I don't see how the gentoo one
doesn't work :/.

Comment 15 Alex 2005-05-27 22:18:11 UTC

Anyone out there?

Comment 16 Daniel Drake (RETIRED) gentoo-dev 2005-05-31 15:59:37 UTC

Comparing gentoo-2.6.11 to vanilla-2.6.12-rc is not a valid comparison. Please
try vanilla-2.6.11. I suspect you will find that the issue does exist there,
which indicates that it has been fixed upstream in time for 2.6.12.

Comment 17 Daniel Drake (RETIRED) gentoo-dev 2005-06-13 15:39:15 UTC

see comment #16