Summary: | GENERATE_KEYMAP produces oops in kernel 2.6.3[12] when attempting to start hald | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | ta2002 <throw_away_2002> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | RESOLVED UPSTREAM | ||
Severity: | normal | ||
Priority: | High | ||
Version: | 10.1 | ||
Hardware: | All | ||
OS: | Linux | ||
URL: | http://forums.gentoo.org/viewtopic-p-6113574.html | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
.config file used
compilation output without the patch
compilation output with the patch
kernel messages without the patch
kernel message with the patch (oops occurs when hald attempts to start)
Shipped keymap
Generated keymap with NO changes to defkeymap.map |
Description
ta2002
2009-12-30 15:20:02 UTC
Two guesses, both related to your CFLAGS, either you've got your processor wrong or that particular combination triggers a compiler bug.

(In reply to comment #1)
> Two guesses, both related to your CFLAGS,
> either you've got your processor wrong
> or that particular combination triggers a compiler bug.

I doubt that (although I certainly don't want to say "that's impossible"). From the forum thread (where I have posted "emerge --info", cpuinfo, and various .configs):

CFLAGS="-march=pentium3 -Os -pipe -fomit-frame-pointer -mfpmath=sse"

$ dog /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Pentium(R) III Mobile CPU 1000MHz
stepping : 1
cpu MHz : 1000.000
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1998.25
clflush size : 32
power management:

That hardly seems like such a rare combination that it wouldn't have been found before. Any suggestions on how to proceed? I assume I am going to have to isolate which patch is doing this.

I have tracked down the cause of this, though I don't really have any idea how to fix it. This comes from a two-character patch I have been applying since 2.6.4 on over a dozen machines. I have never seen anything like this happen before on any of the machines. The patch:

--- linux-2.6.31/drivers/char/Makefile 2009-09-09 22:13:59.000000000 +0000
+++ linux-2.6.31/drivers/char/Makefile 2009-12-01 00:00:00.000000000 +0000
@@ -125,7 +125,7 @@
 # Uncomment if you're changing the keymap and have an appropriate
 # loadkeys version for the map. By default, we'll use the shipped
 # versions.
-# GENERATE_KEYMAP := 1
+GENERATE_KEYMAP := 1

 ifdef GENERATE_KEYMAP

Without the patch, 2.6.31.6 runs without errors, and a kernel compiled with the patch produces the output shown in the files I will attach. I will also attach the .config I used, and the compilation output. Please advise on how to continue. It is important for me to get this fixed.

Created attachment 214685 [details]
.config file used
Created attachment 214687 [details]
compilation output without the patch
Created attachment 214688 [details]
compilation output with the patch
Created attachment 214690 [details]
kernel messages without the patch
Created attachment 214691 [details]
kernel message with the patch (oops occurs when hald attempts to start)
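The first suggestion in this thread (that the CFLAGS do not match the processor) can be checked mechanically against the cpuinfo output quoted earlier. The following is only a minimal sketch: the set of required flags is an assumption based on GCC's description of -march=pentium3 (a Pentium Pro core with MMX, FXSR and SSE), and the function names are hypothetical.

```python
# Sketch: do the CPU flags from /proc/cpuinfo cover what
# "-march=pentium3 -mfpmath=sse" is assumed to rely on?
# ASSUMPTION: REQUIRED_FLAGS is an illustrative set, not an official list.
REQUIRED_FLAGS = {"cmov", "mmx", "fxsr", "sse"}


def parse_cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the CPU does not report, sorted."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


# Two lines copied from the cpuinfo output posted in this report.
SAMPLE = """\
model name : Intel(R) Pentium(R) III Mobile CPU 1000MHz
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
"""

if __name__ == "__main__":
    # An empty list means every assumed requirement is reported by the CPU.
    print(missing_flags(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/cpuinfo instead of SAMPLE. An empty result only shows that the flags are consistent with the CFLAGS; it does not rule out a compiler bug.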
Since you have tracked it down to a consequence of the custom keymap, how about attaching your generated defkeymap.c and also a diff against the shipped version?

(In reply to comment #9)
> Since you have tracked it down to a consequence of the custom keymap, how about
> attaching your generated defkeymap.c and also a diff against the shipped
> version?

I am guessing that the problem comes from the fact that loadkeys is generating a defkeymap.c in a different format (and a much larger file) than the shipped one. To test this, I have generated a keymap from the default with no changes to the keyboard layout. (I will attach these files). This still produced the Oops. Editing the "shipped" version directly and using that (without generating a keymap) produces a kernel that does not cause the problem.

Created attachment 215384 [details]
Shipped keymap
Created attachment 215386 [details]
Generated keymap with NO changes to defkeymap.map
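For comparing the two attached keymaps array by array rather than by eye, something like the following could be used. It is a rough sketch, not a real parser for defkeymap.c: it assumes the key maps are declared as `u_short name[...] = { 0x...., ... };` with hexadecimal initializers only, and the helper names and the `demo_map` sample data are made up for illustration.

```python
import re

# ASSUMPTION: key maps look like "u_short plain_map[NR_KEYS] = { 0xf200, ... };"
ARRAY_RE = re.compile(r"u_short\s+(\w+)\s*\[[^\]]*\]\s*=\s*\{([^}]*)\}")
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def parse_arrays(c_text):
    """Map each u_short array name to its list of explicit initializers."""
    return {
        name: [int(tok, 16) for tok in HEX_RE.findall(body)]
        for name, body in ARRAY_RE.findall(c_text)
    }


def trailing_run(values, filler):
    """Count how many entries at the end of the list equal the filler value."""
    count = 0
    for value in reversed(values):
        if value != filler:
            break
        count += 1
    return count


def summarize(c_text, filler=0xF200):
    """Return {array: (explicit initializer count, trailing filler count)}."""
    return {
        name: (len(values), trailing_run(values, filler))
        for name, values in parse_arrays(c_text).items()
    }


# Made-up miniature inputs; the real files are the two attachments above.
SHIPPED = "u_short demo_map[NR_KEYS] = {\n\t0xf200,\t0xf01b,\t0xf031,\n};\n"
GENERATED = (
    "u_short demo_map[NR_KEYS] = {\n"
    "\t0xf200,\t0xf01b,\t0xf031,\t0xf200,\t0xf200,\n};\n"
)

if __name__ == "__main__":
    print("shipped:  ", summarize(SHIPPED))
    print("generated:", summarize(GENERATED))
```

Entries that a C array leaves without an explicit initializer are zero-filled by the compiler, so a shorter initializer list in one file and an explicitly filled tail in the other would show up here as a difference in both numbers.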
The only real difference I see is that the arrays from loadkeys are filled up with 0xf200 where the shipped version allows the unused space at the end of the arrays to be padded out with 0's.

I can think of one more key piece of information needed to report this issue upstream: the oops output from running vanilla kernel with 'CONFIG_DEBUG_INFO=y' and using your generated keymap.

(In reply to comment #13)
> The only real difference I see is that the arrays from loadkeys are
> filled up with 0xf200 where the shipped version allows the unused
> space at the end of the arrays to be padded out with 0's.

OK. Well, I admit I am pretty much in over my head on this.

> I can think of one more key piece of information needed to report this
> issue upstream: the oops output from running vanilla kernel with
> 'CONFIG_DEBUG_INFO=y' and using your generated keymap.

I set that (what a space hog to compile - over 1.1 GB), and generated the keymap from the default defkeymap.map (again, I usually make changes before generating the keymap, but I want to demonstrate that this bug has nothing to do with those changes).
These are the lines from /var/log/messages:

Jan 7 11:25:33 system kernel: BUG: unable to handle kernel NULL pointer dereference at 000000a2
Jan 7 11:25:33 system kernel: IP: [<c10e2a2a>] strlen+0x8/0x11
Jan 7 11:25:33 system kernel: *pde = 00000000
Jan 7 11:25:33 system kernel: Oops: 0000 [#1]
Jan 7 11:25:33 system kernel: last sysfs file: /sys/devices/virtual/misc/nvram/uevent
Jan 7 11:25:33 system kernel:
Jan 7 11:25:33 system kernel: Pid: 2360, comm: udevadm Not tainted (2.6.31.6 #2) 26472TA
Jan 7 11:25:33 system kernel: EIP: 0060:[<c10e2a2a>] EFLAGS: 00010246 CPU: 0
Jan 7 11:25:33 system kernel: EIP is at strlen+0x8/0x11
Jan 7 11:25:33 system kernel: EAX: 00000000 EBX: 00000000 ECX: ffffffff EDX: 000000d0
Jan 7 11:25:33 system kernel: ESI: 000000a2 EDI: 000000a2 EBP: efaad908 ESP: eec7dee4
Jan 7 11:25:33 system kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Jan 7 11:25:33 system kernel: Process udevadm (pid: 2360, ti=eec7c000 task=efb97130 task.ti=eec7c000)
Jan 7 11:25:33 system kernel: Stack:
Jan 7 11:25:33 system kernel: 000000d0 c104992b 00000000 eec7df24 efaad900 c1149cc7 efaad900 eee90000
Jan 7 11:25:33 system kernel: <0> eee90000 c1149d57 eee90000 c1362a93 00000090 eee90000 c1362a8a 0000000a
Jan 7 11:25:33 system kernel: <0> 00000000 c13b9ba8 00000000 c1149ebe eec78000 fffffffb c13b9bd4 c1149e5a
Jan 7 11:25:33 system kernel: Call Trace:
Jan 7 11:25:33 system kernel: [<c104992b>] ? kstrdup+0x14/0x41
Jan 7 11:25:33 system kernel: [<c1149cc7>] ? device_get_nodename+0x3c/0x89
Jan 7 11:25:33 system kernel: [<c1149d57>] ? dev_uevent+0x43/0xdc
Jan 7 11:25:33 system kernel: [<c1149ebe>] ? show_uevent+0x64/0xa5
Jan 7 11:25:33 system kernel: [<c1149e5a>] ? show_uevent+0x0/0xa5
Jan 7 11:25:33 system kernel: [<c1149b79>] ? dev_attr_show+0x16/0x32
Jan 7 11:25:33 system kernel: [<c108d61b>] ? sysfs_read_file+0x8b/0xea
Jan 7 11:25:33 system kernel: [<c108d590>] ? sysfs_read_file+0x0/0xea
Jan 7 11:25:33 system kernel: [<c105bf9a>] ? vfs_read+0x81/0x102
Jan 7 11:25:33 system kernel: [<c105c0b3>] ? sys_read+0x3c/0x63
Jan 7 11:25:33 system kernel: [<c1002728>] ? sysenter_do_call+0x12/0x26
Jan 7 11:25:33 system kernel: Code: eb 04 19 c0 0c 01 5e 5f c3 56 89 c6 89 d0 88 c4 ac 38 e0 74 09 84 c0 75 f7 be 01 00 00 00 89 f0 48 5e c3 57 83 c9 ff 89 c7 31 c0 <f2> ae f7 d1 49 89 c8 5f c3 57 31 ff 85 c9 74 0e 89 c7 89 d0 f2
Jan 7 11:25:33 system kernel: EIP: [<c10e2a2a>] strlen+0x8/0x11 SS:ESP 0068:eec7dee4
Jan 7 11:25:33 system kernel: CR2: 00000000000000a2
Jan 7 11:25:33 system kernel: ---[ end trace 423d5ba8ea6bd181 ]---

Let me know if you need anything else (I don't really see any additional information beyond what was there before).

Ok time to get this reported upstream since you have an oops from vanilla kernel; assigning to kernel team who can advise on opening up a kernel bug.

Have you tested with gentoo-sources-2.6.32-r1?

(In reply to comment #16)
> Have you tested with gentoo-sources-2.6.32-r1?

No. I generally tend to run only "stable" packages. I am even a bit more leery of 2.6.32 than usual. I believe there are some significant changes in serial port handling that have the potential to break a lot of things.

I'd like to know if this is a bug fixed in later kernels. Please reopen if you can do this test and post the results.

Tested with latest stable kernels (gentoo-sources-2.6.31-r10 and vanilla-sources-2.6.31.12) with absolutely identical results.

Please test with gentoo-sources 2.6.32 and 2.6.33

(In reply to comment #20)
> Please test with gentoo-sources 2.6.32 and 2.6.33

Just tested with gentoo-sources 2.6.32-r7 (just marked stable a couple of days ago). No changes.
From /var/log/messages:

Apr 10 11:25:02 system kernel: BUG: unable to handle kernel NULL pointer dereference at 000000a2
Apr 10 11:25:02 system kernel: IP: [<c110500a>] strlen+0x8/0x11
Apr 10 11:25:02 system kernel: *pde = 00000000
Apr 10 11:25:02 system kernel: Oops: 0000 [#1]
Apr 10 11:25:02 system kernel: last sysfs file: /sys/devices/virtual/misc/hpet/uevent
Apr 10 11:25:02 system kernel:
Apr 10 11:25:02 system kernel: Pid: 2298, comm: udevadm Not tainted (2.6.32-gentoo-r7 #1) 26472TA
Apr 10 11:25:02 system kernel: EIP: 0060:[<c110500a>] EFLAGS: 00010246 CPU: 0
Apr 10 11:25:02 system kernel: EIP is at strlen+0x8/0x11
Apr 10 11:25:02 system kernel: EAX: 00000000 EBX: 00000000 ECX: ffffffff EDX: 000000d0
Apr 10 11:25:02 system kernel: ESI: 000000a2 EDI: 000000a2 EBP: ef363f26 ESP: ef363edc
Apr 10 11:25:02 system kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Apr 10 11:25:02 system kernel: Process udevadm (pid: 2298, ti=ef362000 task=ef074700 task.ti=ef362000)
Apr 10 11:25:02 system kernel: Stack:
Apr 10 11:25:02 system kernel: 000000d0 c104b843 00000000 ef363f20 ef9b4c00 c11860a3 ef9b4c00 ef1fb000
Apr 10 11:25:02 system kernel: <0> ef1fb000 ef9b4c08 c118613f ef1fb000 c13bc2a1 000000e4 ef1fb000 c13bc298
Apr 10 11:25:02 system kernel: <0> 0000000a 00000000 007c8280 c141a508 00000000 c11862c4 ef2f8000 fffffffb
Apr 10 11:25:02 system kernel: Call Trace:
Apr 10 11:25:02 system kernel: [<c104b843>] ? kstrdup+0x14/0x41
Apr 10 11:25:02 system kernel: [<c11860a3>] ? device_get_devnode+0x41/0x8f
Apr 10 11:25:02 system kernel: [<c118613f>] ? dev_uevent+0x4e/0x105
Apr 10 11:25:02 system kernel: [<c11862c4>] ? show_uevent+0x64/0xa5
Apr 10 11:25:02 system kernel: [<c1186260>] ? show_uevent+0x0/0xa5
Apr 10 11:25:02 system kernel: [<c1185f4d>] ? dev_attr_show+0x16/0x32
Apr 10 11:25:02 system kernel: [<c1091417>] ? sysfs_read_file+0x8b/0xea
Apr 10 11:25:02 system kernel: [<c109138c>] ? sysfs_read_file+0x0/0xea
Apr 10 11:25:02 system kernel: [<c105f069>] ? vfs_read+0x81/0x102
Apr 10 11:25:02 system kernel: [<c105f182>] ? sys_read+0x3c/0x63
Apr 10 11:25:02 system kernel: [<c10027a8>] ? sysenter_do_call+0x12/0x26
Apr 10 11:25:02 system kernel: Code: eb 04 19 c0 0c 01 5e 5f c3 56 89 c6 89 d0 88 c4 ac 38 e0 74 09 84 c0 75 f7 be 01 00 00 00 89 f0 48 5e c3 57 83 c9 ff 89 c7 31 c0 <f2> ae f7 d1 49 89 c8 5f c3 57 31 ff 85 c9 74 0e 89 c7 89 d0 f2
Apr 10 11:25:02 system kernel: EIP: [<c110500a>] strlen+0x8/0x11 SS:ESP 0068:ef363edc
Apr 10 11:25:02 system kernel: CR2: 00000000000000a2
Apr 10 11:25:02 system kernel: ---[ end trace d6fa32b3eb5107ce ]---
Apr 10 11:25:11 system kernel: BUG: unable to handle kernel NULL pointer dereference at 00000076
Apr 10 11:25:11 system kernel: IP: [<c111e425>] misc_open+0x35/0xb7
Apr 10 11:25:11 system kernel: *pde = 00000000
Apr 10 11:25:11 system kernel: Oops: 0000 [#2]
Apr 10 11:25:11 system kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/boot_vga
Apr 10 11:25:11 system kernel:
Apr 10 11:25:11 system kernel: Pid: 3627, comm: X Tainted: G D (2.6.32-gentoo-r7 #1) 26472TA
Apr 10 11:25:11 system kernel: EIP: 0060:[<c111e425>] EFLAGS: 00213212 CPU: 0
Apr 10 11:25:11 system kernel: EIP is at misc_open+0x35/0xb7
Apr 10 11:25:11 system kernel: EAX: 0000006a EBX: 0000003f ECX: c111e3f0 EDX: 00000076
Apr 10 11:25:11 system kernel: ESI: ef39eb80 EDI: 00000000 EBP: ef1c5adc ESP: ef10fe7c
Apr 10 11:25:11 system kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Apr 10 11:25:11 system kernel: Process X (pid: 3627, ti=ef10e000 task=ef35ee00 task.ti=ef10e000)
Apr 10 11:25:11 system kernel: Stack:
Apr 10 11:25:11 system kernel: 00000000 ef837bc0 00000000 ef1c5adc c106064e ef39eb80 0000003f ef39eb80
Apr 10 11:25:11 system kernel: <0> ef1c5adc 00000000 c1060599 c105d424 efbf6b00 ef4c5600 ef39eb80 ef10ff00
Apr 10 11:25:11 system kernel: <0> ef10ff00 00000003 c105d591 ef39eb80 00000000 ef32da80 00000000 ef10ff00
Apr 10 11:25:11 system kernel: Call Trace:
Apr 10 11:25:11 system kernel: [<c106064e>] ? chrdev_open+0xb5/0xcb
Apr 10 11:25:11 system kernel: [<c1060599>] ? chrdev_open+0x0/0xcb
Apr 10 11:25:11 system kdm[3598]: X server died during startup
Apr 10 11:25:11 system kdm[3598]: X server for display :0 cannot be started, session disabled
Apr 10 11:25:11 system kernel: [<c105d424>] ? __dentry_open+0xd5/0x1b2
Apr 10 11:25:11 system kernel: [<c105d591>] ? nameidata_to_filp+0x28/0x3b
Apr 10 11:25:11 system kernel: [<c1066fbd>] ? do_filp_open+0x417/0x7b0
Apr 10 11:25:11 system kernel: [<c104dbee>] ? __do_fault+0x2e2/0x319
Apr 10 11:25:11 system kernel: [<c106d86c>] ? alloc_fd+0x49/0xab
Apr 10 11:25:11 system kernel: [<c105d224>] ? do_sys_open+0x48/0x114
Apr 10 11:25:11 system kernel: [<c105d334>] ? sys_open+0x1e/0x23
Apr 10 11:25:11 system kernel: [<c10027a8>] ? sysenter_do_call+0x12/0x26
Apr 10 11:25:11 system kernel: Code: 34 b8 b4 e1 40 c1 e8 13 00 1e 00 a1 c0 e1 40 c1 81 e3 ff ff 0f 00 83 e8 0c eb 10 39 18 75 09 8b 40 08 85 c0 75 51 eb 11 8d 42 f4 <8b> 50 0c 0f 18 02 90 3d b4 e1 40 c1 75 e2 b8 b4 e1 40 c1 e8 f1
Apr 10 11:25:11 system kernel: EIP: [<c111e425>] misc_open+0x35/0xb7 SS:ESP 0068:ef10fe7c
Apr 10 11:25:11 system kernel: CR2: 0000000000000076
Apr 10 11:25:11 system kernel: ---[ end trace d6fa32b3eb5107cf ]---

I must admit I am completely baffled as to why you think that this bug might somehow disappear in later kernels without anyone actually bothering to fix it.

Please submit upstream at http://bugzilla.kernel.org and post url here.
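When preparing the upstream report, the key facts from the oops dumps above (fault address, faulting function, call chain) can be pulled out with a few regular expressions. This is only a sketch written against the line format seen in this report; the function name `summarize_oops` and the patterns are assumptions, not a general oops parser.

```python
import re

# Patterns written against the log lines quoted in this report (assumption:
# other kernels or architectures may format these lines differently).
FAULT_RE = re.compile(r"NULL pointer dereference at ([0-9a-f]+)")
IP_RE = re.compile(r"IP: \[<[0-9a-f]+>\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
TRACE_RE = re.compile(r"\[<[0-9a-f]+>\] \? (\w+)\+0x")


def summarize_oops(log_text):
    """Extract fault address, faulting function and call-trace symbols."""
    fault = FAULT_RE.search(log_text)
    ip = IP_RE.search(log_text)
    return {
        "fault_address": int(fault.group(1), 16) if fault else None,
        "faulting_function": ip.group(1) if ip else None,
        "call_trace": TRACE_RE.findall(log_text),
    }


# A few lines copied from the first oops, with the syslog prefix removed.
SAMPLE = """\
BUG: unable to handle kernel NULL pointer dereference at 000000a2
IP: [<c10e2a2a>] strlen+0x8/0x11
Call Trace:
 [<c104992b>] ? kstrdup+0x14/0x41
 [<c1149cc7>] ? device_get_nodename+0x3c/0x89
"""

if __name__ == "__main__":
    info = summarize_oops(SAMPLE)
    print(hex(info["fault_address"]), info["faulting_function"])
    print(" <- ".join(info["call_trace"]))
```

In the dumps above the fault addresses are 0xa2 and 0x76, and in the strlen oops ESI and EDI hold the same 000000a2 value. That pattern would be consistent with a small non-pointer value being used as a pointer, though the logs alone do not show where that value comes from.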