Using kernel 2.4.20-gentoo-r6 and pcmcia-cs v3.2.4, I get a kernel oops when removing any PCMCIA card from a Dell Latitude CPi D300XT, unless I first eject the card in software using cardctl. With a stock 2.4.20 kernel and pcmcia-cs modules compiled for it, card removal is handled properly: modules are loaded and unloaded in response to card insertion and removal. This appears to be a problem with the interrupt handler. The oops notes a problem in sched.c, line 1141, along with the observation "interrupt handler - not syncing".
Reproducible: Always
Steps to Reproduce:
1. Insert a card into the PCMCIA card slot, or boot the machine with a card inserted.
2. After the modules are loaded and the card is active, remove the card.
Actual Results: Kernel oops.
Expected Results: Modules relevant to the removed card should be unloaded in response to its removal.
I will amend this bug and attach a rather crappy photo of the kernel oops screen (sorry about the quality!).
Created attachment 16351: Digital photo of kernel oops screen
Digital photo of kernel oops. Not great, but relevant information is either there, or can be inferred.
Thank you, the photo was fine. Can you please:
> [Do this on the buggy gentoo kernel]
> Stick the stuff below into a text file
> emerge ksymoops
> ksymoops < file_with_trace > file_with_output [ as root if you can't find it ]
> And attach file_with_output to Bugzilla...
Also, can you try gentoo-sources-r5 and see if you get the same problem? Thanks...
--BEGIN-OOPS--
CPU:    0
EIP:    0010:[<c018cd20>]    Not tainted
EFLAGS: 00010002
eax: 00000001   ebx: 00000003   ecx: 00000000   edx: 00000001
esi: c6e8c840   edi: c6e8c858   ebp: c013ddec   esp: c013ddcc
ds: 0018   es: 0018   ss: 0018
Process swapper (pid: 0, stackpage=c013d000)
Stack: 66656463 6a696867 6e6d6c6b 7271706f 00000006 c013c000 c6e8c840 c6e8c858
       c6e8c860 c01bd540 c7c94000 c6e8c840 00000000 c150c000 c6f2a4e0 c01d2f93
       c6e8c840 c6e8c840 c01d3d4e c6e8c840 c0108a20 00000001 c6f2a7e0 c7024840
Call Trace: [<c01bd5b0>] [<c01d2f93>] [<c01d3d4e>] [<c01ed986>] [<c01ebcd4>] [<c01ed9e0>]
   [<c01d26cf>] [<c01d0ffa>] [<c01ec243>] [<c01ec2c9>] [<c895c600>] [<c0220201>]
   [<c8957930>] [<c89583dc>] [<c8956f75>] [<c895c600>] [<c895e1c0>] [<c895e130>]
   [<c019aaed>] [<c01960b3>] [<c0195f86>] [<c0195dcb>] [<c018207e>] [<c017e660>]
   [<c01849a3>] [<c017e660>] [<c017e660>] [<c017e683>] [<c017e6e4>]
Code: 0f 0b 75 04 3f c0 2b c0 e9 13 fc ff ff 8d 76 00 55 89 e5 53
 <0> Kernel panic: Aiee, killing interrupt handler
In interrupt handler - not syncing
--END-OOPS--
Created attachment 16368: ksymoops output from kernel crash
# ksymoops --system-map=/usr/src/linux/System.map < gentoo_oops.txt > oops_analysis.txt
I trust this is what you want. To the best of my knowledge, this corresponds to what was running when I shot the oops output which you transcribed, and everything should match up. The only difference is that I recompiled pcmcia-cs after going back to the gentoo kernel. If this matters, I can run the process again, although I don't look forward to hand-copying the screen output into a text file :-(
This also happened with the r5 kernel. I didn't report it then, hoping it might have been fixed in the r6 kernel. It wasn't, so I decided to report it. I do a bit of programming, but I ain't no kernel hacker ;-) I appreciate your patience.
Sorry I didn't reply earlier; Bugzilla seems to have a bug where it doesn't notify you of new attachments. Can you recompile your kernel with the following removed [temporarily]: Preemptible Kernel, anything APIC-related, and ACPI.
Never mind that. Looking through the code an evil nasty quick way would be to enable Preemptible Kernel. Enable that and essentially you can't have any interrupts as the kernel is preemptible, which should fix that bug.
*** Bug 26333 has been marked as a duplicate of this bug. ***
Enabling Preemptible Kernel does not fix the problem. Ejecting the card first with cardctl works.
Okay, can you try getting rid of lines 1140 and 1141 from kernel/sched.c and see what happens [ just comment them out with a "//" ]
Gives:
Oops: 0007
CPU:    0
EIP:    0023:[<400e6243>]    Not tainted
EFLAGS: 00010286
eax: 00000001   ebx: 4014ae00   ecx: bfffafa0   edx: bfffafa0
esi: 00000001   edi: 0805edd8   ebp: bffffdc48   esp: bffffaf90
ds: 002b   es: 002b   ss: 002b
Process devfsd (pid: 160, stackpage=ddddd000)
 <0> Kernel panic: Aiee, killing interrupt handler
In interrupt handler - not syncing
It all works just fine if I kill devfsd.
Try this [in place of lines 1140+1141]: if (in_interrupt()) return; The PCMCIA modules like to call devfs functions [if available] on a timer for some reason. When you get an interrupt, it has no clue what to do with it, so it sends it over to schedule(), which BUG()s out as it's also clueless...
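For anyone following along, here is a rough userspace sketch in C of what that one-line change does. It assumes that lines 1140 and 1141 of the Gentoo kernel/sched.c are the check that calls BUG() when schedule() is entered from interrupt context; the thread implies this but never quotes the lines. Every name below except in_interrupt() is hypothetical, and in_interrupt() itself is only a stand-in for the kernel macro, not the real thing.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for kernel state; not the real kernel API. */
int irq_depth = 0;          /* > 0 while "inside" an interrupt handler */
int context_switches = 0;   /* counts how often a reschedule really happened */

enum sched_result { SCHED_SWITCHED, SCHED_SKIPPED, SCHED_BUG };

/* Stand-in for the kernel's in_interrupt() check. */
int in_interrupt(void)
{
    return irq_depth > 0;
}

/* Assumed original behaviour: scheduling from interrupt context is fatal. */
enum sched_result schedule_original(void)
{
    if (in_interrupt())
        return SCHED_BUG;       /* the kernel would call BUG() and oops here */
    context_switches++;
    return SCHED_SWITCHED;
}

/* Behaviour with the suggested change: bail out quietly instead. */
enum sched_result schedule_patched(void)
{
    if (in_interrupt())
        return SCHED_SKIPPED;   /* mirrors "if (in_interrupt()) return;" */
    context_switches++;
    return SCHED_SWITCHED;
}

/* Models a card-removal interrupt whose handler ends up calling schedule(). */
enum sched_result card_removal_irq(enum sched_result (*sched)(void))
{
    enum sched_result result;

    irq_depth++;                /* enter interrupt context */
    result = sched();
    assert(irq_depth > 0);      /* sanity check before leaving */
    irq_depth--;                /* leave interrupt context */
    return result;
}
```

Calling card_removal_irq(schedule_original) models the oops, while card_removal_irq(schedule_patched) returns without switching tasks. Note that in this model the patched version only suppresses the symptom: whatever called schedule() from the interrupt path still did so, it just no longer brings the machine down.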
This works!
Resolving. I'll try and get this into the next gentoo kernel. Thanks for your help :-)
*** Bug 27448 has been marked as a duplicate of this bug. ***
I note that this bug is still in gentoo-sources (r7), which I put on a system this past weekend (ca. 9/20/03). This problem apparently occurs in other contexts as well, as I found out when a newly installed kernel on a desktop system crashed with the same error. I had to apply the same fix. Did this slip through the cracks? Shouldn't it be in gentoo-sources by now? It's marked as FIXED.
Fixed in CVS, should sync over to Portage soon.