I'm using kernel-2.6.12-gentoo-r6 on a quad-CPU, SCSI-based system. In general it works well, but unloading the aic7xxx SCSI driver with rmmod aic7xxx, even when it isn't in use (use count = 0), causes a kernel panic and the system crashes.
Reproducible: Always
Steps to Reproduce:
1. modprobe aic7xxx
2. Use a tape device or other SCSI peripheral.
3. rmmod aic7xxx
Actual Results: Kernel panic, with a one-line error, which I'll quote exactly later.
Expected Results: rmmod should have returned silently without further ado.
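For anyone trying to reproduce this, it may help to confirm the "use count = 0" claim from /proc/modules before running rmmod. The following is only a minimal sketch: it assumes the usual /proc/modules layout (name, size, use count, dependents, state, address), and the helper names and the sample text are hypothetical, not output from the affected machine.

```python
def parse_proc_modules(text):
    """Map module name -> use count from /proc/modules-style text.

    Assumes whitespace-separated fields: name, size, use count, ...
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[0]] = int(fields[2])
    return counts


def is_unused(text, module):
    """True only if the module is listed and its use count is 0."""
    return parse_proc_modules(text).get(module) == 0


# Hypothetical sample for illustration; sizes and addresses are made up.
SAMPLE = (
    "aic7xxx 150000 0 - Live 0xf8a00000\n"
    "st 40000 1 - Live 0xf89f0000\n"
)

if __name__ == "__main__":
    # On a live system, read the real file instead of SAMPLE:
    #     with open("/proc/modules") as f: text = f.read()
    print("aic7xxx unused:", is_unused(SAMPLE, "aic7xxx"))
```

A module that is not listed at all is reported as not unused, so the check never suggests unloading something that isn't loaded.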
Is this reproducible on vanilla-sources-2.6.13_rc6?
I neglected to report exactly what the message was earlier: "Kernel panic - not syncing: Loop 1" and then the entire system becomes unresponsive and a power-off restart becomes necessary. I haven't tried this with vanilla sources, but I should note that I'm using the modutils recommended in the 2.6 kernel migration guide.
The migration guide suggests module-init-tools (modutils will not work). If you could test vanilla-sources-2.6.13_rc6 it would be appreciated. Maybe the problem has already been solved.
I wasn't clear earlier: In the 2.6 migration I did replace modutils with module-init-tools as the migration guide recommended. I wanted to make it clear that the cause of this problem was not that I had neglected to carry out this step.
I don't know if this has any bearing, but I'm running 2.6.11-r6 and have been running gentoo-sources 2.6 for many months. When 2.6.12-r6 was available I tried it, but it kernel panicked on boot. I then tried r9 when it came out: same thing. I have an Adaptec 3210S RAID card and a 29160 card in the system. Based on this bug I removed the aic7xxx driver and rebuilt 2.6.12-r9. It worked: the system booted and Gentoo started, but then it hung at the coldplug of pnp. What's broken with the aic7xxx driver?
You tell us - we need to see the message and we need to know whether it is reproducible on the latest development kernel (currently vanilla-sources-2.6.13_rc7).
See below. However, I've done some testing on two systems: on both, if I leave aic7xxx as a module I can boot; otherwise it dies. The 2005.1 LiveCD is a disaster, too. I tell it noload=aic7xxx but it still lists it as loaded. I was hoping someone with more knowledge could tell me if I should continue fighting it or wait till whatever broke is fixed.
> > Unable to handle paging request at virtual address 8000002c
> > printing eip
> > c0303411
> > *pde = 00000000
> > Oops: 0000 [#1]
> > PREEMPT SMP
> > Modules linked in:
> > CPU: 0
> > EIP: 0060:[<c0303411>] Not tainted VLI
> > EFLAGS: 0010086 (2.6.12-gentoo-r6)
> > EIP is at adptr_isr+0x141/0x220
> > eax: 80000000 ebx: f7ea0000 ecx: 00000000 edx: f7d74000
> > esi: 00000000 edi: c05adf98 ebp: 37ea0000 esp: c05adf10
> > ds: 007b es: 007b ss: 0068
> > Process swapper (pid:0, threadinfo=c05ac000 task=c04f3c00)
> > Stack: c219dde4 c0453d22 c05adf7c c0453d55 f7d74022 00000000 00000202 00000000
> > f7d48cc0 00000000 c05adf98 00000015 c013e520 00000015 f7d74000 c005adf98
> > 00000000 00000015 c05a67c0 00000540 c05ab7dc c013e624 00000015 c01206c5
> > Call Trace:
> > [<c0453d22>] schedule+0x2a2/0xc20
> > [<c0453d55>] schedule+0x3d5/0xc20
> > [<c013e520>] handle_IRQ_event+0x30/0x70
> > [<c013e624>] __do_IRQ+0xc4/0x120
> > [<c01206c5>] __do_softirq+0xc5/0xc0
> > [<c0105699>] do_IRQ+0x19/0x30
> > [<c0103932>] common_interrupt+0x1a/0x20
> > [<c0100bf0>] default_idle+0x0/0x30
> > [<c0100c13>] default_idle+0x23/0x30
> > [<c0100cc7>] cpu_idle+0x67/0x70
> > [<c05ae97d>] start_kernel+0x18d/0x1d0
> > [<c05aa380>] unknown_bootoption+0x010/0x1b0
> > Code: 08 a9 00 00 00 40 89 44 24 1c 74 10 8b 7b 0c 85 ff 74 09 b9 11 00 00
> > 00 89 de f3 a5 8b 4c 24 1c 85 c9 78 36 8b 43 0c 85 c0
> > 74 07 <8b> 50 2c 85 d2 75 17 8b 54 24 38 8b 82 8c 00 00 00 89 28 f0 83
> > <0>Kernel panic - not syncing: Fatal exception in interrupt
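Since the trace above looks hand-transcribed, a quick sanity check of the call-trace entries may be useful before anyone reads too much into individual frames. This is only a sketch: it assumes each frame has the form `[<address>] symbol+0xOFFSET/0xSIZE`, the function names `parse_call_trace` and `suspect_frames` are made up for illustration, and the sample lines are copied from the report. A frame whose offset is not smaller than the function size (as in `__do_softirq+0xc5/0xc0`) cannot be right as written, which suggests a copying slip rather than a real address.

```python
import re

# One call-trace frame: [<c013e520>] handle_IRQ_event+0x30/0x70
# A '.' is also accepted in place of '>' to tolerate transcription slips.
TRACE_RE = re.compile(
    r"\[<([0-9a-f]{8})[>.]\]\s+(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)"
)


def parse_call_trace(text):
    """Return a list of (symbol, offset, size) tuples found in oops text."""
    frames = []
    for match in TRACE_RE.finditer(text):
        _addr, symbol, offset, size = match.groups()
        frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames


def suspect_frames(frames):
    """Frames whose offset lies outside the function (offset >= size)."""
    return [f for f in frames if f[1] >= f[2]]


# Lines copied from the report above.
SAMPLE = """
[<c013e520>] handle_IRQ_event+0x30/0x70
[<c013e624.] __do_IRQ+0xc4/0x120
[<c01206c5>] __do_softirq+0xc5/0xc0
[<c0105699>] do_IRQ+0x19/0x30
"""

if __name__ == "__main__":
    for symbol, offset, size in suspect_frames(parse_call_trace(SAMPLE)):
        print(f"check transcription: {symbol}+{offset:#x}/{size:#x}")
```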
This is *probably* fixed in 2.6.13, if you could test gentoo-sources-2.6.13 instead it would be great.
Will do then. I was sticking with kernels that were not ~arch. I'll give it a try.
Well, 2.6.13 boots and Gentoo starts, but it hangs coldplugging pnp devices for some reason - it's always worked before. I'll kill coldplug and see what happens. But it looks like 2.6.13 has fixed the kernel oops problem. I'll try it on my other system.
Sounds like 2.6.13 fixed it then. Please file a new bug for your coldplug issues when you have investigated further. It's probably a good idea to take coldplug out of runlevels, then manually start it once the system is booted. You'll then be able to go to another console and hopefully look at the "dmesg" output for any clues which module it is breaking on.
Yes, I'll pursue the coldplug issue separately, as it is now apparent it has nothing to do with this. For what it's worth, I've found a coldplug bug that mentions some changes to the usb.rc file, so I'll pursue that first. Thanks. I guess I need to stay with odd numbered releases <G>.
I've just tried this with gentoo-sources-2.6.13 - exactly the same problem. I haven't had a chance to try with vanilla sources yet; but is this still thought to be worthwhile?
When you say "the same problem", you mean the error you reported earlier? (as opposed to Brett's problem, which is different). Is that definitely the _only_ message that appears? I'd generally expect a panic to be more verbose.
(In reply to comment #14)
> When you say "the same problem", you mean the error you reported earlier? (as
> opposed to Brett's problem, which is different).
>
> Is that definitely the _only_ message that appears? I'd generally expect a panic
> to be more verbose.
It's exactly the same as in the *original* problem report that I started. The (admittedly terse) error message is exactly the same: "Kernel panic - not syncing: Loop 1", and the system hangs _completely_; even the cursor stops blinking! The problem is mildly worse than I originally reported, too: just loading the module (aic7xxx) and unloading it again straight away is enough to cause this panic.
Please write about your experiences at: http://bugzilla.kernel.org/show_bug.cgi?id=5224
I'm not sure what has fixed this, but this problem has now mysteriously 'gone away'. I think the only thing that changed was the compiler (gcc) and after rebuilding the kernel the problem vanished. I haven't got a good explanation, but things are now as they should be!
Please re-add "watch-linux-bugzilla" to the status whiteboard if this occurs again.