Summary: | gentoo-sources 2.6.19-r5 - 2.6.21 8250_pnp module oopses on load with console on serial port | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Joshua Hoblitt <j_gentoo> |
Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
Status: | VERIFIED UPSTREAM | ||
Severity: | major | CC: | j_gentoo |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
URL: | http://bugzilla.kernel.org/show_bug.cgi?id=8552 | ||
Whiteboard: | linux-2.6.22 | ||
Package list: | Runtime testing required: | --- |
Description
Joshua Hoblitt
2007-05-17 01:27:16 UTC
http://lkml.org/lkml/2006/10/3/393

During boot:
uart_open()
-> uart_get() // state->info allocated

During 8250_pnp serial_pnp_probe():
serial8250_register_port()
-> uart_remove_one_port() (state->info set to NULL, memory leaked)
-> uart_add_one_port() (state->info not modified)

During serial console operation:
tty_write()
-> write_chan()
-> uart_write_room() (dereferences state->info)

8250_pnp probably shouldn't be allowed to deregister ports while they are open (refcounting needed?)

So I've thumbed through the code myself, not that I really understand it, and it seems to me that either serial8250_register_port() shouldn't remove a port that it _knows_ is already registered (causing 8250_pnp to fail) or somewhere in the tty_write() call chain serial_mutex needs to be acquired.

Where is uart_circ_chars_free() defined?

(In reply to comment #2)
> So I've thumbed through the code myself, not that I really understand it, and it
> seems to me that either serial8250_register_port() shouldn't remove a port that
> it _knows_ is already registered

The function is specifically designed to remove already-registered ports. The removal codepath assumes that it has taken all the necessary precautions to ensure that there are no users. As you have seen, it hasn't :)

> Where is uart_circ_chars_free() defined?

include/linux/serial_core.h

I don't see how acquiring serial_mutex would help. This isn't a race condition as such. Even if there were 5 minutes between 8250_pnp probe completing, and the next message being sent out of the serial console, the crash would still happen.

(In reply to comment #3)
> I don't see how acquiring serial_mutex would help. This isn't a race condition
> as such. Even if there were 5 minutes between 8250_pnp probe completing, and
> the next message being sent out of the serial console, the crash would still
> happen.
OK - then I think your original suggestion is correct that there needs to be some sort of ref count or in-use semaphore on the port struct so that serial8250_register_port() can't remove the port while it's still in use. I guess that means, at least on my system, that 8250_pnp will blow up during system startup. That's kinda ugly too as it'll prevent init from attaching a tty to the serial line.

Are you sure? Try running a system without that module -- I suspect it will work fine.

I can confirm that the serial console does work (without panic) when I remove the 8250_pnp driver. So what is the long term fix here? Changing serial8250_register_port() so that it will only operate on an unused port?

Not really sure. Please file this bug upstream at http://bugzilla.kernel.org and post the new URL here.

This is fixed in 2.6.22 rc kernels. I'm not going to backport the fix into 2.6.21 as it is fairly large, breaks speakup, is not an entirely common scenario and there is an easy workaround.

(Rest assured that this will be fixed as of 2.6.22. We'll be dropping speakup unless it gets fixed; we just cannot drop it in the middle of 2.6.21 development.)

It seems not to be fixed for hardened-sources (2.6.22-hardened-r8):

* udev loading module pcspkr
* udev loading module 8250_pnp
Unable to handle kernel NULL pointer dereference at 0000000000000014 RIP:
 [<ffffffff804001b8>] uart_write_room+0xb/0x19
PGD 118582067 PUD 11857f067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: 8250_pnp pcspkr k8temp sg nfs lockd sunrpc jfs dm_mirror dm_mod scsi_wait_scan sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 2306, comm: modprobe.sh Not tainted 2.6.22-hardened-r8 #1
RIP: 0010:[<ffffffff804001b8>] [<ffffffff804001b8>] uart_write_room+0xb/0x19
RSP: 0018:ffff810118581e50 EFLAGS: 00010202
RAX: ffff81011b32de00 RBX: 0000000000000025 RCX: ffff81011a773210
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff81011a773000
RBP: ffff81011a773000 R08: 0000000000000001 R09: ffff810118581e68
R10: 0000000000000286 R11: 0000000000000246 R12: ffff81011a035800
R13: 0000000000000025 R14: ffff810118bdb380 R15: ffff81011a035800
FS: 00002ba48bfafb00(0000) GS:ffffffff805d9000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000014 CR3: 000000011813e000 CR4: 00000000000006e0
Process modprobe.sh (pid: 2306, threadinfo ffff810118580000, task ffff8101184d8040)
Stack: ffffffff803ee713 0000000000000000 ffff8101184d8040 ffffffff80224ccc
 ffff81011a773210 ffff81011a773210 2222222222222222 2222222222222222
 0000000000000025 0000000000000025 ffff81011a773000 0000000000000025
Call Trace:
 [<ffffffff803ee713>] write_chan+0x102/0x2ff
 [<ffffffff80224ccc>] default_wake_function+0x0/0xe
 [<ffffffff803eba95>] tty_write+0x181/0x21c
 [<ffffffff803ee611>] write_chan+0x0/0x2ff
 [<ffffffff80279106>] vfs_write+0xad/0x136
 [<ffffffff80279643>] sys_write+0x45/0x6e
 [<ffffffff80209cde>] system_call+0x7e/0x83
Code: 8b 42 14 2b 42 10 ff c8 25 ff 0f 00 00 c3 48 8b 87 60 02 00
RIP [<ffffffff804001b8>] uart_write_room+0xb/0x19
 RSP <ffff810118581e50>
CR2: 0000000000000014

Unable to handle kernel NULL pointer dereference at 0000000000000011 RIP:
 [<0000000000000011>]
PGD 119485067 PUD 119484067 PMD 0
Oops: 0010 [2] SMP
CPU 1
Modules linked in: 8250_pnp pcspkr k8temp sg nfs lockd sunrpc jfs dm_mirror dm_mod scsi_wait_scan sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 0, comm: swapper Not tainted 2.6.22-hardened-r8 #1
RIP: 0010:[<0000000000000011>] [<0000000000000011>]
RSP: 0018:ffff81011b9a7e98 EFLAGS: 00010087
RAX: ffff81011a773210 RBX: ffff810100040001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff810118581e58
RBP: ffff81011b9a7ed0 R08: ffff810118581e70 R09: ffff81011b9cbe90
R10: ffff81011a79b380 R11: ffffffff80404575 R12: 0000000000000001
R13: 0000000000000056 R14: ffff81011a773208 R15: 0000000000000000
FS: 00002aca2a96eb00(0000) GS:ffff81011b977ac0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000011 CR3: 0000000119424000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81011b9a2000, task ffff81011b97d840)
Stack: ffffffff8022227f 0000000100000000 ffff81011a773208 0000000000000000
 0000000000000001 0000000000000282 0000000000000001 ffff81011b9a7f10
 ffffffff802227b4 ffffffff804052ad ffff81011907bae0 000000000000000a
Call Trace:
 <IRQ> [<ffffffff8022227f>] __wake_up_common+0x3e/0x68
 [<ffffffff802227b4>] __wake_up+0x38/0x4f
 [<ffffffff804052ad>] serial8250_interrupt+0x3f/0x121
 [<ffffffff8022e97a>] tasklet_action+0x53/0x9d
 [<ffffffff8022e8a4>] __do_softirq+0x55/0xc4
 [<ffffffff8020ae6c>] call_softirq+0x1c/0x28
 [<ffffffff8020c781>] do_softirq+0x2c/0x7d
 [<ffffffff8020c888>] do_IRQ+0xb6/0xd6
 [<ffffffff80208fde>] default_idle+0x0/0x3d
 [<ffffffff8020a1f1>] ret_from_intr+0x0/0xa
 <EOI> [<ffffffff80404575>] serial8250_start_tx+0x0/0xb9
 [<ffffffff80209007>] default_idle+0x29/0x3d
 [<ffffffff8020906c>] cpu_idle+0x51/0x70
Code: Bad RIP value.
RIP [<0000000000000011>]
 RSP <ffff81011b9a7e98>
CR2: 0000000000000011
Kernel panic - not syncing: Aiee, killing interrupt handler!
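For anyone trying to follow the call chains in the description, here is a small user-space C sketch of the failure mode and of the "refcounting needed?" idea raised in the thread. It is not kernel code and it is not the upstream 2.6.22 fix. Every model_* name, the open_count field and the 4096-byte buffer size are assumptions made up for illustration. The first oops (CR2 = 0x14 with RDX = 0) looks consistent with uart_write_room() reading a field at a small offset from a NULL state->info, which is the case the guarded model_write_room() below refuses to dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_XMIT_SIZE 4096 /* assumed power-of-two transmit buffer size */

/* Hypothetical stand-ins for the circular buffer, state->info and the port state. */
struct model_circ  { int head; int tail; };
struct model_info  { struct model_circ xmit; };
struct model_state { struct model_info *info; int open_count; };

/* Free space in a power-of-two ring buffer (one slot is always kept empty). */
static int model_chars_free(const struct model_circ *c)
{
    return (c->tail - (c->head + 1)) & (MODEL_XMIT_SIZE - 1);
}

/* Open path: allocate info on first open and count the user. */
static int model_open(struct model_state *st)
{
    if (st->info == NULL) {
        st->info = calloc(1, sizeof(*st->info));
        if (st->info == NULL)
            return -1;
    }
    st->open_count++;
    return 0;
}

static void model_close(struct model_state *st)
{
    if (st->open_count > 0)
        st->open_count--;
}

/* Write-room query that reports "no room" instead of dereferencing NULL. */
static int model_write_room(const struct model_state *st)
{
    if (st->info == NULL)
        return 0;
    return model_chars_free(&st->info->xmit);
}

/* Removal that refuses to tear down a port which still has users. */
static int model_remove_checked(struct model_state *st)
{
    if (st->open_count > 0)
        return -1; /* busy: the console still has the port open */
    free(st->info);
    st->info = NULL;
    return 0;
}

/* Walks the boot -> probe -> console-write sequence from the report. */
static int model_demo(void)
{
    struct model_state st = { NULL, 0 };

    if (model_open(&st) != 0)                   /* boot: console opens port */
        return -1;
    assert(model_write_room(&st) == MODEL_XMIT_SIZE - 1);
    assert(model_remove_checked(&st) == -1);    /* probe: removal refused */
    assert(st.info != NULL);                    /* console write stays safe */
    model_close(&st);
    assert(model_remove_checked(&st) == 0);     /* no users: removal allowed */
    assert(model_write_room(&st) == 0);         /* guarded, no NULL deref */
    return 0;
}
```

Calling model_demo() from a main() runs the whole sequence. In this model the probe-time removal simply fails while the console holds the port, which matches the reporter's concern that 8250_pnp would then fail to load on such a system; whether that trade-off is acceptable is exactly what the thread leaves open.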