When directing the kernel "console" to a serial line, the kernel almost always oopses. 2.6.19 seemed to have a slightly better chance of not blowing up, while 2.6.21 seems to always fall over. With all of these kernels the system boots fine without the console redirected to serial, and then happily attaches a tty to the serial port (started by init).

I have tried this on two different motherboards (Tyan S2982 and S2927), with at least 3 different kernels since 2.6.19, and with 3 different settings for the BIOS-level console-to-serial-port redirection (off, always, and until OS loads). I've also tried this with 8250_pnp compiled in statically and it still oopses, although I believe the oops message was slightly different (I can recreate this if needed). Since this oops is reproducible on two different systems, I'm assuming that it is either a code or a configuration problem.

Here are the grub lines used:

kernel (hd0,0)/boot/kernel-genkernel-x86_64-2.6.21-gentoo root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc console=tty0 console=ttyS0,115200n8 udev
initrd (hd0,0)/boot/initramfs-genkernel-x86_64-2.6.21-gentoo

Note that a single console option such as "console=ttyS0,115200n8" will also oops.
Ironically, the system actually sends the oops message to the serial line:

 * udev loading module 8250_pnp
 * udev loading module 8250_pnp
Unable to handle kernel NULL pointer dereference at 0000000000000014 RIP:
 [<ffffffff80293e82>] uart_write_room+0xb/0x19
PGD 223501067 PUD 22287a067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: 8250_pnp pcspkr forcedeth arcmsr sg tg3 e1000 nfs nfs_acl lockd sunrpc raid10 raid1 raid0 dm_mirror dm_mod sata_nv libata sbp2 ohci1394 ieee1394 sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 3346, comm: modprobe.sh Not tainted 2.6.21-gentoo #6
RIP: 0010:[<ffffffff80293e82>]  [<ffffffff80293e82>] uart_write_room+0xb/0x19
RSP: 0018:ffff810222f3de00  EFLAGS: 00010202
RAX: ffff810123eb7400 RBX: ffff810123cb7000 RCX: ffff810123cb7228
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff810123cb7000
RBP: 0000000000000020 R08: 0000000000000001 R09: ffff810222f3de58
R10: 0000000000000022 R11: 0000000000000246 R12: ffff8101281b8c00
R13: 0000000000000020 R14: 0000000000000020 R15: 0000000000000000
FS:  00002b599b677e10(0000) GS:ffff810123e2cd40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000014 CR3: 000000022327d000 CR4: 00000000000006e0
Process modprobe.sh (pid: 3346, threadinfo ffff810222f3c000, task ffff810223bb40c0)
Stack:  ffffffff801185cf ffff8101281b8c00 ffff81022389de80 0000000000000000
 ffff810223bb40c0 ffffffff8017a7ec 0000000000000000 0000000000000000
 fffffffffffffffb 0000000000000000 ffff810223bb40c0 ffffffff8017a7ec
Call Trace:
 [<ffffffff801185cf>] write_chan+0x14f/0x32d
 [<ffffffff8017a7ec>] default_wake_function+0x0/0xe
 [<ffffffff8017a7ec>] default_wake_function+0x0/0xe
 [<ffffffff801268d7>] tty_write+0x188/0x22c
 [<ffffffff80118480>] write_chan+0x0/0x32d
 [<ffffffff80115531>] vfs_write+0xaf/0x131
 [<ffffffff80115e9a>] sys_write+0x45/0x6e
 [<ffffffff8015911e>] system_call+0x7e/0x83

Code: 8b 42 14 2b 42 10 ff c8 25 ff 0f 00 00 c3 48 8b 87 78 02 00
RIP  [<ffffffff80293e82>] uart_write_room+0xb/0x19
 RSP <ffff810222f3de00>
CR2: 0000000000000014

Reproducible: Sometimes

Expected Results:

Portage 2.1.2.2 (default-linux/amd64/2006.1, gcc-3.4.6, glibc-2.5-r0, 2.6.21-gentoo x86_64)
=================================================================
System uname: 2.6.21-gentoo x86_64 Dual-Core AMD Opteron(tm) Processor 2220
Gentoo Base System release 1.12.9
Timestamp of tree: Mon, 14 May 2007 21:30:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [disabled]
dev-java/java-config: 1.3.7, 2.0.31-r5
dev-lang/python: 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.4-r6
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.15-r1
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=k8"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/init.d /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe -march=k8"
DISTDIR="/usr/portage/distfiles"
FEATURES="digest distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/src/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl acpi amd64 apache2 bash-completion berkdb bitmap-fonts cli cracklib crypt cups dri firefox fortran gdbm gnome gnome2 gnutls gpm gtk gtk2 iconv imap ipv6 isdnlog libg++ mbox midi ncurses nls nptl ntpl pam pcre perl pic ppds pppd python readline reflection samba session spl ssl sysfs tcpd truetype-fonts type1-fonts unicode vim-with-x xattr xinerama xorg zlib"
ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci"
ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol"
ELIBC="glibc"
INPUT_DEVICES="keyboard mouse"
KERNEL="linux"
LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text"
USERLAND="GNU"
VIDEO_CARDS="rage128"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
http://lkml.org/lkml/2006/10/3/393

During boot:
  uart_open() -> uart_get()   // state->info allocated

During 8250_pnp serial_pnp_probe():
  serial8250_register_port()
    -> uart_remove_one_port()   (state->info set to NULL, memory leaked)
    -> uart_add_one_port()      (state->info not modified)

During serial console operation:
  tty_write() -> write_chan() -> uart_write_room()   (dereferences state->info)

8250_pnp probably shouldn't be allowed to deregister ports while they are open (refcounting needed?)
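The three steps above can be modelled in userspace to see why the crash does not depend on timing. This is a minimal sketch built on hypothetical, heavily simplified stand-ins for struct uart_state and struct uart_info, plus hypothetical model_* functions; it is not the real serial_core code. It only reproduces what the analysis lists: allocate on open, clear (and leak) on re-register, leave untouched on re-add, and it reports -1 where uart_write_room() would dereference NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the serial_core structures. */
struct circ_buf   { char *buf; int head; int tail; };
struct uart_info  { void *tty; struct circ_buf xmit; };
struct uart_state { struct uart_info *info; int count; /* number of openers */ };

/* Models uart_open() -> uart_get(): allocate info on first open. */
static void model_open(struct uart_state *state)
{
	assert(state != NULL);
	if (state->info == NULL)
		state->info = calloc(1, sizeof(*state->info));
	state->count++;
}

/* Models uart_remove_one_port() as described in the analysis:
 * info is set to NULL without being freed; the open count is untouched. */
static void model_remove_one_port(struct uart_state *state)
{
	state->info = NULL;
}

/* Models uart_add_one_port(): state->info is not modified. */
static void model_add_one_port(struct uart_state *state)
{
	(void)state;
}

/* Models serial8250_register_port() replacing an already-registered line,
 * as happens during the 8250_pnp probe. */
static void model_register_port(struct uart_state *state)
{
	model_remove_one_port(state);
	model_add_one_port(state);
}

/* The real uart_write_room() dereferences state->info unconditionally.
 * This model returns -1 where that dereference would fault, 0 otherwise. */
static int model_write_room(const struct uart_state *state)
{
	return state->info == NULL ? -1 : 0;
}
```

Calling model_open(), then model_register_port(), then model_write_room() on the same state yields -1 while the open count is still 1: the console still holds the port open, but the per-open state it writes through is gone, no matter how long after the probe the next write happens.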
So I've thumbed through the code myself, not that I really understand it, and it seems to me that either serial8250_register_port() shouldn't remove a port that it _knows_ is already registered (causing 8250_pnp to fail), or serial_mutex needs to be acquired somewhere in the tty_write() call chain. Where is uart_circ_chars_free() defined?
(In reply to comment #2)
> So I'm thumbed through the code myself, not that I really understand it, and it
> seems to me that either serial8250_register_port() shouldn't remove a port that
> it _knows_ is already registered

The function is specifically designed to remove already-registered ports. The removal codepath assumes that it has taken all the necessary precautions to ensure that there are no users. As you have seen, it hasn't :)

> Where is uart_circ_chars_free() defined?

include/linux/serial_core.h

I don't see how acquiring serial_mutex would help. This isn't a race condition as such. Even if there were 5 minutes between the 8250_pnp probe completing and the next message being sent out of the serial console, the crash would still happen.
(In reply to comment #3)
> I don't see how acquiring serial_mutex would help. This isn't a race condition
> as such. Even if there were 5 minutes between 8250_pnp probe completing, and
> the next message being sent out of the serial console, the crash would still
> happen.

OK, then I think your original suggestion is correct: there needs to be some sort of refcount or in-use semaphore on the port struct so that serial8250_register_port() can't remove the port while it's still in use. I guess that means, at least on my system, that 8250_pnp will blow up during system startup. That's kind of ugly too, as it'll prevent init from attaching a tty to the serial line.
Are you sure? Try running a system without that module -- I suspect it will work fine.
I can confirm that the serial console does work (without panic) when I remove the 8250_pnp driver. So what is the long term fix here? Changing serial8250_register_port() so that it will only operate on an unused port?
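One way to express "only operate on an unused port" is a guard on the open count. The sketch below is hypothetical: model_register_port_guarded() and the simplified structs are illustrative names, not kernel API, and this is not the change that went into 2.6.22, which this thread does not describe.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the serial_core structures. */
struct uart_info  { int unused; };
struct uart_state { struct uart_info *info; int count; /* number of openers */ };

/* Hypothetical guard: refuse to replace a port that still has openers
 * (for example the serial console); otherwise tear it down cleanly. */
static int model_register_port_guarded(struct uart_state *state)
{
	assert(state != NULL);
	if (state->count > 0)
		return -EBUSY;      /* in use: leave state->info alone */

	/* No users: free rather than leak, then the port can be re-added. */
	free(state->info);
	state->info = NULL;
	return 0;
}
```

Returning -EBUSY means the 8250_pnp probe would fail for a port held open by the console, as suggested in comment #2, instead of pulling state->info out from under an active writer; freeing instead of merely clearing the pointer also avoids the leak noted in the analysis.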
Not really sure. Please file this bug upstream at http://bugzilla.kernel.org and post the new URL here.
http://bugzilla.kernel.org/show_bug.cgi?id=8552
This is fixed in 2.6.22-rc kernels. I'm not going to backport the fix into 2.6.21: it is fairly large, it breaks speakup, the scenario is not very common, and there is an easy workaround. (Rest assured that this will be fixed as of 2.6.22; we'll be dropping speakup unless it gets fixed, we just cannot drop it in the middle of 2.6.21 development.)
It seems not to be fixed for hardened-sources (2.6.22-hardened-r8):

 * udev loading module pcspkr
 * udev loading module 8250_pnp
Unable to handle kernel NULL pointer dereference at 0000000000000014 RIP:
 [<ffffffff804001b8>] uart_write_room+0xb/0x19
PGD 118582067 PUD 11857f067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: 8250_pnp pcspkr k8temp sg nfs lockd sunrpc jfs dm_mirror dm_mod scsi_wait_scan sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 2306, comm: modprobe.sh Not tainted 2.6.22-hardened-r8 #1
RIP: 0010:[<ffffffff804001b8>]  [<ffffffff804001b8>] uart_write_room+0xb/0x19
RSP: 0018:ffff810118581e50  EFLAGS: 00010202
RAX: ffff81011b32de00 RBX: 0000000000000025 RCX: ffff81011a773210
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff81011a773000
RBP: ffff81011a773000 R08: 0000000000000001 R09: ffff810118581e68
R10: 0000000000000286 R11: 0000000000000246 R12: ffff81011a035800
R13: 0000000000000025 R14: ffff810118bdb380 R15: ffff81011a035800
FS:  00002ba48bfafb00(0000) GS:ffffffff805d9000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000014 CR3: 000000011813e000 CR4: 00000000000006e0
Process modprobe.sh (pid: 2306, threadinfo ffff810118580000, task ffff8101184d8040)
Stack:  ffffffff803ee713 0000000000000000 ffff8101184d8040 ffffffff80224ccc
 ffff81011a773210 ffff81011a773210 2222222222222222 2222222222222222
 0000000000000025 0000000000000025 ffff81011a773000 0000000000000025
Call Trace:
 [<ffffffff803ee713>] write_chan+0x102/0x2ff
 [<ffffffff80224ccc>] default_wake_function+0x0/0xe
 [<ffffffff803eba95>] tty_write+0x181/0x21c
 [<ffffffff803ee611>] write_chan+0x0/0x2ff
 [<ffffffff80279106>] vfs_write+0xad/0x136
 [<ffffffff80279643>] sys_write+0x45/0x6e
 [<ffffffff80209cde>] system_call+0x7e/0x83

Code: 8b 42 14 2b 42 10 ff c8 25 ff 0f 00 00 c3 48 8b 87 60 02 00
RIP  [<ffffffff804001b8>] uart_write_room+0xb/0x19
 RSP <ffff810118581e50>
CR2: 0000000000000014
Unable to handle kernel NULL pointer dereference at 0000000000000011 RIP:
 [<0000000000000011>]
PGD 119485067 PUD 119484067 PMD 0
Oops: 0010 [2] SMP
CPU 1
Modules linked in: 8250_pnp pcspkr k8temp sg nfs lockd sunrpc jfs dm_mirror dm_mod scsi_wait_scan sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 0, comm: swapper Not tainted 2.6.22-hardened-r8 #1
RIP: 0010:[<0000000000000011>]  [<0000000000000011>]
RSP: 0018:ffff81011b9a7e98  EFLAGS: 00010087
RAX: ffff81011a773210 RBX: ffff810100040001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff810118581e58
RBP: ffff81011b9a7ed0 R08: ffff810118581e70 R09: ffff81011b9cbe90
R10: ffff81011a79b380 R11: ffffffff80404575 R12: 0000000000000001
R13: 0000000000000056 R14: ffff81011a773208 R15: 0000000000000000
FS:  00002aca2a96eb00(0000) GS:ffff81011b977ac0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000011 CR3: 0000000119424000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81011b9a2000, task ffff81011b97d840)
Stack:  ffffffff8022227f 0000000100000000 ffff81011a773208 0000000000000000
 0000000000000001 0000000000000282 0000000000000001 ffff81011b9a7f10
 ffffffff802227b4 ffffffff804052ad ffff81011907bae0 000000000000000a
Call Trace:
 <IRQ>  [<ffffffff8022227f>] __wake_up_common+0x3e/0x68
 [<ffffffff802227b4>] __wake_up+0x38/0x4f
 [<ffffffff804052ad>] serial8250_interrupt+0x3f/0x121
 [<ffffffff8022e97a>] tasklet_action+0x53/0x9d
 [<ffffffff8022e8a4>] __do_softirq+0x55/0xc4
 [<ffffffff8020ae6c>] call_softirq+0x1c/0x28
 [<ffffffff8020c781>] do_softirq+0x2c/0x7d
 [<ffffffff8020c888>] do_IRQ+0xb6/0xd6
 [<ffffffff80208fde>] default_idle+0x0/0x3d
 [<ffffffff8020a1f1>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80404575>] serial8250_start_tx+0x0/0xb9
 [<ffffffff80209007>] default_idle+0x29/0x3d
 [<ffffffff8020906c>] cpu_idle+0x51/0x70

Code:  Bad RIP value.
RIP  [<0000000000000011>]
 RSP <ffff81011b9a7e98>
CR2: 0000000000000011
Kernel panic - not syncing: Aiee, killing interrupt handler!