Bug 178821 - gentoo-sources 2.6.19-r5 - 2.6.21 8250_pnp module oops' on load with console on serial port
Summary: gentoo-sources 2.6.19-r5 - 2.6.21 8250_pnp module oops' on load with console ...
Status: VERIFIED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High major
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: http://bugzilla.kernel.org/show_bug.c...
Whiteboard: linux-2.6.22
Keywords:
Depends on:
Blocks:
 
Reported: 2007-05-17 01:27 UTC by Joshua Hoblitt
Modified: 2007-11-02 02:55 UTC

Description Joshua Hoblitt 2007-05-17 01:27:16 UTC
When the kernel "console" is directed to a serial line, the kernel almost always oopses.  2.6.19 seemed to have a slightly better chance of not blowing up, while 2.6.21 seems to always fall over.  With all of these kernels the system boots fine without the console redirected to serial and then happily attaches a tty to the serial port (started by init).  I have tried this on two different motherboards (Tyan S2982 & S2927), at least 3 different kernels since 2.6.19, and with 3 different settings for the BIOS-level console -> serial port redirection (off, always, and until OS loads).  I've also tried this with 8250_pnp compiled in statically and it still oopses, although I believe the oops message was slightly different (I can recreate this if needed).  Since this oops is reproducible on two different systems, I'm assuming that it is either a code or a configuration problem.

Here is the grub line used:

kernel (hd0,0)/boot/kernel-genkernel-x86_64-2.6.21-gentoo root=/dev/ram0 real_root=/dev/sda3 init=/linuxrc console=tty0 console=ttyS0,115200n8 udev
initrd (hd0,0)/boot/initramfs-genkernel-x86_64-2.6.21-gentoo

Note that just one console line like "console=ttyS0,115200n8" will also oops.

Ironically, the system actually sends the oops message to the serial line:


 *   udev loading module 8250_pnp
 *   udev loading module 8250_pnp
Unable to handle kernel NULL pointer dereference at 0000000000000014 RIP: 
 [<ffffffff80293e82>] uart_write_room+0xb/0x19
PGD 223501067 PUD 22287a067 PMD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: 8250_pnp pcspkr forcedeth arcmsr sg tg3 e1000 nfs nfs_acl lockd sunrpc raid10 raid1 raid0 dm_mirror dm_mod sata_nv libata sbp2 ohci1394 ieee1394 sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 3346, comm: modprobe.sh Not tainted 2.6.21-gentoo #6
RIP: 0010:[<ffffffff80293e82>]  [<ffffffff80293e82>] uart_write_room+0xb/0x19
RSP: 0018:ffff810222f3de00  EFLAGS: 00010202
RAX: ffff810123eb7400 RBX: ffff810123cb7000 RCX: ffff810123cb7228
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff810123cb7000
RBP: 0000000000000020 R08: 0000000000000001 R09: ffff810222f3de58
R10: 0000000000000022 R11: 0000000000000246 R12: ffff8101281b8c00
R13: 0000000000000020 R14: 0000000000000020 R15: 0000000000000000
FS:  00002b599b677e10(0000) GS:ffff810123e2cd40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000014 CR3: 000000022327d000 CR4: 00000000000006e0
Process modprobe.sh (pid: 3346, threadinfo ffff810222f3c000, task ffff810223bb40c0)
Stack:  ffffffff801185cf ffff8101281b8c00 ffff81022389de80 0000000000000000
 ffff810223bb40c0 ffffffff8017a7ec 0000000000000000 0000000000000000
 fffffffffffffffb 0000000000000000 ffff810223bb40c0 ffffffff8017a7ec
Call Trace:
 [<ffffffff801185cf>] write_chan+0x14f/0x32d
 [<ffffffff8017a7ec>] default_wake_function+0x0/0xe
 [<ffffffff8017a7ec>] default_wake_function+0x0/0xe
 [<ffffffff801268d7>] tty_write+0x188/0x22c
 [<ffffffff80118480>] write_chan+0x0/0x32d
 [<ffffffff80115531>] vfs_write+0xaf/0x131
 [<ffffffff80115e9a>] sys_write+0x45/0x6e
 [<ffffffff8015911e>] system_call+0x7e/0x83


Code: 8b 42 14 2b 42 10 ff c8 25 ff 0f 00 00 c3 48 8b 87 78 02 00 
RIP  [<ffffffff80293e82>] uart_write_room+0xb/0x19
 RSP <ffff810222f3de00>
CR2: 0000000000000014
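
The fault address can be read straight off this dump. The Code: bytes decode to `mov 0x14(%rdx),%eax; sub 0x10(%rdx),%eax; dec %eax; and $0xfff,%eax; ret`, and RDX is 0 in the register dump, so the first load touches address 0x14, which is the value in CR2. A minimal user-space sketch of that arithmetic follows. It assumes an LP64 target and that the structure being dereferenced starts with a `tty` pointer followed by a circular buffer; `uart_info_sketch` and the `sketch_*` names are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <stddef.h>

/* 4096 matches the `and $0xfff` mask in the Code: bytes. */
#define SKETCH_XMIT_SIZE 4096

struct tty_struct;                 /* opaque; only a pointer to it is stored */

struct circ_buf {                  /* a buffer pointer plus head and tail indices */
    char *buf;
    int   head;
    int   tail;
};

/* ASSUMPTION: cut-down stand-in for the start of the dereferenced struct. */
struct uart_info_sketch {
    struct tty_struct *tty;
    struct circ_buf    xmit;
};

/* Free space in the ring, i.e. what the faulting instructions compute:
 * load tail, subtract head, decrement, mask. */
static int sketch_chars_free(const struct circ_buf *c)
{
    return (c->tail - c->head - 1) & (SKETCH_XMIT_SIZE - 1);
}

/* Address of the first field read when the base pointer is NULL. */
static size_t sketch_fault_address(void)
{
    return offsetof(struct uart_info_sketch, xmit)
         + offsetof(struct circ_buf, tail);
}
```

Under those assumptions `tail` sits at offset 0x14 and `head` at 0x10, matching the two displacements in the disassembly.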


Reproducible: Sometimes



Expected Results:  
Portage 2.1.2.2 (default-linux/amd64/2006.1, gcc-3.4.6, glibc-2.5-r0, 2.6.21-gentoo x86_64)
=================================================================
System uname: 2.6.21-gentoo x86_64 Dual-Core AMD Opteron(tm) Processor 2220
Gentoo Base System release 1.12.9
Timestamp of tree: Mon, 14 May 2007 21:30:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [disabled]
dev-java/java-config: 1.3.7, 2.0.31-r5
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     2.4-r6
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.15-r1
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=k8"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/init.d /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe -march=k8"
DISTDIR="/usr/portage/distfiles"
FEATURES="digest distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/src/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X acl acpi amd64 apache2 bash-completion berkdb bitmap-fonts cli cracklib crypt cups dri firefox fortran gdbm gnome gnome2 gnutls gpm gtk gtk2 iconv imap ipv6 isdnlog libg++ mbox midi ncurses nls nptl ntpl pam pcre perl pic ppds pppd python readline reflection samba session spl ssl sysfs tcpd truetype-fonts type1-fonts unicode vim-with-x xattr xinerama xorg zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="rage128"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Daniel Drake (RETIRED) gentoo-dev 2007-05-17 12:52:36 UTC
http://lkml.org/lkml/2006/10/3/393

During boot:
 uart_open()
 -> uart_get() // state->info allocated

During 8250_pnp serial_pnp_probe():
 serial8250_register_port()
 -> uart_remove_one_port() (state->info set to NULL, memory leaked)
 -> uart_add_one_port() (state->info not modified)

During serial console operation:
 tty_write()
 -> write_chan()
  -> uart_write_room() (dereferences state->info)

8250_pnp probably shouldn't be allowed to deregister ports while they are open (refcounting needed?)
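
The sequence above can be replayed in user space. This is a minimal sketch only: the `sketch_*` functions and the cut-down structs are hypothetical stand-ins for the serial core, and the write path returns -1 at the point where the real `uart_write_room()` would dereference the NULL pointer.

```c
#include <stdlib.h>

struct sketch_info  { int head, tail; };   /* stand-in for state->info */
struct sketch_state { struct sketch_info *info; int registered; };

/* Boot: opening the console tty allocates state->info. */
static int sketch_open(struct sketch_state *st)
{
    if (!st->info)
        st->info = calloc(1, sizeof(*st->info));
    return st->info ? 0 : -1;
}

/* 8250_pnp probe, step 1: the port is removed and the pointer is dropped.
 * As in the analysis above, the allocation itself is not freed here. */
static void sketch_remove_one_port(struct sketch_state *st)
{
    st->info = NULL;
    st->registered = 0;
}

/* 8250_pnp probe, step 2: the port is re-added; info is left untouched. */
static void sketch_add_one_port(struct sketch_state *st)
{
    st->registered = 1;
}

/* Console write: returns the free space in the ring, or -1 at the point
 * where the kernel would fault on the NULL info pointer. */
static int sketch_write_room(const struct sketch_state *st)
{
    if (!st->info)
        return -1;
    return (st->info->tail - st->info->head - 1) & 4095;
}
```

Calling `sketch_open()`, then `sketch_remove_one_port()` and `sketch_add_one_port()`, leaves the port registered but with `info` still NULL, so every later `sketch_write_room()` fails no matter how much time has passed.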
Comment 2 Joshua Hoblitt 2007-05-17 21:11:03 UTC
So I've thumbed through the code myself, not that I really understand it, and it seems to me that either serial8250_register_port() shouldn't remove a port that it _knows_ is already registered (causing 8250_pnp to fail), or serial_mutex needs to be acquired somewhere in the tty_write() call chain.

Where is uart_circ_chars_free() defined?
Comment 3 Daniel Drake (RETIRED) gentoo-dev 2007-05-17 23:21:09 UTC
(In reply to comment #2)
> So I've thumbed through the code myself, not that I really understand it, and it
> seems to me that either serial8250_register_port() shouldn't remove a port that
> it _knows_ is already registered

The function is specifically designed to remove already-registered ports.
The removal codepath assumes that all the necessary precautions have already been taken to ensure that there are no users. As you have seen, they haven't :)

> Where is uart_circ_chars_free() defined?

include/linux/serial_core.h


I don't see how acquiring serial_mutex would help. This isn't a race condition as such. Even if there were 5 minutes between 8250_pnp probe completing, and the next message being sent out of the serial console, the crash would still happen.
Comment 4 Joshua Hoblitt 2007-05-18 01:22:11 UTC
(In reply to comment #3)
> I don't see how acquiring serial_mutex would help. This isn't a race condition
> as such. Even if there were 5 minutes between 8250_pnp probe completing, and
> the next message being sent out of the serial console, the crash would still
> happen.

OK - then I think your original suggestion is correct: there needs to be some sort of ref count or in-use semaphore on the port struct so that serial8250_register_port() can't remove the port while it's still in use.  I guess that means, at least on my system, that 8250_pnp will blow up during system startup.  That's kinda ugly too, as it'll prevent init from attaching a tty to the serial line.
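
One way to picture the guard being discussed is a use count that the removal path checks before tearing anything down. The sketch below is purely illustrative: `sketch_port`, its `open_count` field and the `sketch_port_*` functions are hypothetical names, and this is not claimed to be the change that went into 2.6.22.

```c
#include <errno.h>

struct sketch_port {
    int open_count;   /* hypothetical: number of current users of the port */
    int registered;
};

static void sketch_port_open(struct sketch_port *p)
{
    p->open_count++;
}

static void sketch_port_close(struct sketch_port *p)
{
    if (p->open_count > 0)
        p->open_count--;
}

/* Refuse to tear down a port that still has users. */
static int sketch_port_remove(struct sketch_port *p)
{
    if (p->open_count > 0)
        return -EBUSY;
    p->registered = 0;
    return 0;
}
```

With a guard like this, a probe that tries to re-register a port held open by the console gets `-EBUSY` instead of leaving a dangling state behind, which is the trade-off noted above: the driver fails to load cleanly rather than crashing later.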
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2007-05-18 02:21:45 UTC
Are you sure? Try running a system without that module -- I suspect it will work fine.
Comment 6 Joshua Hoblitt 2007-05-19 00:44:02 UTC
I can confirm that the serial console does work (without panic) when I remove the 8250_pnp driver.

So what is the long term fix here?  Changing serial8250_register_port() so that it will only operate on an unused port?
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2007-05-24 00:53:04 UTC
Not really sure. Please file this bug upstream at http://bugzilla.kernel.org and post the new URL here.
Comment 8 Joshua Hoblitt 2007-05-30 00:54:47 UTC
http://bugzilla.kernel.org/show_bug.cgi?id=8552
Comment 9 Daniel Drake (RETIRED) gentoo-dev 2007-06-02 00:43:10 UTC
This is fixed in the 2.6.22-rc kernels. I'm not going to backport the fix into 2.6.21: it is fairly large, it breaks speakup, the scenario is not especially common, and there is an easy workaround. (Rest assured that this will be fixed as of 2.6.22. We'll be dropping speakup unless it gets fixed; we just cannot drop it in the middle of 2.6.21 development.)
Comment 10 Matthias Nagl 2007-11-02 02:55:22 UTC
It seems not to be fixed for hardened-sources (2.6.22-hardened-r8):

 *   udev loading module pcspkr
 *   udev loading module 8250_pnp
Unable to handle kernel NULL pointer dereference at 0000000000000014 RIP:
 [<ffffffff804001b8>] uart_write_room+0xb/0x19
PGD 118582067 PUD 11857f067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: 8250_pnp pcspkr k8temp sg nfs lockd sunrpc jfs dm_mirror dm_mod scsi_wait_scan sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 2306, comm: modprobe.sh Not tainted 2.6.22-hardened-r8 #1
RIP: 0010:[<ffffffff804001b8>]  [<ffffffff804001b8>] uart_write_room+0xb/0x19
RSP: 0018:ffff810118581e50  EFLAGS: 00010202
RAX: ffff81011b32de00 RBX: 0000000000000025 RCX: ffff81011a773210
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff81011a773000
RBP: ffff81011a773000 R08: 0000000000000001 R09: ffff810118581e68
R10: 0000000000000286 R11: 0000000000000246 R12: ffff81011a035800
R13: 0000000000000025 R14: ffff810118bdb380 R15: ffff81011a035800
FS:  00002ba48bfafb00(0000) GS:ffffffff805d9000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000014 CR3: 000000011813e000 CR4: 00000000000006e0
Process modprobe.sh (pid: 2306, threadinfo ffff810118580000, task ffff8101184d8040)
Stack:  ffffffff803ee713 0000000000000000 ffff8101184d8040 ffffffff80224ccc
 ffff81011a773210 ffff81011a773210 2222222222222222 2222222222222222
 0000000000000025 0000000000000025 ffff81011a773000 0000000000000025
Call Trace:
 [<ffffffff803ee713>] write_chan+0x102/0x2ff
 [<ffffffff80224ccc>] default_wake_function+0x0/0xe
 [<ffffffff803eba95>] tty_write+0x181/0x21c
 [<ffffffff803ee611>] write_chan+0x0/0x2ff
 [<ffffffff80279106>] vfs_write+0xad/0x136
 [<ffffffff80279643>] sys_write+0x45/0x6e
 [<ffffffff80209cde>] system_call+0x7e/0x83


Code: 8b 42 14 2b 42 10 ff c8 25 ff 0f 00 00 c3 48 8b 87 60 02 00
RIP  [<ffffffff804001b8>] uart_write_room+0xb/0x19
 RSP <ffff810118581e50>
CR2: 0000000000000014

Unable to handle kernel NULL pointer dereference at 0000000000000011 RIP:
 [<0000000000000011>]
PGD 119485067 PUD 119484067 PMD 0
Oops: 0010 [2] SMP
CPU 1
Modules linked in: 8250_pnp pcspkr k8temp sg nfs lockd sunrpc jfs dm_mirror dm_mod scsi_wait_scan sl811_hcd usbhid ff_memless ohci_hcd uhci_hcd usb_storage ehci_hcd usbcore
Pid: 0, comm: swapper Not tainted 2.6.22-hardened-r8 #1
RIP: 0010:[<0000000000000011>]  [<0000000000000011>]
RSP: 0018:ffff81011b9a7e98  EFLAGS: 00010087
RAX: ffff81011a773210 RBX: ffff810100040001 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff810118581e58
RBP: ffff81011b9a7ed0 R08: ffff810118581e70 R09: ffff81011b9cbe90
R10: ffff81011a79b380 R11: ffffffff80404575 R12: 0000000000000001
R13: 0000000000000056 R14: ffff81011a773208 R15: 0000000000000000
FS:  00002aca2a96eb00(0000) GS:ffff81011b977ac0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000011 CR3: 0000000119424000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff81011b9a2000, task ffff81011b97d840)
Stack:  ffffffff8022227f 0000000100000000 ffff81011a773208 0000000000000000
 0000000000000001 0000000000000282 0000000000000001 ffff81011b9a7f10
 ffffffff802227b4 ffffffff804052ad ffff81011907bae0 000000000000000a
Call Trace:
 <IRQ>  [<ffffffff8022227f>] __wake_up_common+0x3e/0x68
 [<ffffffff802227b4>] __wake_up+0x38/0x4f
 [<ffffffff804052ad>] serial8250_interrupt+0x3f/0x121
 [<ffffffff8022e97a>] tasklet_action+0x53/0x9d
 [<ffffffff8022e8a4>] __do_softirq+0x55/0xc4
 [<ffffffff8020ae6c>] call_softirq+0x1c/0x28
 [<ffffffff8020c781>] do_softirq+0x2c/0x7d
 [<ffffffff8020c888>] do_IRQ+0xb6/0xd6
 [<ffffffff80208fde>] default_idle+0x0/0x3d
 [<ffffffff8020a1f1>] ret_from_intr+0x0/0xa
 <EOI>  [<ffffffff80404575>] serial8250_start_tx+0x0/0xb9
 [<ffffffff80209007>] default_idle+0x29/0x3d
 [<ffffffff8020906c>] cpu_idle+0x51/0x70


Code:  Bad RIP value.
RIP  [<0000000000000011>]
 RSP <ffff81011b9a7e98>
CR2: 0000000000000011
Kernel panic - not syncing: Aiee, killing interrupt handler!