| Summary: | soft lockup detected on CPU#0! (ahci\|ata\|usb_hcd) on Asus P5W DH Deluxe | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Bjoern Olausson <contactme> |
| Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
| Status: | RESOLVED OBSOLETE | ||
| Severity: | critical | ||
| Priority: | High | ||
| Version: | unspecified | ||
| Hardware: | AMD64 | ||
| OS: | Linux | ||
| URL: | http://bugzilla.kernel.org/show_bug.cgi?id=8259 | ||
| Whiteboard: | watch-linux-bugzilla | ||
| Package list: | | Runtime testing required: | --- |
|
Description
Bjoern Olausson
2007-03-16 19:46:03 UTC
Softlock Nr.4 2007.03.13 19:00:14
BUG: soft lockup detected on CPU#0!
Call Trace:
<IRQ>
[<ffffffff80254eb0>] softlockup_tick+0xda/0xf5
[<ffffffff8023a7da>] update_process_times+0x42/0x68
[<ffffffff80216bb1>] smp_local_timer_interrupt+0x34/0x52
[<ffffffff80217285>] smp_apic_timer_interrupt+0x44/0x5f
[<ffffffff8020a016>] apic_timer_interrupt+0x66/0x70
[<ffffffff80255171>] handle_IRQ_event+0x1a/0x53
[<ffffffff802562e3>] handle_edge_irq+0xe4/0x128
[<ffffffff8020ba39>] do_IRQ+0xf1/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff8025a992>] pfn_to_page+0x2e/0x36
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff802777fc>] kmem_cache_free+0x40/0x1b0
[<ffffffff880dcdbd>] :sky2:sky2_tx_complete+0xc9/0x134
[<ffffffff880deb8d>] :sky2:sky2_poll+0x76f/0x920
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff804d8b12>] net_rx_action+0xae/0x1b7
[<ffffffff80236a2c>] __do_softirq+0x49/0xb8
[<ffffffff8020a56c>] call_softirq+0x1c/0x28
[<ffffffff8020b8f7>] do_softirq+0x2c/0x7d
[<ffffffff802369d7>] irq_exit+0x36/0x42
[<ffffffff8020ba85>] do_IRQ+0x13d/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
<EOI>
irq 1275: nobody cared (try booting with the "irqpoll" option)
Call Trace:
<IRQ>
[<ffffffff80255b53>] __report_bad_irq+0x30/0x72
[<ffffffff80255d53>] note_interrupt+0x1be/0x203
[<ffffffff802562f8>] handle_edge_irq+0xf9/0x128
[<ffffffff8020ba39>] do_IRQ+0xf1/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff8025a992>] pfn_to_page+0x2e/0x36
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff802777fc>] kmem_cache_free+0x40/0x1b0
[<ffffffff880dcdbd>] :sky2:sky2_tx_complete+0xc9/0x134
[<ffffffff880deb8d>] :sky2:sky2_poll+0x76f/0x920
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff804d8b12>] net_rx_action+0xae/0x1b7
[<ffffffff80236a2c>] __do_softirq+0x49/0xb8
[<ffffffff8020a56c>] call_softirq+0x1c/0x28
[<ffffffff8020b8f7>] do_softirq+0x2c/0x7d
[<ffffffff802369d7>] irq_exit+0x36/0x42
[<ffffffff8020ba85>] do_IRQ+0x13d/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
<EOI>
handlers:
[<ffffffff8045b039>] (ahci_interrupt+0x0/0x45a)
Disabling IRQ #1275

Softlock Nr.5 2007.03.16 17:00:14
BUG: soft lockup detected on CPU#0!
Call Trace:
<IRQ>
[<ffffffff80254eb0>] softlockup_tick+0xda/0xf5
[<ffffffff8023a7da>] update_process_times+0x42/0x68
[<ffffffff80216bb1>] smp_local_timer_interrupt+0x34/0x52
[<ffffffff80217285>] smp_apic_timer_interrupt+0x44/0x5f
[<ffffffff8020a016>] apic_timer_interrupt+0x66/0x70
[<ffffffff80255171>] handle_IRQ_event+0x1a/0x53
[<ffffffff802562e3>] handle_edge_irq+0xe4/0x128
[<ffffffff8020ba39>] do_IRQ+0xf1/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff804cc8a0>] pci_conf1_read+0x0/0xc6
[<ffffffff88128967>] :nvidia:_nv003350rm+0xf/0x10
[<ffffffff8830f5af>] :nvidia:_nv007025rm+0x7d/0xb0
[<ffffffff883fb35d>] :nvidia:_nv001501rm+0x29/0xfe
[<ffffffff883f9bf6>] :nvidia:_nv001505rm+0x2e/0x4e
[<ffffffff883f9dd1>] :nvidia:_nv001511rm+0x39/0x52
[<ffffffff8824b21f>] :nvidia:_nv005982rm+0x3b/0x108
[<ffffffff8841bb89>] :nvidia:_nv009292rm+0xe9/0x658
[<ffffffff88115032>] :nvidia:_nv003618rm+0xe/0xdc
[<ffffffff882623c8>] :nvidia:_nv009294rm+0x50/0x64
[<ffffffff8838a555>] :nvidia:_nv004932rm+0x165/0x4fa
[<ffffffff8838097f>] :nvidia:_nv004943rm+0x8b/0xd2
[<ffffffff8812afaf>] :nvidia:_nv002554rm+0x99/0xbe
[<ffffffff8812fde1>] :nvidia:rm_isr_bh+0x53/0x56
[<ffffffff88433377>] :nvidia:nv_kern_isr_bh+0x16/0x18
[<ffffffff80236af8>] tasklet_action+0x53/0x9d
[<ffffffff80236a2c>] __do_softirq+0x49/0xb8
[<ffffffff8020a56c>] call_softirq+0x1c/0x28
[<ffffffff8020b8f7>] do_softirq+0x2c/0x7d
[<ffffffff802369d7>] irq_exit+0x36/0x42
[<ffffffff8020ba85>] do_IRQ+0x13d/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
<EOI>
irq 14: nobody cared (try booting with the "irqpoll" option)
Call Trace:
<IRQ>
[<ffffffff80255b53>] __report_bad_irq+0x30/0x72
[<ffffffff80255d53>] note_interrupt+0x1be/0x203
[<ffffffff802562f8>] handle_edge_irq+0xf9/0x128
[<ffffffff8020ba39>] do_IRQ+0xf1/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff804cc8a0>] pci_conf1_read+0x0/0xc6
[<ffffffff88128967>] :nvidia:_nv003350rm+0xf/0x10
[<ffffffff8830f5af>] :nvidia:_nv007025rm+0x7d/0xb0
[<ffffffff883fb35d>] :nvidia:_nv001501rm+0x29/0xfe
[<ffffffff883f9bf6>] :nvidia:_nv001505rm+0x2e/0x4e
[<ffffffff883f9dd1>] :nvidia:_nv001511rm+0x39/0x52
[<ffffffff8824b21f>] :nvidia:_nv005982rm+0x3b/0x108
[<ffffffff8841bb89>] :nvidia:_nv009292rm+0xe9/0x658
[<ffffffff88115032>] :nvidia:_nv003618rm+0xe/0xdc
[<ffffffff882623c8>] :nvidia:_nv009294rm+0x50/0x64
[<ffffffff8838a555>] :nvidia:_nv004932rm+0x165/0x4fa
[<ffffffff8838097f>] :nvidia:_nv004943rm+0x8b/0xd2
[<ffffffff8812afaf>] :nvidia:_nv002554rm+0x99/0xbe
[<ffffffff8812fde1>] :nvidia:rm_isr_bh+0x53/0x56
[<ffffffff88433377>] :nvidia:nv_kern_isr_bh+0x16/0x18
[<ffffffff80236af8>] tasklet_action+0x53/0x9d
[<ffffffff80236a2c>] __do_softirq+0x49/0xb8
[<ffffffff8020a56c>] call_softirq+0x1c/0x28
[<ffffffff8020b8f7>] do_softirq+0x2c/0x7d
[<ffffffff802369d7>] irq_exit+0x36/0x42
[<ffffffff8020ba85>] do_IRQ+0x13d/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
<EOI>
handlers:
[<ffffffff80452fcc>] (ata_interrupt+0x0/0x206)
Disabling IRQ #14

Softlock Nr.6 2007.03.16 20:16:05
BUG: soft lockup detected on CPU#0!
Call Trace:
<IRQ>
[<ffffffff80254eb0>] softlockup_tick+0xda/0xf5
[<ffffffff8023a7da>] update_process_times+0x42/0x68
[<ffffffff80216bb1>] smp_local_timer_interrupt+0x34/0x52
[<ffffffff80217285>] smp_apic_timer_interrupt+0x44/0x5f
[<ffffffff8020a016>] apic_timer_interrupt+0x66/0x70
[<ffffffff80255171>] handle_IRQ_event+0x1a/0x53
[<ffffffff802562e3>] handle_edge_irq+0xe4/0x128
[<ffffffff8020ba39>] do_IRQ+0xf1/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff80259b9e>] mempool_free_slab+0x0/0xe
[<ffffffff80435d47>] scsi_put_command+0x48/0x61
[<ffffffff80439370>] scsi_next_command+0x25/0x39
[<ffffffff8043959a>] scsi_end_request+0xbb/0xc9
[<ffffffff804396e5>] scsi_io_completion+0xec/0x2c9
[<ffffffff8044798e>] sd_rw_intr+0x188/0x1b3
[<ffffffff80580f1e>] _spin_unlock_irqrestore+0x16/0x31
[<ffffffff80399dcc>] blk_done_softirq+0x5c/0x6a
[<ffffffff80236a2c>] __do_softirq+0x49/0xb8
[<ffffffff8020a56c>] call_softirq+0x1c/0x28
[<ffffffff8020b8f7>] do_softirq+0x2c/0x7d
[<ffffffff802369d7>] irq_exit+0x36/0x42
[<ffffffff8020ba85>] do_IRQ+0x13d/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
<EOI>
irq 1275: nobody cared (try booting with the "irqpoll" option)
Call Trace:
<IRQ>
[<ffffffff80255b53>] __report_bad_irq+0x30/0x72
[<ffffffff80255d53>] note_interrupt+0x1be/0x203
[<ffffffff802562f8>] handle_edge_irq+0xf9/0x128
[<ffffffff8020ba39>] do_IRQ+0xf1/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
[<ffffffff80259b9e>] mempool_free_slab+0x0/0xe
[<ffffffff80435d47>] scsi_put_command+0x48/0x61
[<ffffffff80439370>] scsi_next_command+0x25/0x39
[<ffffffff8043959a>] scsi_end_request+0xbb/0xc9
[<ffffffff804396e5>] scsi_io_completion+0xec/0x2c9
[<ffffffff8044798e>] sd_rw_intr+0x188/0x1b3
[<ffffffff80580f1e>] _spin_unlock_irqrestore+0x16/0x31
[<ffffffff80399dcc>] blk_done_softirq+0x5c/0x6a
[<ffffffff80236a2c>] __do_softirq+0x49/0xb8
[<ffffffff8020a56c>] call_softirq+0x1c/0x28
[<ffffffff8020b8f7>] do_softirq+0x2c/0x7d
[<ffffffff802369d7>] irq_exit+0x36/0x42
[<ffffffff8020ba85>] do_IRQ+0x13d/0x160
[<ffffffff80209931>] ret_from_intr+0x0/0xa
<EOI>
handlers:
[<ffffffff8045b039>] (ahci_interrupt+0x0/0x45a)
Disabling IRQ #1275

Erm, don't you think that telling us which kernel version(s) are affected would be important here?

Sorry, I had to remove the "emerge --info" because the comment was too long, and then I simply forgot it. I am currently using vanilla 2.6.21-rc1, which fixes a bug with my NIC. All kernel versions from 2.6.19 to 2.6.21-rc1 are affected.

Here is emerge --info:
Portage 2.1.2.2 (default-linux/amd64/2006.1, gcc-4.1.1, glibc-2.5-r0, 2.6.21-rc1 x86_64)
=================================================================
System uname: 2.6.21-rc1 x86_64 Intel(R) Core(TM)2 CPU 6600 @ 2.40GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Fri, 16 Mar 2007 17:50:01 +0000
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python: 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.61
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="ja"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-march=nocona -O2 -pipe -fomit-frame-pointer"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict"
GENTOO_MIRRORS="ftp://ftp.belnet.be/mirror/rsync.gentoo.org/gentoo/ ftp://ftp.easynet.nl/mirror/gentoo/ http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LANG="de_DE.utf8"
LC_ALL="de_DE.utf8"
LINGUAS="de sv"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/usr/local/portage /usr/portage/local/layman/xeffects" SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage" USE="X a52 aac aalib aiglx alsa amd64 ares asf bash-completion berkdb bitmap-fonts bittorrent bluetooth bzip2 cairo cdparanoia cli connectionstatus cpudetection cracklib crypt css cups curl dbus dga divx4linux dri dts dvd dvdr dvdread edl emovix encode exif fam fbcon ffmpeg flac fortran gdbm gif gimp glitz gnutls gpm gtk gtk2 hal highlight history iconv imagemagick isdnlog java jpeg jpeg2k kde libg++ lirc live lm_sensors logitech-mouse lzo mad madwifi matroska metalink midi modplug mp3 musepack musicbrainz mythtv ncurses network nfs nls nptl nptlonly nsplugin nvidia ogg openal opengl pam pam_console pcre pda pdf perl png ppds pppd python qt3 quicktime readline reflection rtc samba scanner sdl session sndfile spell spl ssl svg tcltk tcpd theora tiff tk transcode transparency truetype truetype-fonts type1-fonts unicode usb utempter v4l v4l2 vcd vorbis xine xinerama xml xorg xvid yahoo zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de sv" LIRC_DEVICES="dvico" USERLAND="GNU" VIDEO_CARDS="nvidia vesa fbdev" Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS Reopen... I tried without ALPE and ASP but I still get softlocks. 
After I disabled ALPE and ASP I got a softlock targeting
[<ffffffff80487893>] (usb_hcd_irq+0x0/0x52)
Disabling IRQ #18
I'll see whether the softlocks targeting ahci and ata still occur after turning ALPE and ASP off.

regards
Bjoern

Nothing changed when turning ALPE and ASP off. So now I am in a catch-22. Any ideas?

regards
Bjoern

Please reproduce this without the binary nvidia module loaded (and make sure it is not loaded at any point during that boot). Also post /proc/interrupts before and after the soft lockup (if possible).

I switched to the latest kernel, 2.6.21-rc5, and now it's hard to reproduce. I switched to that kernel because I noticed some changes to the JMicron driver in the changelog. The system now locks up completely, so I can't access any logs, nor can I read any error message. This happens both with and without the nvidia driver, but I can't guarantee that it's the same error. Should I hunt the bug with the latest kernel, or should I switch back to the one where the system recovered after a lockup? Do you have any ideas how I can get error messages other than sitting in front of the monitor and staring at the TTY where the error will be dumped? I'll investigate further and post my results here. Suggestions on how to get debug output from such a crash would be nice.

regards
blubbi

Stick with 2.6.21_rc5 for now. Does the hard hang appear randomly after some time? Have you ever seen it happen while on the console without X running?

Okay, here's the error without the nvidia CS driver. All I can give you is a picture of the error: http://olausson.name/temp/IMG_4192.JPG
Any ideas what this error is about? I realized that I had ext4dev as the filesystem, so I switched back to ext3, booted again with the nvidia CS driver, and the PC locked up again. Now I am going to try to get a "Monitor Shot" with ext3 and the NV driver.

Regards
blubbi

Here are my interrupts:
CPU0 CPU1
0: 75418 0 IO-APIC-edge timer
1: 2 0 IO-APIC-edge i8042
6: 3 0 IO-APIC-edge floppy
8: 1 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
12: 4 0 IO-APIC-edge i8042
14: 631 0 IO-APIC-edge libata
15: 0 0 IO-APIC-edge libata
16: 0 0 IO-APIC-fasteoi libata
17: 4178 0 IO-APIC-fasteoi libata, uhci_hcd:usb3
18: 9517 0 IO-APIC-fasteoi uhci_hcd:usb4
19: 262 0 IO-APIC-fasteoi uhci_hcd:usb5, HDA Intel
20: 1841 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
21: 3 0 IO-APIC-fasteoi ohci1394
22: 3 0 IO-APIC-fasteoi bttv0
23: 17053 0 IO-APIC-fasteoi wifi0
1274: 641 0 PCI-MSI-edge eth0
1275: 8285 0 PCI-MSI-edge libata
NMI: 0 0
LOC: 74001 73927
ERR: 0
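The maintainer asked for /proc/interrupts before and after the soft lockup. A minimal sketch for reading such snapshots follows; it assumes the 2.6.21-era layout posted above (per-CPU counters, then the interrupt type such as `IO-APIC-edge` or `PCI-MSI-edge`, then comma-separated handler names). The names `parse_interrupts`, `shared_lines`, `counter_deltas` and `read_snapshot` are hypothetical helpers, not an existing tool, and newer kernels print extra columns this parser does not handle.

```python
from dataclasses import dataclass, field


@dataclass
class IrqLine:
    label: str                 # IRQ number ("14", "1275") or counter name ("NMI", "LOC", "ERR")
    counts: list               # per-CPU interrupt counts, in CPU column order
    chip: str = ""             # e.g. "IO-APIC-edge", "PCI-MSI-edge"; empty for NMI/LOC/ERR
    devices: list = field(default_factory=list)  # handlers registered on the line


def parse_interrupts(text):
    """Parse a /proc/interrupts snapshot in the 2.6.21-era layout (assumption)."""
    rows = [row for row in text.splitlines() if row.strip()]
    ncpu = len(rows[0].split())  # header row, e.g. "CPU0 CPU1"
    table = {}
    for row in rows[1:]:
        label, _, rest = row.partition(":")  # the first colon ends the IRQ label
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counters (ERR has only one).
        while len(counts) < min(ncpu, len(fields)) and fields[len(counts)].isdigit():
            counts.append(int(fields[len(counts)]))
        tail = fields[len(counts):]
        chip = tail[0] if tail else ""
        # Handler names are comma-separated and may contain spaces ("HDA Intel").
        devices = [d.strip() for d in " ".join(tail[1:]).split(",") if d.strip()]
        table[label.strip()] = IrqLine(label.strip(), counts, chip, devices)
    return table


def shared_lines(table):
    """IRQ lines with more than one registered handler."""
    return {label: row.devices for label, row in table.items() if len(row.devices) > 1}


def counter_deltas(before, after):
    """Interrupts per line between two snapshots, summed over all CPUs (non-zero only)."""
    deltas = {}
    for label, row in after.items():
        old = sum(before[label].counts) if label in before else 0
        diff = sum(row.counts) - old
        if diff:
            deltas[label] = diff
    return deltas


def read_snapshot(path="/proc/interrupts"):
    with open(path) as f:
        return parse_interrupts(f.read())


# The snapshot posted in this bug.
SNAPSHOT = """\
CPU0 CPU1
0: 75418 0 IO-APIC-edge timer
1: 2 0 IO-APIC-edge i8042
6: 3 0 IO-APIC-edge floppy
8: 1 0 IO-APIC-edge rtc
9: 0 0 IO-APIC-fasteoi acpi
12: 4 0 IO-APIC-edge i8042
14: 631 0 IO-APIC-edge libata
15: 0 0 IO-APIC-edge libata
16: 0 0 IO-APIC-fasteoi libata
17: 4178 0 IO-APIC-fasteoi libata, uhci_hcd:usb3
18: 9517 0 IO-APIC-fasteoi uhci_hcd:usb4
19: 262 0 IO-APIC-fasteoi uhci_hcd:usb5, HDA Intel
20: 1841 0 IO-APIC-fasteoi ehci_hcd:usb1, uhci_hcd:usb2
21: 3 0 IO-APIC-fasteoi ohci1394
22: 3 0 IO-APIC-fasteoi bttv0
23: 17053 0 IO-APIC-fasteoi wifi0
1274: 641 0 PCI-MSI-edge eth0
1275: 8285 0 PCI-MSI-edge libata
NMI: 0 0
LOC: 74001 73927
ERR: 0
"""

if __name__ == "__main__":
    table = parse_interrupts(SNAPSHOT)
    for irq, devices in shared_lines(table).items():
        print(irq, table[irq].chip, devices)
    # 17 IO-APIC-fasteoi ['libata', 'uhci_hcd:usb3']
    # 19 IO-APIC-fasteoi ['uhci_hcd:usb5', 'HDA Intel']
    # 20 IO-APIC-fasteoi ['ehci_hcd:usb1', 'uhci_hcd:usb2']
```

In the posted snapshot only IRQs 17, 19 and 20 are shared; the libata lines disabled in the description (14 and 1275) each have a single handler. For the before/after pair, one could save `read_snapshot()` once after boot and again after a lockup (if the machine is still responsive) and compare them with `counter_deltas()`.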
Finally I got a screenshot of the whole bug output for you. Luckily I had a Konsole window open with some text in it. The system locked up and the keyboard no longer worked; all I could use was my mouse. So I copied and pasted the letters for "dmesg > BUG" together with the mouse and pasted a whole selected line to submit the command. To be safe, I took a screenshot too. I got it the same way: again using the mouse, I assembled the letters for "cat BUG" and then took a screenshot of the output. And it was a good thing I did: after a reboot I couldn't find the BUG file. So here's the "MonitorShot": http://olausson.name/temp/IMG_4193.JPG
This one is without the nvidia CS driver and without ext4dev.

regards
Bjoern

That's useful, thanks. It seems like an odd problem though, possibly a hardware issue. Can you run memtest for a few passes and check that it doesn't bring up any errors?

(In reply to comment #13)
> That's useful, thanks. It seems like an odd problem though, possibly a hardware
> issue. Can you run memtest for a few passes and check that it doesn't bring up
> any errors?

No problem, I'll run it tomorrow. Thanks for your help. I could provide remote access (ssh) if that would help.

regards
Bjoern

Memtest86+ has now run for 3.5 hours: 10 tests without any errors. Do you want to access this nasty machine via ssh?

regards and thanks
Bjoern

What's your best guess:
1) hardware error
2) driver bug
If 1), I'll go and grab a new board; if 2), I'll pray that it gets fixed ;-)
When I run WinXP, my CD-R drive (Plextor, PATA), connected to the JMicron controller, is not found. I have to use the Device Manager to "search for new hardware"; after that, the CD-R drive shows up in Explorer and in the Device Manager, and everything works fine. Is there anything more I can do to help you?

regards
Bjoern

By the way, I posted the same bug on bugzilla.kernel.org ( http://bugzilla.kernel.org/show_bug.cgi?id=8259 ) but I haven't received an answer so far.
Therefore I opened this bug in the Gentoo bug tracker as well. Maybe we should focus on one bug tracker, perhaps the kernel.org one? http://bugs.gentoo.org/show_bug.cgi?id=171185

regards
Bjoern

We'll watch the upstream bug, thanks. |
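The `irq N: nobody cared` and `Disabling IRQ #N` pairs in the description come from the kernel's spurious-interrupt detector; the traces show it being reached through handle_edge_irq() → note_interrupt() → __report_bad_irq(). Below is a rough Python model of its counting rule, assuming the thresholds of `note_interrupt()` in 2.6-era `kernel/irq/spurious.c` (a window of 100,000 interrupts, with the line disabled when more than 99,900 of them went unhandled). `SpuriousIrqModel` is a hypothetical illustration, not kernel code: it leaves out the `irqpoll`/`irqfixup` paths, the stack dump and `handlers:` list, and the actual masking of the line.

```python
class SpuriousIrqModel:
    """Simplified model of the stuck-IRQ check in note_interrupt() (assumed 2.6-era thresholds)."""

    WINDOW = 100_000     # interrupts counted before each evaluation
    THRESHOLD = 99_900   # more unhandled interrupts than this in one window => line is stuck

    def __init__(self, irq):
        self.irq = irq
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Account one interrupt; return the log lines this model would print."""
        if not handled:
            self.irqs_unhandled += 1   # no handler on the line returned IRQ_HANDLED
        self.irq_count += 1
        if self.irq_count < self.WINDOW:
            return []
        # End of a window: evaluate, then start counting afresh.
        self.irq_count = 0
        log = []
        if self.irqs_unhandled > self.THRESHOLD:
            log.append(f'irq {self.irq}: nobody cared (try booting with the "irqpoll" option)')
            log.append(f"Disabling IRQ #{self.irq}")
            self.disabled = True       # the real kernel masks the line at this point
        self.irqs_unhandled = 0
        return log


if __name__ == "__main__":
    stuck = SpuriousIrqModel(1275)
    for _ in range(100_000):
        for line in stuck.note_interrupt(handled=False):
            print(line)
    # irq 1275: nobody cared (try booting with the "irqpoll" option)
    # Disabling IRQ #1275

    busy = SpuriousIrqModel(14)
    for i in range(100_000):
        busy.note_interrupt(handled=(i % 1000 == 0))  # 100 handled, 99,900 unhandled
    print(busy.disabled)  # False: 99,900 is not above the threshold
```

Under this model, by the time IRQ 1275 or IRQ 14 is disabled, `ahci_interrupt` or `ata_interrupt` had claimed almost none of the interrupts in the most recent 100,000-interrupt window on that line.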