Bug 184411 - adding USB disk crashes whole USB system
Summary: adding USB disk crashes whole USB system
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-06 12:16 UTC by Michael Elbaum
Modified: 2010-04-20 14:16 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
.config for 2.6.20-gentoo-r8 (config-2.6.20-gentoo-r8, 72.82 KB, text/plain)
2007-07-06 12:23 UTC, Michael Elbaum
dmesg output, just after plugging in the disk (dmesg.output, 30.53 KB, text/plain)
2007-07-06 12:25 UTC, Michael Elbaum
lspci -v (lspci, 5.49 KB, text/plain)
2007-07-06 12:26 UTC, Michael Elbaum
lsusb -v (lsusb.output, 21.46 KB, text/plain)
2007-07-06 12:27 UTC, Michael Elbaum
.config for vanilla-sources-2.6.22-rc7 (config-2.6.22-rc7, 38.69 KB, text/plain)
2007-07-06 12:28 UTC, Michael Elbaum
udevmonitor --env (udevmonitor.log, 32.00 KB, text/plain)
2007-07-06 12:30 UTC, Michael Elbaum
dmesg output immediately after plugging in the disk (dmesg.pluggedin, 30.48 KB, text/plain)
2007-07-06 18:30 UTC, Michael Elbaum
dmesg output with usb crashed by updatedb (dmesg.usb_crashed, 30.48 KB, text/plain)
2007-07-06 18:32 UTC, Michael Elbaum
tail of /var/log/messages during crash (msgs, 2.50 KB, text/plain)
2007-07-20 21:05 UTC, Michael Elbaum
/var/log/messages from boot, disk mount, and crash (usb-crash_messages, 394.96 KB, text/plain)
2007-07-23 21:51 UTC, Michael Elbaum
/var/log/messages in VT without hald or xdm (messages.VTonly.bz2, 72.91 KB, text/plain)
2007-07-24 21:44 UTC, Michael Elbaum
dmesg after crash, then unplug and replug (dmesg-2.6.23-outandinagain, 59.84 KB, text/plain)
2007-07-31 22:01 UTC, Michael Elbaum
.config for 2.6.23-rc1 (.config, 38.18 KB, text/plain)
2007-07-31 22:16 UTC, Michael Elbaum
/var/log/messages after the USB crash (bzip2'ed) (messages-2.6.23.bz2, 251.26 KB, text/plain)
2007-07-31 22:17 UTC, Michael Elbaum

Description Michael Elbaum 2007-07-06 12:16:23 UTC
This is referred from Bug #182808.

When I add an external USB hard disk it first mounts and appears to work normally, but under heavy access it freezes, and then I've lost ALL USB devices. A PS/2 keyboard and mouse still work so I can still look around. Running updatedb on a large disk (100 GB) is a sure way to kill it. Post-mortem, there's no record in /var/log/messages. Doing lsusb freezes the shell. (I'll post udevmonitor in a sec.) Doing lsof I can see the updatedb process with whatever file it had open at the moment. Fortunately it doesn't corrupt the disk!

Shutdown doesn't work either, except with the hard reset or power button. Gnome goes down but then the screen locks. Switching to another VT I can still log in.

My original problem was with gentoo-sources-2.6.20-r8/genkernel. I repeated the problem with vanilla-sources-2.6.22-rc7 and a manual config.

Reproducible: Always

Steps to Reproduce:
1. Plug in the disk.
2. Start using it, or run updatedb to make sure it crashes.

emerge --info

Portage 2.1.2.7 (default-linux/x86/2007.0/desktop, gcc-4.1.2, glibc-2.5-r3, 2.6.20-gentoo-r8 i686)
=================================================================
System uname: 2.6.20-gentoo-r8 i686 Intel(R) Core(TM)2 CPU 4300 @ 1.80GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Thu, 05 Jul 2007 01:47:01 +0000
dev-java/java-config: 1.3.7, 2.0.32
dev-lang/python:     2.4.4-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.17
sys-devel/gcc-config: 1.3.16
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=nocona -O2 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-march=nocona -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://mirror.hamakor.org.il/pub/mirrors/gentoo http://distfiles.gentoo.org http://www.ibiblio.org/pub/Linux/distributions/gentoo"
LINGUAS="en he fr de"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X Xaw3d acl acpi alsa arts avi berkdb bidi bindist bitmap-fonts bonobo bzlib cairo cdr cli cracklib crypt cscope cups dbus dga dhcp divx4linux dri dv dvd dvdr dvdread eds emboss encode esd evo exif f77 fam fftw firefox flac flash foomaticdb fortran ftp gdbm gif gimpprint ginac gnome gphoto2 gpm gstreamer gtk gtk2 hal iconv ieee1394 imagemagick ipv6 isdnlog java jpeg jpeg2k kerberos lcms ldap libg++ mad midi mikmod ming mmx motif mp3 mpeg mplayer mudflap ncurses nls nptl nptlonly nvidia ogg oggvorbis opengl openmp oss pam pcmcia pcre pdf pdflib perl png pnp ppds pppd python qt3 qt3support qt4 quicktime readline reflection samba sdk sdl session spell spl sse sse2 sse3 ssl svg tcpd tetex theora tiff truetype truetype-fonts trusted type1-fonts unicode usb vorbis win32codecs wmf x86 xinerama xml xorg xv xvid zeo zlib" ALSA_CARDS="hda-intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" FOO2ZJS_DEVICES="hp1020" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en he fr de" USERLAND="GNU" VIDEO_CARDS="nvidia"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Michael Elbaum 2007-07-06 12:23:58 UTC
Created attachment 124038 [details]
.config for 2.6.20-gentoo-r8
Comment 2 Michael Elbaum 2007-07-06 12:25:24 UTC
Created attachment 124040 [details]
dmesg output, just after plugging in the disk
Comment 3 Michael Elbaum 2007-07-06 12:26:42 UTC
Created attachment 124042 [details]
lspci -v
Comment 4 Michael Elbaum 2007-07-06 12:27:04 UTC
Created attachment 124043 [details]
lsusb -v
Comment 5 Michael Elbaum 2007-07-06 12:28:11 UTC
Created attachment 124044 [details]
.config for vanilla-sources-2.6.22-rc7
Comment 6 Michael Elbaum 2007-07-06 12:30:31 UTC
Created attachment 124045 [details]
udevmonitor --env

I left udevmonitor --env running while doing updatedb so that the events would be recorded during the crash.
Comment 7 Maarten Bressers (RETIRED) gentoo-dev 2007-07-06 16:24:58 UTC
Could you please turn on these options in your kernel .config:
CONFIG_USB_DEBUG=y
CONFIG_USB_STORAGE_DEBUG=y
then boot the new kernel, mount the disk and run updatedb. Then please post the new dmesg output.
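
Before rebuilding, it can help to confirm that both options really ended up enabled in the .config. The following is only a minimal sketch, not something from this thread: the helper name missing_options and the sample text are made up for illustration, and it only recognises options set to =y.

```python
import re

# The two debug options requested in the comment above.
REQUIRED = ("CONFIG_USB_DEBUG", "CONFIG_USB_STORAGE_DEBUG")

def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in a .config text."""
    enabled = set()
    for line in config_text.splitlines():
        # Enabled built-in options look like "CONFIG_FOO=y"; disabled ones
        # appear as "# CONFIG_FOO is not set" and never match here.
        match = re.match(r"(CONFIG_\w+)=y\s*$", line)
        if match:
            enabled.add(match.group(1))
    return [opt for opt in required if opt not in enabled]

# Hypothetical fragment of a .config, for illustration only.
sample = (
    "CONFIG_USB=y\n"
    "CONFIG_USB_DEBUG=y\n"
    "# CONFIG_USB_STORAGE_DEBUG is not set\n"
)
print(missing_options(sample))
```

To check a real kernel tree, read the text of its .config file and pass it to missing_options; an empty list means both options are on.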
Comment 8 Michael Elbaum 2007-07-06 18:30:48 UTC
Created attachment 124078 [details]
dmesg output immediately after plugging in the disk
Comment 9 Michael Elbaum 2007-07-06 18:32:39 UTC
Created attachment 124079 [details]
dmesg output with usb crashed by updatedb
Comment 10 Michael Elbaum 2007-07-06 18:34:31 UTC
I used the 2.6.20 kernel with debugging enabled. Let me know if you have another preference.
Comment 11 Michael Elbaum 2007-07-07 21:24:09 UTC
I found what look like two related issues:

http://ozlabs.org/pipermail/linuxppc-embedded/2006-October/024638.html
This post reports exactly the same error messages and is marked FIXED, though I don't know what to do with the information.

https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.15/+bug/77971/comments/6
This one reports a similar behavior in Ubuntu. Sure enough, I have a similar crash on 7.04. In fact it's worse. As the bug says, USB goes down right away when the disk is inserted. I'm not familiar enough with Ubuntu or Debian to do much snooping though.

I should add that the same disk (actually disk and disk-on-key) work fine on two other Gentoo computers with the 2.6.20 kernel, but one of those uses UHCI for USB2 and the other only has USB1.
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2007-07-08 04:15:08 UTC
(In reply to comment #11)
> but one of those uses UHCI for USB2

That's not possible, UHCI is 1.1 only. EHCI is the only widespread USB 2.0 implementation.
Comment 13 Michael Elbaum 2007-07-08 05:39:21 UTC
You're right, of course. Here is the lspci list for the computer (IBM Thinkpad) that does NOT crash with the external disk:

00:00.0 Host bridge: Intel Corporation 82855PM Processor to I/O Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation 82855PM Processor to AGP Controller (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 01)
01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]
02:00.0 CardBus bridge: Texas Instruments PCI4520 PC card Cardbus Controller (rev 01)
02:00.1 CardBus bridge: Texas Instruments PCI4520 PC card Cardbus Controller (rev 01)
02:01.0 Ethernet controller: Intel Corporation 82540EP Gigabit Ethernet Controller (Mobile) (rev 03)
02:02.0 Network controller: Intel Corporation PRO/Wireless 2200BG Network Connection (rev 05)
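
To compare machines quickly, the host controller type can be pulled out of lspci output like the listing above. This is a rough sketch under assumptions: the function name usb_controllers is invented here, and it relies on the device description containing the words "USB Controller" and one of UHCI, OHCI or EHCI, as in this listing. Other lspci versions or vendors may word their descriptions differently.

```python
import re

def usb_controllers(lspci_text):
    """Return (pci_address, interface) pairs for USB controllers in lspci output."""
    found = []
    for line in lspci_text.splitlines():
        # Case-insensitive, since the capitalisation of the class name varies.
        if "usb controller" not in line.lower():
            continue
        address = line.split()[0]
        match = re.search(r"\b(UHCI|OHCI|EHCI)\b", line)
        found.append((address, match.group(1) if match else "unknown"))
    return found

# Three lines copied from the listing above.
sample = """\
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 01)
"""
for address, interface in usb_controllers(sample):
    print(address, interface)
```

On this Thinkpad the result shows UHCI controllers for USB 1.1 plus one EHCI controller for USB 2.0, which matches the correction in comment #12.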
Comment 14 Maarten Bressers (RETIRED) gentoo-dev 2007-07-20 19:20:07 UTC
When you say "Doing lsusb freezes the shell" you mean after the disk crashes, right? What happens when you unplug and replug the disk after a crash?

Since you are able to use your ps/2 keyboard after a crash, and you have SysRq support in your kernel, can you try the following: switch to a VT, press Alt+SysRq+9 followed by Alt+SysRq+t, and post the output here.

What's the latest kernel version that didn't have this problem for you?

Just an idea: you can check if there is a BIOS update available for your motherboard, and if so, update your BIOS.
Comment 15 Michael Elbaum 2007-07-20 21:04:47 UTC
First, I should have said that lsusb freezes the gnome-terminal after a crash, not a real terminal shell which I didn't try. I just close the window. When I pull the disk out I see it in the log, but there's no notice when I plug it back in.

I updated the BIOS and made it through a whole updatedb (82 GB) once. I tried again and the USB crashed. Maybe this is useful: while updatedb was running the first time, the keyboard often repeated characters, even after a delay of a second or so. Here's an example of the alphabet:
abcdefghijkkkkklmmmmmmmnooooooooo
The skipping lasted a bit after updatedb finished, and then stopped.

I tried the Alt+SysRq+9 and +t but nothing appeared. It may be a key mapping problem because I only got a response for +5 and +6, and only with the left Alt key. I don't know what this should do so I don't know where to start troubleshooting.

It's a new computer so 2.6.20 is the first kernel. The USB disk worked fine in Windows (at least before the BIOS update), though I never exercised it as hard as updatedb.
Comment 16 Michael Elbaum 2007-07-20 21:05:59 UTC
Created attachment 125504 [details]
tail of /var/log/messages during crash
Comment 17 Maarten Bressers (RETIRED) gentoo-dev 2007-07-20 22:07:26 UTC
Here's how you use SysRq:
- Switch to a VT (Ctrl-Alt-F2) and login.
- Run updatedb, wait for a crash/freeze.
- Press and hold down left Alt key, press and hold down the SysRq key, then press '9', then release all keys. A message will appear: "SysRq: Changing Loglevel. Loglevel set to 9". Then press Enter to get your prompt back.
- Press and hold down left Alt key, press and hold down the SysRq key, then press 't', then release all keys. This will produce a lot of output, which will also go into your /var/log/messages, and you can post that here. Press Enter to get your prompt back.
- Then try running lsusb and unplugging and replugging the disk, and see what happens with that.

That should give us some more data to examine. Thank you for your help.
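
If the key combinations do not register, the same SysRq commands can usually be sent by writing the command character to /proc/sysrq-trigger as root, provided SysRq support is compiled into the kernel. The sketch below is not from this thread; the function name send_sysrq and its trigger_path parameter are made up for illustration, and the path is a parameter only so the function can be tried against an ordinary file first.

```python
def send_sysrq(keys, trigger_path="/proc/sysrq-trigger"):
    """Write each SysRq command character to the trigger file, one per write."""
    for key in keys:
        # Open and write once per key, so that every command character
        # reaches the kernel in its own write.
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
```

Run as root after the freeze, send_sysrq("9t") would correspond to Alt+SysRq+9 followed by Alt+SysRq+t; the task dump then lands in dmesg and /var/log/messages as described above.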
Comment 18 Michael Elbaum 2007-07-21 22:28:37 UTC
Turned out I didn't have SysRq support in the 2.6.20-gentoo-r8 kernel that I was using. That's most of why it didn't work. After compiling it in, though, it would only give me loglevels 5 and 6. There was no response to any other number key with Alt+SysRq, and I don't know what the default loglevel is. I ran updatedb from the VT and it did not crash. I'll try again tomorrow, but it's interesting that the USB keyboard did not skip (repeat) in the VT. I switched back to X in VT 7 and the USB keyboard did skip as before.

I started a new /var/log/messages file for this exercise. Between the verbose USB debugging and a couple of Alt+SysRq+t hits, it's at 347 MB. Is there something I should look for in particular? Of course it will only be interesting when the crash repeats.

If it will be useful I can try an old kernel. The ubuntu post suggests that a similar problem goes back to 2.6.15.
Comment 19 Maarten Bressers (RETIRED) gentoo-dev 2007-07-21 23:11:10 UTC
No need to try 2.6.15, that one's not supported anymore, let's stick with a recent kernel. You've run these last tests with 2.6.20-r8, I recommend you use 2.6.22-r1 for your next tests, that's the most recent gentoo-sources at this time.

First off, I'm really struck by the fact that neither problem (disk crashing / keyboard repeating) occurred when you tested in a VT (and the keyboard did start repeating when you switched back to X). Please try to trigger the bug in a VT, just run updatedb several times if you have to (or maybe run some HDD benchmarks, really just anything that will trigger a crash).

About SysRq: it's strange that you can only activate loglevels 5 and 6. Just use 6, though; the higher the number, the more output you get, that's all there is to it.

And about the logging, you're right, 347 MB is a lot :) Like you said, it only becomes interesting if you see a crash. When that happens, use Alt+SysRq+6 and Alt+SysRq+t and post the call traces. You can start out with an empty /var/log/messages after each successful (not crashing) updatedb.

Thank you again for helping us get to the bottom of this.
Comment 20 Michael Elbaum 2007-07-23 21:49:56 UTC
This was a little messy. The 2.6.22 kernel doesn't boot since the BIOS update. The screen output arrives at 
Clocksource tsc unstable (delta = -[a big number] ns)
Time: hpet clocksource has been installed.
HPET is enabled in both kernels. That's about all I know about it. The 2.6.20 kernel still boots, fortunately. It no longer syncs the system clock to the hardware clock though, with a message about select() to /dev/rtc. The USB problem predates the clock issue, so I assume that this is a separate problem, maybe a bug, maybe not. I include it here just in case it's relevant.

I repeated the crash in a VT. To be precise, I let gdm load X, logged in, plugged in the disk to have hal mount it automatically, and then with the nautilus window open I switched to the VT. I did updatedb -U /media/disk-3. Then I did Alt+SysRq+6 and Alt+SysRq+t. 

I erased /var/log/messages just before the prior boot and it still reached 67 MB, so I'm sending the "head" with all the recordings until the disk mounted as sdb1, and the "tail" where the disk crashed and I did the SysRq business. Regretfully I forgot to unplug/replug the disk and repeat the SysRq. I'll do it next time but it's late now and I don't want to delay sending what I have. I also thought to try deleting gdm and hal from the runlevels, rebooting, mounting the disk by hand to /mnt, and then checking the disk in a VT. Let me know if anything else will be helpful.
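
Cutting a huge log down to a "head" and a "tail" like this can be scripted. The following is just a sketch with assumed names: head_and_tail, the marker string and the tail length are illustrative choices, not what was actually used to prepare the attachment.

```python
def head_and_tail(lines, marker, tail_count):
    """Split a log into the part up to `marker` and the last `tail_count` lines.

    The head runs through the first line containing `marker` (empty if the
    marker never appears). The tail never overlaps the head.
    """
    head_end = 0
    for index, line in enumerate(lines):
        if marker in line:
            head_end = index + 1
            break
    tail_start = max(head_end, len(lines) - tail_count)
    return lines[:head_end], lines[tail_start:]

# Made-up log lines, for illustration only.
log = ["boot", "usb probe", "sdb: sdb1", "updatedb running", "usb reset", "SysRq dump"]
head, tail = head_and_tail(log, "sdb1", 2)
print(head)
print(tail)
```

For a real log, read /var/log/messages into a list of lines, pick the marker (here the partition name sdb1) and choose a tail long enough to cover the crash and the SysRq output.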
Comment 21 Michael Elbaum 2007-07-23 21:51:21 UTC
Created attachment 125788 [details]
/var/log/messages from boot, disk mount, and crash
Comment 22 Michael Elbaum 2007-07-24 21:44:04 UTC
Created attachment 125919 [details]
/var/log/messages in VT without hald or xdm

Here it crashed on mounting manually in a VT, with xdm and hald stopped. I'm attaching the Alt+SysRq+t output after the crash, then after unplugging and replugging. I added a few blank lines to mark the changes.

I begin to suspect a conflict among devices. I did afterwards what I should have done at the beginning: unplug everything else from the USB. It survived updatedb twice. Adding the keyboard and mouse after the disk didn't seem to hurt either. Previously I was booting with them plugged in. Of course USB should handle multiple devices, but maybe this will help to isolate the problem. I'll try a few more things and post again.
Comment 23 Maarten Bressers (RETIRED) gentoo-dev 2007-07-25 21:36:08 UTC
OK, after consulting with Daniel on this one, we're going to send this bug upstream, but before we can do that you'll have to test with the latest kernel prepatch, 2.6.23-rc1.

Also we need to get more complete logs, meaning you should post the output of dmesg, not /var/log/messages. To be able to do that, you'll have to set CONFIG_LOG_BUF_SHIFT to 16, so the logs don't get overwritten. (If you run into the same problem with 2.6.23-rc1 that you did with 2.6.22, Daniel advises to build the kernel with CONFIG_SMP unset.)

Then run updatedb again, wait for the crash, and post the output of dmesg here.
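
For reference, CONFIG_LOG_BUF_SHIFT is a power-of-two exponent: the kernel log ring buffer holds 2^shift bytes, so a shift of 16 gives 64 KiB. A small sketch of the arithmetic (the function name log_buf_bytes is just for illustration):

```python
def log_buf_bytes(shift):
    """Size in bytes of the kernel log ring buffer for a CONFIG_LOG_BUF_SHIFT value."""
    return 1 << shift

for shift in (14, 15, 16, 17):
    print(f"CONFIG_LOG_BUF_SHIFT={shift}: {log_buf_bytes(shift) // 1024} KiB")
```

With USB debugging and a SysRq task dump enabled, older messages are overwritten once the buffer is full, which is why a larger value is requested here.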
Comment 24 Michael Elbaum 2007-07-25 21:48:36 UTC
Okay, just a few things so that I do it right. 1) Is that kernel from gentoo or vanilla sources? 2) Do you still need CONFIG_USB_DEBUG set? 3) How should I set the tickless and clock business? (just unset SMP?)
Comment 25 Maarten Bressers (RETIRED) gentoo-dev 2007-07-27 15:41:24 UTC
1) it's: sys-kernel/vanilla-sources-2.6.23_rc1
2) yes, please use CONFIG_USB_DEBUG
3) just do a make oldconfig with your current .config copied over, if the kernel doesn't boot, unset CONFIG_SMP
Comment 26 Michael Elbaum 2007-07-31 22:01:03 UTC
Created attachment 126552 [details]
dmesg after crash, then unplug and replug

Here are the logs you requested. First is dmesg. I'm using the vanilla-sources-2.6.23-rc1 kernel here. I did the Alt+SysRq+6 (still no 9) and +t business after the crash.

I noticed that the crash does NOT occur if the disk is the only thing plugged into the USB. I didn't try one by one to see which causes trouble, but the hub and mouse are certainly enough.

I won't be able to deal with this during August, so I hope the information here will be useful. /var/log/messages coming next.
Comment 27 Michael Elbaum 2007-07-31 22:16:07 UTC
Created attachment 126554 [details]
.config for 2.6.23-rc1

I can't seem to attach the bzipped /var/log/messages. It tells me the file is empty. I can email if it's needed.
Comment 28 Michael Elbaum 2007-07-31 22:17:27 UTC
Created attachment 126556 [details]
/var/log/messages after the USB crash (bzip2'ed)

It was a permissions problem. Here is the file.
Comment 29 Maarten Bressers (RETIRED) gentoo-dev 2007-09-17 16:44:21 UTC
Michael,

Since August has come and gone, what's the situation? Does the bug still happen? Has anything changed? Have you tested with latest development kernel?
Comment 30 Michael Elbaum 2007-09-17 21:07:09 UTC
Hi. Before leaving I tried 2.6.23-rc1 and posted the /var/log/messages etc. (comments #26, 27, 28). I now avoid using a USB hub for ANY of the external devices, and the system is much more stable. There was one suspicious crash but I couldn't get any info or repeat it. This is half a solution, in a sense, because I have tons of wires and anyhow a hub really shouldn't crash the kernel any more than a disk. On the other hand it might give a clue to the source of trouble.
Michael
Comment 31 Maarten Bressers (RETIRED) gentoo-dev 2007-09-17 21:42:30 UTC
So the problems only occur when you use the external USB hub? Do you have another hub to test with? It could be your hub that's at fault.
Comment 32 Michael Elbaum 2007-09-17 22:23:22 UTC
I'll scrounge another hub to check.
Comment 33 Maarten Bressers (RETIRED) gentoo-dev 2007-10-02 18:40:49 UTC
(In reply to comment #32)
> I'll scrounge another hub to check.
> 

Please reopen when you've had a chance to test with another hub.
Comment 34 Peter 2010-04-20 14:16:56 UTC
I think I have this problem also in 2.6.32-gentoo-r7. I'm running Gentoo off of an external USB drive. It doesn't happen all the time, but for instance if I'm compiling something and running a web browser, KDE will occasionally freeze (the mouse still works), and after a few minutes X drops to the console with a lot of "cant write" errors. I can't even run /sbin/shutdown; I have to hit the reset button. I just enabled USB debug and I'll try to catch it next time.