Bug 54413

Summary: [gentoo-dev-sources]-2.6.7 hangs when hotplug runs
Product: Gentoo Linux
Component: New packages
Reporter: Albert Hopkins (RETIRED) <marduk>
Assignee: Greg Kroah-Hartman (RETIRED) <gregkh>
Status: RESOLVED UPSTREAM
Severity: critical
Priority: High
CC: jtw, will
Version: unspecified
Hardware: x86
OS: Linux
Whiteboard:
Package list:
Runtime testing required: ---
Attachments:
  My kernel config (that causes the system to hang)
  Output of 'lsmod' when ehci_hcd is in modules.autoload and hotplug is started
  sample of cron output

Description Albert Hopkins (RETIRED) gentoo-dev 2004-06-19 06:07:48 UTC
This also happened when I tried development-sources-2.6.7.  I'm not sure how to report kernel bugs, so if this is the wrong medium let me know.  Basically, when the hotplug service runs, the computer hangs.  I get no error, panic message, or any other indication.  CTRL-ALT-DEL doesn't work.  The system needs a reset.  If I disable the hotplug service the system boots fine, albeit with less functionality.  If I then start hotplug manually, the system hangs.

Reproducible: Always
Steps to Reproduce:
1. emerge gentoo-dev-sources-2.6.7, build and install kernel, reboot
2. Watch init start system services, wait for hotplug
3. Watch system hang

Actual Results:  
Nothing exciting ;-)

Expected Results:  
Should have continued to boot ;-)

Kernel config will follow.  Oh, and I'm running an AMD.
Comment 1 Albert Hopkins (RETIRED) gentoo-dev 2004-06-19 06:09:47 UTC
Created attachment 33559 [details]
My kernel config (that causes the system to hang)
Comment 2 Greg Kroah-Hartman (RETIRED) gentoo-dev 2004-06-19 09:19:55 UTC
Disable the hotplug startup service. Do you really need it?

If you know of any specific modules your hardware needs to use, please add them
to the modules.autoload file.

Also, any way to tell what module is causing the problem?
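
One way to narrow down which module triggers a hard hang, sketched under assumptions: load the candidate modules one at a time and write each name to a log on disk before loading it, so the last name in the log survives the reset. The helper name load_one_by_one, the LOADER variable, and the log path are hypothetical and do not come from this report.

```shell
# Hypothetical helper: load modules one at a time, logging each name first.
# Usage: load_one_by_one LOGFILE MODULE...
# LOADER defaults to modprobe; set LOADER=echo for a dry run without root.
load_one_by_one() {
    log=$1
    shift
    for mod in "$@"; do
        # Record the name and flush it to disk before the risky step,
        # so it is still there after a hard reset.
        printf '%s\n' "$mod" >> "$log"
        sync
        ${LOADER:-modprobe} "$mod" || return 1
    done
}
```

Run as root, for example: load_one_by_one /root/modload.log usbcore ohci_hcd ehci_hcd. If the machine hangs, the last line of the log after the reboot names the module that was being loaded.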
Comment 3 S.Caglar Onur 2004-06-19 12:18:48 UTC
Just a small data point: gentoo-dev-sources-2.6.7 with the hotplug service running and configured for udev works here with no problem at all :)
Comment 4 Albert Hopkins (RETIRED) gentoo-dev 2004-06-19 16:23:42 UTC
Well I enabled the service because I use it.  I basically have the same config with 2.6.5 and it works fine.  What you seem to be suggesting is that I ignore the problem.  But ignoring the problem won't make it go away.  If it's the case that you don't feel the need to assist on this issue then I can close this item. Otherwise I would be willing to assist in any way I can.
Comment 5 Albert Hopkins (RETIRED) gentoo-dev 2004-06-19 16:49:16 UTC
FYI it appears to be the ehci_hcd module.  When I modprobe it by hand the system hangs.  It seems that putting it in modules.autoload would do the same thing, only faster.  My understanding is that this is the USB 2.0 module and since I have, and would like to use, USB 2.0 devices, I consider this a necessary module.  Should I take this up with the Linux kernel folks or should it stay here?
Comment 6 Albert Hopkins (RETIRED) gentoo-dev 2004-06-19 17:02:48 UTC
Oddly enough, as suggested, I added ehci_hcd to modules.autoload, rebooted, and the system came up fine. No hangs.  Then I logged in as root and ran /etc/init.d/hotplug start and hotplug came up fine.  The output of lsmod looks just the same as it did in 2.6.5 (albeit in a different order) and USB devices appear to run fine.  So perhaps it has to do with the order in which hotplug is loading modules?  Attaching output of lsmod.

Comment 7 Albert Hopkins (RETIRED) gentoo-dev 2004-06-19 17:04:12 UTC
Created attachment 33599 [details]
Output of  'lsmod' when ehci_hcd is in modules.autoload and hotplug is started
Comment 8 Albert Hopkins (RETIRED) gentoo-dev 2004-06-19 19:56:16 UTC
More info (if anyone cares).  When the module ehci_hcd is loaded it spouts the message "Disabling IRQ #11".  My first thought was that maybe the module is disabling IRQ 11 when it's already in use, causing the system to hang.  But even after loading ehci_hcd first, /proc/interrupts shows

 11:     100000          XT-PIC  aic7xxx, ehci_hcd, NVidia nForce2

Of course this could have nothing to do with anything, but just FYI.
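
The line quoted above has the layout of /proc/interrupts on a single-CPU machine: IRQ number, interrupt count, controller type, then the drivers sharing the line. A minimal sketch for listing the drivers on one IRQ follows; irq_users is a hypothetical helper name, and the field positions assume that single-CPU layout (SMP machines print one count column per CPU).

```shell
# Print the drivers registered on one IRQ, reading /proc/interrupts-style
# text from stdin. Assumes the single-CPU layout seen in this report:
#   IRQ:  count  controller  driver[, driver...]
irq_users() {
    awk -v irq="$1" '
        $1 == (irq ":") {
            out = ""
            # Fields 4..NF hold the comma-separated driver list.
            for (i = 4; i <= NF; i++)
                out = out (i > 4 ? " " : "") $i
            print out
        }'
}
```

For example, irq_users 11 < /proc/interrupts would show which drivers share IRQ 11 with ehci_hcd.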

Oh, also I found a couple of links:

http://www.short-media.com/forum/showthread.php?t=10891
http://www.uwsg.iu.edu/hypermail/linux/kernel/0310.3/1025.html
Comment 9 Jeff 2004-06-22 15:47:14 UTC
I have a similar problem after switching to gentoo-dev-sources-2.6.7 (all versions, currently using r4). Now I'm sure it's a hotplug issue, seems to me more like an ehci_hcd problem. I don't get any hangs but I do get the following dmesg output:

ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ehci_hcd 0000:00:02.2: nVidia Corporation nForce2 USB Controller
PCI: Setting latency timer of device 0000:00:02.2 to 64
ehci_hcd 0000:00:02.2: irq 5, pci mem e1044000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 3
irq 5: nobody cared!
 [<c010595a>] __report_bad_irq+0x2a/0x90
 [<c0105a50>] note_interrupt+0x70/0xa0
 [<c0105cf1>] do_IRQ+0x121/0x130
 [<c0104134>] common_interrupt+0x18/0x20
 [<c0117f70>] __do_softirq+0x30/0x80
 [<c0117fe6>] do_softirq+0x26/0x30
 [<c0105ccd>] do_IRQ+0xfd/0x130
 [<c0104134>] common_interrupt+0x18/0x20
 [<c01df6df>] pci_bus_read_config_byte+0x5f/0x90
 [<e107639e>] ehci_start+0x2ce/0x360 [ehci_hcd]
 [<c0114b7d>] printk+0x10d/0x170
 [<e105d4f7>] usb_register_bus+0x137/0x160 [usbcore]
 [<e106254b>] usb_hcd_pci_probe+0x2ab/0x4e0 [usbcore]
 [<c0150035>] set_anon_super+0xa5/0xc0
 [<c01e3002>] pci_device_probe_static+0x52/0x70
 [<c01e305b>] __pci_device_probe+0x3b/0x50
 [<c01e309c>] pci_device_probe+0x2c/0x50
 [<c022b90f>] bus_match+0x3f/0x70
 [<c022ba39>] driver_attach+0x59/0x90
 [<c022bce1>] bus_add_driver+0x91/0xb0
 [<c022c19f>] driver_register+0x2f/0x40
 [<c01e331c>] pci_register_driver+0x5c/0x90
 [<e103b023>] init+0x23/0x30 [ehci_hcd]
 [<c0129cb8>] sys_init_module+0x118/0x230
 [<c0103f75>] sysenter_past_esp+0x52/0x71

handlers:
[<e105e2e0>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #5
PCI: cache line size of 64 is not supported by device 0000:00:02.2
ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 6 ports detected
gameport: pci0000:01:07.1 speed 864 kHz
hub 2-0:1.0: connect-debounce failed, port 1 disabled
ohci_hcd 0000:00:02.1: remote wakeup
hub 2-0:1.0: hub_port_status failed (err = -108)
hub 2-0:1.0: hub_port_status failed (err = -108)
hub 2-0:1.0: hub_hub_status failed (err = -108)
hub 2-0:1.0: get_hub_status failed
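
To check whether another machine shows the same symptom, one option is to scan the kernel log for the "Disabling IRQ #N" message seen above. This is only a sketch: disabled_irqs is a hypothetical helper name, and it matches nothing more than that message text.

```shell
# Print each IRQ number the kernel reported disabling, one per line,
# reading kernel log text from stdin.
disabled_irqs() {
    sed -n 's/^.*Disabling IRQ #\([0-9][0-9]*\).*$/\1/p' | sort -un
}
```

For example: dmesg | disabled_irqs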
Comment 10 Jeff 2004-06-22 15:48:54 UTC
Oops, should have said "I'm NOT sure it's a hotplug issue..."
Comment 11 Jeff 2004-06-22 15:51:16 UTC
I also see we are both using nforce2 boards...
Comment 12 Albert Hopkins (RETIRED) gentoo-dev 2004-06-22 16:53:55 UTC
Switched back to 2.6.5.  I've had issues with sound (alsa) as well. 
Comment 13 Steve Romanow 2004-06-23 05:06:45 UTC
I am having system hang issues with gentoo-dev-sources, r1 and r3.

Common: AMD processor, using EHCI (add-on PCI card with Ali chipset), nvidia-5336 driver.

Differences: not using hotplug, not an nForce chipset motherboard.  No errors that I can see regarding ehci or IRQs.  Well, the last message in dmesg is that it couldn't get an IRQ for the floppy, but I think that's old, and I don't use the floppy anyway.

The system works fine; then, after a couple of hours of inactivity, I move the mouse to wake it from the screensaver and the system hangs.  The screensaver is gone, showing the desktop, and the clock is correct (so that makes me think the halt occurred just on waking, not previously).

dmesg and /var/log/messages show nothing out of the ordinary.  g-d-s-2.6.5 does not have this issue.
Comment 14 Steve Romanow 2004-06-23 06:02:03 UTC
Created attachment 33947 [details]
sample of cron output

This is a sample of the cron output from a nightly emerge sync.  The machine
that is hanging is my LAN rsync mirror.  Looks like issues were happening
earlier than I thought.  Note, this output is from the functional client
machine, not the hanging server.
Comment 15 Ben 2004-06-29 22:15:43 UTC
I am having the same problem with hotplug...Also an nvidia nforce2 chipset, but I have no pci cards.  Hotplug is loading the ehci_hcd module, though.
Comment 16 Greg Kroah-Hartman (RETIRED) gentoo-dev 2004-06-30 16:39:46 UTC
It's a 2.6.7 ehci issue, combined with an ACPI issue.

Can you try the -mm kernel tree to see if it is fixed there?
Comment 17 Albert Hopkins (RETIRED) gentoo-dev 2004-06-30 17:50:36 UTC
Greg,

We're good on -mm4.  USB modules load fine, hotplug runs without need for modules.autoload.  ALSA runs fine. At least it fixes all my problems :-)

--m
Comment 18 Greg Kroah-Hartman (RETIRED) gentoo-dev 2004-06-30 22:02:55 UTC
good, that means 2.6.8 should work for you then.
Comment 19 Jeff 2004-07-03 05:45:47 UTC
Ditto here: -mm4 works fine.
Comment 20 Steve Romanow 2004-07-04 21:06:23 UTC
gkh, I will load the -mm kernel soon; sorry, I just saw your request.  I did try g-d-s-2.6.7-r7 and the problem persisted.
Comment 21 dixon 2004-07-11 08:28:25 UTC
mm4 doesn't totally solve my problem, and neither do mm5 and mm6. I sometimes have a very slow system after reboot.

As an inelegant workaround with the 2.6.7-gentoo-r8 kernel,
I load the necessary modules at boot (including usbcore and usbhid, but excluding the ehci_hcd module), add "/etc/init.d/hotplug start" to the /etc/conf.d/local.start file, and also add "/etc/init.d/hotplug stop" to the /etc/conf.d/local.stop file.

I'm not sure whether the cause is the order in which hotplug loads modules, but this solves my problem.

Hope this will help
Comment 22 Steve Romanow 2004-07-11 19:19:14 UTC
mm4 did not solve my problem either.  Neither did using a newer nvidia driver.  I just changed xorg.conf to use the nv driver instead of nvidia to see if I can work out whether it's related to OpenGL at all.  Again, I apologize for my slowness in getting info; it's not my machine, so I'm working around their schedule.
Comment 23 will 2004-07-12 17:44:24 UTC
mm4 worked fine, but now mm6 is doing the same thing.
Comment 24 Albert Hopkins (RETIRED) gentoo-dev 2004-07-12 19:51:12 UTC
Indeed, mm4 solved my issue, but it returned with mm6.  However, I just tried 2.6.8-rc1 and so far it's been good.
Comment 25 Steve Romanow 2004-07-14 08:43:02 UTC
New info: it appears to hang around 3:08 to 3:15 am when using 2.6.7 (mm or g-d-s kernel) with either the nv or nvidia video driver.  I looked at the crontab, and at 3 am there is an unattended backup using rdiff-backup via ssh.  (It is one of three machines that back up to this box, at 2 am, 3 am, and 4 am.)  When it hangs, it's always at around 3 am.

Has anyone seen mention of ssh or ethernet woes with 2.6.7?  I will swap the 3 am and 4 am machines in the crontab and see if the error moves or stays.
Comment 26 Steve Romanow 2004-07-14 09:02:43 UTC
It appears to lock up consistently when syncing the portage tree.  Look below: at 3:07 am the sync stopped abruptly.  This only happens with 2.6.7; 2.6.5 doesn't do this.  There is a possibility that another workstation is syncing off this machine while it is syncing (bad, already changed that, should cause lockup though).

lol2 root # ls -l emerge.sync.log
-rw-r--r--  1 root root 24656925 Jul 14 03:07 emerge.sync.log
lol2 root # tail emerge.sync.log
metadata/cache/sec-policy/
metadata/cache/sys-apps/
metadata/cache/sys-cluster/
metadata/cache/sys-devel/
metadata/cache/sys-fs/
metadata/cache/sys-kernel/
metadata/cache/sys-libs/
metadata/cache/x11-libs/
metadata/cache/x11-misc/
metadata/cache/x11-plugins/
lol2 root #

*  net-libs/librsync
      Latest version available: 0.9.6
      Latest version installed: 0.9.6
      Size of downloaded files: 345 kB
      Homepage:    http://librsync.sf.net/
      Description: Flexible remote checksum-based differencing
      License:     LGPL-2.1

*  net-misc/rsync
      Latest version available: 2.6.0-r2
      Latest version installed: 2.6.0-r2
      Size of downloaded files: 517 kB
      Homepage:    http://rsync.samba.org/
      Description: File transfer program to keep remote files into sync
      License:     GPL-2

Comment 27 Steve Romanow 2004-07-22 20:47:44 UTC
my issues appear to be cleared up.  I think it was a case of make oldconfig biting me.  I had several incorrect/missing settings in my kernel .config.  After straightening them out, I've had a stable 2.6.7-gentoo-r11 system.
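
For anyone chasing a similar make oldconfig problem, a rough way to spot options that changed or went missing is to compare the enabled options of the known-good and the new .config. This is a sketch only: config_diff is a hypothetical helper name, and it looks only at lines starting with CONFIG_, so an option that is "not set" in one file shows up as present on the other side only.

```shell
# Hypothetical helper: show enabled CONFIG_ options that differ between
# two kernel .config files. Usage: config_diff OLD_CONFIG NEW_CONFIG
config_diff() {
    old_sorted=$(mktemp) || return 2
    new_sorted=$(mktemp) || return 2
    # Keep only enabled options and sort them so diff lines up by name.
    grep '^CONFIG_' "$1" | sort > "$old_sorted"
    grep '^CONFIG_' "$2" | sort > "$new_sorted"
    # diff exits 1 when the files differ; that is the expected case here.
    diff "$old_sorted" "$new_sorted" || true
    rm -f "$old_sorted" "$new_sorted"
}
```

Lines starting with "<" appear only in the old config and lines starting with ">" only in the new one; the two paths passed in are whatever configs you want to compare.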
Comment 28 Albert Hopkins (RETIRED) gentoo-dev 2004-07-25 08:11:21 UTC
Closing this bug; I have no problems with newer versions of the kernel.