This also happened when I tried development-sources-2.6.7. I'm not sure how to report kernel bugs, so if this is the wrong medium, let me know. Basically, when the hotplug service runs, the computer hangs. I get no error or panic message, or any other indication. CTRL-ALT-DEL doesn't work and the system needs a reset. If I disable the hotplug service, the system boots fine, albeit with less functionality. If I then start hotplug manually, the system hangs.
Steps to Reproduce:
1. emerge gentoo-dev-sources-2.6.7, build and install kernel, reboot
2. Watch init start system services, wait for hotplug
3. Watch system hang
Nothing exciting ;-)
Should have continued to boot ;-)
Kernel config will follow. Oh, and I'm running an AMD.
Created attachment 33559 [details]
My kernel config (that causes the system to hang)
Disable the hotplug startup service, do you really need it?
If you know of any specific modules your hardware needs to use, please add them
to the modules.autoload file.
Also, any way to tell what module is causing the problem?
Just a small info; gentoo-dev-sources-2.6.7 with hotplug service running and configured for udev, no problem at all :)
Well I enabled the service because I use it. I basically have the same config with 2.6.5 and it works fine. What you seem to be suggesting is that I ignore the problem. But ignoring the problem won't make it go away. If it's the case that you don't feel the need to assist on this issue then I can close this item. Otherwise I would be willing to assist in any way I can.
FYI, it appears to be the ehci_hcd module. When I modprobe it by hand, the system hangs. It seems that putting it in modules.autoload would do the same thing, only faster. My understanding is that this is the USB 2.0 module, and since I have, and would like to use, USB 2.0 devices, I consider it a necessary module. Should I take this up with the Linux kernel folks, or should it stay here?
Oddly enough, as suggested, I added ehci_hcd to modules.autoload, rebooted, and the system came up fine. No hangs. Then I logged in as root and ran /etc/init.d/hotplug start and hotplug came up fine. The output of lsmod looks just the same as it did in 2.6.5 (albeit in a different order) and USB devices appear to run fine. So perhaps it has to do with the order in which hotplug is loading modules? Attaching output of lsmod.
Created attachment 33599 [details]
Output of 'lsmod' when ehci_hcd is in modules.autoload and hotplug is started
More info (if anyone cares). When the module ehci_hcd is loaded, it spouts the message "Disabling IRQ #11". My first thought was that maybe the module is disabling IRQ 11 when it's already in use, causing the system to hang. But even after loading ehci_hcd first, /proc/interrupts shows
11: 100000 XT-PIC aic7xxx, ehci_hcd, NVidia nForce2
Of course this could have nothing to do with anything, but just FYI.
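For anyone wanting to check which drivers share an interrupt line, here is a minimal Python sketch that splits a line like the one above into its fields. It assumes the single-CPU layout shown here (one count column) and a numeric IRQ number; `parse_interrupt_line` is a hypothetical helper name, not part of any existing tool.

```python
def parse_interrupt_line(line):
    """Split one interrupt-table line into (irq, count, controller, handlers).

    Assumption: a single count column (uniprocessor layout) and a numeric IRQ,
    as in the line quoted in this report.
    """
    irq_part, rest = line.split(":", 1)
    # Fields after the colon: interrupt count, controller name, handler list.
    fields = rest.split(None, 2)
    count = int(fields[0])
    controller = fields[1]
    handlers = []
    if len(fields) > 2:
        handlers = [name.strip() for name in fields[2].split(",")]
    return int(irq_part), count, controller, handlers


sample = "11: 100000 XT-PIC aic7xxx, ehci_hcd, NVidia nForce2"
irq, count, controller, handlers = parse_interrupt_line(sample)
print(f"IRQ {irq} ({controller}, {count} interrupts) is shared by: {handlers}")
```

If the handler list has more than one entry, the line is shared, which is the situation described above for IRQ 11.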
Oh, also I found a couple of links:
I have a similar problem after switching to gentoo-dev-sources-2.6.7 (all versions, currently using r4). Now I'm sure it's a hotplug issue; it seems to me more like an ehci_hcd problem. I don't get any hangs, but I do get the following dmesg output:
ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ehci_hcd 0000:00:02.2: nVidia Corporation nForce2 USB Controller
PCI: Setting latency timer of device 0000:00:02.2 to 64
ehci_hcd 0000:00:02.2: irq 5, pci mem e1044000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 3
irq 5: nobody cared!
[<e107639e>] ehci_start+0x2ce/0x360 [ehci_hcd]
[<e105d4f7>] usb_register_bus+0x137/0x160 [usbcore]
[<e106254b>] usb_hcd_pci_probe+0x2ab/0x4e0 [usbcore]
[<e103b023>] init+0x23/0x30 [ehci_hcd]
[<e105e2e0>] (usb_hcd_irq+0x0/0x70 [usbcore])
Disabling IRQ #5
PCI: cache line size of 64 is not supported by device 0000:00:02.2
ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 6 ports detected
gameport: pci0000:01:07.1 speed 864 kHz
hub 2-0:1.0: connect-debounce failed, port 1 disabled
ohci_hcd 0000:00:02.1: remote wakeup
hub 2-0:1.0: hub_port_status failed (err = -108)
hub 2-0:1.0: hub_port_status failed (err = -108)
hub 2-0:1.0: hub_hub_status failed (err = -108)
hub 2-0:1.0: get_hub_status failed
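To pull the relevant lines out of a longer log, a rough Python sketch like the following could help. It only looks for the three message shapes that appear in the output above ("<driver> <device>: irq N, ...", "irq N: nobody cared!" and "Disabling IRQ #N"); the function name `summarize_irq_messages` and the regular expressions are illustrative assumptions, not an official log format.

```python
import re

# Patterns modelled on the messages quoted in this report (assumed shapes).
CLAIM = re.compile(r"^(\S+) (\S+): irq (\d+),")
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")


def summarize_irq_messages(log_text):
    """Return (claimed, unhandled, disabled) IRQ information from log text."""
    claimed = {}       # irq number -> list of (driver, device) that claimed it
    unhandled = set()  # irq numbers reported as "nobody cared"
    disabled = set()   # irq numbers the kernel reported disabling
    for raw in log_text.splitlines():
        line = raw.strip()
        match = CLAIM.match(line)
        if match:
            driver, device, irq = match.groups()
            claimed.setdefault(int(irq), []).append((driver, device))
        match = NOBODY_CARED.search(line)
        if match:
            unhandled.add(int(match.group(1)))
        match = DISABLING.search(line)
        if match:
            disabled.add(int(match.group(1)))
    return claimed, unhandled, disabled


sample = """\
ehci_hcd 0000:00:02.2: irq 5, pci mem e1044000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 3
irq 5: nobody cared!
Disabling IRQ #5
"""
claimed, unhandled, disabled = summarize_irq_messages(sample)
for irq in sorted(disabled):
    print(f"IRQ {irq} disabled; claimed by {claimed.get(irq, [])}")
```

Feeding it the full dmesg text instead of `sample` would list every disabled IRQ together with whichever drivers announced that they were using it.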
Oops, should have said "I'm NOT sure it's a hotplug issue..."
I also see we are both using nforce2 boards...
Switched back to 2.6.5. I've had issues with sound (alsa) as well.
I am having system hang issues with gentoo-dev-sources, r1 and r3.
Common - AMD processor, using EHCI (add-on PCI card with ALi chipset), nvidia-5336 driver
Differences - not using hotplug, not an nForce chipset motherboard. No errors that I can see regarding EHCI or IRQs. Well, the last message in dmesg is that it couldn't get an IRQ for the floppy, but I think that's old; I don't use the floppy anyway.
The system works fine, then after a couple of hours of inactivity, when I move the mouse to wake it from the screensaver, the system hangs. The screensaver is gone, showing the desktop, and the clock is correct (which makes me think the halt occurred on waking, not earlier).
dmesg and /var/log/messages show nothing out of the ordinary. g-d-s-2.6.5 does not have this issue.
Created attachment 33947 [details]
sample of cron output
This is a sample of the cron output from a nightly emerge sync. The machine
that is hanging is my LAN rsync mirror. It looks like issues were happening
earlier than I thought. Note, this output is from the functional client
machine, not the hanging server.
I am having the same problem with hotplug...Also an nvidia nforce2 chipset, but I have no pci cards. Hotplug is loading the ehci_hcd module, though.
It's a 2.6.7 EHCI issue, combined with an ACPI issue.
Can you try the -mm kernel tree to see if it is fixed there?
We're good on -mm4. USB modules load fine, hotplug runs without need for modules.autoload. ALSA runs fine. At least it fixes all my problems :-)
good, that means 2.6.8 should work for you then.
Ditto here, -mm4 works fine.
gkh, i will load mm kernel soon, sorry, just saw your request. did try g-d-s-2.6.7-r7 and problem persisted.
mm4 doesn't totally solve my problem, and neither do mm5 and mm6. I sometimes have a very slow system after reboot.
I have an inelegant solution with the 2.6.7-gentoo-r8 kernel:
I load the necessary modules at boot (including usbcore and usbhid, but excluding the ehci_hcd module), add "/etc/init.d/hotplug start" to the /etc/conf.d/local.start file, and add "/etc/init.d/hotplug stop" to the /etc/conf.d/local.stop file.
I'm not sure whether the cause is the order in which hotplug loads modules, but this solves my problem.
Hope this will help
mm4 did not solve my problem either, and neither did using a newer nvidia driver. I just changed xorg.conf to use the nv driver instead of nvidia to see whether the problem is related to OpenGL at all. Again, I apologize for my slowness in getting info; it's not my machine, so I'm working around their schedule.
mm4 worked fine, but now mm6 is doing the same thing.
Indeed, mm4 solved my issue, but it returned with mm6. However, I just tried 2.6.8-rc1 and so far it's been good.
New info: it appears to hang around 3:08 to 3:15 am when using 2.6.7 (mm or g-d-s kernel) with either the nv or the nvidia video driver. I looked at the crontab, and at 3am there is an unattended backup using rdiff-backup via ssh. (It is one of three machines that back up to this box, at 2am, 3am, and 4am.) When it hangs, it's always at around 3am.
Has anyone seen mention of ssh or ethernet woes with 2.6.7? I will swap the 3am and 4am machines in the crontab and see if the error moves or stays.
It appears to lock up consistently when syncing the portage tree. Look below: at 3:07am the sync stopped abruptly. This only happens with 2.6.7; 2.6.5 doesn't do this. There is a possibility that another workstation is syncing off this machine while it is syncing (bad, already changed that, should cause lockup though).
lol2 root # ls -l emerge.sync.log
-rw-r--r-- 1 root root 24656925 Jul 14 03:07 emerge.sync.log
lol2 root # tail emerge.sync.log
lol2 root #
Latest version available: 0.9.6
Latest version installed: 0.9.6
Size of downloaded files: 345 kB
Description: Flexible remote checksum-based differencing
Latest version available: 2.6.0-r2
Latest version installed: 2.6.0-r2
Size of downloaded files: 517 kB
Description: File transfer program to keep remote files into sync
my issues appear to be cleared up. I think it was a case of make oldconfig biting me. I had several incorrect/missing settings in my kernel .config. After straightening them out, I've had a stable 2.6.7-gentoo-r11 system.
Closing this bug, I have no problem with newer versions of the kernel.