| Summary: | kernels above 2.6.23-r9 freeze at "Waiting for uevents to be processed" | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Kristian Duus Østergaard <duus-gentoobugs> |
| Component: | [OLD] Core system | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
| Status: | RESOLVED TEST-REQUEST | ||
| Severity: | major | CC: | charles.nadeau, duus-gentoobugs, trefoils |
| Priority: | High | ||
| Version: | unspecified | ||
| Hardware: | AMD64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
| Attachments: | .config file for 2.6.27; Working .config for 2.6.23-r9; .config file for 2.6.30-gentoo-r4; Photo of the screen when the machine freezes | ||
Description
Kristian Duus Østergaard
2009-02-24 20:31:12 UTC
Created attachment 183057
.config file for 2.6.27
Created attachment 183058
Working .config for 2.6.23-r9
Can you try this?
1) Make a backup of /etc/udev/rules.d/70-persistent*.rules
2) Delete 70-persistent*.rules
3) Reboot
If they don't regenerate, you can copy them back. Then, possibly, emerge udev again.

Before deleting the files, I would like to understand why this should solve the problem and what will happen if I still cannot boot. And last but not least, what will happen to my fallback scenario, i.e. the 2.6.23 kernel?

You're backing up the files, so you can restore them even from a rescue CD if necessary. I think Wikipedia could do a better job of teaching you udev than I can: http://en.wikipedia.org/wiki/Udev (look specifically at udev rules).

I have finally had a chance to reboot the system after deleting the 70-persistent* files. Booting with 2.6.27, it freezes as before. Booting with 2.6.23-r9 regenerates the files exactly as they were before the deletion.

I'm wondering if this is a hardware issue. Do you have any extraneous hardware in your system? Any tv cards?

(In reply to comment #7)
> I'm wondering if this is a hardware issue. Do you have any extraneous hardware
> in your system? Any tv cards?

I don't have any TV cards in the machine and I was about to say no to extraneous hardware. But I actually have an older SCSI adapter that I've stopped using but which is still in the system:
04:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 26)
I still have difficulty understanding why it will boot with the older kernel and not with the newer. I would expect erratic behaviour with the old kernel if I had a hardware issue.

> 04:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 26)
>
> I still have difficulty understanding why it will boot with the older kernel
> and not with the newer. I would expect erratic behaviour with the old kernel if
> I had a hardware issue.
>
I have this same card connected to 2 tape libraries (a Compaq StorageWorks TL892 and a Sun StorEdge L9) and with 2.6.27-r8 I can't even boot. Upon booting I get this error message "sym1: SCSI parity error detected: SCR1=1 DBC=1100001e SBCL=ae" printed non-stop on my screen. With 2.6.25-r3, my system is rock solid. Maybe there is a link between Kristian's problem and mine
Charles
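The udev rules reset suggested at the start of this thread (back up /etc/udev/rules.d/70-persistent*.rules, delete them, reboot, and copy them back if they are not regenerated) can be sketched as two small shell functions. This is a hypothetical helper, not something shipped with udev or Gentoo: the function names and the backup location are assumptions, and the rules directory is passed as a parameter so the sketch can be tried on a scratch copy before touching /etc/udev/rules.d.

```shell
#!/bin/sh
# Hypothetical helpers sketching the suggested udev rules reset.
# Both take RULES_DIR and BACKUP_DIR as parameters (an assumption, not a udev
# convention); on the real system RULES_DIR would be /etc/udev/rules.d.

# backup_persistent_rules RULES_DIR BACKUP_DIR
# Steps 1 and 2: copy each 70-persistent*.rules file to BACKUP_DIR, then delete it.
# Returns non-zero if there was nothing to back up or a copy/delete failed.
backup_persistent_rules() {
    rules_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    found=0
    for f in "$rules_dir"/70-persistent*.rules; do
        [ -e "$f" ] || continue                  # glob matched nothing
        cp -p "$f" "$backup_dir"/ || return 1    # 1) back up
        rm -f "$f" || return 1                   # 2) delete
        found=1
    done
    [ "$found" -eq 1 ]
}

# restore_persistent_rules RULES_DIR BACKUP_DIR
# After the reboot (step 3): copy a backed-up file back only if udev did not
# regenerate it, so freshly generated rules are never overwritten.
restore_persistent_rules() {
    rules_dir=$1
    backup_dir=$2
    for f in "$backup_dir"/70-persistent*.rules; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        [ -e "$rules_dir/$name" ] || cp -p "$f" "$rules_dir/$name" || return 1
    done
}
```

On the affected machine this would mean running, as root, backup_persistent_rules /etc/udev/rules.d /root/udev-rules-backup (the backup path is only an example), rebooting, and then running restore_persistent_rules with the same two arguments. The restore step deliberately skips files that udev has already regenerated, which is what happened here when booting 2.6.23-r9.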
(In reply to comment #8)
> (In reply to comment #7)
> I don't have any TV cards in the machine and I was about to say no to
> extraneous hardware. But I actually have an older SCSI adapter that I've
> stopped using but which is still in the system.

Would you be willing to pull the unused card?
Any chance of testing with gentoo-sources-2.6.29-r1.

(In reply to comment #10)
> (In reply to comment #8)
> > (In reply to comment #7)
> > I don't have any TV cards in the machine and I was about to say no to
> > extraneous hardware. But I actually have an older SCSI adapter that I've
> > stopped using but which is still in the system.
>
> Would you be willing to pull the unused card?
> Any chance of testing with gentoo-sources-2.6.29-r1.
>

Sorry for the late reply. I'm willing to pull the card; I just need to find time to do it. As for a newer kernel: unless you tell me that it contains changes in udev or in the module for the SCSI adapter that might directly resolve the problem, I'd rather wait and test just the adapter for now.

Please reopen when you've had a chance to remove the card.
If that doesn't help, here's something else to try:

Compile your kernel with CONFIG_MAGIC_SYSRQ=y
Modify your bootloader to pass the "debug" parameter to the kernel
Reproduce the hang
Press alt+sysrq+m
Take photo of screen
Upload photo here :)

thanks!

(In reply to comment #12)
> Please reopen when you've had a chance to remove the card.
> If that doesn't help, here's something else to
>
> Compile your kernel with CONFIG_MAGIC_SYSRQ=y
> Modify your bootloader to pass the "debug" parameter to the kernel
> Reproduce the hang
> Press alt+sysrq+m
> Take photo of screen
> Upload photo here :)

I have removed the card and it unfortunately didn't solve the problem.
I will try again with the latest kernel, adding the parameters you specified, and last but not least grab a camera :-)

I have added the debug option to the kernel and tried to do alt+sysrq(prnt scr)+m, but the machine is locked hard waiting for the uevents to be processed. I have tried pressing the Num Lock key to see if the keyboard is responding, but it seems like it will not even do that.

Two additional notes:
1) This last attempt was done with gentoo-sources 2.6.28-gentoo-r5.
2) When I posted this, I did a large number of diffs to establish when the change that broke this was introduced. As far as I could tell, it was introduced going from 2.6.23.x to 2.6.24.

Can you boot with the debug parameter anyway? It may cause a message to be printed to the screen when it crashes. Also try disabling the framebuffer if you are using one.

Would you be able to set up a serial console?

Also please confirm that you aren't using any out-of-kernel modules (packages which you emerge in portage every time you upgrade your kernel, for example proprietary graphics drivers).

(In reply to comment #9)
> > 04:04.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 26)
> >
> > I still have difficulty understanding why it will boot with the older kernel
> > and not with the newer. I would expect erratic behaviour with the old kernel if
> > I had a hardware issue.
> >
>
> I have this same card connected to 2 tape libraries (a Compaq StorageWorks
> TL892 and a Sun StorEdge L9) and with 2.6.27-r8 I can't even boot. Upon booting
> I get this error message "sym1: SCSI parity error detected: SCR1=1 DBC=1100001e
> SBCL=ae" printed non-stop on my screen. With 2.6.25-r3, my system is rock
> solid. Maybe there is a link between Kristian's problem and mine
>
> Charles
>

I just want to add that I tried with 2.6.28-r5 today and I have the same problem.
Charles

(In reply to comment #15)
> Can you boot with the debug parameter anyway? It may cause a message to be
> printed to the screen when it crashes.
> Also try disabling the framebuffer if
> you are using one.
>
> Would you be able to set up a serial console?
>
> Also please confirm that you aren't using any out-of-kernel modules (packages
> which you emerge in portage every time you upgrade your kernel, for example
> proprietary graphics drivers)
>

The last couple of times I had the debug option enabled, but it didn't affect the output at the point where it freezes.

As for a serial console: I have the equipment to connect to the serial console, but is there anything I need in the kernel to actually get a connection on the serial console?

As for proprietary modules, I did have a Matrox module in there, but as a result of debugging this kernel problem and upgrading to Xorg 1.5 I have removed the Matrox module. This has not changed anything. While upgrading to the latest X, I noticed that I had not upgraded to the latest gcc. Could the difference between gcc 4.1.2 and 4.3.2 explain why the newer kernels won't boot?

It has been a while. Have you made any progress here?

Created attachment 202378
.config file for 2.6.30-gentoo-r4
latest config file that still will not boot
I just tried the latest Gentoo kernel (2.6.30-r4) with the same result as before. During boot I noticed that it said: "please use probe_mask=0x3f". I then tried adding ide_generic.probe_mask=0x3f to the kernel command line, but without any change in the result. I should probably have mentioned that the mainboard is a Tyan Thunder K8S Pro (S2882). My next attempt will be to remove the KVM module, as it seems my Opterons do not support it.

Anything to report here?

(In reply to comment #21)
> Anything to report here?
>

No change, unfortunately. It still freezes at the point where it says "Waiting for uevents to be processed". Can I ask someone to do a diff of the 2.6.23 and 2.6.30 kernels and tell me if anything seems off?

I have just tried changing CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" to CONFIG_UEVENT_HELPER_PATH="/sbin/udevadm", as I found this to be a difference between a laptop I have and the server. Unfortunately, still no change.

Created attachment 208681
Photo of the screen when the machine freezes
I have turned udev debug information on and attached a picture of the screen in the hope that someone can spot what might be wrong.
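One way to act on the request above for a diff between the 2.6.23 and 2.6.30 kernels is to compare the two attached .config files option by option. The shell sketch below assumes that is what is wanted, and its function names are hypothetical. It normalises "# CONFIG_FOO is not set" lines to CONFIG_FOO=n, so that an option disabled in one config and enabled in the other also shows up, then uses sort and comm to print options found only in the old config (prefixed "-") or only in the new one (prefixed "+").

```shell
#!/bin/sh
# Sketch of a kernel .config comparison (function names are hypothetical).

# normalise_config FILE
# Keep only option lines, rewrite "# CONFIG_FOO is not set" as CONFIG_FOO=n,
# and sort bytewise (LC_ALL=C) so comm(1) sees consistently ordered input.
normalise_config() {
    sed -n -e '/^CONFIG_/p' \
           -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1=n/p' "$1" |
        LC_ALL=C sort
}

# config_diff OLD_CONFIG NEW_CONFIG
# Prints "-LINE" for settings only in OLD_CONFIG, then "+LINE" for settings
# only in NEW_CONFIG. Identical settings are not printed.
config_diff() {
    old=$(mktemp) || return 1
    new=$(mktemp) || { rm -f "$old"; return 1; }
    normalise_config "$1" > "$old"
    normalise_config "$2" > "$new"
    LC_ALL=C comm -23 "$old" "$new" | sed 's/^/-/'
    LC_ALL=C comm -13 "$old" "$new" | sed 's/^/+/'
    rm -f "$old" "$new"
}
```

For example, config_diff config-2.6.23-gentoo-r9 config-2.6.30-gentoo-r4 (file names illustrative) would report the CONFIG_UEVENT_HELPER_PATH change mentioned above as a "-" line and, further down, a matching "+" line. Options that only exist in the newer kernel also appear as "+" lines, so expect a long list when jumping several releases.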
    # udevadm monitor --env --kernel --udev &
    # udevadm trigger
    ... (lots of stuff here)
    # udevadm settle

At this point it waits for a long time, then:

    UDEV [1239520736.717056] add /devices/pci0000:00/0000:00:03.3/usb1/1-8/1-8:1.0/host4/target4:0:0/4:0:0:0/block/sdb (block)
    UDEV_LOG=3
    ACTION=add
    DEVPATH=/devices/pci0000:00/0000:00:03.3/usb1/1-8/1-8:1.0/host4/target4:0:0/4:0:0:0/block/sdb
    SUBSYSTEM=block
    DEVNAME=/dev/sdb
    DEVTYPE=disk
    SEQNUM=886
    ID_VENDOR=Generic-
    ID_VENDOR_ENC=Generic-
    ID_VENDOR_ID=0bda
    ID_MODEL=xD_SDMMC_MS_Pro
    ID_MODEL_ENC=xD\x2fSDMMC\x2fMS\x2fPro\x20
    ID_MODEL_ID=0116
    ID_REVISION=1.00
    ID_SERIAL=Generic-_xD_SDMMC_MS_Pro_20021111153705700-0:0
    ID_SERIAL_SHORT=20021111153705700
    ID_TYPE=disk
    ID_INSTANCE=0:0
    ID_BUS=usb
    ID_USB_INTERFACES=:080650:
    ID_USB_INTERFACE_NUM=00
    ID_USB_DRIVER=usb-storage
    ID_PATH=pci-0000:00:03.3-usb-0:8:1.0-scsi-0:0:0:0
    MAJOR=8
    MINOR=16
    DEVLINKS=/dev/block/8:16 /dev/disk/by-id/usb-Generic-_xD_SDMMC_MS_Pro_20021111153705700-0:0 /dev/disk/by-path/pci-0000:00:03.3-usb-0:8:1.0-scsi-0:0:0:0

Without my MicroSD adapter it runs just fine. I am running kernel 2.6.31 with udev 146-r1. By the way, it also works fine if a card is present in the adapter. Looks like a driver bug...

Greetings, could you people try booting a recent vanilla kernel? Looking at bug #299287, it seems quite likely that a patch for fbcondecor that is included in gentoo-sources is the cause of the issue.

Thank you George for a good suggestion. I thought I had tried with a vanilla kernel, but looking at grub I can see I haven't. I have compiled a new vanilla kernel and will let you know when I have had a chance to try it out.
Regards
Kristian

Feel free to reopen with test results.

Upgrading to 2.6.31-gentoo-r6 seemed to solve my problems.