I can't resume from suspend-to-ram with gentoo-sources 2.6.21-r2 or any later kernel, including 2.6.22. 2.6.21-r1 and all earlier gentoo-sources releases work flawlessly on my hardware. The most likely regression candidate is 1001_linux-2.6.21.2.patch. Hardware: AMD 3800+ X2, Nvidia nForce4; the sata_nv and AMD IDE drivers are compiled in statically. Nothing changed in software or hardware between the successful and the failed resume. I suspend with a script that unloads all modules, but the script should not matter, because every release before r2 worked with the same software and hardware.
Please post dmesg output after boot. Please list all of the kernels you have tried. Please elaborate on "I can't resume" -- what happens when you try?
Kernels I have tried (some of which are currently in /lib/modules and /boot), all gentoo-sources: 2.6.17-r[47], 2.6.19-r[24], 2.6.21-r[123], 2.6.22-rc4 (vanilla with the squashfs and vesafb-tng patches from gentoo-sources-2.6.21-r1) and 2.6.22. Working kernels: all <= 2.6.21-r1. What happens when a kernel >= 2.6.21-r2 is powered on after suspend-to-ram: the reset LED on the PC turns on and stays on, the monitor comes to life (its LED goes from orange to green), and that's about it. No messages on screen. Keyboard and mouse don't work, not even sysrq or the three finger salute (no keyboard, duh!). I have to hit the reset button. Output of dmesg after boot: I will have to be home and boot into r2 to post that. Also, please understand that a failed suspend-resume cycle typically means that '/' and /home are shut down uncleanly and have to be fsck'ed during the next boot, and I have already lost a couple of files (nothing important). Although I do have backups, I would like to keep the experiments to a minimum. So please let me know what else you want me to do before I boot into the failing kernel.
Created attachment 124507 [details] dmesg immediately after booting into 2.6.21-r2 (the failed kernel) It doesn't differ from the dmesg for 2.6.21-r1 (the good kernel) in any significant way. Most diffs are in the formatting of the SATA drive descriptions and in a few numbers such as clock speed, migration cost etc.
dsd, can you please tell me if you found something useful? Do I need to take this issue upstream?
I'm attaching two patches, 2117_sata-via-suspend.patch and 1001_linux-2.6.21.2.patch. Can you please revert these against gentoo-sources-2.6.21-r2 like this:
# patch -p1 -R < 2117_sata-via-suspend.patch
# patch -p1 -R < 1001_linux-2.6.21.2.patch
And see if that fixes the problem for you?
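A minimal sketch of how the revert above could be made a little safer, assuming GNU patch. The helper name `revert_patch` is hypothetical; it only reverse-applies a patch after a dry run shows that it reverses cleanly, so a failed attempt leaves the tree untouched.

```shell
#!/bin/sh
# revert_patch TREE PATCHFILE
# TREE is the kernel source directory, PATCHFILE an absolute path to the patch.
# (Hypothetical helper, not part of gentoo-sources or of the patch tool.)
revert_patch() {
    tree=$1
    patchfile=$2
    # --dry-run only reports what would happen; -f suppresses interactive prompts.
    if (cd "$tree" && patch -p1 -R -f --dry-run < "$patchfile" >/dev/null 2>&1); then
        # The dry run succeeded, so reverse-apply the patch for real.
        (cd "$tree" && patch -p1 -R -f < "$patchfile")
    else
        echo "not reverting: $patchfile does not reverse cleanly in $tree" >&2
        return 1
    fi
}
```

For the two attachments this would be called once per patch, in the same order as in the comment above, with the path to the unpacked 2.6.21-r2 tree as the first argument.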
Created attachment 125486 [details, diff] 2117_sata-via-suspend.patch
Created attachment 125488 [details, diff] 1001_linux-2.6.21.2.patch
Please give me some time to try this out. But, if I remember correctly, my earlier attempt had pointed out 1001_linux-2.6.21.2.patch as the culprit.
Please reopen when you have identified which patch introduces the bug.
OK, so I tested reverting 1001_linux-2.6.21.2.patch on a clean 2.6.21-r2, and it did fix my issue with ACPI suspend to RAM. My issue was the following: one suspend cycle (power on, suspend, wakeup) worked, but any following suspend would make the disks spin down and then the machine would hang somewhere, with a reset as the only resort. I have tested 2.6.22 and 2.6.23-rc kernels and they have always given me the same problem. AMD X2, nvidia chipset. Can't re-open, I'm a newbie at bugzilla... ;-)
Basically, that confirms what I thought I arrived at as well. Reopening.
Micael, can you please test with the latest development kernel, 2.6.23-rc6 as of this writing? Can you post your kernel .config and dmesg output (from after the successful suspend cycle)? Please provide a little more detail about your hardware. devsk, do you mean to say that you're experiencing the exact same symptoms as Micael? Can you also test with the latest development kernel and post your .config and dmesg output?
Assuming that 2.6.23-rc is still broken, the next step is to do a bisection. 2.6.21.2 is a large patch over 2.6.21.1, so it's not obvious which change caused the problem, and that is what we need to find out. It's a little time consuming (maybe 5 reboots?) but will almost certainly find the exact patch which caused the problem.
http://www.reactivated.net/weblog/archives/2006/01/using-git-bisect-to-find-buggy-kernel-patches/
In this case, the git URL you need to use is:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.21.y.git
Use v2.6.21.1 as good and v2.6.21.2 as bad. Thanks for your help figuring this out!
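The bisection described above could be driven roughly as follows. This is only a sketch: the helper names `start_bisect` and `mark_result` and the clone directory name are assumptions made for illustration, while the URL and the two tags are the ones given in this comment.

```shell
#!/bin/sh
# start_bisect REPO_DIR GOOD BAD
# Begins a bisection between a known-good and a known-bad revision.
start_bisect() {
    # "git bisect start" takes the bad revision first, then the good one.
    git -C "$1" bisect start "$3" "$2"
}

# mark_result REPO_DIR good|bad
# Records the outcome of testing the currently checked-out revision.
mark_result() {
    case $2 in
        good|bad) git -C "$1" bisect "$2" ;;
        *) echo "usage: mark_result REPO_DIR good|bad" >&2; return 2 ;;
    esac
}

# Example session (the directory name linux-2.6.21.y is an arbitrary choice):
#   git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.21.y.git linux-2.6.21.y
#   start_bisect linux-2.6.21.y v2.6.21.1 v2.6.21.2
#   ... build and boot the checked-out kernel, then try suspend and resume ...
#   mark_result linux-2.6.21.y good     # or: bad
# Repeat the build/test/mark step until git names the first bad commit,
# then run "git bisect reset" in the clone to return to the original checkout.
```

Each round halves the remaining range of commits, which is where the estimate of about five reboots comes from.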
Maarten, unfortunately, as this is a headless server that is not very easy to get to, I cannot do any tests right now. If there are no other volunteers, I'll try to move this box into an environment where I can connect a monitor and keyboard. Here is the data for the box, however: motherboard "Asus M2N32WS Professional", 1G RAM, AMD64 X2. http://www.asus.com.tw/products.aspx?l1=3&l2=101&l3=300&l4=0&model=1207&modelmenu=2
lspci gives:
00:00.0 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.1 RAM memory: nVidia Corporation C51 Memory Controller 0 (rev a2)
00:00.2 RAM memory: nVidia Corporation C51 Memory Controller 1 (rev a2)
00:00.3 RAM memory: nVidia Corporation C51 Memory Controller 5 (rev a2)
00:00.4 RAM memory: nVidia Corporation C51 Memory Controller 4 (rev a2)
00:00.5 RAM memory: nVidia Corporation C51 Host Bridge (rev a2)
00:00.6 RAM memory: nVidia Corporation C51 Memory Controller 3 (rev a2)
00:00.7 RAM memory: nVidia Corporation C51 Memory Controller 2 (rev a2)
00:04.0 PCI bridge: nVidia Corporation C51 PCI Express Bridge (rev a1)
00:08.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:09.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:09.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:09.2 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:0a.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:0a.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:0c.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:0d.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:0d.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:0d.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:0e.0 PCI bridge: nVidia Corporation Unknown device 0370 (rev a2)
00:10.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:11.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:12.0 PCI bridge: nVidia Corporation Unknown device 0376 (rev a2)
00:14.0 PCI bridge: nVidia Corporation Unknown device 0374 (rev a2)
00:15.0 PCI bridge: nVidia Corporation Unknown device 0378 (rev a2)
00:16.0 PCI bridge: nVidia Corporation Unknown device 0375 (rev a2)
00:17.0 PCI bridge: nVidia Corporation Unknown device 0377 (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:06.0 VGA compatible controller: S3 Inc. ViRGE/DX or /GX (rev 01)
03:00.0 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
03:00.1 PCI bridge: NEC Corporation uPD720400 PCI Express - PCI/PCI-X Bridge (rev 06)
06:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
07:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)
08:00.0 SATA controller: Marvell Technology Group Ltd. Unknown device 6141 (rev 01)
The VGA board and the SiI 3132 can be excluded from the search, since I tested without these boards installed.
Micael / devsk, please see Comment #13. If/when (one of) you can provide this information (ie. the patch that causes this), please reopen.
Since I am not able to move to any kernel later than 2.6.21-r1 because of this bug, I took the bisect plunge (with instructions from DSD's page and the URL from this bug report), and this is where it ended. Can someone please tell me how to find what change (diff) this commit introduced? If "Makefile" is any indication, I believe I messed up during the bisect.
# git bisect good
7682ffa25c68221cff1122b3ce26a05640a54898 is first bad commit
commit 7682ffa25c68221cff1122b3ce26a05640a54898
Author: Chris Wright <chrisw@sous-sol.org>
Date: Wed May 23 14:33:55 2007 -0700

    Linux 2.6.21.2

:100644 100644 58c08d1b738fd6bb2dd9e679a78bd45fa43ec4a3 22839cb65557b6f572fa8b660a274cb9a5102e76 M Makefile
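One way to see what a given commit changed is `git show`. The wrapper below is a small sketch; the function name `show_commit_changes` is hypothetical, and the repository path is whatever directory the stable tree was cloned into.

```shell
#!/bin/sh
# show_commit_changes REPO_DIR COMMIT
# Prints the commit message, a per-file summary and the full diff of COMMIT.
show_commit_changes() {
    # --stat lists the files touched; -p adds the actual patch text.
    git -C "$1" show --stat -p "$2"
}

# Example with the commit reported by the bisect (clone path is an assumption):
#   show_commit_changes linux-2.6.21.y 7682ffa25c68221cff1122b3ce26a05640a54898
```

Appending `-- Makefile` to a plain `git show` call would limit the output to that one file.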
That commit just changes EXTRAVERSION from 1 to 2. What this really means is a bad interaction between the base fixes in 2.6.21.2 and the gentoo-specific libata patches, because none of the libata patches were applied in 2.6.21.1. I am now testing with the "scsi handles suspend/resume" patches (211*) taken out.
Ok, I took out all the scsi related patches, and suspend-to-ram works in 2.6.21-r4 as it did in 2.6.21-r1. The patches removed:
2118_scsi-constants.patch
2117_sata-via-suspend.patch
2116_libata-remove-spindown-compat.patch
2115_libata-spindown-status.patch
2114_libata-shutdown-warning.patch
2113_libata-spindown-compat.patch
2112_libata-suspend.patch
2111_sd-start-stop.patch
2110_scsi-sd-printing.patch
It looks like these have been merged upstream, because they are not applied by the 2.6.22 or 2.6.23 gentoo-sources ebuilds and the problem happens in those kernels too. Does anybody know when these patches were merged into 2.6.22?
OK. After several reboots and resets, the patch that killed suspend-to-ram on my machine is 2112_libata-suspend.patch. This patch went into 2.6.22-rc1, hence none of the newer kernels work for me either. What I notice is that after a suspend to ram where the resume hangs, the BIOS does not find one of my disks on "RESET". I have to RESET again for it to be found. It's a Maxtor 300GB SATA drive with a higher than average spin-up time. Is there a limit on how fast a drive is supposed to spin up after resume? It looks like my 3 other drives have a spin-up time of < 5ms and they are found by the BIOS upon RESET from a bad resume, whereas my Maxtor drive has a spin-up time of > 25ms. So this comment in the commit is relevant: "Resume now has to wait for disk to spin up before proceeding." I have a feeling that resume hangs waiting to detect this drive. Does anybody know how this patch could break drive detection and resume?
Lastly, I removed the libata-suspend patch from 2.6.22-rc1 and suspend-to-ram works again. So it looks like an upstream issue now.
Filed bug http://bugzilla.kernel.org/show_bug.cgi?id=9659