| Summary: | sys-kernel/gentoo-sources-6.6.13 failure to resume after suspend | ||
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Christian D. <ThyrusG> |
| Component: | Current packages | Assignee: | Gentoo Kernel Bug Wranglers and Kernel Maintainers <kernel> |
| Status: | RESOLVED NEEDINFO | ||
| Severity: | critical | CC: | admin, ThyrusG |
| Priority: | Normal | ||
| Version: | unspecified | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Package list: | Runtime testing required: | --- | |
| Attachments: | systemd journal; kernel .config; lspci -vv | ||
|
Description
Christian D.
2024-02-06 16:00:42 UTC
Created attachment 884421 [details]
systemd journal
from beginning of suspend to next boot
Created attachment 884422 [details]
kernel .config
I would be happy to provide additional information. But, given the absence of useful error messages, I don't know how. Any help for narrowing down the cause would be appreciated.

Also, up to kernel 6.1.57 suspend/resume worked flawlessly, often with 15+ suspend/resume cycles between reboots.

What brand / model of system is this?

System is an Intel i7-4790K on an MSI Z97S SLI Krait Edition (Intel Z97 Express Chipset) with an NVIDIA GeForce GTX 960. Custom built around 8 years ago, never had any suspend/resume issues.

Firstly, can you test with nouveau to eliminate nvidia-drivers. This would be a requirement if this needs to go upstream as that marks the kernel tainted.
Second, are you able to do a git bisect so we can determine if a specific commit is causing this issue?

I'm not sure that is realistic. Nouveau seems to not support power management on my graphics card. Combined with this being my primary desktop machine and the error only occurring sporadically, this would probably take several days to determine whether any given kernel revision is "good" or "bad".
I think, as a first step, I will try a 6.1.74 longterm kernel. If that works, at least there would be a stable and still supported baseline.

I would try the latest 6.1.X whenever you do test

One more data point:
gentoo-sources-6.1.74 (latest stable 6.1.x)
nvidia-drivers-535.146.02
RinCat/RTL88x2BU-Linux-Driver (from GitHub @ 7bdc911e1c14c...)
Uptime 4 days, a dozen or so suspend cycles, no issues.
So, I guess that qualifies as "good" and rules out hardware defects.

(In reply to Christian D. from comment #9)
> One more data point:
>
> gentoo-sources-6.1.74 (latest stable 6.1.x)
> nvidia-drivers-535.146.02
> RinCat/RTL88x2BU-Linux-Driver (from GitHub @ 7bdc911e1c14c...)
>
> Uptime 4 days, a dozen or so suspend cycles, no issues.
> So, I guess that qualifies as "good" and rules out hardware defects.
Can you attach the output of lspci -vv

Created attachment 885155 [details]
lspci -vv
generation emits the following to stderr (probably irrelevant):
pcilib: sysfs_read_vpd: read failed: No such device
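On the git bisect question raised earlier: bisect halves the candidate range at every step, so it needs roughly log2(N) good/bad verdicts for N candidate commits. The sketch below is only a back-of-the-envelope estimate of how long that would take when each verdict needs several days of suspend/resume cycles. The commit count (80000) and the four days per verdict are illustrative placeholders, not measured values, and the function names are made up for this example.

```python
def bisect_steps(num_commits: int) -> int:
    """Approximate worst-case number of good/bad verdicts needed to
    isolate one commit among num_commits candidates by repeated halving."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only.
    return (num_commits - 1).bit_length()


def bisect_days(num_commits: int, days_per_verdict: float) -> float:
    """Total wall-clock days if every verdict takes days_per_verdict days."""
    return bisect_steps(num_commits) * days_per_verdict


if __name__ == "__main__":
    # Placeholder inputs: assumed commit count between the good and bad
    # kernels, and an assumed 4 days of uptime before calling a build "good".
    commits = 80_000
    days = 4
    print(bisect_steps(commits), "steps,", bisect_days(commits, days), "days")
```

Under these assumptions the estimate comes to 17 steps and 68 days, which is why narrowing the range first (for example, by testing intermediate stable releases) matters so much for a sporadic failure.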
Can I have your full boot log also, please?

I'm commenting because this may be related to an issue I have had since the update. Since upgrading to kernel 6.6.x (testing with gentoo-kernel-bin), boot often fails: the kernel panics before it prints anything. Interestingly, it gets stuck at checking TPM PCR 9 even though I disabled Secure Boot, so it may be related to a grub update. Only rarely does it boot, and I don't yet know what makes the difference; I need to dig into it more.

When it does boot, it always fails to resume after I put the laptop to sleep (black screen). I also noticed that some filesystem kernel modules, or even a lot of modules, weren't loaded at all.

Everything works correctly when I downgrade to 6.1.x. Have there been changes in the kernel that I'm unaware of that would likely cause such issues? Let me know if I should open a separate bug report with more details; for now I'm using an older kernel until I have dug into the code and found what could cause such issues.

I suspect several things:
1) nvidia-driver (open)
2) dracut initramfs not doing something properly
- grub not loading the kernel properly, perhaps because the PCR check failed even though I disabled Secure Boot. That doesn't make much sense, though, since the kernel does boot sometimes, so I suspect 1) and 2) more; or it may still be grub, but something totally different.

PS: It could be an issue with the kernel code itself; that seems unlikely, but it could happen. |