After the upgrade I cannot suspend my Gentoo system. To suspend I used `sudo pm-suspend` or `sudo s2ram`. When I looked into the logs (https://dpaste.com/CC7RC3BEN) I saw a problem with `s2ram_do: Input/output error`. Then I checked the kernel config for suspend and it was correct.
Logs from pm-suspend: https://dpaste.com/CC7RC3BEN
dmesg after the failed suspend: https://dpaste.com/H8TB7G3ZY
Downgrading to the stable version solved the problem. Since this is an unstable version, I would like to file a bug ;)
Regards, Marcin "y0rune" Woźniak
Reproducible: Always
Steps to Reproduce:
Install the newest version of nvidia
1. sudo pm-suspend
2. sudo s2ram
Created attachment 680701
pm-suspend (CC7RC3BEN.txt) + dmesg (H8TB7G3ZY.txt) logs
Hm, seems like 460.27.04 did change things regarding power management and introduced an undocumented nvidia-powerd (not currently getting installed, not sure what to think of it). I don't use suspend myself, but that it broke something doesn't sound surprising. Officially 460.27.04 is a beta driver (not just ~testing in Gentoo), so here's hoping it'll get sorted out by the time it's non-beta if we can't figure anything else out. Took the liberty of attaching your logs so they don't get lost to a pasting service timeout (please do the same in the future).
Can confirm this issue on my end. The problem is that the new version sets NVreg_PreserveVideoMemoryAllocations=1 in /etc/modprobe.d/nvidia.conf, which tries to save video memory to /tmp on suspend but fails on my system because I don't have the nvidia-suspend systemd service running (I use systemd). I tried to enable that service, but then suspend fails because /usr/bin/logger is not present on my system. I finally solved the problem by changing NVreg_PreserveVideoMemoryAllocations=1 to NVreg_PreserveVideoMemoryAllocations=0 in /etc/modprobe.d/nvidia.conf.
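For anyone wanting to apply the same change, here is a minimal shell sketch of it. It assumes GNU sed and that the parameter already appears in the file as NVreg_PreserveVideoMemoryAllocations=0 or =1; the function name set_preserve_vram is made up for illustration.

```shell
# Sketch: set NVreg_PreserveVideoMemoryAllocations to 0 or 1 in a modprobe
# config file. The function name is hypothetical; GNU sed (-i) is assumed.
set_preserve_vram() {
    conf="$1"    # e.g. /etc/modprobe.d/nvidia.conf
    value="$2"   # 0 or 1
    # Rewrite only the digit that follows the parameter name.
    sed -i "s/\(NVreg_PreserveVideoMemoryAllocations=\)[01]/\1${value}/" "$conf"
    # Print the resulting line(s) so the change can be checked by eye.
    grep -n 'NVreg_PreserveVideoMemoryAllocations' "$conf"
}
```

Run as root, e.g. `set_preserve_vram /etc/modprobe.d/nvidia.conf 0`. Module parameters are read when the module is loaded, so the nvidia module has to be reloaded (or the machine rebooted) before the new value takes effect.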
Hi All! I will attach logs as attachments in the future. I will try your option, Nick ;)
--- Marcin
Updating that value works. Suspend works fine ;) Thank you :3
-- Marcin
I experience the same issue. Not much more info to add, but:
* I am not running systemd, just OpenRC combined with elogind.
* I checked git and noticed that the NVreg_PreserveVideoMemoryAllocations option has been set in nvidia-430.conf ever since its addition 8 months ago, so the previous version of the driver was also working with this option set even though I'm not using systemd at all.
* The documentation explicitly states you need to have systemd and nvidia-suspend running if you enable the option.
I suspect previous versions silently fell back to the default of not preserving video memory allocations if they detected that nvidia-suspend was not running, so either that detection is broken or it was removed. I don't think we should expect this behaviour to change; we should do the reasonable thing and do what the documentation tells us to do:
* USE=systemd: Make sure people are actually running nvidia-suspend, which I guess is covered by the notification in the ebuild.
* USE=-systemd -elogind: Make sure the option is not set in nvidia.conf.
* USE=-systemd elogind: I think it might be possible to also use nvidia-suspend, since all the systemd-related interaction seems to happen through the elogind part. As I would actually like to use the feature on my system, I will do some experimentation to see if this is possible and report back. If it works, we would need to write OpenRC init scripts for the nvidia-suspend services.
After some more research and experimentation:
* elogind >=246.9 has built-in support for the nvidia suspend (see related commit: https://github.com/elogind/elogind/commit/fc0661e60786c7613b0a018c188b064a4a5ef4f7 ). You need to enable HandleNvidiaSleep in /etc/elogind/logind.conf to use it.
* I tried to make this work on my system but got kernel panics instead. I believe elogind is doing the correct thing and it's probably just my card / nvidia driver misbehaving.
* Finally I settled for NVreg_PreserveVideoMemoryAllocations=0 in my /etc/modprobe.d/nvidia.conf, and the previous behavior is restored.
If someone else can test the elogind theory and confirm it is working, perhaps we can include an additional comment in the nvidia-drivers ebuild on this.
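A quick way to see which value a given logind.conf actually sets, as a rough sketch. The function name nvidia_sleep_setting is hypothetical, and reporting a missing or commented-out key as "no" is an assumption about the default rather than something verified against every elogind version.

```shell
# Sketch: report the HandleNvidiaSleep value set in an elogind config file.
# The function name is hypothetical. A missing or commented-out key is
# reported as "no", which is an assumed default.
nvidia_sleep_setting() {
    conf="${1:-/etc/elogind/logind.conf}"
    # Keep the last uncommented assignment, tolerating spaces around "=".
    value=$(sed -n 's/^[[:space:]]*HandleNvidiaSleep[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$conf" | tail -n 1)
    echo "${value:-no}"
}
```

Calling `nvidia_sleep_setting` without arguments reads /etc/elogind/logind.conf; passing a path checks another file instead.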
Suspend does not seem to work on my system (USE=-systemd elogind) using the latest driver with any combination of NVreg_PreserveVideoMemoryAllocations=<0,1> and HandleNvidiaSleep=<no, yes>. With the latter set to "yes", trying to suspend turns off the display, but the machine is frozen, while the other combinations just result in an aborted suspend.
Interestingly, dmesg includes the following message regardless of what PreserveVideoMemoryAllocations is set to in /etc/modprobe.d/nvidia.conf:
NVRM: GPU 0000:1c:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
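One thing that might be worth checking here (an assumption, not something confirmed in this report) is whether the loaded module really sees the value from nvidia.conf, since module options are only read at load time. The sketch below assumes the driver lists its parameters in /proc/driver/nvidia/params as `Name: value` lines; the function name loaded_preserve_vram is made up.

```shell
# Sketch: print the PreserveVideoMemoryAllocations value the loaded driver
# reports. Assumes a "Name: value" line format in the params file; the
# function name is hypothetical.
loaded_preserve_vram() {
    params="${1:-/proc/driver/nvidia/params}"
    # Split each line on ": " and print the value of the matching parameter.
    awk -F': *' '$1 == "PreserveVideoMemoryAllocations" { print $2 }' "$params"
}
```

If this prints 1 while nvidia.conf says 0, the module was presumably loaded with the old options and would need a reload or reboot.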
(In reply to Matti Eskelinen from comment #7)
> Suspend does not seem to work on my system (USE=-systemd elogind) using the
> latest driver with any combination of
> NVreg_PreserveVideoMemoryAllocations=<0,1> and HandleNvidiaSleep=<no, yes>.
> With the latter set to "yes", trying to suspend turns off the display, but
> the machine is frozen, while other combinations just result in a aborted
> suspend.
>
> Interestingly, dmesg includes the following message regardless of what
> PreserveVideoMemoryAllocations is set to in /etc/modprobe.d/nvidia.conf:
>
> NVRM: GPU 0000:1c:00.0: PreserveVideoMemoryAllocations module parameter is
> set. System Power Management attempted without driver procfs suspend
> interface. Please refer to the 'Configuring Power Management Support'
> section in the driver README.
Hi! For example, I do not set HandleNvidiaSleep at all in /etc/elogind/logind.conf. Try leaving that value out of the file ;)
The nvidia-drivers-460.39 changelog has: "Updated the NVIDIA driver to restore functionality of some features, including runtime power management, hotplugging audio-capable display devices, and S0ix-based system suspend, with recent kernels such as Linux 5.10." Unsure if it really fixes this bug, but it's something to look at.
(In reply to Ionen Wolkens from comment #9)
> nvidia-drivers-460.39
Also, if anyone wants to test it ahead of time, just renaming the 460.27.04 ebuild to 460.39 is sufficient (for now anyway). It would be good to know if the suspend issues are really fixed (without workarounds) so it could be dumped in stable, hopefully sooner rather than later.
I have tried nvidia-drivers 460.39 and the issue seems to persist. Are there any logs that you would like me to provide? I don't use systemd.
(In reply to João Santos from comment #11)
> I have tried the nvidia drivers 460.39 and the issue seems to persist.
>
> Is there any logs that you would like me to provide? I don't use systemd.
NVreg_PreserveVideoMemoryAllocations=0 seems to work in 460.39 as well.
Thanks for trying, maybe the ebuild should provide the workaround for now then...
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3f48cdfdff62bdeb440900f04d7745dd08365eba
commit 3f48cdfdff62bdeb440900f04d7745dd08365eba
Author:     David Seifert <soap@gentoo.org>
AuthorDate: 2021-02-08 09:39:20 +0000
Commit:     David Seifert <soap@gentoo.org>
CommitDate: 2021-02-08 09:39:20 +0000
    x11-drivers/nvidia-drivers: Default disable PreserveVideoMemoryAllocations
    Bug: https://bugs.gentoo.org/763129
    Package-Manager: Portage-3.0.14, Repoman-3.0.2
    Signed-off-by: David Seifert <soap@gentoo.org>
 x11-drivers/nvidia-drivers/files/nvidia-460.conf     | 20 ++++++++++++++++++++
 ...460.39.ebuild => nvidia-drivers-460.39-r1.ebuild} |  2 +-
 2 files changed, 21 insertions(+), 1 deletion(-)
Wow. Thanks to this bug I now know why I can't suspend any more. I was on the verge of chasing a phantom in elogind, it seems.
Just for the record: elogind does (almost) exactly what [/usr/bin/nvidia-sleep.sh] does. That script is called by nvidia-sleepd, btw. That systemd unit doesn't do anything else. (AFAIR, could be different now, haven't checked for a couple of months.)
However, the [HandleNvidiaSleep] option defaults to "no" and is described in [man logind.conf] as experimental:
--------
Using the /proc/driver/nvidia/suspend is considered experimental by Nvidia, and should only be used if it is neccessary, and the official /usr/bin/nvidia-sleep.sh can not be used from a system-sleep hook script for some reason. Please read the Nvidia power management guide[1] for more information
--------
Now to something completely different but not completely unrelated:
x11-drivers/nvidia-drivers-460.39-r1 got the libglvnd flag removed and therefore blocks xorg-server.
@David: Why was this flag removed? The Git log does not provide any clues.
USE="+libglvnd" was removed here: https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c2ba6462d3c4e8df0546bea13411ffc0faf02cf0
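The system-sleep hook route mentioned in the man page excerpt could look roughly like the sketch below. It assumes hooks are invoked with `pre` or `post` as the first argument and the sleep type as the second, and that the driver accepts the words `suspend`, `hibernate` and `resume` on its procfs interface. The function name and the NVIDIA_SUSPEND_IFACE override are hypothetical and only there so the logic can be tried without a GPU. As far as I know, the official nvidia-sleep.sh also switches virtual terminals around these writes, which this sketch leaves out.

```shell
# Sketch of a system-sleep hook that drives the NVIDIA procfs suspend
# interface. Assumed calling convention: "<hook> pre|post <sleep-type>".
# The function name and NVIDIA_SUSPEND_IFACE override are hypothetical.
nvidia_sleep_hook() {
    iface="${NVIDIA_SUSPEND_IFACE:-/proc/driver/nvidia/suspend}"
    case "$1/$2" in
        pre/suspend|pre/hibernate)
            # Ask the driver to save its state before the system sleeps.
            echo "$2" > "$iface"
            ;;
        post/suspend|post/hibernate)
            # Ask the driver to restore its state after wake-up.
            echo resume > "$iface"
            ;;
    esac
}
```

A hook script would end with `nvidia_sleep_hook "$1" "$2"`. This only makes sense together with NVreg_PreserveVideoMemoryAllocations=1, and given the freezes and panics reported above it should be treated as experimental.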
(In reply to Sven Eden from comment #15)
> x11-drivers/nvidia-drivers-460.39-r1 got the libglvnd flag removed and
> therefore blocks xorg-server.
The blocker was already adjusted in xorg-server but hasn't received a revbump "yet". This means users of portage's --dynamic-deps=y (this is the default, unless using USE=gentoo-dev on portage, or dealing with a binhost) "shouldn't" get the blocker, but there could be some other setups where it's an issue. If in doubt, "emerge -1 xorg-server" to force-update it for now.
> Why was this flag removed? The Git log does not provide any clues.
Disabling it wasn't possible anymore (support is entirely gone), so it's a cleanup and it's now always enabled without a flag.
(In reply to Ionen Wolkens from comment #16)
> adjusted in xorg-server but hasn't received a revbump "yet".
Or just as I said that, the revbump is in. Give it a bit of time and it should show up in a --sync soon.
(In reply to Sven Eden from comment #15)
> Wow. Thanks to this bug I now know why I can't suspend any more. I was on
> the verge of chasing a phantom in elogind it seems.
>
> Just for the record:
> elogind does (almost) exactly what [/usr/bin/nvidia-sleep.sh] does. That
> script is called by nvidia-sleepd, btw. That systemd unit doesn't do
> anything else. (AFAIR, could be different now, haven't checked for a couple
> of months.)
>
> However the [HandleNvidiaSleep] option is defaulted to "no" and described in
> [man logind.conf] as experimental:
>
> --------
> Using the /proc/driver/nvidia/suspend is considered experimental by Nvidia,
> and should only be used if it is neccessary, and the official
> /usr/bin/nvidia-sleep.sh can not be used from a system-sleep hook script for
> some reason. Please read the Nvidia power management guide[1] for more
> information
> --------
>
> Now to something completely different but not completely unrelated:
>
> x11-drivers/nvidia-drivers-460.39-r1 got the libglvnd flag removed and
> therefore blocks xorg-server.
>
> @David: Why was this flag removed? The Git log does not provide any clues.
>
> USE="+libglvnd" was removed here:
> https://gitweb.gentoo.org/repo/gentoo.git/commit/
> ?id=c2ba6462d3c4e8df0546bea13411ffc0faf02cf0
I should've documented the USE="libglvnd" removal better, that's on me. That said, explicit USE="libglvnd" will disappear, because in 2 weeks all nvidia-drivers ebuilds will either 1) work with libglvnd seamlessly, or 2) use that conf file (390) and still kinda work. There's no reason to bifurcate between libglvnd and eselect-opengl (which is gone) anymore.
Given that I haven't heard back on this, I assume the NVreg_PreserveVideoMemoryAllocations=0 workaround is working. It would be difficult to ensure every suspend method uses "nvidia's way" to suspend, so it's probably best kept this way (it's also nvidia's default). Edit the config file back if needed.
(In reply to Ionen Wolkens from comment #19)
> Given haven't heard back from this, I assume the
> NVreg_PreserveVideoMemoryAllocations=0 workaround is working.
Since I ensured that the value is zero on all my machines, every single one of them has been suspending and waking up flawlessly from Plasma, OpenBox and tty, both manually and using elogind. Thanks a lot!
Setting NVreg_PreserveVideoMemoryAllocations=0 does allow my system to suspend/resume "successfully"; however, after a suspend/resume, CUDA will not work. Other distributions suggest setting NVreg_PreserveVideoMemoryAllocations=1 to solve the CUDA issue, but then I cannot suspend at all. So it seems like some more thought is needed here. I've tried setting HandleNvidiaSleep=yes in elogind, but this does not result in a successful suspend.
I've been feeling that =0 is safer given there are many ways to suspend (e.g. pm-suspend was used here), and returning to =1 may need hooks for every one of them, which makes this difficult. I'm open to suggestions if there's a way to make this more sane though (if not fit for the ebuild, improving the wiki page on how to handle suspend in general could help users too). I'm not much of an elogind (or suspend) user myself, so help would be welcome.