Bug 763129 - x11-drivers/nvidia-drivers-460.27.04: Suspend mode is not working.
Summary: x11-drivers/nvidia-drivers-460.27.04: Suspend mode is not working.
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal major with 2 votes
Assignee: Ionen Wolkens
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-01-02 15:08 UTC by Marcin Woźniak
Modified: 2021-06-10 15:56 UTC
CC List: 7 users

See Also:
Package list:
Runtime testing required: ---


Attachments
pm-suspend (CC7RC3BEN.txt) + dmesg (H8TB7G3ZY.txt) logs (pmsuspend-and-dmesg.txt,94.00 KB, text/plain)
2021-01-02 16:05 UTC, Ionen Wolkens
Description Marcin Woźniak 2021-01-02 15:08:00 UTC
After the upgrade I cannot suspend my Gentoo system.

For suspend I used:
`sudo pm-suspend` or `sudo s2ram`.

When I looked at the logs (https://dpaste.com/CC7RC3BEN), I saw the error `s2ram_do: Input/output error`. I then checked the kernel config for suspend, and it was correct.

Logs from pm-suspend: https://dpaste.com/CC7RC3BEN
dmesg after the failed suspend: https://dpaste.com/H8TB7G3ZY


Downgrading to the stable version solved the problem. Since this is an unstable version, I would like to report the bug ;)

Regards,
Marcin "y0rune" Woźniak 
 

Reproducible: Always

Steps to Reproduce:
1. Install the newest version of nvidia-drivers
2. Run sudo pm-suspend
3. Alternatively, run sudo s2ram
Comment 1 Ionen Wolkens gentoo-dev 2021-01-02 16:05:47 UTC
Created attachment 680701
pm-suspend (CC7RC3BEN.txt) + dmesg (H8TB7G3ZY.txt) logs

Hm, it seems 460.27.04 did change things regarding power management and introduced an undocumented nvidia-powerd (not currently getting installed, not sure what to think of it).

I don't use suspend myself, but it isn't surprising that it broke something.

Officially 460.27.04 is a beta driver (not just ~testing in Gentoo), so here's hoping it gets sorted out by the time it's non-beta, if we can't figure anything else out.

Took the liberty to attach your logs so they don't get lost in pasting service timeout (please do the same in the future).
Comment 2 Nick Reale 2021-01-02 16:06:15 UTC
Can confirm this issue on my end. The problem is that the new version sets NVreg_PreserveVideoMemoryAllocations=1 in /etc/modprobe.d/nvidia.conf, which tries to save video memory to /tmp on suspend. That fails on my system because I don't have the nvidia-suspend systemd service running (I use systemd). I tried enabling that service, but then suspend fails because /usr/bin/logger is not present on my system. I finally solved the problem by changing NVreg_PreserveVideoMemoryAllocations=1 to NVreg_PreserveVideoMemoryAllocations=0 in /etc/modprobe.d/nvidia.conf.
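
For anyone who wants to script the change in /etc/modprobe.d/nvidia.conf, here is a minimal sketch. The function name set_preserve_vram is made up for illustration and is not part of the ebuild; it assumes GNU sed and that the parameter already appears in the file with a value of 0 or 1.

```shell
#!/bin/sh
# Hypothetical helper (not shipped by nvidia-drivers): rewrite the value of
# NVreg_PreserveVideoMemoryAllocations in a modprobe.d-style file.
#   $1 = path to the config file
#   $2 = new value, 0 or 1
set_preserve_vram() {
    conf=$1
    value=$2
    case $value in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    # Only the digit after the parameter name is replaced; any other
    # options on the same line are left untouched. Note that commented-out
    # occurrences are rewritten as well.
    sed -i "s/\(NVreg_PreserveVideoMemoryAllocations=\)[01]/\1${value}/" "$conf"
}

# Example (as root): set_preserve_vram /etc/modprobe.d/nvidia.conf 0
```

Module options are only read when the module is loaded, so the nvidia module has to be reloaded (or the machine rebooted) before the new value takes effect.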
Comment 3 Marcin Woźniak 2021-01-02 22:49:09 UTC
Hi All!

I will attach logs in attachments in the future.

I will try your option Nick ;) 

---
Marcin
Comment 4 Marcin Woźniak 2021-01-04 15:05:38 UTC
Updating that value works. Suspend works fine ;) Thank you :3


--
Marcin
Comment 5 Mathy Vanvoorden 2021-01-08 08:56:07 UTC
I experience the same issue, not much more info to add but:

* I am not running systemd, just openrc.d combined with elogind
* I checked git and noticed that the NVreg_PreserveVideoMemoryAllocations option has been set in nvidia-430.conf ever since its addition 8 months ago, so the previous version of the driver also worked with this option set, even though I'm not using systemd at all
* The documentation explicitly states you need to have systemd and nvidia-suspend running if you enable the option.

I suspect previous versions silently fell back to the default of not preserving video memory allocations if it detected that nvidia-suspend was not running, so either that detection is broken or it was removed. I don't think we should expect that this behaviour will change and we should actually do the reasonable thing and do what the documentation tells us to do:

* USE=systemd: Make sure people are actually running nvidia-suspend, which I guess is covered by the notification in the ebuild
* USE=-systemd -elogind: Make sure the option is not set in nvidia.conf
* USE=-systemd elogind: I think it might actually be possible to use nvidia-suspend here as well, since all the systemd-related interaction seems to happen through the elogind part. As I would like to use the feature myself, I will do some experimentation on my system to see if this is possible and report back. If it works, we would need to write openrc.d scripts for the nvidia-suspend services.
Comment 6 Mathy Vanvoorden 2021-01-08 13:07:23 UTC
After some more research and experimentation:

* elogind >=246.9 has builtin support for the nvidia suspend (see related commit: https://github.com/elogind/elogind/commit/fc0661e60786c7613b0a018c188b064a4a5ef4f7 ). You need to enable HandleNvidiaSleep in /etc/elogind/logind.conf to use it
* I tried to make this work on my system but got kernel panics instead. I believe elogind is doing the correct thing, and it's probably just my card / nvidia driver misbehaving.
* Finally I settled for NVreg_PreserveVideoMemoryAllocations=0 in my /etc/modprobe.d/nvidia.conf and previous behavior is restored

If someone else can test the elogind theory and confirm it is working perhaps we can include an additional comment in the nvidia-drivers ebuild on this.
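
To make testing the elogind theory easier to report on, here is a small sketch that prints what HandleNvidiaSleep is currently set to. The function name handle_nvidia_sleep is hypothetical. It simply takes the last uncommented assignment anywhere in the file and prints "no" when there is none, on the assumption that the option is off unless explicitly enabled; it does not look at section headers or drop-in files.

```shell
#!/bin/sh
# Hypothetical helper: print the HandleNvidiaSleep value found in a
# logind.conf-style file, or "no" if the option is unset or commented out.
#   $1 = path to the config file (e.g. /etc/elogind/logind.conf)
handle_nvidia_sleep() {
    conf=$1
    # Strip "HandleNvidiaSleep =" from uncommented lines and keep the last hit.
    value=$(sed -n 's/^[[:space:]]*HandleNvidiaSleep[[:space:]]*=[[:space:]]*//p' "$conf" | tail -n 1)
    echo "${value:-no}"
}

# Example: handle_nvidia_sleep /etc/elogind/logind.conf
```

Posting this output together with the NVreg_PreserveVideoMemoryAllocations value would make the tested combination unambiguous.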
Comment 7 Matti Eskelinen 2021-01-09 20:45:25 UTC
Suspend does not seem to work on my system (USE=-systemd elogind) using the latest driver with any combination of NVreg_PreserveVideoMemoryAllocations=<0,1> and HandleNvidiaSleep=<no, yes>. With the latter set to "yes", trying to suspend turns off the display, but the machine is frozen, while other combinations just result in an aborted suspend.

Interestingly, dmesg includes the following message regardless of what PreserveVideoMemoryAllocations is set to in /etc/modprobe.d/nvidia.conf:

NVRM: GPU 0000:1c:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
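
One possible explanation (a guess, not confirmed in this report) is that the loaded module is still running with the old value, for example because it was not reloaded after the config file was edited. The sketch below prints the value the loaded driver itself reports. It assumes the driver exposes its parameters in /proc/driver/nvidia/params as "Name: value" lines; the function name loaded_preserve_vram is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the PreserveVideoMemoryAllocations value from a
# params file written as "Name: value" lines (assumed format).
#   $1 = params file, defaults to /proc/driver/nvidia/params
loaded_preserve_vram() {
    params=${1:-/proc/driver/nvidia/params}
    sed -n 's/^PreserveVideoMemoryAllocations:[[:space:]]*//p' "$params"
}

# Example: loaded_preserve_vram
```

If this prints 1 while /etc/modprobe.d/nvidia.conf says 0, the edited file is not the one the module was loaded with.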
Comment 8 Marcin Woźniak 2021-01-12 13:18:12 UTC
(In reply to Matti Eskelinen from comment #7)
> Suspend does not seem to work on my system (USE=-systemd elogind) using the
> latest driver with any combination of
> NVreg_PreserveVideoMemoryAllocations=<0,1> and HandleNvidiaSleep=<no, yes>.
> With the latter set to "yes", trying to suspend turns off the display, but
> the machine is frozen, while other combinations just result in an aborted
> suspend.
> 
> Interestingly, dmesg includes the following message regardless of what
> PreserveVideoMemoryAllocations is set to in /etc/modprobe.d/nvidia.conf:
> 
> NVRM: GPU 0000:1c:00.0: PreserveVideoMemoryAllocations module parameter is
> set. System Power Management attempted without driver procfs suspend
> interface. Please refer to the 'Configuring Power Management Support'
> section in the driver README.

Hi! For example, I do not set HandleNvidiaSleep at all in /etc/elogind/logind.conf. Try leaving that option out of the file ;)
Comment 9 Ionen Wolkens gentoo-dev 2021-01-26 15:07:40 UTC
nvidia-drivers-460.39 changelog has:

"Updated the NVIDIA driver to restore functionality of some
 features, including runtime power management, hotplugging
 audio-capable display devices, and S0ix-based system suspend,
 with recent kernels such as Linux 5.10."

Unsure if it really fixes this bug, but it's something to look at.
Comment 10 Ionen Wolkens gentoo-dev 2021-01-27 02:55:52 UTC
(In reply to Ionen Wolkens from comment #9)
> nvidia-drivers-460.39
Also, if anyone wants to test it ahead of time, just renaming the 460.27.04 ebuild to 460.39 is sufficient (for now anyway).

It would be good to know if the suspend issues are really fixed (without workarounds) so it could hopefully be moved to stable sooner rather than later.
Comment 11 João Santos 2021-01-27 14:40:46 UTC
I have tried the nvidia drivers 460.39 and the issue seems to persist.

Are there any logs that you would like me to provide? I don't use systemd.
Comment 12 João Santos 2021-01-27 14:45:27 UTC
(In reply to João Santos from comment #11)
> I have tried the nvidia drivers 460.39 and the issue seems to persist.
> 
> Are there any logs that you would like me to provide? I don't use systemd.

NVreg_PreserveVideoMemoryAllocations=0 seems to work in 460.39 as well.
Comment 13 Ionen Wolkens gentoo-dev 2021-01-27 14:51:12 UTC
Thanks for trying, maybe the ebuild should provide the workaround for now then...
Comment 14 Larry the Git Cow gentoo-dev 2021-02-08 09:39:33 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3f48cdfdff62bdeb440900f04d7745dd08365eba

commit 3f48cdfdff62bdeb440900f04d7745dd08365eba
Author:     David Seifert <soap@gentoo.org>
AuthorDate: 2021-02-08 09:39:20 +0000
Commit:     David Seifert <soap@gentoo.org>
CommitDate: 2021-02-08 09:39:20 +0000

    x11-drivers/nvidia-drivers: Default disable PreserveVideoMemoryAllocations
    
    Bug: https://bugs.gentoo.org/763129
    Package-Manager: Portage-3.0.14, Repoman-3.0.2
    Signed-off-by: David Seifert <soap@gentoo.org>

 x11-drivers/nvidia-drivers/files/nvidia-460.conf     | 20 ++++++++++++++++++++
 ...460.39.ebuild => nvidia-drivers-460.39-r1.ebuild} |  2 +-
 2 files changed, 21 insertions(+), 1 deletion(-)
Comment 15 Sven Eden 2021-02-09 08:32:22 UTC
Wow. Thanks to this bug I now know why I can't suspend any more. I was on the verge of chasing a phantom in elogind it seems.

Just for the record:
elogind does (almost) exactly what [/usr/bin/nvidia-sleep.sh] does. That script is called by nvidia-sleepd, btw. That systemd unit doesn't do anything else. (AFAIR, could be different now, haven't checked for a couple of months.)

However the [HandleNvidiaSleep] option is defaulted to "no" and described in [man logind.conf] as experimental:

--------
Using the /proc/driver/nvidia/suspend is considered experimental by Nvidia, and should only be used if it is necessary, and the official /usr/bin/nvidia-sleep.sh cannot be used from a system-sleep hook script for some reason. Please read the Nvidia power management guide[1] for more information.
--------

Now to something completely different but not completely unrelated:

x11-drivers/nvidia-drivers-460.39-r1 got the libglvnd flag removed and therefore blocks xorg-server.

@David: Why was this flag removed? The Git log does not provide any clues.

USE="+libglvnd" was removed here:
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c2ba6462d3c4e8df0546bea13411ffc0faf02cf0
Comment 16 Ionen Wolkens gentoo-dev 2021-02-09 10:09:16 UTC
(In reply to Sven Eden from comment #15)
> x11-drivers/nvidia-drivers-460.39-r1 got the libglvnd flag removed and
> therefore blocks xorg-server.
The blocker was already adjusted in xorg-server but hasn't received a revbump "yet". This means users of portage's --dynamic-deps=y (this is default, unless using USE=gentoo-dev on portage, or dealing with a binhost), "shouldn't" get the blocker but there could be some other setups where it's an issue.

If in doubt, "emerge -1 xorg-server" to force-update it for now.

> Why was this flag removed? The Git log does not provide any clues.
Disabling it wasn't possible anymore (support is entirely gone), so this is just cleanup and it's now always enabled without a flag.
Comment 17 Ionen Wolkens gentoo-dev 2021-02-09 10:27:46 UTC
(In reply to Ionen Wolkens from comment #16)
> adjusted in xorg-server but hasn't received a revbump "yet".
Or just as I said that, the revbump is in. Give it a bit of time and it should show up in a --sync soon.
Comment 18 David Seifert gentoo-dev 2021-02-09 10:44:02 UTC
(In reply to Sven Eden from comment #15)
> Wow. Thanks to this bug I now know why I can't suspend any more. I was on
> the verge of chasing a phantom in elogind it seems.
> 
> Just for the record:
> elogind does (almost) exactly what [/usr/bin/nvidia-sleep.sh] does. That
> script is called by nvidia-sleepd, btw. That systemd unit doesn't do
> anything else. (AFAIR, could be different now, haven't checked for a couple
> of months.)
> 
> However the [HandleNvidiaSleep] option is defaulted to "no" and described in
> [man logind.conf] as experimental:
> 
> --------
> Using the /proc/driver/nvidia/suspend is considered experimental by Nvidia,
> and should only be used if it is necessary, and the official
> /usr/bin/nvidia-sleep.sh can not be used from a system-sleep hook script for
> some reason. Please read the Nvidia power management guide[1] for more
> information
> --------
> 
> Now to something completely different but not completely unrelated:
> 
> x11-drivers/nvidia-drivers-460.39-r1 got the libglvnd flag removed and
> therefore blocks xorg-server.
> 
> @David: Why was this flag removed? The Git log does not provide any clues.
> 
> USE="+libglvnd" was removed here:
> https://gitweb.gentoo.org/repo/gentoo.git/commit/
> ?id=c2ba6462d3c4e8df0546bea13411ffc0faf02cf0

I should've documented the USE="libglvnd" removal better, that's on me.

That said, explicit USE="libglvnd" will disappear, because in 2 weeks all nvidia-drivers ebuilds will either 1) work with libglvnd seamlessly or 2) use that conf file (390) and still kinda work. There's no reason to bifurcate between libglvnd and eselect-opengl (which is gone) anymore.
Comment 19 Ionen Wolkens gentoo-dev 2021-03-21 17:33:01 UTC
Given that I haven't heard back on this, I assume the NVreg_PreserveVideoMemoryAllocations=0 workaround is working.

It would be difficult to ensure every suspend method uses "nvidia's way" to suspend, so it's probably best kept this way (it's also nvidia's default).

Edit the config file back if needed.
Comment 20 Sven Eden 2021-03-26 06:24:28 UTC
(In reply to Ionen Wolkens from comment #19)
> Given haven't heard back from this, I assume the
> NVreg_PreserveVideoMemoryAllocations=0 workaround is working.

Since I have ensured that the value is zero on all my machines, every single one of them is suspending and waking up flawlessly from Plasma, OpenBox and tty, both manually and using elogind.
Thanks a lot!
Comment 21 benland100 2021-06-10 15:32:57 UTC
Setting NVreg_PreserveVideoMemoryAllocations=0 does allow my system to suspend/resume "successfully", however after a suspend/resume, CUDA will not work. Other distributions suggest setting NVreg_PreserveVideoMemoryAllocations=1 to solve the CUDA issue, however then I cannot suspend at all. So, it seems like some more thought is needed here. 

I've tried setting HandleNvidiaSleep=yes in elogind, but this does not result in a successful suspend.
Comment 22 Ionen Wolkens gentoo-dev 2021-06-10 15:56:44 UTC
I've been feeling that =0 is safer given there are many ways to suspend (e.g. pm-suspend was used here), and returning to =1 may need hooks for every one of them, which makes this difficult.

I'm open to suggestions if there's a way to make this more sane though (if not fit for the ebuild, improving the wiki page on how to handle suspend in general could help users too). I'm not much of an elogind (or suspend) user myself, so help would be welcome.
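
As a starting point for the pm-utils case, here is an untested sketch of what such a hook could look like. It relies on two assumptions: pm-utils calls hooks in /etc/pm/sleep.d with suspend, hibernate, resume or thaw as the first argument, and the driver accepts the words suspend, hibernate and resume written to /proc/driver/nvidia/suspend, as described in the 'Configuring Power Management Support' section of the driver README. The function name nvidia_pm_hook and the optional second argument are made up for illustration, and this is not a replacement for the official nvidia-sleep.sh, which does more than this.

```shell
#!/bin/sh
# Hypothetical pm-utils hook body: forward the sleep action to the driver's
# procfs suspend interface.
#   $1 = pm-utils action (suspend, hibernate, resume, thaw)
#   $2 = interface path, defaults to /proc/driver/nvidia/suspend
nvidia_pm_hook() {
    action=$1
    iface=${2:-/proc/driver/nvidia/suspend}
    case $action in
        suspend)     word=suspend ;;
        hibernate)   word=hibernate ;;
        resume|thaw) word=resume ;;
        *)           return 0 ;;   # unknown action: do nothing
    esac
    # Silently skip if the interface is missing or not writable, so the
    # hook never blocks a suspend on machines without the driver loaded.
    [ -w "$iface" ] || return 0
    echo "$word" > "$iface"
}

# In an actual hook file the last line would be: nvidia_pm_hook "$1"
```

Every other suspend method (s2ram called directly, elogind, and so on) would need its own equivalent, which is the difficulty mentioned above.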