
Bug 652408

Summary: x11-drivers/nvidia-drivers - something does not work when AMD Secure Memory Encryption is enabled
Product: Gentoo Linux
Reporter: Gabriel Caudrelier <gabriel.caudrelier>
Component: Current packages
Assignee: David Seifert <soap>
Status: RESOLVED FIXED
Severity: normal
CC: ionen
Priority: Normal
Keywords: PullRequest
Version: unspecified   
Hardware: All   
OS: Linux   
See Also: https://github.com/gentoo/gentoo/pull/20282

Description Gabriel Caudrelier 2018-04-04 12:11:31 UTC
FYI: the nvidia drivers do not work when AMD Secure Memory Encryption (SME) is enabled in the kernel.

The only relevant message in dmesg is this one:

nvidia 0000:41:00.0: SME is active, device will require DMA bounce buffers

Kernel configuration to disable:

Processor type and features -->
  CONFIG_AMD_MEM_ENCRYPT

Tested against 4.14.31/32, but it is probably the same on 4.15.x.

As far as I understand, this feature was introduced in 4.14.x, so it was not a problem before that version.

Maybe worth adding it to the wiki.
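
For readers who want to check their own kernel before rebuilding, here is a minimal sketch that reads the state of an option from kernel config text. The function name kconfig_state and the sample text are illustrative and not part of any Gentoo tool. It assumes the usual .config layout, where a set option appears as CONFIG_NAME=value and an unset one appears only in a comment.

```python
import re


def kconfig_state(config_text, option):
    """Return the value of CONFIG_<option> ('y', 'm', ...) or 'n' if unset or absent."""
    # Unset options appear as "# CONFIG_FOO is not set", which this pattern skips
    # because the line must start with CONFIG_.
    pattern = re.compile(r"^CONFIG_%s=(.*)$" % re.escape(option), re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1).strip('"') if match else "n"


# Hypothetical excerpt of a kernel .config, for illustration only.
sample = """\
CONFIG_X86_64=y
CONFIG_AMD_MEM_ENCRYPT=y
# CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is not set
"""
```

With the sample above, kconfig_state(sample, "AMD_MEM_ENCRYPT") gives "y" and kconfig_state(sample, "AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT") gives "n". On a real system the text would typically come from /usr/src/linux/.config, or from /proc/config.gz when the running kernel exposes it.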
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2018-04-04 13:16:53 UTC
What doesn't work when AMD Secure Memory Encryption is enabled?
Comment 2 Gabriel Caudrelier 2018-04-04 23:32:10 UTC
The kernel modules load properly, but when starting sddm I get a blank screen.
Comment 3 Gabriel Caudrelier 2018-04-05 13:30:59 UTC
My bad: the nvidia modules do not initialize properly at all. I missed the logs earlier.

# dmesg

[   31.630667] nvidia 0000:41:00.0: SME is active, device will require DMA bounce buffers
[   31.832968] NVRM: RmInitAdapter failed! (0x24:0x1e:1087)
[   31.832991] NVRM: rm_init_adapter failed for device bearing minor number 0


Xorg.0.log


[    44.433] (**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
[    44.433] (==) NVIDIA(0): RGB weight 888
[    44.433] (==) NVIDIA(0): Default visual is TrueColor
[    44.433] (==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
[    44.433] (**) NVIDIA(0): Enabling 2D acceleration
[    44.650] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:65:0:0.  Please
[    44.650] (EE) NVIDIA(GPU-0):     check your system's kernel log for additional error
[    44.650] (EE) NVIDIA(GPU-0):     messages and refer to Chapter 8: Common Problems in the
[    44.650] (EE) NVIDIA(GPU-0):     README for additional information.
[    44.650] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[    44.650] (EE) NVIDIA(0): Failing initialization of X screen 0


By the way, this was tested against sys-kernel/gentoo-sources-4.15.15.

Again, disabling CONFIG_AMD_MEM_ENCRYPT makes things work again.
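
The two dmesg lines quoted in this comment are the signature to look for. As a rough sketch, the following scans kernel log text for that pair. The function name diagnose and the returned keys are made up for illustration, and the patterns only match the exact wording quoted above, which other driver or kernel versions may not use.

```python
import re

# Patterns copied from the dmesg excerpt in this report.
SME_ACTIVE = re.compile(r"nvidia (\S+): SME is active")
RM_INIT_FAILED = re.compile(r"NVRM: RmInitAdapter failed! \((\S+)\)")


def diagnose(dmesg_text):
    """Report whether the log shows SME active together with an adapter init failure."""
    sme = SME_ACTIVE.search(dmesg_text)
    rm = RM_INIT_FAILED.search(dmesg_text)
    return {
        "sme_device": sme.group(1) if sme else None,      # PCI address of the GPU
        "rm_init_error": rm.group(1) if rm else None,     # NVRM error triple
        "likely_sme_failure": bool(sme and rm),           # both lines are present
    }
```

Seeing both lines together only suggests the SME problem described here. RmInitAdapter can fail for unrelated reasons, so the kernel config is still worth checking.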
Comment 4 Gabriel Caudrelier 2018-08-26 19:05:06 UTC
Update for kernel 4.18.5 (at the time of writing, this is supposedly a kernel version supported by NVIDIA).

The nvidia drivers now work when CONFIG_AMD_MEM_ENCRYPT is enabled, but not if SME is active by default via CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT.

Which is roughly the same as saying that they still do not work with SME.

Same error messages.
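
The difference between "built in" and "active by default" can be modeled roughly as below. This is a sketch, not the kernel's code. It assumes the documented mem_encrypt=on/off kernel parameter overrides the CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT setting, and that the last occurrence on the command line wins. The function name sme_requested is invented for illustration, and hardware or firmware support is not modeled.

```python
def sme_requested(mem_encrypt_built, active_by_default, cmdline):
    """Return True if this configuration would ask for SME to be activated at boot."""
    if not mem_encrypt_built:
        # Without CONFIG_AMD_MEM_ENCRYPT there is no SME support to activate.
        return False
    requested = active_by_default
    for token in cmdline.split():
        # Assumption: a later mem_encrypt= token overrides an earlier one.
        if token == "mem_encrypt=on":
            requested = True
        elif token == "mem_encrypt=off":
            requested = False
    return requested
```

Under this model, a kernel with only CONFIG_AMD_MEM_ENCRYPT=y and no mem_encrypt= parameter does not request SME, which matches the observation that the driver works in that case.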
Comment 5 Maxxim 2020-07-29 19:54:59 UTC
Same here, Gentoo Linux amd64, AMD Ryzen 5 2600X, NVIDIA RTX 2060 Super.

Errors in Xorg.0.log when starting X:
(EE) NVIDIA(GPU-0): Failed to initialize DMA
(EE) NVIDIA(0): Failed to allocate push buffer

Tested with gentoo-sources 5.4.48 and nvidia-drivers 450.57.

It would seem NVIDIA has been aware of this bug for quite a while, but is either unable or unwilling to fix it:
https://forums.developer.nvidia.com/t/unable-to-start-x-failed-to-initialize-dma/64925/12
Comment 6 Maxxim 2020-08-13 15:49:31 UTC
It would seem this is not going to be fixed anytime soon. This is the reply I now got from NVIDIA after opening a bug back in February:

AMD SME is not supported in NVIDIA Linux driver. Driver README calls out that user needs to disable SME : [https://download.nvidia.com/XFree86/Linux-x86_64/450.57/README/dma_issues.html]
Engineering team is currently evaluating multiple options to support this feature in future, however we can not commit any timeline for it at this moment.
Comment 7 Ionen Wolkens gentoo-dev 2021-03-06 08:48:02 UTC
I don't think there's much that can be done here about this.

At best, we could add a kernel config check for CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT to at least discourage users from using it (we can't check for CONFIG_AMD_MEM_ENCRYPT being merely enabled, as that is the default in gentoo-kernel{,-bin}).
Comment 8 Larry the Git Cow gentoo-dev 2021-03-21 15:53:36 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=26146d1510fd678538b7d02400c1eb8e66e20212

commit 26146d1510fd678538b7d02400c1eb8e66e20212
Author:     Ionen Wolkens <sudinave@gmail.com>
AuthorDate: 2021-03-21 15:52:10 +0000
Commit:     David Seifert <soap@gentoo.org>
CommitDate: 2021-03-21 15:52:10 +0000

    x11-drivers/nvidia-drivers: bump to 460.67 with refactored ebuild
    
    ebuild carries a lot of history and, rather than cleanups, it needed
    something closer to a rewrite.
    
    Bugfixes:
    - Removed all udev rules to solve long standing issues (bug #454740)
    - Install libraries with no X11 dependencies with USE=-X,
      notably for headless OpenCL/CUDA (bug #561706)
    - Install systemd unit for persistenced + nvpd user (bug #591638)
    - Add custom error message for DRM_KMS_HELPER and ensure driver
      doesn't attempt building DRM support without it (bug #603818)
    - Warn about AMD SME if enabled by default (bug #652408)
    - Distribute extra sources to lift RESTRICT="bindist mirror", the
      nvidia-driver.eclass is no longer used (bug #732702)
    - Build modprobe and persistenced from source (bug #747145)
    - Use system locations for vulkan icd/layers (bug #749600)
    
    Others:
    - Dropped IUSE=compat/multilib/kms/uvm/wayland
      > compat: was for non-GLVND variants and currently a no-op
      > multilib: obsolete, abi_x86_32 does all that's needed
      > kms/uvm: modules are loaded by nvidia-modprobe as-needed and
        there's not much sense in skipping installation. Will also save
        OpenCL/CUDA packages from having to depend on [uvm]
      > wayland: library is provided by gui-libs/egl-wayland instead which
        now also provides pkgconfig files and can be a newer version.
        optfeature warning was added for awareness.
    - Dropped REQUIRED_USE, all USE can now be used independently, e.g.
      now possible to get libXNVCtrl.a (static-libs) without the
      deps-heavy USE=tools
    - Dropped locale patch, the offending code it was meant to fix is gone.
    - Dropped linker patch, uses right linker even with -native-symlinks.
    - Added modprobe.d .conf to blacklist nouveau by default.
    - Patched nvidia-modprobe to respect nvidia.conf's permissions when
      creating uvm devices, was previously created as world read-write.
    - No longer installing libOpenCL.so loader (not needed to use OpenCL,
      was used by the no longer available eselect-opencl).
    - nvidia-persistenced init script simplified and updated for nvpd user.
    - nvidia-smi init script removed (all it did was query cards every 300
      seconds), mentioned behavior is no longer observable (fan scales
      normally without X) and it wasn't intended for this purpose.
    - Removed I2C_NVIDIA_GPU check as it caused unnecessary noise for
      gentoo-kernel-bin users (built as module), and being a bad thing
      even if loaded is questionable.
    - Attempt to reduce message noise. The only fatal CONFIG_CHECK is
      fairly rare so there's little reason to check twice with pkg_pretend.
    - ... but added new conditional messages to explain important things
      often seen as common sense but that a new user likely won't know.
    - Replaced the nvidia-driver.eclass legacy test with a compact version
      that reads supported-gpus.json (usable on >450).
    - More strict deps, some may sound strange but nvidia-settings only
      use headers for some of these (dbus/Xrandr/Xv/vdpau).
      > X? libs kept separate as it's the only one needing multilib deps.
      > pax-utils now unconditional for scanelf as libraries are always
        installed. Alternatively could've generated those, but prefer to
        leave it easier to maintain for future generations.
      > virtual/opencl removed, no sense in the drivers depending on this
        and it's instead applications using opencl that should.
      > Added MODULES_OPTIONAL_USE="driver" to handle linux-mod deps
    - Added MIT license for persistenced
    - Added ZLIB license for supported-gpus.json
    - NV_KERNEL_MAX (previously NV_KV_MAX_PLUS) set to be <=5.11 form
      rather than <5.12 given that often confused users thinking it meant
      5.12 support from quick looks.
    - arm64 support "should" work but runtime untested
    - And a long list of cleanups that "hopefully" won't cause new issues.
    
    Closes: https://bugs.gentoo.org/454740
    Closes: https://bugs.gentoo.org/561706
    Closes: https://bugs.gentoo.org/591638
    Closes: https://bugs.gentoo.org/603818
    Closes: https://bugs.gentoo.org/652408
    Closes: https://bugs.gentoo.org/732702
    Closes: https://bugs.gentoo.org/747145
    Closes: https://bugs.gentoo.org/749600
    Signed-off-by: Ionen Wolkens <sudinave@gmail.com>
    Signed-off-by: David Seifert <soap@gentoo.org>

 x11-drivers/nvidia-drivers/Manifest                |   7 +
 .../files/nvidia-blacklist-nouveau.conf            |   3 +
 .../files/nvidia-modprobe-390.141-uvm-perms.patch  |  12 +
 .../nvidia-drivers/files/nvidia-persistenced.confd |   7 +
 .../nvidia-drivers/files/nvidia-persistenced.initd |  12 +
 .../nvidia-drivers/nvidia-drivers-460.67.ebuild    | 391 +++++++++++++++++++++
 6 files changed, 432 insertions(+)
Comment 9 Ionen Wolkens gentoo-dev 2021-04-05 11:43:07 UTC
Just learned this is apparently working in >=460.56

So I'll be later changing the (ignorable) check to be only in 390/450 older branches.
Comment 10 Ionen Wolkens gentoo-dev 2021-04-05 11:51:56 UTC
(In reply to Ionen Wolkens from comment #9)
> Just learned this is apparently working in >=460.56
> 
> So I'll be later changing the (ignorable) check to be only in 390/450 older
> branches.
Or I'll consider it anyway, may be causing other problems while still usable. People who need this may want to try it and see how it goes.
Comment 11 Larry the Git Cow gentoo-dev 2021-04-06 20:00:43 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=89bc4e3deeca908e37998266f92a7fd35d1edbf9

commit 89bc4e3deeca908e37998266f92a7fd35d1edbf9
Author:     Ionen Wolkens <sudinave@gmail.com>
AuthorDate: 2021-04-06 20:00:16 +0000
Commit:     David Seifert <soap@gentoo.org>
CommitDate: 2021-04-06 20:00:16 +0000

    x11-drivers/nvidia-drivers: need PROC_FS, update AMD SME check
    
    CONFIG_CHECK changes:
    - PROC_FS:
      NVIDIA has fallback code to work with a kernel without this, but seems
      to be suffering from bit rot and doesn't compile. For the unlikely
      event a user has it unset (did happen), check so they don't wonder
      what's missing.
    - AMD SME:
      Reports of it being broken with NVIDIA were a bit outdated, should be
      functioning since (at least) >=460.56. As such leaving the imperfect
      "enabled by default" check only in older branches (bug #652408).
    
    Bug: https://bugs.gentoo.org/652408
    Signed-off-by: Ionen Wolkens <sudinave@gmail.com>
    Signed-off-by: David Seifert <soap@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-390.141-r1.ebuild    | 1 +
 x11-drivers/nvidia-drivers/nvidia-drivers-450.102.04-r1.ebuild | 1 +
 x11-drivers/nvidia-drivers/nvidia-drivers-460.67.ebuild        | 2 +-
 x11-drivers/nvidia-drivers/nvidia-drivers-465.19.01.ebuild     | 2 +-
 4 files changed, 4 insertions(+), 2 deletions(-)
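
The commit above keeps the SME warning only on the older branches. As an illustration of that version gate (not the ebuild's actual code, which is bash), the sketch below compares an upstream driver version against 460.56, the first version reported as working in this bug. The names parse_version, FIRST_REPORTED_WORKING and wants_sme_warning are hypothetical.

```python
def parse_version(version):
    """Turn an upstream version such as '450.102.04' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# From comment 9: SME is reportedly working in >=460.56.
FIRST_REPORTED_WORKING = (460, 56)


def wants_sme_warning(driver_version):
    """Return True for driver versions older than the first one reported to work with SME."""
    return parse_version(driver_version) < FIRST_REPORTED_WORKING
```

With this gate, 390.141 and 450.102.04 still get the warning while 460.67 and 465.19.01 do not. The input is the upstream version without a Gentoo revision suffix such as -r1.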