Bug 912949 - x11-drivers/nvidia-drivers-535.104.05 Kernel Open Modules Don't Load SDDM (possible broken if built with clang/llvm?)
Summary: x11-drivers/nvidia-drivers-535.104.05 Kernel Open Modules Don't Load SDDM (po...
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Ionen Wolkens
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-24 09:36 UTC by Neko-san
Modified: 2023-08-26 22:50 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
Open-GPU Nvidia Driver Kernel Module Build and Emerge Info Log (nvidia-driver_build-log_and_emerge-info_log.tar.gz,103.96 KB, application/gzip)
2023-08-26 21:28 UTC, Neko-san
dmesg log (dmesg_Aug-Sat-26-2023_03:55:47.log,77.14 KB, text/x-log)
2023-08-26 21:28 UTC, Neko-san
Metalog Log (metalog_Aug-Sat-26-2023_03:55:55.log,181.99 KB, text/x-log)
2023-08-26 21:28 UTC, Neko-san

Description Neko-san 2023-08-24 09:36:29 UTC
For some reason, using the open kernel modules seems to prevent SDDM from loading on boot, both on 535.104.05 (testing) and 535.98 (stable).

OpenRC will claim the service is started but won't actually switch to the display for SDDM and, if you try to do this yourself (CTRL+ALT+F7), you'll just be met with a blinking cursor rather than SDDM.
Comment 1 Ionen Wolkens gentoo-dev 2023-08-24 09:51:09 UTC
Your card is supported, right? (needs Turing/Ampere or newer, aka roughly >= GTX 1650)

I don't think there's much I can do here downstream (not to mention that I do not have a recent enough card to test this myself). These modules are still pretty experimental: NVIDIA still does not recommend them for desktop use and calls them alpha-quality[1] (the ebuild enables the option anyway and warns against using it). Alternative configurations such as building with LLVM probably don't help.

It could be something else obvious, but I can't tell without seeing any Xorg logs (dmesg may have errors too).

[1] http://download.nvidia.com/XFree86/Linux-x86_64/535.104.05/README/kernel_open.html
Comment 2 Ionen Wolkens gentoo-dev 2023-08-24 10:15:09 UTC
The output of emerge --info also wouldn't hurt, although I'd personally just recommend going back to disabling USE=kernel-open for now.

Closing as NEEDINFO, but odds are I won't be able to do anything about this and will close it as UPSTREAM either way.
Comment 3 Neko-san 2023-08-25 00:55:00 UTC
1) My card is supported (2080 Ti)
2) Kernel-Open worked for me on Arch before I switched to Gentoo (https://github.com/frogging-family/nvidia-all)
3) The emerge --info is already included at the top of the log file I submitted
4) Considering the difference between Arch and this Gentoo setup, I can only imagine the reason is that kernel-open doesn't like LLVM for some reason but I have no clue why
5) I'm new to Gentoo, so I'm not sure how I would collect information related to the issue, where possible
Comment 4 Ionen Wolkens gentoo-dev 2023-08-26 14:09:41 UTC
(In reply to Neko-san from comment #3)
> 1) My card is supported (2080 Ti)
> 2) Kernel-Open worked for me on Arch before I switched to Gentoo
> (https://github.com/frogging-family/nvidia-all)
I see, should work in theory then.

> 3) The emerge --info is already included at the top of the log file I
> submitted
I don't see a log file.

> 4) Considering the difference between Arch and this Gentoo setup, I can only
> imagine the reason is that kernel-open doesn't like LLVM for some reason but
> I have no clue why
Maybe. I've never had confirmation that it works (or not) on an LLVM profile. Generally, consider it lucky that nvidia works at all there, even with the closed-source drivers (this used to be broken entirely).

> 5) I'm new to Gentoo, so I'm not sure how I would collect information
> related to the issue, where possible
Did the modules even load at all (i.e. are they listed in `lsmod | grep nvidia`)? If not, dmesg tends to give some errors. If they did, then maybe sddm/xorg's logs have something.
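
For anyone following along, the check described above can be sketched as a small shell function. This is only an illustration: the function name `nvidia_modules_loaded` is made up here, and it reads lsmod-style text on stdin so it can also be run against saved output.

```shell
# Sketch: report whether any nvidia module appears in lsmod-style output.
# The function name is hypothetical; input is read from stdin.
nvidia_modules_loaded() {
    # lsmod prints the module name in the first column. Print every name
    # starting with "nvidia" and exit 0 only if at least one was found.
    awk '$1 ~ /^nvidia/ { found = 1; print $1 } END { exit !found }'
}

# Typical use on a live system (reading dmesg may need root):
#   lsmod | nvidia_modules_loaded || dmesg | grep -i nvidia
```

If the function prints nothing and returns non-zero, the modules are not loaded and dmesg is the next place to look.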
Comment 5 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-08-26 14:13:14 UTC
I'd also add that if you're new, the LLVM profiles are not a great place to start. They're experimental and for advanced users.
Comment 6 Neko-san 2023-08-26 19:53:57 UTC
> I don't see a log file.

Thought I included it, sorry; will add

> Did the modules even load at all? (aka listed in `lsmod | grep nvidia`), if not dmesg tend to give some errors. If they did, then maybe sddm/xorg's logs have something.

I'll try again and check dmesg

> I'd also add that if you're new, the LLVM profiles are not a great place to start. They're experimental and for advanced users.

You say this, but I've had next to no issues whatsoever with LLVM, and I have a full LLVM desktop with 1209 packages. Very few packages have had any LLVM issues, and those that I still can't install/update because of LLVM issues are primarily GUI apps that have flatpaks, so I can just wait for their associated bugs here on the bugzilla to be fixed.
Comment 7 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-08-26 19:55:32 UTC
That doesn't change the experimental nature of them, but you're the admin.
Comment 8 Neko-san 2023-08-26 21:28:03 UTC
Created attachment 868783 [details]
Open-GPU Nvidia Driver Kernel Module Build and Emerge Info Log

> That doesn't change the experimental nature of them
I suppose, but that only encourages me to report bugs I find so that it will become "stable"
Comment 9 Neko-san 2023-08-26 21:28:29 UTC
Created attachment 868784 [details]
dmesg log
Comment 10 Neko-san 2023-08-26 21:28:45 UTC
Created attachment 868785 [details]
Metalog Log
Comment 11 Neko-san 2023-08-26 21:32:22 UTC
SDDM and Xorg don't seem to even start; I created an alias for my shell to detect if they're even running and dump their logs if they are, and the aliases weren't available.
So, whatever it is, I imagine it's hanging SDDM and Xorg?
Comment 12 Ionen Wolkens gentoo-dev 2023-08-26 21:50:04 UTC
Yeah it does not seem like the modules are loading properly, so sddm/Xorg just can't do anything with that.

Not familiar with that error though:

    [  +0.000502] module: nvidia: Unknown rela relocation: 41

The few cases I see around are related to old binutils, but that does not seem to apply here (lld is in use anyway).

That aside I see:
    CFLAGS="
        -O3 -march=znver2 -mtune=znver2 -pipe -fno-plt
        -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat
        -Werror=format-security -fstack-clash-protection
        -fstack-protector-strong -fcf-protection
    "
Have you tried with just "-march=native -O2" so we can rule these out as being related?

And note that FORTIFY_SOURCE=2 and these two -fstack* flags are defaults on Gentoo, so you don't need to pass them either way.

$ cat /etc/clang/gentoo-hardened.cfg
# Some of these options are added unconditionally, regardless of
# USE=hardened, for parity with sys-devel/gcc.
-fstack-clash-protection
-fstack-protector-strong
-fPIE
-include "/usr/include/gentoo/fortify.h"
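
To spot the relocation error quoted earlier in this comment in a saved dmesg log, something like the following could be used. It is a rough sketch: the function name `find_relocation_errors` is hypothetical, and the pattern only matches the message format seen in this bug. As far as I can tell from the x86-64 psABI, relocation type 41 is R_X86_64_GOTPCRELX, but treat that reading as an assumption rather than a diagnosis.

```shell
# Sketch: list modules the kernel refused to load because of an
# unsupported relocation, given dmesg-style text on stdin.
# The function name is hypothetical.
find_relocation_errors() {
    # Matches lines such as:
    #   [  +0.000502] module: nvidia: Unknown rela relocation: 41
    # and prints "<module> <relocation type>" for each match.
    sed -n 's/.*module: \([^:]*\): Unknown rela relocation: \([0-9]*\).*/\1 \2/p'
}

# Typical use against one of the attached logs:
#   find_relocation_errors < dmesg_Aug-Sat-26-2023_03:55:47.log
```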
Comment 13 Ionen Wolkens gentoo-dev 2023-08-26 21:56:32 UTC
On a side note, for these:

LDFLAGS="-Wl,-O3,--sort-common,--as-needed,-z,relro,-z,now -fuse-ld=lld -rtlib=compiler-rt -unwindlib=libunwind"

I'd drop the lld/rtlib/unwindlib flags from LDFLAGS and instead set USE="default-compiler-rt default-lld llvm-libunwind" on sys-devel/clang-common, to ensure they are always used properly and do not confuse build systems.
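
One way to apply that USE suggestion is sketched below. It assumes package.use is a directory (it can also be a single file), and the function name and the file name `clang-common` are arbitrary choices for illustration.

```shell
# Sketch: append the suggested USE flags for sys-devel/clang-common to a
# file inside a package.use directory. Names here are illustrative.
set_clang_common_use() {
    # $1: package.use directory, normally /etc/portage/package.use
    dir=$1
    mkdir -p "$dir" || return 1
    printf '%s\n' \
        'sys-devel/clang-common default-compiler-rt default-lld llvm-libunwind' \
        >> "$dir/clang-common"
}

# On a real system (as root), followed by a rebuild of the package:
#   set_clang_common_use /etc/portage/package.use
#   emerge --oneshot sys-devel/clang-common
```

After that, the -fuse-ld=lld, -rtlib and -unwindlib entries can be removed from LDFLAGS in make.conf.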
Comment 14 Neko-san 2023-08-26 22:35:42 UTC
> Have you tried with just "-march=native -O2" so can rule these out as being related?
I just tried "-march=native -O2" and it worked; however, adding "-fno-plt" seems to be what triggers it
Comment 15 Ionen Wolkens gentoo-dev 2023-08-26 22:41:20 UTC
(In reply to Neko-san from comment #14)
> > Have you tried with just "-march=native -O2" so can rule these out as being related?
> I just tried "-march=native -O2" and it worked; however, adding "-fno-plt"
> seems to be what triggers it
Nice, guess I could filter it if it's just that then.
Comment 16 Larry the Git Cow gentoo-dev 2023-08-26 22:50:49 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=990f190320f767238a0366f7b077c73be526a890

commit 990f190320f767238a0366f7b077c73be526a890
Author:     Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2023-08-26 22:48:33 +0000
Commit:     Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2023-08-26 22:49:30 +0000

    x11-drivers/nvidia-drivers: filter -fno-plt with kernel-open
    
    If similar issues come up again may opt to trade for strip-flags.
    Skipping revbump given open+no-plt is a rather rare configuration.
    
    Closes: https://bugs.gentoo.org/912949
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-525.125.06.ebuild | 1 +
 x11-drivers/nvidia-drivers/nvidia-drivers-535.104.05.ebuild | 1 +
 x11-drivers/nvidia-drivers/nvidia-drivers-535.43.08.ebuild  | 1 +
 x11-drivers/nvidia-drivers/nvidia-drivers-535.98.ebuild     | 1 +
 4 files changed, 4 insertions(+)
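
The exact line added to the ebuilds is not shown here; it presumably uses filter-flags from the flag-o-matic eclass, which removes matching flags from the compiler flag variables. The effect can be imitated in plain shell as below. This is a stand-in for illustration, not the eclass implementation, and the function name `drop_flag` is made up.

```shell
# Sketch: remove one exact flag from a space-separated flag string.
# This only imitates the effect of filtering a flag; it is not eclass code.
drop_flag() {
    # $1: flag to remove, $2: flag string (e.g. the value of CFLAGS)
    flag=$1
    result=
    # Unquoted on purpose so the string splits into individual flags.
    for f in $2; do
        if [ "$f" != "$flag" ]; then
            result="${result:+$result }$f"
        fi
    done
    printf '%s\n' "$result"
}

# Example with part of the CFLAGS from comment 12:
#   drop_flag -fno-plt "-O3 -march=znver2 -mtune=znver2 -pipe -fno-plt"
```

Users on an ebuild revision without the fix could get a similar result by leaving -fno-plt out of CFLAGS for this package only.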