$ sudo emerge -av x11-drivers/nvidia-drivers
...
sh ./scripts/modules-check.sh /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/modules.order
make -f ./scripts/Makefile.modpost
sed 's/ko$/o/' /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/modules.order | scripts/mod/modpost -o /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/Module.symvers -n -e -i Module.symvers -T -
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
make[2]: *** [scripts/Makefile.modpost:126: /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/Module.symvers] Error 1
make[1]: *** [Makefile:1967: modpost] Error 2
make: *** [Makefile:82: modules] Error 2
 * ERROR: x11-drivers/nvidia-drivers-545.29.06-r1::gentoo failed (compile phase):
 *   emake failed
 *
 * If you need support, post the output of `emerge --info '=x11-drivers/nvidia-drivers-545.29.06-r1::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=x11-drivers/nvidia-drivers-545.29.06-r1::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/temp/environment'.
 * Working directory: '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel'
 * S: '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work'

>>> Failed to emerge x11-drivers/nvidia-drivers-545.29.06-r1, Log file:

Reproducible: Always
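The two ERROR lines above come from modpost's license check: a module whose license is not GPL-compatible may not use symbols exported with EXPORT_SYMBOL_GPL(). The following is only a minimal userspace sketch of that rule, not modpost's actual code; the struct and function names (export_entry, find_export, count_gpl_violations) are hypothetical, and the export table a caller passes in is hand-filled for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one exported symbol, roughly what a Module.symvers
 * line records: the name and whether it came from EXPORT_SYMBOL_GPL(). */
struct export_entry {
    const char *name;
    bool gpl_only;
};

/* Linear lookup of a symbol in the export table; NULL if it is not exported. */
static const struct export_entry *find_export(const struct export_entry *exports,
                                              size_t n_exports, const char *sym)
{
    for (size_t i = 0; i < n_exports; i++) {
        if (strcmp(exports[i].name, sym) == 0)
            return &exports[i];
    }
    return NULL;
}

/* Sketch of the license rule: a GPL-incompatible module may not use any
 * GPL-only export. Prints one modpost-style error per offending symbol and
 * returns the number of errors (0 means the check passes). */
static int count_gpl_violations(const char *module, bool gpl_compatible,
                                const char *const *used, size_t n_used,
                                const struct export_entry *exports,
                                size_t n_exports)
{
    int errors = 0;

    assert(module != NULL);
    if (gpl_compatible)
        return 0; /* GPL-compatible modules may use every export */

    for (size_t i = 0; i < n_used; i++) {
        const struct export_entry *e = find_export(exports, n_exports, used[i]);
        if (e != NULL && e->gpl_only) {
            printf("ERROR: modpost: GPL-incompatible module %s uses GPL-only symbol '%s'\n",
                   module, used[i]);
            errors++;
        }
    }
    return errors;
}
```

With a table marking __rcu_read_lock and __rcu_read_unlock as GPL-only, a non-GPL nvidia.ko that uses both gets two errors, mirroring the log; a module using neither passes.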
Created attachment 883906
build log
Created attachment 883907
emerge --info
Created attachment 883908
build log
see also: https://forums.developer.nvidia.com/t/280908
see also: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/594#issuecomment-1916197641

> Oh, sorry: we actually do have a new problem in the 6.8-rcs with
> __rcu_read_unlock/__rcu_read_lock, due to use of the macro
> pfn_valid() which in turn calls those EXPORT_GPL_SYMBOLS. It
> required a bit of detangling, but our next 550 release
> should remove the pfn_valid() use. Thanks for your patience.
> … but our next 550 release should remove the pfn_valid() use.

Note: »next 550« is not x11-drivers/nvidia-drivers-550.40.07 but some 550 version that is yet to be released.
Same problem here with gentoo-sources-6.7.3 and nvidia-drivers 545.29.06-r1 and 550.40.07.
build.log: https://0x0.st/HDfd.log
Same problem (GPL'd __rcu_read_lock/unlock symbols) with vanilla-sources:6.6.15 and nvidia-drivers-470.223.02. Builds fine with 6.6.14. Interestingly, both kernel versions export those symbols as GPL-only:

egrep 'GPL.*__rcu_read_(un)?lock' /usr/src/linux-6.6.1[45]/kernel/rcu/tree_plugin.h
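https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/include/linux/mmzone.h?h=v6.7.3&id=3a01daace71b521563c38bbbf874e14c3e58adb7 added rcu_read_lock()/rcu_read_unlock() calls to the static inline pfn_valid(). Since pfn_valid() is inlined into its callers, nvidia.ko itself now contains those calls, and on preemptible-RCU kernels they go to the GPL-only exports __rcu_read_lock/__rcu_read_unlock. That's why nvidia-drivers is suddenly tripping over GPL exports now. (The above commit went into 6.7.3 but was also merged into the other latest kernels released yesterday, i.e. 6.6.15 and 6.1.76.)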
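Why a header-only change breaks an out-of-tree module can be shown with a simplified userspace model. This is a sketch, not the kernel's code: pfn_valid_old and pfn_valid_new are hypothetical stand-ins, the section table is a toy array, and model_rcu_read_lock/model_rcu_read_unlock are stubs that only count calls in place of the real out-of-line lock functions. The point is that a static inline body is compiled into every caller, so once the inline takes the read lock, the caller itself calls the lock functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's memory-section table (hypothetical sizes):
 * one validity flag per section, 16 PFNs per section. */
#define TOY_SECTION_SHIFT 4
#define TOY_NR_SECTIONS 8
static const bool toy_section_valid[TOY_NR_SECTIONS] = {
    true, true, false, true, false, false, true, true,
};

/* Stubs standing in for the out-of-line RCU lock functions a module would
 * have to link against; they only count calls. */
static int lock_calls, unlock_calls;

static void model_rcu_read_lock(void) { lock_calls++; }

static void model_rcu_read_unlock(void)
{
    assert(unlock_calls < lock_calls); /* unlock must pair with an earlier lock */
    unlock_calls++;
}

/* Shape of pfn_valid() before the change: a plain table lookup, so code
 * that inlines it calls no lock functions at all. */
static inline bool pfn_valid_old(unsigned long pfn)
{
    unsigned long nr = pfn >> TOY_SECTION_SHIFT;

    if (nr >= TOY_NR_SECTIONS)
        return false;
    return toy_section_valid[nr];
}

/* Shape after the change: the lookup is bracketed by the read lock. As a
 * static inline, these lock/unlock calls are compiled into every caller. */
static inline bool pfn_valid_new(unsigned long pfn)
{
    unsigned long nr = pfn >> TOY_SECTION_SHIFT;
    bool ret;

    if (nr >= TOY_NR_SECTIONS)
        return false;
    model_rcu_read_lock();
    ret = toy_section_valid[nr];
    model_rcu_read_unlock();
    return ret;
}
```

In a real module build the difference shows up at link time rather than at runtime: the inlined calls become undefined symbol references in nvidia.ko, which modpost then checks against the GPL-only exports.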
(In reply to Vasco Steinmetz from comment #7)
Ditto:
. . .
sys-kernel/gentoo-sources-6.6.15
. . .
x11-drivers/nvidia-drivers-545.29.06-r1
From a quick look, I think I can figure out a workaround that doesn't require modifying the kernel, though it will take a bit to sort out for all versions (got 470 to build so far). All branches from 470 up are affected when nvidia-drm is enabled (legacy 390 still builds fine because it only uses pfn_valid() on ppc64). An early workaround is likely worth it, given I don't expect NVIDIA to rush out releases for all branches all that soon, even if one is planned.
Created attachment 883953
pfn-valid-525.patch

Still hardly tested: I haven't tried it at runtime, nor checked how it affects pre-6.1 kernels.

It would be good to know whether runtime is fine, if anyone wants to try it early before I add it to the ebuilds. I'm only assuming atm that the old pfn_valid() implementation still works with these kernels.

This patch version is for any drivers from 525 to 550.
Hi,

We also have this on the longterm 6.6 kernel beginning with sys-kernel/gentoo-sources:6.6.15 and the latest production branch of the nVidia UNIX driver x11-drivers/nvidia-drivers:0/535.

Here's what I did. It seems to be working so far.

1) mkdir -p /etc/portage/patches/sys-kernel/gentoo-sources:6.6.15

2) made some patches after the nVidia driver failed to emerge normally

cat /etc/portage/patches/sys-kernel/gentoo-sources\:6.6.15/nVidia-whoopsie.patch
diff --git a/arch/x86/kernel/alternative.c.orig b/arch/x86/kernel/alternative.c
index aae7456..309d652 100644
--- a/arch/x86/kernel/alternative.c.orig
+++ b/arch/x86/kernel/alternative.c
@@ -33,7 +33,7 @@ int __read_mostly alternatives_patched;
-EXPORT_SYMBOL_GPL(alternatives_patched);
+EXPORT_SYMBOL(alternatives_patched);
 #define MAX_PATCH_LEN (255-1)

cat /etc/portage/patches/sys-kernel/gentoo-sources\:6.6.15/phantom-coherence.patch
diff --git a/kernel/rcu/tree_plugin.h.orig b/kernel/rcu/tree_plugin.h
index 4102108..72474d8 100644
--- a/kernel/rcu/tree_plugin.h.orig
+++ b/kernel/rcu/tree_plugin.h
@@ -406,7 +406,7 @@ void __rcu_read_lock(void)
 		WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
 	barrier();  /* critical section after entry code. */
 }
-EXPORT_SYMBOL_GPL(__rcu_read_lock);
+EXPORT_SYMBOL(__rcu_read_lock);
 /*
  * Preemptible RCU implementation for rcu_read_unlock().
@@ -431,7 +431,7 @@ void __rcu_read_unlock(void)
 			WARN_ON_ONCE(rrln < 0 || rrln > RCU_NEST_PMAX);
 		}
 	}
 }
-EXPORT_SYMBOL_GPL(__rcu_read_unlock);
+EXPORT_SYMBOL(__rcu_read_unlock);
 /*
  * Advance a ->blkd_tasks-list pointer to the next entry, instead

3) built the kernel and modules and rebooted

uname -a
Linux torus 6.6.15-gentoo #1 SMP PREEMPT_DYNAMIC Thu Feb 1 18:30:32 UTC 2024 x86_64 Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz GenuineIntel GNU/Linux

nvidia-smi
Thu Feb 1 19:11:30 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------|
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:02:00.0  On |                  N/A |
|100%   57C    P0              73W / 300W |    899MiB / 11264MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
(In reply to Chicago from comment #12)
> Hi,
>
> We also have this on the longterm 6.6 kernel beginning with
> sys-kernel/gentoo-sources:6.6.15 and the latest production branch of the
> nVidia UNIX driver x11-drivers/nvidia-drivers:0/535.
>

Your thing is expected to work, but it's not ideal, as it involves patching the kernel. We need testers for Ionen's approach as it's limited to nvidia-drivers.
(In reply to Ionen Wolkens from comment #11)
> nor checked how it affects pre-6.1 kernels

From a quick look, 4.19 and 5.4+5.10 have two different pfn_valid() implementations, so this patch would not be right for them. However, pre-6.1 kernels have not received a release with the change yet, so they are not affected (unknown if that will last).

For now, I figure I'll limit the patch to affected kernels.
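Limiting a workaround to affected kernels amounts to a version check. The sketch below encodes only the releases named in this bug (6.1.76+, 6.6.15+, 6.7.3+, plus the 6.8 release candidates NVIDIA mentioned) in a hypothetical helper, kernel_has_rcu_pfn_valid(); it is not taken from the actual ebuild or patch. KVER() is a local macro using the same (major << 16) + (minor << 8) + patch layout as the kernel's KERNEL_VERSION(), and the table would need updating if the change is backported further.

```c
#include <assert.h>
#include <stdbool.h>

/* Same (major << 16) + (minor << 8) + patch layout as the kernel's
 * KERNEL_VERSION(); defined locally so this sketch builds in userspace. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

/* Hypothetical helper: does this kernel's pfn_valid() take the RCU read lock?
 * Based only on the releases named in this bug; every other stable series
 * (e.g. 4.19, 5.4, 5.10) is treated as unaffected for now. */
static bool kernel_has_rcu_pfn_valid(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);

    if (KVER(major, minor, patch) >= KVER(6, 8, 0))
        return true; /* mainline from the 6.8 release candidates on */
    if (major == 6 && minor == 7)
        return patch >= 3;
    if (major == 6 && minor == 6)
        return patch >= 15;
    if (major == 6 && minor == 1)
        return patch >= 76;
    return false;
}
```

For example, 6.6.14 (which still built fine above) returns false while 6.6.15 returns true. Inside a module the same idea would normally be a compile-time `#if` on LINUX_VERSION_CODE rather than a runtime function.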
(In reply to Sam James from comment #13)
> (In reply to Chicago from comment #12)
> > Hi,
> >
> > We also have this on the longterm 6.6 kernel beginning with
> > sys-kernel/gentoo-sources:6.6.15 and the latest production branch of the
> > nVidia UNIX driver x11-drivers/nvidia-drivers:0/535.
> >
>
> Your thing is expected to work, but it's not ideal, as it involves patching
> the kernel. We need testers for Ionen's approach as it's limited to
> nvidia-drivers.

I can check tomorrow with 550 and 6.7.3. It builds fine now with a patch, but I can't reboot the server right now since it's running Plex and a few people are watching :) My use case is very limited though, just HW transcoding.
Is anybody treating this change as a "shots fired" moment? Should I expect that, in the future, whenever I want to build kernels I'll have to revert restrictive GPL-check changes with "let's not and say we did" patches?
Ultimately, I expect this kind of thing to propel USE=kernel-open (nvidia's efforts for an open source kernel module).
(In reply to Ionen Wolkens from comment #11)
> Created attachment 883953
> pfn-valid-525.patch
>
> Still hardly tested and haven't tried runtime nor checked how it affects
> pre-6.1 kernels.
>
> Be good to know if runtime is fine if anyone want to try it early before I
> add it to ebuilds, only assuming that the old implementation still works
> with these kernels atm.
>
> This patch version is for any drivers from 525 to 550.

On a side note, don't try that patch with USE=kernel-open -- fix is incomplete for that one.
(In reply to Ionen Wolkens from comment #18)
> On a side-note, don't try that patch with USE=kernel-open -- fix is
> incomplete for that one.

Or more precisely, the patch is unnecessary there :) I just need to drop the chunks that shouldn't apply to it, since they end up breaking it.
Created attachment 883954
nvidia-drivers-470.223.02-gpl-pfn_valid.patch

Well, may as well update it here. This version should work for 470 to 550.

I still haven't really tested it at runtime though, and it also needs more build testing with different versions before I add it to the ebuild.
(In reply to Ionen Wolkens from comment #20)
> Created attachment 883954
> nvidia-drivers-470.223.02-gpl-pfn_valid.patch
>
> Well, may as well update it here. This version should work for 470 to 550.
>
> I still haven't really tried runtime though, and also need more build
> testing with different versions before adding it to the ebuild.

Appears to work at runtime here; at least my desktop is up. Thanks!
(vanilla-sources-6.7.3)
(In reply to Mike Johnson from comment #21)
> (In reply to Ionen Wolkens from comment #20)
> > Created attachment 883954
> > nvidia-drivers-470.223.02-gpl-pfn_valid.patch
> >
> > Well, may as well update it here. This version should work for 470 to 550.
> >
> > I still haven't really tried runtime though, and also need more build
> > testing with different versions before adding it to the ebuild.
>
> Appears to work runtime here, at least my desktop is up. Thanks
> vanilla-sources-6.7.3

Thanks, I did some testing too and I'm not seeing problems either, so I guess I'll add it.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c64caf5352e8b82edbaa2204aaf055cbcabfc8d1

commit c64caf5352e8b82edbaa2204aaf055cbcabfc8d1
Author:     Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2024-02-01 19:58:45 +0000
Commit:     Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2024-02-01 21:05:17 +0000

    x11-drivers/nvidia-drivers: fix build w/ kernel 6.1.76+6.6.15+6.7.3

    NVIDIA already confirmed to be planning a release without pfn_valid,
    so this is temporary until then.

    May need revisiting for older kernels if change is further backported.

    bug #923456 could be closed but leaving open for visibility for now.

    Bug: https://bugs.gentoo.org/923456
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 .../nvidia-drivers-470.223.02-gpl-pfn_valid.patch | 62 ++++++++++++++++++++++
 .../nvidia-drivers-470.223.02.ebuild | 1 +
 .../nvidia-drivers-525.147.05.ebuild | 1 +
 .../nvidia-drivers-535.146.02.ebuild | 1 +
 .../nvidia-drivers-535.154.05.ebuild | 1 +
 .../nvidia-drivers/nvidia-drivers-535.43.23.ebuild | 1 +
 .../nvidia-drivers-545.29.06-r1.ebuild | 1 +
 .../nvidia-drivers/nvidia-drivers-550.40.07.ebuild | 1 +
 8 files changed, 69 insertions(+)