Bug 923456 - x11-drivers/nvidia-drivers-545.29.06-r1 fails to build with kernel 6.1.76 and 6.7.3 (ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock')
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal with 1 vote
Assignee: Ionen Wolkens
URL: https://forums.developer.nvidia.com/t...
Whiteboard:
Keywords: PATCH
Depends on:
Blocks:
 
Reported: 2024-02-01 10:06 UTC by Marius Stoica
Modified: 2024-02-05 16:49 UTC
CC List: 16 users

See Also:
Package list:
Runtime testing required: ---


Attachments
build log (nvidia-info.log,7.40 KB, text/x-log)
2024-02-01 10:07 UTC, Marius Stoica
emerge --info (nvidia-info.log,7.40 KB, text/x-log)
2024-02-01 10:07 UTC, Marius Stoica
build log (build.log,636.58 KB, text/x-log)
2024-02-01 10:09 UTC, Marius Stoica
pfn-valid-525.patch (pfn-valid-525.patch,1.92 KB, patch)
2024-02-01 18:58 UTC, Ionen Wolkens
nvidia-drivers-470.223.02-gpl-pfn_valid.patch (nvidia-drivers-470.223.02-gpl-pfn_valid.patch,2.21 KB, patch)
2024-02-01 20:23 UTC, Ionen Wolkens

Description Marius Stoica 2024-02-01 10:06:21 UTC
$ sudo emerge -av x11-drivers/nvidia-drivers

...

sh ./scripts/modules-check.sh /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/modules.order
make -f ./scripts/Makefile.modpost
   sed 's/ko$/o/'  /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/modules.order | scripts/mod/modpost      -o /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/Module.symvers -n -e -i Module.symvers -T - 
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_lock'
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol '__rcu_read_unlock'
make[2]: *** [scripts/Makefile.modpost:126: /var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel/Module.symvers] Error 1
make[1]: *** [Makefile:1967: modpost] Error 2
make: *** [Makefile:82: modules] Error 2
 * ERROR: x11-drivers/nvidia-drivers-545.29.06-r1::gentoo failed (compile phase):
 *   emake failed
 * 
 * If you need support, post the output of `emerge --info '=x11-drivers/nvidia-drivers-545.29.06-r1::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=x11-drivers/nvidia-drivers-545.29.06-r1::gentoo'`.
 * The complete build log is located at '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/temp/environment'.
 * Working directory: '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work/kernel'
 * S: '/var/tmp/portage/x11-drivers/nvidia-drivers-545.29.06-r1/work'

>>> Failed to emerge x11-drivers/nvidia-drivers-545.29.06-r1, Log file:

Reproducible: Always
Comment 1 Marius Stoica 2024-02-01 10:07:17 UTC
Created attachment 883906 [details]
build log
Comment 2 Marius Stoica 2024-02-01 10:07:42 UTC
Created attachment 883907 [details]
emerge --info
Comment 3 Marius Stoica 2024-02-01 10:09:31 UTC
Created attachment 883908 [details]
build log
Comment 4 Christian Bricart 2024-02-01 10:17:53 UTC
see also: https://forums.developer.nvidia.com/t/280908

see also: https://github.com/NVIDIA/open-gpu-kernel-modules/issues/594#issuecomment-1916197641


> Oh, sorry: we actually do have a new problem in the 6.8-rcs with
> __rcu_read_unlock/__rcu_read_lock, due to use of the macro
> pfn_valid() which in turn calls those EXPORT_GPL_SYMBOLS. It
> required a bit of detangling, but our next 550 release
> should remove the pfn_valid() use. Thanks for your patience.
Comment 5 Christian Bricart 2024-02-01 10:21:12 UTC
> … but our next 550 release should remove the pfn_valid() use.

note: »next 550« is not x11-drivers/nvidia-drivers-550.40.07 but some 550 version yet to be released
Comment 6 Oliver Hildebrandt 2024-02-01 10:39:12 UTC
Same Problem here with gentoo-sources-6.7.3 and nvidia-drivers 545.29.06-r1 and 550.40.07.

build.log:
https://0x0.st/HDfd.log
Comment 7 Vasco Steinmetz 2024-02-01 11:23:30 UTC
Same problem (GPL'd __rcu_read_lock/unlock symbols) with vanilla-sources:6.6.15 and nvidia-drivers-470.223.02.
Builds fine with 6.6.14.
Interestingly, both kernel versions export those symbols as GPL-only:
egrep 'GPL.*__rcu_read_(un)?lock' /usr/src/linux-6.6.1[45]/kernel/rcu/tree_plugin.h
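
The check above can be wrapped in a small function that reports how a given symbol is exported in a source file. This is only a sketch: the function name `export_kind` is made up for illustration, and it assumes the export macro sits at the start of a line in the form `EXPORT_SYMBOL_GPL(name);`, as it does in the tree_plugin.h excerpts quoted in this bug.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Gentoo or kernel tooling).
# Prints "gpl", "non-gpl", or "not-exported" for a symbol in a source file.
export_kind() {
    file=$1
    symbol=$2
    if grep -Eq "^EXPORT_SYMBOL_GPL\($symbol\);" "$file"; then
        echo gpl
    elif grep -Eq "^EXPORT_SYMBOL\($symbol\);" "$file"; then
        echo non-gpl
    else
        echo not-exported
    fi
}

# Example call, using the path from the comment above:
#   export_kind /usr/src/linux-6.6.15/kernel/rcu/tree_plugin.h __rcu_read_lock
```

If both 6.6.14 and 6.6.15 report the symbols as GPL-only, the export type itself did not change between the two releases, so the new build failure has to come from somewhere else pulling those symbols into the module.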
Comment 8 Christian Bricart 2024-02-01 11:40:37 UTC
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/include/linux/mmzone.h?h=v6.7.3&id=3a01daace71b521563c38bbbf874e14c3e58adb7 added rcu_read_lock/unlock to the static inline pfn_valid, which is why nvidia-drivers is suddenly tripping over GPL-only exports now…

(the above commit went into 6.7.3 but was also merged into the other latest kernels released yesterday, i.e. 6.6.15 and 6.1.76)
Comment 9 Manfred Knick 2024-02-01 12:41:55 UTC
(In reply to Vasco Steinmetz from comment #7)
Ditto:
. . . sys-kernel/gentoo-sources-6.6.15
. . . x11-drivers/nvidia-drivers-545.29.06-r1
Comment 10 Ionen Wolkens gentoo-dev 2024-02-01 18:30:33 UTC
Think I can figure out a workaround from a quick look (that is, without modifying the kernel), will need a bit to sort it out for all versions though (got 470 to build so far).

All branches above 470 are affected when nvidia-drm is enabled (legacy 390 still builds fine because it only uses pfn_valid on ppc64).

Likely worth an early workaround given I don't expect NVIDIA to rush out a release for all branches all that soon even if it's planned.
Comment 11 Ionen Wolkens gentoo-dev 2024-02-01 18:58:32 UTC
Created attachment 883953 [details, diff]
pfn-valid-525.patch

Still hardly tested and haven't tried runtime nor checked how it affects pre-6.1 kernels.

Be good to know if runtime is fine if anyone wants to try it early before I add it to ebuilds, only assuming that the old implementation still works with these kernels atm.

This patch version is for any drivers from 525 to 550.
Comment 12 Chicago 2024-02-01 19:19:42 UTC
Hi,

    We also have this on the longterm 6.6 kernel beginning with sys-kernel/gentoo-sources:6.6.15 and the latest production branch of the nVidia UNIX driver x11-drivers/nvidia-drivers:0/535.

    Here's what I did.  It seems to be working so far.

1) mkdir -p /etc/portage/patches/sys-kernel/gentoo-sources:6.6.15
2) made some patches after the nVidia driver failed to emerge normally

cat /etc/portage/patches/sys-kernel/gentoo-sources\:6.6.15/nVidia-whoopsie.patch 
diff --git a/arch/x86/kernel/alternative.c.orig b/arch/x86/kernel/alternative.c
index aae7456..309d652 100644
--- a/arch/x86/kernel/alternative.c.orig
+++ b/arch/x86/kernel/alternative.c
@@ -33,7 +33,7 @@
 
 int __read_mostly alternatives_patched;
 
-EXPORT_SYMBOL_GPL(alternatives_patched);
+EXPORT_SYMBOL(alternatives_patched);
 
 #define MAX_PATCH_LEN (255-1)




cat /etc/portage/patches/sys-kernel/gentoo-sources\:6.6.15/phantom-coherence.patch
diff --git a/kernel/rcu/tree_plugin.h.orig b/kernel/rcu/tree_plugin.h
index 4102108..72474d8 100644
--- a/kernel/rcu/tree_plugin.h.orig
+++ b/kernel/rcu/tree_plugin.h
@@ -406,7 +406,7 @@ void __rcu_read_lock(void)
                WRITE_ONCE(current->rcu_read_unlock_special.b.need_qs, true);
        barrier();  /* critical section after entry code. */
 }
-EXPORT_SYMBOL_GPL(__rcu_read_lock);
+EXPORT_SYMBOL(__rcu_read_lock);
 
 /*
  * Preemptible RCU implementation for rcu_read_unlock().
@@ -431,7 +431,7 @@ void __rcu_read_unlock(void)
                WARN_ON_ONCE(rrln < 0 || rrln > RCU_NEST_PMAX);
        }
 }
-EXPORT_SYMBOL_GPL(__rcu_read_unlock);
+EXPORT_SYMBOL(__rcu_read_unlock);
 
 /*
  * Advance a ->blkd_tasks-list pointer to the next entry, instead




3) built the kernel and modules and rebooted

uname -a
Linux torus 6.6.15-gentoo #1 SMP PREEMPT_DYNAMIC Thu Feb  1 18:30:32 UTC 2024 x86_64 Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz GenuineIntel GNU/Linux

nvidia-smi
Thu Feb  1 19:11:30 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05             Driver Version: 535.154.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:02:00.0  On |                  N/A |
|100%   57C    P0              73W / 300W |    899MiB / 11264MiB |      2%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
Comment 13 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-01 19:26:10 UTC
(In reply to Chicago from comment #12)
> Hi,
> 
>     We also have this on the longterm 6.6 kernel beginning with
> sys-kernel/gentoo-sources:6.6.15 and the latest production branch of the
> nVidia UNIX driver x11-drivers/nvidia-drivers:0/535.
> 

Your thing is expected to work, but it's not ideal, as it involves patching the kernel. We need testers for Ionen's approach as it's limited to nvidia-drivers.
Comment 14 Ionen Wolkens gentoo-dev 2024-02-01 19:29:09 UTC
(In reply to Ionen Wolkens from comment #11)
> nor checked how it affects pre-6.1 kernels
From a quick look, 4.19 and 5.4+5.10 have two different pfn_valid implementations, so this patch would not be right for them. However, pre-6.1 kernels have not received a release with the change yet, so they are not affected (unknown if that will last).

For now figure I'll limit the patch to affected kernels.
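
As a rough illustration of what "affected kernels" means at this point in the thread, here is a sketch of a version check built only from the versions named in this bug (6.1.76, 6.6.15, 6.7.3, plus the 6.8-rcs mentioned in the NVIDIA quote). The function name `kernel_is_affected` is hypothetical, and this is not how the patch or the ebuild actually performs its check; it would also need revisiting if the change is backported to older branches.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given kernel version is one
# this bug reports as carrying the pfn_valid() RCU change, i.e.
# 6.1.76+, 6.6.15+, 6.7.3+, or anything from 6.8 onwards.
kernel_is_affected() {
    ver=${1%%-*}              # drop suffixes such as "-gentoo" or "-rc1"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    case $rest in
        *.*) patch=${rest#*.} ;;
        *)   patch=0 ;;       # e.g. "6.8" has no patch level
    esac

    [ "$major" -gt 6 ] && return 0
    [ "$major" -lt 6 ] && return 1

    case $minor in
        1) [ "$patch" -ge 76 ] ;;
        6) [ "$patch" -ge 15 ] ;;
        7) [ "$patch" -ge 3 ] ;;
        *) [ "$minor" -ge 8 ] ;;   # other 6.x branches did not get the backport
    esac
}

# Example call:
#   kernel_is_affected "$(uname -r)" && echo "needs the pfn_valid workaround"
```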
Comment 15 Klemen Mihevc 2024-02-01 19:32:44 UTC
(In reply to Sam James from comment #13)
> (In reply to Chicago from comment #12)
> > Hi,
> > 
> >     We also have this on the longterm 6.6 kernel beginning with
> > sys-kernel/gentoo-sources:6.6.15 and the latest production branch of the
> > nVidia UNIX driver x11-drivers/nvidia-drivers:0/535.
> > 
> 
> Your thing is expected to work, but it's not ideal, as it involves patching
> the kernel. We need testers for Ionen's approach as it's limited to
> nvidia-drivers.

I can check tomorrow with 550 and 6.7.3. It builds fine now with the patch, but I can't reboot the server right now since it's running Plex and a few people are watching :) My use case is very limited though, just HW transcoding.
Comment 16 Chicago 2024-02-01 19:35:15 UTC
Is there anybody talking about this feature as a shots fired moment?

Am I supposed to expect in the future when I want to build kernels, I'm going to have to revert prohibitive changes with "let's not and say we did" to GPL checks?
Comment 17 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-01 19:40:40 UTC
Ultimately, I expect this kind of thing to propel USE=kernel-open (nvidia's efforts for an open source kernel module).
Comment 18 Ionen Wolkens gentoo-dev 2024-02-01 20:12:21 UTC
(In reply to Ionen Wolkens from comment #11)
> Created attachment 883953 [details, diff]
> pfn-valid-525.patch
> 
> Still hardly tested and haven't tried runtime nor checked how it affects
> pre-6.1 kernels.
> 
> Be good to know if runtime is fine if anyone wants to try it early before I
> add it to ebuilds, only assuming that the old implementation still works
> with these kernels atm.
> 
> This patch version is for any drivers from 525 to 550.
On a side-note, don't try that patch with USE=kernel-open -- fix is incomplete for that one.
Comment 19 Ionen Wolkens gentoo-dev 2024-02-01 20:14:40 UTC
(In reply to Ionen Wolkens from comment #18)
> On a side-note, don't try that patch with USE=kernel-open -- fix is
> incomplete for that one.
Or more precisely, the patch is unnecessary :) Just need to drop chunks that shouldn't be there that end up breaking it.
Comment 20 Ionen Wolkens gentoo-dev 2024-02-01 20:23:57 UTC
Created attachment 883954 [details, diff]
nvidia-drivers-470.223.02-gpl-pfn_valid.patch

Well, may as well update it here. This version should work for 470 to 550.

I still haven't really tried runtime though, and also need more build testing with different versions before adding it to the ebuild.
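
For anyone wanting to test the attached patch before it reaches the ebuilds, one option is the same /etc/portage/patches user-patch mechanism used in comment #12, pointed at the driver package instead of the kernel. The sketch below assumes the nvidia-drivers ebuild applies user patches and that the attachment applies cleanly as downloaded; the function name `install_user_patch` and its optional root prefix argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy a patch into Portage's user patches directory
# for a given package. The optional third argument is a root prefix
# (handy for dry runs); leave it empty to target the live system.
install_user_patch() {
    patch_file=$1
    package=$2                # e.g. x11-drivers/nvidia-drivers
    root=${3:-}
    dest="$root/etc/portage/patches/$package"
    mkdir -p "$dest" && cp "$patch_file" "$dest/"
}

# Example (as root), followed by a rebuild of the driver:
#   install_user_patch nvidia-drivers-470.223.02-gpl-pfn_valid.patch \
#       x11-drivers/nvidia-drivers
#   emerge -1 x11-drivers/nvidia-drivers
```

Remember to remove the file from the patches directory once an ebuild revision carrying the fix is installed, otherwise the same patch may be applied twice and fail.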
Comment 21 Mike Johnson 2024-02-01 21:01:44 UTC
(In reply to Ionen Wolkens from comment #20)
> Created attachment 883954 [details, diff]
> nvidia-drivers-470.223.02-gpl-pfn_valid.patch
> 
> Well, may as well update it here. This version should work for 470 to 550.
> 
> I still haven't really tried runtime though, and also need more build
> testing with different versions before adding it to the ebuild.

Appears to work runtime here, at least my desktop is up.  Thanks
vanilla-sources-6.7.3
Comment 22 Ionen Wolkens gentoo-dev 2024-02-01 21:05:04 UTC
(In reply to Mike Johnson from comment #21)
> (In reply to Ionen Wolkens from comment #20)
> > Created attachment 883954 [details, diff]
> > nvidia-drivers-470.223.02-gpl-pfn_valid.patch
> > 
> > Well, may as well update it here. This version should work for 470 to 550.
> > 
> > I still haven't really tried runtime though, and also need more build
> > testing with different versions before adding it to the ebuild.
> 
> Appears to work runtime here, at least my desktop is up.  Thanks
> vanilla-sources-6.7.3
Thanks, I did some testing too and not seeing problems either. So guess I'll add it.
Comment 23 Larry the Git Cow gentoo-dev 2024-02-01 21:06:09 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c64caf5352e8b82edbaa2204aaf055cbcabfc8d1

commit c64caf5352e8b82edbaa2204aaf055cbcabfc8d1
Author:     Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2024-02-01 19:58:45 +0000
Commit:     Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2024-02-01 21:05:17 +0000

    x11-drivers/nvidia-drivers: fix build w/ kernel 6.1.76+6.6.15+6.7.3
    
    NVIDIA already confirmed to be planning a release without pfn_valid,
    so this is temporary until then. May need revisiting for older kernels
    if change is further backported.
    
    bug #923456 could be closed but leaving open for visibility for now.
    
    Bug: https://bugs.gentoo.org/923456
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 .../nvidia-drivers-470.223.02-gpl-pfn_valid.patch  | 62 ++++++++++++++++++++++
 .../nvidia-drivers-470.223.02.ebuild               |  1 +
 .../nvidia-drivers-525.147.05.ebuild               |  1 +
 .../nvidia-drivers-535.146.02.ebuild               |  1 +
 .../nvidia-drivers-535.154.05.ebuild               |  1 +
 .../nvidia-drivers/nvidia-drivers-535.43.23.ebuild |  1 +
 .../nvidia-drivers-545.29.06-r1.ebuild             |  1 +
 .../nvidia-drivers/nvidia-drivers-550.40.07.ebuild |  1 +
 8 files changed, 69 insertions(+)