The Linux Kernel changed in 2.6.32, which makes it incompatible with the current Nvidia Binary Driver. So far upstream has not provided a patch, but a patch is avaliable on the nvnews.net forums. There are two different versions for versions 190.53 and 195.30 of the Nvidia Binary Driver. Other versions are likely affected, but these are recent releases, so those using the older drivers are likely still using older kernels and are not affected by this issue.
There are threads on the Gentoo forums documenting this issue:
There is also at least one thread on the nvnews.net forums documenting it:
So far, 195.30 has not been added to portage, so it seems that only 190.53 needs to be addressed. I will attach patches after the bug is filed.
Steps to Reproduce:
1. Use Linux Kernel version 2.6.32 or 2.6.33 on a system that has hardware that the nvidia binary driver supports
2. Emerge the nvidia-drivers
3. Watch it fail in your command prompt
Created attachment 216767 [details, diff]
Patch for x11-drivers/nvidia-drivers-190.53-r1 that fixes the issue
This is a patch from the nvnews.net forums that resolves the issue of the nvidia drivers failing to emerge.
Created attachment 216768 [details, diff]
A patch to the nvidia-drivers-190.53-r1.ebuild file
This is a patch to the nvidia-drivers-190.53-r1.ebuild file that will apply nvidia-190.53-2.6.33.patch to the drivers when running an affected kernels. It works by checking if the kernel version is greater than 2.6.31 and applying the patch when that condition is found to be true.
That ebuild should probably apply the patch no matter what, assuming the patch works with < 2.6.32 kernels, if not the patch should probably be adjusted to work with < 2.6.32 kernels by adding the conditional statements or whatever.
(In reply to comment #3)
> That ebuild should probably apply the patch no matter what, assuming the patch
> works with < 2.6.32 kernels, if not the patch should probably be adjusted to
> work with < 2.6.32 kernels by adding the conditional statements or whatever.
The patch is not needed for < 2.6.32 kernels because the kernel interfaces did not change until 2.6.32. If conditional statements are added to the patch so it can be applied regardless of kernel version, how would that change things versus how they are in the ebuild.patch I provided where the ebuild has the conditional statement?
As far as I can tell, you will just push work that portage could have done once onto the C preprocessor where it will be doing it in multiple places. Is there a coding standard that requires this?
My question was if the patch broke lower than 2.6.33 kernel versions - if so a fix would be required to make it compatible with all versions. - Because nvidia wouldn't approve of a patch that breaks every kernel version older than 2.6.33, conditional statements are common for that type of thing.
Having the same issue. Adding my name to CC list.
Tested the fix. It worked for me. Am now running 2.6.33-rc4.
(In reply to comment #5)
> My question was if the patch broke lower than 2.6.33 kernel versions - if so a
> fix would be required to make it compatible with all versions. - Because nvidia
> wouldn't approve of a patch that breaks every kernel version older than 2.6.33,
> conditional statements are common for that type of thing.
Future versions of Nvidia's drivers will likely have preprocessor conditionals to handle the change, but it seems to me that doing that is something that the upstream developers should do. Nvidia is not going to make another release of the 190.53 drivers to fix this specific issue and this ebuild already has a conditional for kernels below version 2.6.26 (if I recall correctly). I do not see why using this approach for a second patch is a problem.
The attached patch looks to be a bit off to me. The change in the kernel was (committed at 2263576c):
diff --git a/include/acpi/acpixf.h b/include/acpi/acpixf.h
index 0e7efea..f9b8b28 100644
@@ -154,7 +154,8 @@ acpi_status
- acpi_walk_callback user_function,
+ acpi_walk_callback pre_order_visit,
+ acpi_walk_callback post_order_visit,
void *context, void **return_value);
(see here: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blobdiff;f=include/acpi/acpixf.h;h=f9b8b28ad802e530353696fc90539c8ff388b328;hp=0e7efeacf6cb4353c0edf84bbfaea53f2dfb3a0c;hb=2263576cfc6e8f6ab038126c3254404b9fcb1c33;hpb=7d5d05d0704127c9acd24090c14731c111bd0af1)
Note that the new parameter was added after the fourth one. See also the commet for this commit. However, the attached patch has:
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 33)
+#define acpi_walk_namespace(a,b,c,d,e,f) acpi_walk_namespace(a,b,c,d,e,f,NULL)
I would expect something more like:
#define acpi_walk_namespace(a,b,c,d,e,f) acpi_walk_namespace(a,b,c,d,NULL,e,f)
You can also see the rest of the kernel commit contains adaptations of the same kind to other calls to the same function.
The full diff is at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=2263576cfc6e8f6ab038126c3254404b9fcb1c33
Created attachment 221145 [details, diff]
A patch from the nvnews.net forums with a minor correction that fixes this issue.
This has been been modified to correct a logic error in the original version posted to nvnews.net.
Yuval Hager, I looked through your references and it appears that you are correct. The person at nvnews.net that made the patch made a logical error that introduces an extremely obscure bug into the driver. No one at the nvnews.net seems to have caught it.
I was initially going to let someone else handle this because I am new to using bugzilla, filing bug reports and the like, but the lack of action since I filed this bug report seems to indicate to me that people are waiting for me to do something about it because I posted the patches.
Anyway, I modified the patch file and uploaded the modified version. I am going to try to scrutinize the ebuild patch a bit more (to determine whether or not making changes to the sources for 2.6.32 is strictly necessary) either today or tomorrow and then report back. Let me know if there is anything else wrong with this, as I want to see this fix introduced into the portage tree and I find myself more willing to work on it as time passes and no one else does anything.
I posted a thread on the nvnews.net forums telling them about the issue with the patch someone else posted there:
Created attachment 221323 [details, diff]
Updated ebuild patch; people who installed the nvidia drivers with the old ebuild patch do not need to apply this
Okay, I compiled sys-kernel/gentoo-sources-2.6.32-r7 and "ran eselect kernel set linux-2.6.32-gentoo-r7 && module-rebuild rebuild" without having this patch applied on my system. The result was that the drivers compiled without an issue, so it is not necessary to apply this patch to kernel 2.6.32, contrary to what I had read on the nvnews.net forums.
I assume that the patch would break the driver compilation for kernel 2.6.32 if it were applied when someone is building the drivers for 2.6.32, but I do see the value in verifying this, so I am just going to assume that is the case. I have uploaded a new ebuild patch that will leave things as they are for 2.6.32 while patching 2.6.33 and later kernels.
This is my first time patching ebuilds, so I apolgoize for the confusion the misinformation that this might have caused. I will be more thorough prior to proposing patches in the future.
Anyway, many people have had success with this and I can only find a single complaint at the moment involving the nvidia drivers 190.53 and the 2.6.33 kernel, where this patch was applied is at the nvnews.net forums:
I have determined the issue that person is experiencing to be unrelated to this patch. Unless there are any more objections, this should be ready for inclusion into the portage tree.
I probably should have spell checked my comment before posting, but I revised the wording so many times before posting it that what I thought were well formed English sentences turned out to have a few issues. While I could probably make a patch for this as well, I will just restate what I tried to say, but said incorrectly:
This is my first time patching ebuilds, so I apologize for any confusion that the misinformation involving 2.6.32 might have caused. I will be more thorough in verifying information prior to proposing patches in the future to avoid recurrences of such events.
Thanks very much for all the information you've provided, the work you've done on the patch has been very useful. I'm afraid that since we're all volunteers working on Gentoo, we don't always have time to respond all that quickly to bugs and similar. So where you mentioned in comment 11 that you thought people expected you to do more because of the time taken to respond, we didn't expect you to do more, but we do much appreciate it as it makes our job a lot easier!
Thanks also for the ebuild modifications you included, however generally we try to discourage so called "conditional patching", because it can make it much more difficult to determine exactly what source code was compiled for a particular system (suddenly it's not just the one package that we need to know about, but other dependent ones) and it's also not a good idea for packages that aren't rebuilt often, because if a package is built differently depending on package foo being version 1 or version 2, then if a user later downgrades to foo-1, it will break the other package.
That's mostly to give you some information about modifying ebuilds in the future, but you were definitely on the right lines. Generally we try to use #includes/#ifdefs in patches that we're applying so that they'll work for any version of the code they may be needed for. So thanks again for your contributions, do feel free to have a go at another patch that can be used for both 2.6.32 and 2.6.33 (it may be that this one can already, so worth testing), but don't feel obliged to. We will get round to dealing with this bug when we can...
Also for the Gentoo nvidia devs, I've tested this patch and haven't run into any problems yet.
Thanks, worked for me on ~x86 and ~amd64
*** Bug 306769 has been marked as a duplicate of this bug. ***
If I understand correctly, a variant of this patch is now present in the portage tree. However, I still cant compile. I get a ream of errors starting with:
In file included from /var/tmp/portage/x11-drivers/nvidia-drivers-190.53-r1/work/NVIDIA-Linux-x86_64-190.53-pkg2/usr/src/nv/nv.c:14:
/var/tmp/portage/x11-drivers/nvidia-drivers-190.53-r1/work/NVIDIA-Linux-x86_64-190.53-pkg2/usr/src/nv/nv-linux.h:202: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'nv_spinlock_t'
/var/tmp/portage/x11-drivers/nvidia-drivers-190.53-r1/work/NVIDIA-Linux-x86_64-190.53-pkg2/usr/src/nv/nv-linux.h:1282: error: expected specifier-qualifier-list before 'nv_spinlock_t'
...and so on. This is with the 2.6.33-rt4 kernel. Any ideas, or should I file a new bug?
This patch is not in the portage tree. It needs to have the conditional removed from the ebuild for it to be added. I have not had time to test it to see if it works without the conditional on older kernel versions, but it likely does.
There is a patch in the portage tree new, it has had the conditional moved to a header file. The file is called nvidia-drivers-190.53-2.6.33.patch, and it does accurately determine my kernel version.
However, I found out that the nvidia driver tries to use atomic_spinlock_t if CONFIG_PREEMPT_RT is defined. But the new rt-series does not define that spinlock type anymore. Thus, it has nothing to do with this bug.
Created attachment 233795 [details]
Patch to make ebuild in portage apply a patch to the nvidia drivers when kernel 2.6.34 is in use - current as of June 1, 2010
There is an issue with the ebuild in portage. While it applies the patch for kernel version 2.6.33, it only applies the patch for kernel versions 2.6.33. People with kernel version 2.6.34 run into issues because the patch is not applied.
I have modified the ebuild to account for this (and also to patch future versions) and attached a patch to the ebuild that shows the necessary change.
While I am not sure why no one caught this, the patch in the portage tree has the bug that Yuval Hager pointed out where the position of NULL in a macro is in the wrong position. The most recent patch in this bug report has the issue fixed. It would be nice to have this fixed in portage.
*** Bug 324197 has been marked as a duplicate of this bug. ***
Now that gentoo-sources-2.6.33 are stabilized, I think >=nvidia-drivers-190.53-r1 should be stabilized, too.
This is the wrong place for a stabilization request and this would be a blocker bug for any such request.
I forgot to mention that you should file a new bug report for that.
Fix is to use 195.x.y with 2.6.33 and newer.