350336 – >=x11-drivers/nvidia-drivers-260.19.04 causes many 32-bit OpenGL applications to segfault on amd64, exec permission in tmp dir

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	AMD64 Linux

Importance:	High major
Assignee:	Doug Goldstein (RETIRED)

Duplicates (1):	370405

Reported:	2011-01-02 07:54 UTC by Itzamna
Modified:	2012-06-19 04:01 UTC
CC List:	16 users

Description Itzamna 2011-01-02 07:54:33 UTC

x11-drivers/nvidia-drivers-260.19.21 and newer cause most 32-bit OpenGL applications on amd64 to segfault. The latest x11-drivers/nvidia-drivers version with functioning 32-bit OpenGL emulation is 256.53, so I request masking >=x11-drivers/nvidia-drivers-260.19.21 on amd64.

I could reproduce this bug on three kernels: 2.6.34-r11, 2.6.35-r15 and 2.6.36-r5.

Reproducible: Sometimes

Steps to Reproduce:
1. Start any x86 32-bit OpenGL application (I tested with Enemy Territory Quake Wars).
2. Let the application do some rendering (Enemy Territory crashes when spawning in-game).


Actual Results:  
A segmentation fault upon a call to a function in /usr/lib32/libnvidia-glcore.so.260.19.21. The binary driver is of course compiled without debug information, so a backtrace isn't helpful.

Expected Results:  
Why, render 3D scenes without crashing, of course!

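As a workaround, run the following command (it appends the mask to /etc/portage/package.mask instead of overwriting the file) and re-emerge nvidia-drivers and nvidia-settings:

echo ">x11-drivers/nvidia-drivers-256.53" >> /etc/portage/package.mask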

Comment 1 Agostino Sarubbo gentoo-dev 2011-01-02 13:37:22 UTC

You must not add people to CC; that is the wranglers' task.

Comment 2 Alex Domingo 2011-01-03 13:23:26 UTC

I can confirm this bug with kernels 2.6.32-r24 and 2.6.26-r6.

There has already been some discussion of this problem in the forums at http://forums.gentoo.org/viewtopic-t-857703.html . It affects any 32-bit OpenGL application and most probably 32-bit wine on amd64.


One example is the game "The Polynomial" (demo available from http://dmytry.com/games/try_polynomial.html). It comes with both 64-bit and 32-bit binaries. The 64-bit runs fine but the 32-bit just segfaults even though all the required 32-bit libraries are present (provided by emul-linux* packages):

$ ldd Polynomial32
	linux-gate.so.1 =>  (0xf770e000)
	libpng12.so.0 => /usr/lib32/libpng12.so.0 (0xf76c0000)
	libfreetype.so.6 => /usr/lib32/libfreetype.so.6 (0xf763a000)
	libgtk-x11-2.0.so.0 => /usr/lib32/libgtk-x11-2.0.so.0 (0xf727c000)
	libgdk-x11-2.0.so.0 => /usr/lib32/libgdk-x11-2.0.so.0 (0xf71e9000)
	libatk-1.0.so.0 => /usr/lib32/libatk-1.0.so.0 (0xf71cc000)
	libgio-2.0.so.0 => /usr/lib32/libgio-2.0.so.0 (0xf7134000)
	libpangoft2-1.0.so.0 => /usr/lib32/libpangoft2-1.0.so.0 (0xf710e000)
	libgdk_pixbuf-2.0.so.0 => /usr/lib32/libgdk_pixbuf-2.0.so.0 (0xf70f5000)
	libpangocairo-1.0.so.0 => /usr/lib32/libpangocairo-1.0.so.0 (0xf70e9000)
	libcairo.so.2 => /usr/lib32/libcairo.so.2 (0xf7079000)
	libpango-1.0.so.0 => /usr/lib32/libpango-1.0.so.0 (0xf7037000)
	libfontconfig.so.1 => /usr/lib32/libfontconfig.so.1 (0xf7008000)
	libgobject-2.0.so.0 => /usr/lib32/libgobject-2.0.so.0 (0xf6fcc000)
	libgmodule-2.0.so.0 => /usr/lib32/libgmodule-2.0.so.0 (0xf6fc8000)
	libglib-2.0.so.0 => /usr/lib32/libglib-2.0.so.0 (0xf6ee3000)
	libGL.so.1 => //usr/lib32/opengl/nvidia/lib/libGL.so.1 (0xf6e1a000)
	libGLU.so.1 => /usr/lib32/libGLU.so.1 (0xf6dac000)
	libX11.so.6 => /usr/lib32/libX11.so.6 (0xf6c90000)
	libXpm.so.4 => /usr/lib32/libXpm.so.4 (0xf6c7f000)
	libXrandr.so.2 => /usr/lib32/libXrandr.so.2 (0xf6c76000)
	libopenal.so.1 => /usr/lib32/libopenal.so.1 (0xf6c30000)
	libvorbis.so.0 => /usr/lib32/libvorbis.so.0 (0xf6c08000)
	libvorbisfile.so.3 => /usr/lib32/libvorbisfile.so.3 (0xf6bff000)
	libogg.so.0 => /usr/lib32/libogg.so.0 (0xf6bf8000)
	libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/32/libstdc++.so.6 (0xf6aff000)
	libm.so.6 => /lib32/libm.so.6 (0xf6ad9000)
	libgcc_s.so.1 => /lib32/libgcc_s.so.1 (0xf6abc000)
	libpthread.so.0 => /lib32/libpthread.so.0 (0xf6aa3000)
	libc.so.6 => /lib32/libc.so.6 (0xf695e000)
	libdl.so.2 => /lib32/libdl.so.2 (0xf695a000)
	libz.so.1 => /lib32/libz.so.1 (0xf6946000)
	libXinerama.so.1 => /usr/lib32/libXinerama.so.1 (0xf6942000)
	libXi.so.6 => /usr/lib32/libXi.so.6 (0xf6934000)
	libXcursor.so.1 => /usr/lib32/libXcursor.so.1 (0xf692a000)
	libXcomposite.so.1 => /usr/lib32/libXcomposite.so.1 (0xf6926000)
	libXext.so.6 => /usr/lib32/libXext.so.6 (0xf6916000)
	libXdamage.so.1 => /usr/lib32/libXdamage.so.1 (0xf6912000)
	libXfixes.so.3 => /usr/lib32/libXfixes.so.3 (0xf690c000)
	libpixman-1.so.0 => /usr/lib32/libpixman-1.so.0 (0xf68ac000)
	libpng14.so.14 => /usr/lib32/libpng14.so.14 (0xf6887000)
	libXrender.so.1 => /usr/lib32/libXrender.so.1 (0xf687c000)
	libxcb.so.1 => /usr/lib32/libxcb.so.1 (0xf6862000)
	libXau.so.6 => /usr/lib32/libXau.so.6 (0xf685e000)
	libXdmcp.so.6 => /usr/lib32/libXdmcp.so.6 (0xf6858000)
	libresolv.so.2 => /lib32/libresolv.so.2 (0xf6843000)
	libexpat.so.1 => /usr/lib32/libexpat.so.1 (0xf681b000)
	libgthread-2.0.so.0 => /usr/lib32/libgthread-2.0.so.0 (0xf6815000)
	librt.so.1 => /lib32/librt.so.1 (0xf680c000)
	libnvidia-tls.so.260.19.29 => //usr/lib32/opengl/nvidia/lib/libnvidia-tls.so.260.19.29 (0xf680a000)
	libnvidia-glcore.so.260.19.29 => /usr/lib32/libnvidia-glcore.so.260.19.29 (0xf5158000)
	/lib/ld-linux.so.2 (0xf770f000)


Moreover, switching the OpenGL implementation to xorg-x11 makes it possible to launch some of the 32-bit games that crash with nvidia-drivers-260.19.29, such as Machinarium.


One problem is that all these applications crash with very little output about the segfault, if any. In some cases the only output is a line in /var/log/messages like this:

kernel: bridgebuilding[7851]: segfault at fffffff8 ip 00000000f6bd3225 sp 00000000fffa36a0 error 4 in libnvidia-glcore.so.260.19.29[f5d96000+1645000]


Finally my emerge --info:
http://pastebin.com/bEfaGmi8
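
A note on reading the /var/log/messages line quoted in comment 2: its fields can be pulled apart with a short script. This is only a sketch. The regular expression and the helper name parse_segfault are illustrative assumptions, written against the one sample line in this report rather than every kernel message format. Subtracting the mapping base from the instruction pointer gives the offset inside libnvidia-glcore, which is about the only handle available on a binary shipped without debug information.

```python
import re

# Matches the kernel's user-space segfault message as quoted in comment 2.
# The trailing " in <object>[base+size]" part is treated as optional.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return a dict of the fields in a kernel segfault line, or None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    if info["obj"] is not None:
        # Offset of the faulting instruction relative to the mapping base.
        info["offset"] = hex(int(info["ip"], 16) - int(info["base"], 16))
    return info
```

For the bridgebuilding line above, the offset works out to 0xe3d225 inside libnvidia-glcore.so.260.19.29. Error code 4 is the x86 page-fault code for a user-mode read of a page that is not mapped.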

Comment 3 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev 2011-01-22 19:15:40 UTC

I can confirm this problem. AFAIK all 260.xx.xx versions are affected.

Comment 4 stshine 2011-01-24 01:38:05 UTC

(In reply to comment #0)

I believe I've found the problem.

stshine@shine ~ $ /lib/ld-linux.so.2 /usr/lib/libGL.so
Segmentation fault

sys-libs/glibc-2.12.2

Comment 5 Alex Domingo 2011-01-26 11:33:30 UTC

nvidia-drivers-260.19.36 does not fix this bug either.

Comment 6 stshine 2011-02-02 15:23:28 UTC

I finally found where the problem lies, using strace:

/tmp was mounted with the noexec option.

Removing that flag in /etc/fstab fixes the problem.

Tears :-)
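
Whether /tmp is affected can be checked from the mount table before touching /etc/fstab. A minimal sketch, assuming a Linux system where /proc/mounts is readable; mount_options and is_noexec are hypothetical helper names. If /tmp is not a separate mount point the lookup returns None, and the options of the filesystem that contains it apply instead.

```python
def mount_options(path, mounts_text):
    """Return the option list of the mount entry for `path`, or None.

    `mounts_text` is the content of /proc/mounts (or any text in the same
    "device mountpoint fstype options dump pass" layout).
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == path:
            # Keep going: a later entry for the same path shadows earlier ones.
            options = fields[3].split(",")
    return options


def is_noexec(path, mounts_text):
    """True if `path` is a mount point whose options include noexec."""
    options = mount_options(path, mounts_text)
    return options is not None and "noexec" in options
```

It can be called as is_noexec("/tmp", open("/proc/mounts").read()). On a running system the flag can be changed without a reboot with mount -o remount,exec /tmp (as root), subject to the security concerns raised later in this report.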

Comment 7 Jeroen Roovers (RETIRED) gentoo-dev 2011-02-02 17:12:01 UTC

(In reply to comment #6)
> I finally found where the problem lies, using strace:
> 
> /tmp was mounted with the noexec option.
> 
> Removing that flag in /etc/fstab fixes the problem.

That would appear to be unrelated to the original report.

Comment 8 stshine 2011-02-03 02:14:05 UTC

(In reply to comment #7)
> (In reply to comment #6)
> > I finally found where the problem lies, using strace:
> > 
> > /tmp was mounted with the noexec option.
> > 
> > Removing that flag in /etc/fstab fixes the problem.
> 
> That would appear to be unrelated to the original report.
> 
Why? glcore mmaps a temporary file with the PROT_EXEC flag. The noexec option makes that mmap fail with EPERM, and executing it then causes a segfault.
I don't think there can be any other reason for this odd problem. Anyway, my error was fixed.

Comment 9 Itzamna 2011-02-03 04:53:04 UTC

Stshine,

You are probably reporting an unrelated Wine bug, see http://bugs.winehq.org/show_bug.cgi?id=25583.

As far as I know, the Nvidia blob does not use temporary files in /tmp.

Comment 10 stshine 2011-02-03 07:49:22 UTC

(In reply to comment #9)
> Stshine,
> 
> You are probably reporting an unrelated Wine bug, see
> http://bugs.winehq.org/show_bug.cgi?id=25583.
> 
> As far as I know, the Nvidia blob does not use temporary files in /tmp.
> 
That's why the error occurs only after 260... Here is the strace of glxinfo:

...
getpid()                                = 4955
open("/tmp/glDTKLjG", O_RDWR|O_CREAT|O_EXCL, 0600) = 9
unlink("/tmp/glDTKLjG")                 = 0
ftruncate(9, 8192)                      = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, 9, 0) = 0xb77df000
mmap2(NULL, 8192, PROT_READ|PROT_EXEC, MAP_SHARED, 9, 0) = 0xb57fa000
close(9)                                = 0
...

This has nothing to do with wine, and the filename is always of the form glXXXXXX.
Can anyone else confirm this?

P.S. My wine was indeed fixed, I didn't notice before...
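
The syscall sequence in the strace from comment 10 can be replayed outside the driver to test a directory. This is a sketch of the same pattern (create, unlink, ftruncate, shared mapping with PROT_EXEC), not NVIDIA's code. can_map_exec is a hypothetical helper, and the 8192-byte size is simply copied from the trace. On a noexec mount the kernel rejects the PROT_EXEC mapping with EPERM, which Python reports as PermissionError (a subclass of OSError).

```python
import mmap
import os
import tempfile


def can_map_exec(directory):
    """Replay open/unlink/ftruncate/mmap(PROT_EXEC) in `directory`.

    Returns True if the executable shared mapping succeeds and False if the
    kernel refuses it, as it does for a filesystem mounted noexec.
    """
    # mkstemp opens the file with O_RDWR|O_CREAT|O_EXCL and mode 0600,
    # like the open() call in the trace.
    fd, path = tempfile.mkstemp(prefix="gl", dir=directory)
    try:
        os.unlink(path)          # the file lives on only through the descriptor
        os.ftruncate(fd, 8192)   # give it a size so it can be mapped
        try:
            mapping = mmap.mmap(fd, 8192, flags=mmap.MAP_SHARED,
                                prot=mmap.PROT_READ | mmap.PROT_EXEC)
        except OSError:
            # Includes PermissionError (EPERM) on a noexec mount.
            return False
        mapping.close()
        return True
    finally:
        os.close(fd)
```

Running can_map_exec("/tmp") before and after remounting /tmp should show the difference. The trace also maps the same file a second time with PROT_READ|PROT_WRITE, presumably so that code can be written through one view and executed through the other, though the thread does not confirm what the driver stores there.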

Comment 11 Alex Domingo 2011-02-03 11:03:34 UTC

In my case, the fix proposed by stshine solves this bug (thank you! :^) ). After removing the noexec flag from /tmp, all the 32-bit Linux OpenGL apps that were segfaulting now work properly. They are basically Linux games, nothing requiring wine. Of course this bug was also causing 32-bit wine to fail with DirectX Windows apps, and this fix solves that too.

Since removing the noexec flag is not the best thing to do from a security point of view, maybe a bug should be opened upstream with NVIDIA asking them to stop executing files in /tmp. Just wondering.

Comment 12 Alex Domingo 2011-02-03 12:44:51 UTC

I've been tinkering a bit with strace and found that 64-bit OpenGL apps try to open a file called glXXXXXX in /tmp too. The weird thing is that the 64-bit apps work even though they cannot open that file in /tmp, while the 32-bit apps do not.

Both 32-bit and 64-bit apps make a lot of calls trying to open a file in /tmp (each returning EPERM). Removing the noexec flag from /tmp reduces those calls to just one in both cases, and the 32-bit apps no longer segfault. Curious behavior :D

Comment 13 Gerard Neil 2011-02-12 15:24:23 UTC

This occurs on my 32-bit x86 system too, for *any* OpenGL application, including glxgears. Mounting /tmp with exec instead of noexec fixes the problem, as described.

Comment 14 Anton Filimonov 2011-03-07 20:34:48 UTC

I think the same issue exists with nvidia-drivers-270.30 when trying to run DirectX games with wine 1.3.11, which require 32-bit OpenGL libraries.
I get this trace for every game:
wine: Unhandled page fault on read access to 0xfffffff8 at address 0x7b378e65 (thread 0009), starting debugger...
Unhandled exception: page fault on read access to 0xfffffff8 in 32-bit code (0x7b378e65).
...
Backtrace:
=>0 0x7b378e65 in libnvidia-glcore.so.270.30 (+0xe98e65) (0x7d263870)
  1 0x7d283900 (0x7d283900)

Removing noexec from /tmp fixes the issues with wine. But I think this is a kind of workaround rather than a solution.

Comment 15 Karol Grudziński 2011-06-10 17:35:44 UTC

Open /usr/lib/opengl/nvidia/lib/libGL.so.YOUR.VERSION in a hex editor, find '/tmp/glXXXXXX', and replace it with a path on a filesystem that has "exec" permission. I use '/.nv/glXXXXXX'. Of course /.nv must exist. This "solution" may carry some security risk, but I have no idea how to do it better.
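
The hex-editor step from comment 15 can also be scripted. A sketch under stated assumptions: patch_path is a hypothetical helper, it should only be run on a copy of the library, and whether patching the proprietary binary is safe or permitted is not something this report establishes. The replacement has to be exactly as long as the original string (both '/tmp/glXXXXXX' and '/.nv/glXXXXXX' are 13 bytes), otherwise everything after it in the file would shift.

```python
def patch_path(data, old, new):
    """Replace every occurrence of `old` in `data` with `new` (all bytes).

    Returns (patched_data, number_of_replacements). Replacements of a
    different length are refused because they would shift the rest of the
    binary.
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    count = data.count(old)
    return data.replace(old, new), count


# Example use on a copy of the library (the file name is a placeholder):
#   with open("libGL.so.copy", "rb") as f:
#       data = f.read()
#   patched, n = patch_path(data, b"/tmp/glXXXXXX", b"/.nv/glXXXXXX")
#   with open("libGL.so.copy", "wb") as f:
#       f.write(patched)
```

Checking the returned count before writing anything back shows whether the string was present at all in the file being patched.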

Comment 16 Andreas Arens 2011-06-15 18:19:20 UTC

1000 thanks Alex for stracing this.

I had every 32 bit app (including simplest things as glxinfo) segfault with any supported driver on 2.6.39, while the equivalent 64 bit version worked just fine. /tmp was mounted noexec. Removing the flag in fstab fixed it - sigh!
My wine is back, so bug confirmed.
Since we cannot fix Nvidia's binary blob, it might be worth a big fat ewarn when installing these drivers on amd64?

Comment 17 Chris Bandy 2011-06-16 16:02:41 UTC

Is this reported upstream?

Comment 18 Jay M 2011-07-10 14:31:49 UTC

I have entered this as an issue with NVIDIA (NVIDIA reference # 110628-000170). I do not use Gentoo (Arch Linux), but I stumbled upon this bug while searching for a resolution to my problem with the newer NVIDIA drivers.

Comment 19 Jay M 2011-07-15 21:57:19 UTC

This will be worked on upstream, per response from NVIDIA tech.

Comment 20 Andreas K. Hüttel archtester 2011-07-19 12:22:24 UTC

*** Bug 370405 has been marked as a duplicate of this bug. ***

Comment 21 Dennis Schridde 2011-07-21 07:26:37 UTC

The issue exists for x86_64 binaries on an x86_64 system, too, when using >=x11-drivers/nvidia-drivers-275.

Comment 22 Doug Goldstein (RETIRED) gentoo-dev 2011-09-14 17:31:40 UTC

I've followed up with upstream about this issue and I'm told it's still being worked on.

Comment 23 Doug Goldstein (RETIRED) gentoo-dev 2011-09-14 17:33:45 UTC

More of a follow-up: they will fall back to a slow method of operation if you have /tmp mounted noexec, and as a result we'll add a warning to the ebuild.

Comment 24 Doug Goldstein (RETIRED) gentoo-dev 2011-10-27 20:29:43 UTC

You will always need /tmp to have exec permissions when using 32-bit OpenGL apps on amd64 for good performance. However, starting with 290.03, NVIDIA has implemented a workaround that will result in slower rendering when /tmp isn't mounted with exec.

Comment 25 Gerard Neil 2011-10-28 05:08:13 UTC

Could someone quantify the security risk? My belief is that it's unwise to have a world-writeable filesystem mounted exec because then anyone managing to get an unprivileged shell can build/upload and then run their own programs which can be further used for attack.

Is it the xorg server or the client process which requires /tmp mounted exec?
Is there some way to mitigate the security risk? I'm thinking something along the lines of pam_mktemp, except using bind mounts so that only some logins are given an exec mount.

Comment 26 Jeffrey Carpenter 2011-10-28 23:43:04 UTC

Thank you so very much for figuring this out; I had found this particular bug report just before I started downgrading my kernel from 3.0.4 to 2.6.x series in order to use the nvidia-270 releases -- how very badly I would have been disappointed!

I can confirm that this bug affects nvidia release 285.05.09, as well as 280.13, 275.28 and 270.41.19.

Comment 27 Doug Goldstein (RETIRED) gentoo-dev 2012-06-19 04:01:45 UTC

All the versions in tree now have the appropriate workaround and are working correctly.