Bug 336837 - x11-drivers/nvidia-drivers-260.19.04 & x11-libs/cairo-1.10.0 exacerbates pthread locking issue with OpenGL
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: x86 Linux
Importance: High normal
Assignee: Doug Goldstein
URL:
Whiteboard:
Keywords:
Duplicates: 336808 346257
Depends on:
Blocks:
 
Reported: 2010-09-11 15:03 UTC by Andreas Proteus
Modified: 2011-05-15 05:28 UTC
CC List: 33 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge info (emerge.info,4.38 KB, text/plain)
2010-09-11 15:04 UTC, Andreas Proteus
output of strace gvim (gvim-strace.log,869.95 KB, text/plain)
2010-09-11 15:08 UTC, Andreas Proteus
emerge --info output (emerge.info,4.79 KB, text/plain)
2010-11-27 23:04 UTC, Pavel Volkov

Description Andreas Proteus 2010-09-11 15:03:35 UTC
After upgrading to x11-drivers/nvidia-drivers-260.19.04, gvim fails to start.
The process runs, but no window appears.
It can only be stopped with killall -9.

The fault persists even after rebuilding all its dependencies, including gtk+.
All versions of gvim are affected.
If gvim is compiled with USE="-gtk" it works, but the Athena widgets are not acceptable.
So it must have something to do with the combination of gvim, gtk+ and nvidia-drivers-260.19.04.
I did not notice any other affected applications.

Also note that when I run it under strace ("strace gvim"), gvim works!

Anyway, after reverting to nvidia-drivers-256.53 the problem goes away.

Reproducible: Always

Steps to Reproduce:
1. emerge =nvidia-drivers-260.19.04
2. gvim 
3. nothing happens, kill the process with killall -9 gvim
Comment 1 Andreas Proteus 2010-09-11 15:04:47 UTC
Created attachment 246867
emerge info

Here is my emerge --info
Comment 2 Andreas Proteus 2010-09-11 15:08:31 UTC
Created attachment 246869
output of strace gvim

Please note that when gvim is started under strace, it starts OK.
Comment 3 Hans Nieser 2010-09-11 16:43:43 UTC
*** Bug 336808 has been marked as a duplicate of this bug. ***
Comment 4 Hans Nieser 2010-09-11 16:46:13 UTC
Please note that (at least on my machine) just downgrading cairo (1.10.0-r3 ->
1.8.10) also makes gvim start normally again; it seems to be the combination of
the latest nvidia-drivers and cairo that triggers this.
Comment 5 Andreas Proteus 2010-09-11 17:20:13 UTC
(In reply to comment #4)
I had already downgraded cairo to 1.8.10 yesterday to see what would happen.
After reading your comment, I tried the combination of cairo-1.8.10 with nvidia-drivers-260.19.04 again to make sure,
and gvim does not work.
So (here at least) cairo is not the culprit.
Comment 6 Hans Nieser 2010-09-11 17:30:37 UTC
(In reply to comment #5)
> (In reply to comment #4)
> Although I downgraded cairo to 1.8.10 yesterday to see what happens, 
> after I read your comment
> To make sure, I again  tried the combination of cairo-1.8.10 with
> nvidia-drivers-260.19.04
> and gvim does not work.
> So (here at least) cairo is not the culprit.
> 

hmm, ah well that's just weird then, I reverted cairo several times and each time it seemed to make gvim work again. I've only got cairo-1.10.0-r3 masked now and things are working ok (I also made sure to reload nvidia module/restart X after upgrading nvidia-drivers again etc). I'll just leave this issue for what it is and let people that understand this stuff figure it out :)
Comment 7 yanglh 2010-09-15 12:38:47 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > Although I downgraded cairo to 1.8.10 yesterday to see what happens, 
> > after I read your comment
> > To make sure, I again  tried the combination of cairo-1.8.10 with
> > nvidia-drivers-260.19.04
> > and gvim does not work.
> > So (here at least) cairo is not the culprit.
> > 
> 
> hmm, ah well that's just weird then, I reverted cairo several times and each
> time it seemed to make gvim work again. I've only got cairo-1.10.0-r3 masked
> now and things are working ok (I also made sure to reload nvidia module/restart
> X after upgrading nvidia-drivers again etc). I'll just leave this issue for
> what it is and let people that understand this stuff figure it out :)
> 

Good, cairo-1.8.10 works on my box!
Comment 8 Chris Coleman 2010-09-23 15:47:07 UTC
I'm also experiencing this, except that the gvim window opens but then hangs before the GUI is initialized, resulting in an unresponsive empty window.

There is no such problem when I run it as `gvim -f` (no fork) or when I run it in strace or gdb.

I used gdb to attach to the hung process. Here's the backtrace:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f96cecfb564 in _L_lock_1016 () from /lib/libpthread.so.0
#2  0x00007f96cecfb3c7 in __pthread_mutex_lock (mutex=0x7f96c7d04120) at pthread_mutex_lock.c:82
#3  0x00007f96c7aab7a9 in ?? () from //usr/lib64/opengl/nvidia/lib/libGL.so.1
#4  0x00007f96c7aae709 in ?? () from //usr/lib64/opengl/nvidia/lib/libGL.so.1
#5  0x00007f96c7aaeb3a in ?? () from //usr/lib64/opengl/nvidia/lib/libGL.so.1
#6  0x00007f96c64c898d in ?? () from //usr/lib64/opengl/nvidia/lib/libnvidia-tls.so.260.19.06
#7  0x00007f96c8e00693 in read_packet (c=0x25f0850) at xcb_in.c:149
#8  0x00007f96c8e01881 in _xcb_in_read (c=0x25f0850) at xcb_in.c:669
#9  0x00007f96c8e013a6 in xcb_poll_for_event (c=0x25f0850) at xcb_in.c:551
#10 0x00007f96ce9dc1dc in poll_for_event (dpy=0x25ef600) at xcb_io.c:210
#11 0x00007f96ce9dc394 in poll_for_response (dpy=0x25ef600) at xcb_io.c:235
#12 0x00007f96ce9dc5cb in _XEventsQueued (dpy=0x25ef600, mode=2) at xcb_io.c:303
#13 0x00007f96ce9bf838 in XPending (dpy=0x25ef600) at Pending.c:55
#14 0x00007f96d162daed in gdk_check_xpending (display=0x25fc000) at gdkevents-x11.c:154
#15 0x00007f96d1632bf2 in gdk_event_prepare (source=0x26090f0, timeout=0x7fff1e070078) at gdkevents-x11.c:2330
#16 0x00007f96d0c28263 in IA__g_main_context_prepare (context=0x2609170, priority=0x7fff1e0700f0) at gmain.c:2280
#17 0x00007f96d0c28ddc in g_main_context_iterate (context=0x2609170, block=0, dispatch=0, self=0x25c4550) at gmain.c:2571
#18 0x00007f96d0c29067 in IA__g_main_context_pending (context=0x2609170) at gmain.c:2619
#19 0x00000000005e55e2 in gui_mch_update () at gui_gtk_x11.c:5330
#20 0x00000000005d1cf5 in gui_start () at gui.c:193
#21 0x00000000004d43a4 in main (argc=1, argv=0x7fff1e070428) at main.c:637

There it waits for a mutex that is never unlocked.

At #7 read_packet() of libxcb calls malloc(). I don't think it should be ending up in libnvidia-tls. I think this might be related to the known issue in nvidia-drivers regarding interaction with pthreads.
Comment 9 Chris Coleman 2010-09-23 16:35:17 UTC
Running gvim with __GL_SINGLE_THREADED=1 works. I'm still trying to figure this out.
Comment 10 Chris Coleman 2010-10-01 21:12:01 UTC
The cause of this bug is in the closed-source nvidia drivers. Something changed in 260.19.04. It looks like an attempt was made by nvidia to fix some well established problems affecting the ability of their drivers to play nicely with pthreads. But their attempt to fix problems created others.

I think this bug might affect all single-threaded and possibly multi-threaded dynamically linked applications that are linked with libpthread, depend directly or indirectly on libGL more than once, and call functions like malloc and realloc from a forked process.

I would report this to nvidia's bug tracker, but they don't have one.
Comment 12 jon R-B 2010-10-09 15:49:27 UTC
This almost looks like an openbox issue, at least in part, as if some variable is not set.
Comment 13 Chris Coleman 2010-10-10 00:11:15 UTC
(In reply to comment #12)
> This almost looks like a openbox issue - in part, as in some variable is not
> set
> 

Thanks for posting about this to nvnews. I hope that helps.

This bug has nothing to do with openbox. I'm running a simple GNOME desktop and the symptoms are the same. I've seen your screenshot of gvim, and it's just the same on my system.

It is a dynamic linking problem:

1. An executable, gvim, is dynamically linked with libcairo
2. libcairo is dynamically linked with libGL
3. nvidia's libGL dynamically depends on libnvidia-tls
4. So gvim indirectly dynamically depends on libnvidia-tls
5. libnvidia-tls presumably implements Thread Local Storage for libGL?
6. gvim calls heap functions like malloc(), realloc(), etc.
7. ???

Usually, glibc implements those functions. But here calls to those functions end up in libnvidia-tls. I don't know why.
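One way to check which shared object actually supplies malloc() in a running process is to resolve the symbol through the process-wide lookup scope and then ask the dynamic linker which object that address belongs to. The sketch below assumes glibc (dlsym() with RTLD_DEFAULT, plus the GNU extension dladdr()); symbol_provider() is a hypothetical helper, not part of any tool mentioned in this bug.

```c
#define _GNU_SOURCE            /* glibc: RTLD_DEFAULT and dladdr() */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Return the path of the shared object that the process-wide lookup
   resolves `symbol` to, or NULL if the symbol is not found. */
const char *symbol_provider(const char *symbol) {
    Dl_info info;
    void *addr = dlsym(RTLD_DEFAULT, symbol);

    if (addr == NULL || dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;   /* owned by the dynamic linker; do not free */
}
```

Called as symbol_provider("malloc") from inside the affected process (for example from a small LD_PRELOAD library), it would show whether the lookup lands in libc.so.6 or somewhere else; what it would report on an affected system is not known from this thread. Older glibc versions need -ldl at link time.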

There is a mention of interaction issues between nvidia's drivers and pthreads in /usr/share/doc/nvidia-drivers-*/html/knownissues.html. But that is older than this bug.

I have a poor understanding of the mechanics of dynamic linking and I don't have access to the source code of nvidia's drivers. I probably can't fix this. But perhaps what I've said will be of some help.
Comment 14 jon R-B 2010-10-11 09:09:27 UTC
Ahh yes, my mistake. I did test in GNOME, but it might have been while I had alias gvim="gvim -f" set; in GNOME the save/open dialog did render fully.

A bit of progress/workaround:
#1: Disabling the Composite extension in xorg.conf stops 3D apps (in my case HoN) from crashing.

Also:
http://www.nvnews.net/vbulletin/showthread.php?t=154919
There is a cairo-1.10.10 patch; it disables render acceleration and thus allows 2D apps that depend on cairo to work where a 3D-enabled widget is at fault.
Comment 15 Doug Goldstein gentoo-dev 2010-10-11 23:29:48 UTC
This is why these drivers are masked. Don't use them. Report issues directly to NVIDIA, a forum post is not enough.

We won't be patching x11-libs/cairo to remove functionality because of a bug in a masked beta version of x11-drivers/nvidia-drivers.
Comment 16 Brandon Wright 2010-10-13 06:28:53 UTC
(In reply to comment #13)
> It is a dynamic linking problem
I've built a gvim without a libGL dependency, but that doesn't fix the problem. Here's the ldd output:

	linux-vdso.so.1 =>  (0x00007fffc4dff000)
	libgtk-x11-2.0.so.0 => /usr/lib/libgtk-x11-2.0.so.0 (0x00007f486781b000)
	libgdk-x11-2.0.so.0 => /usr/lib/libgdk-x11-2.0.so.0 (0x00007f4867567000)
	libgdk_pixbuf-2.0.so.0 => /usr/lib/libgdk_pixbuf-2.0.so.0 (0x00007f4867347000)
	libpango-1.0.so.0 => /usr/lib/libpango-1.0.so.0 (0x00007f48670f5000)
	libgobject-2.0.so.0 => /usr/lib/libgobject-2.0.so.0 (0x00007f4866ea2000)
	libglib-2.0.so.0 => /usr/lib/libglib-2.0.so.0 (0x00007f4866b88000)
	libgnomeui-2.so.0 => /usr/lib/libgnomeui-2.so.0 (0x00007f48668e8000)
	libbonoboui-2.so.0 => /usr/lib/libbonoboui-2.so.0 (0x00007f4866674000)
	libgnome-2.so.0 => /usr/lib/libgnome-2.so.0 (0x00007f486645a000)
	libXt.so.6 => /usr/lib/libXt.so.6 (0x00007f48661ed000)
	libncurses.so.5 => /lib/libncurses.so.5 (0x00007f4865f97000)
	libacl.so.1 => /lib/libacl.so.1 (0x00007f4865d8e000)
	libgpm.so.1 => /lib/libgpm.so.1 (0x00007f4865b87000)
	libperl.so.5.12 => /usr/lib/libperl.so.5.12 (0x00007f4865817000)
	libc.so.6 => /lib/libc.so.6 (0x00007f4865493000)
	libpython2.6.so.1.0 => /usr/lib/libpython2.6.so.1.0 (0x00007f48650c8000)
	libm.so.6 => /lib/libm.so.6 (0x00007f4864e45000)
	libpthread.so.0 => /lib/libpthread.so.0 (0x00007f4864c28000)
	libX11.so.6 => /usr/lib/libX11.so.6 (0x00007f48648d9000)
	libdl.so.2 => /lib/libdl.so.2 (0x00007f48646d5000)
	libSM.so.6 => /usr/lib/libSM.so.6 (0x00007f48644cc000)
	libICE.so.6 => /usr/lib/libICE.so.6 (0x00007f48642ae000)
	libpangocairo-1.0.so.0 => /usr/lib/libpangocairo-1.0.so.0 (0x00007f48640a0000)
	libXfixes.so.3 => /usr/lib/libXfixes.so.3 (0x00007f4863e9a000)
	libatk-1.0.so.0 => /usr/lib/libatk-1.0.so.0 (0x00007f4863c78000)
	libcairo.so.2 => /usr/lib/libcairo.so.2 (0x00007f4863925000)
	libgio-2.0.so.0 => /usr/lib/libgio-2.0.so.0 (0x00007f4863606000)
	libpangoft2-1.0.so.0 => /usr/lib/libpangoft2-1.0.so.0 (0x00007f48633d8000)
	libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0x00007f4863197000)
	libgmodule-2.0.so.0 => /usr/lib/libgmodule-2.0.so.0 (0x00007f4862f93000)
	libXi.so.6 => /usr/lib/libXi.so.6 (0x00007f4862d83000)
	libXrandr.so.2 => /usr/lib/libXrandr.so.2 (0x00007f4862b7a000)
	libXcursor.so.1 => /usr/lib/libXcursor.so.1 (0x00007f486296f000)
	libXcomposite.so.1 => /usr/lib/libXcomposite.so.1 (0x00007f486276c000)
	libXext.so.6 => /usr/lib/libXext.so.6 (0x00007f4862556000)
	libXdamage.so.1 => /usr/lib/libXdamage.so.1 (0x00007f4862353000)
	libXrender.so.1 => /usr/lib/libXrender.so.1 (0x00007f4862148000)
	libresolv.so.2 => /lib/libresolv.so.2 (0x00007f4861f2f000)
	libz.so.1 => /lib/libz.so.1 (0x00007f4861d16000)
	libgthread-2.0.so.0 => /usr/lib/libgthread-2.0.so.0 (0x00007f4861b11000)
	librt.so.1 => /lib/librt.so.1 (0x00007f4861908000)
	libgnomecanvas-2.so.0 => /usr/lib/libgnomecanvas-2.so.0 (0x00007f48616cf000)
	libart_lgpl_2.so.2 => /usr/lib/libart_lgpl_2.so.2 (0x00007f48614b2000)
	libgnomevfs-2.so.0 => /usr/lib/libgnomevfs-2.so.0 (0x00007f4861240000)
	libgconf-2.so.4 => /usr/lib/libgconf-2.so.4 (0x00007f4860ffe000)
	libgnome-keyring.so.0 => /usr/lib/libgnome-keyring.so.0 (0x00007f4860ddd000)
	libbonobo-2.so.0 => /usr/lib/libbonobo-2.so.0 (0x00007f4860b63000)
	libbonobo-activation.so.4 => /usr/lib/libbonobo-activation.so.4 (0x00007f4860945000)
	libxml2.so.2 => /usr/lib/libxml2.so.2 (0x00007f48605cb000)
	libORBit-2.so.0 => /usr/lib/libORBit-2.so.0 (0x00007f4860357000)
	libpopt.so.0 => /usr/lib/libpopt.so.0 (0x00007f4860149000)
	libcanberra.so.0 => /usr/lib/libcanberra.so.0 (0x00007f485ff39000)
	libuuid.so.1 => /lib/libuuid.so.1 (0x00007f485fd34000)
	libxcb.so.1 => /usr/lib/libxcb.so.1 (0x00007f485fb16000)
	libXau.so.6 => /usr/lib/libXau.so.6 (0x00007f485f912000)
	libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0x00007f485f70b000)
	libattr.so.1 => /lib/libattr.so.1 (0x00007f485f506000)
	libcrypt.so.1 => /lib/libcrypt.so.1 (0x00007f485f2cd000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4867e54000)
	libutil.so.1 => /lib/libutil.so.1 (0x00007f485f0ca000)
	libpixman-1.so.0 => /usr/lib/libpixman-1.so.0 (0x00007f485ee4c000)
	libpng14.so.14 => /usr/lib/libpng14.so.14 (0x00007f485ec24000)
	libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0x00007f485e979000)
	libexpat.so.1 => /usr/lib/libexpat.so.1 (0x00007f485e74d000)
	libgailutil.so.18 => /usr/lib/libgailutil.so.18 (0x00007f485e545000)
	libdbus-glib-1.so.2 => /usr/lib/libdbus-glib-1.so.2 (0x00007f485e31e000)
	libgnutls.so.26 => /usr/lib/libgnutls.so.26 (0x00007f485e06f000)
	libtasn1.so.3 => /usr/lib/libtasn1.so.3 (0x00007f485de5d000)
	libgcrypt.so.11 => /usr/lib/libgcrypt.so.11 (0x00007f485dbcc000)
	libgpg-error.so.0 => /usr/lib/libgpg-error.so.0 (0x00007f485d9c8000)
	libavahi-glib.so.1 => /usr/lib/libavahi-glib.so.1 (0x00007f485d7c4000)
	libavahi-client.so.3 => /usr/lib/libavahi-client.so.3 (0x00007f485d5b3000)
	libdbus-1.so.3 => /usr/lib/libdbus-1.so.3 (0x00007f485d36d000)
	libavahi-common.so.3 => /usr/lib/libavahi-common.so.3 (0x00007f485d160000)
	libORBitCosNaming-2.so.0 => /usr/lib/libORBitCosNaming-2.so.0 (0x00007f485cf59000)
	libvorbisfile.so.3 => /usr/lib/libvorbisfile.so.3 (0x00007f485cd50000)
	libvorbis.so.0 => /usr/lib/libvorbis.so.0 (0x00007f485cb20000)
	libogg.so.0 => /usr/lib/libogg.so.0 (0x00007f485c919000)
	libltdl.so.7 => /usr/lib/libltdl.so.7 (0x00007f485c70f000)

And the gvim backtrace:

#0  0x00007ff6b7cf1327 in sched_yield () from /lib/libc.so.6
#1  0x00007ff6b5e49593 in ?? () from /usr/lib/libgio-2.0.so.0
#2  0x00007ff6b5e3c3db in ?? () from /usr/lib/libgio-2.0.so.0
#3  0x00007ff6b5e3caa1 in g_bus_get_sync () from /usr/lib/libgio-2.0.so.0
#4  0x00007ff6b37a23ea in ?? () from /usr/lib/libgconf-2.so.4
#5  0x00007ff6b37a6cad in gconf_activate_server ()
   from /usr/lib/libgconf-2.so.4
#6  0x00007ff6b37b07f4 in ?? () from /usr/lib/libgconf-2.so.4
#7  0x00007ff6b37b0dff in ?? () from /usr/lib/libgconf-2.so.4
#8  0x00007ff6b37b1471 in gconf_engine_get_default ()
   from /usr/lib/libgconf-2.so.4
#9  0x00007ff6b37b7f71 in gconf_client_get_default ()
   from /usr/lib/libgconf-2.so.4
#10 0x00007ff6b90d50d8 in ?? () from /usr/lib/libgnomeui-2.so.0
#11 0x00007ff6b8bfac4e in gnome_program_postinit ()
   from /usr/lib/libgnome-2.so.0
#12 0x00007ff6b8bfb097 in ?? () from /usr/lib/libgnome-2.so.0
#13 0x00007ff6b8bfb46d in gnome_program_initv () from /usr/lib/libgnome-2.so.0
#14 0x00007ff6b8bfb56c in gnome_program_init () from /usr/lib/libgnome-2.so.0
#15 0x00000000005a6dd1 in gui_mch_init ()
#16 0x00000000005999af in gui_init ()
#17 0x00000000005841ae in set_termname ()
#18 0x0000000000599f66 in gui_start ()

No pthreads there; it now seems to be stopping on an I/O block.
Comment 17 Brandon Wright 2010-10-13 06:42:50 UTC
Fleshing out the top 4 frames with debugging symbols on _does_ show some posix threads involvement, but it's still just a yield causing the freeze.

#0  0x00007f9384398327 in sched_yield () from /lib/libc.so.6
#1  0x00007f938091d274 in g_thread_yield_posix_impl () at gthread-posix.c:378
#2  0x00007f93824e3633 in _g_dbus_shared_thread_ref (
    func=0x7f93824e5605 <_g_dbus_worker_thread_begin_func>, 
    user_data=0x15788d0) at gdbusprivate.c:355
#3  0x00007f93824e58cc in _g_dbus_worker_new (stream=0x1575400, 
    capabilities=G_DBUS_CAPABILITY_FLAGS_UNIX_FD_PASSING, initially_frozen=0, 
    message_received_callback=0x7f93824cc264 <on_worker_message_received>, 
    message_about_to_be_sent_callback=0x7f93824cc665 <on_worker_message_about_to_be_sent>, disconnected_callback=0x7f93824cc8fd <on_worker_closed>, 
    user_data=0x1570030) at gdbusprivate.c:1439
#4  0x00007f93824cd129 in initable_init (initable=0x1570030, cancellable=0x0, 
    error=0x7fffd0b50ea0) at gdbusconnection.c:2371
#5  0x00007f938246ff65 in g_initable_init (initable=0x1570030, 
    cancellable=0x0, error=0x7fffd0b50ea0) at ginitable.c:105
#6  0x00007f93824d4756 in g_bus_get_sync (bus_type=G_BUS_TYPE_SESSION, 
    cancellable=0x0, error=0x7fffd0b50ea0) at gdbusconnection.c:6247
#7  0x00007f937fe1b3ea in ?? () from /usr/lib/libgconf-2.so.4
Comment 18 Chris Coleman 2010-10-13 22:43:37 UTC
(In reply to comment #17)
> Fleshing out the top 4 frames with debugging symbols on _does_ show some posix
> threads involvement, but it's still just a yield causing the freeze.

Looking at your backtrace, your bug seems to be quite different to mine. I can work around mine by exporting __GL_SINGLE_THREADED=1. That variable changes behaviour within nvidia's shared libraries. But that would be of no help on your system because you're not using nvidia's shared libraries.

This just occurred to me: it might actually be the same bug. I'm not very experienced with gdb. So I don't know exactly what happens when you use gdb to attach to a multi-threaded program and then do a backtrace. Each thread has its own stack. So when I type 'bt', which thread is the resulting backtrace for? Couldn't you or I have been looking at the wrong thread?

Just a thought. Perhaps there is only one thread when gvim hangs. I'll probably look into it later.
Comment 19 Chris Coleman 2010-10-13 23:12:53 UTC
(In reply to comment #18)
> Just a thought. Perhaps there is only one thread when gvim hangs. I'll probably
> look into it later.
> 

I looked into it. There actually is just one thread when gvim hangs. Ignore my previous comment.
Comment 20 Brandon Wright 2010-10-13 23:16:46 UTC
(In reply to comment #18)
> This just occurred to me: it might actually be the same bug. I'm not very
> experienced with gdb. So I don't know exactly what happens when you use gdb to
> attach to a multi-threaded program and then do a backtrace. Each thread has its
> own stack. So when I type 'bt', which thread is the resulting backtrace for?
> Couldn't you or I have been looking at the wrong thread?

For me, there is indeed a thread 2, and it reveals the following:

#0  0x00007ff6af7f1774 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007ff6af7ec8c4 in _L_lock_547 () from /lib/libpthread.so.0
#2  0x00007ff6af7ec727 in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x00007ff6a643b7a9 in ?? () from //usr/lib64/opengl/nvidia/lib/libGL.so.1
#4  0x00007ff6a643d8ce in ?? () from //usr/lib64/opengl/nvidia/lib/libGL.so.1
#5  0x00007ff6af7eaa01 in start_thread () from /lib/libpthread.so.0
#6  0x00007ff6b013566d in clone () from /lib/libc.so.6

But there's still nothing linking to libGL that would allow it here. I'm wondering if the NVIDIA driver is intercepting kernel threads.
Comment 21 Brandon Wright 2010-10-13 23:20:40 UTC
And as a note: what I've been getting is the result of removing OpenGL from cairo. I did this by removing the "opengl" USE flag from cairo and rebuilding cairo and then pango. With opengl built in, I get roughly the same backtrace as you do.
Comment 22 Chris Coleman 2010-10-14 00:15:29 UTC
(In reply to comment #20)
> I'm wondering if the NVIDIA driver is intercepting kernel threads.
> 

I read that and at first I thought to myself, "That's ridiculous. How could it possibly do that?".

But then, I think you might be on to something. I completely forgot about the kernel module. It is entirely possible that it is hijacking system calls.
Comment 23 Brandon Wright 2010-10-14 06:47:40 UTC
(In reply to comment #22)
> I read that and at first I thought to myself, "That's ridiculous. How could it
> possibly do that?".
> 
> But then, I think you might be on to something. I completely forgot about the
> kernel module. It is entirely possible that it is hijacking system calls.

No, sorry, you're right; I'm being stupid again. I obviously forgot that it might be dlopen-ing another library and pulling in symbols as global. So after finally stripping libGL from all runtime-loaded libraries (the theme engine), gvim starts cleanly.

So I do think you're right about it being a symbol conflict. I imagine the nvidia driver is directly including pthreads functions or wrapping those functions in libGL. Neither of those is a correct way of doing things, nvidia. :-(


Comment 24 jon R-B 2010-10-16 10:08:06 UTC
Well, the problem still exists with the new drivers...
NVIDIA has released 260 as stable (i.e. no longer beta drivers)...
Comment 25 Andreas Proteus 2010-10-17 07:57:17 UTC
The problem has gone away with =x11-drivers/nvidia-drivers-260.19.12,
and 260.19.04 is no longer in Portage,
so I am closing this bug.
Comment 26 jon R-B 2010-10-17 14:56:24 UTC
This problem hasn't gone away with 260.19.12 and gvim 
Comment 27 Chris Coleman 2010-10-18 23:55:52 UTC
The bug is gone for me too.

Jon, have you tried reloading the nvidia kernel module? Or just rebooting your computer?

If that doesn't help, it's probable that your problem isn't related to this bug.
Comment 28 Chris Coleman 2010-10-19 00:48:29 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > I read that and at first I thought to myself, "That's ridiculous. How could it
> > possibly do that?".
> > 
> > But then, I think you might be on to something. I completely forgot about the
> > kernel module. It is entirely possible that it is hijacking system calls.
> 
> No, sorry, you're right--I'm being stupid again. I obviously forgot about it
> possibly dlopen-ing another library and pulling in symbols as global. So after
> finally stripping the libGL from all runtime loaded libraries (the theme
> engine) gvim starts cleanly. 
> 
> So I do think you're right about it being a symbol conflict. I imagine the
> nvidia driver is directly including pthreads functions or wrapping those
> functions in libGL. Neither of those is a correct way of doing things, nvidia.
> :-(
> 

I'm impressed. How did you do it? I don't even know how to trace calls to dlopen().
Comment 29 Philip L 2010-10-19 01:08:04 UTC
I'm also still suffering this problem with the new drivers. Downgrading cairo doesn't help, but running gvim with __GL_SINGLE_THREADED=1 works.
Comment 30 Chris Coleman 2010-10-21 00:58:06 UTC
(In reply to comment #27)
> The bug is gone for me too.

It's back. Or perhaps I made a mistake when I concluded that it had gone. It hasn't.
Comment 31 Mikkl 2010-10-21 15:18:28 UTC
(In reply to comment #30)
> (In reply to comment #27)
> > The bug is gone for me too.
> 
> It's back. Or perhaps I made a mistake when I concluded that it had gone. It
> hasn't.
> 
The bug hasn't gone away for me either.
Comment 32 jon R-B 2010-10-21 15:53:20 UTC
Like I said :)
Comment 33 Tiziano Müller gentoo-dev 2010-10-30 15:32:37 UTC
@Cardoe: would it be possible to get a 256.53 with the patch from http://www.nvnews.net/vbulletin/showthread.php?p=2329622 ?
I guess a lot of people are using the p.masked versions only for the compatibility with kernel 2.6.36.
Comment 34 Tolga Dalman 2010-11-04 12:10:07 UTC
Is this bug still a problem for anyone?
Comment 35 Andrew Frink 2010-11-04 13:43:35 UTC
(In reply to comment #34)
> Is there still a problem with this bug for anyone ?
> 

Still an issue for me. 
Kernel: 2.6.36-gentoo
nvidia-drivers: 260.19.12
gvim: 7.3

I have worked around it by 'alias gvim="__GL_SINGLE_THREADED=1 gvim"' system wide. That is hardly optimal but it does work for the time being. 
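The alias above can also be written as a tiny wrapper program, along the lines of the "small wrapper script" mentioned later in this thread. This is a hedged sketch: run_gl_single_threaded() is a hypothetical name, and the only driver-specific assumption is that __GL_SINGLE_THREADED=1 must already be in the environment when the program starts, before libGL is loaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (searched in PATH) with __GL_SINGLE_THREADED=1 added to its
   environment, wait for it, and return its exit status: 127 if the exec
   itself failed, -1 if fork() or waitpid() failed. */
int run_gl_single_threaded(char *const argv[]) {
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Set before exec so it is present when libGL initializes. This
           assumes the caller is single-threaded (as a wrapper is), so
           calling setenv() after fork() is safe. */
        if (setenv("__GL_SINGLE_THREADED", "1", 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);                       /* exec failed */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the variable is inherited across fork() and exec(), it also reaches the background process that gvim forks when started without -f.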

It would be nice to provide a better test case than "run gvim", but I'm not really sure what is going on here, so I can't write one.
Comment 36 Philip L 2010-11-05 04:15:48 UTC
(In reply to comment #35)
> It would be nice to provide a better test case than "run gvim". I'm not really
> sure what is going on here though., so I can't write one.

The freeze seems to happen when gvim forks. Gvim won't even start unless it's run with --nofork, and even if it is, it will freeze (for me) when editing certain types of files. When that happens, ps shows that gvim has forked. There's a post on the nvnews board that ties forking to a similar deadlock, where someone has linked to this bug: http://www.nvnews.net/vbulletin/showthread.php?t=156532
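The fork() theory can be illustrated independently of NVIDIA's code. If one thread holds a mutex when another thread calls fork(), the child gets a copy of the mutex in the locked state but no copy of the thread that would unlock it, so any blocking lock attempt in the child waits forever, much like the __lll_lock_wait frames in the backtraces earlier in this bug. POSIX provides pthread_atfork() so that a library can take its own locks before fork() and release them on both sides afterwards. The sketch below demonstrates both; lib_lock, fork_while_lock_held() and install_fork_handlers() are hypothetical names, and nothing here claims to show how libGL is actually implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical internal lock of a library (a stand-in, not libGL's). */
static pthread_mutex_t lib_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;

/* Helper thread: take lib_lock, announce it, hold it for 0.5 s, release it. */
static void *holder(void *arg) {
    struct timespec hold = {0, 500L * 1000 * 1000};
    (void)arg;
    pthread_mutex_lock(&lib_lock);
    sem_post(&lock_taken);
    nanosleep(&hold, NULL);
    pthread_mutex_unlock(&lib_lock);
    return NULL;
}

/* Call fork() while another thread holds lib_lock.
   Returns 1 if the child saw the lock held, 0 if it was free, -1 on error. */
int fork_while_lock_held(void) {
    pthread_t t;
    pid_t pid;
    int status, result = -1;

    if (sem_init(&lock_taken, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, holder, NULL) != 0) {
        sem_destroy(&lock_taken);
        return -1;
    }
    sem_wait(&lock_taken);            /* the helper now owns lib_lock */
    pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists in the child, so a lock copied in
           the locked state is never released. trylock reports that instead
           of hanging (fine on glibc for a demo, although POSIX only promises
           async-signal-safe calls in the child of a threaded process). */
        _exit(pthread_mutex_trylock(&lib_lock) != 0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    pthread_join(t, NULL);
    sem_destroy(&lock_taken);
    return result;
}

/* Mitigation: hold lib_lock across fork() so it is never copied mid-use,
   then release it in both the parent and the child. */
static void before_fork(void) { pthread_mutex_lock(&lib_lock); }
static void after_fork(void) { pthread_mutex_unlock(&lib_lock); }

int install_fork_handlers(void) {
    return pthread_atfork(before_fork, after_fork, after_fork);
}
```

Calling fork_while_lock_held() before install_fork_handlers() should report 1, because the child inherits a held lock. After the handlers are registered it should report 0, at the cost of fork() waiting until the holder thread leaves its critical section.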
Comment 37 Simon Kohlmeyer 2010-11-05 09:25:44 UTC
(In reply to comment #36)
> (In reply to comment #35)
> > It would be nice to provide a better test case than "run gvim". I'm not really
> > sure what is going on here though., so I can't write one.
> 
> The freeze seems to happen when gvim forks. Gvim won't even start unless it's
> run with --nofork, [snip]
I just want to stress again that running `strace gvim` works perfectly fine for me, even when doing long editing sessions with many files/tabs/windows.
Comment 38 Chris Coleman 2010-11-06 23:25:57 UTC
This bug seems to disappear when using a debugger. So I added a printf() to dlopen_doit() in glibc to track calls to dlopen():

$ gvim
dlopen(NULL, RTLD_GLOBAL | RTLD_LAZY);
dlopen(NULL, RTLD_LAZY);
dlopen("/lib/libc.so.6", RTLD_LAZY);
dlopen("/lib/libdl.so.2", RTLD_LAZY);
dlopen("/lib/libpthread.so.0", RTLD_LAZY);
dlopen("libnvidia-tls.so.260.19.12", RTLD_LAZY);
dlopen(NULL, RTLD_LAZY);
dlopen("libc.so.6", RTLD_LAZY);
dlopen("/usr/lib64/gtk-2.0/2.10.0/engines/libmurrine.so", RTLD_LAZY);
dlopen("/usr/lib64/gtk-2.0/2.10.0/engines/libpixmap.so", RTLD_LAZY);
dlopen("/usr/lib64/gtk-2.0/modules/libcanberra-gtk-module.so", RTLD_LAZY);
dlopen("/usr/lib64/gtk-2.0/modules/libgnomebreakpad.so", RTLD_LAZY);
dlopen("/usr/lib64/pango/1.6.0/modules/pango-basic-fc.so", RTLD_NOW);
dlopen("/usr/lib64/gtk-2.0/2.10.0/loaders/libpixbufloader-xpm.so", RTLD_LAZY);
dlopen("libXcursor.so.1", RTLD_LAZY);

When __GL_SINGLE_THREADED is set, the first of those calls to dlopen() disappears.

Like this:

if (getenv("__GL_SINGLE_THREADED") == NULL) {
    dlopen(NULL, RTLD_GLOBAL | RTLD_LAZY);
}
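Instead of patching glibc, calls to dlopen() can usually be traced with a small interposing library loaded through LD_PRELOAD, which also answers the question in comment 28. The sketch below assumes glibc and GCC, and the file and library names are hypothetical. It logs each request in the same format as the list above and then forwards it to the next dlopen() in lookup order via dlsym(RTLD_NEXT, ...). Since this hang disappears under strace and gdb, a lighter-weight tracer may or may not leave it reproducible.

```c
#define _GNU_SOURCE              /* glibc: RTLD_NEXT */
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

static int dlopen_calls;         /* calls seen so far (for the demo) */

/* Interposed dlopen(): log the request, then forward it to the next
   definition in lookup order (normally the real one in libc/libdl). */
void *dlopen(const char *file, int mode) {
    static void *(*real_dlopen)(const char *, int);

    if (real_dlopen == NULL)
        *(void **)(&real_dlopen) = dlsym(RTLD_NEXT, "dlopen");
    dlopen_calls++;
    fprintf(stderr, "dlopen(%s%s%s, %s%s);\n",
            file ? "\"" : "", file ? file : "NULL", file ? "\"" : "",
            (mode & RTLD_GLOBAL) ? "RTLD_GLOBAL | " : "",
            (mode & RTLD_NOW) ? "RTLD_NOW" : "RTLD_LAZY");
    return real_dlopen ? real_dlopen(file, mode) : NULL;
}
```

Built with something like gcc -shared -fPIC -o libtracedlopen.so tracedlopen.c -ldl and run as LD_PRELOAD=./libtracedlopen.so gvim, it should print one line per call made through the public dlopen() entry point. Loads that glibc performs internally do not go through it.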
Comment 39 Daniel Schömer 2010-11-14 16:20:40 UTC
Bug 279125 might be a duplicate of this one.

This is what I could see:

- x11-drivers/nvidia-drivers-260.19.12: gvim gets stuck
- x11-drivers/nvidia-drivers-260.19.12: gvim -f doesn't get stuck
- x11-drivers/nvidia-drivers-260.19.12 and __GL_SINGLE_THREADED=1: gvim doesn't get stuck
- x11-drivers/nvidia-drivers-256.53: gvim doesn't get stuck

I needed to add the patch mentioned in comment #33 to
x11-drivers/nvidia-drivers-256.53 to make it compile with
sys-kernel/gentoo-sources-2.6.36-r1.
Comment 40 Tolga Dalman 2010-11-14 18:35:52 UTC
Can you try the new nvidia-drivers 260.19.21? As I understand the changelog, multi-threaded OpenGL applications should be fixed in this release.
Comment 41 Chris Coleman 2010-11-14 20:19:04 UTC
(In reply to comment #40)
> Can you try with the new nvidia drivers 260.19.21 ? To my understanding of the
> changelog, multi-threaded OpenGL applications should be fixed in this release.
> 

The changelog made me hopeful. But I can confirm that the bug is still present in version 260.19.21.

Comment 42 jon R-B 2010-11-15 09:33:19 UTC
I can confirm as well. It has improved in some places (HoN doesn't crash as much, BUT it does still crash, unlike with 256.*), and gvim is still acting up.
Comment 43 Daniel Schömer 2010-11-19 10:50:47 UTC
The gvim problem exists for me with x11-drivers/nvidia-drivers-260.19.21. It behaves the same as with 260.19.12. 256.53 is OK for me.

- x11-drivers/nvidia-drivers-256.53: gvim doesn't get stuck

- x11-drivers/nvidia-drivers-260.19.12: gvim gets stuck
- x11-drivers/nvidia-drivers-260.19.12: gvim -f doesn't get stuck
- x11-drivers/nvidia-drivers-260.19.12 and __GL_SINGLE_THREADED=1: gvim doesn't get stuck

- x11-drivers/nvidia-drivers-260.19.21: gvim gets stuck
- x11-drivers/nvidia-drivers-260.19.21: gvim -f doesn't get stuck
- x11-drivers/nvidia-drivers-260.19.21 and __GL_SINGLE_THREADED=1: gvim doesn't get stuck
Comment 44 Chris Coleman 2010-11-20 22:14:54 UTC
I've found that simply setting LD_PRELOAD=/path/to/libGL.so is enough to 
break a gvim executable that has no existing dependency on libGL. That's all it takes. Just the presence of libGL.so among the loaded shared libraries.

There's a single constructor in libGL.so. I got rid of that. But that didn't help. There's also a non-standard DT_INIT function. I got rid of that. And that solved the problem completely.

This bug is somewhere in that function.
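The observation that merely preloading libGL.so is enough to trigger the hang follows from how ELF initialization works: the dynamic linker runs a library's initializers (the DT_INIT function and the DT_INIT_ARRAY entries) when it loads the object, before main() and whether or not the program ever calls into it. The minimal sketch below uses the GCC/Clang constructor attribute to show that; init_ran and library_init() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

int init_ran;    /* set by library_init(); lets callers confirm it ran */

/* GCC/Clang emit this into the object's .init_array (DT_INIT_ARRAY) on
   current Linux toolchains rather than DT_INIT, but the effect is the same:
   it runs when the object is loaded, before main(), whether or not any of
   its other functions are ever called. */
__attribute__((constructor))
static void library_init(void) {
    init_ran = 1;
    fputs("library_init: ran at load time\n", stderr);
}
```

Built as a shared object and injected into an unrelated program with LD_PRELOAD, the message would appear before that program does anything of its own, which is the same mechanism by which a preloaded libGL.so gets to run its initialization code.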
Comment 45 Chris Coleman 2010-11-21 00:28:47 UTC
> This bug is somewhere in that function.

Unfortunately, it's not a simple function. It calls lots of others, some of which are in nvidia's other shared libraries, which use deliberately unhelpful names for private symbols.

Nvidia can't fix this bug for us because they need more information. And I can't fix this bug because I need more information.
Comment 46 Martin Wegner 2010-11-24 16:30:27 UTC
Is there any reason why the affected version(s) of nvidia-drivers are unmasked?

Without unmasking anything myself I now have this version installed which breaks gvim:

[ebuild   R   ] x11-drivers/nvidia-drivers-260.19.21  USE="acpi gtk (multilib) -custom-cflags" 46,862 kB
Comment 47 Alexandre Rostovtsev (RETIRED) gentoo-dev 2010-11-24 17:25:36 UTC
(In reply to comment #46)
> Is there any reason why the affected version(s) of nvidia-drivers are unmasked?

One obvious reason is that xorg-server-1.9 suffers from a *horrific* performance regression on earlier versions of nvidia-drivers. Given the choice between a 2-second lag when resizing or scrolling almost any application and needing to write a small wrapper script around gvim, I would obviously choose the latter.

In addition, nvidia-drivers-260.* add support for a number of new Nvidia graphics cards (especially for laptops) that otherwise would be forced to use the vesa driver.
Comment 48 Tolga Dalman 2010-11-24 18:58:30 UTC
(In reply to comment #47)
> (In reply to comment #46)
> > Is there any reason why the affected version(s) of nvidia-drivers are unmasked?
> 
> One obvious reason is that xorg-serve-1.9 suffers from a *horrific* performance
> regression on earlier versions of nvidia-drivers. Given the choice between
> getting 2-second lag when resizing or scrolling almost any application, and
> needing to write a small wrapper script around gvim, I would obviously choose
> the latter.

Well, there *is* no wrapper script around gvim. Instead, gvim is now plainly broken (and probably other packages as well).

And BTW: I am not seeing any performance regression with 256.* nvidia-drivers. What is the bug report number?

> In addition, nvidia-drivers-260.* add support for a number of new Nvidia
> graphics cards (especially for laptops) that otherwise would be forced to use
> the vesa driver.

So? And that justifies unmasking broken software?
Comment 49 Alexandre Rostovtsev (RETIRED) gentoo-dev 2010-11-24 20:47:36 UTC
(In reply to comment #48)
> And BTW: I am not seeing any performance regression with 256. nvidia-drivers.
> What is the bug report number?

Consider yourself lucky. I don't know if there is a bug for it in Gentoo bugzilla, but the upstream bug report is here: http://www.nvnews.net/vbulletin/showthread.php?t=154563

On my machine, 256.* is entirely unusable.
Comment 50 jon R-B 2010-11-24 21:27:37 UTC
(In reply to comment #48)
> (In reply to comment #47)
> > (In reply to comment #46)
> > > Is there any reason why the affected version(s) of nvidia-drivers are unmasked?
> > 
> > One obvious reason is that xorg-server-1.9 suffers from a *horrific* performance
> > regression on earlier versions of nvidia-drivers. Given the choice between
> > getting 2-second lag when resizing or scrolling almost any application, and
> > needing to write a small wrapper script around gvim, I would obviously choose
> > the latter.
> 
> Well, there *is* no wrapper script around gvim. Instead, gvim is now plainly
> broken (and probably other packages as well).
> 
> And BTW: I am not seeing any performance regression with 256. nvidia-drivers.
> What is the bug report number?
> 
> > In addition, nvidia-drivers-260.* add support for a number of new Nvidia
> > graphics cards (especially for laptops) that otherwise would be forced to use
> > the vesa driver.
> 
> So ? And that justifies the unmasking of broken software ?
> 


Just because you don't see a performance regression doesn't mean there isn't one. That's like saying "I live in the Sahara and never see snow, therefore there is no such thing as snow."
I do experience the performance regression, and 260.* is a godsend from that point of view; however, I am then also hit by all these issues.

Also, 256.* doesn't work on kernel 2.6.36 without a patch (I don't know whether Gentoo has added it).

While I agree it should not have been unmasked, I don't agree with how you put it.
Comment 51 Chris Coleman 2010-11-24 23:04:57 UTC
The __GL_SINGLE_THREADED=1 workaround is still good, but another good workaround is to remove gvim's dependency on libGL.so:

echo 'x11-libs/cairo -opengl' >> /etc/portage/package.use
emerge -1 x11-libs/cairo
emerge -1 x11-libs/pango media-libs/libcanberra

That last step is necessary because pango and libcanberra use `pkg-config --libs cairo` at build time. So, unless they are re-emerged, they will still pull in libGL.so themselves even after cairo no longer does.
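For anyone who would rather keep cairo's opengl flag and use the __GL_SINGLE_THREADED=1 workaround instead, here is a minimal sketch of the "small wrapper" idea, assuming a POSIX sh and gvim on PATH. Only the environment variable comes from this bug; the function name gl_single_threaded and the alias are hypothetical. For a one-off, `env __GL_SINGLE_THREADED=1 gvim` does the same thing.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative only): run a command with
# __GL_SINGLE_THREADED=1, the workaround discussed in this bug, in its environment.
gl_single_threaded() {
    # The subshell keeps the export from leaking into the calling shell.
    # exec replaces the subshell with the command, so its exit status is returned.
    (
        __GL_SINGLE_THREADED=1
        export __GL_SINGLE_THREADED
        exec "$@"
    )
}

# Example (hypothetical): put the function and this alias in ~/.bashrc so that
# typing plain "gvim" starts it with the workaround applied.
# alias gvim='gl_single_threaded gvim'
```

The subshell-plus-exec design means the variable only affects the wrapped program, so other OpenGL applications started from the same shell keep the driver's default threading behaviour.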
Comment 52 Doug Goldstein gentoo-dev 2010-11-25 04:54:31 UTC
*** Bug 346257 has been marked as a duplicate of this bug. ***
Comment 53 Pavel Volkov 2010-11-27 23:04:07 UTC
I had this bug before, but the problem no longer occurs.
nvidia-drivers 260.19.21, cairo 1.10.0-r3, 7.3.50, gcc 4.5

 * Found these USE flags for x11-libs/cairo-1.10.0-r3:
 U I
 + + X           : Adds support for X11
 - - debug       : Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see
                   http://www.gentoo.org/proj/en/qa/backtraces.xml
 + + directfb    : Adds support for DirectFB layer (library for FB devices)
 - - doc         : Adds extra documentation (API, Javadoc, etc)
 + + opengl      : (Restricted to >=x11-libs/cairo-1.10.0)
                    Use Mesa backend for acceleration 
 - - qt4         : Adds support for the Qt GUI/Application Toolkit version 4.x
 - - static-libs : Build static libraries
 + + svg         : Adds support for SVG (Scalable Vector Graphics)
 + + xcb         : Support the X C-language Binding, a replacement for Xlib

Attaching emerge --info
Comment 54 Pavel Volkov 2010-11-27 23:04:46 UTC
Created attachment 255679
emerge --info output
Comment 55 jon R-B 2010-11-30 13:14:13 UTC
260.19.26 has come out.
I can't test it at the moment. No changelog is out yet either.
Comment 56 jon R-B 2010-12-01 01:34:29 UTC
OK, 260.19.26 seems to have fixed this!
These are beta drivers, not "officially" released yet (an NVIDIA developer says they are just writing the release notes).

http://www.nvnews.net/vbulletin/showthread.php?t=157563
Comment 57 Stephan Friedrichs 2010-12-13 11:15:41 UTC
(In reply to comment #56)
> ok 260.19.26   seems to have fixed this!!!!

I can confirm that. Just installed nvidia-drivers-260.19.26 and the problem is gone :)
Comment 58 Tolga Dalman 2010-12-13 11:31:43 UTC
Note that .26 was never officially released. However, the newer .29 release is available on the NVIDIA homepage.
Comment 59 Chris Coleman 2011-02-25 01:28:33 UTC
I had performance issues with 260.19.26 and 260.19.29, but gvim worked. I've been using 260.19.36 for 30 days now with no problems.
Comment 60 Chris Coleman 2011-05-15 00:49:24 UTC
I think this can be closed now. There's no affected version in portage.
Comment 61 Doug Goldstein gentoo-dev 2011-05-15 03:37:14 UTC
Glad it's finally resolved for you guys.