Hi! After upgrading my system to gnome-3.8 (and switching from openrc to systemd) I have the following problem. If I run evolution from gnome-shell, then the first bogofilter child process becomes a zombie. Evolution waits for it indefinitely and stops checking mail for spam. In this state evolution can be stopped only with kill -9. If I run evolution from a terminal, the bug disappears. The problem can also be worked around by creating /usr/local/bin/evolution, which calls /usr/bin/evolution and redirects its STDERR to /dev/null. I have gdm compiled with the +systemd USE flag, so STDERR is generally sent to journald when evolution is started from gnome-shell. So perhaps this problem is due to some odd interaction between bogofilter and journald. Many thanks for your work!
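The wrapper workaround described above can be sketched roughly as follows. This is only an illustration, assuming the real binary lives at /usr/bin/evolution as stated in the report; the helper name run_without_stderr is hypothetical and not taken from the reporter's actual script.

```sh
#!/bin/sh
# Sketch of the /usr/local/bin/evolution wrapper described in the report.
# run_without_stderr is a hypothetical helper name: it runs the given
# command with its standard error redirected to /dev/null, so nothing
# written to STDERR reaches journald.
run_without_stderr() {
    "$@" 2>/dev/null
}

# In the actual wrapper, the last line would hand over to the real binary:
#   run_without_stderr /usr/bin/evolution "$@"
```

The exit status of the wrapped command is passed through unchanged, so callers still see whether the program failed.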
I cannot reproduce this, and I always use evolution+bogofilter+systemd. I have: $ emerge -Opv bogofilter evolution systemd These are the packages that would be merged, in order: [ebuild R ] mail-filter/bogofilter-1.2.3 USE="berkdb -sqlite -tokyocabinet" 0 kB [ebuild R ~] mail-client/evolution-3.8.5:2.0 USE="bogofilter crypt gnome-online-accounts gstreamer ldap ssl weather -highlight -kerberos -map -spamassassin" 0 kB [ebuild R ~] sys-apps/systemd-208-r2:0/1 USE="acl filecaps firmware-loader gudev http introspection kmod pam policykit tcpd {test} xattr -audit -cryptsetup -doc -gcrypt -lzma -openrc -python -qrcode (-selinux) -vanilla" ABI_X86="(64) -32 (-x32)" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7" 0 kB
Very similar (I have tried different versions of bogofilter and all of them demonstrate this behavior) [ebuild U ] mail-filter/bogofilter-1.2.4::x-portage [1.2.2::gentoo] USE="berkdb sqlite -tokyocabinet" 0 kB [ebuild R ] mail-client/evolution-3.8.5:2.0 USE="bogofilter crypt gnome-online-accounts gstreamer ssl weather -highlight -kerberos -ldap -map -spamassassin" 0 kB [ebuild U ] sys-apps/systemd-208-r2:0/1 [208-r1:0/1] USE="acl audit doc filecaps firmware-loader gudev introspection kmod lzma openrc pam policykit python tcpd xattr -cryptsetup -gcrypt -http -qrcode (-selinux) {-test} -vanilla" ABI_X86="(64) -32 (-x32)" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7" 8 kB
Your bogofilter looks to come from an external overlay (and also different version)
(In reply to Pacho Ramos from comment #3) > Your bogofilter looks to come from an external overlay (and also different > version) I have tried 1.2.2 and 1.2.3 from portage and 1.2.4 (which still is not in portage) and all of them demonstrate this behavior.
This is not only an evolution problem (I have discovered many zombie processes on my system); it seems to be caused by the upgrade of nvidia-drivers to 319.60. Downgrading to 319.49 seems to fix the bug (though I must investigate this a little more).
Any news on this?
Fixed by downgrading nvidia-drivers to 319.49. The cause remains a mystery.
This reminds me of an old bug in nvidia-drivers that caused similar problems (lots of zombie processes), but it was solved by nvidia in a newer version.
*** Bug 487700 has been marked as a duplicate of this bug. ***
(In reply to Pacho Ramos from comment #8) > This reminds me to an old bug on nvidia-drivers that causes similar problems > (lots of zombie process), but it was solved by nvidia on a newer version This issue continues with x11-drivers/nvidia-drivers-331.17, in case it hasn't been mentioned. I rolled back to 325.15 and everything builds fine again. Is this something that needs to be mentioned upstream?
(In reply to Ron from comment #10) > Is this something that needs to be mentioned upstream? Yes, of course. It's closed source software so we cannot investigate very deeply ourselves. Run `nvidia-bug-report.sh' and send the output to Nvidia as instructed.
FYI, what seems to be happening, if it's the same as on my system, is that the signal mask being propagated by the driver is simply out of whack. Now SIGCHLD signals are being masked, so zombies never get reaped by processes that expect to reap children manually (as opposed to ignoring them).

$ ps -eda -o pid,ppid,blocked | grep -v 00000
  PID  PPID BLOCKED
 1376     1 fffffffe7ffbfeff
 2721  2715 fffffffe3ffba207
 2722  2715 fffffffe3ffba207
 3097     1 fffffffe7ffbfeff
 3099  3097 fffffffe7ffbfeff
 3991  3827 00007ffe7330cc90
 3994  3991 00007ffe7330c810
 3996     1 00007f4c53e28688
 4007  3994 00007ffe7330c810
 4014     1 00007f4c53e28688
 4077     1 00007f3a0b825688
 4154  3827 00007f2666110418
 4222  3991 00007ffe7330c810
 4223  3991 00007ffe7330c810
 4463     1 00007f4c53e28620

Those first ones, with leading f's, are okay - those are daemons that are purposefully setting their signal mask to make them harder to kill (e.g., DB2). The ones starting with 00007 are going to be the ones with problems. Normal processes will have all zeros, or close to it. I'll be downgrading my nvidia shortly to see if that resolves the issue here.
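To check what an individual mask from the BLOCKED column actually blocks, the hexadecimal value can be decoded bit by bit. The following is a minimal sketch, assuming x86/x86_64 Linux signal numbering, where SIGCHLD is signal 17 and therefore bit 16 of the mask; sigchld_blocked is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: test whether a hexadecimal signal mask (as printed in the BLOCKED
# column of ps or the SigBlk field of /proc/PID/status) blocks SIGCHLD.
# Assumption: SIGCHLD is signal 17 (x86/x86_64 Linux), so it is bit 16.
# sigchld_blocked is a hypothetical helper name.
sigchld_blocked() {
    # Keep only the low 8 hex digits (32 bits) to avoid overflowing the
    # shell's signed arithmetic on masks such as fffffffe7ffbfeff.
    low=$(printf '%s' "$1" | tail -c 8)
    [ $(( (0x$low >> 16) & 1 )) -eq 1 ]
}

for mask in 0000000000000000 0000000000010000 00007f2666110418; do
    if sigchld_blocked "$mask"; then
        echo "$mask: SIGCHLD blocked"
    else
        echo "$mask: SIGCHLD not blocked"
    fi
done
```

A mask of 0000000000010000 blocks SIGCHLD and nothing else. Masks that look like addresses block whatever signals happen to correspond to the bits that are set, which may or may not include SIGCHLD.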
After downgrade, the bad signal masks shown earlier have gone away. This is apparently an issue with the nvidia driver. (I'm back to 325.15 - 331.17 is definitely bad.)
Oh, I should also add: I'm running KDE 4.11.2. And akonadi has problems due to this as well. But what has problems should be of passing interest only. The real cause is the signal mask that nvidia gives to the parent process and gets passed on to everything else. Arguably, everything else could reset their own signal masks. But they shouldn't have to.
Could you please post the output of `emerge --info`?
(In reply to Tom Wijsman (TomWij) from comment #15) > Could you please post the output of `emerge --info`? Who are you asking this of? Everyone?
(In reply to Tanktalus from comment #16) > (In reply to Tom Wijsman (TomWij) from comment #15) > > Could you please post the output of `emerge --info`? > > Who are you asking this of? Everyone? Yes, why not. Also, send the output of nvidia-bug-report.sh to Nvidia, so they can fix their proprietary software and we can then write an ebuild for the fixed proprietary software version. There isn't really anything else we can do except urge you to send reports upstream.
(In reply to Jeroen Roovers from comment #17) > (In reply to Tanktalus from comment #16) > > (In reply to Tom Wijsman (TomWij) from comment #15) > > > Could you please post the output of `emerge --info`? > > > > Who are you asking this of? Everyone? > > Yes, why not. > > Also, send the output of nvidia-bug-report.sh to Nvidia, so they can fix > their proprietary software and we can then write an ebuild for the fixed > proprietary software version. There isn't really anything else we can do > except urge you to send reports upstream. nvidia: my plan is, once I have sufficient time available, to re-upgrade nvidia, reproduce the problem, submit the nvidia bug upstream with their tool, and then downgrade again. I just haven't had time yet :) Info: Portage 2.2.7 (default/linux/amd64/13.0, gcc-4.7.3, glibc-2.15-r3, 3.11.6-gentoo x86_64) ================================================================= System uname: Linux-3.11.6-gentoo-x86_64-Intel-R-_Core-TM-_i7_CPU_930_@_2.80GHz-with-gentoo-2.2 KiB Mem: 12296636 total, 788496 free KiB Swap: 25165820 total, 25090636 free Timestamp of tree: Sat, 02 Nov 2013 07:45:01 +0000 ld GNU ld (GNU Binutils) 2.23.1 distcc 3.1 x86_64-pc-linux-gnu [enabled] app-shells/bash: 4.2_p45 dev-java/java-config: 2.1.12-r1 dev-lang/python: 2.7.5-r3, 3.2.5-r3 dev-util/cmake: 2.8.11.2 dev-util/pkgconfig: 0.28 sys-apps/baselayout: 2.2 sys-apps/openrc: 0.11.8 sys-apps/sandbox: 2.6-r1 sys-devel/autoconf: 2.13, 2.69 sys-devel/automake: 1.4_p6-r1, 1.11.6, 1.12.6, 1.13.4 sys-devel/binutils: 2.23.1 sys-devel/gcc: 4.6.3, 4.7.3-r1 sys-devel/gcc-config: 1.7.3 sys-devel/libtool: 2.4.2 sys-devel/make: 3.82-r4 sys-kernel/linux-headers: 3.11 (virtual/os-headers) sys-libs/glibc: 2.15-r3 Repositories: gentoo private x11 kde ACCEPT_KEYWORDS="amd64" ACCEPT_LICENSE="*" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-O3 -pipe -march=core2 -ggdb" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/polkit-1/actions 
/var/bind" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/init.d /etc/php/apache2-php5.4/ext-active/ /etc/php/apache2-php5.5/ext-active/ /etc/php/cgi-php5.4/ext-active/ /etc/php/cgi-php5.5/ext-active/ /etc/php/cli-php5.4/ext-active/ /etc/php/cli-php5.5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-O3 -pipe -march=core2 -ggdb" DISTDIR="/usr/portage/distfiles" FCFLAGS="-O2 -pipe" FEATURES="assume-digests binpkg-logs collision-protect config-protect-if-modified distcc distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync" FFLAGS="-O2 -pipe" GENTOO_MIRRORS="http://gentoo.arcticnetwork.ca/ ftp://gentoo.mirrors.tds.net/gentoo http://mirror.datapipe.net/gentoo ftp://mirror.datapipe.net/gentoo ftp://gentoo.arcticnetwork.ca/pub/gentoo/ http://gentoo.llarian.net/ ftp://gentoo.llarian.net/pub/gentoo" LANG="en_US.utf8" LC_ALL="en_US.utf8" LDFLAGS="-Wl,-O1 -Wl,--as-needed" MAKEOPTS="-j13 -l25" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/home/portdir-mine /usr/portage/local/layman/x11 /usr/portage/local/layman/kde" SYNC="rsync://rsync.ca.gentoo.org/gentoo-portage" USE="X a52 aac acl acpi alsa amd64 apache2 audiofile avahi avi bash-completion berkdb branding bzip2 cairo cdda cddb cdparanoia cdr cli consolekit cracklib crypt css cups cxx dbus dri dvd dvdr dvdread enca encode exif expat ffmpeg fftw firefox fontconfig fortran gd gdbm gif gimp gmp gnutls gs handbook htmlhandbook iconv imagemagick ipv6 java 
jbig jpeg jpeg2k kde kipi lcms libnotify lzma lzo mad mjpeg mmx mng modules mp3 mpeg mudflap multilib ncurses nls nptl nsplugin ogg opengl openmp pam pcre perl png policykit python qt4 rdesktop readline scanner sdl semantic-desktop session smp sse sse2 ssl subversion svg tcpd threads tiff truetype udev unicode vaapi vcd vde vorbis win32codecs wmf x264 xcb xcomposite xinerama xml xulrunner xvid zlib" ABI_X86="32 64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="proxy proxy_http actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="en" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" QEMU_SOFTMMU_TARGETS="i386 x86_64" QEMU_USER_TARGETS="i386 x86_64" RUBY_TARGETS="ruby19 ruby18" USERLAND="GNU" VIDEO_CARDS="nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset 
ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
(In reply to Tanktalus from comment #18) > Portage 2.2.7 (default/linux/amd64/13.0, gcc-4.7.3, glibc-2.15-r3, > 3.11.6-gentoo x86_64) Great. An unsupported kernel. "Patched" that yourself? Break it and you get to keep the pieces. Does anyone here experience this very same problem with an actually supported kernel?
(In reply to Jeroen Roovers from comment #19) > Does anyone here experience this very same problem with an actually > supported kernel? I cannot tell if it's the very same problem; however, I can observe things similar to what was reported in comment 12, with the following apparent symptoms:
- dolphin, kmail2 (from kde-4.10.5) take ages to start
- libreoffice-4.1.2.3 takes ages to show the file open / save dialog box
Running nvidia-drivers-319.60 on whatever (ck/g/rt)-sources 2.6.38 - recent 3.4 and 3.8. No problems with nvidia-drivers <= 319.49.
*** Bug 490718 has been marked as a duplicate of this bug. ***
(In reply to Jeroen Roovers from comment #19) > Does anyone here experience this very same problem with an actually > supported kernel? Hi, yes. Since the newest drivers (331.20), the 3.12 and 3.11 kernels are officially supported. Hence the info in https://bugs.gentoo.org/show_bug.cgi?id=490718 shows that the problem still persists with a supported combination of kernel and drivers. I should be able to update again and create and send a bug report to nvidia on Sunday. I won't have a chance to access my nvidia system before then. Kind regards, Christian
Upstream bug report linking to this one: https://devtalk.nvidia.com/default/topic/633706/linux/recent-drivers-cause-applications-to-hang-not-start-at-all-or-compilation-failures/ As mentioned in 487548, the issue seems to happen less often with the most recent driver, but unfortunately it still does happen.
Hi! Reporting here as requested in this thread: http://forums.gentoo.org/viewtopic-t-975106.html?sid=8b4f5670553424affe500aad0b28b764 I had problems with CTRL+C in konsole (but, curiously, not in xterm, uxterm, VT, or in screen sessions started in those - but, vice versa, screen sessions started in Konsole experience the problem when re-attached in a VT). See the discussion for details. Oh, this is on the dreaded 3.12 Kernel, but I tried nvidia-drivers-331.20, which should be officially supported, and it still experiences this problem. The versions between 319.49 and 331.20 (that I have tried) all have this problem, but were patched to run with the kernel I was currently running on. My info (running on 319.49 - will re-upgrade, run the nvidia patch script and downgrade again later when I have time): # emerge --info Portage 2.2.7 (default/linux/amd64/13.0/desktop/kde, gcc-4.6.3, glibc-2.15-r3, 3.12.0-gentooVillenMyytti-2 x86_64) ================================================================= System uname: Linux-3.12.0-gentooVillenMyytti-2-x86_64-AMD_Athlon-tm-_Dual_Core_Processor_4850e-with-gentoo-2.2 KiB Mem: 6101848 total, 1219984 free KiB Swap: 19433468 total, 19432248 free Timestamp of tree: Sat, 09 Nov 2013 15:30:01 +0000 ld GNU ld-versio (GNU Binutils) 2.23.1 app-shells/bash: 4.2_p45 dev-java/java-config: 2.1.12-r1 dev-lang/python: 2.7.5-r3, 3.2.5-r3 dev-util/cmake: 2.8.11.2 dev-util/pkgconfig: 0.28 sys-apps/baselayout: 2.2 sys-apps/openrc: 0.11.8 sys-apps/sandbox: 2.6-r1 sys-devel/autoconf: 2.13, 2.69 sys-devel/automake: 1.11.6, 1.12.6, 1.13.4 sys-devel/binutils: 2.23.1 sys-devel/gcc: 4.6.3, 4.7.3-r1 sys-devel/gcc-config: 1.7.3 sys-devel/libtool: 2.4.2 sys-devel/make: 3.82-r4 sys-kernel/linux-headers: 3.9 (virtual/os-headers) sys-libs/glibc: 2.15-r3 Repositories: gentoo gamerlay mythtv x-portage ACCEPT_KEYWORDS="amd64" ACCEPT_LICENSE="*" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-march=athlon64-sse3 -O2 -pipe -fomit-frame-pointer" CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php5.3/ext-active/ /etc/php/apache2-php5.5/ext-active/ /etc/php/cgi-php5.3/ext-active/ /etc/php/cgi-php5.5/ext-active/ /etc/php/cli-php5.3/ext-active/ /etc/php/cli-php5.5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c" CXXFLAGS="-march=athlon64-sse3 -O2 -pipe -fomit-frame-pointer" DISTDIR="/usr/portage/distfiles" EMERGE_DEFAULT_OPTS="--keep-going -j 3 --load-average 1.90" FCFLAGS="-O2 -pipe" FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync metadata-transfer news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync" FFLAGS="-O2 -pipe" GENTOO_MIRRORS="http://trumpetti.atm.tut.fi/gentoo/ ftp://trumpetti.atm.tut.fi/gentoo/" LANG="fi_FI.UTF-8" LDFLAGS="-Wl,-O1 -Wl,--as-needed" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/var/lib/layman/gamerlay /usr/local/mythtv_portage/Gentoo /usr/local/portage" USE="32bit 3dnow 3dnowext S3TC X a52 aac aacplus aacs aalib ace acl acpi alsa amd64 apache2 bash-completion berkdb bittorrent bluetooth bluray branding bzip2 cairo cdda cddb cdio cdr cdrom chardet cli consolekit cracklib crypt css cups cxx dbus declarative dri dts dvb dvd dvdnav dvdr embedded emboss enca encode examples exif fam ffmpeg fftw fi firefox flac floppy fluidsynth fontconfig fortran ftp g3dvl gdbm gif git 
google-gadgets goom gpm gtk hddtemp iconv icu id3 id3tag ieee1394 imlib ipod ipv6 jack java javascript joystick jpeg jpeg2k kde kdecards kipi latin1 lcd lcms ldap libnotify lirc lm_sensors logrotate mad maildir matroska mbox md5sum midi mikmod mixer mjpeg mmx mmxext mng mod modplug modules mp3 mp4 mpeg mplayer mtp mudflap multilib multiuser musepack music mysql mythtv ncurses nls nodrm nptl nsplugin ntfs ntfsprogs nvcontrol nvidia offensive ogg ogg123 openal opencl opengl openmp pam pango pcre pdf phonon php plasma png policykit ppds projectm projectx pvr qt3support qt4 quicktime rar readline real rpc rtc s3tc scanner sdl semantic-desktop sensord session sftp sid sndfile spell sse sse2 sse3 ssl startup-notification stk stream subtitles svg systray taglib tcpd test-programs tga theora threads tiff timidity tk transcode truetype udev udisks unicode unzip upower usb vaapi vcd vdpau vhosts vim vim-syntax vim-with-x vorbis vst wallpapers wavpack win32 wma wxwidgets x264 xcb xcomposite xinerama xml xrandr xscreensaver xterm xv xvid zip zlib" ABI_X86="64" ALSA_CARDS="hda-intel virmidi" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" GRUB_PLATFORMS="pc" 
INPUT_DEVICES="keyboard mouse joystick synaptics evdev" KERNEL="linux" LCD_DEVICES="imon" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LINGUAS="fi en_GB en" LIRC_DEVICES="userspace" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-4 php5-3" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" RUBY_TARGETS="ruby19 ruby18" USERLAND="GNU" VIDEO_CARDS="nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CPPFLAGS, CTARGET, INSTALL_MASK, LC_ALL, MAKEOPTS, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, SYNC, USE_PYTHON
*** Bug 490496 has been marked as a duplicate of this bug. ***
FWIW, here is a workaround to build freezing packages like tar, grub, etc. with nvidia-drivers-331.20 and kernel 3.12.2: ssh to localhost. It seems that nvidia's buggy wrappers don't get loaded as long as you are ssh-ed in without X forwarding, or otherwise are not in an X session.
*** Bug 493040 has been marked as a duplicate of this bug. ***
*** Bug 492984 has been marked as a duplicate of this bug. ***
I've masked the affected nvidia-drivers versions in the KDE profiles, since this hits Akonadi and KWin. Global mask requires the package maintainer.
(In reply to Andreas K. Hüttel from comment #29) > I've masked the affected nvidia-drivers versions in the KDE profiles, since > this hits Akonadi and KWin. > > Global mask requires the package maintainer. It also affects GNOME users[1] so I am pretty sure it's an issue between the kernel and nvidia.ko. Now, if only someone who can reproduce the bug would do some debugging or even perhaps show a kernel .config with which we might reproduce it. I can't believe it has been this long and nobody yet showed up with a proper analysis. It's probably down to a mere CONFIG_* option in the kernel which we could test for and warn against. [1] https://devtalk.nvidia.com/default/topic/638521/linux/gnome-terminal-problems-ctrl-c-and-exit/
Created attachment 364448 [details] .config for kernel 3.12.2 (In reply to Jeroen Roovers from comment #30) My .config file. I am experiencing this problem in gnome with vanilla 3.12.2 and nvidia-drivers-331.20
> It also affects GNOME users[1] so I am pretty sure it's an issue between the > kernel and nvidia.ko. I have this problem (emerge freezing at configure step) with openbox too. Check my report : https://bugs.gentoo.org/show_bug.cgi?id=490496
Can someone tell me if I am affected!? # grep SigBlk /proc/*/status| grep -v 0000000000000000 /proc/1/status:SigBlk: 7be3c0fe28014a03 /proc/1229/status:SigBlk: 7be3c0fe28014a03 /proc/1230/status:SigBlk: 0000000000010000 /proc/1291/status:SigBlk: 00007f4425234e90 /proc/1330/status:SigBlk: 00007ffe15e91cb0 /proc/1333/status:SigBlk: 00007ffe15e91a20 /proc/1334/status:SigBlk: 00007f201480a8c8 /proc/1350/status:SigBlk: 000000000000000a /proc/1378/status:SigBlk: 0000000000010000 /proc/1482/status:SigBlk: 7be3c0fe28014a03 /proc/1617/status:SigBlk: fffffffe7ffb9eff /proc/1620/status:SigBlk: 0000000000010000 /proc/1630/status:SigBlk: 0000000000010000 /proc/2137/status:SigBlk: 0000000000010000 /proc/2283/status:SigBlk: fffffffe7ffb9eff /proc/2286/status:SigBlk: 0000000000010000 /proc/2306/status:SigBlk: 0000000000010000 /proc/394/status:SigBlk: 0000000000004002 /proc/67/status:SigBlk: 0000000000004a02 /proc/92/status:SigBlk: fffffffe7ffbfeff
The process with this SigBlk can be killed with signal 9 or 15, either as a regular user or as root: /proc/1333/status:SigBlk: 00007ffe15e91a20 It is KDE's dolphin.
Created attachment 364456 [details] working .config for kernel 3.12.2 (In reply to Alexandre Rostovtsev from comment #26) > build freezing packages like tar, grub, etc. (In reply to scrimekiler from comment #32) > I have this problem (emerge freezing at configure step) with openbox too. Cannot reproduce this on unpatched 3.12.2-gentoo with unpatched =x11-drivers/nvidia-drivers-331.20. Tried thrice for each. Also have no other processes hanging. Tried reproducing in other ways from what I read on the forums (sleep in a while loop, ^C^C^C^Z, ps). Everything works here. Attached my working .config. Whether or not you can reproduce; please upload your .config such that we can determine a common denominator, but make sure you have properly tested first.
On a side note, I am also running systemd; GNOME 3.10 though. (In reply to Ulenrich from comment #33) > Can someone tell me if I am affected!? > > # grep SigBlk /proc/*/status| grep -v 0000000000000000 > /proc/1291/status:SigBlk: 00007f4425234e90 > /proc/1330/status:SigBlk: 00007ffe15e91cb0 > /proc/1333/status:SigBlk: 00007ffe15e91a20 > /proc/1334/status:SigBlk: 00007f201480a8c8 Possibly, but I guess you'll want to try to reproduce. These look like memory addresses and thus match the description of what might be an indicator; just checked mine as well (I haven't been able to reproduce yet), but I notice I have at least one like this (gnome-shell): /proc/3925/status:SigBlk: 00007f143178e000 Not sure what it means or whether it is a false positive in my case. :/
# Requires bash (uses the ${var/pattern} substitution)
for i in $(grep SigBlk /proc/*/status | grep -v 00000000000); do
    [ -f ${i/status*}cmdline ] \
        && cat ${i/status*}cmdline \
        || echo -e "\n SigBlk $i \n"
done
----
/sbin/systemd
 SigBlk 7be3c0fe28014a03
/usr/lib/systemd/systemd--user
 SigBlk 7be3c0fe28014a03
kwin
 SigBlk 00007f4425234e90
kdeinit4: krunner [kdeinit]
 SigBlk 00007ffe15e91cb0
su-
 SigBlk fffffffe7ffb9eff
su-
 SigBlk fffffffe7ffb9eff
----
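A somewhat more defensive variant of the loop above might look like the sketch below. It reads each status file separately and prints only processes whose SigBlk value looks like a user-space address. The prefix test (00007e or 00007f) is a heuristic based on the masks posted in this thread, not a definitive check, and the names looks_like_address and scan_sigblk are hypothetical.

```sh
#!/bin/sh
# Sketch: list processes whose SigBlk value looks like a user-space address
# (prefix 00007e or 00007f), the pattern described as suspicious in this
# thread. The prefix test is a heuristic assumption, not a definitive check.
# looks_like_address and scan_sigblk are hypothetical helper names.
looks_like_address() {
    case $1 in
        00007[ef]*) return 0 ;;
        *)          return 1 ;;
    esac
}

scan_sigblk() {
    for status in /proc/[0-9]*/status; do
        # Skip entries that vanished or cannot be read.
        [ -r "$status" ] || continue
        mask=$(awk '/^SigBlk:/ { print $2 }' "$status" 2>/dev/null)
        looks_like_address "$mask" || continue
        name=$(awk '/^Name:/ { print $2 }' "$status" 2>/dev/null)
        pid=${status#/proc/}
        pid=${pid%/status}
        echo "$pid $name $mask"
    done
}

scan_sigblk
```

On an unaffected system this is expected to print little or nothing, although, as noted elsewhere in the thread, a single match is not necessarily proof of the bug.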
Not reproducible on 331.17 either, I have tried that one because it is supposed to happen more often there; I'll probably try to change some of the kernel options. If I find one that allows me to reproduce, I'll let you know.
I just installed x11-drivers/nvidia-drivers-319.72 --- see bug https://bugs.gentoo.org/show_bug.cgi?id=493160 --- but nothing changes (I was never hit by this bug), like above: /sbin/systemd SigBlk 7be3c0fe28014a03 /usr/lib/systemd/systemd--user SigBlk 7be3c0fe28014a03 /usr/lib/systemd/systemd--user SigBlk 7be3c0fe28014a03 kwin SigBlk 00007efe34c32e90 kdeinit4: krunner [kdeinit] SigBlk 0000000001323e50 kdeinit4: dolphin [kdeinit] --icon system-file-ma SigBlk 00007ffe61f8ca30 /usr/bin/aqualung SigBlk 00007fa463d198c8 /usr/lib/systemd/systemd-udevd SigBlk fffffffe7ffbfeff
Hi, I have the same or a similar issue with the 331.20 drivers: an increasing number of defunct processes.

$ ps ax |grep defunct
 2723 ?        Z      0:00 [kwin_opengl_tes] <defunct>
 2749 ?        ZN     0:00 [virtuoso-t] <defunct>
 3103 ?        ZN     0:00 [virtuoso-t] <defunct>
 3105 ?        ZN     0:00 [virtuoso-t] <defunct>
 3141 ?        Z      0:00 [virtuoso-t] <defunct>
 3143 ?        Z      0:00 [virtuoso-t] <defunct>
 3147 ?        Z      0:00 [virtuoso-t] <defunct>
 3148 ?        Z      0:00 [virtuoso-t] <defunct>

My kernel config is in the attachment.
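When collecting such lists, it can help to include the parent PID, since the parent is the process that failed to reap the zombie. A minimal sketch, assuming a ps that accepts the POSIX -o column syntax; filter_zombies is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: show zombie (defunct) processes together with their parent PID,
# since the parent is the process that failed to reap them.
# filter_zombies is a hypothetical helper; it expects "PID PPID STAT COMMAND"
# lines on standard input and keeps those whose state starts with Z.
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# pid=, ppid=, stat= and comm= request the columns without a header line.
ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies
```

Each output line has the form "PID PPID COMMAND"; the signal mask of the listed parent is then the one worth inspecting.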
Created attachment 364486 [details] kernel-3.12.1-config-broken
(In reply to Martin Samek from comment #40) > Hi, I have same/similar issue with 331.20 drivers We don't need more confirmation, thanks. We do need at the very least someone posting their nvidia-bug-report.sh output on the upstream forum, and ideally someone going through various kernel configuration switches to see which one trips up nvidia.ko.
(In reply to Jeroen Roovers from comment #42) > (In reply to Martin Samek from comment #40) > > Hi, I have same/similar issue with 331.20 drivers > > We don't need more confirmation, thanks. We do need at the very least > someone posting their nvidia-bug-report.sh output on the upstream forum, This has been done multiple times, see https://devtalk.nvidia.com/default/topic/633706/linux/recent-drivers-cause-applications-to-hang-not-start-at-all-or-compilation-failures/ and https://devtalk.nvidia.com/default/topic/638521/linux/gnome-terminal-problems-ctrl-c-and-exit/ nVidia opened a bug in their internal tracker (bug 1414070), so they are aware of the issue (but apparently struggle with reproducing it) Of course more people can add their reports there :) As other distributions seem to be affected as well, I can't say whether it really is a kernel option or maybe a specific software version (e.g. gcc or libc) Kind regards, Christian
(In reply to Jeroen Roovers from comment #42) > ideally someone going through various kernel configuration switches to see > which one trips up nvidia.ko. Two caveats first: 1/ I have no explanation yet that would justify this result, 2/ The troubles occur randomly => I might well not have tested enough. Both tests were made on identical hardware + ck-sources-3.4.68 + all drivers statically built + (nvidia-drivers-319.49 (trouble-free) || nvidia-drivers-319.60 (misc and random problems already reported above and elsewhere))
- Building the kernel with CONFIG_BSD_PROCESS_ACCT=y and CONFIG_BSD_PROCESS_ACCT_V3=y + nvidia-drivers-319.60 => Troubles!
- Building the kernel with CONFIG_BSD_PROCESS_ACCT and CONFIG_BSD_PROCESS_ACCT_V3 unset => No problem... yet! (including no problem for akonadi registering with dbus)
For what it is worth... that is, for now... almost nothing.
@Eric, am I affected using now nvidia-drivers-319.76 with both 'y'? Output at: https://forums.gentoo.org/viewtopic-p-7453354.html#7453354 ... to get some user input in the forum.
(In reply to Eric F. GARIOUD from comment #44) > - Building the kernel with CONFIG_BSD_PROCESS_ACCT=y and > CONFIG_BSD_PROCESS_ACCT_V3=y + nvidia-drivers-319.60 => Troubles! > - Building the kernel with CONFIG_BSD_PROCESS_ACCT and > CONFIG_BSD_PROCESS_ACCT_V3 unset => No problem... yet! (including no problem > for akonadi registering with dbus) Thank you for sharing this discovery. I can confirm that this breaks on the very first emerge that I do after rebuilding the kernel with just those two toggled; both app-arch/tar and sys-boot/grub now show this in their log:

configure:26161: checking for working re_compile_pattern
configure:26352: x86_64-pc-linux-gnu-gcc -std=gnu99 -o conftest -O2 -pipe -O2 -pipe -march=native -fomit-frame-pointer -Wl,-O1 -Wl,--as-needed conftest.c -lacl >&5
configure:26352: $? = 0
configure:26352: ./conftest
*** Error in `./conftest': malloc(): memory corruption: 0x0000000000604fc0 ***

When checking the signal masks, I see /usr/bin/gnome-shell as before, which is normal; but this time I additionally see the following as well, which might or might not be part of the problem: /usr/libexec/ibus-x11 --kill-daemon The failing builds are sufficient proof though. Updated the bug summary with this config variable; as we also know that 325.15 works and 331.17 and 331.20 fail, we can update the version as well. The only thing left to figure out is the kernel version and, to be more specific, the kernel commit where this behavior was introduced. Unless it applies to all kernel versions...

=================
= TEMPORARY FIX =
=================
Downgrade to ~x11-drivers/nvidia-drivers-325.15 or alternatively set CONFIG_BSD_PROCESS_ACCT=n CONFIG_BSD_PROCESS_ACCT_V3=n in the kernel .config
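To see whether a given kernel configuration has BSD process accounting enabled, something along these lines could be used. It is a sketch only: check_bsd_acct is a hypothetical helper name, and the default path /usr/src/linux/.config is an assumption about where the kernel sources live.

```sh
#!/bin/sh
# Sketch: report whether BSD process accounting is enabled in a kernel
# configuration file. check_bsd_acct is a hypothetical helper name; the
# default path /usr/src/linux/.config is an assumption about where the
# kernel sources live.
check_bsd_acct() {
    config=${1:-/usr/src/linux/.config}
    if [ ! -r "$config" ]; then
        echo "cannot read $config"
        return 2
    fi
    if grep -q '^CONFIG_BSD_PROCESS_ACCT=y' "$config"; then
        echo "CONFIG_BSD_PROCESS_ACCT is enabled"
        return 0
    fi
    echo "CONFIG_BSD_PROCESS_ACCT is not enabled"
    return 1
}
```

It would be called as `check_bsd_acct /usr/src/linux/.config`. Note that a disabled option normally appears in .config as the comment line `# CONFIG_BSD_PROCESS_ACCT is not set`, which is why the sketch only looks for the `=y` form.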
"with kernel ? and CONFIG_BSD_PROCESS_ACCT{,_V3}=y" I've had those enabled all along and that was never the issue for me.
(In reply to Jeroen Roovers from comment #47) > "with kernel ? and CONFIG_BSD_PROCESS_ACCT{,_V3}=y" > > I've had those enabled all along and that was never the issue for me. There might well be another CONFIG setting involved. On my side I can make observations identical to my #44 comment under : ck-sources-2.6.38-r3 ; 3.4.68 ; 3.8.13 ; 3.10.17
(In reply to Jeroen Roovers from comment #47) > "with kernel ? and CONFIG_BSD_PROCESS_ACCT{,_V3}=y" > > I've had those enabled all along and that was never the issue for me. Hmm, then either this is card-specific or it depends on some other config variable as well; feel free to share your .config if you want this investigated further, but like you I think that this should be investigated further upstream. Further testing reveals this can also be reproduced using an unpatched 331.17 on 3.9.11 and 3.6.11; so, it is definitely not a recent kernel regression. Will do some further testing later to see whether older versions are unaffected. It starts to seem like an NVIDIA driver regression where some kernel options and/or graphics cards just serve as a condition to reveal it. My card is: 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation G92M [GeForce GTX 285M] [10de:060f] (rev a2) I'll send more details upstream with `nvidia-bug-report.sh` when I revisit this.
@Jer about the new summary: as written in #44, systems with <=319.49 are systematically OK; *319.60* is the first release causing trouble.
(In reply to Tanktalus from comment #14)
> Oh, I should also add: I'm running KDE 4.11.2. And akonadi has problems due
> to this as well. But what has problems should be of passing interest only.
> The real cause is the signal mask that nvidia gives to the parent process
> and gets passed on to everything else. Arguably, everything else could
> reset their own signal masks. But they shouldn't have to.

This all sounds like an ABI change. It may not even come up in nvidia.ko vs. the kernel, but in the userland libraries that talk to nvidia.ko.

Try this and see how it goes:

1) upgrade sys-kernel/linux-headers to a version approaching your current kernel version.
2) re-emerge sys-libs/glibc
3) re-emerge x11-drivers/nvidia-drivers
4) reboot
5) test
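The remark quoted above, that every program could reset its own signal mask, can be illustrated with a small launcher. This is only a minimal sketch and has not been tested against the nvidia problem discussed in this bug: it assumes Python 3 on Linux, and `run_with_clean_mask` and `PROBE` are hypothetical names introduced for illustration. The idea is that a child process inherits the signal mask of the thread that spawns it, so emptying the mask around the spawn gives the child a clean start.

```python
import signal
import subprocess
import sys


def run_with_clean_mask(argv, **kwargs):
    """Run argv as a child process that starts with no blocked signals.

    Hypothetical helper: the calling thread's mask is emptied for the
    duration of the call (spawn and wait), then the previous mask is
    restored, whatever it contained.
    """
    previous = signal.pthread_sigmask(signal.SIG_SETMASK, set())
    try:
        return subprocess.run(argv, **kwargs)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, previous)


# Demonstration command (an assumption for this sketch): a child Python
# interpreter that prints how many signals it finds blocked at startup.
PROBE = [
    sys.executable,
    "-c",
    "import signal; print(len(signal.pthread_sigmask(signal.SIG_BLOCK, [])))",
]
```

Calling `run_with_clean_mask(PROBE, capture_output=True, text=True)` from a process that has signals blocked should report zero blocked signals in the child. Such a launcher only hides the symptom for one program; it does not address whatever corrupts the mask in the first place.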
A few things I tried on a current ~amd64 system:

+ recompiled glibc & nvidia-drivers, reboot, test
+ same as above but w/ gcc 4.7.3 (as opposed to 4.8.2) and recompiled the kernel as well, reboot, test
+ BSD accounting off

None of it made any change. KDE behaved strangely sometimes and I got zombie processes (due to the "masking corruption" noted earlier by someone else).

For what it's worth: it is a GTX470 and the kernel is 3.11.7 (vanilla flavor).

Downgrading to 325.15 stopped all the madness again and everything runs smoothly.
331.17 in the topic name doesn't seem to exist. 331.20 does exist but doesn't seem to be affected
(In reply to scrimekiler from comment #53)
> 331.17 in the topic name doesn't seem to exist.

It is a beta version in the ebuild attic that makes it easier to reproduce the problem; please note that the summary ATOM has ">=", which means 331.17 or newer.

(In reply to Matthias Dahl from comment #52)
> + BSD accounting off
>
> None of it made any change.

How did you turn it off? Did you change the kernel options?

Can you check if you did boot the newly built kernel?

If you have this problem and cannot toggle it using the kernel config variable we have found, then it seems that there are other ways to trigger this behavior.
(In reply to Tom Wijsman (TomWij) from comment #54)
> How did you turn it off?

Changed the kernel config. Recompiled the kernel. Rebooted. :)

> Can you check if you did boot the newly built kernel?

You can be pretty sure I know what I am doing. :) In fact, if it weren't for the fact that the drivers mostly consist of the crappy blob, I would have already put gdb to good use to figure out what is going on.

> If you have this problem and cannot toggle it using the kernel config
> variable we have found; then, it seems that there are other ways to trigger
> this behavior.

Yeah. Honestly, it would have surprised me if this had really been a reliable trigger or workaround. :(

I also had a look through the configs (mine and the ones posted), but nothing really jumped out.

Maybe it really is memory corruption happening on the nvidia side due to their recent changes for unified memory support. Or it is indeed an ABI clash somewhere.

With nvidia's history of taking their time to fix things, this could be an issue for quite some time to come, I am afraid. :((

What we know so far:

- signal mask gets corrupted (w/ all its side effects)
- the gfx chip doesn't seem to make any difference
- glibc makes no difference (happens w/ glibc 2.15 and 2.17)
- gcc makes no difference (happens w/ gcc 4.7 and 4.8)
- kernel version seems to make no difference
- drivers > 325.15 are affected
319.60 is definitely affected.

319.49 is probably affected too. A fresh X session behaves normally, but after a week I have

$ ps ax | grep Z
 1358 ? Z 11:25 [vlc] <defunct>
 6392 ? Z 0:00 [su] <defunct>
 6396 ? Z 0:00 [sh] <defunct>
 9357 ? Z 0:53 [digikam] <defunct>
 9684 ? Zs 0:00 [ssh-euclid] <defunct>
12906 ? Z 12:56 [vlc] <defunct>
21672 ? Z 1:50 [vmplayer] <defunct>
21986 ? Z 2:19 [darktable] <defunct>
22526 ? Z 3:24 [vmplayer] <defunct>
23966 ? Z 0:25 [recoll] <defunct>
(In reply to Serge Gavrilov from comment #56)
> 319.60 is definitely affected
>
> 319.49 is probably affected too.

I disagree. In my opinion:

- 319.49 is *definitely not* affected by *this* precise bug (487558).
- 319.60 is the first version affected.
- 319.49 is known to be affected by another bug: the one you are experiencing and reporting with your list of defunct processes.
- 319.60 tried to address that issue ("Fixed a bug that could cause OpenGL applications to crash during the initialization of new threads.", quoted from the nvidia-319.60 release highlights).

It is highly probable that the bug we are discussing here (487558) was introduced by nvidia as a consequence of the above-mentioned bugfix, but I acknowledge I have no means to prove that.
If you diff the non-binary files, only two files show changes beyond the version strings. The two changes concern a) drm_fasync and b) gfp_mask. They aren't of any significance in our case, or are they?

--- NVIDIA-Linux-x86_64-319.49/kernel/nv-drm.c
+++ NVIDIA-Linux-x86_64-319.60/kernel/nv-drm.c
@@ -106,7 +106,6 @@
     .unlocked_ioctl = drm_ioctl,
     .mmap = drm_gem_mmap,
     .poll = drm_poll,
-    .fasync = drm_fasync,
     .read = drm_read,
     .llseek = noop_llseek,
 };
--- NVIDIA-Linux-x86_64-319.49/kernel/nv-vm.c
+++ NVIDIA-Linux-x86_64-319.60/kernel/nv-vm.c
@@ -483,6 +483,9 @@
         gfp_mask = NV_GFP_DMA32;
     }
 #endif
+#if defined(__GFP_NORETRY)
+    gfp_mask |= __GFP_NORETRY;
+#endif
 #if defined(__GFP_ZERO)
     if (at->flags & NV_ALLOC_TYPE_ZEROED)
         gfp_mask |= __GFP_ZERO;
@@ -532,7 +535,7 @@
     NV_GET_FREE_PAGES(virt_addr, at->order, (gfp_mask | __GFP_COMP));
     if (virt_addr == 0)
     {
-        nv_printf(NV_DBG_ERRORS,
+        nv_printf(NV_DBG_MEMINFO,
             "NVRM: VM: %s: failed to allocate memory\n", __FUNCTION__);
         return RM_ERR_NO_FREE_MEM;
     }
@@ -700,7 +703,7 @@
         NV_GET_FREE_PAGES(virt_addr, 0, gfp_mask);
         if (virt_addr == 0)
         {
-            nv_printf(NV_DBG_ERRORS,
+            nv_printf(NV_DBG_MEMINFO,
                 "NVRM: VM: %s: failed to allocate memory\n", __FUNCTION__);
             status = RM_ERR_NO_FREE_MEM;
             goto failed;
I observe this bug as well, on any driver newer than 325.15 and on every kernel I tried (3.10, 3.11, 3.12). It appears every time during Xfce session initialization (one of Xfce's root processes gets a wrong sigmask and infects most of the GUI), but I had no luck finding an exact way to provoke the bug in a single isolated process.
Interestingly, while re-installing a kernel with some changed .config options, I saw this:

DEPMOD 3.12.2-gentoo-JeR
depmod: WARNING: /lib/modules/3.12.2-gentoo-JeR/video/nvidia.ko needs unknown symbol kmem_cache_alloc_trace
depmod: WARNING: /lib/modules/3.12.2-gentoo-JeR/video/nvidia.ko needs unknown symbol add_preempt_count
depmod: WARNING: /lib/modules/3.12.2-gentoo-JeR/video/nvidia.ko needs unknown symbol debug_smp_processor_id
depmod: WARNING: /lib/modules/3.12.2-gentoo-JeR/video/nvidia.ko needs unknown symbol sub_preempt_count

The symbol references are present in the nvidia.ko built against the previously installed kernel, while apparently nvidia.ko (or probably more precisely nv-kernel.o) hides these symbols. After reinstalling nvidia-drivers, this is magically corrected for.

The main difference in the .config is that I played with enabling/disabling CONFIG_TRACING, but since that enables/disables some other dependent options on its own, I can't be sure which is triggering the behaviour we see. Also, the attached .configs don't agree on CONFIG_TRACING itself.
*** Bug 494212 has been marked as a duplicate of this bug. ***
With nvidia-drivers-319.76 and the quite old kernel 3.5.7, whose .config has "# CONFIG_BSD_PROCESS_ACCT is not set", I can reproduce the initial problem related to bogofilter. Thus it seems there is no working driver in portage now.
The same problem with 331.20
Created attachment 365380 [details] Non-working .config for 3.5.7 with CONFIG_BSD_PROCESS_ACCT=n
Could we please get 325.15 back in tree, as this is the last driver that works? Removing this leaves KDE users with only non-working drivers ... Kind regards
(In reply to Christian Loosli from comment #65) > Could we please get 325.15 back in tree, as this is the last driver that > works? > > Removing this leaves KDE users with only non-working drivers ... I am running KDE just fine with newer nvidia-drivers.
(In reply to Jeroen Roovers from comment #66)
> (In reply to Christian Loosli from comment #65)
> > Could we please get 325.15 back in tree, as this is the last driver that
> > works?
> >
> > Removing this leaves KDE users with only non-working drivers ...
>
> I am running KDE just fine with newer nvidia-drivers.

I'm not. The bug does not occur for everyone (also, it is not KDE specific). Actually, nothing newer than 319.49, which has also been removed from the tree, is working for me.

When the bug is fixed in a new driver, I'd presume this bug will be marked FIXED. Until then, I'd like to suggest that the last versions that seem to work not be removed from the tree, as removing them causes unnecessary hassle for people hit by this bug.

Cheers!
*** Bug 494618 has been marked as a duplicate of this bug. ***
I have this issue as well, but it appears *very* sporadically, and whether or not I get this bug seems to be decided at boot time (once I have booted and I don't see any curiously blocked signals, I'm good at least until the next time I boot). 4 out of 5 times my machine boots fine, so it's hard for me to really pin this down on any particular kernel option. I initially thought toggling CONFIG_BSD_PROCESS_ACCT did help, but numerous reboots down the road this issue seemed to hit me again.
The nvidia driver I am using is version 319.49. If I upgrade to a newer version I suffer these problems:

- VMWare Workstation does not start virtual machines, giving the error "Cannot find a valid peer process to connect to".
- Some wine/crossover applications (e.g. DVDFab) do not start at all.
- SMPlayer/mplayer hangs when quitting the application or, if I am watching TV, when I change channel.
- Mono applications (e.g. Keepass) do not start at all.

A workaround for these problems is to start the applications from the terminal (very tedious).

Another issue with drivers > 319.49 is that SLI does not work, giving the error "trouble accessing pci config space".

I apologize for my English.
Adding myself to the bug list. Mainly had issues with KDE freezing even after a laptop suspend-to-RAM.
I had kernel 3.11.6 with CONFIG_BSD_PROCESS_ACCT=y. nvidia-drivers-319.76 worked and 331.20 had all problems already listed in this bug report. Set CONFIG_BSD_PROCESS_ACCT=n. Still had the same problem with 331.20. Switched to kernel 3.12.6 (CONFIG_BSD_PROCESS_ACCT=n) and nvidia-drivers-331.20. So far it seems to work. No delay when opening dolphin or the save-as dialog in KDE apps. Emerged grep and it did not hang on recompile-pattern like it did before.
(In reply to Markus Strobl from comment #72)
>
> Switched to kernel 3.12.6 (CONFIG_BSD_PROCESS_ACCT=n) and
> nvidia-drivers-331.20. So far it seems to work. No delay when opening
> dolphin or the save-as dialog in KDE apps. Emerged grep and it did not hang
> on recompile-pattern like it did before.

I wish I was as lucky...

$ uname -a
Linux kimura 3.12.6-gentoo #1 SMP Sun Dec 22 10:19:31 GMT 2013 x86_64 Intel(R) Core(TM)2 CPU E8400 @ 3.00GHz GenuineIntel GNU/Linux

$ eix -Ic nvidia-drivers
[I] x11-drivers/nvidia-drivers (331.20@12/22/13): NVIDIA X11 driver and GLX libraries
[1] "Personal overlay" /usr/portage/local

$ tail -n 30 /var/tmp/portage/sys-apps/grep-2.15-r1/temp/build.log
checking whether mbrtowc handles a NULL pwc argument... (cached) yes
checking whether mbrtowc handles a NULL string argument... (cached) yes
checking whether mbrtowc has a correct return value... (cached) yes
checking whether mbrtowc returns 0 when parsing a NUL character... (cached) yes
checking whether mbrtowc handles incomplete characters... (cached) yes
checking whether mbrtowc works as well as mbtowc... (cached) yes
checking whether mbrtowc handles incomplete characters... (cached) yes
checking whether mbrtowc works as well as mbtowc... (cached) yes
checking whether mbsrtowcs works... yes
checking for mempcpy... (cached) yes
checking for memrchr... yes
checking whether YESEXPR works... yes
checking for obstacks... yes
checking whether open recognizes a trailing slash... yes
checking for opendir... yes
checking for perl5.005 or newer... yes
checking whether frexp works... (cached) yes
checking whether ldexp can be used without linking with libm... yes
checking whether frexpl() can be used without linking with libm... (cached) yes
checking whether frexpl works... (cached) yes
checking whether frexpl is declared... (cached) yes
checking whether ldexpl() can be used without linking with libm... yes
checking whether ldexpl works... yes
checking whether ldexpl is declared... (cached) yes
checking whether program_invocation_name is declared... yes
checking whether program_invocation_short_name is declared... yes
checking for readdir... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible realloc... yes
checking for working re_compile_pattern...

...and it's just hung there. grub2 hangs in the same manner.
(In reply to Neil from comment #73)
> I wish I was as lucky...
> ...and its just hung there. grub2 hangs in the same manner.

You may be interested in the following (duplicate) bug:
https://bugs.gentoo.org/show_bug.cgi?id=490496

It seems to depend on the ebuild. I only had this problem with grub2, not with other ebuilds.

Neil, maybe you should try a version <=319.32.
(In reply to scrimekiler from comment #74)
> (In reply to Neil from comment #73)
> > I wish I was as lucky...
> > ...and its just hung there. grub2 hangs in the same manner.
>
> You may be interested in the following (duplicate) bug :
> https://bugs.gentoo.org/show_bug.cgi?id=490496
>
> It seems it depends of the ebuilds.I had only this problem with grub2, not
> with other ebuilds
>
> Neil, maybe you should try with version <=319.32

Thanks for the suggestion; unfortunately I've now got a blocker from chromium on <=x11-drivers/nvidia-drivers-331.20...

# emerge -a nvidia-drivers
[ebuild UD] x11-drivers/nvidia-drivers-319.76 [331.20] USE="X acpi (-multilib) -pax_kernel tools"
[blocks B ] <x11-drivers/nvidia-drivers-331.20 ("<x11-drivers/nvidia-drivers-331.20" is blocking www-client/chromium-32.0.1700.68)

 * Error: The above package list contains packages which cannot be
 * installed at the same time on the same system.

(x11-drivers/nvidia-drivers-319.76::gentoo, ebuild scheduled for merge) pulled in by
  x11-drivers/nvidia-drivers required by @selected
  nvidia-drivers
  x11-drivers/nvidia-drivers required by (x11-base/xorg-drivers-1.15::gentoo, installed)

(www-client/chromium-32.0.1700.68::gentoo, installed) pulled in by
  www-client/chromium required by @selected

319.76 is the current 319.* version in portage....

# eix nvidia-drivers
[D] x11-drivers/nvidia-drivers
  Available versions: (~)96.43.09^s[1] 96.43.23^msd 173.14.39^msd 304.116^msd (~)304.117^msd 319.76^msd [m]331.20^msd {+X acpi custom-cflags gtk multilib pax_kernel (+)tools KERNEL="FreeBSD linux"}
  Installed versions: 331.20^msd(10:23:37 22/12/13)(X acpi tools -multilib -pax_kernel KERNEL="linux -FreeBSD")
  Homepage: http://www.nvidia.com/
  Description: NVIDIA X11 driver and GLX libraries

It's not a game stopper (yet!) and if I've understood this bug report correctly (which may not be the case, as I feel slightly confused) then it's something that needs sorting upstream by nvidia, and I am happy to wait (but also to try out solutions suggested here in the meantime).
Please ignore everything I said in comment #69, my conclusions were tainted by not having the nvidia opengl libs selected (with 'eselect opengl'). It does seem however that switching to the xorg-x11 option makes the corrupt/weird signal mask problem go away. Not really a usable work-around for those on Gnome since that means no 3D accel, but still better than hanging terminals and crashing/stuck applications and systemd getting stuck when trying to reboot/shutdown.
And.. I forgot to add: this problem only occurs on a machine with a GTX275, I have another, newer machine that has a GTX780, which does not have this problem at all. When I get some time I will see if swapping the GPUs changes anything and if not, see if I can spot some differences in the kernel config and installed software.
Nvidia seems to have a fix soon. See post 13 in https://devtalk.nvidia.com/default/topic/638521/linux/gnome-terminal-problems-ctrl-c-and-exit/
I confirm that I had the same problem on a new system that I just worked on. For whatever reason, this Portage tree didn't have version 319.49 so I made an overlay for it. I also had to mask 319.76 which also seems to be affected by this problem. I experienced the same issues with zombie processes including nepomuk, virtuoso-t, and kwin_opengl_test. I also was unable to get gettext to compile; it also got stuck on the sleep step during the configure process. Thank you for posting this bug - I was going around in circles trying to get these figured out. I wasn't able to find the bug until I came across the gettext compile issue. Rolling back to v319.49 resolved these. Hopefully NVidia will be able to develop a fix. If I can provide anything that would help then please let me know. Thank you again.
^c is now working for me after rebuilding kernel without BSD_PROCESS_ACCT. Also I had some random system freezes (reset button was the only thing that helped), but don't know if it is related to this bug, will see if they occur again.
Sorry, what I said about ^C working wasn't really true. After a reboot it stopped working again; it seems that it doesn't work when Konsole is started by the KDE session manager. Restarting Konsole made it work again (but it didn't work that way with BSD_PROCESS_ACCT enabled, so that option seems to have some influence). I'm using sys-kernel/gentoo-sources-3.11.10 and x11-drivers/nvidia-drivers-331.20.
Try the latest 3.12 kernel. I had the same issue with 3.11. 3.12.6 (CONFIG_BSD_PROCESS_ACCT=n) and nvidia-drivers-331.20 has been OK for me for 2 weeks now.
Created attachment 366974 [details] non-working .config with CONFIG_BSD_PROCESS_ACCT disabled

I've tested CONFIG_BSD_PROCESS_ACCT=n with gentoo-sources-3.12.6 and nvidia-drivers-331.20 on an NVIDIA GTX560m; however, it does not fix the issue.
Created attachment 366976 [details] Working config for 3.12.6 So far I have not encountered any issues with this config and nvidia-drivers-331.20
*** Bug 497000 has been marked as a duplicate of this bug. ***
I get the same error with 319.76 (and it does not seem possible for me to use the GeForce 650M with version 304).

$ ps -eda -o pid,ppid,blocked,comm | grep -v 00000
  PID  PPID          BLOCKED COMMAND
  275     1 fffffffe7ffbfeff udevd
 9107     1 00007f847f9b5c00 kded4
 9162     1 00007f847f9b5800 bluedevil-monol
 9196  9104 00007ffe750a90f8 kio_trash
 9214  9206 00007f8c7868f070 kwin
 9226  9223 00007f847f9b5c00 ksysguardd
 9235  9104 00007ffe750a90f8 kio_trash
 9292  9104 00007ffe750a8410 konqueror
 9391  9104 00007ff845816440 pidgin
 9659  9104 00007ffe750a90f8 kio_trash

It is even worse if anything at the kdeinit4 level gets touched. It makes stuff like ^C stop working properly in Konsole.
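The `grep -v 00000` filter used above is a rough heuristic: it drops every line containing five consecutive zeros, so a mask such as 0000000000010000, which does have a signal blocked, would be hidden as well. A minimal sketch of a stricter filter follows, assuming Python 3 and the four-column `pid,ppid,blocked,comm` layout of that command. `nonzero_blocked` is a hypothetical helper name; the first two sample rows are copied from the output above and the last two are made up to show the edge cases.

```python
def nonzero_blocked(ps_output):
    """Return (pid, ppid, mask, comm) for rows whose blocked mask is not zero.

    Expects text in the layout printed by
    `ps -eda -o pid,ppid,blocked,comm`; the header row is skipped.
    """
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header or malformed line
        pid, ppid, mask, comm = fields
        try:
            value = int(mask, 16)
        except ValueError:
            continue  # mask column is not hexadecimal
        if value != 0:
            rows.append((int(pid), int(ppid), mask, comm.strip()))
    return rows


# Rows 275 and 9107 come from the output above; 9998 and 9999 are
# invented examples (one all-zero mask, one mask that grep would hide).
SAMPLE = """\
  PID  PPID          BLOCKED COMMAND
  275     1 fffffffe7ffbfeff udevd
 9107     1 00007f847f9b5c00 kded4
 9999     1 0000000000000000 example
 9998     1 0000000000010000 example2
"""

for row in nonzero_blocked(SAMPLE):
    print(row)
```

Comparing the mask numerically rather than by substring keeps the row for the made-up PID 9998, which the grep filter would have discarded.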
For the record I extended the mask in the kde profile to cover all 319* and 331* versions.
I tried to use 304.* with my GeForce 650M (MacBook Pro machine) and got errors starting up. It does not initialise the card. This might be a compatibility issue. [ 5.500558] NVRM: failed to copy vbios to system memory.
I confirm the problem. x11-drivers/nvidia-drivers-331.20 was built with the following: USE="X acpi (multilib) tools -pax_kernel" sys-kernel/gentoo-sources-3.10.17 was built with the following: USE="-build -deblob -experimental -symlink"
BTW, is this really a KDE issue WRT masking? It seems to have been observed on KDE systems at first, but if it's the signal mask on the X server, it really shouldn't be limited to KDE.

Anyway, a heads-up: the 331.38 driver just came out:
http://www.nvidia.com/download/driverResults.aspx/72250/en-us

According to https://devtalk.nvidia.com/default/topic/638521/linux/gnome-terminal-problems-ctrl-c-and-exit/ it's supposed to be fixed there. I'm going to try it out right now.

Maybe it should be bumped in unstable so more people will test it?
OK, so:

# modinfo nvidia
filename: /lib/modules/3.12.6-gentoo_wald/video/nvidia.ko
alias: char-major-195-*
version: 331.38
[...]

# cat /proc/`pgrep X`/status | grep Sig
SigQ: 2/61376
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 00000001d18066cf

Is this it? Looks OK to me. Please bump.
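For anyone unsure how to read those Sig* lines: each value is a hexadecimal bit mask in which bit n-1 stands for signal number n, and SigBlk is the one that matters for this bug. The following is a minimal sketch, assuming Python 3 on Linux, that decodes the values posted above. The function names are made up for illustration, and the symbolic names come from Python's `signal` module, so they reflect the platform the script runs on.

```python
import signal


def parse_sig_fields(status_text):
    """Extract the hexadecimal signal masks from /proc/<pid>/status text."""
    masks = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt"):
            masks[name] = int(value.strip(), 16)
    return masks


def signal_numbers(mask):
    """List the signal numbers whose bits are set in a mask."""
    return [n for n in range(1, 65) if mask & (1 << (n - 1))]


def signal_name(number):
    """Symbolic name where Python knows one, otherwise the plain number."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return "signal %d" % number


# Values copied from the comment above.
STATUS = """\
SigQ: 2/61376
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 00000001d18066cf
"""

masks = parse_sig_fields(STATUS)
blocked = [signal_name(n) for n in signal_numbers(masks["SigBlk"])]
ignored = [signal_name(n) for n in signal_numbers(masks["SigIgn"])]
print("blocked:", blocked)
print("ignored:", ignored)
```

For these values SigBlk decodes to an empty list and SigIgn to signal 13 (SIGPIPE on Linux), which matches the reading that nothing is blocked in the X server.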
Same for me:

filename: /lib/modules/3.12.7-gentoo/video/nvidia.ko
alias: char-major-195-*
version: 331.38
supported: external
license: NVIDIA

SigQ: 0/126391
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 00000001d18066cf

At this moment there are no defunct processes.
It seems that 331.38 fixes initial problem related with bogofilter freezes.
(In reply to Serge Gavrilov from comment #93) > It seems that 331.38 fixes initial problem related with bogofilter freezes. Perhaps =x11-drivers/nvidia-drivers-319.82 too?
I tried almost every version of x11-drivers/nvidia-drivers but cannot reproduce the problem; the output of "cat /proc/`pgrep X`/status | grep Sig" always shows:

SigQ: 0/31107
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 00000001d18066cf

No crashes or application hangs. And I'm using KDE with kmail/akonadi, really strange. Maybe this is somehow hardware dependent? (I'm using an old Thinkpad with an ancient nvidia card here.)
nvidia-drivers-331.38 fixes the problem for me (gentoo-sources-3.12.7, nvidia gtx560m)

# modinfo nvidia
filename: /lib/modules/3.12.7-gentoo/video/nvidia.ko
alias: char-major-195-*
version: 331.38
supported: external
license: NVIDIA

# cat /proc/`pgrep X`/status | grep Sig
SigQ: 0/128018
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000381000
SigCgt: 00000001d18066cf

I'm now able to use ctrl+c and my system is running without any defunct processes.
331.38 seems to work fine for me.

Note that the exact garbage in the sigmasks may vary, so the observed behaviour may differ a lot (blocked INT and CHLD are described above; my first experience with the bug was a blocked ALRM, which is funny too). A better way to test the current state is to look at the statistics from this command:

ps ax --no-headings -o sigmask | sort | uniq -c

Run it with old nvidia or non-nvidia drivers and compare with the current statistics.
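The comparison suggested above can be automated. This is a minimal sketch, assuming Python 3 and that the two `ps` outputs have been saved as text with one hexadecimal mask per line; `mask_histogram` and `new_masks` are hypothetical helper names, and the sample data is illustrative only (the masks appear elsewhere in this bug, the process counts are made up).

```python
from collections import Counter


def mask_histogram(ps_output):
    """Count how many processes share each signal mask.

    Expects one hexadecimal mask per line, as printed by
    `ps ax --no-headings -o sigmask`.
    """
    return Counter(
        line.strip() for line in ps_output.splitlines() if line.strip()
    )


def new_masks(baseline, current):
    """Masks seen in the current run that never occurred in the baseline."""
    return {
        mask: count for mask, count in current.items() if mask not in baseline
    }


# Baseline: e.g. taken with a known-good driver. Counts are invented.
baseline = mask_histogram(
    "0000000000000000\n0000000000000000\nfffffffe7ffbfeff\n"
)
# Current: taken with the driver under test. Counts are invented.
current = mask_histogram(
    "0000000000000000\nfffffffe7ffbfeff\n"
    "00007f847f9b5c00\n00007f847f9b5c00\n"
)

print(new_masks(baseline, current))
```

Any mask reported by `new_masks` is a candidate for corruption, although a legitimately new program with its own mask would show up there too, so the result still needs a human look.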
331.38 installed.

$ ps -eda -o pid,ppid,blocked,comm | grep -v 00000
  PID  PPID          BLOCKED COMMAND
  275     1 fffffffe7ffbfeff udevd
(In reply to Andrew Udvare from comment #98)
> 331.38 installed.
>
> $ ps -eda -o pid,ppid,blocked,comm | grep -v 00000
> PID PPID BLOCKED COMMAND
> 275 1 fffffffe7ffbfeff udevd

This is OK. At least I have the same number on an intel-only laptop. I guess it blocks all signals and unblocks only few of them actually needed.
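The guess about udevd can be checked by listing which bits are clear in its mask. A minimal sketch, assuming Python 3 and the 64 signal numbers of Linux on x86_64; `unblocked_signals` is a hypothetical helper name.

```python
def unblocked_signals(mask_hex, highest=64):
    """Signal numbers whose bit is clear in a blocked-signal mask.

    Bit n-1 of the mask corresponds to signal number n.
    """
    mask = int(mask_hex, 16)
    return [n for n in range(1, highest + 1) if not mask & (1 << (n - 1))]


# Mask reported for udevd in the ps output above.
print(unblocked_signals("fffffffe7ffbfeff"))
```

For udevd's mask this yields 9, 19, 32 and 33. On Linux x86, 9 and 19 are SIGKILL and SIGSTOP, which can never be blocked, and 32 and 33 are reserved by glibc for its threading implementation. That suggests udevd simply blocks every signal it is allowed to block, which is consistent with the same value showing up on a machine without nvidia hardware.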
*** Bug 490256 has been marked as a duplicate of this bug. ***
https://devtalk.nvidia.com/default/topic/690793 = Linux, Solaris, and FreeBSD driver 331.49 (long-lived branch release) = [...] "Fixed a bug which could sometimes corrupt a newly-created thread's signal mask in multi-threaded applications that load libGL."
Hi!

I've been running 334.16-r5 (and some previous versions, including 331.38 and perhaps later ones; I could check my emerge logs) seemingly without problems, though I haven't got around to checking the sigmask as described above (I need my computer, so I don't want to install the broken drivers).

If I understand correctly, though, seemingly correct operation might not be enough to determine whether this is indeed fixed (maybe it's just a coincidence, or different circumstances cause a different kind of corruption that might go unobserved...).

In any case, I can confirm that newer drivers seem to work for me, and if others can confirm too, maybe this bug can be CLOSED FIXED?

Cheers!
I believe this became obsolete a long time ago.