Bug 771750 - x11-misc/bumblebee-3.2.1_p20210112 not working correctly with x11-drivers/nvidia-drivers-460.39-r1
Product: Gentoo Linux
Component: Current packages
Assignee: Adam Feldman
Reported: 2021-02-20 17:25 UTC by Stefano
Modified: 2021-04-18 13:00 UTC

Description Stefano 2021-02-20 17:25:10 UTC
As per the summary, the current bumblebee version (3.2.1_p20210112) does not seem to work with the latest stable x11-drivers/nvidia-drivers-460.39-r1, whereas the same configuration works like a charm with 450.102.04.

When it malfunctions, bumblebee starts normally, but running optirun <whatever> hangs. If interrupted with Ctrl+C, it only prints an uninformative error message: "[WARN]Could not read data! Error: Bad file descriptor"

Relevant portion of messages:

Feb 20 17:54:14 tardis kernel: bbswitch: enabling discrete graphics
Feb 20 17:54:14 tardis kernel: nvidia: module license 'NVIDIA' taints kernel.
Feb 20 17:54:14 tardis kernel: Disabling lock debugging due to kernel taint
Feb 20 17:54:14 tardis kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 241
Feb 20 17:54:15 tardis kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  460.39  Thu Jan 21 21:54:06 UTC 2021
Feb 20 17:55:15 tardis kernel: udevd[2061]: worker [19609] /module/nvidia is taking a long time
Feb 20 17:57:15 tardis kernel: udevd[2061]: worker [19609] /module/nvidia timeout; kill it
Feb 20 17:57:15 tardis kernel: udevd[2061]: seq 2769 '/module/nvidia' killed
Feb 20 17:57:15 tardis kernel: udevd[2061]: worker [19609] terminated by signal 9 (Killed)
Feb 20 17:57:15 tardis kernel: udevd[2061]: worker [19609] failed while handling '/module/nvidia'
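
To put a number on how long the module load stalls before udevd gives up, the excerpt above can be fed to a small script. This is only a sketch: the function names, the assumed year (syslog timestamps carry none) and the substring matches are my own choices, and month names are parsed with the C/English locale in mind.

```python
import re
from datetime import datetime

# Classic syslog prefix: "Feb 20 17:54:15 <host> <rest of message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+(.*)$")

SAMPLE = [
    "Feb 20 17:54:14 tardis kernel: bbswitch: enabling discrete graphics",
    "Feb 20 17:54:15 tardis kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  460.39  Thu Jan 21 21:54:06 UTC 2021",
    "Feb 20 17:55:15 tardis kernel: udevd[2061]: worker [19609] /module/nvidia is taking a long time",
    "Feb 20 17:57:15 tardis kernel: udevd[2061]: worker [19609] /module/nvidia timeout; kill it",
]


def parse_line(line, year=2021):
    """Split a syslog line into (timestamp, message); None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # The year is an assumption because syslog lines do not record it.
    stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, match.group(2)


def module_load_stall(lines, year=2021):
    """Seconds from the NVRM load message to the udevd worker timeout, or None."""
    loaded = None
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        stamp, message = parsed
        if loaded is None and "NVRM: loading NVIDIA" in message:
            loaded = stamp
        elif loaded is not None and "/module/nvidia timeout" in message:
            return (stamp - loaded).total_seconds()
    return None


print(module_load_stall(SAMPLE))  # 180.0 for the excerpt above
```

For the log above this gives 180 seconds between the driver announcing itself and udevd killing its worker.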

The Xorg instance started by bumblebee stays hung and cannot be killed, even manually.

Any advice on how to debug this thing further would be helpful.

Reproducible: Always

$ emerge --info
Portage 3.0.13 (python 3.8.7-final-0, default/linux/amd64/17.1/desktop/plasma, gcc-9.3.0, glibc-2.32-r6, 5.10.12-gentoo x86_64)
System uname: Linux-5.10.12-gentoo-x86_64-Intel-R-_Core-TM-_i7-10750H_CPU_@_2.60GHz-with-glibc2.2.5
KiB Mem:    32627884 total,  25968356 free
KiB Swap:  131071996 total, 131071996 free
Timestamp of repository gentoo: Sat, 20 Feb 2021 16:30:01 +0000
Head commit of repository gentoo: 432c872d280240f814b1641cdb4f1560f54e4b46
sh bash 5.0_p18
ld GNU ld (Gentoo 2.35.1 p2) 2.35.1
ccache version 4.1 [disabled]
app-shells/bash:          5.0_p18::gentoo
dev-java/java-config:     2.3.1::gentoo
dev-lang/perl:            5.30.3::gentoo
dev-lang/python:          2.7.18-r6::gentoo, 3.8.7-r1::gentoo, 3.9.1-r1::gentoo
dev-util/ccache:          4.1::gentoo
dev-util/cmake:           3.18.5::gentoo
sys-apps/baselayout:      2.7::gentoo
sys-apps/openrc:          0.42.1-r1::gentoo
sys-apps/sandbox:         2.20::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r5::gentoo
sys-devel/automake:       1.13.4-r2::gentoo, 1.16.2-r1::gentoo
sys-devel/binutils:       2.35.1-r1::gentoo
sys-devel/gcc:            9.3.0-r2::gentoo
sys-devel/gcc-config:     2.3.2-r1::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.3::gentoo
sys-kernel/linux-headers: 5.4-r1::gentoo (virtual/os-headers)
sys-libs/glibc:           2.32-r6::gentoo

    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://
    priority: -1000
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-max-age: 24
    sync-rsync-verify-metamanifest: yes

    location: /var/lib/layman/steam-overlay
    masters: gentoo
    priority: 50

CFLAGS="-march=native -O2 -pipe"
CONFIG_PROTECT="/etc /etc/stunnel/stunnel.conf /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/themes/oxygen-gtk/gtk-2.0 /usr/share/themes/oxygen-gtk/gtk-3.0"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS=" rsync://"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
USE="X a52 aac acl acpi activities alsa amd64 archive arts async bash-completion berkdb branding bzip2 cairo cardbus cli crypt cryptsetup css cups curlwrappers dbus declarative dell dga dhcp directfb djbfft dri dts dvb dvd dvdr elogind emboss encode exif fat fbcon fbsplash fftw flac foomaticdb fortran fpx ftp gdbm gif glibc-omitfp gnutls gphoto2 gpm gs gtk gui gzip hostonly hpn hybrid-auth iconv icq icu idea ieee1394 imagemagick imap ios ipod iproute2 ipv6 irda jabber javascript john jpeg jpeg2k jumbo-build kde kipi kpathsea kwallet lapack laptop lcdfilter libglvnd libnotify libtirpc lilo lm_sensors mad mbox messages mime mmap mmx mmxext mng mozilla mp3 mp4 mpeg mplayer msn multilib mysqli nagios-dns ncurses nls nocd nptl nsplugin ntfs ntlm nvidia octave ogg opengl optimization optimized-qmake oscar pam pango pcapnav pcmcia pcntl pcre pda pdf perl phonon plasma pm-utils png policykit posix ppds pulseaudio python qml qt5 radius rdesktop readline replytolist samba scanner seccomp semantic-desktop sharedmem silc slp sms sockets sox spell split-usr sse sse2 sse3 sse4 sse4_1 sse4_2 sse4a ssl ssse3 startup-notification svg sysfs sysvipc tcpd threads tiff truetype udev udisks uefi unicode upower usb userlocales v4l v4l2 vcd vorbis widgets wifi winbind wps wxwidgets x264 xcb xcomposite xml xmlrpc xv xvid xvmc yahoo zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="hda-intel" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="evdev synaptics libinput v4l" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_8" PYTHON_TARGETS="python3_8" RUBY_TARGETS="ruby26" USERLAND="GNU" VIDEO_CARDS="i965 intel iris nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Comment 1 Ionen Wolkens gentoo-dev 2021-02-20 20:08:18 UTC
If I were to guess it's probably related to 460's reworked power management which is a major change since 450.xx, not that I use bumblebee so I can't help with that.
Comment 2 Stefano 2021-02-21 09:34:04 UTC
(In reply to Ionen Wolkens from comment #1)
> If I were to guess it's probably related to 460's reworked power management
> which is a major change since 450.xx, not that I use bumblebee so I can't
> help with that.

I have the same feeling but no proof. In case it helps, since the new driver comes with systemd stuff, I'll point out I'm not on systemd.
Comment 3 Pacho Ramos gentoo-dev 2021-02-23 15:19:40 UTC
Does bumblebee-3.2.1_p20190421-r1 still work for you with that setup?
Comment 4 Pacho Ramos gentoo-dev 2021-02-23 15:22:07 UTC
I am also unsure if this change could be related
Comment 5 Stefano 2021-02-26 15:44:23 UTC
So, I tried downgrading bumblebee and upgrading nvidia-drivers, and it seems to work. Then I tried upgrading bumblebee again... and the very same configuration that didn't work at all now works. I'll try to figure out what butterfly flapped their wings somewhere.
Comment 6 Stefano 2021-02-26 16:09:46 UTC
OK, the link between this bug and power management is getting more and more concrete. The factor that seems to decide whether everything works or not is whether the machine is booted with or without the AC adapter connected. Now I can dive into the mess of ACPI, bumblebee and laptop-mode for hours.
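
To correlate the failures with the adapter state, the state can be read from sysfs right after boot and noted next to each test result. A minimal sketch, assuming the kernel exposes the adapter under /sys/class/power_supply with type "Mains" and an "online" attribute; the function name and the `base` parameter are hypothetical, and the supply's directory name varies between machines, so it is not hard-coded.

```python
from pathlib import Path


def ac_online(base="/sys/class/power_supply"):
    """Return True/False for the first Mains supply found, None if there is none."""
    root = Path(base)
    if not root.is_dir():
        return None
    for supply in sorted(root.iterdir()):
        try:
            # Skip batteries, USB supplies and anything else that is not the adapter.
            if (supply / "type").read_text().strip() != "Mains":
                continue
            return (supply / "online").read_text().strip() == "1"
        except OSError:
            # Attribute missing or unreadable: try the next supply.
            continue
    return None


if __name__ == "__main__":
    print("AC adapter online:", ac_online())
```

Note that this reports the state at the time it is run, not at boot, so it only helps if the adapter is not plugged or unplugged in between.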
Comment 7 Stefano 2021-04-17 08:13:07 UTC
This was fixed somehow with the following versions:
Bumblebee 3.2.1_p20210112-r4
Nvidia-drivers 460.56

The only trick required: not starting the vgl service (i.e. commenting out the "want vgl" line in the init script, if using openrc).

I would suggest removing that dependency; it does not seem to be necessary.
Comment 8 Ionen Wolkens gentoo-dev 2021-04-17 08:26:04 UTC
(In reply to Stefano from comment #7)
> This was fixed somehow with the following versions:
> Bumblebee 3.2.1_p20210112-r4
> Nvidia-drivers 460.56
If you could, can you check if all good with =nvidia-drivers-460.67 as well? 460.67 ebuild changed many things but I'm not familiar with bumblebee setups to test, and then it'll be stable in roughly ~3 days if no issues.

Knowing whether there is a regression with 465.19.01 would be useful too, since it changed a few things wrt ACPI handling (not that I'm planning to stabilize the 465 branch anytime soon).
Comment 9 Stefano 2021-04-17 09:41:44 UTC
Disregard my previous optimistic assumption. This setup literally works or stops working across reboots.

I'll work on it some more.
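
Since the result changes from boot to boot, a probe that cannot itself hang makes it easier to collect one result per boot. A rough sketch: the `probe` name, the 30 second default and the use of glxinfo as the test program are all assumptions, and any command normally run under optirun would do. If the child ends up in an unkillable state, as the hung Xorg does, the cleanup after the timeout may still block.

```python
import subprocess


def probe(cmd, timeout=30.0):
    """Classify a command as 'ok', 'failed' or 'hung' using a hard timeout."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hung"
    except OSError:
        # The command could not be started at all (e.g. not installed).
        return "failed"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # glxinfo is only an example payload, not something the report prescribes.
    print("optirun:", probe(["optirun", "glxinfo"]))
```

Logging the output together with the driver version and the adapter state after each boot should show whether the failures follow a pattern.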
Comment 10 Stefano 2021-04-18 13:00:10 UTC
(In reply to Ionen Wolkens from comment #8)

> If you could, can you check if all good with =nvidia-drivers-460.67 as well?
> 460.67 ebuild changed many things but I'm not familiar with bumblebee setups
> to test, and then it'll be stable in roughly ~3 days if no issues.

It works (with the same random quirks I'm still trying to understand) just as well as 460.56.