Bug 739490

Summary: x11-drivers/nvidia-drivers-450.66 hangs system
Product: Gentoo Linux
Component: Current packages
Version: unspecified
Hardware: All
OS: Linux
Status: RESOLVED OBSOLETE
Severity: normal
Priority: Normal
Reporter: Alex Efros <powerman-asdf>
Assignee: David Seifert <soap>
CC: harrisl, ionen, stig
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: nvidia-drivers-450.66.build.log.bz2
             emerge nvidia-drivers-455.38 build log against gentoo-sources 5.4.72

Description Alex Efros 2020-08-29 11:40:17 UTC
After updating x11-drivers/nvidia-drivers from 450.57-r1 to 450.66, the system hangs at boot when Xorg starts (it doesn't even respond to the reset button; only holding the power button for 5 seconds works). I had to downgrade to 440.100-r2 because 450.57-r1 had already been removed from Portage.

The Xorg.0.log contents for 450.66 and 440.100-r2 are mostly the same, except that the 450.66 output just stops at some point:

-----
--- Xorg-450.66.log	2020-08-29 14:37:42.210895666 +0300
+++ Xorg-440.100.log	2020-08-29 14:37:29.786895293 +0300
@@ -12,7 +12,7 @@
  Markers: (--) probed, (**) from config file, (==) default setting,
 	(++) from command line, (!!) notice, (II) informational,
 	(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
- (==) Log file: "/var/log/Xorg.0.log", Time: Sat Aug 29 12:50:47 2020
+ (==) Log file: "/var/log/Xorg.0.log", Time: Sat Aug 29 12:56:05 2020
  (==) Using config directory: "/etc/X11/xorg.conf.d"
  (==) Using system config directory "/usr/share/X11/xorg.conf.d"
  (==) No Layout section.  Using the first Screen section.
@@ -37,7 +37,7 @@
  (==) ModulePath set to "/usr/lib64/xorg/modules"
  (II) The server relies on udev to provide the list of input devices.
 	If no devices become available, reconfigure udev or disable AutoAddDevices.
- (II) Loader magic: 0x560ebdf5cce0
+ (II) Loader magic: 0x5591756e9ce0
  (II) Module ABI versions:
  	X.Org ANSI C Emulation: 0.4
  	X.Org Video Driver: 24.1
@@ -59,7 +59,7 @@
  (II) Module nvidia: vendor="NVIDIA Corporation"
  	compiled for 1.6.99.901, module version = 1.0.0
  	Module class: X.Org Video Driver
- (II) NVIDIA dlloader X Driver  450.66  Wed Aug 12 19:44:12 UTC 2020
+ (II) NVIDIA dlloader X Driver  440.100  Fri May 29 08:21:27 UTC 2020
  (II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
  (II) Loading sub module "fb"
  (II) LoadModule: "fb"
@@ -88,7 +88,7 @@
  (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
  	compiled for 1.6.99.901, module version = 1.0.0
  	Module class: X.Org Server Extension
- (II) NVIDIA GLX Module  450.66  Wed Aug 12 19:41:37 UTC 2020
+ (II) NVIDIA GLX Module  440.100  Fri May 29 08:19:01 UTC 2020
  (II) NVIDIA: The X server supports PRIME Render Offload.
  (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:1:0:0
  (--) NVIDIA(0):     DFP-0 (boot)
@@ -127,3 +127,183 @@
  (II) NVIDIA: Using 24576.00 MB of virtual memory for indirect memory
  (II) NVIDIA:     access.
  (II) NVIDIA(0): Setting mode "DFP-0:1680x1050"
+ (==) NVIDIA(0): Disabling shared memory pixmaps
+ (==) NVIDIA(0): Backing store enabled
+ (==) NVIDIA(0): Silken mouse enabled
+ (==) NVIDIA(0): DPMS enabled
+ (II) Loading sub module "dri2"
+ (II) LoadModule: "dri2"
+ (II) Module "dri2" already built-in
+ (II) NVIDIA(0): [DRI2] Setup complete
+ (II) NVIDIA(0): [DRI2]   VDPAU driver: nvidia
.....
-----


My video card is GeForce GTX 1060 3GB.


Portage 2.3.103 (python 3.7.8-final-0, default/linux/amd64/17.1/hardened, gcc-9.3.0, glibc-2.31-r6, 5.4.60-gentoo x86_64)
=================================================================
System uname: Linux-5.4.60-gentoo-x86_64-Intel-R-_Core-TM-_i7-2600K_CPU_@_3.40GHz-with-gentoo-2.6
KiB Mem:    24585736 total,  14460896 free
KiB Swap:    8388604 total,   8388604 free
Timestamp of repository gentoo: Sat, 29 Aug 2020 11:00:01 +0000
Head commit of repository gentoo: 0df649292b6972fd4a177f8429ba103e6d28a241
sh bash 5.0_p18
ld GNU ld (Gentoo 2.33.1 p2) 2.33.1
ccache version 3.7.10 [enabled]
app-shells/bash:          5.0_p18::gentoo
dev-java/java-config:     2.3.1::gentoo
dev-lang/perl:            5.30.3::gentoo
dev-lang/python:          2.7.18-r1::gentoo, 3.7.8-r2::gentoo, 3.8.5::gentoo
dev-util/ccache:          3.7.10::gentoo
dev-util/cmake:           3.16.5::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/sandbox:         2.18::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r4::gentoo
sys-devel/automake:       1.13.4-r2::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils:       2.33.1-r1::gentoo
sys-devel/gcc:            9.3.0-r1::gentoo
sys-devel/gcc-config:     2.3.1::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.4-r1::gentoo (virtual/os-headers)
sys-libs/glibc:           2.31-r6::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-max-age: 24
    sync-rsync-extra-opts: 
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-jobs: 1

local
    location: /usr/local/portage
    masters: gentoo
    priority: 0

powerman
    location: /home/powerman/proj/gentoo/powerman-overlay
    masters: gentoo
    priority: 50

steam-overlay
    location: /var/lib/layman/steam-overlay
    sync-type: laymansync
    sync-uri: https://github.com/anyc/steam-overlay.git
    masters: gentoo
    priority: 50

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /service /usr/inferno/keydb /usr/inferno/lib /usr/inferno/services /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/easy-rsa /usr/share/gnupg/qualified.txt /usr/share/i2p/scripts /var/log"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/usr/portage-distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps=y --autounmask --autounmask-write --alert=y"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-march=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs ccache clean-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict strict-keepdir unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="http://mirrors.soeasyto.com/distfiles.gentoo.org/ http://gentoo.supp.name/ http://ftp.snt.utwente.nl/pub/os/linux/gentoo http://mirror.netcologne.de/gentoo/"
LANG="ru_RU.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en ru ru_RU"
MAKEOPTS="-j8"
PKGDIR="/usr/portage-packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi aes alac alsa amd64 avx bash-completion bluetooth branding bzip2 cairo caps cdda cddb cdr chm cli crypt cups dbus dga djvu dri dts dvb dvd dvdr egl eglfs elogind emboss encode exif fam ffmpeg flac fontconfig gallium gdbm gif gpg gtk hardened iconv icu id3tag idn ipv6 jpeg jpeg2k lcms libglvnd libnotify libtirpc mac mad matroska mmx mmxext mng mp3 mp4 mpeg mtp multilib musepack ncurses network-cron nls nptl nsplugin ogg opengl openmp opus pam pango pclmul pcre pdf perl pie png policykit popcnt ppds projectm qt5 readline rtc sdl seccomp spell split-usr sse sse2 sse3 sse4_1 sse4_2 ssl ssp ssse3 startup-notification svg tcpd theora tiff truetype udev udisks unicode upower usb vaapi vdpau vim-syntax vorbis wavpack wxwidgets x264 x265 xattr xcb xml xscreensaver xtpax xv xvid xvmc zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="log_config vhost_alias autoindex alias rewrite dir deflate filter mime negotiation auth_basic authn_file authz_host authz_user authz_groupfile cgi actions headers env setenvif authn_core authz_core unixd socache_shmcb access_compat" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64 pc" INPUT_DEVICES="evdev" KERNEL="linux" L10N="en ru" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NGINX_MODULES_HTTP="access auth_basic 
autoindex browser charset empty_gif fastcgi geo gzip limit_conn limit_req map memcached proxy referer rewrite scgi split_clients ssi upstream_ip_hash userid uwsgi addition fancyindex" OFFICE_IMPLEMENTATION="libreoffice" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_7" PYTHON_TARGETS="python2_7 python3_7" QEMU_SOFTMMU_TARGETS="x86_64 i386" QEMU_USER_TARGETS="x86_64 i386" RUBY_TARGETS="ruby25" USERLAND="GNU" VIDEO_CARDS="nvidia nouveau" XFCE_PLUGINS="clock trash" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Jeroen Roovers (RETIRED) gentoo-dev 2020-08-29 13:16:40 UTC
Observing that the entire system apparently stops responding, it should come as no surprise that the output to the Xorg log stops as well. Maybe the kernel panicked? Anything useful in dmesg?
Comment 2 Alex Efros 2020-08-29 14:59:11 UTC
(In reply to Jeroen Roovers from comment #1)
> Observing that the entire system apparently stops responding, it should come
> as no surprise that the output to the Xorg log stops as well. Maybe the
> kernel panicked? Anything useful in dmesg?

Nope. The kernel log just stops without any error or panic at the end; both versions of nvidia-drivers write nearly the same thing to the kernel log.
Comment 3 Jeroen Roovers (RETIRED) gentoo-dev 2020-08-30 08:24:27 UTC
Please attach the build log for x11-drivers/nvidia-drivers-450.66.
Comment 4 Alex Efros 2020-08-30 08:47:37 UTC
Created attachment 657458
nvidia-drivers-450.66.build.log.bz2
Comment 5 Jeroen Roovers (RETIRED) gentoo-dev 2020-08-30 09:05:50 UTC
(In reply to Alex Efros from comment #4)
> Created attachment 657458
> nvidia-drivers-450.66.build.log.bz2

Thanks.
Comment 6 Paweł Metelski 2020-09-06 22:54:11 UTC
Similar results for me: keyboard and mouse are dead and the network interface is dead (I can't ssh into the box), with both nvidia-drivers 450.66 and 440.100-r2, but only on gentoo-sources 5.4.60. On 5.4.48, with exactly the same config file, both driver versions run fine, with good performance in games etc. However, the kernel is not panicking: I can see my login screen with the clock's seconds counter still increasing.

I managed to get some dmesg output through syslog:
Sep  7 00:10:40 box kernel: [   26.071627] nvidia: module license 'NVIDIA' taints kernel.
Sep  7 00:10:40 box kernel: [   26.071629] Disabling lock debugging due to kernel taint
Sep  7 00:10:40 box kernel: [   26.086162] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
Sep  7 00:10:40 box kernel: [   26.086611] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Sep  7 00:10:40 box kernel: [   26.285961] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
Sep  7 00:10:40 box kernel: [   26.366579] EXT4-fs (dm-0): re-mounted. Opts: (null)
Sep  7 00:10:40 box kernel: [   26.502723] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]
Sep  7 00:10:40 box kernel: [   26.502819] caller _nv000908rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
Sep  7 00:10:40 box kernel: [   26.980927] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  440.100  Fri May 29 08:14:04 UTC 2020
Sep  7 00:10:40 box kernel: [   27.006523] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Sep  7 00:10:40 box kernel: [   27.006525] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Sep  7 00:10:40 box kernel: [   27.174497] nvidia-smi (4147) used greatest stack depth: 12560 bytes left
Sep  7 00:10:40 box kernel: [   31.362410] ip (5358) used greatest stack depth: 12192 bytes left
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ [... a long run of NUL bytes (^@) where the log was cut off ...]

Now, if I'm not mistaken, the current ebuild only supports gentoo-sources <5.4, so we should probably just wait for a new proprietary driver or stay on 4.19.141. Still, it would be interesting to know what is wrong.
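The ^@ sequences above are how pagers and editors such as less and vim display NUL bytes (0x00). After a hard freeze, a log file can contain a run of NUL bytes where space was allocated but the data never reached the disk, followed by the next boot's messages. The Python sketch below is only an illustration, not a Gentoo or syslog tool: `lines_before_nul_runs` and `strip_nuls` are hypothetical helper names, and the input is assumed to be a plain copy of the syslog file. It reports the last line written before each NUL run, which is usually the most interesting line when the system froze.

```python
import re
import sys
from typing import List, Tuple

# One or more consecutive NUL bytes (shown as ^@ by less and vim).
NUL_RUN = re.compile(rb"\x00+")


def lines_before_nul_runs(data: bytes) -> List[Tuple[str, int]]:
    """For each run of NUL bytes, return (last text line written before it, run length)."""
    results = []
    for match in NUL_RUN.finditer(data):
        end = match.start()
        # Ignore the newline that normally terminates the last real line.
        if end > 0 and data[end - 1:end] == b"\n":
            end -= 1
        # rfind returns -1 when there is no earlier newline, so start becomes 0.
        start = data.rfind(b"\n", 0, end) + 1
        last_line = data[start:end].decode("utf-8", errors="replace")
        results.append((last_line, match.end() - match.start()))
    return results


def strip_nuls(data: bytes) -> bytes:
    """Drop all NUL bytes so the log can be read and grepped as ordinary text."""
    return NUL_RUN.sub(b"", data)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The path is whatever copy of the syslog file you saved (assumption).
    with open(sys.argv[1], "rb") as log_file:
        log = log_file.read()
    for line, run_length in lines_before_nul_runs(log):
        print(f"{run_length} NUL bytes after: {line}")
```

Working on raw bytes rather than text keeps the NUL runs intact until they have been located; decoding happens only for the single line that gets reported.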
Comment 7 Harris Landgarten 2020-09-20 17:31:25 UTC
Same here, but ssh still works. Kernel 5.8.10. X shows 100% CPU, and there is a stack trace of an nvidia-drivers crash in the logs. The only recovery is a cold boot.

The system starts GNOME and runs for a while, but then locks up with no mouse or keyboard, and it will not reboot with systemctl reboot from ssh.
Comment 8 Paweł Metelski 2020-09-22 03:14:49 UTC
(In reply to Harris Landgarten from comment #7)
> same here but ssh still working.
Perhaps you can access the accurate log files then, i.e. dmesg and Xorg? I'm afraid my system goes down before it can write them to disk. I have a serial port too, but I'm not desperate enough yet to use it for this investigation.
Also, please vote on this bug to get attention.
Comment 9 Harris Landgarten 2020-09-22 03:20:25 UTC
I am having this issue with 455.23.04. This is the kernel oops it causes:

NVRM: GPU at PCI:0000:04:00: GPU-11502392-bb1d-6042-b964-805668887312
Sep 20 10:19:38 harrisl-desktop.landgarten.local kernel: NVRM: Xid (PCI:0000:04:00): 31, pid=8208, Ch 00000068, intr 10000000. MMU Fault: ENGINE MSPDEC HUBCLIENT_MSPDEC faulted >
Sep 20 10:19:38 harrisl-desktop.landgarten.local kernel: BUG: kernel NULL pointer dereference, address: 00000000000003a8
Sep 20 10:19:38 harrisl-desktop.landgarten.local kernel: #PF: supervisor read access in kernel mode
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: #PF: error_code(0x0000) - not-present page
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: PGD 0 P4D 0 
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: Oops: 0000 [#1] SMP PTI
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: CPU: 7 PID: 631 Comm: irq/50-nvidia Tainted: P          IO    T 5.8.10-gentoo #1
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: Hardware name:  /DX58SO2, BIOS SOX5820J.86A.0920.2013.0729.0042 07/29/2013
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: RIP: 0010:_nv018304rm+0x0/0x20 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: Code: 48 89 ca 44 8b 44 24 10 48 8d 4d 0c 48 8b 87 f8 03 00 00 e8 e2 b6 46 e1 48 83 c4 08 48 83 c5 10 c3 66 0f 1f 84 00 >
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: RSP: 0018:ffffc90000b43bd0 EFLAGS: 00010246
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: RAX: ffffffffa09955f0 RBX: ffff8885f36c8008 RCX: 0000000000000000
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: RDX: ffff8885f3a02bb8 RSI: ffff8885fcb08008 RDI: ffff8885f36c8008
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: RBP: ffff8885f3a02b50 R08: 0000000000000000 R09: 00000000718b3d00
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: R10: 0000000000000001 R11: ffffffffffffffff R12: ffff8885fcb08008
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: R13: 0000000000000000 R14: 00000000718b3d00 R15: ffff8885f39a0808
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: FS:  0000000000000000(0000) GS:ffff888617bc0000(0000) knlGS:0000000000000000
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: CR2: 00000000000003a8 CR3: 00000004316fa001 CR4: 00000000000206e0
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel: Call Trace:
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv029707rm+0x207/0x860 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv035039rm+0x296/0x530 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv034992rm+0x6ea/0xf20 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv034993rm+0xd52/0xd90 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv018254rm+0x219/0x3e0 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv018315rm+0x46a/0x6b0 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv018072rm+0x1a2/0x1d0 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv026016rm+0x10/0x10 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv018321rm+0x1f2/0x2d0 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv026016rm+0x10/0x10 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv018354rm+0xac/0xe0 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv027674rm+0x820/0xdc0 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv007560rm+0x155/0x270 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv027682rm+0x8d/0x180 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? _nv000711rm+0xa9/0x200 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? disable_irq_nosync+0x10/0x10
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? rm_isr_bh+0x1c/0x60 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? nvidia_isr_kthread_bh+0x1b/0x40 [nvidia]
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? irq_thread_fn+0x20/0x60
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? irq_thread+0xdb/0x180
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? irq_thread_check_affinity+0x80/0x80
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? irq_forced_thread_fn+0x80/0x80
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? kthread+0x11b/0x140
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Sep 20 10:19:39 harrisl-desktop.landgarten.local kernel:  ? ret_from_fork+0x22/0x30
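The oops above starts right after an NVRM Xid line, which is how the NVIDIA kernel module reports GPU errors in the kernel log; here it reports Xid 31 together with an MMU fault. When comparing reports from several machines, it can help to pull these lines out automatically. A minimal Python sketch, assuming the line format seen in this bug (the regular expression is inferred from this log, not from NVIDIA documentation, and `find_xid_events`, `has_null_deref` and `XidEvent` are hypothetical names):

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Pattern inferred from the log in this bug, e.g.
#   kernel: NVRM: Xid (PCI:0000:04:00): 31, pid=8208, Ch 00000068, ...
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+)(?:, pid=(\d+))?")
NULL_DEREF = "BUG: kernel NULL pointer dereference"


@dataclass
class XidEvent:
    pci: str            # e.g. "PCI:0000:04:00"
    xid: int            # numeric Xid code, e.g. 31
    pid: Optional[int]  # pid=... field, if present on the line
    line: str           # the full original log line


def find_xid_events(lines: Iterable[str]) -> List[XidEvent]:
    """Collect every NVRM Xid report found in the given log lines."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            pid = int(match.group(3)) if match.group(3) else None
            events.append(XidEvent(match.group(1), int(match.group(2)), pid, line.rstrip("\n")))
    return events


def has_null_deref(lines: Iterable[str]) -> bool:
    """True if the log contains a kernel NULL pointer dereference oops."""
    return any(NULL_DEREF in line for line in lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        log_lines = log_file.read().splitlines()
    for event in find_xid_events(log_lines):
        print(f"Xid {event.xid} on {event.pci} (pid {event.pid})")
    if has_null_deref(log_lines):
        print("kernel NULL pointer dereference found")
```

For example, on a systemd machine the previous boot's kernel messages can be saved with journalctl -k -b -1 > kernel.log and passed to the script as its only argument.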
Comment 10 Paweł Metelski 2020-11-07 10:54:39 UTC
Similar results for kernels 5.4.66 and 5.4.72. Meanwhile I'm using nvidia 455.38 with 5.4.48 just fine. Some random info: I don't use systemd, I use multiple monitors, I use genkernel to build the image.

Attaching the build log for nvidia-drivers 455.38 against gentoo-sources 5.4.72; maybe some of these warnings help:
 ‘__builtin_strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
 ‘GTimeVal’ is deprecated
 ‘GTypeDebugFlags’ is deprecated [-Wdeprecated-declarations]
 this statement may fall through [-Wimplicit-fallthrough=]
 #warning "Update libvdpau to version x.x" [-Wcpp]

I also have virtually identical hardware in another box and plan to update it beyond 5.4.38; I will report if the results differ. I'll also try downgrading libvdpau to 1.2 (currently using 1.3), as the last warning seems curious.
Comment 11 Paweł Metelski 2020-11-07 10:56:15 UTC
Created attachment 670307
emerge nvidia-drivers-455.38 build log against gentoo-sources 5.4.72
Comment 12 Paweł Metelski 2020-11-07 11:22:44 UTC
libvdpau-1.2 doesn't change a thing.

I ran X with the default config, and this way managed to reboot with SysRq and save the logs. Not much interesting stuff there:

kernel.log:
Nov  7 12:08:21 hal2 kernel: [  168.187150] resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]
Nov  7 12:08:21 hal2 kernel: [  168.187267] caller _nv000709rm+0x1af/0x200 [nvidia] mapping multiple BARs

Xorg.0.log:
[   167.545] (WW) Warning, couldn't open module glxservernvidia
[   167.545] (EE) Failed to load module "glxservernvidia" (module does not exist, 0)
[   168.590] (II) NVIDIA(0): ACPI: failed to connect to the ACPI event daemon; the daemon
[   168.590] (II) NVIDIA(0):     may not be running or the "AcpidSocketPath" X
[   168.590] (II) NVIDIA(0):     configuration option may not be set correctly.  When the
[   168.590] (II) NVIDIA(0):     ACPI event daemon is available, the NVIDIA X driver will
[   168.590] (II) NVIDIA(0):     try to use it to receive ACPI event notifications.  For
[   168.590] (II) NVIDIA(0):     details, please see the "ConnectToAcpid" and
[   168.590] (II) NVIDIA(0):     "AcpidSocketPath" X configuration options in Appendix B: X
[   168.590] (II) NVIDIA(0):     Config Options in the README.
Comment 13 Phyo Arkar Lwin 2020-11-07 18:34:57 UTC
Same problem here.
Tested drivers: 450.57 - 450.66
Tested kernels: 5.4.51 - 5.7.10
Frozen as soon as X starts.
No kernel panic.
No errors in the Xorg logs.
Comment 14 Phyo Arkar Lwin 2020-11-07 18:41:33 UTC
The same problem has been reported on Ubuntu too:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-450/+bug/1894454

It is confirmed there.
Comment 15 Paweł Metelski 2020-11-08 01:44:52 UTC
My issue is now resolved, 5.4.72 can work smoothly with nvidia-drivers-455.38. Since nobody helped me since September, I'm not sharing the solution, you can throw away the whole ticket for all I care.
Comment 16 Stig Nielsen 2020-11-08 09:39:30 UTC
@Pawel I am happy you solved your problem. I am writing a paper about open source and the entitlement people can feel to get their problems solved even if they haven't paid or otherwise contributed to the effort other people have to make to solve this particular problem. Will you allow me to use you as a case?
Comment 17 Phyo Arkar Lwin 2020-11-09 11:26:53 UTC
(In reply to Paweł Metelski from comment #15)
> My issue is now resolved, 5.4.72 can work smoothly with
> nvidia-drivers-455.38. Since nobody helped me since September, I'm not
> sharing the solution, you can throw away the whole ticket for all I care.

Wow. Just wow.
Comment 18 Paweł Metelski 2020-11-09 14:38:13 UTC
I don't understand what purpose these passive-aggressive comments serve. I am a professional software developer and my time costs money. As there is a serious lack of even bug reviewers, let alone assigned responsible package maintainers or subject matter experts, I feel entitled to complain on the Bugzilla service; after all, tens of sponsors have already paid for it. I considered a donation to the Gentoo project after resolving this issue, but now I guess I will have to spend it on myself as consolation for having to look for another, more actively maintained distro after 15 years with Gentoo. This bug has not even been confirmed after 2 months; this SLA is simply unacceptable for maintenance of core system components. Kernel 5.4.48 is already out of the stable Portage tree; what would I do if I had to reinstall it?

That said, I guess I can share a hint: the crash only occurs after lxdm/lightdm/gdm has been running for a few seconds, so there is just enough time to press Alt+SysRq+R and then Ctrl+Alt+F1 to access the current Xorg and dmesg logs. I suppose I can also share that running rc-config show --all may help affected users find the solution, but I'm done with further log collecting, describing my observations, etc.
Comment 19 Stig Nielsen 2020-11-09 14:55:35 UTC
@Pawel, let me just understand this. Your time is precious and costs money. You haven't paid anything to this project. Still, you expect someone other than you to solve this bug, for free, in their own spare time? You do know how open source projects work, right?

If I were you, I would write something along: Dear all, since September I have worked with this bug (feels like I did it alone) and I finally found the problem. The problem was ... The solution is ... Regards Pawel. 

I guarantee you that will give you a much better response next time you need help and you get that superior feeling of having solved something others didn't know how to. 

Anyway, I wish you all the best. 

Regards 
Stig Nielsen
Comment 20 Ionen Wolkens gentoo-dev 2021-03-02 23:04:27 UTC
I feel multiple different issues have been reported in this bug.

Some may be related to the NULL pointer dereference issue (supposedly fixed in 460.56, perhaps in .39 too) or to page allocation issues (fixed since Gentoo added a patch to 455.23.04, and later fixed by NVIDIA too), which makes it hard to tell.

The linked Ubuntu one is different, though; is that still relevant? (I don't have a Quadro card to tell.) Not that we can typically do much about these kinds of issues here until NVIDIA does something about them, besides keeping older drivers around a bit longer, although security vulnerabilities unfortunately prompted a cleanup.
Comment 21 Alex Efros 2021-03-03 00:04:37 UTC
I don't remember when I had such hang last time, probably 2-3 months ago. Right now I'm using 460.39-r1, so far so good.
Comment 22 Ionen Wolkens gentoo-dev 2021-03-03 00:14:27 UTC
(In reply to Alex Efros from comment #21)
> I don't remember when I had such hang last time, probably 2-3 months ago.
> Right now I'm using 460.39-r1, so far so good.
Thanks for reporting, and good to hear (hope it's working out for everyone else too).

If it happens again, feel free to re-open.