Bug 919411 - screen freeze when switching X display
Summary: screen freeze when switching X display
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Ionen Wolkens
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2023-12-07 16:59 UTC by LABBE Corentin
Modified: 2024-01-20 10:55 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Description LABBE Corentin 2023-12-07 16:59:34 UTC
I have a PC running a separate X server for each user.
For example, :0 for me, :1 for my son, etc.

For at least a year, when switching from one screen to another (via ctrl+alt+F7 to F9), I have been getting screen freezes.

By screen freeze I mean that I see the destination X screen, but nothing on it moves; after waiting for some time (about 1 minute), everything starts working again.
I see the X process being CPU-hungry during that time.
The bug has evolved over time, changing after each reboot (perhaps related to nvidia-drivers upgrades).
For one period, the mouse still moved while everything else was frozen.
Very often, Firefox processes were killed (but with no message about it in dmesg).
I have no proof yet that this is related, but I think it is.

Since the last upgrade one week ago, I now get a black screen.

So today I started to debug more:

Xorg.log was flooded by a loop of:
[369557.964] (--) NVIDIA(GPU-0): 
[369558.008] (--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): connected
[369558.008] (--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): Internal TMDS
[369558.008] (--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): 600.0 MHz maximum pixel clock
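To gauge how badly a log is flooded by this loop, a small helper could count repeated messages after stripping the [timestamp] prefix. This is a minimal sketch assumed for illustration, not something from the driver or Xorg; the script name xorg_flood.py is hypothetical. With one X server per user, each display normally has its own log (assuming the default location, /var/log/Xorg.0.log for :0, /var/log/Xorg.1.log for :1, and so on).

```python
import re
import sys
from collections import Counter

# Xorg log lines start with a "[seconds.millis]" timestamp, e.g. "[369558.008] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def count_messages(lines):
    """Count how often each log message repeats, ignoring the timestamp prefix."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.rstrip("\n"))
        if message:
            counts[message] += 1
    return counts


def main(path):
    with open(path, encoding="utf-8", errors="replace") as log:
        counts = count_messages(log)
    # Print the ten most repeated messages; a flood shows up as very large counts.
    for message, n in counts.most_common(10):
        print(f"{n:8d}  {message}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        # Hypothetical script name; pass the log of the display that froze.
        print("usage: xorg_flood.py /var/log/Xorg.0.log")
    else:
        main(sys.argv[1])
```

If the loop quoted above is what fills the log, its four NVIDIA(GPU-0) messages should appear at the top with near-identical counts.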

I straced the X process, which was using 100% CPU:
6260  rt_sigreturn({mask=[]})           = 0
6260  write(4, "[370013.853] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): connected\n", 60) = 60
6260  write(4, "[370013.853] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): Internal TMDS\n", 64) = 64
6260  write(4, "[370013.854] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): 600.0 MHz maximum pixel clock\n", 80) = 80
6260  write(4, "[370013.854] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): \n", 21) = 21
6260  ioctl(16, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0, 0x10), 0x7ffdea8ef7b0) = 0
6260  ioctl(16, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0, 0x10), 0x7ffdea8ef710) = 0
6260  --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
6260  rt_sigreturn({mask=[]})           = 0
6260  write(4, "[370013.897] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): connected\n", 60) = 60
6260  write(4, "[370013.897] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): Internal TMDS\n", 64) = 64
6260  write(4, "[370013.897] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): Idek Iiyama PLE2483H (DFP-1): 600.0 MHz maximum pixel clock\n", 80) = 80
6260  write(4, "[370013.898] ", 13)     = 13
6260  write(4, "(--) NVIDIA(GPU-0): \n", 21) = 21
6260  ioctl(16, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0, 0x10), 0x7ffdea8ef7b0) = 0
6260  ioctl(16, _IOC(_IOC_READ|_IOC_WRITE, 0x6d, 0, 0x10), 0x7ffdea8ef710) = 0
6260  --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
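The excerpt shows the same few syscalls repeating between consecutive SIGALRM deliveries. To check that this pattern holds over a longer capture, a sketch like the one below could split the trace at each SIGALRM and count the syscalls in each chunk. It assumes the trace was saved with something like strace -f -p <Xorg pid> -o trace.txt, so that each line carries the pid prefix seen above; syscalls_per_signal and alarm_chunks.py are hypothetical names.

```python
import re
import sys
from collections import Counter

# "<pid>  <syscall>(" at the start of an strace line, e.g. "6260  write(4, ...".
SYSCALL = re.compile(r"^\d+\s+(\w+)\(")
# "<pid>  --- SIGNAME {...} ---" marks a signal delivery.
SIGNAL = re.compile(r"^\d+\s+--- (SIG\w+)")


def syscalls_per_signal(lines, signal="SIGALRM"):
    """Split an strace log at each delivery of `signal` and count syscalls per chunk."""
    chunks = [Counter()]
    for line in lines:
        sig = SIGNAL.match(line)
        if sig and sig.group(1) == signal:
            chunks.append(Counter())  # start a new interval at every delivery
            continue
        call = SYSCALL.match(line)
        if call:
            chunks[-1][call.group(1)] += 1
    return chunks


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: alarm_chunks.py TRACE_FILE")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for i, chunk in enumerate(syscalls_per_signal(f)):
                print(i, dict(chunk))
```

On the excerpt above, each complete interval contains one rt_sigreturn, eight write calls (the four log messages, each preceded by its timestamp) and two ioctl calls on fd 16.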

Today it was harder to recover than before: I needed sysrq+r to get it working.

The nvidia-related dmesg output:
[    3.953050] nvidia: loading out-of-tree module taints kernel.
[    3.953058] nvidia: module license 'NVIDIA' taints kernel.
[    3.974159] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[    3.974837] nvidia 0000:08:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    4.179822] usb 3-6: config 1 has an invalid interface number: 2 but max is 1
[    4.185485] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.129.03  Thu Oct 19 18:56:32 UTC 2023
[    4.193625] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  535.129.03  Thu Oct 19 18:42:12 UTC 2023
[    4.195925] [drm] [nvidia-drm] [GPU ID 0x00000800] Loading driver
[    4.195926] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:08:00.0 on minor 0
[    4.499084] usb 1-2: Manufacturer: American Power Conversion
[    4.546013] hid-generic 0003:051D:0002.0003: hiddev97,hidraw2: USB HID v1.00 Device [American Power Conversion Back-UPS RS 1200G FW:877.L4 .I USB FW:L4 ] on usb-0000:05:00.1-2/input0
[  497.139365] nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
[  497.143268] nvidia-uvm: Loaded the UVM driver, major device number 244.
[38974.470382] NVRM: GPU at PCI:0000:08:00: GPU-b9e88d57-e331-c450-a1ee-6ccdea77d250
[38974.470387] NVRM: Xid (PCI:0000:08:00): 31, pid=6864, name=Renderer, Ch 00000068, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_SCC faulted @ 0x0_043a0000. Fault is of type FAULT_PTE ACCESS_TYPE_READ
[104091.545775] NVRM: Xid (PCI:0000:08:00): 31, pid=20519, name=Renderer, Ch 00000040, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_SCC faulted @ 0x0_04330000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
[122426.279518] NVRM: Xid (PCI:0000:08:00): 31, pid=17839, name=Renderer, Ch 00000068, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_SCC faulted @ 0x0_043a0000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
[161392.072083] NVRM: Xid (PCI:0000:08:00): 31, pid=17839, name=Renderer, Ch 00000028, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_SCC faulted @ 0x0_043a0000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
[214247.337189] NVRM: Xid (PCI:0000:08:00): 31, pid=26388, name=Renderer, Ch 00000048, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_SCC faulted @ 0x0_043a0000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
[293718.758681] NVRM: Xid (PCI:0000:08:00): 31, pid=32088, name=Renderer, Ch 00000068, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_SCC faulted @ 0x0_043a0000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE
[334424.210336] NVRM: Xid (PCI:0000:08:00): 31, pid=32088, name=Renderer, Ch 00000048, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_SCC faulted @ 0x0_043d0000. Fault is of type FAULT_PTE ACCESS_TYPE_WRITE

The NVRM traces happened at times unrelated to my problem.
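To compare the Xid events with the freezes more systematically, the Xid lines can be pulled out of dmesg with their timestamp (seconds since boot), Xid code, pid, process name and fault access type. A minimal sketch, assuming the line format shown above; parse_xid and xid_events.py are hypothetical names, and reading dmesg may require root depending on kernel.dmesg_restrict.

```python
import re
import sys

# Matches NVRM Xid lines as printed above, e.g.
# "[38974.470387] NVRM: Xid (PCI:0000:08:00): 31, pid=6864, name=Renderer, ..."
XID = re.compile(
    r"^\[\s*(?P<time>\d+\.\d+)\] NVRM: Xid \((?P<bus>[^)]+)\): "
    r"(?P<xid>\d+), pid=(?P<pid>\d+), name=(?P<name>[^,]+)"
)
ACCESS = re.compile(r"ACCESS_TYPE_(\w+)")


def parse_xid(lines):
    """Return one dict per Xid line; other dmesg lines are ignored."""
    events = []
    for line in lines:
        m = XID.match(line)
        if not m:
            continue
        access = ACCESS.search(line)  # only present on MMU fault messages
        events.append({
            "uptime_s": float(m.group("time")),
            "xid": int(m.group("xid")),
            "pid": int(m.group("pid")),
            "name": m.group("name"),
            "access": access.group(1) if access else None,
        })
    return events


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: xid_events.py DMESG_FILE  (save it first with: dmesg > dmesg.txt)")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            for e in parse_xid(f):
                print(f"{e['uptime_s']:>14.6f}s  Xid {e['xid']}  pid={e['pid']} ({e['name']})  {e['access']}")
```

Timestamps printed this way are relative to boot, so they need to be matched against the uptime at which each freeze was observed.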


Thanks for any advice

Reproducible: Always




Portage 3.0.56 (python 3.11.6-final-0, default/linux/amd64/17.1, gcc-12, glibc-2.37-r7, 6.1.42-dirty x86_64)
=================================================================
System uname: Linux-6.1.42-dirty-x86_64-AMD_Ryzen_5_3600_6-Core_Processor-with-glibc2.37
KiB Mem:    32797784 total,   2695784 free
KiB Swap:          0 total,         0 free
Timestamp of repository gentoo: Thu, 07 Dec 2023 16:30:01 +0000
Head commit of repository gentoo: 9178113334d4b29057955d1e5ec5a874ecdde6d3
sh bash 5.1_p16-r6
ld GNU ld (Gentoo 2.40 p7) 2.40.0
distcc 3.4 x86_64-pc-linux-gnu [disabled]
app-misc/pax-utils:        1.3.5::gentoo
app-shells/bash:           5.1_p16-r6::gentoo
dev-java/java-config:      2.3.1-r1::gentoo
dev-lang/perl:             5.38.0-r1::gentoo
dev-lang/python:           3.10.13::gentoo, 3.11.6::gentoo, 3.12.0_p1::gentoo
dev-lang/rust:             1.71.1::gentoo
dev-lang/rust-bin:         1.71.1::gentoo
dev-util/cmake:            3.27.7::gentoo
dev-util/meson:            1.2.3::gentoo
sys-apps/baselayout:       2.14::gentoo
sys-apps/openrc:           0.48::gentoo
sys-apps/sandbox:          2.38::gentoo
sys-devel/autoconf:        2.13-r7::gentoo, 2.69-r5::gentoo, 2.71-r6::gentoo
sys-devel/automake:        1.16.5-r1::gentoo
sys-devel/binutils:        2.40-r9::gentoo
sys-devel/binutils-config: 5.5::gentoo
sys-devel/clang:           15.0.7-r3::gentoo, 16.0.6::gentoo
sys-devel/gcc:             10.5.0::gentoo, 11.4.1_p20230622::gentoo, 12.3.1_p20230825::gentoo, 13.2.1_p20230826::gentoo
sys-devel/gcc-config:      2.11::gentoo
sys-devel/libtool:         2.4.7-r1::gentoo
sys-devel/llvm:            15.0.7-r3::gentoo, 16.0.6::gentoo
sys-devel/make:            4.4.1-r1::gentoo
sys-kernel/linux-headers:  6.1::gentoo (virtual/os-headers)
sys-libs/glibc:            2.37-r7::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    volatile: True
    sync-rsync-verify-max-age: 3
    sync-rsync-verify-metamanifest: yes
    sync-rsync-extra-opts: 
    sync-rsync-verify-jobs: 1

montjoie
    location: /usr/local/portage
    masters: gentoo
    priority: 0
    volatile: True

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe -march=znver2"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/bind /var/spool/munin-async/.ssh"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe -march=znver2"
DISTDIR="/usr/portage/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="fr_FR.UTF-8"
LC_ALL="fr_FR.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LEX="flex"
LINGUAS="fr"
MAKEOPTS="-j11"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="X a52 aac aalib acpi alsa amd64 audit bzip2 caps cdda cli crypt dbus device-mapper dri dvd egl encode exif gif gtk iconv ipv6 jpeg libcaca libkms libnotify libtirpc lm-sensors lto lzma mad mikmod mmx mmxext mod modplug mp3 mpeg mudflap multilib mysqli nautilus ncurses nls nptl ogg opengl openvg osmesa pam pcre pgo png pulseaudio readline real sdl seccomp sid spell split-usr sse sse2 sse3 ssl ssse3 svg test-rust threads tiff unicode v4l v4l2 vaapi vdpau verify-sig vorbis wmf x264 x265 xattr xcb xext xulrunner xv xvid xvmc zlib" ABI_X86="64" ADA_TARGET="gnat_2021" APACHE2_MODULES="access_compat alias auth_basic authn_core authn_file authz_core authz_host authz_user cgi cgid dav dav_fs dav_lock deflate dir env expires filter headers include log_config mime proxy proxy_http rewrite socache_shmcb status unixd version" CALLIGRA_FEATURES="karbon sheets words" CAMERAS="canon" COLLECTD_PLUGINS="apache cpu cpufreq df disk hddtemp interface iptables irq load memory network ntpd processes rrdtool sensors syslog uptime users" CPU_FLAGS_X86="mmx sse mmxext sse2 mmxext sse3 ssse3 fma sse4_1 sse4_2 aes avx f16c sse4a avx2 popcnt sha" CURL_SSL="openssl" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="pc ieee-1275 efi-64" INPUT_DEVICES="evdev keyboard mouse" KERNEL="linux" L10N="fr" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LLVM_TARGETS="AArch64 ARM BPF NVPTX" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-1" POSTGRES_TARGETS="postgres15" PYTHON_SINGLE_TARGET="python3_11" PYTHON_TARGETS="python3_11" QEMU_SOFTMMU_TARGETS="x86_64 i386 arm armeb aarch64 ppc ppc64 sparc sparc64 m68k mips64 mips mips64el mipsel s390x alpha xtensa microblaze nios2 or1k riscv32 riscv64 microblazeel sh4 cris hppa cris nios2" QEMU_USER_TARGETS="x86_64 i386 arm aarch64 ppc ppc64 sparc sparc64 sparc32plus" RUBY_TARGETS="ruby31" SANE_BACKENDS="snapscan canon" VIDEO_CARDS="nv vesa nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LD, LFLAGS, LIBTOOL, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 1 Matt Turner gentoo-dev 2023-12-07 21:41:24 UTC
Looks like an nvidia driver issue.
Comment 2 Ionen Wolkens gentoo-dev 2023-12-07 22:26:46 UTC
(In reply to Matt Turner from comment #1)
> Looks like an nvidia driver issue.
And unfortunately there's not much we can do about it downstream. It's mostly closed source, and I doubt it's a packaging issue (it sounds possibly hardware-specific).

fwiw, there are several supported branches/versions, and you could see whether another version helps. Notably, you could try ~testing's =nvidia-drivers-535.146.02, which was just released today (it has several bug fixes and is due to be stabilized in a bit over a week or so). There's also the older 545.29.06 (a different branch, despite the higher version number), which could help, albeit it may lack the fixes from today's release. Barring that, the older 525 branch is still supported and you could try that as well.

Do make sure you're always using the kernel you've built the modules against; mismatched modules will result in a black screen either way.
Comment 3 LABBE Corentin 2023-12-14 06:44:04 UTC
Upgrading to x11-drivers/nvidia-drivers-535.146.02:0/535 didn't fix the issue.

It seems that recovery is faster when no Firefox processes are running.
Comment 4 LABBE Corentin 2024-01-20 10:55:50 UTC
Upgraded to x11-drivers/nvidia-drivers-545.29.06-r1:0/545: no change.
Killing all Firefox processes reduces the problem, but it is still present.

Note that in the driver sources, in html/knownissues.html, I saw a chapter about console restore behavior.
So I tried booting with nvidia_drm.modeset=1, and for the moment it seems to fix the issue.
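To confirm after boot that the option actually took effect, the value can be read back from sysfs. A minimal sketch, assuming the parameter is exposed at /sys/module/nvidia_drm/parameters/modeset (it only exists while nvidia_drm is loaded, and on some driver versions it may be readable by root only); modeset_enabled is a hypothetical helper name.

```python
from pathlib import Path

# Assumed sysfs location of the nvidia-drm "modeset" module parameter.
MODESET_PARAM = Path("/sys/module/nvidia_drm/parameters/modeset")


def modeset_enabled(param=MODESET_PARAM):
    """Return True/False for the parameter, or None if the module is not loaded."""
    try:
        value = param.read_text().strip()
    except FileNotFoundError:
        return None
    # Boolean module parameters are shown as "Y" or "N" in sysfs.
    return value == "Y"


if __name__ == "__main__":
    try:
        state = modeset_enabled()
    except PermissionError:
        print(f"cannot read {MODESET_PARAM}; try again as root")
    else:
        if state is None:
            print("nvidia_drm is not loaded")
        else:
            print("nvidia_drm modeset:", "enabled" if state else "disabled")
```

With nvidia_drm.modeset=1 on the kernel command line, this should report "enabled"; "disabled" would mean the parameter did not reach the module.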