| Summary: | x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1 is not compatible with x11-base/xorg-server-21.1.2-r2 | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | email200202 |
| Component: | Current packages | Assignee: | Gentoo X packagers <x11> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | alexander, chiitoo, grivital, joakim.tjernlund, joost.ruis, jstein, Klaus+gentoo, m1027, sam |
| Priority: | Normal | Keywords: | PATCH |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | Linux | | |
| See Also: | https://bugs.gentoo.org/show_bug.cgi?id=860267 | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
| Attachments: | Xorg.0.log; Xorg.log with debug symbol; Quick-fix xf86-video-qxl to work with xorg-server-21.1.3 | | |
Description
email200202
2021-12-21 08:26:35 UTC
Created attachment 759954 [details]
Xorg.0.log
Please follow https://wiki.gentoo.org/wiki/Debugging to get a better log.

Created attachment 764623 [details]
Xorg.log with debug symbol
This is just a quick note as I am not 100% sure yet whether my case relates to
this, but currently it seems so.
My setup here:
- xorg-server-21.1.3
- xorg-drivers-21.1
- all inside qemu-6.2.0, with spice
The issue:
When connecting to the kvm over remmina with spice I get a black screen only.
Inside the KVM, I have no xorg-server crash like the OP but gdm tries to run
xorg-server, which in turn fails with "no suitable screen found" (for qxl).
All this had worked 3 weeks ago, before applying package upgrades.
Workaround 1:
Launching qemu with -vga std worked, connection via VNC but no spice then.
Workaround 2 (inside KVM):
- Downgrade to xorg-server-1.20.14 (forced with --nodeps)
- Downgrade to xorg-drivers-1.20-r2
- Recompiling xf86-input-evdev (forced with --nodeps)
- Recompiling xf86-input-libinput
Connecting to the KVM via spice works then. Well, at least kind of. Not sure
whether I had this before:
> kernel: qxl 0000:00:02.0: object_init failed for (7442432, 0x00000001)
> kernel: [drm:qxl_alloc_bo_reserved] *ERROR* failed to allocate VRAM BO
Hm...

I have a similar problem: xorg-server crashes with a core dump when it runs in a KVM/Qemu VM.
The problem exists with x11-base/xorg-server-1.20.14-r1 and with 21.1.3-r1 (always with the corresponding x11-drivers).
I have followed the debug instructions, including for xorg-server, xf86-video-qxl, and glibc. Xorg.0.log does not differ from the logs already attached to this bug.
gdb reports the following:
~ # gdb /usr/bin/X /var/lib/systemd/coredump/core.X.0.81a05b5c223041c0ba7d20797fe326e2.2204.1647187335000000
GNU gdb (Gentoo 11.2 vanilla) 11.2
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/X...
Reading symbols from /usr/lib/debug//usr/bin/Xorg.debug...
warning: core file may not match specified executable file.
[New LWP 2204]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/X -nolisten tcp -auth /var/run/sddm/{55f123a4-1564-4571-ae9f-d0fa6f263'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49 return ret;
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1 0x00007f1b013a9546 in __GI_abort () at abort.c:79
#2 0x000056186105c9da in OsAbort () at ../xorg-server-21.1.3/os/utils.c:1353
#3 0x00005618610620b3 in AbortServer () at ../xorg-server-21.1.3/os/log.c:879
#4 0x0000561861063086 in FatalError (f=f@entry=0x5618610e2df0 "Caught signal %d (%s). Server aborting\n") at ../xorg-server-21.1.3/os/log.c:1017
#5 0x0000561861059ee9 in OsSigHandler (unused=<optimized out>, sip=0x7ffca222f430, signo=11) at ../xorg-server-21.1.3/os/osinit.c:156
#6 OsSigHandler (signo=11, sip=0x7ffca222f430, unused=<optimized out>) at ../xorg-server-21.1.3/os/osinit.c:110
#7 <signal handler called>
#8 xf86InitViewport (pScr=0x5618626fcdf0) at ../xorg-server-21.1.3/hw/xfree86/common/xf86Cursor.c:104
#9 0x00005618610771e4 in InitOutput (pScreenInfo=pScreenInfo@entry=0x56186117a8e0 <screenInfo>, argc=argc@entry=13, argv=argv@entry=0x7ffca222fab8) at ../xorg-server-21.1.3/hw/xfree86/common/xf86Init.c:518
#10 0x0000561860f92774 in dix_main (argc=13, argv=0x7ffca222fab8, envp=<optimized out>) at ../xorg-server-21.1.3/dix/main.c:190
#11 0x00007f1b013aa7fd in __libc_start_main (main=0x561860f56a80 <main>, argc=13, argv=0x7ffca222fab8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca222faa8) at ../csu/libc-start.c:332
#12 0x0000561860f56aba in _start ()
(gdb) list
44
45 int ret = INLINE_SYSCALL_CALL (tgkill, pid, tid, sig);
46
47 __libc_signal_restore_set (&set);
48
49 return ret;
50 }
51 libc_hidden_def (raise)
52 weak_alias (raise, gsignal)
(gdb)
Since a Fedora 35 installation runs well in a similar KVM/Qemu VM, I have even compiled a kernel with the Fedora kernel config. X still crashes. Fedora runs with X.Org X Server 1.20.8.
Since my Gentoo VM worked quite well in the past, I believe the issue was introduced between X.Org X Server 1.20.8 and 1.20.14.

I found that x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1 does not work with the (now current) x11-base/xorg-server-21.1.3-r1.
I traced the issue down a bit:
https://gitlab.freedesktop.org/xorg/driver/xf86-video-qxl/-/blob/xf86-video-qxl-0.1.5/src/qxl_driver.c#L239
This pci_device_map_range() fails with EINVAL, which apparently means that an exact duplicate mapping already exists:
https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/blob/master/src/common_interface.c#L297
I have no idea where that mapping might come from; this occurs both with autodetection (which also loads modesetting_drv.so, fbdev_drv.so, vesa_drv.so, libfbdevhw.so for me), and with an explicit X.org configuration specifying only qxl (so only qxl_drv.so is loaded).
Line 239 in qxl_driver.c only gets called once. Using ltrace, I also can only see a total of the three expected pci_device_map_range() calls - RAM, VRAM and ROM. It's the very first one that fails with EINVAL (22):
7554 fwrite("(II) qxl(0): qxl_map_memory: XSERVER_LIBPCIACCESS\n", 50, 1, 0x55ccc538a3a0) = 1
7554 pci_device_map_range@libpciaccess.so.0(0x55ccc53a7b40, 0xc0000000, 0x20000000, 3) = 22
7554 fwrite("(II) qxl(0): qxl_map_memory: pci_device_map_range(0x55ccc53a7b40"..., 112, 1, 0x55ccc538a3a0) = 1
7554 pci_device_map_range@libpciaccess.so.0(0x55ccc53a7b40, 0xe0000000, 0x10000000, 1) = 0
7554 pci_device_map_range@libpciaccess.so.0(0x55ccc53a7b40, 0xf1244000, 8192, 0) = 0
7554 pci_device_open_io@libpciaccess.so.0(0x55ccc53a7b40, 4160, 32, 0x7f0683069c73) = 0x55ccc53b8950
7554 fwrite("(II) qxl(0): qxl_map_memory failed, ram: (nil), vram: 0x7f067285"..., 89, 1, 0x55ccc538a3a0) = 1
This error occurs no matter whether the qxl drm kernel module is loaded or not (disabled, blacklisted).
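For readers following the ltrace output: the last argument shown for each pci_device_map_range() call is the map_flags bitmask, and the return value 22 is EINVAL on Linux. A small decoding helper, as a sketch only: the flag values below are assumed to mirror the PCI_DEV_MAP_FLAG_* constants in libpciaccess's pciaccess.h and are redefined locally so the example builds without that library, and describe_map_flags is a name made up here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: same bit values as PCI_DEV_MAP_FLAG_* in pciaccess.h. */
#define MAP_FLAG_WRITABLE      (1u << 0)
#define MAP_FLAG_WRITE_COMBINE (1u << 1)
#define MAP_FLAG_CACHABLE      (1u << 2)

/* Write a readable form of a map_flags value into buf and return buf. */
char *describe_map_flags(unsigned flags, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s%s%s",
             (flags & MAP_FLAG_WRITABLE) ? "read/write" : "read-only",
             (flags & MAP_FLAG_WRITE_COMBINE) ? ", write-combine" : "",
             (flags & MAP_FLAG_CACHABLE) ? ", cachable" : "");
    return buf;
}
```

Applied to the three calls in the trace, this gives "read/write, write-combine" for the RAM mapping (flags 3, the one that fails), "read/write" for VRAM (flags 1) and "read-only" for the ROM (flags 0).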
I don't understand enough of how this is supposed to work - for example, there's no obvious interface to get at the allegedly already existing mapping, and "simply" use that. Guess I'll pull pciaccess_private.h into qxl_driver.c next to dig around in devp->mappings, see what's up with that.

Small update, that EINVAL is returned from the mmap() on /sys/bus/pci/devices/0000:00:01.0/resource0 here:
https://cgit.freedesktop.org/xorg/lib/libpciaccess/tree/src/linux_sysfs.c#n640
Which supposedly means it doesn't like its parameters.
Alignment? addr is NULL, so the kernel can freely choose an address
Length? map->size is exactly the (displayed) "file" size
prot? is 3 according to ltrace, so PROT_READ | PROT_WRITE
flags? is passed as MAP_SHARED
offset? I don't see that one in the ltrace. Hm. But, following the code from pci_device_map_range() through pci_device_linux_sysfs_map_range() the way it is called here, it looks like it should come out as
offset = base_address - base_address
basically, which makes 0 in my (algebra, not necessarily always matched by C) book.
Good news is, I can reproduce the exact problem with a very short test program, basically:
int fd = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR | O_CLOEXEC);
void *mem = mmap(NULL, 0x20000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
// => errno == EINVAL
(0x20000000 happens to be what I currently have for the size of that sysfs file, corresponding to [a multiple of] the memory parameter that's configured for qxl in qemu / Proxmox.)
Using a shorter, aligned value like length / 2 or length >> 8 all give the same error.
Again, this EINVAL occurs the same whether I have the qxl driver module loaded or not.
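A fuller, compilable form of that test program might look like the sketch below. The open() and mmap() arguments are the ones given in the comment; the function names try_map_resource and report_map_result are invented for illustration, and the code assumes Linux (and root privileges when pointed at the real sysfs file).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Try to map `length` bytes of `path` read/write and shared at offset 0.
 * Returns 0 on success (the mapping is released again), otherwise the
 * errno value of the step that failed.
 */
int try_map_resource(const char *path, size_t length)
{
    assert(path != NULL && length > 0);

    int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return errno;

    void *mem = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int err = (mem == MAP_FAILED) ? errno : 0; /* save before close() */

    if (mem != MAP_FAILED)
        munmap(mem, length);
    close(fd);
    return err;
}

/* Print the outcome in a form that can be pasted into a bug report. */
void report_map_result(const char *path, size_t length)
{
    int err = try_map_resource(path, length);
    if (err == 0)
        printf("%s: mapped 0x%zx bytes\n", path, length);
    else
        printf("%s: failed with errno %d (%s)\n", path, err, strerror(err));
}
```

Calling report_map_result("/sys/bus/pci/devices/0000:00:01.0/resource0", 0x20000000) from a main() run as root would correspond to the case described above; the PCI address and size are specific to that VM and will differ elsewhere.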
/sys/bus/pci/devices/0000:00:01.0/driver only exists and points to qxl when it's loaded, as expected; also, the qxl module loading logs look good:
Apr 10 11:29:36 VMjmbreuer kernel: qxl 0000:00:01.0: vgaarb: deactivate vga console
Apr 10 11:29:36 VMjmbreuer kernel: [drm] Device Version 0.0
Apr 10 11:29:36 VMjmbreuer kernel: [drm] Compression level 0 log level 0
Apr 10 11:29:36 VMjmbreuer kernel: [drm] 98302 io pages at offset 0x8000000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] 134217728 byte draw area at offset 0x0
Apr 10 11:29:36 VMjmbreuer kernel: [drm] RAM header offset: 0x1fffe000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] qxl: 128M of VRAM memory size
Apr 10 11:29:36 VMjmbreuer kernel: [drm] qxl: 511M of IO pages memory ready (VRAM domain)
Apr 10 11:29:36 VMjmbreuer kernel: [drm] qxl: 256M of Surface memory size
Apr 10 11:29:36 VMjmbreuer kernel: [drm] slot 0 (main): base 0xc0000000, size 0x1fffe000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] slot 1 (surfaces): base 0xe0000000, size 0x10000000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] Initialized qxl 0.1.0 20120117 for 0000:00:01.0 on minor 0
Apr 10 11:29:36 VMjmbreuer kernel: qxl 0000:00:01.0: [drm] fb0: qxldrmfb frame buffer device
FWIW, I can confirm all of this to be an issue with gentoo/the package versions it's using; I had Arch Linux installed in the very same VM before and QXL worked out of the box there.

(In reply to Joe Breuer from comment #7)
> Good news is, I can reproduce the exact problem with a very short test
> program, basically:
>
> int fd = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR |
> O_CLOEXEC);
> void *mem = mmap(NULL, 0x20000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
> 0);
> // => errno == EINVAL
This appears to be down to a (separate) kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=215678
With that kernel patch series applied (just the one patch wouldn't do it for me), qxl initializes at all and fails in xf86InitViewport for me, too.
Kinda half related, vmwgfx breaks very similarly for me in that VM.

I'll try downgrading xorg-server to 1.20.14-r1 next and see how things go. This VM (guest) is a testing system anyway, so I can try things out there.
OK, with x11-base/xorg-server-1.20.14-r1 and x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1, X works for me.
Which means that by switching to xorg-server-21.1.3-r1 I've got a repro scenario, I'll see what I can figure out.
Comment #5 mentions
xf86InitViewport (pScr=0x5618626fcdf0) at ../xorg-server-21.1.3/hw/xfree86/common/xf86Cursor.c:104
https://gitlab.freedesktop.org/xorg/xserver/-/blob/xorg-server-21.1.3/hw/xfree86/common/xf86Cursor.c#L104
... I'll track this down properly with a local debug build.

Created attachment 770447 [details, diff]
Quick-fix xf86-video-qxl to work with xorg-server-21.1.3
The incompatibility / crash is mostly due to stuff that's not initialized - varying cases of xorg-server 21.1.3 no longer initializing values as the older version used to, and the newer Xorg server requiring stuff to be initialized the older version didn't care about.
Tracking the resulting SEGFAULTs down I've come up with the attached patch.
With this, I get a working X server 21.1.3 in VM using qxl for graphics.
I'm not an expert on X driver code, so this might not be the best way to fix this. - There isn't exactly a "How to port X.org drivers from 1.20 to 21?" guide that I could find.
Be aware that if your kernel is new enough, e.g. (at least) 5.15.32, there's another issue that might need addressing to get qxl to work. In a nutshell, apply this patch series to the kernel:
https://patchwork.freedesktop.org/series/99243/#rev2
More information is in a kernel bug:
https://bugzilla.kernel.org/show_bug.cgi?id=215678
Applying only patch 2/5 as is mentioned there was not enough to get it working in my particular case. YMMV.
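The class of failure described in that comment (one side reading a field that the other side no longer initializes) can be illustrated in isolation. The sketch below is purely hypothetical: it is not the attached patch and uses no X.org API. The types demo_mode and demo_screen and both functions are invented here to show zero-initialized allocation plus a guard before dereferencing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for per-screen driver state; not X.org types. */
struct demo_mode {
    int width;
    int height;
};

struct demo_screen {
    struct demo_mode *current_mode; /* read by the consumer during setup */
    int frame_x0;
    int frame_y0;
};

/* calloc zero-fills the allocation, so on Linux the pointer starts as NULL. */
struct demo_screen *demo_screen_new(void)
{
    return calloc(1, sizeof(struct demo_screen));
}

/*
 * Consumer side: report an error instead of dereferencing an unset pointer.
 * Returns 0 on success, -1 if no mode was ever set.
 */
int demo_init_viewport(struct demo_screen *scr)
{
    if (scr == NULL || scr->current_mode == NULL)
        return -1;
    scr->frame_x0 = 0;
    scr->frame_y0 = 0;
    return 0;
}
```

With uninitialized (malloc'd) state, the same check could pass on garbage and crash later, which is one plausible way a change in who initializes what turns into a SIGSEGV during server startup.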

Forgot to mention, I've also submitted this issue to upstream:
https://gitlab.freedesktop.org/xorg/driver/xf86-video-qxl/-/merge_requests/9

(In reply to Joe Breuer from comment #10)
> Created attachment 770447 [details, diff]
> Quick-fix xf86-video-qxl to work with xorg-server-21.1.3
>
> The incompatibility / crash is mostly due to stuff that's not initialized -
> varying cases of xorg-server 21.1.3 no longer initializing values as the
> older version used to, and the newer Xorg server requiring stuff to be
> initialized the older version didn't care about.
>
> Tracking the resulting SEGFAULTs down I've come up with the attached patch.
>
> With this, I get a working X server 21.1.3 in VM using qxl for graphics.
>
> I'm not an expert on X driver code, so this might not be the best way to fix
> this. - There isn't exactly a "How to port X.org drivers from 1.20 to 21?"
> guide that I could find.
>
> Be aware that if your kernel is new enough, e.g. (at least) 5.15.32, there's
> another issue that might need addressing to get qxl to work. In a nutshell,
> apply this patch series to the kernel:
>
> https://patchwork.freedesktop.org/series/99243/#rev2
>
> More information is in a kernel bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215678
>
> Applying only patch 2/5 as is mentioned there was not enough to get it
> working in my particular case. YMMV.
Still a problem in the current 5.15.x (5.15.36) kernel?

(In reply to Joe Breuer from comment #8)
> (In reply to Joe Breuer from comment #7)
> > Good news is, I can reproduce the exact problem with a very short test
> > program, basically:
> >
> > int fd = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR |
> > O_CLOEXEC);
> > void *mem = mmap(NULL, 0x20000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
> > 0);
> > // => errno == EINVAL
>
> This appears to be down to a (separate) kernel bug:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=215678
>
> With that kernel patch series applied (just the one patch wouldn't do it for
> me), qxl initializes at all and fails in xf86InitViewport for me, too.
>
> Kinda half related, vmwgfx breaks very similarly for me in that VM.
Perhaps vmwgfx needs this?
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/queue-5.15/drm-vmwgfx-initialize-drm_mode_fb_cmd2.patch?id=25ec26903868a0a3750e75e63685d25cab2256ed

Applying the following patch fixes this issue for me:
https://gitlab.freedesktop.org/xorg/driver/xf86-video-qxl/-/commit/52e975263fe88105d151297768c7ac675ed94122
Before:
xf86-video-qxl 0.1.5
=====================
prefix: /usr
c compiler: x86_64-pc-linux-gnu-gcc
drm:
KMS: no
Build qxl: yes
Build xspice: no
Build spiceccid: no
After:
xf86-video-qxl 0.1.5
=====================
prefix: /usr
c compiler: x86_64-pc-linux-gnu-gcc
drm: -I/usr/include/libdrm
KMS: yes
Build qxl: yes
Build xspice: no
Build spiceccid: no
Also, previously there were warnings in the X logs before the crash:
[ 7.566] (WW) qxl(0): No outputs definitely connected, trying again...
[ 7.566] (WW) qxl(0): Unable to find connected outputs - setting 1024x768 initial framebuffer
And after compiling with drm it detects all outputs correctly.

For what it's worth: with the latest stable Xorg installed in the VM, X failed to load; after I applied the patch mentioned in comment 14, it works again.
https://github.com/mocaccinoOS/desktop/blob/eb992d889be2e5ba32971b89360970386b38a2b8/packages/layers/X/patches/x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1/52e975263fe88105d151297768c7ac675ed94122.patch

*** Bug 860267 has been marked as a duplicate of this bug. ***

The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e6ac76831d716668591a412fcdaa715bb39f813b
commit e6ac76831d716668591a412fcdaa715bb39f813b
Author: Matt Turner <mattst88@gentoo.org>
AuthorDate: 2023-01-24 06:00:47 +0000
Commit: Matt Turner <mattst88@gentoo.org>
CommitDate: 2023-01-24 06:07:28 +0000
x11-drivers/xf86-video-qxl: Version bump to 0.1.6
Bug: https://bugs.gentoo.org/829759
Signed-off-by: Matt Turner <mattst88@gentoo.org>
x11-drivers/xf86-video-qxl/Manifest | 1 +
.../xf86-video-qxl/xf86-video-qxl-0.1.6.ebuild | 45 ++++++++++++++++++++++
2 files changed, 46 insertions(+)

Please confirm that v0.1.6 works.

What I can confirm (see comment #4): I run remmina over spice to some Gnome KVMs after an upgrade to xf86-video-qxl-0.1.6 without issues.