Bug 829759

Summary:	x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1 is not compatible with x11-base/xorg-server-21.1.2-r2
Product:	Gentoo Linux	Reporter:	email200202
Component:	Current packages	Assignee:	Gentoo X packagers <x11>
Status:	RESOLVED FIXED
Severity:	normal	CC:	alexander, chiitoo, grivital, joakim.tjernlund, joost.ruis, jstein, Klaus+gentoo, m1027, sam
Priority:	Normal	Keywords:	PATCH
Version:	unspecified
Hardware:	All
OS:	Linux
See Also:	https://bugs.gentoo.org/show_bug.cgi?id=860267
Whiteboard:
Package list:		Runtime testing required:	---
Attachments:	Xorg.0.log; Xorg.log with debug symbol; Quick-fix xf86-video-qxl to work with xorg-server-21.1.3

Description email200202 2021-12-21 08:26:35 UTC

x11-base/xorg-server-21.1.2-r2 crashed with driver x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1. See the attached Xorg.0.log

x11-base/xorg-server-1.20.14 works with the same version of xf86-video-qxl.

A version dependency condition is missing. 

Reproducible: Always

Steps to Reproduce:
1. Upgrade to the latest versions
2. restart

Actual Results:  
Black screen and xorg-server crashed

Expected Results:  
No xorg-server crash

Comment 1 email200202 2021-12-21 08:27:49 UTC

Created attachment 759954 [details]
Xorg.0.log

Comment 2 Sam James archtester 2021-12-21 08:31:27 UTC

Please follow https://wiki.gentoo.org/wiki/Debugging to get a better log.

Comment 3 Vitaly Grinin 2022-02-08 14:02:28 UTC

Created attachment 764623 [details]
Xorg.log with debug symbol

Comment 4 m1027 2022-02-28 19:01:49 UTC

This is just a quick note as I am not 100% sure yet whether my case relates to
this, but currently it seems so.

My setup here:

- xorg-server-21.1.3
- xorg-drivers-21.1
- all inside qemu-6.2.0, with spice

The issue:

When connecting to the kvm over remmina with spice I get a black screen only.
Inside the KVM, I have no xorg-server crash like the OP but gdm tries to run
xorg-server, which in turn fails with "no suitable screen found" (for qxl).
All this had worked 3 weeks ago, before applying package upgrades.

Workaround 1:

Launching qemu with -vga std worked, connection via VNC but no spice then.

Workaround 2 (inside KVM):

- Downgrade to xorg-server-1.20.14 (forced with --nodeps)
- Downgrade to xorg-drivers-1.20-r2
- Recompiling xf86-input-evdev (forced with --nodeps)
- Recompiling xf86-input-libinput

Connecting to the KVM via spice works then. Well, at least kind of. Not sure
whether I had this before:

> kernel: qxl 0000:00:02.0: object_init failed for (7442432, 0x00000001)
> kernel: [drm:qxl_alloc_bo_reserved] *ERROR* failed to allocate VRAM BO

Hm...

Comment 5 Stefan Trenker 2022-03-13 16:30:45 UTC

I have a similar problem: xorg-server crashes with a core dump when it runs in a KVM/QEMU VM.

The problem exists with x11-base/xorg-server-1.20.14-r1 and with 21.1.3-r1 (always with corresponding x11-drivers). I have followed the debug instructions including for xorg-server, xf86-video-qxl, and glibc. Xorg.0.log does not differ from the logs already attached to this case.

gdb reports the following:

 ~ # gdb /usr/bin/X /var/lib/systemd/coredump/core.X.0.81a05b5c223041c0ba7d20797fe326e2.2204.1647187335000000
GNU gdb (Gentoo 11.2 vanilla) 11.2
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/X...
Reading symbols from /usr/lib/debug//usr/bin/Xorg.debug...

warning: core file may not match specified executable file.
[New LWP 2204]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/X -nolisten tcp -auth /var/run/sddm/{55f123a4-1564-4571-ae9f-d0fa6f263'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49        return ret;

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007f1b013a9546 in __GI_abort () at abort.c:79
#2  0x000056186105c9da in OsAbort () at ../xorg-server-21.1.3/os/utils.c:1353
#3  0x00005618610620b3 in AbortServer () at ../xorg-server-21.1.3/os/log.c:879
#4  0x0000561861063086 in FatalError (
    f=f@entry=0x5618610e2df0 "Caught signal %d (%s). Server aborting\n")
    at ../xorg-server-21.1.3/os/log.c:1017
#5  0x0000561861059ee9 in OsSigHandler (unused=<optimized out>, sip=0x7ffca222f430, signo=11)
    at ../xorg-server-21.1.3/os/osinit.c:156
#6  OsSigHandler (signo=11, sip=0x7ffca222f430, unused=<optimized out>)
    at ../xorg-server-21.1.3/os/osinit.c:110
#7  <signal handler called>
#8  xf86InitViewport (pScr=0x5618626fcdf0) at ../xorg-server-21.1.3/hw/xfree86/common/xf86Cursor.c:104
#9  0x00005618610771e4 in InitOutput (pScreenInfo=pScreenInfo@entry=0x56186117a8e0 <screenInfo>, 
    argc=argc@entry=13, argv=argv@entry=0x7ffca222fab8)
    at ../xorg-server-21.1.3/hw/xfree86/common/xf86Init.c:518
#10 0x0000561860f92774 in dix_main (argc=13, argv=0x7ffca222fab8, envp=<optimized out>)
    at ../xorg-server-21.1.3/dix/main.c:190
#11 0x00007f1b013aa7fd in __libc_start_main (main=0x561860f56a80 <main>, argc=13, 
    argv=0x7ffca222fab8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffca222faa8) at ../csu/libc-start.c:332
#12 0x0000561860f56aba in _start ()

(gdb) list
44
45        int ret = INLINE_SYSCALL_CALL (tgkill, pid, tid, sig);
46
47        __libc_signal_restore_set (&set);
48
49        return ret;
50      }
51      libc_hidden_def (raise)
52      weak_alias (raise, gsignal)
(gdb) 


Since a Fedora 35 installation runs well in a similar KVM/QEMU VM, I even compiled a kernel with the Fedora kernel config. X still crashes. Fedora runs X.Org X Server 1.20.8.

Since my Gentoo VM worked quite well in the past, I believe the issue was introduced between X.Org X Server 1.20.8 and 1.20.14.

Comment 6 Joe Breuer 2022-04-10 10:42:13 UTC

I found that x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1 does not work in (now current) x11-base/xorg-server-21.1.3-r1.

I traced the issue down a bit:

https://gitlab.freedesktop.org/xorg/driver/xf86-video-qxl/-/blob/xf86-video-qxl-0.1.5/src/qxl_driver.c#L239

This pci_device_map_range() fails with EINVAL, which apparently means that an exact duplicate mapping already exists:

https://gitlab.freedesktop.org/xorg/lib/libpciaccess/-/blob/master/src/common_interface.c#L297

I have no idea where that mapping might come from; this occurs both with autodetection (which also loads modesetting_drv.so, fbdev_drv.so, vesa_drv.so, libfbdevhw.so for me), and with an explicit X.org configuration specifying only qxl (so only qxl_drv.so is loaded). Line 239 in qxl_driver.c only gets called once. Using ltrace, I also can only see a total of the three expected pci_device_map_range() calls - RAM, VRAM and ROM. It's the very first at all that fails with EINVAL (22):

7554 fwrite("(II) qxl(0): qxl_map_memory: XSERVER_LIBPCIACCESS\n", 50, 1, 0x55ccc538a3a0) = 1
7554 pci_device_map_range@libpciaccess.so.0(0x55ccc53a7b40, 0xc0000000, 0x20000000, 3) = 22
7554 fwrite("(II) qxl(0): qxl_map_memory: pci_device_map_range(0x55ccc53a7b40"..., 112, 1, 0x55ccc538a3a0) = 1
7554 pci_device_map_range@libpciaccess.so.0(0x55ccc53a7b40, 0xe0000000, 0x10000000, 1) = 0
7554 pci_device_map_range@libpciaccess.so.0(0x55ccc53a7b40, 0xf1244000, 8192, 0) = 0
7554 pci_device_open_io@libpciaccess.so.0(0x55ccc53a7b40, 4160, 32, 0x7f0683069c73) = 0x55ccc53b8950
7554 fwrite("(II) qxl(0): qxl_map_memory failed, ram: (nil), vram: 0x7f067285"..., 89, 1, 0x55ccc538a3a0) = 1
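
A side note on reading the ltrace output above: pci_device_map_range() returns 0 on success and an errno value on failure, so the 22 is the error code itself. A minimal sketch of decoding such a return code follows; rc_message() and is_invalid_argument() are hypothetical helper names used for illustration only, not part of libpciaccess.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a libpciaccess-style return code
 * (0 on success, otherwise an errno value) into readable text. */
static const char *rc_message(int rc)
{
    return rc == 0 ? "ok" : strerror(rc);
}

/* Hypothetical helper: true if the call failed with EINVAL,
 * which is 22 on Linux and matches the ltrace line above. */
static int is_invalid_argument(int rc)
{
    return rc == EINVAL;
}
```

Passing 22 to rc_message() on a glibc system gives the familiar "Invalid argument" text.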


This error occurs no matter whether the qxl drm kernel module is loaded or not (disabled, blacklisted).

I don't understand enough of how this is supposed to work - for example, there's no obvious interface to get at the allegedly already existing mapping, and "simply" use that.


Guess I'll pull pciaccess_private.h into qxl_driver.c next to dig around in devp->mappings, see what's up with that.

Comment 7 Joe Breuer 2022-04-10 14:47:02 UTC

Small update: that EINVAL is returned from the mmap() on /sys/bus/pci/devices/0000:00:01.0/resource0 here:

https://cgit.freedesktop.org/xorg/lib/libpciaccess/tree/src/linux_sysfs.c#n640

Which supposedly means it doesn't like its parameters.

Alignment? addr is NULL, so the kernel can freely choose an address
Length? map->size is exactly the (displayed) "file" size 
prot? is 3 according to ltrace, so PROT_READ | PROT_WRITE
flags? is passed as MAP_SHARED
offset? I don't see that one in the ltrace. Hm.

But, following the code from pci_device_map_range() through pci_device_linux_sysfs_map_range() the way it is called here, it looks like it should come out as offset = base_address - base_address basically, which makes 0 in my (algebra, not necessarily always matched by C) book.
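
The offset reasoning above can be written down as a small sketch. It assumes, as the reading of the code above suggests, that the offset into the sysfs resource file is the mapping base minus the base of the PCI region. resource_offset() and offset_is_page_aligned() are hypothetical names for illustration, not libpciaccess functions.

```c
#include <stdint.h>
#include <unistd.h>

/* Offset into the sysfs resource file for a mapping that starts at
 * map_base inside a PCI region that starts at region_base
 * (assumed relationship, see the note above). */
static uint64_t resource_offset(uint64_t map_base, uint64_t region_base)
{
    return map_base - region_base;
}

/* mmap() requires the file offset to be a multiple of the page size. */
static int offset_is_page_aligned(uint64_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    return page > 0 && offset % (uint64_t)page == 0;
}
```

With the mapping base 0xc0000000 from the ltrace, and assuming the region starts at that same address, resource_offset() yields 0, which is trivially page aligned. That makes the offset an unlikely cause of the EINVAL.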


Good news is, I can reproduce the exact problem with a very short test program, basically:

int fd = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR | O_CLOEXEC);
void *mem = mmap(NULL, 0x20000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
// => errno == EINVAL

(0x20000000 happens to be what I currently have for the size of that sysfs file, corresponding to [a multiple of] the memory parameter that's configured for qxl in qemu / Proxmox.)

Shorter, aligned values such as length / 2 or length >> 8 all give the same error.
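
For anyone who wants to run the same check, a slightly fuller version of the snippet above might look like this. It is a sketch rather than the exact program used here: try_map() is a hypothetical helper name, and the path and length have to be adapted to the device in question.

```c
#define _POSIX_C_SOURCE 200809L  /* for O_CLOEXEC under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try to map `length` bytes of `path` read/write and shared, with the
 * same open() and mmap() arguments as the snippet above. Returns 0 on
 * success, or the errno value of the call that failed (open or mmap). */
static int try_map(const char *path, size_t length)
{
    int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return errno;

    void *mem = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int err = (mem == MAP_FAILED) ? errno : 0;  /* save errno before cleanup */

    if (mem != MAP_FAILED)
        munmap(mem, length);
    close(fd);
    return err;
}
```

Called as root from a small main() with "/sys/bus/pci/devices/0000:00:01.0/resource0" and 0x20000000, this would be expected to return EINVAL on an affected system and 0 once the mapping succeeds.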

Again, this EINVAL occurs the same whether I have the qxl driver module loaded or not. /sys/bus/pci/devices/0000:00:01.0/driver only exists and points to qxl when it's loaded, as expected; also, the qxl module loading logs look good:

Apr 10 11:29:36 VMjmbreuer kernel: qxl 0000:00:01.0: vgaarb: deactivate vga console
Apr 10 11:29:36 VMjmbreuer kernel: [drm] Device Version 0.0
Apr 10 11:29:36 VMjmbreuer kernel: [drm] Compression level 0 log level 0
Apr 10 11:29:36 VMjmbreuer kernel: [drm] 98302 io pages at offset 0x8000000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] 134217728 byte draw area at offset 0x0
Apr 10 11:29:36 VMjmbreuer kernel: [drm] RAM header offset: 0x1fffe000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] qxl: 128M of VRAM memory size
Apr 10 11:29:36 VMjmbreuer kernel: [drm] qxl: 511M of IO pages memory ready (VRAM domain)
Apr 10 11:29:36 VMjmbreuer kernel: [drm] qxl: 256M of Surface memory size
Apr 10 11:29:36 VMjmbreuer kernel: [drm] slot 0 (main): base 0xc0000000, size 0x1fffe000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] slot 1 (surfaces): base 0xe0000000, size 0x10000000
Apr 10 11:29:36 VMjmbreuer kernel: [drm] Initialized qxl 0.1.0 20120117 for 0000:00:01.0 on minor 0
Apr 10 11:29:36 VMjmbreuer kernel: qxl 0000:00:01.0: [drm] fb0: qxldrmfb frame buffer device


FWIW, I can confirm all of this to be an issue with Gentoo and the package versions it's using; I had Arch Linux installed in the very same VM before and QXL worked out of the box there.

Comment 8 Joe Breuer 2022-04-12 10:08:07 UTC

(In reply to Joe Breuer from comment #7)
> Good news is, I can reproduce the exact problem with a very short test
> program, basically:
> 
> int fd = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR |
> O_CLOEXEC);
> void *mem = mmap(NULL, 0x20000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
> 0);
> // => errno == EINVAL

This appears to be down to a (separate) kernel bug:

https://bugzilla.kernel.org/show_bug.cgi?id=215678

With that kernel patch series applied (just the one patch wouldn't do it for me), qxl now gets through initialization and then fails in xf86InitViewport for me, too.

Kinda half related, vmwgfx breaks very similarly for me in that VM.

I'll try downgrading xorg-server to 1.20.14-r1 next and see how things go.

This VM (guest) is a testing system anyway, so I can try things out there.

Comment 9 Joe Breuer 2022-04-12 10:36:40 UTC

OK, with x11-base/xorg-server-1.20.14-r1 and x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1, X works for me.

Which means that by switching to xorg-server-21.1.3-r1 I've got a repro scenario; I'll see what I can figure out.

Comment #5 mentions xf86InitViewport (pScr=0x5618626fcdf0) at ../xorg-server-21.1.3/hw/xfree86/common/xf86Cursor.c:104

https://gitlab.freedesktop.org/xorg/xserver/-/blob/xorg-server-21.1.3/hw/xfree86/common/xf86Cursor.c#L104

... I'll track this down properly with a local debug build.

Comment 10 Joe Breuer 2022-04-12 19:06:44 UTC

Created attachment 770447 [details, diff]
Quick-fix xf86-video-qxl to work with xorg-server-21.1.3

The incompatibility / crash is mostly due to stuff that's not initialized - varying cases of xorg-server 21.1.3 no longer initializing values as the older version used to, and the newer Xorg server requiring stuff to be initialized the older version didn't care about.

Tracking the resulting SEGFAULTs down I've come up with the attached patch.

With this, I get a working X server 21.1.3 in VM using qxl for graphics.

I'm not an expert on X driver code, so this might not be the best way to fix this. There isn't exactly a "How to port X.org drivers from 1.20 to 21?" guide that I could find.

Be aware that if your kernel is new enough, e.g. (at least) 5.15.32, there's another issue that might need addressing to get qxl to work. In a nutshell, apply this patch series to the kernel:

https://patchwork.freedesktop.org/series/99243/#rev2

More information is in a kernel bug:

https://bugzilla.kernel.org/show_bug.cgi?id=215678

Applying only patch 2/5 as is mentioned there was not enough to get it working in my particular case. YMMV.

Comment 11 Joe Breuer 2022-04-12 19:07:49 UTC

Forgot to mention, I've also submitted this issue to upstream:

https://gitlab.freedesktop.org/xorg/driver/xf86-video-qxl/-/merge_requests/9

Comment 12 Joakim Tjernlund 2022-05-04 10:05:41 UTC

(In reply to Joe Breuer from comment #10)
> Created attachment 770447 [details, diff]
> Quick-fix xf86-video-qxl to work with xorg-server-21.1.3
> 
> The incompatibility / crash is mostly due to stuff that's not initialized -
> varying cases of xorg-server 21.1.3 no longer initializing values as the
> older version used to, and the newer Xorg server requiring stuff to be
> initialized the older version didn't care about.
> 
> Tracking the resulting SEGFAULTs down I've come up with the attached patch.
> 
> With this, I get a working X server 21.1.3 in VM using qxl for graphics.
> 
> I'm not an expert on X driver code, so this might not be the best way to fix
> this. - There isn't exactly a "How to port X.org drivers from 1.20 to 21?"
> guide that I could find.
> 
> Be aware that if your kernel is new enough, e.g. (at least) 5.15.32, there's
> another issue that might need addressing to get qxl to work. In a nutshell,
> apply this patch series to the kernel:
> 
> https://patchwork.freedesktop.org/series/99243/#rev2
> 
> More information is in a kernel bug:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=215678
> 
> Applying only patch 2/5 as is mentioned there was not enough to get it
> working in my particular case. YMMV.

Is this still a problem in the current 5.15.x (5.15.36) kernel?

Comment 13 Joakim Tjernlund 2022-05-16 09:14:05 UTC

(In reply to Joe Breuer from comment #8)
> (In reply to Joe Breuer from comment #7)
> > Good news is, I can reproduce the exact problem with a very short test
> > program, basically:
> > 
> > int fd = open("/sys/bus/pci/devices/0000:00:01.0/resource0", O_RDWR |
> > O_CLOEXEC);
> > void *mem = mmap(NULL, 0x20000000, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
> > 0);
> > // => errno == EINVAL
> 
> This appears to be down to a (separate) kernel bug:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=215678
> 
> With that kernel patch series applied (just the one patch wouldn't do it for
> me), qxl initializes at all and fails in xf86InitViewport for me, too.
> 
> Kinda half related, vmwgfx breaks very similarly for me in that VM.

Perhaps vmwgfx needs this?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/diff/queue-5.15/drm-vmwgfx-initialize-drm_mode_fb_cmd2.patch?id=25ec26903868a0a3750e75e63685d25cab2256ed

Comment 14 Alexander Tsoy 2022-05-28 12:31:04 UTC

Applying the following patch fixes this issue for me:

https://gitlab.freedesktop.org/xorg/driver/xf86-video-qxl/-/commit/52e975263fe88105d151297768c7ac675ed94122

Before:

        xf86-video-qxl 0.1.5
        =====================

        prefix:                   /usr
        c compiler:               x86_64-pc-linux-gnu-gcc

        drm:                      
        KMS:                      no
        Build qxl:                yes
        Build xspice:             no
        Build spiceccid:          no

After:

        xf86-video-qxl 0.1.5
        =====================

        prefix:                   /usr
        c compiler:               x86_64-pc-linux-gnu-gcc

        drm:                      -I/usr/include/libdrm 
        KMS:                      yes
        Build qxl:                yes
        Build xspice:             no
        Build spiceccid:          no

Comment 15 Alexander Tsoy 2022-05-28 12:34:13 UTC

Also previously there were warnings in the X logs before the crash:

[     7.566] (WW) qxl(0): No outputs definitely connected, trying again...
[     7.566] (WW) qxl(0): Unable to find connected outputs - setting 1024x768 initial framebuffer

And after compiling with drm it detects all outputs correctly.

Comment 16 Joost Ruis 2022-08-02 19:22:31 UTC

For what it's worth:
With the latest stable Xorg installed in the VM, X failed to start. After I applied the patch mentioned in comment 14, it works again.

https://github.com/mocaccinoOS/desktop/blob/eb992d889be2e5ba32971b89360970386b38a2b8/packages/layers/X/patches/x11-drivers/xf86-video-qxl-0.1.5_p20200205-r1/52e975263fe88105d151297768c7ac675ed94122.patch

Comment 17 Sam James archtester 2022-09-19 04:32:04 UTC

*** Bug 860267 has been marked as a duplicate of this bug. ***

Comment 18 Larry the Git Cow gentoo-dev 2023-01-24 06:07:40 UTC

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e6ac76831d716668591a412fcdaa715bb39f813b

commit e6ac76831d716668591a412fcdaa715bb39f813b
Author:     Matt Turner <mattst88@gentoo.org>
AuthorDate: 2023-01-24 06:00:47 +0000
Commit:     Matt Turner <mattst88@gentoo.org>
CommitDate: 2023-01-24 06:07:28 +0000

    x11-drivers/xf86-video-qxl: Version bump to 0.1.6
    
    Bug: https://bugs.gentoo.org/829759
    Signed-off-by: Matt Turner <mattst88@gentoo.org>

 x11-drivers/xf86-video-qxl/Manifest                |  1 +
 .../xf86-video-qxl/xf86-video-qxl-0.1.6.ebuild     | 45 ++++++++++++++++++++++
 2 files changed, 46 insertions(+)

Comment 19 Matt Turner gentoo-dev 2023-01-24 06:08:06 UTC

Please confirm that v0.1.6 works.

Comment 20 m1027 2023-01-24 11:27:00 UTC

What I can confirm (see comment #4): I run remmina over spice to some Gnome KVMs after an upgrade to xf86-video-qxl-0.1.6 without issues.