Bug 736916 - media-libs/mesa: USE=libglvnd breaks kde-plasma/kwin aurora-based windecos
Summary: media-libs/mesa: USE=libglvnd breaks kde-plasma/kwin aurora-based windecos
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo KDE team
URL: https://bugs.kde.org/show_bug.cgi?id=...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-08-12 20:34 UTC by Duncan
Modified: 2020-08-27 14:03 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (emerge.info,6.72 KB, text/plain)
2020-08-12 20:50 UTC, Duncan
glxinfo output diff, eselect-opengl to libglvnd (glxinfo.diff,1.51 KB, text/plain)
2020-08-13 03:53 UTC, Duncan
full glxinfo (running with libglvnd) (glxinfo.libglvnd,146.29 KB, text/plain)
2020-08-13 03:55 UTC, Duncan
kwin --replace output with libglvnd (kwin-replace.libglvnd,11.02 KB, text/plain)
2020-08-13 09:42 UTC, Duncan

Description Duncan 2020-08-12 20:34:39 UTC
Short version: mesa with libglvnd gives me invisible titlebars; mesa with eselect-opengl gives me working titlebars.

Status set to blocking since I'm having to un-use-force libglvnd and unmask eselect-opengl.

After an update earlier today, including the switch to libglvnd, and restarting X/plasma to get it to take effect...

I lost my titlebar/windecos!

Or more precisely, they were still there, but transparent.  I could still click the close button (top-left corner), etc, but the app-menu button was rather harder as it's further into the titlebar, with other buttons (keep-above/keep-below) that may or may not appear in between.  Similar story on the right with the maximize/minimize/widget-help buttons.

Actually, because I have the blur-under-semi-transparent effect on, the area where the titlebar was, while transparent, was blurred, so I could sort of see where it should be.  In my experimentation I tried turning the blur off and then the titlebar was /entirely/ transparent, no blur, so I couldn't see where it was at all.

In further experiments, toggling the compositor off got me back my titlebars, but then of course most effects including semi-transparent windows and more importantly, zoom, were disabled.  So that's not long-term viable at all.

Additionally, I found that switching windecos to the default breeze or the old kde4 default oxygen did get windecos back with compositing on, while switching to any aurora-based windecos, including the native plastik windeco (so it's anything using the aurora windeco engine), gave me invisible-but-blurred titlebars again.  While switching windecos is perhaps a bit more workable as a workaround than killing composite, the aurora-based windeco I use allows far shorter (15 px) titlebars than breeze or oxygen (30 px even with the buttons set as small as possible), so the workaround wastes space, and because I have a bunch of window-rules set up for 15 px titlebars, the bigger ones break my usual window-grid layout.


Since I run live-git kde-*, initially I thought it was a bad kwin update so I tried bisecting on that to no avail.  Then I tried masking the fresh-updated mesa-20.2.0_rc1, and that didn't help either.

Then I decided to try switching back to eselect-opengl, and after jumping thru all the use-force undos and eselect-opengl unmasking hoops to do it...  I had titlebar/windecos again!

With the unmasking/unforcing setup so I could switch by just toggling USE=libglvnd, I've tried toggling between it and eselect-opengl, and the bug is repeatable -- with eselect-opengl I have working titlebars, with libglvnd they're invisible (but still clickable, and background blurred so I can see where they are).

Radeon rx460 graphics hardware, running the freedomware amdgpu drivers
~amd64
Linux kernel 5.8 mainline
kde-frameworks/plasma/apps-9999 versions from the gentoo/kde overlay
qt*-5.15.0
xorg-server-1.20.8-r1
mesa-20.1.5
libglvnd-1.3.2
xf86-video-amdgpu-19.1.0
linux-firmware-20200721 (for the amdgpu firmware)
Comment 1 Duncan 2020-08-12 20:40:02 UTC
CCing mattst88 as the filer of the blocked bug
Comment 2 Duncan 2020-08-12 20:42:23 UTC
CCing kde@gentoo due to the kwin aurora windecos angle
Comment 3 Duncan 2020-08-12 20:50:14 UTC
Created attachment 654380 [details]
emerge --info
Comment 4 Matt Turner gentoo-dev 2020-08-12 20:54:45 UTC
Strange. That's oddly specific.

Could you check whether DRI is working generally? Does glxinfo show any differences between using libglvnd vs not?
Comment 5 Duncan 2020-08-13 02:45:13 UTC
(In reply to Matt Turner from comment #4)
> Strange. That's oddly specific.
> 
> Could you check whether DRI is working generally? Does glxinfo show any
> differences between using libglvnd vs not?

Everything else, including zoom and window transparency which I use very regularly so I'd quickly notice if they were broken and which break if I turn off compositing, is fine.

(I haven't done a glxinfo diff yet and needed to actually get some work done for a few hours, but I will.)

There is some precedent for aurora-based windecos being the only common kde/plasma symptom, however.  I don't recall the details (and it may be I only saw it in the git logs with it triggered and fixed between my updates, and never saw it myself, thus explaining my lack of recalled detail) but maybe six months or a year ago there was something that triggered in kwin with that being the primary symptom.  They had to make some adjustments.  The gentoo/kde devs may remember it and if not the kwin devs almost certainly should have more detail, tho of course they're not in the current loop as I traced it to this gentoo libglvnd issue before reporting, not the kwin issue I first suspected in part /because/ of that past issue.

I can probably do a kwin git log search and see if anything comes up from that earlier event, too...

I'm guessing it's aurora using one potentially obscure part of the OpenGL API that libglvnd implements differently, perhaps interacting with a corner-case of AMD/Radeon-only hardware or driver implementation.  And given the aurora windeco engine is shipped by default but not used by either the breeze default windeco or the older kde4-default oxygen windeco, and has less fancy features except that it's the engine most get-hot-new-stuff windecos seem to use (probably easiest to customize as I believe the customizations are all config using the common aurora engine, no code), only a relatively small fraction of users will be affected, a fraction of a fraction if there's a hardware-dep element as well.
Comment 6 Duncan 2020-08-13 03:53:26 UTC
Created attachment 654388 [details]
glxinfo output diff, eselect-opengl to libglvnd

(In reply to Matt Turner from comment #4)
> Does glxinfo show any differences between using libglvnd vs not?

Good question!  Unfortunately, not much in the diff.  Seems to be memory usage only, and with an X/plasma restart in between and a switch back to my routine read-only root (so no deleted libs still loaded) after the update, identical memory usage /would/ be something!
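
A comparison like this gets easier to read if the lines expected to change between sessions are dropped before diffing. What follows is only a minimal Python sketch, not the procedure used for the attached diff. The `VOLATILE` pattern (any line mentioning "memory"), the function name `stable_diff` and the sample lines are assumptions for illustration.

```python
import difflib
import re

# Assumption: lines mentioning "memory" are the ones expected to vary
# between sessions; adjust the pattern to the real glxinfo output.
VOLATILE = re.compile(r"memory", re.IGNORECASE)


def stable_diff(old_text, new_text, ignore=VOLATILE):
    """Return unified-diff lines for two glxinfo dumps, skipping volatile lines."""
    old = [line for line in old_text.splitlines() if not ignore.search(line)]
    new = [line for line in new_text.splitlines() if not ignore.search(line)]
    return list(
        difflib.unified_diff(old, new, "eselect-opengl", "libglvnd", lineterm="")
    )


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the attachments.
    before = "direct rendering: Yes\nVBO free memory - total: 3900 MB\n"
    after = "direct rendering: Yes\nVBO free memory - total: 3700 MB\n"
    print(stable_diff(before, after) or "no differences outside memory lines")
```

An empty result then means the two dumps agree on everything except the filtered lines, which matches the "memory usage only" observation above.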
Comment 7 Duncan 2020-08-13 03:55:01 UTC
Created attachment 654390 [details]
full glxinfo (running with libglvnd)
Comment 8 Matt Turner gentoo-dev 2020-08-13 04:27:13 UTC
glxinfo output looks totally normal to me.

Any ideas, kde@?
Comment 9 Matt Turner gentoo-dev 2020-08-13 04:38:28 UTC
https://bbs.archlinux.org/viewtopic.php?id=223603&p=2 seems to have some relevant information. The suggestion there is to run

>  kwin_x11 --replace & 

and look at the terminal output. In theirs they have

> kwin_core: Failed to initialize compositing, compositing disabled

which looks indicative of a problem.

A few posts below someone just says that they need to go into a settings menu and reenable compositing? Not sure.
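
If the terminal output is saved to a file, checking it for that message can be scripted. This is a minimal Python sketch under that assumption. The function name and the sample log are hypothetical, and only the marker text comes from the forum thread quoted above.

```python
# Marker text taken from the quoted forum output.
MARKER = "Failed to initialize compositing"


def compositing_failures(log_text, marker=MARKER):
    """Return (line_number, line) pairs for log lines containing the marker."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = (
        "some unrelated startup message\n"
        "kwin_core: Failed to initialize compositing, compositing disabled\n"
    )
    for number, line in compositing_failures(sample):
        print(f"{number}: {line}")
```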
Comment 10 Duncan 2020-08-13 06:16:58 UTC
(In reply to Matt Turner from comment #9)
> https://bbs.archlinux.org/viewtopic.php?id=223603&p=2

Looking at it...

> >  kwin_x11 --replace & 
> 
> and look at the terminal output. In theirs they have
> 
> > kwin_core: Failed to initialize compositing, compositing disabled
> 
> which looks indicative of a problem.

I'm familiar with kwin_x11 replace (familiar enough I have it in a script, I'm running live-git kde-frameworks/plasma after all...), but hadn't thought to run it from a konsole window and check the output.

One thing I do know from experience, most kde/plasma apps dump a /lot/ of alarming looking but useless-for-the-user output, even when running without evident issue (aka "normally"), so often it's only helpful (at least for users) if you can get a diff of a bad run against a normal run.  Which will of course involve building with and without USE=libglvnd and running it both ways, saving the output from each and doing a diff, again.

Meanwhile, the plasma-system-settings compositor settings (the same ones including the reenable compositing option from the arch thread) have some options (opengl3/opengl2/xrender, among them) that I'm playing with ATM, and I need to restart X/plasma between them to be sure I'm getting clean results.

> A few posts below someone just says that they need to go into a settings
> menu and reenable compositing? Not sure.

Tried that before filing the bug.  Didn't work.  Variously kwin_x11 crashing until it turns off compositing again, compositing staying on but with stale titlebars/windecos showing up, etc.  Having to re-enable compositing isn't exactly routine, but it's common enough I don't file bugs when that fixes things, maybe a couple incidents a year.


BTW I may have found that aurorae (terminating "e" I missed previously!) bug in kwin's git with a link to the bug (which in turn mentions the commits).  Older than I remembered, June of 2018.  Not sure if it's useful, but here's the bug reference to have it somewhere as I need to do that X/plasma restart and xrender/opengl2/opengl3 testing before I investigate that bug and the kwin_x11 output idea further.

https://bugs.kde.org/show_bug.cgi?id=395732
Comment 11 Duncan 2020-08-13 06:57:13 UTC
(In reply to Duncan from comment #10)
> Meanwhile, the plasma-system-settings compositor settings 

The three kwin compositor backends and their results:

OpenGL 3.1: My default, of course broken aurorae titlebars.

XRender: This works as I expected, but it disables some of the more eye-candy effects including wobbly-windows.  I wouldn't want to live with this permanently, but at least it doesn't break either titlebars or effects like zoom and semi-transparent windows that I'd have serious trouble doing without, so it's an acceptable temporary workaround.

OpenGL 2.0: This was the one I was wondering about.  Broken.

So the bug appears to apply to older OpenGL as well as newer, but (as expected) not kwin's XRender backend.
Comment 12 Duncan 2020-08-13 09:42:22 UTC
Created attachment 654432 [details]
kwin --replace output with libglvnd

(In reply to Duncan from comment #10)
> (In reply to Matt Turner from comment #9)
>>>  kwin_x11 --replace & 
>> 
>> and look at the terminal output. In theirs they have
>> 
>>> kwin_core: Failed to initialize compositing, compositing disabled

> I'm familiar with kwin_x11 replace but hadn't thought
> to run it from a konsole window and check the output.

Thanks for the idea.  Seems it may have provided a couple clue lines, below.

> One thing I do know from experience, most kde/plasma apps
> dump a /lot/ of alarming looking output even when running
> [normally,] so often it's only helpful if you can get a diff,
> bad run against normal run.

Tried exactly that.  The output is so noisy with alarming messages that only a diff between a normal run and a broken one helps, and even the diff is noisy enough that it needs manual deduping/decluttering.

But I /did/ get some interesting results:

* With kwin_x11 --replace, *both* the broken libglvnd and working eselect-opengl session new kwins crashed and were auto-restarted a few times, until the restart-count reached a threshold and a dialog (probably from kcrash) popped up, offering the ability to start a different wm.

Not having any other wm installed, there's little to do but hit the OK, and kwin_x11 (which is filled in as the default) restarted again, but now with composite disabled.  (Apparently plasma's kwin restarter tries N times with composite enabled, then pops the dialog, and disables composite if you hit OK with kwin still set at the dialog.  FWIW this matches my previous experience so no different behavior here.)

At that point, with composite disabled, both sessions had working titlebars, as previously reported.

* The user-visible behavior difference is that once restarted with composite disabled, while I can enable composite again in both cases and once I do I see the windows get semi-transparent so I know it's working, the eselect-opengl instance still has good titlebars, as it did before the replace, while the libglvnd instance titlebars go fully transparent (albeit with background blur where the otherwise invisible titlebar is, if that effect is enabled), again, just as they were before the replace.

* The diff output, as mentioned, has a lot of duplicate output in both cases, sometimes reordered and/or with more dupes in one or the other, but in general, pretty similar.

** There were TWO DISTINCT LINES in the bad output, however, that look like reasonable clues, as they appear after the last crash and the OpenGL boilerplate that apparently appears (in both sessions) as it reinitializes, thus look like they could be where the titlebar disappears on the libglvnd session, AND particularly the second one is obviously OpenGL.  In order (with a few other lines duplicated in both instances in between):

QQuickRenderControl::initialize called with incorrect current context

QOpenGLVertexArrayObject::destroy() failed to make VAO's context current


So OpenGL context is bad (but only with libglvnd) at initialization, such that the OpenGL VAO/Vertex-Array-Object presumably later called on that bad context fails?

Initialization timing/race that at least on my hardware happens to work on eselect-opengl but fail with a still uninitialized/null context on libglvnd?  Perhaps the current context setup is entirely missed on libglvnd?  Or is this a rabbit trail to nowhere?  If these are clues how does libglvnd work differently to trigger it?

I'm attaching the kwin_x11 --replace output log from the libglvnd session.  The diff against the eselect-opengl session is pretty useless without manual deduping/decluttering but the two real differences are listed above.
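
The manual deduping/decluttering could be approximated with a small script. This is a minimal Python sketch, not the method actually used here. The function name `only_in_bad` and the "noise" sample lines are hypothetical, while the two Qt messages are the ones quoted above.

```python
def only_in_bad(bad_log, good_log):
    """Return lines that occur in bad_log but never in good_log.

    Duplicates are dropped and first-seen order is kept, so repeated
    or reordered noise shared by both runs disappears.
    """
    good = {line.strip() for line in good_log.splitlines()}
    seen = set()
    unique = []
    for raw in bad_log.splitlines():
        line = raw.strip()
        if line and line not in good and line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


if __name__ == "__main__":
    # The "noise" lines are hypothetical; the Qt messages are quoted above.
    good_run = "noise A\nnoise B\nnoise A\n"
    bad_run = (
        "noise B\n"
        "QQuickRenderControl::initialize called with incorrect current context\n"
        "noise A\n"
        "QOpenGLVertexArrayObject::destroy() failed to make VAO's context current\n"
        "QOpenGLVertexArrayObject::destroy() failed to make VAO's context current\n"
    )
    for line in only_in_bad(bad_run, good_run):
        print(line)
```

Because this compares whole lines as a set, it ignores how often a line appears and would be defeated by per-run values such as timestamps or PIDs, which would need stripping first.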
Comment 13 Matt Turner gentoo-dev 2020-08-13 17:49:54 UTC
Nice.

> QQuickRenderControl::initialize called with incorrect current context
> QOpenGLVertexArrayObject::destroy() failed to make VAO's context current

definitely looks relevant.

I'm not familiar with KDE. I know kwin is the compositor, but are you using some compositor plugin on top? (aurora-based windecos?) Or is that just an option in KDE/kwin?
Comment 14 Duncan 2020-08-13 21:57:13 UTC
(In reply to Matt Turner from comment #13)
> Nice.
> 
> > QQuickRenderControl::initialize called with incorrect current context
> > QOpenGLVertexArrayObject::destroy() failed to make VAO's context current
> 
> definitely looks relevant.
> 
> I'm not familiar with KDE. I know kwin is the compositor, but are you using
> some compositor plugin on top? (aurora-based windecos?) Or is that just an
> option in KDE/kwin?

In tabular form:

Windeco engines, all native-shipped:

Aurorae: Affected, public API exposed for user-created windecos
Breeze: Unaffected, no public API exposed (to my knowledge)
Oxygen: Unaffected, no public API exposed (to my knowledge)

Windecos, native-shipped:

Plastik: aurorae-engine-based, affected
Breeze: breeze-engine-based, unaffected
Oxygen: oxygen-engine-based, unaffected

Windecos from KDE store:

All (apparently) aurorae-engine based.  All (that I've tried) affected.

So the bug appears to affect (only?) the native-shipped aurorae windeco engine, with all its users including the native-shipped plastik as well as all the KDE store based windecos, affected.

(FWIW, the option as exposed in the GUI is for the windeco.  I only happen to know the engine information from following development, of course along with the surmised fact that all the kde store windecos happen to use aurorae, also used by the native-shipped plastik windeco, so aurorae's API is obviously publicly exposed, while the other two native-shipped windecos use their own not-public-API engines.)

But why is the bug only triggered by libglvnd, not by eselect-opengl?  What's different with the libglvnd implementation that would trigger it, when the eselect-opengl implementation does not?
Comment 15 Matt Turner gentoo-dev 2020-08-13 22:20:51 UTC
(In reply to Duncan from comment #14)
> But why is the bug only triggered by libglvnd, not by eselect-opengl? 
> What's different with the libglvnd implementation that would trigger it,
> when the eselect-opengl implementation does not?

My guess is that the KDE code is doing something unreliable in context creation that just so happens to work on Mesa's libGL but not libglvnd's. Perhaps something similar to https://cgit.freedesktop.org/xorg/app/xdriinfo/commit/?id=6273d9dacbf165331c21bcda5a8945c8931d87b8 is needed.

But I find it strange that this is the only report of this given that I enabled libglvnd by default in early March.

commit 6a770860c347a92a246c3df5297725ef0871a83e
Author: Matt Turner <mattst88@gentoo.org>
Date:   Sat Mar 7 16:26:52 2020 -0800

    media-libs/mesa: Enable IUSE=libglvnd by default
Comment 16 Duncan 2020-08-13 22:43:53 UTC
Bugzi seems to have lost my latest comment, probably mid-air collision and I closed the window after hitting submit without reading (BAD Duncan!), so I missed it.  Anyway...

Got me googling (on ddg FWIW)...

https://blog.martin-graesslin.com/blog/2012/01/aurorae-3-window-decorations-with-qtquick/

2012 era as the link suggests. Martin was the kwin maintainer until a couple years ago when he got burned out.  Turns out aurorae was his project, born out of early kde4 experiments.  The above was blogging the port to qml for kde 4.9.

There have been third-party windeco theming engines as well.  deKorator was an old one I had forgotten about that I believe is mentioned in the above link.  Here's an article introducing a newer one, 2020 update it says, called "Hello KWin":

https://www.pcsuggest.com/hello-customizable-kwin-decoration/
Comment 17 Duncan 2020-08-13 23:15:40 UTC
(In reply to Matt Turner from comment #15)
> But I find it strange that this is the only report of this given that I
> enabled libglvnd by default in early March.

Explained in my case with USE="-* ...", since I got tired of having to suss out why USE flags had changed state when I hadn't changed them, and then decide what I wanted to do about it, keep the changes or change it back.  Now, I see new flags on old packages, and all new flags on new packages, and decide at that point, after which they don't change unless the flag is removed or renamed, or I find reason to change them, in which case I /know/ why it's changed.  (Meanwhile, switching profiles isn't half the hassle it used to be, as USE flags aren't affected unless use-masked or use-forced.)

That doesn't explain why others haven't reported it, but that might be explained by the fact that while aurorae/plastik is native-shipped, it's not the default-enabled option.  So those seeing the bug would be limited to the fraction of gentooers that use kde/plasma, the fraction of /them/ that use the non-default aurorae either via the native-shipped plastik windeco or via a kde-store installed (or self-authored custom) windeco, and /possibly/ the fraction of /them/ that either have similar hardware/drivers or have some other not yet discovered trigger-qualifier (maybe it's only when built with gcc-10, or with some CFLAG...).  And only a fraction of /them/ might find it less hassle to report than to simply switch to another option that actually works for them.

Fraction of a fraction of a fraction and pretty quickly it's only a handful of instances, only one of which might bother to report...
Comment 18 Andreas Sturmlechner gentoo-dev 2020-08-18 15:05:34 UTC
I just switched to Plastik window decoration and could not reproduce the issue, that's with amdgpu, OGL-3.1 backend and ofc compositing enabled.

Incidentally, on this system the only delta over your package version infos (apart from unrelated kde-apps) is media-libs/mesa-20.2.0_rc2 on my part.
Comment 19 Duncan 2020-08-18 18:48:27 UTC
(In reply to Andreas Sturmlechner from comment #18)
> I just switched to Plastik window decoration and could not reproduce the
> issue, that's with amdgpu, OGL-3.1 backend and ofc compositing enabled.

Thanks for looking at this.  I'm actually working on this bug (to the extent I can as a non-dev gentoo or otherwise, but advanced gentooer who knows his way around patching/git well enough to revert and occasionally modify) this morning too.

> Incidentally, on this system the only delta over your package version infos
> (apart from unrelated kde-apps) is media-libs/mesa-20.2.0_rc2 on my part.

I'm updated to mesa-20.2.0_rc2 as well now.  (I had only masked rc1 due to this bug and it was still there on the previous version too, so when rc2 became available I went ahead and updated.)

Unfortunately it's still happening here. =:^(

Any hints as to what to investigate?

Right now I'm starting to look at kwin plastik and aurora sources, seeing if I can semi-blindly figure out where to try to add something like that call to glXGetClientString in the xdriinfo patch to support libglvnd Matt mentioned in comment #15.  Maybe I can figure it out if I compare against the other two engines that work, but I have to find the relevant code in each one first, and not being a dev or having a commit to hint where the code I want to look at is at...

Only real progress I've made is determining that the QQuickRenderControl::initialize error appears at kwin-replace load time, while the QOpenGLVertexArrayObject::destroy() call doesn't seem to appear until later, when a window is closed.  Which makes sense...  And those calls appear to be indirect, kwin code calling into qt, which actually makes the named calls, so I don't know what the kwin code is actually calling yet, and thus can't just grep for it.

But the kwin breeze/oxygen code gets it right so it's unlikely to be a bug in qt, just qt finding and printing the error.  Unfortunately with Andreas' failure to replicate I can't just file it as an upstream kwin bug and close it here, just yet, tho I'm getting closer to filing the upstream kwin bug too, at least.

I'm beginning to wonder if it might be easier just to local-patch the oxygen windeco titlebar to be shorter, as that's the big (only? breeze is flat which I hate, but oxygen's 3D) reason I use black-square and thus aurorae...  But that wouldn't solve the problem, just locally hack around it, and I've not yet looked at whether I can suss that out either.
Comment 20 Duncan 2020-08-19 07:46:05 UTC
(In reply to Duncan from comment #19)
> I'm beginning to wonder if it might be easier just to local-patch the oxygen
> windeco titlebar to be shorter, as that's the big reason I use black-square
> and thus aurorae... But that wouldn't solve the problem, just locally hack
> around it, and I've not yet looked at whether I can suss that out either.

Took me all day, but that's what I did.  I hacked all the padding out of the oxygen (which doesn't have the bug as it's not aurorae-based) titlebar, and with the font set to 8pt I have my 15px high titlebar again, "Now with oxygen!" (R) =:^)  Previously even with the smallest settings possible (small titlebar buttons and entirely unreadable 4px font size) it was something like twice that, the reason I switched to aurorae-based black-square (which I hacked slightly to get to 15px high, but unlike the oxygen hacks it was just one semi-user-accessible-value, in a theme for an engine /designed/ for user hacking) in the first place.

So for me personally the bug here is closeable now.  But I'm still going to try to work on the aurorae bug, probably filing it and a couple others that frustrated me while working on this, upstream.  I can still switch to aurorae-based to test any fixes or run debugging tests upstream might come up with.

I'll leave it up to you to decide whether to close this one and get on with the eselect-opengl removal, or decide there's likely other gentooers with the bug so it's worth further pursuit at this level as well.
Comment 21 Matt Turner gentoo-dev 2020-08-25 17:40:35 UTC
(In reply to Duncan from comment #20)
> I'll leave it up to you to decide whether to close this one and get on with
> the eselect-opengl removal, or decide there's likely other gentooers with
> the bug so it's worth further pursuit at this level as well.

Removing blocker.

My sense is that this is a KDE bug; why no one else has encountered it, I'm not sure.
Comment 22 Duncan 2020-08-27 11:38:20 UTC
Adding the upstream kwin bug URL here as a comment and to the URL field: https://bugs.kde.org/show_bug.cgi?id=425864