The first login to a wayland gnome session might crash, while the second attempt succeeds. This seems to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1398142 - dconf file create error, gnome-shell segfaults inside libxkbcommon with the same backtrace and all.
Perhaps a symptom of https://git.gnome.org/browse/gnome-settings-daemon/commit/?id=e5c6735c6b9ed4fc6466d3ea1aac3838ffbb6695 (fix included in gnome-settings-daemon-3.24.0), or maybe something else. It would be nice to re-confirm the issue, then upgrade gnome-settings-daemon, then test again, to be sure this was indeed the cause, but I think there's a 50% chance it's something else, as the log messages are different - or it could just be a different plugin deadlocking (e.g. something that sets up some environment stuff for dconf and co)
I think the https://bugzilla.redhat.com/show_bug.cgi?id=1398142 case, or at least what I had in mind here, is fixed by the gnome-settings-daemon fixes, but the first gdm-wayland launch after bootup is still crashing with both 3.22 and 3.24: gnome-shell crashes, gdm ends up trying Xorg instead, and if that works, you unexpectedly won't have the wayland sessions. If Xorg also happens to crash (e.g. like I have now due to uninstalling xf86-video-intel but not cleaning up /etc/X11/xorg.conf.d, so it tries to load a non-existent video driver and fails), it retries wayland and then succeeds. Restarting gdm also makes it go into wayland fine. I suspect it's some race condition on too fast booting, maybe potentially combined with us not using plymouth.
This seems to be a race between:
gnome-shell: Can't initialize KMS backend: could not find drm kms device
kernel: [drm] Initialized i915 1.6.0 20161121 for 0000:00:02.0 on minor 0
If gdm is started up and reaches the gnome-shell/mutter KMS backend init before the kernel is done with i915 init, it goes boom, falls back to Xorg, which probably works, and thus there is effectively no GNOME wayland session.
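To illustrate the ordering problem, here is a minimal sketch of a "wait for graphics" check that something could run before the display manager. It is only an assumption-laden illustration, not what gdm or mutter do: it assumes that a card node appearing under /dev/dri is a good enough signal that the kernel driver has finished initializing, and the function name `wait_for_drm` and the 10 second default timeout are made up for this example.

```python
import glob
import os
import time


def wait_for_drm(dev_dir="/dev/dri", timeout=10.0, interval=0.1):
    """Poll until a DRM card node (card0, card1, ...) shows up under dev_dir.

    Returns True if a card node was seen before the timeout expired,
    False otherwise. Assumption: a card node existing means the kernel
    driver is far enough along for a KMS backend to find it.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Match card0, card1, ... but not renderD128 and friends.
        if glob.glob(os.path.join(dev_dir, "card[0-9]*")):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_drm()` before launching the compositor closes the window only in the simple case. Polling is crude, and it knows nothing about seats, so it is no substitute for a proper fix in gdm/mutter.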
Are you testing w/ or w/o Plymouth? I seem to recall something similar from the past, but at the time it was Plymouth with race-like initialization problems.
(In reply to Leho Kraav (:macmaN @lkraav) from comment #4)
> Are you testing w/ or w/o Plymouth? I seem to recall something similar from
> the past, but at the time it was Plymouth with race-like initialization
Point being - do gdm things work, if Plymouth already initializes drm/kms early on?
I do not use an initramfs, therefore I can not easily test.
I suspect it will also be more likely to not fail when i915 is built into the kernel, so it'd hopefully start to initialize sooner. But we can't just rely on some luck here.
Plymouth could, as a side effect, be ensuring it's set up in time for gdm, and handling the race conditions itself, but gdm/mutter must be able to do that too when plymouth hasn't run to ensure this.
But yes, it'd be nice if someone affected by this could test with plymouth to confirm the theories.
(In reply to Mart Raudsepp from comment #6)
> But yes, it'd be nice if someone affected by this could test with plymouth
> to confirm the theories.
I could test, but could use more gnome-3.24 packages to land in tree (even if masked).
The issue happens with both 3.22 and 3.24; my 3.22 system with broadwell and ttambet 3.24 system with skylake.
But I hope to get a 3.24 PR reviewed and merged soon, that should bring in the new gdm/mutter/gnome-shell stuff.
(In reply to Mart Raudsepp from comment #8)
> The issue happens with both 3.22 and 3.24; my 3.22 system with broadwell and
> ttambet 3.24 system with skylake.
> But I hope to get a 3.24 PR reviewed and merged soon, that should bring in
> the new gdm/mutter/gnome-shell stuff.
OK. For 3.22 I can confirm that at least with plymouth inside dracut initramfs fronting the initialization, gdm hasn't had a single issue booting up.
Keywording 3.24 perhaps shouldn't be blocked by this bug. Not really sure who has the edge case here, though.
(In reply to Leho Kraav (:macmaN @lkraav) from comment #9)
> OK. For 3.22 I can confirm that at least with plymouth inside dracut
> initramfs fronting the initialization, gdm hasn't had a single issue booting
It doesn't tell much though unless you confirm that without plymouth the issue does exist for you too.
> Keywording 3.24 perhaps shouldn't be blocked by this bug. Not really sure
> who has the edge case here, though.
I'm not sure about keywording, but stabling will be blocked by this at the very least for sure. The intention is to flip wayland USE flag to enabled in gnome target profiles, etc.
Meanwhile that "gnome-3.24" alias is just a catcher for bugs that should be looked over before unmasking and before stabilizing, to make a decision at that point whether we can fix them first or not. It doesn't necessarily mean it'll block unmasking, etc.
Leho, have you been able to try if it starts having the issue when not using plymouth?
(In reply to Mart Raudsepp from comment #11)
> Leho, have you been able to try if it starts having the issue when not using
Negative, too much stuff on this camel's back. My next opportunity will be when I rebuild the initramfs for kernel upgrade (4.11 was just released), but not sure whether it will happen this week or not.
ok, don't worry about it. We've identified a way to fix this upstream with Ray Strode and I'll backport with whatever he or we come up with when in a good enough state.
As a temporary workaround, it's possible to put this in /usr/lib/systemd/system/gdm.service in the same [Unit] section as the other After keys:
But we can't really patch the service file with this, because it would wait for udev-settle on other stuff as well, not just graphics, which in a corner case could take over a minute.
The proper solution will be doing what https://www.freedesktop.org/wiki/Software/systemd/writing-display-managers/ says to do regarding the CanGraphical property, if anyone fancies doing some coding of their own sooner :)
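For anyone who wants to poke at the property by hand first, a rough sketch of reading CanGraphical from logind follows. It shells out to `busctl get-property` instead of using a D-Bus library, and it assumes the seat of interest is seat0 at the usual object path. The helper names `parse_busctl_bool` and `seat_can_graphical` are invented for this example. A real fix in gdm would subscribe to property change notifications rather than query once like this.

```python
import subprocess

# Assumption: the default seat, at logind's usual object path for it.
SEAT_PATH = "/org/freedesktop/login1/seat/seat0"


def parse_busctl_bool(output):
    """Parse `busctl get-property` output for a boolean, e.g. 'b true'."""
    fields = output.split()
    if len(fields) != 2 or fields[0] != "b" or fields[1] not in ("true", "false"):
        raise ValueError(f"unexpected busctl output: {output!r}")
    return fields[1] == "true"


def seat_can_graphical(seat_path=SEAT_PATH):
    """Ask logind over the system bus whether the seat can do graphics."""
    result = subprocess.run(
        [
            "busctl", "get-property",
            "org.freedesktop.login1",       # bus name
            seat_path,                      # object path
            "org.freedesktop.login1.Seat",  # interface
            "CanGraphical",                 # property
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_busctl_bool(result.stdout)
```

`seat_can_graphical()` raises `subprocess.CalledProcessError` if busctl fails, for example when logind is not running or the seat does not exist, so a caller can tell "not graphical yet" apart from "could not ask".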
I promised to file a nice bug upstream for Ray with all this context and logic we concluded, so if I haven't added such a bug into "See also" in a couple days, feel free to poke about it on IRC.
Just as a note:
- The current elogind-229.3 sends SeatNew and SeatRemoved signals via
(However, freedesktop.org says it should be "SeatAdded", but systemd-logind
(on master!) sends SeatNew as well. So which is correct?)
- The current elogind-229.3 sends a property change for CanGraphical via
org.freedesktop.login1.Seat, just like systemd-logind (up to master) does.
Important Quote: "leio asked me to report logind behavior here" ;^D
(See Bug 618498)
Default wayland will get deferred to 3.26 on the premise that it should go stable not too long after 3.24 anyways. Upstream is now looking a bit on the issue as well.
> ok, don't worry about it. We've identified a way to fix this upstream with Ray Strode and I'll backport with whatever he or we come up with when in a good enough state.
How did all this end up working out? Maybe it got shipped by 3.28?
No, it's still not handled to the best of my knowledge. Ray Strode keeps being busy with RHEL work whenever I poke him ;p
It'd help if someone did the upstream work and posted a merge request, etc. Shouldn't be too hard, I'm just swamped with other important gentoo stuff too and am not experiencing this personally anymore with only radeon machines. If someone wants to do it, I think I have some tips how to go about it from conversations with upstream.
I'm not seeing it either, personally. But I think it held up gnome-3.26 stabilization, perhaps for no particularly good reason (but I don't have data on the problem surface size).
It's not holding up anything right now. It was just a contributing factor to the decision to revert gnome-session-3.24 back to having Xorg as the default, not wayland. It might hold back gnome 3.28 stabilization (if we do skip stabling 3.26), as we may want to switch things over to global USE=wayland in gnome profiles and have wayland be used by default.
+1(00) skipping straight to 3.28. Even in-tree unstable would be great.
Punting to at least 3.28 for now. There was upstream work on this some weeks ago, so should be easier to get it for 3.28 or 3.30. Targeting wayland for stable 3.30, meanwhile it's all ~arch and haven't flipped profile to default wayland yet.
Temporary workaround from Ubuntu for systemd until clean logind approach is ready from halfline and Trevinho:
The bug has been referenced in the following commit(s):
Author: Mart Raudsepp <email@example.com>
AuthorDate: 2019-03-27 08:37:19 +0000
Commit: Mart Raudsepp <firstname.lastname@example.org>
CommitDate: 2019-03-27 10:11:03 +0000
gnome-base/gdm: wait for graphics DRM master with systemd
gdm currently lacks code to properly wait for the CanGraphical property
on a logind seat to switch to "Yes" before gnome-shell is started for the
login VT. This is a problem, especially with wayland enabled, when the
graphics system isn't fully initialized by the time gdm is started in
parallel, because gnome-shell will fail to start graphics and gdm will
retry with a X session, which likely succeeds at that point. This
unexpectedly ends up in a gdm Xorg session, instead of a gdm Wayland
session, which won't be able to start Wayland sessions, or reap itself
for memory savings once logged in, etc.
For systemd we can grab a workaround used by Ubuntu, which adds an
ExecStartPre command to the gdm service, that waits for the DRM master
to appear (with a 10 seconds safety fallback) before letting gdm itself
For OpenRC this is not effective, but combined with usually slower startup
of the system with OpenRC, and xdm service usually starting at the very end
(compared to rather early in parallel with systemd) due to various service
rules, it should be much more unlikely to be a problem for OpenRC systems,
or even impossible if something in init deps ends up waiting for udev to
Eventually, in a future release, there should be upstream gdm full
CanGraphical waiting on its own, which should solve any OpenRC issues as
well, provided that in-use elogind handles CanGraphical correctly (there
have been issues in systemd code too).
Package-Manager: Portage-2.3.52, Repoman-2.3.12
Signed-off-by: Mart Raudsepp <email@example.com>
gnome-base/gdm/files/gdm-CanGraphical-wait.patch | 189 +++++++++++++++++++
gnome-base/gdm/gdm-3.30.3-r2.ebuild | 228 +++++++++++++++++++++++
2 files changed, 417 insertions(+)
Please test if the above workaround in gdm-3.30.3-r2 solves the issues now (for those that hit them), after removing any local workarounds put in place already. My current machines don't hit the race conditions for easy testing myself.
With the workaround in place, I no longer consider this a "3.30 blocker", but I'm leaving the bug open for tracking inclusion of the final upstream solution and removal of this workaround. Plus there's no working workaround for non-systemd as of yet, but I don't know if it's really hit by OpenRC users at all, due to the probably stricter startup order, which likely makes gdm start much later than with systemd, so graphics are much more likely to be ready at the crucial point in time.