Bug 613222 - gnome-base/gdm[wayland] fails to start wayland session on boot when kernel driver is slower (gdm not waiting for CanGraphical property on the seat)
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Linux Gnome Desktop Team
Reported: 2017-03-19 14:27 UTC by Mart Raudsepp
Modified: 2019-03-27 10:17 UTC

Description Mart Raudsepp gentoo-dev 2017-03-19 14:27:48 UTC
The first login to a wayland gnome session might crash, while the second one succeeds. This seems to be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1398142 - dconf file create error, gnome-shell segfaults inside libxkbcommon with the same backtrace and all.
Comment 1 Mart Raudsepp gentoo-dev 2017-03-20 16:30:56 UTC
Perhaps a symptom of https://git.gnome.org/browse/gnome-settings-daemon/commit/?id=e5c6735c6b9ed4fc6466d3ea1aac3838ffbb6695 (fix included in gnome-settings-daemon-3.24.0), or maybe something else. It would be nice to re-confirm the issue, then upgrade gnome-settings-daemon and test again, to be sure this was indeed the cause. But I think there's a 50% chance it's something else, as the log messages are different - or it could just be a different plugin deadlocking (e.g. something that sets up some environment stuff for dconf and co).
Comment 2 Mart Raudsepp gentoo-dev 2017-04-10 15:00:21 UTC
I think the https://bugzilla.redhat.com/show_bug.cgi?id=1398142 case, or at least what I had in mind here, is fixed by the gnome-settings-daemon fixes, but the first gdm-wayland launch after bootup is still crashing with both 3.22 and 3.24: gnome-shell crashes and gdm ends up trying Xorg instead. If that works, you unexpectedly won't have the wayland sessions available. If Xorg also happens to crash (e.g. like I have now due to uninstalling xf86-video-intel but not cleaning up /etc/X11/xorg.conf.d, so it tries to load a non-existent video driver and fails), it retries wayland and then succeeds. Restarting gdm also makes it go into wayland fine. I suspect it's some race condition from booting too fast, maybe potentially combined with us not using plymouth.
Comment 3 Mart Raudsepp gentoo-dev 2017-04-23 12:29:54 UTC
This seems to be a race between:

gnome-shell: Can't initialize KMS backend: could not find drm kms device

and

kernel: [drm] Initialized i915 1.6.0 20161121 for 0000:00:02.0 on minor 0

If gdm is started and reaches the gnome-shell/mutter KMS backend init before the kernel is done with i915 init, it goes boom, tries Xorg, and that probably (unexpectedly) works, so effectively there is no GNOME wayland session.
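
To check whether a given boot actually lost this race, the two messages can be compared by timestamp. The following is only a rough sketch: it assumes the log was captured with `journalctl -b -o short-monotonic` (so each line starts with a `[seconds.micros]` stamp), and the timestamps, hostname and PID in SAMPLE are made up for illustration. Only the two message texts come from the logs quoted above.

```python
import re

# Leading "[   12.345678]" stamp, as printed by
# `journalctl -o short-monotonic` (assumed log format).
TS = re.compile(r"^\[\s*(\d+\.\d+)\]")

KMS_FAIL = "Can't initialize KMS backend"
DRM_READY = "[drm] Initialized"

# Hypothetical excerpt: timestamps, hostname and PID are invented.
SAMPLE = [
    "[    4.812345] host gnome-shell[612]: Can't initialize KMS backend: "
    "could not find drm kms device",
    "[    5.204567] host kernel: [drm] Initialized i915 1.6.0 20161121 "
    "for 0000:00:02.0 on minor 0",
]


def first_seen(lines, needle):
    """Return the timestamp of the first line containing needle, or None."""
    for line in lines:
        match = TS.match(line)
        if match and needle in line:
            return float(match.group(1))
    return None


def lost_race(lines):
    """True if gnome-shell gave up on KMS before the DRM driver was ready."""
    fail = first_seen(lines, KMS_FAIL)
    ready = first_seen(lines, DRM_READY)
    if fail is None:
        return False  # no KMS failure was logged at all
    return ready is None or fail < ready
```

With the sample above, `lost_race(SAMPLE)` reports the race as lost, because the gnome-shell failure is stamped before the i915 init line.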
Comment 4 Leho Kraav (:macmaN @lkraav) 2017-04-23 12:36:13 UTC
Are you testing w/ or w/o Plymouth? I seem to recall something similar from the past, but at the time it was Plymouth with race-like initialization problems.
Comment 5 Leho Kraav (:macmaN @lkraav) 2017-04-23 12:36:47 UTC
(In reply to Leho Kraav (:macmaN @lkraav) from comment #4)
> Are you testing w/ or w/o Plymouth? I seem to recall something similar from
> the past, but at the time it was Plymouth with race-like initialization
> problems.

Point being - do gdm things work, if Plymouth already initializes drm/kms early on?
Comment 6 Mart Raudsepp gentoo-dev 2017-04-23 13:37:50 UTC
I do not use an initramfs, therefore I can not easily test.

I suspect it will also be more likely to not fail when i915 is built into the kernel, so it'd hopefully start to initialize sooner. But we can't just rely on some luck here.

Plymouth could, as a side effect, be ensuring that it's set up in time for gdm, and handle the race conditions itself, but gdm/mutter must be able to do that too when plymouth hasn't run to ensure this.

But yes, it'd be nice if someone affected by this could test with plymouth to confirm the theories.
Comment 7 Leho Kraav (:macmaN @lkraav) 2017-04-23 13:45:01 UTC
(In reply to Mart Raudsepp from comment #6)
> 
> But yes, it'd be nice if someone affected by this could test with plymouth
> to confirm the theories.

I could test, but could use more gnome-3.24 packages to land in tree (even if masked).
Comment 8 Mart Raudsepp gentoo-dev 2017-04-23 13:48:30 UTC
The issue happens with both 3.22 and 3.24; my 3.22 system with broadwell and ttambet's 3.24 system with skylake.
But I hope to get a 3.24 PR reviewed and merged soon, that should bring in the new gdm/mutter/gnome-shell stuff.
Comment 9 Leho Kraav (:macmaN @lkraav) 2017-04-23 14:10:09 UTC
(In reply to Mart Raudsepp from comment #8)
> The issue happens with both 3.22 and 3.24; my 3.22 system with broadwell and
> ttambet 3.24 system with skylake.
> But I hope to get a 3.24 PR reviewed and merged soon, that should bring in
> the new gdm/mutter/gnome-shell stuff.

OK. For 3.22 I can confirm that at least with plymouth inside dracut initramfs fronting the initialization, gdm hasn't had a single issue booting up.

Keywording 3.24 perhaps shouldn't be blocked by this bug. Not really sure who has the edge case here, though.
Comment 10 Mart Raudsepp gentoo-dev 2017-04-23 14:19:42 UTC
(In reply to Leho Kraav (:macmaN @lkraav) from comment #9)
> OK. For 3.22 I can confirm that at least with plymouth inside dracut
> initramfs fronting the initialization, gdm hasn't had a single issue booting
> up.

gdm-wayland-session, right?
It doesn't tell much though unless you confirm that without plymouth the issue does exist for you too.
 
> Keywording 3.24 perhaps shouldn't be blocked by this bug. Not really sure
> who has the edge case here, though.

I'm not sure about keywording, but stabilization will be blocked by this at the very least, for sure. The intention is to flip the wayland USE flag to enabled in gnome target profiles, etc.
Meanwhile, that "gnome-3.24" alias is just a catcher for bugs that should be looked over before unmasking and before stabilizing, to decide at that point whether we can fix them first or not. It doesn't necessarily mean it'll block unmasking, etc.
Comment 11 Mart Raudsepp gentoo-dev 2017-04-29 11:44:39 UTC
Leho, have you been able to try if it starts having the issue when not using plymouth?
Comment 12 Leho Kraav (:macmaN @lkraav) 2017-05-02 13:04:11 UTC
(In reply to Mart Raudsepp from comment #11)
> Leho, have you been able to try if it starts having the issue when not using
> plymouth?

Negative, too much stuff on this camel's back. My next opportunity will be when I rebuild the initramfs for kernel upgrade (4.11 was just released), but not sure whether it will happen this week or not.
Comment 13 Mart Raudsepp gentoo-dev 2017-05-05 20:54:00 UTC
ok, don't worry about it. We've identified a way to fix this upstream with Ray Strode and I'll backport with whatever he or we come up with when in a good enough state.

As a temporary workaround, it's possible to put this in /usr/lib/systemd/system/gdm.service, in the same [Unit] section as the other After= keys:

Wants=systemd-udev-settle.service
After=systemd-udev-settle.service


But we can't really patch the service file with this, because it would wait for udev-settle on other stuff as well, not just graphics, which in corner cases could take over a minute.

The proper solution will be doing what https://www.freedesktop.org/wiki/Software/systemd/writing-display-managers/ says to do regarding the CanGraphical property, if anyone fancies doing some coding of their own faster :)

I promised to file a nice bug upstream for Ray with all this context and logic we concluded, so if I haven't added such a bug into "See also" in a couple days, feel free to poke about it on IRC.
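
For anyone wanting to experiment with the CanGraphical idea mentioned above, here is a minimal sketch of the waiting logic. It is not what gdm does or will do: a real fix would presumably subscribe to property changes on the seat over D-Bus, whereas this simply polls `loginctl show-seat`. The function names, the `seat0` default, and the timeout and interval values are all assumptions chosen for illustration.

```python
import subprocess
import time


def parse_can_graphical(output):
    """Parse `loginctl show-seat` output such as 'CanGraphical=yes'."""
    for line in output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "CanGraphical":
            return value.strip() == "yes"
    return False


def query_seat(seat="seat0"):
    """Ask logind (via loginctl) whether the seat can do graphics right now."""
    result = subprocess.run(
        ["loginctl", "show-seat", seat, "--property=CanGraphical"],
        capture_output=True, text=True, check=False,
    )
    return parse_can_graphical(result.stdout)


def wait_for_graphical(query=query_seat, timeout=10.0, interval=0.2,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll query() until it is true or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if query():
            return True
        if clock() >= deadline:
            return False  # give up; the caller decides what to do next
        sleep(interval)
```

The `query`, `clock` and `sleep` parameters exist only so the loop can be exercised without a running logind. On a real system, calling `wait_for_graphical()` with the defaults would poll seat0 for up to 10 seconds.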
Comment 14 Sven Eden 2017-05-23 16:12:34 UTC
Just as a note:
- The current elogind-229.3 sends SeatNew and SeatRemoved signals via
  org.freedesktop.login1.Manager
  ( However, freedesktop.org says it should be "SeatAdded", but systemd-login
    (on master!) sends SeatNew as well. So which is correct?)
- The current elogind-229.3 sends a property change for CanGraphical via
  org.freedesktop.login1.Seat, just like systemd-login (up to master) does.

Important Quote: "leio asked me to report logind behavior here" ;^D

(See Bug 618498)
Comment 15 Mart Raudsepp gentoo-dev 2017-09-21 16:07:16 UTC
Default wayland will get deferred to 3.26 on the premise that it should go stable not too long after 3.24 anyway. Upstream is now looking into the issue a bit as well.
Comment 16 Leho Kraav (:macmaN @lkraav) 2018-07-24 13:46:35 UTC
> ok, don't worry about it. We've identified a way to fix this upstream with Ray Strode and I'll backport with whatever he or we come up with when in a good enough state.

How did all this end up working out? Maybe it got shipped by 3.28?
Comment 17 Mart Raudsepp gentoo-dev 2018-07-24 13:49:35 UTC
No, it's still not handled to the best of my knowledge. Ray Strode keeps being busy with RHEL work whenever I poke him ;p
It'd help if someone did the upstream work and posted a merge request, etc. Shouldn't be too hard, I'm just swamped with other important gentoo stuff too and am not experiencing this personally anymore with only radeon machines. If someone wants to do it, I think I have some tips how to go about it from conversations with upstream.
Comment 18 Leho Kraav (:macmaN @lkraav) 2018-07-24 13:53:16 UTC
I'm not seeing it either, personally. But I think it held up gnome-3.26 stabilization, perhaps for no particularly good reason (but I don't have data on the problem surface size).
Comment 19 Mart Raudsepp gentoo-dev 2018-07-24 13:56:09 UTC
It's not holding up anything right now. It was just a contributing factor to the decision to revert gnome-session-3.24 back to having Xorg as the default, not wayland. It might hold back gnome 3.28 stabilization (if we do skip stabilizing 3.26), as we may want to switch things over to global USE=wayland in gnome profiles and have wayland be used by default.
Comment 20 Leho Kraav (:macmaN @lkraav) 2018-07-24 14:00:25 UTC
+1(00) skipping straight to 3.28. Even in-tree unstable would be great.
Comment 21 Mart Raudsepp gentoo-dev 2018-12-13 22:35:02 UTC
Punting to at least 3.28 for now. There was upstream work on this some weeks ago, so it should be easier to get it for 3.28 or 3.30. Targeting wayland for stable 3.30; meanwhile it's all ~arch and we haven't flipped the profile to default wayland yet.
Comment 22 Mart Raudsepp gentoo-dev 2019-03-11 16:52:08 UTC
Temporary workaround from Ubuntu for systemd until clean logind approach is ready from halfline and Trevinho:
https://git.launchpad.net/~ubuntu-desktop/ubuntu/+source/gdm3/tree/debian/patches/ubuntu/gdm3.service-wait-for-drm-device-before-trying-to-start-i.patch
Comment 23 Larry the Git Cow gentoo-dev 2019-03-27 10:12:49 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=85ea6bed8c994eae9891af1a2fba0e99aa1c3031

commit 85ea6bed8c994eae9891af1a2fba0e99aa1c3031
Author:     Mart Raudsepp <leio@gentoo.org>
AuthorDate: 2019-03-27 08:37:19 +0000
Commit:     Mart Raudsepp <leio@gentoo.org>
CommitDate: 2019-03-27 10:11:03 +0000

    gnome-base/gdm: wait for graphics DRM master with systemd
    
    gdm currently lacks code to properly wait for the CanGraphical property
    on a logind seat to switch to "Yes" before gnome-shell is started for the
    login VT. This is a problem, especially with wayland enabled, when the
    graphics system isn't fully initialized by the time gdm is started in
    parallel, because gnome-shell will fail to start graphics and gdm will
    retry with a X session, which likely succeeds at that point. This
    unexpectedly ends up in a gdm Xorg session, instead of a gdm Wayland
    session, which won't be able to start Wayland sessions, or reap itself
    for memory savings once logged in, etc.
    For systemd we can grab a workaround used by Ubuntu, which adds an
    ExecStartPre command to the gdm service, that waits for the DRM master
    to appear (with a 10 seconds safety fallback) before letting gdm itself
    start up.
    For OpenRC this is not effective, but combined with usually slower startup
    of the system with OpenRC, and xdm service usually starting at the very end
    (compared to rather early in parallel with systemd) due to various service
    rules, it should be much more unlikely to be a problem for OpenRC systems,
    or even impossible if something in init deps ends up waiting for udev to
    settle.
    Eventually, in a future release, there should be upstream gdm full
    CanGraphical waiting on its own, which should solve any OpenRC issues as
    well, provided that in-use elogind handles CanGraphical correctly (there
    have been issues in systemd code too).
    
    Bug: https://bugs.gentoo.org/613222
    Package-Manager: Portage-2.3.52, Repoman-2.3.12
    Signed-off-by: Mart Raudsepp <leio@gentoo.org>

 gnome-base/gdm/files/gdm-CanGraphical-wait.patch | 189 +++++++++++++++++++
 gnome-base/gdm/gdm-3.30.3-r2.ebuild              | 228 +++++++++++++++++++++++
 2 files changed, 417 insertions(+)
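
The "wait for the DRM master, with a 10 seconds safety fallback" idea from the commit message can be illustrated roughly as below. This is a sketch under assumptions, not the contents of the Ubuntu patch or of gdm-CanGraphical-wait.patch: it assumes that "graphics are ready" can be approximated by a `card*` node existing under /dev/dri, and the function name and polling interval are made up.

```python
import glob
import os
import time


def wait_for_drm(dri_dir="/dev/dri", timeout=10.0, interval=0.1):
    """Wait until a card* node exists in dri_dir, or give up after timeout.

    Returns True if a device showed up, False if the fallback timeout hit.
    """
    deadline = time.monotonic() + timeout
    pattern = os.path.join(dri_dir, "card*")
    while True:
        if glob.glob(pattern):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a pre-start hook the return value would probably only be logged, since the point of the fallback is that gdm still starts after the timeout even when no device has appeared.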
Comment 24 Mart Raudsepp gentoo-dev 2019-03-27 10:17:09 UTC
Please test whether the above workaround in gdm-3.30.3-r2 solves the issue now (for those who hit it), after removing any local workarounds already put in place. My current machines don't hit the race condition, so I can't easily test it myself.
With the workaround in place, I no longer consider this a "3.30 blocker", but I'm leaving the bug open to track inclusion of the final upstream solution and removal of this workaround. Also, there's no working workaround for non-systemd as of yet, but I don't know if OpenRC users really hit it at all: the startup order is probably stricter there and probably starts gdm much later than with systemd, so graphics are much more likely to be ready at the crucial point in time.