Using elogind with GNOME is resulting in eventually being unable to unlock the machine, due to: gdm: Unable to activate session: Timeout was reached but even loginctl over ssh doesn't work at that point: $ loginctl Failed to list sessions: Connection timed out Where do I begin to debug this? Some other log entries: Mar 20 14:33:25 gdm-launch-environment]: AccountsService: Could not get current seat: No such device or address Mar 20 14:33:51 gdm-password]: AccountsService: Could not get current seat: No such device or address Mar 20 22:18:11 gdm: Unable to activate session: Timeout was reached
I should probably mention that I don't have elogind service enabled, due to documentation on wiki stating that it'll auto-activate (as I would expect from dbus activatable services), and one elogind-daemon process is running.
Are there any errors or warning from elogind-daemon in /var/log/messages?
No errors, but why could AccountsService not be getting current seat from sd_session_get_seat call there? I did add a -session optional pam_elogind.so line to /etc/pam.d/gdm-launch-environment Some initial login slowness after boot I did figure out - crng init was slow, so had to add the haveged service to boot runlevel, as unfortunately commonly needed with OpenRC. But the AccountsService complaining didn't go away: Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.159506] elogind-daemon[1847]: New seat seat0. Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.164139] elogind-daemon[1847]: Watching system buttons on /dev/input/event3 (Power Button) Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.172818] elogind-daemon[1847]: Watching system buttons on /dev/input/event0 (Power Button) Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.174680] elogind-daemon[1847]: Watching system buttons on /dev/input/event2 (Lid Switch) Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.174975] elogind-daemon[1847]: Watching system buttons on /dev/input/event1 (Sleep Button) Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.185085] elogind-daemon[1847]: Watching system buttons on /dev/input/event9 (Logitech USB Receiver) Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.185779] elogind-daemon[1847]: Watching system buttons on /dev/input/event11 (Logitech USB Receiver Consumer Control) Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.186433] elogind-daemon[1847]: Watching system buttons on /dev/input/event12 (Logitech USB Receiver System Control) Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.204709] elogind-daemon[1847]: Watching system buttons on /dev/input/event6 (AT Translated Set 2 keyboard) Mar 21 12:16:44 asus-openrc gdm-launch-environment]: AccountsService: Could not get current seat: No such device or address Mar 21 12:16:44 asus-openrc kernel: <38>[ 16.489592] elogind-daemon[1847]: New session c1 of user gdm. Mar 21 12:16:53 asus-openrc kernel: <38>[ 25.359586] elogind-daemon[1847]: New session 1 of user leio. Mar 21 12:18:47 asus-openrc gdm-password]: AccountsService: Could not get current seat: No such device or address That complaining also remains if having added elogind to default runlevel. Haven't been able to reproduce the unlock and timeout issue today, probably have to wait for some time, as it usually happened if I hadn't touched the computer, so the automatic lock kicked in
Looks like suspend doesn't work and causes the issues: Mar 21 13:03:56 asus-openrc kernel: <38>[ 2053.990532] elogind-daemon[1586]: Suspending system... Mar 21 13:03:56 asus-openrc kernel: [ 2053.990561] PM: suspend entry (deep) Mar 21 13:03:56 asus-openrc kernel: [ 2053.990564] PM: Syncing filesystems ... done. Mar 21 13:05:11 asus-openrc gdm: Unable to activate session: Timeout was reached but in reality system is still running, but elogind is giving the timeouts. GNOME also gets locked, as is done before suspending, and can't unlock afterwards, which is that timeout entry in the log above. loginctl is timing out at this point as well. UPower is also complaining something on lock screen before for automatic suspend, but that was a cleaner failure that didn't actually suspend the system, and logind remaining operational from that - it was a manual suspend call that caused the timeouts to start happening. Suspend worked great on this machine with a systemd install.
If I kill -9 the elogind-daemon process, then the suspend process continues with a wakeup too: Mar 21 13:08:15 asus-openrc kernel: [ 2301.071260] Freezing user space processes ... (elapsed 0.003 seconds) done. Mar 21 13:08:15 asus-openrc kernel: [ 2301.074394] OOM killer disabled. Mar 21 13:08:15 asus-openrc kernel: [ 2301.074396] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. Mar 21 13:08:15 asus-openrc kernel: [ 2301.075896] printk: Suspending console(s) (use no_console_suspend to debug) Mar 21 13:08:15 asus-openrc kernel: [ 2301.076465] wlp4s0: deauthenticating from 00:22:07:aa:d4:1f by local choice (Reason: 3=DEAUTH_LEAVING) Mar 21 13:08:15 asus-openrc kernel: [ 2301.084698] sd 0:0:0:0: [sda] Synchronizing SCSI cache Mar 21 13:08:15 asus-openrc kernel: [ 2301.084825] sd 0:0:0:0: [sda] Stopping disk Mar 21 13:08:15 asus-openrc kernel: [ 2301.712829] ACPI: EC: interrupt blocked Mar 21 13:08:15 asus-openrc kernel: [ 2301.726862] ACPI: Preparing to enter system sleep state S3 Mar 21 13:08:15 asus-openrc kernel: [ 2301.727286] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored Mar 21 13:08:15 asus-openrc kernel: [ 2301.729771] ACPI: EC: event blocked Mar 21 13:08:15 asus-openrc kernel: [ 2301.729773] ACPI: EC: EC stopped Mar 21 13:08:15 asus-openrc kernel: [ 2301.729775] PM: Saving platform NVS memory Mar 21 13:08:15 asus-openrc kernel: [ 2301.731932] Disabling non-boot CPUs ... Mar 21 13:08:15 asus-openrc kernel: [ 2301.734386] smpboot: CPU 1 is now offline Mar 21 13:08:15 asus-openrc kernel: [ 2301.735209] ACPI: Low-level resume complete Mar 21 13:08:15 asus-openrc kernel: [ 2301.735263] ACPI: EC: EC started Mar 21 13:08:15 asus-openrc kernel: [ 2301.735264] PM: Restoring platform NVS memory Mar 21 13:08:15 asus-openrc kernel: [ 2301.735756] LVT offset 0 assigned for vector 0x400 Mar 21 13:08:15 asus-openrc kernel: [ 2301.736060] Enabling non-boot CPUs ... Mar 21 13:08:15 asus-openrc kernel: [ 2301.736117] x86: Booting SMP configuration: Mar 21 13:08:15 asus-openrc kernel: [ 2301.736119] smpboot: Booting Node 0 Processor 1 APIC 0x1 Mar 21 13:08:15 asus-openrc kernel: [ 2301.738987] cache: parent cpu1 should not be sleeping Mar 21 13:08:15 asus-openrc kernel: [ 2301.739105] microcode: CPU1: patch_level=0x05000101 Mar 21 13:08:15 asus-openrc kernel: [ 2301.739239] CPU1 is up Mar 21 13:08:15 asus-openrc kernel: [ 2301.739806] ACPI: Waking up from system sleep state S3 Mar 21 13:08:15 asus-openrc kernel: [ 2301.741341] ACPI: EC: interrupt unblocked Mar 21 13:08:15 asus-openrc kernel: [ 2301.765446] ACPI: EC: event unblocked Mar 21 13:08:15 asus-openrc kernel: [ 2301.765956] [drm] Found smc ucode version: 0x00010601 Mar 21 13:08:15 asus-openrc kernel: [ 2301.772114] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000). Mar 21 13:08:15 asus-openrc kernel: [ 2301.772298] radeon 0000:00:01.0: WB enabled Mar 21 13:08:15 asus-openrc kernel: [ 2301.772304] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000018000c00 and cpu addr 0x00000000214bb16d Mar 21 13:08:15 asus-openrc kernel: [ 2301.772308] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000018000c0c and cpu addr 0x000000006520734b Mar 21 13:08:15 asus-openrc kernel: [ 2301.772773] radeon 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0x0000000077a59baf Mar 21 13:08:15 asus-openrc kernel: [ 2301.789208] [drm] ring test on 0 succeeded in 0 usecs Mar 21 13:08:15 asus-openrc kernel: [ 2301.789217] [drm] ring test on 3 succeeded in 3 usecs Mar 21 13:08:15 asus-openrc kernel: [ 2301.791419] sd 0:0:0:0: [sda] Starting disk Mar 21 13:08:15 asus-openrc kernel: [ 2301.835923] [drm] ring test on 5 succeeded in 1 usecs Mar 21 13:08:15 asus-openrc kernel: [ 2301.856027] [drm] UVD initialized successfully. Mar 21 13:08:15 asus-openrc kernel: [ 2301.856089] ACPI: button: The lid device is not compliant to SW_LID. Mar 21 13:08:15 asus-openrc kernel: [ 2301.856227] [drm] ib test on ring 0 succeeded in 0 usecs Mar 21 13:08:15 asus-openrc kernel: [ 2301.856290] [drm] ib test on ring 3 succeeded in 0 usecs Mar 21 13:08:15 asus-openrc kernel: [ 2301.954540] r8169 0000:03:00.0 enp3s0: Link is Down Mar 21 13:08:15 asus-openrc kernel: [ 2302.079875] ata3: SATA link down (SStatus 0 SControl 300) Mar 21 13:08:15 asus-openrc kernel: [ 2302.079928] ata4: SATA link down (SStatus 0 SControl 300) Mar 21 13:08:15 asus-openrc kernel: [ 2302.152752] usb 1-3: reset high-speed USB device number 2 using ehci-pci Mar 21 13:08:15 asus-openrc kernel: [ 2302.238811] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) Mar 21 13:08:15 asus-openrc kernel: [ 2302.245831] ata2.00: configured for UDMA/100 Mar 21 13:08:15 asus-openrc kernel: [ 2302.382805] [drm] ib test on ring 5 succeeded Mar 21 13:08:15 asus-openrc kernel: [ 2302.408787] usb 1-5: reset high-speed USB device number 3 using ehci-pci Mar 21 13:08:15 asus-openrc kernel: [ 2304.574758] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) Mar 21 13:08:15 asus-openrc kernel: [ 2304.575897] ata1.00: configured for UDMA/133 Mar 21 13:08:15 asus-openrc kernel: [ 2304.594657] OOM killer enabled. Mar 21 13:08:15 asus-openrc kernel: [ 2304.594660] Restarting tasks ... done. Mar 21 13:08:15 asus-openrc kernel: [ 2304.611924] video LNXVIDEO:00: Restoring backlight state Mar 21 13:08:16 asus-openrc kernel: [ 2304.815404] PM: suspend exit Mar 21 13:08:17 asus-openrc kernel: [ 2305.761339] wlp4s0: authenticate with 00:22:07:aa:d4:1f Mar 21 13:08:17 asus-openrc kernel: [ 2305.778860] wlp4s0: send auth to 00:22:07:aa:d4:1f (try 1/3) Mar 21 13:08:17 asus-openrc kernel: [ 2305.780677] wlp4s0: authenticated Mar 21 13:08:17 asus-openrc kernel: [ 2305.783218] wlp4s0: associate with 00:22:07:aa:d4:1f (try 1/3) Mar 21 13:08:17 asus-openrc kernel: [ 2305.789606] wlp4s0: RX AssocResp from 00:22:07:aa:d4:1f (capab=0x411 status=0 aid=2) Mar 21 13:08:17 asus-openrc kernel: [ 2305.789783] wlp4s0: associated Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.043295] elogind-daemon[2827]: New seat seat0. Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.067600] elogind-daemon[2827]: Watching system buttons on /dev/input/event3 (Power Button) Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.077134] elogind-daemon[2827]: Watching system buttons on /dev/input/event0 (Power Button) Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.077440] elogind-daemon[2827]: Watching system buttons on /dev/input/event2 (Lid Switch) Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.077806] elogind-daemon[2827]: Watching system buttons on /dev/input/event1 (Sleep Button) Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.087128] elogind-daemon[2827]: Watching system buttons on /dev/input/event9 (Logitech USB Receiver) Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.087441] elogind-daemon[2827]: Watching system buttons on /dev/input/event11 (Logitech USB Receiver Consumer Control) Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.087790] elogind-daemon[2827]: Watching system buttons on /dev/input/event12 (Logitech USB Receiver System Control) Mar 21 13:08:23 asus-openrc gdm-launch-environment]: AccountsService: Could not get current seat: No such device or address Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.105943] elogind-daemon[2827]: Watching system buttons on /dev/input/event6 (AT Translated Set 2 keyboard) Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.107239] elogind-daemon[2827]: New session 1 of user leio. Mar 21 13:08:24 asus-openrc gdm: Child process -2834 was already dead. However I end up with rather dead input, so I can't even enter password anymore. SSH sessions work and elogind is alive from automatic restart once again. Looks like elogind-daemon freezes up somewhere in the suspending process.
Inputs I can remove and re-add (the USB mouse receiver) start working again afterwards
This really smells like gdm is too fast. The freeze on suspending sounds like something is blocking. The "what" isn't clear, yet. Could you please start elogind from runlevel "boot" and start gdm by hand from the console instead of doing so automatically? I'd like to know whether the issues go away if gdm is started clearly after elogind has set everything up porperly. Further, if you log into the console without gdm being started, does loginctl show that session? In your log, gdm errors out _before_ the sessions were set up. That's really odd. Do you start gdm directly or via xdm? Apart from that I am currently completely at sea, as I don't use Gnome anywhere and haven't heard of such issues. I follow Dantrell's "Gnome without systemd" thread lightly, to get hints about stuff to improve, but I don't remember something like that having been talked about. Thread: https://forums.gentoo.org/viewtopic-t-1082226.html (Although I guess you know that one better than me, right?) Oh! One last question: Which version of elogind do you use? And one nasty plea: Could you, by any chance, test sys-auth/elogind-241.9999 from my "seden" overlay (via layman) whether that improves things? If it does, there is something wrong with what elogind does for sure. The risk is, of course, that if the situation does not improve for you, that we might end up none the wiser... ...at least for now...
I'll try these things out soon. I have a much simpler patchset to gdm than dantrell, but should be no difference in practice. I keep the pam_systemd.so entry in gdm's pam file, as it's optional with the minus in front, just adding elogind one, while he swaps them out. And I just make gdm configure think it finds systemd from libelogind by passing SYSTEMD_CFLAGS and SYSTEMD_LIBS to it from pkg-config libelogind output (--cflags and --libs accordingly) to not patch and eautoreconf that. I'm not sure the initial login warnings and suspend trouble are related, but we'll see from testing soon I suppose. I have version 241.1 on there. loginctl does show all sessions until it isn't hung from partial suspend. I start with xdm and it's all wayland, no X besides Xwayland. There is nothing besides xdm to start gdm with OpenRC. The gdm used gnome-shell gets shut down too after login and it doesn't show up as a session in loginctl until I go back to the gdm VT (with wayland gdm reaps the gnome-shell and gnome-session stuff away to save memory now when the VT isn't active after a timeout == soon after logged in; and comes back when doing "Switch user" or manually changing back to VT7).
Hmmm... * When I look into the tree, gnome-base/gdm has elogind support now. Are you sure that patching is still necessary and is not doing any damage? * Do you have "elogind" USE flag enabled globally and both "systemd" and "consolekit" disabled? elogind does not do anything spectacular on suspending/hibernating the system. 1. open /sys/power/state and fail if this is not possible 2. Determine hibernation location when hibernating (aka which swap to write to) 3. Write configured "HibernateMode" into /sys/power/disk when hibernating 4. Execute all scripts from SYSTEM_SLEEP_PATH (defaults to /lib64/elogind/system-sleep) and fail on errors if "AllowSuspendInterrupts" is set to yes 5. Write "SuspendState" or "HibernateState" to /sys/power/state If, from here on, suspension hangs and you have to kill elogind, than someone has a thumb on /sys/power/state and elogind is simply waiting for the write to finish. (Which normally is when the system is resumed.) Is there anything running which might race with elogind to send the system to sleep? Or hindering it to do so? What happens if you do $ sudo echo "mem" > /sys/power/state manually?
The only patching I'm doing in gdm is to add a -session optional pam_elogind.so line to /etc/pam.d/gdm-launch-environment file, alongside the existing -session optional pam_systemd.so line, otherwise the gdm launch environment doesn't properly talk to elogind with its custom pam config, presumably. The rest is done via configure precious arguments. That system has never had systemd nor consolekit installed. It's a fresh test install on a separate disk that went straight towards elogind. I have added USE="elogind -consolekit" to GNOME profiles some time ago and that box has been handled with emerge --changed-use.
Created attachment 570912 [details] strace output while suspending I don't see anything else accessing /sys/power/state, other than elogind-daemon, at the time of the stuckedness as per lsof. Attached is a strace output from trying to suspend via gnome-shell menu. Excerpts from IRC about it (Sven disconnected with timeout at some point): <leio> first half of it is from opening gnome-shell top right menu <leio> it probably queries some CanSuspend and whatever stuff <leio> and then it ends once it gets stuck from suspend action <leio> computer stays on, screen is blanked <leio> if I move touchpad, gnome unlock screen is visible <leio> and I can try to unlock <leio> but because logind is stuck, it doesn't work <leio> I've heard success reports with this gnome elogind stuff from others, however; just having problems on my own test rig <leio> though only one explicit success report regarding suspending; another I talked to just doesn't have a working suspend overall <leio> that strace probably quite revealing in which thing to try next? The in-use proc/dev thing? <leio> at least ssh'ing in as root for extra debugging works after a long wait :) <leio> at least lsof /sys/power/state only claims elogind-daemon having it open (and stuck)
I have thought about this, and I think we are getting nearer. * The only difference between elogind suspending and manually writing "mem" into /sys/power/state is elogind sending "PrepareForSleep" to all registered processes. * After a long wait, which is inconsistent in length between attempts, the machine suspends. * The delay is during "Freezing user processes" My guess is, that at least one process is stuck in whatever it does as a reaction on the "PrepareForSleep" signal. That process is either catching (and then delaying) the freeze signal (SIGSTOP I guess...) or is locked in Disk Wait. As you can ssh into the machine during that delay, could you please have a look with 'ps' and/or [h]top which process that might be? If it is Disk Wait, it will have a D+ state. elogind doesn't count, as it waits for its write to /sys/power/state to return. But I hope the process is actively delaying the freeze, meaning it'll generate CPU usage in [h]top and/or IO in iotop. My hope comes from D+ being absolutely uninterruptible.
As it works with systemd, but elogind isn't doing anything different than systemd, the reason might be a process that tries to talk to systemd manager when receiving the PrepareForSleep signal and then simply hangs until it times out. If we can find out which process that is, we can patch the package and send a fix upstream.
For me, this only happens when trying to suspend from a Plasma wayland session, either from the interface, or via ssh with loginctl suspend. Everything seems to be normal under X using both methods. Please let me know if there is any additional info I can provide.
(In reply to Josh Norton from comment #14) > For me, this only happens when trying to suspend from a Plasma wayland > session, either from the interface, or via ssh with loginctl suspend. > Everything seems to be normal under X using both methods. > > Please let me know if there is any additional info I can provide. Yep same here (but GNOME Wayland). Under X everything works perfectly. I'm also on dedicated NVIDIA GPU (RTX 4090)
(In reply to Brett Alcox from comment #15) > Yep same here (but GNOME Wayland). Under X everything works perfectly. I'm > also on dedicated NVIDIA GPU (RTX 4090) Here to report the same issue. Can't 'loginctl suspend' from a Wayland session, causes elogind to hang. As noted in another comment, if I kill -9 elogind, the suspend will suddenly complete. On the console it works. GPU is an AMD ATI Mobility Radeon HD 5730.
(In reply to Jonathan Changnon from comment #16) > Here to report the same issue. Can't 'loginctl suspend' from a Wayland > session, causes elogind to hang. As noted in another comment, if I kill -9 > elogind, the suspend will suddenly complete. On the console it works. > > GPU is an AMD ATI Mobility Radeon HD 5730. Tested this with X and KDE, 'loginctl suspend' works as expected. So, the problem does appear to be Wayland/elogind related.
Is this still a problem?
I believe so. I still get the same behavior I described.
Please test with elogind-255.5-r1.
It seems to be fixed for me.