Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 681080 - sys-auth/elogind connection timeouts
Summary: sys-auth/elogind connection timeouts
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Andreas Sturmlechner
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-20 20:25 UTC by Mart Raudsepp
Modified: 2019-03-28 17:04 UTC (History)
2 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
strace output while suspending (elogind-suspend-strace.txt,21.20 KB, text/plain)
2019-03-27 09:34 UTC, Mart Raudsepp
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mart Raudsepp gentoo-dev 2019-03-20 20:25:17 UTC
Using elogind with GNOME is resulting in eventually being unable to unlock the machine, due to:

gdm: Unable to activate session: Timeout was reached

but even loginctl over ssh doesn't work at that point:

$ loginctl 
Failed to list sessions: Connection timed out


Where do I begin to debug this?


Some other log entries:

Mar 20 14:33:25 gdm-launch-environment]: AccountsService: Could not get current seat: No such device or address 
Mar 20 14:33:51 gdm-password]: AccountsService: Could not get current seat: No such device or address 
Mar 20 22:18:11 gdm: Unable to activate session: Timeout was reached
Comment 1 Mart Raudsepp gentoo-dev 2019-03-20 20:30:11 UTC
I should probably mention that I don't have elogind service enabled, due to documentation on wiki stating that it'll auto-activate (as I would expect from dbus activatable services), and one elogind-daemon process is running.
Comment 2 Sven Eden 2019-03-21 07:04:01 UTC
Are there any errors or warning from elogind-daemon in /var/log/messages?
Comment 3 Mart Raudsepp gentoo-dev 2019-03-21 10:31:55 UTC
No errors, but why could AccountsService not be getting current seat from sd_session_get_seat call there?
I did add a
-session    optional    pam_elogind.so
line to /etc/pam.d/gdm-launch-environment

Some initial login slowness after boot I did figure out - crng init was slow, so had to add the haveged service to boot runlevel, as unfortunately commonly needed with OpenRC. But the AccountsService complaining didn't go away:

Mar 21 12:16:44 asus-openrc kernel: <38>[   16.159506] elogind-daemon[1847]: New seat seat0.
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.164139] elogind-daemon[1847]: Watching system buttons on /dev/input/event3 (Power Button)
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.172818] elogind-daemon[1847]: Watching system buttons on /dev/input/event0 (Power Button)
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.174680] elogind-daemon[1847]: Watching system buttons on /dev/input/event2 (Lid Switch)
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.174975] elogind-daemon[1847]: Watching system buttons on /dev/input/event1 (Sleep Button)
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.185085] elogind-daemon[1847]: Watching system buttons on /dev/input/event9 (Logitech USB Receiver)
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.185779] elogind-daemon[1847]: Watching system buttons on /dev/input/event11 (Logitech USB Receiver Consumer Control)
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.186433] elogind-daemon[1847]: Watching system buttons on /dev/input/event12 (Logitech USB Receiver System Control)
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.204709] elogind-daemon[1847]: Watching system buttons on /dev/input/event6 (AT Translated Set 2 keyboard)
Mar 21 12:16:44 asus-openrc gdm-launch-environment]: AccountsService: Could not get current seat: No such device or address 
Mar 21 12:16:44 asus-openrc kernel: <38>[   16.489592] elogind-daemon[1847]: New session c1 of user gdm.
Mar 21 12:16:53 asus-openrc kernel: <38>[   25.359586] elogind-daemon[1847]: New session 1 of user leio.
Mar 21 12:18:47 asus-openrc gdm-password]: AccountsService: Could not get current seat: No such device or address 

That complaining also remains if having added elogind to default runlevel.

Haven't been able to reproduce the unlock and timeout issue today, probably have to wait for some time, as it usually happened if I hadn't touched the computer, so the automatic lock kicked in
Comment 4 Mart Raudsepp gentoo-dev 2019-03-21 11:07:13 UTC
Looks like suspend doesn't work and causes the issues:

Mar 21 13:03:56 asus-openrc kernel: <38>[ 2053.990532] elogind-daemon[1586]: Suspending system...
Mar 21 13:03:56 asus-openrc kernel: [ 2053.990561] PM: suspend entry (deep)
Mar 21 13:03:56 asus-openrc kernel: [ 2053.990564] PM: Syncing filesystems ... done.
Mar 21 13:05:11 asus-openrc gdm: Unable to activate session: Timeout was reached 

but in reality system is still running, but elogind is giving the timeouts. GNOME also gets locked, as is done before suspending, and can't unlock afterwards, which is that timeout entry in the log above. loginctl is timing out at this point as well.
UPower is also complaining something on lock screen before for automatic suspend, but that was a cleaner failure that didn't actually suspend the system, and logind remaining operational from that - it was a manual suspend call that caused the timeouts to start happening.

Suspend worked great on this machine with a systemd install.
Comment 5 Mart Raudsepp gentoo-dev 2019-03-21 11:11:45 UTC
If I kill -9 the elogind-daemon process, then the suspend process continues with a wakeup too:

Mar 21 13:08:15 asus-openrc kernel: [ 2301.071260] Freezing user space processes ... (elapsed 0.003 seconds) done.
Mar 21 13:08:15 asus-openrc kernel: [ 2301.074394] OOM killer disabled.
Mar 21 13:08:15 asus-openrc kernel: [ 2301.074396] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
Mar 21 13:08:15 asus-openrc kernel: [ 2301.075896] printk: Suspending console(s) (use no_console_suspend to debug)
Mar 21 13:08:15 asus-openrc kernel: [ 2301.076465] wlp4s0: deauthenticating from 00:22:07:aa:d4:1f by local choice (Reason: 3=DEAUTH_LEAVING)
Mar 21 13:08:15 asus-openrc kernel: [ 2301.084698] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Mar 21 13:08:15 asus-openrc kernel: [ 2301.084825] sd 0:0:0:0: [sda] Stopping disk
Mar 21 13:08:15 asus-openrc kernel: [ 2301.712829] ACPI: EC: interrupt blocked
Mar 21 13:08:15 asus-openrc kernel: [ 2301.726862] ACPI: Preparing to enter system sleep state S3
Mar 21 13:08:15 asus-openrc kernel: [ 2301.727286] ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
Mar 21 13:08:15 asus-openrc kernel: [ 2301.729771] ACPI: EC: event blocked
Mar 21 13:08:15 asus-openrc kernel: [ 2301.729773] ACPI: EC: EC stopped
Mar 21 13:08:15 asus-openrc kernel: [ 2301.729775] PM: Saving platform NVS memory
Mar 21 13:08:15 asus-openrc kernel: [ 2301.731932] Disabling non-boot CPUs ...
Mar 21 13:08:15 asus-openrc kernel: [ 2301.734386] smpboot: CPU 1 is now offline
Mar 21 13:08:15 asus-openrc kernel: [ 2301.735209] ACPI: Low-level resume complete
Mar 21 13:08:15 asus-openrc kernel: [ 2301.735263] ACPI: EC: EC started
Mar 21 13:08:15 asus-openrc kernel: [ 2301.735264] PM: Restoring platform NVS memory
Mar 21 13:08:15 asus-openrc kernel: [ 2301.735756] LVT offset 0 assigned for vector 0x400
Mar 21 13:08:15 asus-openrc kernel: [ 2301.736060] Enabling non-boot CPUs ...
Mar 21 13:08:15 asus-openrc kernel: [ 2301.736117] x86: Booting SMP configuration:
Mar 21 13:08:15 asus-openrc kernel: [ 2301.736119] smpboot: Booting Node 0 Processor 1 APIC 0x1
Mar 21 13:08:15 asus-openrc kernel: [ 2301.738987]  cache: parent cpu1 should not be sleeping
Mar 21 13:08:15 asus-openrc kernel: [ 2301.739105] microcode: CPU1: patch_level=0x05000101
Mar 21 13:08:15 asus-openrc kernel: [ 2301.739239] CPU1 is up
Mar 21 13:08:15 asus-openrc kernel: [ 2301.739806] ACPI: Waking up from system sleep state S3
Mar 21 13:08:15 asus-openrc kernel: [ 2301.741341] ACPI: EC: interrupt unblocked
Mar 21 13:08:15 asus-openrc kernel: [ 2301.765446] ACPI: EC: event unblocked
Mar 21 13:08:15 asus-openrc kernel: [ 2301.765956] [drm] Found smc ucode version: 0x00010601
Mar 21 13:08:15 asus-openrc kernel: [ 2301.772114] [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
Mar 21 13:08:15 asus-openrc kernel: [ 2301.772298] radeon 0000:00:01.0: WB enabled
Mar 21 13:08:15 asus-openrc kernel: [ 2301.772304] radeon 0000:00:01.0: fence driver on ring 0 use gpu addr 0x0000000018000c00 and cpu addr 0x00000000214bb16d
Mar 21 13:08:15 asus-openrc kernel: [ 2301.772308] radeon 0000:00:01.0: fence driver on ring 3 use gpu addr 0x0000000018000c0c and cpu addr 0x000000006520734b
Mar 21 13:08:15 asus-openrc kernel: [ 2301.772773] radeon 0000:00:01.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0x0000000077a59baf
Mar 21 13:08:15 asus-openrc kernel: [ 2301.789208] [drm] ring test on 0 succeeded in 0 usecs
Mar 21 13:08:15 asus-openrc kernel: [ 2301.789217] [drm] ring test on 3 succeeded in 3 usecs
Mar 21 13:08:15 asus-openrc kernel: [ 2301.791419] sd 0:0:0:0: [sda] Starting disk
Mar 21 13:08:15 asus-openrc kernel: [ 2301.835923] [drm] ring test on 5 succeeded in 1 usecs
Mar 21 13:08:15 asus-openrc kernel: [ 2301.856027] [drm] UVD initialized successfully.
Mar 21 13:08:15 asus-openrc kernel: [ 2301.856089] ACPI: button: The lid device is not compliant to SW_LID.
Mar 21 13:08:15 asus-openrc kernel: [ 2301.856227] [drm] ib test on ring 0 succeeded in 0 usecs
Mar 21 13:08:15 asus-openrc kernel: [ 2301.856290] [drm] ib test on ring 3 succeeded in 0 usecs
Mar 21 13:08:15 asus-openrc kernel: [ 2301.954540] r8169 0000:03:00.0 enp3s0: Link is Down
Mar 21 13:08:15 asus-openrc kernel: [ 2302.079875] ata3: SATA link down (SStatus 0 SControl 300)
Mar 21 13:08:15 asus-openrc kernel: [ 2302.079928] ata4: SATA link down (SStatus 0 SControl 300)
Mar 21 13:08:15 asus-openrc kernel: [ 2302.152752] usb 1-3: reset high-speed USB device number 2 using ehci-pci
Mar 21 13:08:15 asus-openrc kernel: [ 2302.238811] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 21 13:08:15 asus-openrc kernel: [ 2302.245831] ata2.00: configured for UDMA/100
Mar 21 13:08:15 asus-openrc kernel: [ 2302.382805] [drm] ib test on ring 5 succeeded
Mar 21 13:08:15 asus-openrc kernel: [ 2302.408787] usb 1-5: reset high-speed USB device number 3 using ehci-pci
Mar 21 13:08:15 asus-openrc kernel: [ 2304.574758] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 21 13:08:15 asus-openrc kernel: [ 2304.575897] ata1.00: configured for UDMA/133
Mar 21 13:08:15 asus-openrc kernel: [ 2304.594657] OOM killer enabled.
Mar 21 13:08:15 asus-openrc kernel: [ 2304.594660] Restarting tasks ... done.
Mar 21 13:08:15 asus-openrc kernel: [ 2304.611924] video LNXVIDEO:00: Restoring backlight state
Mar 21 13:08:16 asus-openrc kernel: [ 2304.815404] PM: suspend exit
Mar 21 13:08:17 asus-openrc kernel: [ 2305.761339] wlp4s0: authenticate with 00:22:07:aa:d4:1f
Mar 21 13:08:17 asus-openrc kernel: [ 2305.778860] wlp4s0: send auth to 00:22:07:aa:d4:1f (try 1/3)
Mar 21 13:08:17 asus-openrc kernel: [ 2305.780677] wlp4s0: authenticated
Mar 21 13:08:17 asus-openrc kernel: [ 2305.783218] wlp4s0: associate with 00:22:07:aa:d4:1f (try 1/3)
Mar 21 13:08:17 asus-openrc kernel: [ 2305.789606] wlp4s0: RX AssocResp from 00:22:07:aa:d4:1f (capab=0x411 status=0 aid=2)
Mar 21 13:08:17 asus-openrc kernel: [ 2305.789783] wlp4s0: associated
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.043295] elogind-daemon[2827]: New seat seat0.
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.067600] elogind-daemon[2827]: Watching system buttons on /dev/input/event3 (Power Button)
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.077134] elogind-daemon[2827]: Watching system buttons on /dev/input/event0 (Power Button)
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.077440] elogind-daemon[2827]: Watching system buttons on /dev/input/event2 (Lid Switch)
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.077806] elogind-daemon[2827]: Watching system buttons on /dev/input/event1 (Sleep Button)
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.087128] elogind-daemon[2827]: Watching system buttons on /dev/input/event9 (Logitech USB Receiver)
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.087441] elogind-daemon[2827]: Watching system buttons on /dev/input/event11 (Logitech USB Receiver Consumer Control)
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.087790] elogind-daemon[2827]: Watching system buttons on /dev/input/event12 (Logitech USB Receiver System Control)
Mar 21 13:08:23 asus-openrc gdm-launch-environment]: AccountsService: Could not get current seat: No such device or address 
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.105943] elogind-daemon[2827]: Watching system buttons on /dev/input/event6 (AT Translated Set 2 keyboard)
Mar 21 13:08:23 asus-openrc kernel: <38>[ 2312.107239] elogind-daemon[2827]: New session 1 of user leio.
Mar 21 13:08:24 asus-openrc gdm: Child process -2834 was already dead. 

However I end up with rather dead input, so I can't even enter password anymore. SSH sessions work and elogind is alive from automatic restart once again.

Looks like elogind-daemon freezes up somewhere in the suspending process.
Comment 6 Mart Raudsepp gentoo-dev 2019-03-21 12:09:57 UTC
Inputs I can remove and re-add (the USB mouse receiver) start working again afterwards
Comment 7 Sven Eden 2019-03-21 18:54:34 UTC
This really smells like gdm is too fast.

The freeze on suspending sounds like something is blocking. The "what" isn't clear, yet.

Could you please start elogind from runlevel "boot" and start gdm by hand from the console instead of doing so automatically? I'd like to know whether the issues go away if gdm is started clearly after elogind has set everything up porperly.

Further, if you log into the console without gdm being started, does loginctl show that session?

In your log, gdm errors out _before_ the sessions were set up. That's really odd.
Do you start gdm directly or via xdm?

Apart from that I am currently completely at sea, as I don't use Gnome anywhere and haven't heard of such issues.

I follow Dantrell's "Gnome without systemd" thread lightly, to get hints about stuff to improve, but I don't remember something like that having been talked about.
Thread: https://forums.gentoo.org/viewtopic-t-1082226.html (Although I guess you know that one better than me, right?)

Oh! One last question: Which version of elogind do you use?

And one nasty plea: Could you, by any chance, test sys-auth/elogind-241.9999 from my "seden" overlay (via layman) whether that improves things?
If it does, there is something wrong with what elogind does for sure. The risk is, of course, that if the situation does not improve for you, that we might end up none the wiser...
...at least for now...
Comment 8 Mart Raudsepp gentoo-dev 2019-03-21 19:07:01 UTC
I'll try these things out soon.
I have a much simpler patchset to gdm than dantrell, but should be no difference in practice. I keep the pam_systemd.so entry in gdm's pam file, as it's optional with the minus in front, just adding elogind one, while he swaps them out. And I just make gdm configure think it finds systemd from libelogind by passing SYSTEMD_CFLAGS and SYSTEMD_LIBS to it from pkg-config libelogind output (--cflags and --libs accordingly) to not patch and eautoreconf that.

I'm not sure the initial login warnings and suspend trouble are related, but we'll see from testing soon I suppose.
I have version 241.1 on there.

loginctl does show all sessions until it isn't hung from partial suspend.

I start with xdm and it's all wayland, no X besides Xwayland. There is nothing besides xdm to start gdm with OpenRC.
The gdm used gnome-shell gets shut down too after login and it doesn't show up as a session in loginctl until I go back to the gdm VT (with wayland gdm reaps the gnome-shell and gnome-session stuff away to save memory now when the VT isn't active after a timeout == soon after logged in; and comes back when doing "Switch user" or manually changing back to VT7).
Comment 9 Sven Eden 2019-03-27 06:34:05 UTC
Hmmm... 
  * When I look into the tree, gnome-base/gdm has elogind support now.
    Are you sure that patching is still necessary and is not doing any damage?
  * Do you have "elogind" USE flag enabled globally and both "systemd" and
    "consolekit" disabled?

elogind does not do anything spectacular on suspending/hibernating the system.

  1. open /sys/power/state and fail if this is not possible
  2. Determine hibernation location when hibernating
     (aka which swap to write to)
  3. Write configured "HibernateMode" into /sys/power/disk when hibernating
  4. Execute all scripts from SYSTEM_SLEEP_PATH
     (defaults to /lib64/elogind/system-sleep)
     and fail on errors if "AllowSuspendInterrupts" is set to yes
  5. Write "SuspendState" or "HibernateState" to /sys/power/state

If, from here on, suspension hangs and you have to kill elogind, than someone has a thumb on /sys/power/state and elogind is simply waiting for the write to finish. (Which normally is when the system is resumed.)

Is there anything running which might race with elogind to send the system to sleep? Or hindering it to do so?

What happens if you do
  $ sudo echo "mem" > /sys/power/state
manually?
Comment 10 Mart Raudsepp gentoo-dev 2019-03-27 06:52:59 UTC
The only patching I'm doing in gdm is to add a
-session    optional    pam_elogind.so
line to /etc/pam.d/gdm-launch-environment file, alongside the existing
-session    optional    pam_systemd.so
line, otherwise the gdm launch environment doesn't properly talk to elogind with its custom pam config, presumably.
The rest is done via configure precious arguments.

That system has never had systemd nor consolekit installed. It's a fresh test install on a separate disk that went straight towards elogind.
I have added USE="elogind -consolekit" to GNOME profiles some time ago and that box has been handled with emerge --changed-use.
Comment 11 Mart Raudsepp gentoo-dev 2019-03-27 09:34:29 UTC
Created attachment 570912 [details]
strace output while suspending

I don't see anything else accessing /sys/power/state, other than elogind-daemon, at the time of the stuckedness as per lsof.

Attached is a strace output from trying to suspend via gnome-shell menu.

Excerpts from IRC about it (Sven disconnected with timeout at some point):

<leio> first half of it is from opening gnome-shell top right menu
<leio> it probably queries some CanSuspend and whatever stuff
<leio> and then it ends once it gets stuck from suspend action
<leio> computer stays on, screen is blanked
<leio> if I move touchpad, gnome unlock screen is visible
<leio> and I can try to unlock
<leio> but because logind is stuck, it doesn't work
<leio> I've heard success reports with this gnome elogind stuff from others, however; just having problems on my own test rig
<leio> though only one explicit success report regarding suspending; another I talked to just doesn't have a working suspend overall
<leio> that strace probably quite revealing in which thing to try next? The in-use proc/dev thing?
<leio> at least ssh'ing in as root for extra debugging works after a long wait :)
<leio> at least lsof /sys/power/state only claims elogind-daemon having it open (and stuck)
Comment 12 Sven Eden 2019-03-28 06:13:10 UTC
I have thought about this, and I think we are getting nearer.

 * The only difference between elogind suspending and manually writing "mem" into /sys/power/state is elogind sending "PrepareForSleep" to all registered processes.

 * After a long wait, which is inconsistent in length between attempts, the machine suspends.

 * The delay is during "Freezing user processes"

My guess is, that at least one process is stuck in whatever it does as a reaction on the "PrepareForSleep" signal.
That process is either catching (and then delaying) the freeze signal (SIGSTOP I guess...) or is locked in Disk Wait.

As you can ssh into the machine during that delay, could you please have a look with 'ps' and/or [h]top which process that might be?
If it is Disk Wait, it will have a D+ state. elogind doesn't count, as it waits for its write to /sys/power/state to return.

But I hope the process is actively delaying the freeze, meaning it'll generate CPU usage in [h]top and/or IO in iotop.
My hope comes from D+ being absolutely uninterruptible.
Comment 13 Sven Eden 2019-03-28 17:04:18 UTC
As it works with systemd, but elogind isn't doing anything different than systemd, the reason might be a process that tries to talk to systemd manager when receiving the PrepareForSleep signal and then simply hangs until it times out.

If we can find out which process that is, we can patch the package and send a fix upstream.