738640 – net-misc/openssh doesn't kill active sessions when the machine is rebooted

Status:	CONFIRMED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo's Team for Core System packages

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2020-08-23 01:43 UTC by Herbert Wantesh
Modified:	2021-03-08 19:10 UTC
CC List:	1 user

See Also:
Package list:
Runtime testing required:	---

Description Herbert Wantesh 2020-08-23 01:43:10 UTC

openssh-8.3_p1-r5 doesn't kill active sessions when the machine is rebooted, which leads to hanging ssh sessions that have to be killed manually.

Reproducible: Always

Steps to Reproduce:
1. start sshd
2. connect to the machine over ssh
3. reboot the machine
Actual Results:  
ssh session hangs

Expected Results:  
open ssh sessions get killed when the machine reboots

Comment 1 Herbert Wantesh 2020-08-23 02:04:13 UTC

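Adding this to stop_pre fixes this:

        # Only on shutdown: send SIGTERM to every process whose command
        # line contains "sshd:".
        if [ "${RC_RUNLEVEL}" = "shutdown" ]; then
                SSH_CLIENT_PIDS="$(pgrep -f 'sshd:')"
                # POSIX test instead of [[ ]], since init scripts run under sh
                if [ -n "${SSH_CLIENT_PIDS}" ]; then
                        # unquoted on purpose: one argument per PID
                        kill -TERM ${SSH_CLIENT_PIDS}
                fi
        fi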

Comment 2 Mike Gilbert gentoo-dev 2020-11-05 18:42:30 UTC

Is this a regression from a previous version?

Comment 3 Thomas Deutschmann (RETIRED) gentoo-dev 2020-11-05 19:15:23 UTC

No, not a regression.

I remember that I filed a similar bug when I was new to Gentoo (other distributions actively kill sessions, and Gentoo behaved differently), and I also had a conversation with William about this:

The service itself only stops the master process, not any running children, because we don't want to disconnect the system administrator (the suggested fix takes care of this by checking the runlevel).

I remember being told that /etc/init.d/killprocs should normally take care of the children, so it was believed this isn't needed. It would be interesting to understand why killprocs isn't helping us here...
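
The runlevel check mentioned above can be sketched as a small helper. This is only an illustration, assuming OpenRC exports RC_RUNLEVEL to the service script (the snippet in comment 1 relies on the same variable); the function name is hypothetical and not part of the sshd init script.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the system is going down, so a
# manual "stop" or "restart" of the service leaves open sessions alone.
should_disconnect_clients() {
    [ "${RC_RUNLEVEL:-}" = "shutdown" ]
}

# Example: report the decision for the current environment.
if should_disconnect_clients; then
    echo "shutdown: disconnect clients"
else
    echo "not a shutdown: keep sessions"
fi
```

Keeping the decision in one place would make it easy to see that an administrator restarting sshd over ssh is never disconnected by the script itself.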

Comment 4 Thomas Deutschmann (RETIRED) gentoo-dev 2020-11-05 19:26:22 UTC

Hrm, I can no longer reproduce the problem; it's working for me. The killprocs service kills the remaining processes, including the sshd child processes, which terminates any active SSH connection as expected.

Comment 5 Mike Gilbert gentoo-dev 2020-11-05 20:28:54 UTC

I don't think it makes sense to add logic to the sshd init script to kill existing ssh sessions.

Comment 6 Herbert Wantesh 2020-11-06 15:26:59 UTC

It doesn't work for me. Rebooting the machine over an active ssh session leaves me with a "frozen" session.

Comment 7 Thomas Deutschmann (RETIRED) gentoo-dev 2020-11-06 16:08:05 UTC

Apply the following changes to debug this:

> --- /etc/init.d/killprocs.old   2020-11-06 17:06:26.000000000 +0100
> +++ /etc/init.d/killprocs       2020-11-05 20:32:23.000000000 +0100
> @@ -18,10 +18,15 @@
> 
>  start()
>  {
> +       set -x
> +       pgrep sshd
> +
>         ebegin "Terminating remaining processes"
>         kill_all -v 15 ${killall5_opts}
>         eend 0
>         ebegin "Killing remaining processes"
>         kill_all -v 9 ${killall5_opts}
>         eend 0
> +
> +       sleep 20
>  }

Comment 8 Mike Gilbert gentoo-dev 2020-11-06 16:25:48 UTC

Possibly the network interface(s) are stopped before killprocs is started?

Comment 9 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev 2020-11-06 17:08:01 UTC

(In reply to Mike Gilbert from comment #8)
> Possibly the network interface(s) are stopped before killprocs is started?

Very likely. Happens to me when I let NetworkManager handle my network devices. NM usually gets stopped (on openrc systems) before the ssh-logins are taken down.

Comment 10 Herbert Wantesh 2021-03-03 16:29:46 UTC

ping

Comment 11 Mike Gilbert gentoo-dev 2021-03-03 17:41:07 UTC

This works fine on systemd since it is smart enough to kill user sessions before stopping the network.

OpenRC doesn't provide any way to identify user sessions, and doesn't have any logic to terminate them before stopping the network.

We could add a workaround to the sshd init script, but I don't really see the point. 

I vote WONTFIX on this.

Comment 12 Thomas Deutschmann (RETIRED) gentoo-dev 2021-03-07 19:38:39 UTC

I changed my mind. There are multiple scenarios where killprocs will be too late, for example when the network has already been stopped.

I am currently testing something like

> # diff -u /var/db/repos/gentoo/net-misc/openssh/files/sshd-r1.initd /etc/init.d/sshd
> --- /var/db/repos/gentoo/net-misc/openssh/files/sshd-r1.initd   2019-03-08 01:31:51.175977236 +0100
> +++ /etc/init.d/sshd    2021-03-07 20:34:27.006650772 +0100
> @@ -72,10 +72,23 @@
>  }
> 
>  stop_pre() {
> -       # If this is a restart, check to make sure the user's config
> -       # isn't busted before we stop the running daemon.
>         if [ "${RC_CMD}" = "restart" ] ; then
> +               # If this is a restart, check to make sure the user's config
> +               # isn't busted before we stop the running daemon.
>                 checkconfig || return $?
> +       elif yesno "${RC_GOINGDOWN}" && [ -s "${pidfile}" ] && hash pgrep 2>/dev/null ; then
> +               # Disconnect any clients before killing the master process
> +               local pid=$(cat "${pidfile}" 2>/dev/null)
> +               if [ -n "${pid}" ] ; then
> +                       local ssh_session_pattern='sshd: \S.*@pts/[0-9]+'
> +
> +                       IFS="${IFS}@"
> +                       local daemon pid pty user
> +                       pgrep -a -P ${pid} -f "$ssh_session_pattern" | while read pid daemon user pty ; do
> +                               ewarn "Found ${daemon%:} session ${pid} on ${pty}; sending SIGTERM ..."
> +                               kill "${pid}" || true
> +                       done
> +               fi
>         fi
>  }
> 

and also plan to bring something like https://salsa.debian.org/ssh-team/openssh/-/blob/master/debian/systemd/ssh-session-cleanup.service for systemd users.
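
The `read` loop in the patch above depends on `@` being added to IFS so that a title such as `sshd: user@pts/N` splits into separate fields. A minimal sketch of that splitting follows; the function name and the sample line are made up for illustration, and IFS is set only for the single `read` so it does not stay modified afterwards.

```shell
#!/bin/sh
# Hypothetical helper: split one "PID COMMAND..." line, in the shape that
# `pgrep -a` prints, into pid, daemon, user and pty.
describe_session() {
    printf '%s\n' "$1" | {
        # Space and "@" both act as field separators for this read only.
        IFS=' @' read -r pid daemon user pty
        printf 'pid=%s daemon=%s user=%s pty=%s\n' \
            "${pid}" "${daemon%:}" "${user}" "${pty}"
    }
}

# Made-up sample line, for illustration only.
describe_session '1234 sshd: alice@pts/0'
```

The `${daemon%:}` expansion strips the trailing colon from `sshd:`, as in the `ewarn` message of the patch.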

Comment 13 Patrick McLean gentoo-dev 2021-03-08 18:49:06 UTC

You could use the openrc cgroup_cleanup function on hosts with cgroups enabled, it's probably nicer (and more complete) than the approach detailed here. It will also only get the ssh sessions for the sshd instance that is going down (in the case a host has multiple sshds running).

Comment 14 Thomas Deutschmann (RETIRED) gentoo-dev 2021-03-08 19:10:21 UTC

(In reply to Patrick McLean from comment #13)
> You could use the openrc cgroup_cleanup function on hosts with cgroups
> enabled, it's probably nicer (and more complete) than the approach detailed
> here. It will also only get the ssh sessions for the sshd instance that is
> going down (in the case a host has multiple sshds running).

Thank you for the feedback. I am not using cgroup_cleanup because, as you said, it's not available to everyone, and having two code paths would require two sets of tests...

My approach should take care of multiple sshd instances because we use the pidfile of the current master process to identify its child processes.

However, during testing I started to think I was over-engineering this: given that we are about to shut down, it should be fine to end *all* connections, not just those of this specific sshd instance.
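
Ending all connections regardless of the owning sshd instance could be sketched as a filter over the process list instead of a lookup by parent PID. This is an assumption-laden illustration, not the committed fix: the function name is hypothetical, the sample lines are made up, and it only matches titles of the form `sshd: user@pts/N`, like the pattern in comment 12.

```shell
#!/bin/sh
# Hypothetical filter: read "PID COMMAND..." lines on stdin and print the
# PIDs of sshd session processes attached to a pty, whatever their parent.
session_pids() {
    awk '$2 == "sshd:" && $3 ~ /^[^@]+@pts\/[0-9]+$/ { print $1 }'
}

# Made-up sample of process titles, for illustration only.
session_pids <<'EOF'
100 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
200 sshd: alice [priv]
201 sshd: alice@pts/0
300 sshd: bob@notty
400 -bash
EOF

# During a real shutdown the input could come from ps, for example:
#   ps -e -o pid= -o args= | session_pids | while read -r pid; do
#       kill "${pid}" || true
#   done
```

Of the sample lines only PID 201 is selected; sessions without a pty (such as `bob@notty`) are skipped, which mirrors the pattern used in the patch.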