When a Gentoo box is brought down, it changes to runlevel 0, stopping all daemons from the other runlevels. This includes sshd, if it is running. But stopping sshd does not terminate existing connections; the per-connection processes are reparented to init and are later closed without notifying the other end of the connection, leaving dead connections behind. This is a problem when the shutdown command is issued with a delay so that all processes have enough time to save their state and quit gracefully, because ssh clients are never given that time. I see this problem on x86 and amd64 machines, so I guess it probably affects all platforms.

Reproducible: Always

Steps to Reproduce:
1. Log in to a Gentoo box over ssh.
2. Shut down the server.
3. The connection on the client is still there, but dead.

Expected Results:
The connection on the client should be terminated, returning to the shell the client was started from. killall5(8) should be called to stop the processes on shutdown, from /etc/init.d/halt.sh or somewhere similar.
This bug is also present on Arch Linux http://bbs.archlinux.org/viewtopic.php?pid=543532
OpenRC has the killprocs init script which does just this.
http://roy.marples.name/projects/openrc/browser/trunk/init.d/killprocs.in ?
Yes, that one
I'm not sure how it is setup in Gentoo but in Arch Linux that is called already.

/etc/rc.shutdown:

    ...
    # Terminate all processes
    stat_busy "Sending SIGTERM To Processes"
    /sbin/killall5 -15 &> /dev/null
    /bin/sleep 5
    stat_done
    stat_busy "Sending SIGKILL To Processes"
    /sbin/killall5 -9 &> /dev/null
    /bin/sleep 1
    stat_done
    ...

Still have the same problem.

PS. Sorry for filling up the gentoo bug tracker with Arch linux stuff...
This is because the network is shut down before killprocs is called. I've just committed an update to OpenRC svn which will prevent the network script from being stopped on runlevel change by default. The nostop keyword will need to be added to other network-related scripts, such as dhcpcd, but that may not be a good default.
And the fix is ?
Honestly I see this issue with nearly every distro I work with and I work with A LOT of distros. It's just a fact of life, use [enter] ~ . and be done with it.
ssh isnt special. the same could happen with any client/server. but i dont believe there is a way to sanely detect "this is a network process" and kill it before taking down the network. this cannot be added to the `sshd` script because having `/etc/init.d/sshd stop` take down clients is wrong. this cannot be added to the net.* scripts both because it cant be detected sanely and even if it could, it too would be wrong. i dont see this as any sort of bug worth "fixing" as the vast majority of cases are valid in that the processes shouldnt be killed. and when they arent, it isnt that big of a deal at all. hit enter, then ~, then ., and be done with it.
*** Bug 406169 has been marked as a duplicate of this bug. ***
(In reply to comment #9)
> ssh isnt special. the same could happen with any client/server. but i dont
> believe there is a way to sanely detect "this is a network process" and kill
> it before taking down the network.
>
> this cannot be added to the `sshd` script because having `/etc/init.d/sshd
> stop` take down clients is wrong.
>
> this cannot be added to the net.* scripts both because it cant be detected
> sanely and even if it could, it too would be wrong.
>
> i dont see this as any sort of bug worth "fixing" as the vast majority of
> cases are valid in that the processes shouldnt be killed. and when they
> arent, it isnt that big of a deal at all. hit enter, then ~, then ., and be
> done with it.

Could we check in the net.* scripts if any programs have connections over a given interface and issue kill commands to them?
(In reply to comment #11)
> Could we check in the net.* scripts if any programs have connections over a
> given interface and issue kill commands to them?

I looked into this in a bit more detail. Adding the following line to /etc/init.d/net.lo's stop() function seems to do the trick, provided that all of the other net.* scripts are symbolically linked to it:

    for i in `ifconfig ${IFACE} | grep 'inet ' | awk '{ print $2}' | sed 's/addr://'`; do /bin/kill "$(lsof -iTCP@$i -Fp | cut -c2-)"; done;

This introduces a dependency on lsof and needs to be tweaked to work for IPv6, but it appears to work on my system.
radhermit in #gentoo-dev suggested the following improvement, which eliminates the use of grep and sed:

    for i in `ifconfig ${IFACE} | awk '/inet / {gsub(/addr:/,"");print $2}'`; do /bin/kill "$(lsof -iTCP@$i -Fp | cut -c2-)"; done;
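The two parsing steps in that one-liner can be pulled apart and tried on canned text, without touching a real interface or killing anything. This is only a sketch: `inet_addrs` and `lsof_pids` are hypothetical helper names, not part of any net.* script, and the second one is a stricter stand-in for `cut -c2-` on the assumption that lsof may print field lines other than the `p<pid>` ones.

```shell
# Hypothetical helpers (not from OpenRC or netifrc); both read text on stdin.

# Print the IPv4 address from each "inet" line of ifconfig-style output.
# Handles the older "inet addr:1.2.3.4" layout and the newer "inet 1.2.3.4".
inet_addrs() {
    awk '/inet / { gsub(/addr:/, ""); print $2 }'
}

# Print one PID per line from "lsof -Fp"-style output.
# Only lines that start with "p" are kept; other field lines are dropped.
lsof_pids() {
    sed -n 's/^p//p'
}
```

A possible use would be `ifconfig "${IFACE}" | inet_addrs` to list the addresses, then `lsof -iTCP@"$addr" -Fp | lsof_pids | xargs kill` for each one. Passing the PIDs through xargs (rather than one quoted `"$(...)"`) matters when more than one process holds a connection, because a quoted substitution hands kill a single argument containing all the PIDs.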
(In reply to comment #12) i'm not sure that's acceptable either. i can run `/etc/init.d/net.lo restart` on a remote box right now and not worry about my stuff getting punted.
(In reply to comment #14)
> i'm not sure that's acceptable either. i can run `/etc/init.d/net.lo
> restart` on a remote box right now and not worry about my stuff getting
> punted.

What if this were added to the shutdown runlevel, so that it only occurs when the system is actually preparing to halt/restart?
(In reply to comment #15) might work. feel free to post a PoC ;).
Hello,

I've been aware of this issue for a long time. It seems to be gentoo-specific. I was discussing it on the forums and experimenting with various suggestions and methods.

It seems the best place for this is actually in the /etc/init.d/sshd script itself. A simple check of the current runlevel, placed within the stop() block of the script, will cleanly close any active connections only during system shutdown.

I do hope this can be resolved once and for all.

http://forums.gentoo.org/viewtopic-p-7242058.html#7242058

    if [ "$RC_RUNLEVEL" = shutdown ]; then
        ps auxw | grep sshd\: | grep -v grep | awk '{print $2}' | xargs kill -s 15
    fi
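The selection step of that pipeline (ps, two greps, awk) can be written as a single awk filter and checked against sample text. A minimal sketch, assuming the input comes from `ps -eo pid,args` and that per-connection processes carry a title starting with `sshd:`; the function name `sshd_session_pids` is hypothetical. Some OpenSSH versions appear to retitle the listening daemon with the same prefix plus a `[listener]` tag, so the sketch skips such lines on that assumption.

```shell
# Hypothetical helper: reads "ps -eo pid,args"-style text on stdin and prints
# the PIDs whose command title begins with the word "sshd:".
# Assumption: a line tagged "[listener]" is the main daemon, not a session.
sshd_session_pids() {
    awk '$2 == "sshd:" && $0 !~ /\[listener\]/ { print $1 }'
}
```

Used as `ps -eo pid,args | sshd_session_pids`, this needs no `grep -v grep`, because the filter compares the second field exactly instead of searching the whole line.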
Due to comment #9 as well as the additional comments in the forum thread, I am not comfortable with the fix in comment #17.

However, if you use newnet, we don't attempt to bring down the interfaces when we shut down, so why do we attempt to stop them with oldnet?

Here is something else to test instead of the suggestions in comment #17 or comment #15. Add the following line at the very top of the stop function in net.lo; this will make the interfaces under oldnet behave like they do in newnet in this respect. They will not go down when the system is going down.

    yesno $RC_GOINGDOWN && return 0

How do things behave if you do this?
Thanks for the response. I'll try to clarify a bit here.

You say Comment #9 is correct? Let's examine that, shall we?:

> i dont see this as any sort of bug worth "fixing" as the vast majority of
> cases are valid in that the processes shouldnt be killed. and when they
> arent, it isnt that big of a deal at all. hit enter, then ~, then ., and be
> done with it.

I fully understand and appreciate the need to preserve active connections during a process shutdown, such as during an upgrade of ssh. Absolutely correct.

However, when the system is going down for shutdown or reboot, the parent sshd process, the associated login shell, and everything under it are all ultimately terminated anyway, and the client has absolutely NO HOPE of recovering the connection. It's gone for good. When the system comes back up, you need to start a new connection anyway. Right?

(Interestingly, once sshd comes back up, THAT's when a hung client will realize what's happened and finally wake up and recognize there's been a real disconnect.)

Now, nearly every other distro I've come across *performs the courtesy* of closing any active ssh connections. Gentoo leaves them hanging. Why? What possible benefit is there to leaving otherwise *unrecoverable* ssh client connections in a hung state?

Now I've tried your suggestion of putting "yesno $RC_GOINGDOWN && return 0" at the top of the stop block in net.lo, and it does initially appear that the interface is kept up a bit longer during shutdown, but it's still leaving ssh clients hanging. No change there, sorry.

Ultimately, it's of little consequence where the fix fits in, be it my suggested change to the sshd init script or something elsewhere altogether. I just think it's wrong to label this as 'no big deal' and 'WONTFIX'. It's clearly an ongoing gentoo-specific issue, going on 10+ years now... but the problem is minor! the fix is easy! I've pointed the way! Unfortunately, that's all I can do.
(In reply to comment #19)
> Thanks for the response. I'll try to clarify a bit here.
>
> You say Comment #9 is correct? Let's examine that, shall we?:

Sure, but you picked the wrong part of the comment:

(In reply to comment #9)
> ssh isnt special. the same could happen with any client/server. but i dont
> believe there is a way to sanely detect "this is a network process" and kill
> it before taking down the network.
>
> this cannot be added to the `sshd` script because having `/etc/init.d/sshd
> stop` take down clients is wrong.
>
> this cannot be added to the net.* scripts both because it cant be detected
> sanely and even if it could, it too would be wrong.

I'm attempting to propose what I think is a better solution than trying to kill the processes the way you are suggesting, because I don't agree that a solution specific to sshd is a good one, and I also am not comfortable adding code to the net.lo script to attempt to kill network processes.

I was able to reproduce your issue by connecting to a system using ssh then issuing a shutdown command. I did see what you see with ssh not disconnecting when the system was shut down.

Then, I added this line to the top of net.lo's stop() function, which is taken from the stop() function in the network script in newnet.

    yesno ${shutdown_network:-YES} && yesno $RC_GOINGDOWN && return 0

When I added this line then executed the shutdown again, I was disconnected from the system before it went down instead of waiting until it came back up. This is the result you are looking for. Correct?

Can you please verify that this works for you as well by adding this code to the top of your stop() function in net.lo then shutting down. If this works for you, you should be disconnected on shutdown instead of waiting for the system to come back up.

Thanks,

William
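To see how that guard line evaluates without editing net.lo, its condition can be exercised in an ordinary shell. This is a rough sketch: inside an init script `yesno` is supplied by OpenRC, so the version below is only a simplified stand-in that accepts a few common spellings, and `skip_stop_on_shutdown` is a hypothetical wrapper name for the condition part of the line.

```shell
# Simplified stand-in for OpenRC's yesno helper (assumption: the real helper
# accepts these spellings; it may accept others and behave differently on
# unrecognised input).
yesno() {
    case "$1" in
        [Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|[Oo][Nn]|1) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: succeeds when stop() would return early, i.e. when
# shutdown_network is unset or affirmative AND the system is going down.
skip_stop_on_shutdown() {
    yesno "${shutdown_network:-YES}" && yesno "${RC_GOINGDOWN:-NO}"
}
```

With `RC_GOINGDOWN=YES` and `shutdown_network` unset, the wrapper succeeds, which corresponds to stop() returning before the interface is touched. Setting `shutdown_network=NO` makes it fail, so the normal stop path would run.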
> added this line to the top of net.lo's stop() function, which is taken from
> the stop() function in the network script in newnet.
>
> yesno ${shutdown_network:-YES} && yesno $RC_GOINGDOWN && return 0

Well, yes, that does work with my system based on stage3-amd64-20121210, which is great going forward!

An older system (that's been kept up to date), however, must be missing something...
I added commit 1280b97 to OpenRC. This means network interfaces will no longer come down by default, so unless you change this, you should not see this issue.
This will be part of OpenRC-0.12.
(In reply to comment #21)
> > added this line to the top of net.lo's stop() function, which is taken from
> > the stop() function in the network script in newnet.
> >
> > yesno ${shutdown_network:-YES} && yesno $RC_GOINGDOWN && return 0
>
> Well, yes that does work with my system based of stage3-amd64-20121210,
> which is great going forward!
>
> And older system (that's been kept up to date), however, must be missing
> something...

Hi DNAspark99,

Did you figure out why your older system didn't work with this line?

    yesno ${shutdown_network:-YES} && yesno $RC_GOINGDOWN && return 0

I too have an older system that has been kept up to date, and this doesn't work for me either.
(In reply to comment #23)
> This will be part of OpenRc-0.12.

(In reply to comment #20)
> Then, I added this line to the top of net.lo's stop() function, which is
> taken from the stop() function in the network script in newnet.
>
> yesno ${shutdown_network:-YES} && yesno $RC_GOINGDOWN && return 0

This does not work on my system as I use dhcp. By the time /etc/init.d/killprocs is executed, my IP address on eth0 is gone (eth0 is still up and running though).

Adding this to /etc/init.d/sshd, stop():

    if [[ "$RC_RUNLEVEL" = shutdown ]] ; then
        pkill -f "sshd:"
    fi

does the trick though.
(In reply to comment #25)
> Adding this to /etc/init.d/sshd, stop():
> if [[ "$RC_RUNLEVEL" = shutdown ]] ; then
> pkill -f "sshd:"
> fi
> does the trick though.
hmm, a bit cleaner might be:

    [[ "$RC_RUNLEVEL" = shutdown ]] && pkill -P `cat "${SSHD_PIDFILE}"`

just before stopping the parent. That way you don't kill any stray sshd's that were started by other means.
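Before sending any signal, it can help to list which processes `-P` would select. A small sketch, assuming procps-style `pgrep` is installed; `children_of_pidfile` is a hypothetical helper name and the pidfile path is whatever `SSHD_PIDFILE` points to on the system in question.

```shell
# Hypothetical helper: print the PIDs of the direct children of the process
# whose PID is stored on the first line of the given pidfile.
# Returns non-zero if the file is unreadable or no children are found.
children_of_pidfile() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    read -r parent < "$pidfile" || return 1
    pgrep -P "$parent"
}
```

Running `children_of_pidfile "${SSHD_PIDFILE}"` while the daemon is still up should list only processes forked by that particular sshd, which is the property the `pkill -P` variant relies on. Once the parent has exited, its children are reparented and this lookup no longer finds them, hence the advice to do it just before stopping the parent.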
in the end, I went with this (complete stop block for clarity):

    stop() {
        if [ "${RC_CMD}" = "restart" ] ; then
            checkconfig || return 1
        fi

        ebegin "Stopping ${SVCNAME}"
        start-stop-daemon --stop --exec "${SSHD_BINARY}" \
            --pidfile "${SSHD_PIDFILE}" --quiet
        eend $?

        if [ "${RC_RUNLEVEL}" = "shutdown" ]; then
            SSH_CLIENT_PIDS="$(pgrep -f 'sshd:')"
            if [[ -n ${SSH_CLIENT_PIDS} ]] ; then
                kill -TERM ${SSH_CLIENT_PIDS}
            fi
        fi
    }
(In reply to comment #27)
> in the end, I went with this (complete stop block for clarity):

Please do not do this. Once OpenRC-0.12 is released, you will need to go back to the stock sshd init script. I have been testing here with git OpenRC and the connections are brought down fine with the stock script.

> stop() {
>     if [ "${RC_CMD}" = "restart" ] ; then
>         checkconfig || return 1
>     fi
>
>     ebegin "Stopping ${SVCNAME}"
>     start-stop-daemon --stop --exec "${SSHD_BINARY}" \
>         --pidfile "${SSHD_PIDFILE}" --quiet
>     eend $?
>
>     if [ "${RC_RUNLEVEL}" = "shutdown" ]; then
>         SSH_CLIENT_PIDS="$(pgrep -f 'sshd:')"
>         if [[ -n ${SSH_CLIENT_PIDS} ]] ; then
>             kill -TERM ${SSH_CLIENT_PIDS}
>         fi
>     fi
> }
(In reply to comment #28)
> Please do not do this. Once OpenRC-0.12 is released, you will need to go
> back to the stock sshd init script. I have been testing here with git OpenRC
> and the connections are brought down fine with the stock script.

Yes yes, this is certainly not required on newer/up-to-date systems. However, for older systems, where it is undesirable to update the whole system, the changes to the sshd init script are the quick & dirty fix that works.

http://forums.gentoo.org/viewtopic-t-950496-highlight-.html
*** Bug 367553 has been marked as a duplicate of this bug. ***
This is part of the netifrc network scripts, which will be pulled in as a separate package when you update to OpenRC-0.12.
Hi,

I am not sure if this is working in openrc-0.12 as expected:

I established an SSH connection via PuTTY to a system with openrc-0.12. I then restarted the system from a *local* shell. I get the message "The system is going down for reboot NOW!4 (tty1) (Fri Aug 16 16:11:34 2013):" in PuTTY, but the SSH connection is not terminated. When the system comes back online (the system restarts very fast, <30secs), I hear the *pling* sound from PuTTY telling me my connection is now dead. I expected to hear that *pling* sound while the system is shutting down.

I see the same when I ssh from another Gentoo box into that system just before I initiate the restart. This connection hangs until the system comes back online (or a normal timeout happens). Then the (old) ssh connection dies with "Write failed: Broken pipe".
There is no final solution yet.
Hi,

I'm working on making a packer template available for building Gentoo VMs [0]. I also encountered this bug. [1]

I'm wondering if you found a solution, and if so, do you have an idea when it will be released? If not, what is missing, and how can I help?

Thanks!

[0] : https://github.com/pierreozoux/packer-warehouse/
[1] : https://github.com/mitchellh/packer/issues/354
Hi, you should not see this behavior when using any kind of static IP setup. Also, Gentoo only leaves dead connections when using dhcpcd, Gentoo's default DHCP client (i.e. when you use net-misc/dhcp, you should not see this at all). But this is already "fixed" by upstream: http://roy.marples.name/projects/dhcpcd/changeset/f87ced10d4316cdf60dd2c0f1b38cc825e845c64 We are currently waiting for a new release... If you experience the problem with a static IP setup, please tell us...
This is fixed in dhcpcd-6.1. Please re-open if it is still an issue after you upgrade to this version of dhcpcd.