Bug 547468 - net-misc/openssh: sshd dies on emerge update or high load or randomly
Summary: net-misc/openssh: sshd dies on emerge update or high load or randomly
Status: RESOLVED TEST-REQUEST
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Gentoo Linux bug wranglers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-04-23 10:32 UTC by Darko Luketic
Modified: 2016-01-25 02:55 UTC

See Also:
Package list:
Runtime testing required: ---


Description Darko Luketic 2015-04-23 10:32:34 UTC
I'm running Gentoo on 3 servers using systemd init.
From 218 to sys-apps/systemd-219-r2.
Quite often, under conditions I can't pin down but that are likely related to bursty high load (for instance when running emerge -avuD @world),
the sshd program dies, as this output of last shows:

root     pts/2        2a02:8070:c697:a Thu Apr 23 11:59   still logged in
root     ssh          2a02:8070:c697:a Thu Apr 23 11:59 - 12:13  (00:14)
root     pts/0        2a02:8070:c697:a Thu Apr 23 11:41 - 12:13  (00:32)
root     ssh          2a02:8070:c697:a Thu Apr 23 11:41 - 11:59  (00:17)

As you can see:
1. The second connection should be named pts/1, not pts/2.
2. For pts/0, the ssh process is terminated roughly a quarter of an hour before pts/0 itself (11:59 vs. 12:13).
3. For pts/2, the ssh process is terminated but pts/2 is still active.
4. It seems the second ssh process is being associated with pts/0 and is closed when pts/0 is terminated.
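
The mismatch described in the list above can also be checked mechanically. The following is a minimal sketch and not part of the original report: find_mismatched_sessions is a hypothetical helper name, and it assumes the column layout of the last output shown above (user, line, host, weekday, month, day, login time). It pairs an "ssh" record with a pts record when user, host, date and login time are identical, and prints the pairs whose end times differ.

```shell
#!/bin/sh
# Hypothetical helper: read `last` output on stdin and report pts sessions
# whose matching "ssh" record ended at a different time.
find_mismatched_sessions() {
    awk '
    {
        # Pairing key: user, host, month, day, login time.
        key = $1 " " $3 " " $5 " " $6 " " $7
        # Field 8 is "-" when a logout time follows, otherwise "still".
        stop = ($8 == "-") ? $9 : "still"
    }
    $2 == "ssh"   { ssh_end[key] = stop }
    $2 ~ /^pts\// {
        pts_end[key]  = stop
        pts_name[key] = $2
        login[key]    = $7
        order[++n]    = key      # remember input order for stable output
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            if ((k in ssh_end) && ssh_end[k] != pts_end[k])
                printf "%s login %s: ssh=%s pty=%s\n",
                       pts_name[k], login[k], ssh_end[k], pts_end[k]
        }
    }'
}
```

Typical use would be last | find_mismatched_sessions. Two logins from the same host within the same minute share a key, so the sketch cannot tell them apart.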

The symptom is that output that should be sent to the client never arrives.
However, if you hold down the Enter key so that it repeats for about 2 seconds, the pending output is delivered and the session continues working as usual.
I have not seen this behaviour on any of the 3 machines with other distros.

It doesn't always happen but it happens often enough to be considered a bug.

Here are the last few lines of a console capture from when I don't keep pressing the Enter key:

Would you like to merge these packages? [Yes/No] 
>>> Verifying ebuild manifests
>>> Running pre-merge checks for net-libs/iojs-1.8.1
>>> Emerging (1 of 14) sys-libs/cracklib-2.9.4::gentoo
>>> Installing (1 of 14) sys-libs/cracklib-2.9.4::gentoo
>>> Emerging (2 of 14) dev-db/sqlite-3.8.9::gentoo
>>> Jobs: 1 of 14 complete, 1 running               Load avg: 0.32, 0.10, 0.07packet_write_wait: Connection to _the_ip_address_: Broken pipe



Reproducible: Sometimes
Comment 1 Darko Luketic 2015-04-23 10:35:31 UTC
This is how a fresh session looks:

# last|head
root     pts/0        2a02:8070:c697:a Thu Apr 23 12:34   still logged in
root     ssh          2a02:8070:c697:a Thu Apr 23 12:34   still logged in
root     pts/2        2a02:8070:c697:a Thu Apr 23 11:59 - 12:34  (00:35)
root     ssh          2a02:8070:c697:a Thu Apr 23 11:59 - 12:13  (00:14)
root     pts/0        2a02:8070:c697:a Thu Apr 23 11:41 - 12:13  (00:32)
root     ssh          2a02:8070:c697:a Thu Apr 23 11:41 - 11:59  (00:17)
Comment 2 Panagiotis Christopoulos (RETIRED) gentoo-dev 2015-04-27 07:48:23 UTC
First of all, please paste your "emerge --info" and "emerge -pv openssh" output.
Comment 3 Darko Luketic 2015-04-27 08:09:45 UTC
# emerge --info
Portage 2.2.18 (python 2.7.9-final-0, default/linux/amd64/13.0, gcc-4.9.2, glibc-2.20-r2, 3.18.11-gentoo x86_64)
=================================================================
System uname: Linux-3.18.11-gentoo-x86_64-Intel-R-_Xeon-R-_CPU_E3-1245_V2_@_3.40GHz-with-gentoo-2.2
KiB Mem:    16378924 total,  14279780 free
KiB Swap:   16776188 total,  16776188 free
Timestamp of repository gentoo: Mon, 27 Apr 2015 06:15:01 +0000
sh bash 4.3_p33-r2
ld GNU ld (Gentoo 2.25 p1.0) 2.25
distcc 3.2rc1 x86_64-pc-linux-gnu [enabled]
app-shells/bash:          4.3_p33-r2::gentoo
dev-lang/perl:            5.20.2::gentoo
dev-lang/python:          2.7.9-r2::gentoo, 3.3.5-r1::gentoo, 3.4.1::gentoo
dev-util/cmake:           3.0.2::gentoo
dev-util/pkgconfig:       0.28-r2::gentoo
sys-apps/baselayout:      2.2::gentoo
sys-apps/openrc:          0.14::gentoo
sys-apps/sandbox:         2.6-r1::gentoo
sys-devel/autoconf:       2.69-r1::gentoo
sys-devel/automake:       1.13.4::gentoo, 1.15::gentoo
sys-devel/binutils:       2.25::gentoo
sys-devel/gcc:            4.9.2::gentoo
sys-devel/gcc-config:     1.8-r1::gentoo
sys-devel/libtool:        2.4.6-r1::gentoo
sys-devel/make:           4.1-r1::gentoo
sys-kernel/linux-headers: 3.18::gentoo (virtual/os-headers)
sys-libs/glibc:           2.20-r2::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://mirror.hetzner.de/gentoo/portage
    priority: -1000

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O3 -pipe -march=native -fstack-protector-strong --param=ssp-buffer-size=4"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O3 -pipe -march=native -fstack-protector-strong --param=ssp-buffer-size=4"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--quiet-build=y"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs buildpkg config-protect-if-modified distcc distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="ftp://mirror.hetzner.de/gentoo/"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j8 -l8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
USE="X509 acl aes amd64 audit avx berkdb bzip2 cli cracklib crypt cxx dri fortran gdbm go iconv icu ipv6 mmx mmxext modern-top modules multilib ncurses nls nptl openmp pam pcre popcnt readline seccomp session sse sse2 sse3 sse4_1 sse4_2 ssl ssse3 systemd tcpd unicode xattr zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx mmx mmxext popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="pc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NGINX_MODULES_HTTP="access auth_basic autoindex browser charset empty_gif geo gzip map proxy referer rewrite userid addition auth_pam auth_request cache_purge echo geoip gunzip gzip_static headers_more metrics push_stream realip secure_link slowfs_cache spdy sticky stub_status sub upload_progress upstream_check" 
OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-5" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_3" RUBY_TARGETS="ruby19 ruby20" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga nouveau nv r128 radeon savage sis tdfx trident vesa via vmware dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON

The 3 hosts are identical in terms of hardware.

# emerge -pv openssh

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild   R    ] net-misc/openssh-6.8_p1-r4::gentoo  USE="hpn pam pie ssh1 ssl -X -X509 -bindist -debug -kerberos -ldap -ldns -libedit -sctp (-selinux) -skey -static" 0 KiB
Comment 4 Panagiotis Christopoulos (RETIRED) gentoo-dev 2015-04-27 11:54:47 UTC
This may have to do with the CFLAGS you're using. I would try re-emerging everything with "-march=native -O2 -pipe". If you don't want to do that, set up a cron job that restarts sshd every few minutes if it has been killed. Also check dmesg and the other logs to see what is happening on the boxes, without rebooting them. I'm closing this as TEST-REQUEST. Reopen only if you still experience issues and you're sure that this is an openssh bug and not something related to your configuration.
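
The cron job suggested here could look roughly like the sketch below. It assumes the service is managed by systemd under the unit name sshd (the reporter's machines use systemd). ensure_sshd_running is a hypothetical helper name, not something shipped by openssh or systemd.

```shell
#!/bin/sh
# Hypothetical helper: restart the sshd unit only when systemd reports
# that it is not active. The unit name "sshd" is an assumption.
ensure_sshd_running() {
    unit=${1:-sshd}
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit is not active, restarting"
        systemctl restart "$unit"
    fi
}
```

A script that defines this function and then calls ensure_sshd_running can be run from cron every few minutes, and it leaves a healthy sshd alone. For the log check, journalctl -u sshd -b and dmesg are the usual places to look on a systemd machine.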
Comment 5 Darko Luketic 2015-04-28 10:55:52 UTC
I did -O2 previously - same result
I disabled distcc and used -O2 - same result

But ok I have now recompiled with -O2 and the machines will be rebooting in a minute. We'll see how that turns out again. Today seems to be a rather stable day.

I'm not the only one experiencing this.
Gentoo Forums user Schnulli is also experiencing this; he works around it by using screen. I sent him a PM.

It's basically a stock configuration with -march=native -O3 and a few additional USE flags, and it has happened from the very beginning. A very minimal system (or 3).

I know it's a complicated bug and there *is* an issue, but I don't know what it's related to.

It might be network instability, but then how does the pts/0 ssh process get associated with pts/2? And why is there no pts/1? So there must be some software bug.

I don't know what it's related to, hence the bug report.
It might be openssh, it might be something else.

It's also not clear whether the ssh process is killed or segfaults. The logs show nothing, although a segfault should show up there. I've been observing this behaviour for about 2 months.
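
One way to check the kernel log for both possibilities is sketched below. The patterns are assumptions about how Linux kernels of this era word segfault and OOM-killer messages, and filter_sshd_crashes is a hypothetical helper name. A process ended by an ordinary signal from another process would not show up here.

```shell
#!/bin/sh
# Hypothetical helper: keep only kernel log lines that look like an sshd
# segfault or an OOM-killer kill of sshd. The message wording is assumed.
filter_sshd_crashes() {
    grep -E 'sshd\[[0-9]+\]: segfault|Kill(ed)? process [0-9]+ \(sshd\)'
}
```

It can be fed from dmesg | filter_sshd_crashes or from journalctl -k -b | filter_sshd_crashes. Empty output only means that no line matched these patterns.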
Comment 6 Darko Luketic 2015-06-03 11:22:16 UTC
happened again with -O2 after opening a new terminal and connecting to another server

root     pts/1        2a02:8070:c680:c Wed Jun  3 13:15   still logged in
root     ssh          2a02:8070:c680:c Wed Jun  3 13:15 - 13:20  (00:05)
root     pts/0        2a02:8070:c680:c Wed Jun  3 12:32 - 13:20  (00:48)
root     ssh          2a02:8070:c680:c Wed Jun  3 12:32 - 13:15  (00:43)
reboot   system boot  3.18.11-gentoo   Wed Jun  3 12:31   still running

^ term1 closed
term2 connected
Comment 7 Darko Luketic 2015-06-22 14:23:33 UTC
and again

root     pts/1        2a02:8070:c6c2:6 Mon Jun 22 16:19   still logged in
root     ssh          2a02:8070:c6c2:6 Mon Jun 22 16:19   still logged in
root     pts/0        2a02:8070:c6c2:6 Mon Jun 22 15:36   still logged in
root     ssh          2a02:8070:c6c2:6 Mon Jun 22 15:36 - 16:19  (00:43)

And again, it's a stock configuration.
Flagging this as RESOLVED TEST-REQUEST is the lazy way out.
Comment 8 Darko Luketic 2016-01-25 02:55:24 UTC
and again
new system
new installation

root     pts/1        2a02:8070:c688:a Mon Jan 25 03:44   still logged in
root     ssh          2a02:8070:c688:a Mon Jan 25 03:44   still logged in
root     ssh          2a02:8070:c688:a Mon Jan 25 01:50 - 01:50  (00:00)
root     ssh          2a02:8070:c688:a Mon Jan 25 01:50 - 01:50  (00:00)
root     ssh          2a02:8070:c688:a Mon Jan 25 01:41 - 01:41  (00:00)
root     ssh          2a02:8070:c688:a Mon Jan 25 01:41 - 01:41  (00:00)
root     ssh          2a02:8070:c688:a Mon Jan 25 01:38 - 01:38  (00:00)
root     ssh          2a02:8070:c688:a Mon Jan 25 01:38 - 01:38  (00:00)
root     ssh          2a02:8070:c688:a Mon Jan 25 01:38 - 01:38  (00:00)
root     pts/0        2a02:8070:c688:a Mon Jan 25 01:35   still logged in
root     ssh          2a02:8070:c688:a Mon Jan 25 01:35 - 01:38  (00:03)

-O2 -march=native -pipe

gcc-5.3.0

Come on, reopen the bug.

Same hardware, same system with Debian 7, CentOS 7, openSUSE 42.1, Fedora 23: no such issue.

Gentoo: issue.