Bug 323869 - VServer-System hangs (soft hang), unable to spawn new processes?
Summary: VServer-System hangs (soft hang), unable to spawn new processes?
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: AMD64 Linux
Importance: High normal
Assignee: Gentoo VPS Team (OBSOLETE)
Depends on:
Reported: 2010-06-13 23:23 UTC by Timo A. Hummel
Modified: 2010-09-26 08:28 UTC

See Also:
Package list:
Runtime testing required: ---


Comment Timo A. Hummel 2010-06-13 23:23:57 UTC

I have been experiencing strange server hangs for a few days. I think the problems started when I updated to a vserver kernel.

Basically, the system hangs while I am working on the server via SSH. For example, one hang happened when logging in via SSH (with -vv, it occurs directly after "starting interactive session"), and another when pressing Tab after typing "ls /var/log/apa".

The hang resolves itself after 2-10 minutes, after which dmesg shows the following entry:

INFO: task sshd:28383 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sshd          D ffff880035e27e48     0 28383  16208 0x00000000
 ffff880035e27bd8 0000000000000086 ffff880024f9cf00 ffffea00006a2ca0
 ffffea0000d6eed8 ffff88007c244000 ffff88007d90c660 ffff88007c244270
 0000000135e27b98 ffffffff810bc8c8 0000000135e27b98 00007f3b1c316000
Call Trace:
 [<ffffffff810bc8c8>] ? 0xffffffff810bc8c8
 [<ffffffff8141cc15>] 0xffffffff8141cc15
 [<ffffffff810bcf3f>] ? 0xffffffff810bcf3f
 [<ffffffff8141cd2d>] 0xffffffff8141cd2d
 [<ffffffff810b458d>] 0xffffffff810b458d
 [<ffffffff810c212d>] ? 0xffffffff810c212d
 [<ffffffff810b3d20>] ? 0xffffffff810b3d20
 [<ffffffff810b65f5>] 0xffffffff810b65f5
 [<ffffffff810b6a55>] 0xffffffff810b6a55
 [<ffffffff810b6b88>] 0xffffffff810b6b88
 [<ffffffff810b7d6c>] 0xffffffff810b7d6c
 [<ffffffff810c212d>] ? 0xffffffff810c212d
 [<ffffffff810c0a2f>] ? 0xffffffff810c0a2f
 [<ffffffff810aa6ab>] 0xffffffff810aa6ab
 [<ffffffff810aa77b>] 0xffffffff810aa77b
 [<ffffffff810028eb>] 0xffffffff810028eb
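Since the affected program varies, it may help to collect every hung-task warning from the kernel log and see which processes are involved and how often. The following is a minimal sketch, assuming the message format shown above (`INFO: task <comm>:<pid> blocked for more than <N> seconds.`); `parse_hung_tasks`, `summarize` and `HungTask` are hypothetical helpers, not part of any existing tool.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Matches the kernel's hung-task warning, e.g.
#   INFO: task sshd:28383 blocked for more than 120 seconds.
# (format assumed from the log in this report)
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


class HungTask(NamedTuple):
    comm: str     # process name as printed by the kernel
    pid: int      # process ID
    seconds: int  # hung-task threshold that was exceeded


def parse_hung_tasks(dmesg_text: str) -> List[HungTask]:
    """Return one HungTask per hung-task warning found in dmesg output."""
    return [
        HungTask(m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK_RE.finditer(dmesg_text)
    ]


def summarize(tasks: List[HungTask]) -> Counter:
    """Count how many warnings were logged per process name."""
    return Counter(t.comm for t in tasks)
```

Feed it the text captured from `dmesg` (which may require root). Note that the reported seconds value is the threshold set in /proc/sys/kernel/hung_task_timeout_secs, not the total time the task was actually stuck.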

The affected program varies (sshd in this example). Sometimes this also happens with a simple "ls" in a directory; I was able to attach to the process with strace, but strace did not show any system calls at all. To me it looks like a kernel scheduler bug, but I am not sure how to debug this further.
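strace showing no system calls is consistent with a task stuck inside the kernel in uninterruptible sleep (state D, as in the "sshd D" line of the trace above). A minimal sketch for listing such tasks while a hang is in progress, assuming a standard /proc layout; `parse_stat`, `read_wchan` and `find_d_state_tasks` are hypothetical names. For each D-state task it also reports /proc/<pid>/wchan, the kernel function the task is waiting in, which may be a symbol name or a raw address depending on the kernel configuration.

```python
import os
from typing import List, Optional, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split one /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may contain spaces or ')', so split
    at the first '(' and the last ')' instead of on whitespace.
    """
    pid_part, rest = line.split("(", 1)
    comm, after = rest.rsplit(")", 1)
    return int(pid_part), comm, after.split()[0]


def read_wchan(pid: int, proc: str = "/proc") -> Optional[str]:
    """Kernel wait channel of a task, or None if it cannot be read."""
    try:
        with open(os.path.join(proc, str(pid), "wchan")) as f:
            return f.read().strip() or None
    except OSError:
        return None


def find_d_state_tasks(proc: str = "/proc") -> List[Tuple[int, str, Optional[str]]]:
    """Return sorted (pid, comm, wchan) for every process in state D."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan, or unexpected format
        if state == "D":
            found.append((pid, comm, read_wchan(pid, proc)))
    return sorted(found)
```

This only inspects /proc/<pid>/stat, i.e. one entry per process, not every thread under /proc/<pid>/task. As an alternative that needs no script, `echo w > /proc/sysrq-trigger` (with Magic SysRq enabled) dumps all blocked tasks to the kernel log.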

The system base data:

Kernel: kernel-genkernel-x86_64-2.6.32-vs2. or kernel-genkernel-x86_64-2.6.33-vs2.
Profile: default/linux/amd64/10.0/server
Filesystem: ext3
Raid: Software Raid 1

I have already searched Google, and there doesn't seem to be a solution. For now, I will test the following:

* Boot the kernel with highres=off
* Upgrade to the hardened profile and recompile everything

If you have any other ideas on how to debug the issue, please add them to this bug report. If I have missed important information, let me know and I'll add it. I know this is probably a meta-bug, likely caused by a misconfiguration, but I think it's time to identify the cause and hopefully come up with a solution.

Reproducible: Sometimes

Steps to Reproduce:
See above
Actual Results:  
Random hangs

Expected Results:  
No hangs

Note: The emerge --info output below is from after I switched the profile to hardened. The server is currently recompiling the gcc toolchain.

Portage (hardened/linux/amd64/10.0, gcc-4.1.2, glibc-2.10.1-r1, 2.6.33-vs2. x86_64)
System uname: Linux-2.6.33-vs2.
Timestamp of tree: Sun, 13 Jun 2010 21:00:23 +0000
distcc 3.1 x86_64-pc-linux-gnu [disabled]
app-shells/bash:     4.0_p37
dev-lang/python:     2.4.6, 2.5.4-r4, 2.6.4-r1, 3.1.2-r3
sys-apps/baselayout: 1.12.13
sys-apps/sandbox:    1.6-r2
sys-devel/autoconf:  2.13, 2.65
sys-devel/automake:  1.5-r1, 1.7.9-r2, 1.9.6-r3, 1.10.3, 1.11.1
sys-devel/binutils:  2.18-r3
sys-devel/gcc:       4.1.2, 4.3.4, 4.4.3-r2
sys-devel/gcc-config: 1.4.1
sys-devel/libtool:   2.2.6b
virtual/os-headers:  2.6.30-r1
CFLAGS="-march=athlon64 -O2 -pipe"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=athlon64 -O2 -pipe"
FEATURES="assume-digests distlocks fixpackages news parallel-fetch protect-owned sandbox sfperms strict unmerge-logs unmerge-orphans userfetch"
LINGUAS="en de"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
USE="LINGUAS acl amd64 apache2 bash-completion bcmath berkdb bzip2 cgi clamd cli cluster cracklib crypt ctype cups curl cxx dri extraengine filter ftp gd gdbm geoip gpm hardened iconv imagemagick imap innodb jpeg json justify libwww maildir mbstring mmx modules mudflap multilib mysql mysqli ncurses nls nptl nptlonly openmp pam pcre pdo perl pic png posix pppd python rcypt readline reflection sasl session simplexml snmp soap sockets spl sqlite sse sse2 ssl subject-rewrite suexec svg sysfs tcpd tidy unicode urandom vchroot xml xmlrpc xorg xsl zip zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en de" RUBY_TARGETS="ruby18" USERLAND="GNU" VIDEO_CARDS="fbdev glint intel mach64 mga neomagic nv r128 radeon savage sis tdfx trident vesa via vmware voodoo" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" 
Comment 1 Timo A. Hummel 2010-06-13 23:29:03 UTC
I can now reproduce this problem on the server by typing cd /var<tab>log<tab>/apa<tab>

That is, the system hangs when it tries to access the log directory, and it is then no longer possible to log in via SSH. I will run a full fsck on the next reboot.
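Because the hang can now be reproduced by completing a path under /var/log, a small timing probe can check whether simply reading that directory stalls, without tying up the interactive shell. This is a minimal sketch, assuming Python 3; `timed_listdir` is a hypothetical helper and the default timeout is an arbitrary choice.

```python
import os
import threading
import time
from typing import Optional


def timed_listdir(path: str, timeout: float = 10.0) -> Optional[float]:
    """Seconds os.listdir(path) took, or None if it did not finish in time.

    The listing runs in a daemon thread: a thread blocked in uninterruptible
    sleep cannot be killed, but the caller can still report the stall
    instead of hanging along with it.
    """
    result: dict = {}

    def worker() -> None:
        start = time.monotonic()
        try:
            os.listdir(path)
        except OSError as exc:
            result["error"] = exc
            return
        result["elapsed"] = time.monotonic() - start

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if "error" in result:
        raise result["error"]  # e.g. FileNotFoundError for a bad path
    return result.get("elapsed")
```

For example, `timed_listdir("/var/log")` returning None would suggest the directory read itself is stalling in the kernel. This only lists entries, while Tab completion may also stat() them, so a fast result does not rule the directory out; the interpreter may also linger on exit until a stuck kernel call returns.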

Any further hints to debug the issue are greatly appreciated.
Comment 2 Benedikt Böhm (RETIRED) gentoo-dev 2010-09-26 08:28:54 UTC
Please report this issue upstream.