Bug 80310 - Problems with procs in status 'R' which becomes unkillable
Status: VERIFIED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: PPC64 All
Importance: High normal
Assignee: ppc64 architecture team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-02-01 02:34 UTC by Jacob Lindberg
Modified: 2005-03-17 09:06 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description Jacob Lindberg 2005-02-01 02:34:28 UTC
I have seen this situation before. Now I have tried to get as much information as possible. 

To reproduce it, I run 'emerge --sync' and press ctrl-c after 4-5 seconds; the rsync process is then left 'hanging'.

---
11:27:40 up 2 days,  9:13,  2 users,  load average: 6.76, 6.84, 5.84
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 40.0% us, 54.2% sy,  0.0% ni,  0.0% id,  3.0% wa,  0.0% hi,  2.8% si
Mem:    996632k total,   741968k used,   254664k free,   180240k buffers
Swap:  1004052k total,        0k used,  1004052k free,   214432k cached
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
9041 root      25   0  3936 2488  876 R 99.6  0.2  32:36.36 rsync


g-pdc root # cat /proc/9041/statm
984 622 219 74 0 427 0
g-pdc root #

g-pdc root # cat /proc/9041/status
Name:   rsync
State:  R (running)
SleepAVG:       0%
Tgid:   9041
Pid:    9041
PPid:   9037
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups: 0 1 2 3 4 6 10 11 20 26 27
VmSize:     3936 kB
VmLck:         0 kB
VmRSS:      2488 kB
VmData:     1684 kB
VmStk:        24 kB
VmExe:       296 kB
VmLib:      1632 kB
VmPTE:        12 kB
Threads:        1
SigPnd: 0000000000000100
ShdPnd: 0000000000004102
SigBlk: 0000000080000000
SigIgn: 0000000001001000
SigCgt: 0000000000014a03
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
g-pdc root #

g-pdc root # cat /proc/9041/stat
9041 (rsync) R 9037 9037 22550 34816 9129 256 1210 0 0 0 18 162314 0 0 25 0 1 0 20403552 4030464 622 18446744073709551615 268435456 268736796 2199023251968 2199023248256 549757004124 256 0 16781312 84483 0 0 0 17 1 0 0
g-pdc root #


g-pdc root # cat /proc/9041/cmdline
/usr/bin/rsync--recursive--links--safe-links--perms--times--compress--force--whole-file--delete--delete-after--stats--timeout=180--exclude=/distfiles--exclude=/local--exclude=/packages--progressrsync://172.16.20.50/gentoo-portage//usr/portage
g-pdc root #

g-pdc root # cat /proc/9041/maps
10000000-1004a000 r-xp 00000000 70:03 210969                             /usr/bin/rsync
10059000-10062000 rw-p 00049000 70:03 210969                             /usr/bin/rsync
10062000-101fd000 rwxp 10062000 00:00 0
8000000000-800001d000 r-xp 00000000 70:01 31623                          /lib/ld-2.3.4.so
800001d000-800001e000 rw-p 800001d000 00:00 0
8000020000-8000021000 r--p 00020000 70:01 31623                          /lib/ld-2.3.4.so
8000021000-8000024000 rw-p 00021000 70:01 31623                          /lib/ld-2.3.4.so
8000024000-8000025000 rw-p 8000024000 00:00 0
8000029000-8000035000 r-xp 00000000 70:03 283586                         /usr/lib/libpopt.so.0.0.0
8000035000-8000039000 ---p 0000c000 70:03 283586                         /usr/lib/libpopt.so.0.0.0
8000039000-8000046000 rw-p 00000000 70:03 283586                         /usr/lib/libpopt.so.0.0.0
8000046000-800005a000 r-xp 00000000 70:01 31631                          /lib/libresolv-2.3.4.so
800005a000-8000066000 ---p 00014000 70:01 31631                          /lib/libresolv-2.3.4.so
8000066000-8000067000 r--p 00020000 70:01 31631                          /lib/libresolv-2.3.4.so
8000067000-8000069000 rw-p 00021000 70:01 31631                          /lib/libresolv-2.3.4.so
8000069000-800006c000 rw-p 8000069000 00:00 0
800006c000-80001c7000 r-xp 00000000 70:01 31622                          /lib/libc-2.3.4.so
80001c7000-80001cc000 ---p 0015b000 70:01 31622                          /lib/libc-2.3.4.so
80001cc000-80001cf000 r--p 00160000 70:01 31622                          /lib/libc-2.3.4.so
80001cf000-80001e5000 rw-p 00163000 70:01 31622                          /lib/libc-2.3.4.so
80001e5000-80001ea000 rw-p 80001e5000 00:00 0
1ffffffa000-20000000000 rwxp 1ffffffa000 00:00 0
g-pdc root #
---
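
The /proc/9041/cmdline output above looks run together because the kernel separates the arguments with NUL bytes, which the terminal does not display. A minimal sketch of reading it as a proper argument list, assuming a Linux /proc filesystem (the helper names split_cmdline and read_cmdline are made up for this example):

```python
from typing import List


def split_cmdline(raw: bytes) -> List[str]:
    """Split the raw bytes of /proc/<pid>/cmdline into separate arguments."""
    # Arguments are NUL-separated, usually with one trailing NUL at the end.
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode("utf-8", errors="replace") for p in parts]


def read_cmdline(pid: int) -> List[str]:
    """Read and split the command line of a running process (Linux only)."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return split_cmdline(f.read())


if __name__ == "__main__":
    # Shortened sample in the same shape as the rsync command line above.
    sample = b"/usr/bin/rsync\0--recursive\0--links\0--timeout=180\0"
    for arg in split_cmdline(sample):
        print(arg)
```

On a shell, `tr '\0' ' ' < /proc/9041/cmdline` gives a similar readable result.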

Neither 'kill 9041' nor 'kill -9 9041' has any effect.
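
The SigPnd and ShdPnd fields in the /proc/9041/status dump show which signals are queued but not yet delivered. A small sketch for decoding those hex masks, assuming the usual Linux layout where bit 0 stands for signal 1 (decode_mask and signal_name are hypothetical helper names, not part of any tool used above):

```python
import signal
from typing import List


def decode_mask(hex_mask: str) -> List[int]:
    """Return the signal numbers whose bits are set in a /proc status mask."""
    value = int(hex_mask, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [bit + 1 for bit in range(value.bit_length()) if (value >> bit) & 1]


def signal_name(number: int) -> str:
    """Map a signal number to its name, falling back to the bare number."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


if __name__ == "__main__":
    # Masks copied from the status output of PID 9041.
    for field, mask in [("SigPnd", "0000000000000100"),
                        ("ShdPnd", "0000000000004102")]:
        print(field, [signal_name(n) for n in decode_mask(mask)])
```

For the values above, SigPnd decodes to signal 9 (SIGKILL) and ShdPnd to signals 2, 9 and 15 (SIGINT, SIGKILL, SIGTERM). That would be consistent with the ctrl-c and both kill attempts having been queued while the process never got to the point where pending signals are handled, though the dump alone does not prove it.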

The last time I saw this was with slapd (openldap) and syslog-ng. They behaved in exactly the same way, but after some time the process died. It happened on the same server. Those cases were not provoked by pressing ctrl-c; they just happened suddenly, out of the blue.

---
g-pdc root # time emerge info
Portage 2.0.51-r15 (default-ppc64-2004.3, gcc-3.4.1, glibc-2.3.4.20041102-r0, 2.6.10-r6-pdc ppc64)
=================================================================
System uname: 2.6.10-r6-pdc ppc64 POWER4 (gp)
Gentoo Base System version 1.4.16
Python:              dev-lang/python-2.3.3-r1 [2.3.3 (#1, Nov 16 2004, 23:27:42)]
dev-lang/python:     2.3.3-r1
sys-devel/autoconf:  2.59-r5
sys-devel/automake:  1.8.5-r1
sys-devel/binutils:  2.15.90.0.3-r3
sys-devel/libtool:   1.5.2-r7
virtual/os-headers:  2.4.22
ACCEPT_KEYWORDS="ppc64"
AUTOCLEAN="yes"
CFLAGS="-O2 -pipe -mcpu=power4 -mtune=power4"
CHOST="powerpc64-unknown-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -pipe -mcpu=power4 -mtune=power4"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distlocks"
GENTOO_MIRRORS="http://ftp.du.se/pub/os/gentoo/ http://gentoo.oregonstate.edu http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.nordic/gentoo-portage"
USE="acl berkdb crypt f77 font-server foomaticdb fortran gdbm kerberos ldap libclamav nls oav pam perl ppc64 python quotas readline samba ssl tcpd winbind"
Unset:  ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, PORTDIR_OVERLAY


real    0m30.810s
user    0m12.161s
sys     0m2.184s
g-pdc root #
---

As you can see the server is rather slow right now :-/

Actually right now, the proc 9041 just died by itself.

Hope someone knows a way to avoid this, and/or can tell me why.
Comment 1 Jacob Lindberg 2005-02-01 02:43:45 UTC
It just happened again. This time it's slapd:

top - 11:39:44 up 2 days,  9:25,  2 users,  load average: 3.00, 3.55, 4.69
Tasks:  82 total,   4 running,  77 sleeping,   1 stopped,   0 zombie
Cpu(s):  1.0% us, 51.2% sy,  0.0% ni, 44.9% id,  0.0% wa,  0.0% hi,  2.9% si
Mem:    996632k total,   726416k used,   270216k free,   180900k buffers
Swap:  1004052k total,        0k used,  1004052k free,   241992k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5920 ldap      16   0 48884  13m 2652 R 99.9  1.4   6:27.77 slapd


g-pdc root # cat /proc/5920/status
Name:   slapd
State:  R (running)
SleepAVG:       93%
Tgid:   5920
Pid:    5920
PPid:   5919
TracerPid:      0
Uid:    439     439     439     439
Gid:    439     439     439     439
FDSize: 64
Groups: 439
VmSize:    48884 kB
VmLck:         0 kB
VmRSS:     14260 kB
VmData:    43492 kB
VmStk:        16 kB
VmExe:      1208 kB
VmLib:      3280 kB
VmPTE:        88 kB
Threads:        1
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000080000000
SigIgn: 0000000000001000
SigCgt: 00000003c001c013
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
g-pdc root #

g-pdc root # cat /proc/5920/statm
12221 3565 663 302 0 10877 0
g-pdc root # cat /proc/5920/stat
5920 (slapd) R 5919 5914 5914 0 -1 64 47 0 0 0 2477 49467 0 0 16 0 1 0 9465 50057216 3565 18446744073709551615 268435456 269671632 2199023254608 549766141824 549759940956 0 0 4096 1073856531 0 0 0 33 1 0 0
g-pdc root # cat /proc/5920/mem
cat: /proc/5920/mem: No such process
g-pdc root # cat /proc/5920/maps |more
10000000-1012e000 r-xp 00000000 70:03 298078                             /usr/lib/openldap/slapd
1013e000-10159000 rw-p 0012e000 70:03 298078                             /usr/lib/openldap/slapd
10159000-104e9000 rwxp 10159000 00:00 0
8000000000-800001d000 r-xp 00000000 70:01 31623                          /lib/ld-2.3.4.so
800001d000-800001e000 rw-p 800001d000 00:00 0
8000020000-8000021000 r--p 00020000 70:01 31623                          /lib/ld-2.3.4.so
8000021000-8000024000 rw-p 00021000 70:01 31623                          /lib/ld-2.3.4.so
8000029000-800006a000 r-xp 00000000 70:03 58869                          /usr/lib/libldap_r-2.2.so.7.0.12
800006a000-8000079000 ---p 00041000 70:03 58869                          /usr/lib/libldap_r-2.2.so.7.0.12
8000079000-800007f000 rw-p 00040000 70:03 58869                          /usr/lib/libldap_r-2.2.so.7.0.12
800007f000-8000085000 rw-p 800007f000 00:00 0
8000085000-8000097000 r-xp 00000000 70:03 58859                          /usr/lib/liblber-2.2.so.7.0.12
8000097000-80000a5000 ---p 00012000 70:03 58859                          /usr/lib/liblber-2.2.so.7.0.12
80000a5000-80000a9000 rw-p 00010000 70:03 58859                          /usr/lib/liblber-2.2.so.7.0.12
80000a9000-80000ab000 rw-p 80000a9000 00:00 0
80000ab000-80001cc000 r-xp 00000000 70:03 276396                         /usr/lib/libdb-4.2.so
80001cc000-80001db000 ---p 00121000 70:03 276396                         /usr/lib/libdb-4.2.so
80001db000-80001eb000 rw-p 00120000 70:03 276396                         /usr/lib/libdb-4.2.so
80001eb000-80001f0000 rw-p 80001eb000 00:00 0
80001f0000-80001f6000 r-xp 00000000 70:01 31497                          /lib/libcrypt-2.3.4.so
80001f6000-8000200000 ---p 00006000 70:01 31497                          /lib/libcrypt-2.3.4.so
8000200000-8000201000 r--p 00010000 70:01 31497                          /lib/libcrypt-2.3.4.so
8000201000-8000202000 rw-p 00011000 70:01 31497                          /lib/libcrypt-2.3.4.so
8000202000-8000230000 rw-p 8000202000 00:00 0
8000230000-8000244000 r-xp 00000000 70:01 31631                          /lib/libresolv-2.3.4.so
8000244000-8000250000 ---p 00014000 70:01 31631                          /lib/libresolv-2.3.4.so
8000250000-8000251000 r--p 00020000 70:01 31631                          /lib/libresolv-2.3.4.so
8000251000-8000253000 rw-p 00021000 70:01 31631                          /lib/libresolv-2.3.4.so
8000253000-8000256000 rw-p 8000253000 00:00 0
8000256000-8000262000 r-xp 00000000 70:03 276729                         /usr/lib/libltdl.so.3.1.0
8000262000-8000266000 ---p 0000c000 70:03 276729                         /usr/lib/libltdl.so.3.1.0
8000266000-8000273000 rw-p 00000000 70:03 276729                         /usr/lib/libltdl.so.3.1.0
8000273000-800027e000 r-xp 00000000 70:01 31436                          /lib/libwrap.so.0.7.6
800027e000-8000283000 ---p 0000b000 70:01 31436                          /lib/libwrap.so.0.7.6
8000283000-800028f000 rw-p 00000000 70:01 31436                          /lib/libwrap.so.0.7.6
800028f000-8000290000 rw-p 800028f000 00:00 0
8000290000-80002a4000 r-xp 00000000 70:01 31628                          /lib/libpthread-0.10.so
80002a4000-80002b0000 ---p 00014000 70:01 31628                          /lib/libpthread-0.10.so
80002b0000-80002b1000 r--p 00020000 70:01 31628                          /lib/libpthread-0.10.so
80002b1000-80002b3000 rw-p 00021000 70:01 31628                          /lib/libpthread-0.10.so
80002b3000-8000339000 rw-p 80002b3000 00:00 0
8000339000-8000494000 r-xp 00000000 70:01 31622                          /lib/libc-2.3.4.so
8000494000-8000499000 ---p 0015b000 70:01 31622                          /lib/libc-2.3.4.so
8000499000-800049c000 r--p 00160000 70:01 31622                          /lib/libc-2.3.4.so
800049c000-80004b2000 rw-p 00163000 70:01 31622                          /lib/libc-2.3.4.so
80004b2000-80004b6000 rw-p 80004b2000 00:00 0
80004b6000-80004b9000 r-xp 00000000 70:01 31626                          /lib/libdl-2.3.4.so
80004b9000-80004c6000 ---p 00003000 70:01 31626                          /lib/libdl-2.3.4.so
80004c6000-80004c7000 r--p 00010000 70:01 31626                          /lib/libdl-2.3.4.so
80004c7000-80004c8000 rw-p 00011000 70:01 31626                          /lib/libdl-2.3.4.so
80004c8000-80005dc000 rw-p 80004c8000 00:00 0
80005dc000-80005dd000 ---p 80005dc000 00:00 0
80005dd000-80009dd000 rwxp 80005dd000 00:00 0
80009dd000-80009de000 ---p 80009dd000 00:00 0
80009de000-8000dde000 rwxp 80009de000 00:00 0
8000dde000-8000edf000 rw-p 8000dde000 00:00 0
8000f00000-8001000000 rw-p 8000f00000 00:00 0
8001000000-8001001000 ---p 8001000000 00:00 0
8001001000-8001401000 rwxp 8001001000 00:00 0
8001401000-8001502000 rw-p 8001401000 00:00 0
8001600000-8001700000 rw-p 8001600000 00:00 0
8001700000-8001701000 ---p 8001700000 00:00 0
8001701000-8001b01000 rwxp 8001701000 00:00 0
8001b01000-8001c02000 rw-p 8001b01000 00:00 0
8001d00000-8001f21000 rw-p 8001d00000 00:00 0
8001f21000-8002000000 ---p 8001f21000 00:00 0
8002000000-800222d000 rw-p 8002000000 00:00 0
800222d000-8002300000 ---p 800222d000 00:00 0
8002300000-8002301000 ---p 8002300000 00:00 0
8002301000-8002701000 rwxp 8002301000 00:00 0
8002701000-8002802000 rw-p 8002701000 00:00 0
8002802000-8002803000 ---p 8002802000 00:00 0
8002803000-8002c03000 rwxp 8002803000 00:00 0
8002c03000-8002d04000 rw-p 8002c03000 00:00 0
1ffffffc000-20000000000 rwxp 1ffffffc000 00:00 0
g-pdc root #
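
The /proc/5920/stat line above also records how the CPU time splits between user and kernel mode (fields 14 and 15, utime and stime, in clock ticks). A rough sketch for pulling those two numbers out, assuming the field order documented in proc(5); parse_cpu_ticks is a name invented for this example:

```python
from typing import Dict


def parse_cpu_ticks(stat_line: str) -> Dict[str, int]:
    """Extract utime and stime (in clock ticks) from a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces,
    # so split on the last ')' before splitting the rest on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    return {"utime": int(rest[14 - 3]), "stime": int(rest[15 - 3])}


if __name__ == "__main__":
    # Leading part of the slapd stat line shown above.
    line = "5920 (slapd) R 5919 5914 5914 0 -1 64 47 0 0 0 2477 49467 0 0 16 0 1 0 9465"
    ticks = parse_cpu_ticks(line)
    total = ticks["utime"] + ticks["stime"]
    print(ticks, "system share: %.1f%%" % (100.0 * ticks["stime"] / total))
```

For slapd this gives utime 2477 and stime 49467, and for the earlier rsync line utime 18 and stime 162314. In both cases almost all of the CPU time was spent in kernel mode, which matches the high %sy figure in the top output.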



Comment 2 Markus Rothe (RETIRED) gentoo-dev 2005-03-12 04:10:42 UTC
I cannot confirm this behaviour. Is this only one machine, or does this happen on
all your machines (if you have more than one)?
Comment 3 Jacob Lindberg 2005-03-14 00:57:18 UTC
This is a periodic error :-(

I can reproduce it approximately every 5-7 days, but now the server crashed without anything useful in the logs.

After moving my master LDAP DB to another server, the original server (g-pdc) keeps running. It doesn't crash anymore. The new LDAP master gets heavily loaded (for approx. 10 minutes) but doesn't crash. I have read something about TIME_WAIT or FIN_WAIT2 connections to the LDAP server, but that shouldn't make the server crash.

Comment 4 Jacob Lindberg 2005-03-14 00:58:55 UTC
When the LDAP DB is running on the g-pdc server, the server crashes every 5-7 days. Just to make that clear :-) (It's Monday)
Comment 5 Jacob Lindberg 2005-03-17 01:33:20 UTC
I found the cause of this. A bad configuration of slapd's simultaneous threads actually kills the server, including all processes on the machine :-(

I'm closing the bug.
Comment 6 Markus Rothe (RETIRED) gentoo-dev 2005-03-17 09:06:54 UTC
Good news. I was kind of clueless about what to do with this bug ;-)