I have seen this situation before. This time I have tried to gather as much information as possible. I did this by running 'emerge --sync'; after 4-5 seconds I pressed ctrl-c, and the process was left 'hanging'.

---
11:27:40 up 2 days, 9:13, 2 users, load average: 6.76, 6.84, 5.84
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
Cpu(s): 40.0% us, 54.2% sy,  0.0% ni,  0.0% id,  3.0% wa,  0.0% hi,  2.8% si
Mem:    996632k total,   741968k used,   254664k free,   180240k buffers
Swap:  1004052k total,        0k used,  1004052k free,   214432k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9041 root      25   0  3936 2488  876 R 99.6  0.2  32:36.36 rsync

g-pdc root # cat /proc/9041/statm
984 622 219 74 0 427 0

g-pdc root # cat /proc/9041/status
Name:   rsync
State:  R (running)
SleepAVG:       0%
Tgid:   9041
Pid:    9041
PPid:   9037
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 64
Groups: 0 1 2 3 4 6 10 11 20 26 27
VmSize:     3936 kB
VmLck:         0 kB
VmRSS:      2488 kB
VmData:     1684 kB
VmStk:        24 kB
VmExe:       296 kB
VmLib:      1632 kB
VmPTE:        12 kB
Threads:        1
SigPnd: 0000000000000100
ShdPnd: 0000000000004102
SigBlk: 0000000080000000
SigIgn: 0000000001001000
SigCgt: 0000000000014a03
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff

g-pdc root # cat /proc/9041/stat
9041 (rsync) R 9037 9037 22550 34816 9129 256 1210 0 0 0 18 162314 0 0 25 0 1 0 20403552 4030464 622 18446744073709551615 268435456 268736796 2199023251968 2199023248256 549757004124 256 0 16781312 84483 0 0 0 17 1 0 0

g-pdc root # cat /proc/9041/cmdline
/usr/bin/rsync --recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --progress rsync://172.16.20.50/gentoo-portage/ /usr/portage

g-pdc root # cat /proc/9041/maps
10000000-1004a000 r-xp 00000000 70:03 210969     /usr/bin/rsync
10059000-10062000 rw-p 00049000 70:03 210969     /usr/bin/rsync
10062000-101fd000 rwxp 10062000 00:00 0
8000000000-800001d000 r-xp 00000000 70:01 31623  /lib/ld-2.3.4.so
800001d000-800001e000 rw-p 800001d000 00:00 0
8000020000-8000021000 r--p 00020000 70:01 31623  /lib/ld-2.3.4.so
8000021000-8000024000 rw-p 00021000 70:01 31623  /lib/ld-2.3.4.so
8000024000-8000025000 rw-p 8000024000 00:00 0
8000029000-8000035000 r-xp 00000000 70:03 283586 /usr/lib/libpopt.so.0.0.0
8000035000-8000039000 ---p 0000c000 70:03 283586 /usr/lib/libpopt.so.0.0.0
8000039000-8000046000 rw-p 00000000 70:03 283586 /usr/lib/libpopt.so.0.0.0
8000046000-800005a000 r-xp 00000000 70:01 31631  /lib/libresolv-2.3.4.so
800005a000-8000066000 ---p 00014000 70:01 31631  /lib/libresolv-2.3.4.so
8000066000-8000067000 r--p 00020000 70:01 31631  /lib/libresolv-2.3.4.so
8000067000-8000069000 rw-p 00021000 70:01 31631  /lib/libresolv-2.3.4.so
8000069000-800006c000 rw-p 8000069000 00:00 0
800006c000-80001c7000 r-xp 00000000 70:01 31622  /lib/libc-2.3.4.so
80001c7000-80001cc000 ---p 0015b000 70:01 31622  /lib/libc-2.3.4.so
80001cc000-80001cf000 r--p 00160000 70:01 31622  /lib/libc-2.3.4.so
80001cf000-80001e5000 rw-p 00163000 70:01 31622  /lib/libc-2.3.4.so
80001e5000-80001ea000 rw-p 80001e5000 00:00 0
1ffffffa000-20000000000 rwxp 1ffffffa000 00:00 0
g-pdc root #
---

Neither 'kill 9041' nor 'kill -9 9041' does anything for me. The last time I saw this, it was slapd (OpenLDAP) and syslog-ng behaving exactly the same way, except that after some time the process died on its own. That happened on the same server. Those cases were not provoked by pressing ctrl-c; the processes just started doing it suddenly, out of the blue.
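Notably, the status dump above already shows pending signals that are never delivered: SigPnd 0x100 is bit 8, i.e. SIGKILL, and ShdPnd 0x4102 includes bits for SIGINT, SIGKILL and SIGTERM, which matches the ctrl-c and kill attempts. A task spinning inside the kernel only acts on signals when it heads back toward user space, which would explain why kill -9 appears to do nothing. For future reference, here is a minimal sketch for summarising a possibly-stuck PID from /proc (the script and its defaulting to the current shell's PID are my additions; /proc/PID/wchan availability depends on the kernel configuration):

```shell
#!/bin/sh
# Sketch: summarise a possibly-stuck process from /proc.
# Defaults to the current shell's PID so it can be run as-is;
# pass the hung PID (e.g. 9041 in the report above) as $1.
pid=${1:-$$}

# Scheduling state: R = runnable, S = sleeping, D = uninterruptible sleep.
state=$(awk '/^State:/ {print $2}' "/proc/$pid/status")
echo "pid=$pid state=$state"

# wchan names the kernel function a sleeping task is blocked in
# (0 or '-' for a running task); not every kernel exposes this file.
[ -r "/proc/$pid/wchan" ] && { printf 'wchan='; cat "/proc/$pid/wchan"; echo; }

# Pending and blocked signal masks, as seen in the status dumps above.
grep -E '^(SigPnd|ShdPnd|SigBlk)' "/proc/$pid/status"
```

If the state stays R with a pending SIGKILL that is never acted on, the process is looping in kernel mode and only a kernel-side fix (or reboot) will clear it.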
---
g-pdc root # time emerge info
Portage 2.0.51-r15 (default-ppc64-2004.3, gcc-3.4.1, glibc-2.3.4.20041102-r0, 2.6.10-r6-pdc ppc64)
=================================================================
System uname: 2.6.10-r6-pdc ppc64 POWER4 (gp)
Gentoo Base System version 1.4.16
Python:              dev-lang/python-2.3.3-r1 [2.3.3 (#1, Nov 16 2004, 23:27:42)]
dev-lang/python:     2.3.3-r1
sys-devel/autoconf:  2.59-r5
sys-devel/automake:  1.8.5-r1
sys-devel/binutils:  2.15.90.0.3-r3
sys-devel/libtool:   1.5.2-r7
virtual/os-headers:  2.4.22
ACCEPT_KEYWORDS="ppc64"
AUTOCLEAN="yes"
CFLAGS="-O2 -pipe -mcpu=power4 -mtune=power4"
CHOST="powerpc64-unknown-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -pipe -mcpu=power4 -mtune=power4"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distlocks"
GENTOO_MIRRORS="http://ftp.du.se/pub/os/gentoo/ http://gentoo.oregonstate.edu http://www.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.nordic/gentoo-portage"
USE="acl berkdb crypt f77 font-server foomaticdb fortran gdbm kerberos ldap libclamav nls oav pam perl ppc64 python quotas readline samba ssl tcpd winbind"
Unset: ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, PORTDIR_OVERLAY

real    0m30.810s
user    0m12.161s
sys     0m2.184s
g-pdc root #
---

As you can see, the server is rather slow right now :-/ Actually, just now, process 9041 died by itself. I hope someone knows a way to avoid this, and/or can tell me why it happens.
It just happened again. This time it's slapd:

---
top - 11:39:44 up 2 days, 9:25, 2 users, load average: 3.00, 3.55, 4.69
Tasks:  82 total,   4 running,  77 sleeping,   1 stopped,   0 zombie
Cpu(s):  1.0% us, 51.2% sy,  0.0% ni, 44.9% id,  0.0% wa,  0.0% hi,  2.9% si
Mem:    996632k total,   726416k used,   270216k free,   180900k buffers
Swap:  1004052k total,        0k used,  1004052k free,   241992k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5920 ldap      16   0 48884  13m 2652 R 99.9  1.4   6:27.77 slapd

g-pdc root # cat /proc/5920/status
Name:   slapd
State:  R (running)
SleepAVG:       93%
Tgid:   5920
Pid:    5920
PPid:   5919
TracerPid:      0
Uid:    439     439     439     439
Gid:    439     439     439     439
FDSize: 64
Groups: 439
VmSize:    48884 kB
VmLck:         0 kB
VmRSS:     14260 kB
VmData:    43492 kB
VmStk:        16 kB
VmExe:      1208 kB
VmLib:      3280 kB
VmPTE:        88 kB
Threads:        1
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000080000000
SigIgn: 0000000000001000
SigCgt: 00000003c001c013
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000

g-pdc root # cat /proc/5920/statm
12221 3565 663 302 0 10877 0

g-pdc root # cat /proc/5920/stat
5920 (slapd) R 5919 5914 5914 0 -1 64 47 0 0 0 2477 49467 0 0 16 0 1 0 9465 50057216 3565 18446744073709551615 268435456 269671632 2199023254608 549766141824 549759940956 0 0 4096 1073856531 0 0 0 33 1 0 0

g-pdc root # cat /proc/5920/mem
cat: /proc/5920/mem: No such process

g-pdc root # cat /proc/5920/maps | more
10000000-1012e000 r-xp 00000000 70:03 298078   /usr/lib/openldap/slapd
1013e000-10159000 rw-p 0012e000 70:03 298078   /usr/lib/openldap/slapd
10159000-104e9000 rwxp 10159000 00:00 0
8000000000-800001d000 r-xp 00000000 70:01 31623  /lib/ld-2.3.4.so
800001d000-800001e000 rw-p 800001d000 00:00 0
8000020000-8000021000 r--p 00020000 70:01 31623  /lib/ld-2.3.4.so
8000021000-8000024000 rw-p 00021000 70:01 31623  /lib/ld-2.3.4.so
8000029000-800006a000 r-xp 00000000 70:03 58869  /usr/lib/libldap_r-2.2.so.7.0.12
800006a000-8000079000 ---p 00041000 70:03 58869  /usr/lib/libldap_r-2.2.so.7.0.12
8000079000-800007f000 rw-p 00040000 70:03 58869  /usr/lib/libldap_r-2.2.so.7.0.12
800007f000-8000085000 rw-p 800007f000 00:00 0
8000085000-8000097000 r-xp 00000000 70:03 58859  /usr/lib/liblber-2.2.so.7.0.12
8000097000-80000a5000 ---p 00012000 70:03 58859  /usr/lib/liblber-2.2.so.7.0.12
80000a5000-80000a9000 rw-p 00010000 70:03 58859  /usr/lib/liblber-2.2.so.7.0.12
80000a9000-80000ab000 rw-p 80000a9000 00:00 0
80000ab000-80001cc000 r-xp 00000000 70:03 276396 /usr/lib/libdb-4.2.so
80001cc000-80001db000 ---p 00121000 70:03 276396 /usr/lib/libdb-4.2.so
80001db000-80001eb000 rw-p 00120000 70:03 276396 /usr/lib/libdb-4.2.so
80001eb000-80001f0000 rw-p 80001eb000 00:00 0
80001f0000-80001f6000 r-xp 00000000 70:01 31497  /lib/libcrypt-2.3.4.so
80001f6000-8000200000 ---p 00006000 70:01 31497  /lib/libcrypt-2.3.4.so
8000200000-8000201000 r--p 00010000 70:01 31497  /lib/libcrypt-2.3.4.so
8000201000-8000202000 rw-p 00011000 70:01 31497  /lib/libcrypt-2.3.4.so
8000202000-8000230000 rw-p 8000202000 00:00 0
8000230000-8000244000 r-xp 00000000 70:01 31631  /lib/libresolv-2.3.4.so
8000244000-8000250000 ---p 00014000 70:01 31631  /lib/libresolv-2.3.4.so
8000250000-8000251000 r--p 00020000 70:01 31631  /lib/libresolv-2.3.4.so
8000251000-8000253000 rw-p 00021000 70:01 31631  /lib/libresolv-2.3.4.so
8000253000-8000256000 rw-p 8000253000 00:00 0
8000256000-8000262000 r-xp 00000000 70:03 276729 /usr/lib/libltdl.so.3.1.0
8000262000-8000266000 ---p 0000c000 70:03 276729 /usr/lib/libltdl.so.3.1.0
8000266000-8000273000 rw-p 00000000 70:03 276729 /usr/lib/libltdl.so.3.1.0
8000273000-800027e000 r-xp 00000000 70:01 31436  /lib/libwrap.so.0.7.6
800027e000-8000283000 ---p 0000b000 70:01 31436  /lib/libwrap.so.0.7.6
8000283000-800028f000 rw-p 00000000 70:01 31436  /lib/libwrap.so.0.7.6
800028f000-8000290000 rw-p 800028f000 00:00 0
8000290000-80002a4000 r-xp 00000000 70:01 31628  /lib/libpthread-0.10.so
80002a4000-80002b0000 ---p 00014000 70:01 31628  /lib/libpthread-0.10.so
80002b0000-80002b1000 r--p 00020000 70:01 31628  /lib/libpthread-0.10.so
80002b1000-80002b3000 rw-p 00021000 70:01 31628  /lib/libpthread-0.10.so
80002b3000-8000339000 rw-p 80002b3000 00:00 0
8000339000-8000494000 r-xp 00000000 70:01 31622  /lib/libc-2.3.4.so
8000494000-8000499000 ---p 0015b000 70:01 31622  /lib/libc-2.3.4.so
8000499000-800049c000 r--p 00160000 70:01 31622  /lib/libc-2.3.4.so
800049c000-80004b2000 rw-p 00163000 70:01 31622  /lib/libc-2.3.4.so
80004b2000-80004b6000 rw-p 80004b2000 00:00 0
80004b6000-80004b9000 r-xp 00000000 70:01 31626  /lib/libdl-2.3.4.so
80004b9000-80004c6000 ---p 00003000 70:01 31626  /lib/libdl-2.3.4.so
80004c6000-80004c7000 r--p 00010000 70:01 31626  /lib/libdl-2.3.4.so
80004c7000-80004c8000 rw-p 00011000 70:01 31626  /lib/libdl-2.3.4.so
80004c8000-80005dc000 rw-p 80004c8000 00:00 0
80005dc000-80005dd000 ---p 80005dc000 00:00 0
80005dd000-80009dd000 rwxp 80005dd000 00:00 0
80009dd000-80009de000 ---p 80009dd000 00:00 0
80009de000-8000dde000 rwxp 80009de000 00:00 0
8000dde000-8000edf000 rw-p 8000dde000 00:00 0
8000f00000-8001000000 rw-p 8000f00000 00:00 0
8001000000-8001001000 ---p 8001000000 00:00 0
8001001000-8001401000 rwxp 8001001000 00:00 0
8001401000-8001502000 rw-p 8001401000 00:00 0
8001600000-8001700000 rw-p 8001600000 00:00 0
8001700000-8001701000 ---p 8001700000 00:00 0
8001701000-8001b01000 rwxp 8001701000 00:00 0
8001b01000-8001c02000 rw-p 8001b01000 00:00 0
8001d00000-8001f21000 rw-p 8001d00000 00:00 0
8001f21000-8002000000 ---p 8001f21000 00:00 0
8002000000-800222d000 rw-p 8002000000 00:00 0
800222d000-8002300000 ---p 800222d000 00:00 0
8002300000-8002301000 ---p 8002300000 00:00 0
8002301000-8002701000 rwxp 8002301000 00:00 0
8002701000-8002802000 rw-p 8002701000 00:00 0
8002802000-8002803000 ---p 8002802000 00:00 0
8002803000-8002c03000 rwxp 8002803000 00:00 0
8002c03000-8002d04000 rw-p 8002c03000 00:00 0
1ffffffc000-20000000000 rwxp 1ffffffc000 00:00 0
g-pdc root #
---
I cannot confirm this behaviour. Does this affect only one machine, or does it happen on all your machines (if you have more than one)?
This is a recurring error :-( I can reproduce it roughly every 5-7 days, but now the server has crashed with nothing useful in the logs. After moving my master LDAP DB to another server, the original server (g-pdc) keeps running; it doesn't crash anymore. The new LDAP master gets heavily loaded (for approx. 10 minutes) but doesn't crash. I have read something about TIME_WAIT or FIN_WAIT2 connections to the LDAP server, but that shouldn't make the whole server crash.
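The TIME_WAIT / FIN_WAIT2 theory is at least easy to check. A minimal sketch, counting TCP socket states straight from /proc/net/tcp (the 4th column is the connection state in hex; 05 = FIN_WAIT2, 06 = TIME_WAIT), which avoids depending on netstat:

```shell
#!/bin/sh
# Sketch: count TCP sockets in FIN_WAIT2 and TIME_WAIT.
# /proc/net/tcp: first line is a header, column 4 ('st') is the
# connection state in hex (05 = FIN_WAIT2, 06 = TIME_WAIT).
fw2=$(awk 'NR > 1 && $4 == "05"' /proc/net/tcp | wc -l)
tw=$(awk 'NR > 1 && $4 == "06"' /proc/net/tcp | wc -l)
echo "FIN_WAIT2=$fw2 TIME_WAIT=$tw"
```

Watching these counts while slapd is under load would show whether connections to the LDAP port pile up; but either way, leaked sockets alone should not take the whole machine down.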
When the LDAP DB is running on the g-pdc server, the server crashes every 5-7 days. Just to make that clear :-) (It's Monday)
I found the cause of this. A bad slapd configuration for the number of simultaneous threads actually kills the server, including every process on the machine :-( I'm closing the bug.
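For anyone hitting the same thing: in OpenLDAP 2.2 the worker thread pool size is set with the 'threads' directive in slapd.conf (the default is 16). The bug report doesn't say what the broken value was, so the fragment below is hypothetical, assuming the problem was an oversized thread limit on this 1 GB box:

```
# slapd.conf (fragment) -- hypothetical values, tune for your machine.
# 'threads' caps slapd's worker thread pool; the OpenLDAP default is 16.
# An excessive value on a small machine can starve everything else running on it.
threads 16
```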
Good news. I was kind of clueless about what to do with this bug ;-)