Bug 95293 - dctc hangs with 100% system CPU usage
Summary: dctc hangs with 100% system CPU usage
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: New packages (show other bugs)
Hardware: AMD64 Linux
Importance: High normal
Assignee: Gentoo net-p2p team
URL: http://forums.gentoo.org/viewtopic-t-...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-06-06 21:14 UTC by Nebojsa Trpkovic
Modified: 2005-06-07 17:21 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments

Description Nebojsa Trpkovic 2005-06-06 21:14:04 UTC
AMD Athlon64 3000+, Newcastle core
NForce4 Ultra board
2 x 512MB KingstoneVR PC3200 DDR
4 x Seagate SATA drives

dctc (0.85.9) hangs with 100% "system" CPU usage on my server.

It had been running without a problem for 24.5 days (the server's uptime). It was started and killed by cron four times a day, and everything worked fine. After 24.5 days of server uptime it started to use 100% of CPU time, failed to connect to the local hub, and even failed to start the dctc_master process (which usually starts shortly after dctc). Repeatedly killing and restarting it, deleting the ~/.dctc directory, running it as a different user, running it as root, and re-emerging it made no difference.

I've tried compiling older versions of dctc (0.85.6 and 0.83.8) manually, but they gave the same result.

Here are the details:
http://forums.gentoo.org/viewtopic-t-342985.html

I ran "strace dctc..." and within a few seconds got a 200MB log file filled with:

semget(1142054670, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054671, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054672, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054673, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054674, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054675, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054676, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054677, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054678, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054679, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054680, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054681, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054682, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054683, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054684, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device)
semget(1142054685, 10, IPC_CREAT|IPC_EXCL|0600) = -1 ENOSPC (No space left on device) 

So it seems dctc can't get a semaphore... I don't know anything more.
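
With a log that size, it can help to collapse it into counts of failing calls before reading it. The following is only a rough sketch: the function name summarize_failed_syscalls is made up for illustration, and it assumes plain strace output lines of the form `name(args) = -1 ERRNO (text)` with no PID prefix (that is, strace run without -f).

```shell
#!/bin/bash
# Hypothetical helper: read strace output on stdin and print one line per
# (syscall, errno) pair that failed, most frequent first.
summarize_failed_syscalls() {
    awk '
    {
        # Look for the "= -1 ERRNO" triple that marks a failed call.
        for (i = 1; i < NF - 1; i++) {
            if ($i == "=" && $(i + 1) == "-1") {
                # The syscall name is everything before the first "(".
                name = substr($0, 1, index($0, "(") - 1)
                count[name " " $(i + 2)]++
                break
            }
        }
    }
    END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Example (path taken from the strace command used later in this report):
#   summarize_failed_syscalls < /tmp/strace.log
```

For a log like the one above, this should reduce to a single line naming semget and ENOSPC, which is a much easier thing to search for than "No space left on device".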

All questions and suggestions are welcome!


Portage 2.0.51.19 (default-linux/amd64/2005.0, gcc-3.4.3-20050110, glibc-2.3.4.20041102-r1, 2.6.11-gentoo-r6 x86_64)
=================================================================
System uname: 2.6.11-gentoo-r6 x86_64 AMD Athlon(tm) 64 Processor 3000+
Gentoo Base System version 1.4.16
Python:              dev-lang/python-2.3.5 [2.3.5 (#1, May 12 2005, 02:04:49)]
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
dev-lang/python:     2.3.5
sys-apps/sandbox:    [Not Present]
sys-devel/autoconf:  2.59-r6, 2.13
sys-devel/automake:  1.7.9-r1, 1.8.5-r3, 1.5, 1.4_p6, 1.6.3, 1.9.5
sys-devel/binutils:  2.15.92.0.2-r10
sys-devel/libtool:   1.5.16
virtual/os-headers:  2.6.8.1-r4
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CFLAGS="-O2 -march=athlon64 -pipe -frename-registers -fweb -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/bind /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O2 -pipe"
DISTDIR="/users/tnt/distfiles"
FEATURES="autoaddcvs autoconfig ccache distlocks sandbox sfperms strict"
GENTOO_MIRRORS="ftp://mirror.etf.bg.ac.yu/gentoo/ http://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ http://gd.tuwien.ac.at/opsys/linux/gentoo/"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="amd64 acpi apache2 berkdb bitmap-fonts crypt cups curl encode exif extensions font-server fortran gd gif gpm imagemagick imap jabber jp2 jpeg libwww logrotate lzw lzw-tiff maildir mp3 mpeg mysql ncurses nls nptl nptlonly oggvorbis pam pam-mysql perl php png python readline rrdtool samba sasl slang snmp ssl tcpd tiff truetype truetype-fonts type1-fonts unicode usb userlocales wmf xml2 xpm xrandr zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS
Comment 1 Jakub Moc (RETIRED) gentoo-dev 2005-06-07 00:05:34 UTC
Uhm...

>(No space left on device)

obviously suggests that you are out of disk space...
Comment 2 Nebojsa Trpkovic 2005-06-07 03:50:30 UTC
(In reply to comment #1)
> Uhm...
> 
> >(No space left on device)
> 
> obviously suggests that you are out of disk space...


The partition with the least space available has 409MB free:

titan root # df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             4.6G  3.2G  1.5G  68% /
/dev/sda2             4.2G  507M  3.7G  12% /var/cache/squid/cache00
/dev/sdb2             4.2G  506M  3.7G  12% /var/cache/squid/cache01
/dev/sdc2             4.2G  507M  3.7G  12% /var/cache/squid/cache02
/dev/sdd2             4.2G  507M  3.7G  12% /var/cache/squid/cache03
/dev/sda6              20G   20G  598M  98% /users/dooki
/dev/sda7              60G   60G  409M 100% /users/mica
/dev/sda8              99G   35G   64G  36% /users/mare
/dev/sda9              27G   33M   27G   1% /users/jeja
/dev/sda10            9.4G  6.9G  2.5G  74% /users/ilija
/dev/sda11            9.4G  735M  8.6G   8% /users/ana
/dev/sdb5              37G   19G   18G  53% /users/sija
/dev/sdb6              88G   37G   51G  42% /users/boki
/dev/sdb7             105G   52G   54G  50% /users/zola
/dev/sdc5              60G   57G  2.5G  96% /users/marko
/dev/sdc6             4.7G  622M  4.1G  14% /users/transfer
/dev/sdc7              20G   15G  5.0G  76% /users/vlada
/dev/sdc8             138G   49G   90G  36% /users/dome
/dev/sdc9             6.7G   33M  6.6G   1% /users/dusan
/dev/sdd5              90G   77G   14G  86% /users/tnt
/dev/sdd6              93G   82G   12G  88% /users/peleizoki
/dev/sdd7              20G  8.3G   12G  42% /users/nesha
/dev/sdd8             6.7G  1.4G  5.3G  22% /users/gergana
none                  502M     0  502M   0% /dev/shm
none                  1.5G  4.9M  1.5G   1% /tmp


And I have the same problem even if I share only two JPEGs on the /tmp partition,
and EVEN if I don't share anything (in both cases the /users/* partitions
shouldn't be touched).

I've tried to start just:

strace -o /tmp/strace.log dctc -g 10.0.1.33

without any additional arguments and still got the same bug. 


Comment 4 Jakub Moc (RETIRED) gentoo-dev 2005-06-07 05:54:20 UTC
(In reply to comment #2)

> The partition with the least space available has 409MB free:

... which is probably reserved for root. Please, make some decent space on all
partitions where this P2P thing saves data and try again. And don't run this as
root, duh!
Comment 5 Nebojsa Trpkovic 2005-06-07 06:16:45 UTC
1. There's reiserfs on all partitions except /boot, so there shouldn't be any
space reserved for root.

2. I get the same problem as root, and IF there was space reserved for root, it
should be usable without problems when dctc is run by root.

3. My server just serves files - it doesn't download anything through dctc, and
it doesn't need disk space for saving downloads (at least not more than 400MB).
Files for sharing are copied via samba from window$ boxes.

P.S. Thanks for the advice. I normally run dctc as a non-privileged user, but in
this situation I've tried everything to locate the problem, so I ran dctc as
root just to see if the problem is permission-related. Unfortunately, I got the
same hang. :(
Comment 6 Nebojsa Trpkovic 2005-06-07 09:57:38 UTC
UPDATE:

It seems that the problem is semaphore-related (thanks to widan):

>ENOSPC for semget has nothing to do with full disks. From semget's man page :
>Code:
>ENOSPC     A semaphore set has to be created but the system  limit  for
>           the maximum number of semaphore sets (SEMMNI), or the system
>           wide  maximum  number  of  semaphores  (SEMMNS),  would   be
>           exceeded.
>
>Could you run those:
>Code:
>ipcs -ls
>ipcs

>Maybe some program (maybe dctc, maybe some other one) created semaphores but
>never deleted them, and the kernel table slowly filled up over time. If ipcs 
>returns a very long list under the title "semaphore arrays", there are probably
>too many semaphore sets.


And here we are! The user 'titan' is used only for starting dctc, and it owns
most of the semaphores:

Code:
titan / # ipcs -ls

------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767

titan / # ipcs

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x10feed01 0          root      644        25656      4

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0x00feed00 32768      root      644        2
0x00000000 3637249    apache    600        1
0x00000000 3670018    apache    600        1
0x6eab5d4e 131075     titan     600        10
0x73692b8e 163844     titan     600        10
0x096d3adf 196613     titan     600        10
0x6c3a7223 229382     titan     600        10
0x0be0cd95 262151     titan     600        10
0x3735deb4 294920     titan     600        10
0x4a5dbae7 327689     titan     600        10
0x0dd1f3bb 360458     titan     600        10
0x22d456f7 393227     titan     600        10
0x38bcb84b 425996     titan     600        10
0x60010ebe 458765     titan     600        10
0x0b86becb 491534     titan     600        10
0x60e931d2 524303     titan     600        10
0x36a3a232 557072     titan     600        10
0x0b91d49e 589841     titan     600        10
0x6117c0a8 622610     titan     600        10
0x0c22aa5c 655379     titan     600        10
0x0fe20bf4 688148     titan     600        10
0x3785fe61 720917     titan     600        10
0x3b025a35 753686     titan     600        10
0x6277846a 786455     titan     600        10
0x0ceacca9 819224     titan     600        10
0x22844cd2 851993     titan     600        10
0x77c84297 884762     titan     600        10
0x3b906f7d 917531     titan     600        10
0x1100f450 950300     titan     600        10
0x4dc9a6c6 983069     titan     600        10
0x23b467bf 1015838    titan     600        10
0x6c333e51 1048607    titan     600        10
0x66c59b8c 1081376    titan     600        10
0x4e553ab6 1114145    titan     600        10
0x23810e17 1146914    titan     600        10
0x0f02f590 1179683    titan     600        10
0x64a2fddd 1212452    titan     600        10
0x39998e77 1245221    titan     600        10
0x4f2c2bfb 1277990    titan     600        10
0x52d51f8d 1310759    titan     600        10
0x4f92c7f7 1343528    titan     600        10
0x256611c1 1376297    titan     600        10
0x3a5b6ab4 1409066    titan     600        10
0x3e78ecf5 1441835    titan     600        10
0x65ac2fa8 1474604    titan     600        10
0x50a678d2 1507373    titan     600        10
0x25a76d59 1540142    titan     600        10
0x7b9fe085 1572911    titan     600        10
0x50c3913c 1605680    titan     600        10
0x51e54d9f 1638449    titan     600        10
0x54d9ea05 1671218    titan     600        10
0x3c7ef9fa 1703987    titan     600        10
0x528cb509 1736756    titan     600        10
0x27464464 1769525    titan     600        10
0x520d0f86 1802294    titan     600        10
0x67c62b84 1835063    titan     600        10
0x7d3d9eb9 1867832    titan     600        10
0x12f85944 1900601    titan     600        10
0x563ba1d0 1933370    titan     600        10
0x5361de35 1966139    titan     600        10
0x175f2692 1998908    titan     600        10
0x3eba763b 2031677    titan     600        10
0x2920dd9e 2064446    titan     600        10
0x5197c1bd 2097215    titan     600        10
0x2a181cf9 2129984    titan     600        10
0x3f41418d 2162753    titan     600        10
0x1548edbf 2195522    titan     600        10
0x2abb8b33 2228291    titan     600        10
0x14f38722 2261060    titan     600        10
0x2a97c547 2293829    titan     600        10
0x4028d5bc 2326598    titan     600        10
0x03ab7869 2359367    titan     600        10
0x6bef1878 2392136    titan     600        10
0x78dbad7a 2424905    titan     600        10
0x2bd4c0fe 2457674    titan     600        10
0x6ed33ac8 2490443    titan     600        10
0x164f267e 2523212    titan     600        10
0x6c167d48 2555981    titan     600        10
0x3d490107 2588750    titan     600        10
0x6c6b35df 2621519    titan     600        10
0x2feb048a 2654288    titan     600        10
0x56c0a560 2687057    titan     600        10
0x494f0af8 2719826    titan     600        10
0x25cacfb3 2752595    titan     600        10
0x22b3c15f 2785364    titan     600        10
0x37da8a3f 2818133    titan     600        10
0x0dfa2df1 2850902    titan     600        10
0x62cdfae5 2883671    titan     600        10
0x78d64081 2916440    titan     600        10
0x23f120c0 2949209    titan     600        10
0x78f822fd 2981978    titan     600        10
0x4e2f906e 3014747    titan     600        10
0x6382d3c6 3047516    titan     600        10
0x38d7c1be 3080285    titan     600        10
0x7dc60b23 3113054    titan     600        10
0x5293e65d 3145823    titan     600        10
0x7a429154 3178592    titan     600        10
0x0f451b1f 3211361    titan     600        10
0x64f9111d 3244130    titan     600        10
0x27b5e76f 3276899    titan     600        10
0x6557b205 3309668    titan     600        10
0x7a92b297 3342437    titan     600        10
0x10d89ce7 3375206    titan     600        10
0x13d50053 3407975    titan     600        10
0x3b4947eb 3539048    titan     600        10
0x65765f1d 3571817    titan     600        10
0x7ba28d61 3702890    titan     600        10
0x647cfccb 3735659    titan     600        10
0x549b7a16 3768428    titan     600        10
0x6a1c22e3 3801197    titan     600        10
0x552ed449 3833966    titan     600        10
0x3c6afe99 3866735    titan     600        10
0x4076b217 3899504    titan     600        10
0x15e70076 3932273    titan     600        10
0x7d6634c0 3965042    titan     600        10
0x682fc620 3997811    titan     600        10
0x6b64e8c7 4030580    titan     600        10
0x1383bc26 4063349    titan     600        10
0x56a5ea49 4096118    titan     600        10
0x7e73e2ff 4128887    titan     600        10
0x16d23b83 4161656    titan     600        10
0x2729074e 4194425    titan     600        10
0x41b68535 4227194    titan     600        10
0x293db85d 4259963    titan     600        10
0x6ce24989 4292732    titan     600        10
0x17bed7a0 4325501    titan     600        10
0x3fb77fab 4358270    titan     600        10
0x6a6294c3 4391039    titan     600        10

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

titan / # 
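
Counting the rows in the listing above gives 128 semaphore arrays in total (1 for root, 2 for apache, 125 for titan), which is exactly the "max number of arrays = 128" limit reported by ipcs -ls. A small sketch for getting that per-owner count without counting by hand follows; the function name count_sem_arrays_by_owner is made up for illustration, and it assumes the `ipcs -s` column layout shown above (key, semid, owner, perms, nsems).

```shell
#!/bin/bash
# Hypothetical helper: read `ipcs -s` output on stdin and print
# "owner count" lines, largest count first.
# Data rows are recognized by a key column that starts with "0x";
# header and separator lines are skipped.
count_sem_arrays_by_owner() {
    awk '$1 ~ /^0x/ { n[$3]++ } END { for (o in n) print o, n[o] }' |
        sort -k2,2nr -k1,1
}

# Usage on a live system:
#   ipcs -s | count_sem_arrays_by_owner
# The system-wide limit on arrays (SEMMNI) is the fourth field of
# /proc/sys/kernel/sem on Linux:
#   awk '{ print $4 }' /proc/sys/kernel/sem
```

If the counts add up to the SEMMNI value, any further semget() call that has to create a new set fails with ENOSPC, as in the strace log.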


Please, see details here:
http://forums.gentoo.org/viewtopic-t-342985.html






Comment 7 Nebojsa Trpkovic 2005-06-07 17:21:48 UTC
From widan (a great Linux mage from France):

First, you need to delete the ones that exist, to free up the kernel table entries :
Code:
for i in $(ipcs -s | grep titan | cut -d ' ' -f 1); do ipcrm -S $i; done

As to why it leaks semaphores, it's "normal" if you kill dctc that way: programs
killed with SIGKILL don't get a chance to clean up. This is usually not a
problem, but it is for IPC objects (semaphores, shared memory segments and
message queues): they are not tied to a process, so the kernel can't reclaim
them when the process dies. You could try to kill with SIGTERM:
Code:
killall -TERM dctc
killall -TERM dctc_master

It's a bit less aggressive, and dctc might handle it better. If it fails, you
could use a script that does both kills and runs the code above to delete the IPCs.
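
One possible shape for such a script is sketched below. It is not widan's code: the function names (sem_ids_for_owner, stop_process, remove_sems_for_owner) and the 5 second default timeout are assumptions made for illustration. It selects rows by an exact match on the owner column rather than grep, and removes arrays by semid with `ipcrm -s`.

```shell
#!/bin/bash
# Hypothetical helpers, not taken from dctc or from the forum thread.

# Print the semid (2nd column) of every semaphore array owned by user $1,
# reading `ipcs -s` output on stdin. Comparing the owner column exactly
# avoids matching other lines that merely contain the user name.
sem_ids_for_owner() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner { print $2 }'
}

# Send SIGTERM to every process named $1, wait up to $2 seconds
# (default 5, an arbitrary choice), then fall back to SIGKILL.
stop_process() {
    local name=$1 timeout=${2:-5} waited=0
    killall -TERM "$name" 2>/dev/null || return 0   # nothing was running
    while [ "$waited" -lt "$timeout" ]; do
        pgrep -x "$name" >/dev/null || return 0     # exited on its own
        sleep 1
        waited=$((waited + 1))
    done
    killall -KILL "$name" 2>/dev/null
    return 0
}

# Remove all semaphore arrays owned by user $1.
remove_sems_for_owner() {
    local id
    for id in $(ipcs -s | sem_ids_for_owner "$1"); do
        ipcrm -s "$id"
    done
}

# Example, using the names from this report:
#   stop_process dctc
#   stop_process dctc_master
#   remove_sems_for_owner titan
```

Cleaning up by owner is only safe when that user runs nothing else that uses System V semaphores, so a dedicated account for dctc keeps this simple.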





So I deleted all semaphores that were owned by the 'titan' user, and dctc started
normally.
I couldn't stop dctc and dctc_master by sending the TERM signal, so I had to
make a stop script like this:
Code:
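#!/bin/bash

# SIGTERM did not stop dctc here, so force-kill both processes.
killall -9 dctc 2>&1
killall -9 dctc_master 2>&1
sleep 2
# Remove every semaphore array owned by 'titan'. The first ipcs column is
# the key, which is what ipcrm -S expects.
for i in $(ipcs -s | grep titan | cut -d ' ' -f 1); do ipcrm -S $i; done

# Remove titan's ~/.dctc directory.
rm -f -r /home/titan/.dctc 2>&1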


It works just fine for now, and no semaphores owned by 'titan' are left when I
turn off dctc.
The user 'titan' is not used for anything except starting dctc, so I guess it
will not be a problem to delete all of its semaphores every time I kill dctc.


http://forums.gentoo.org/viewtopic-t-342985.html