Bug 60236 - major apps hang when calling wait4() or waitpid()
Summary: major apps hang when calling wait4() or waitpid()
Status: RESOLVED WORKSFORME
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Library
Hardware: x86 Linux
Importance: High critical
Assignee: Gentoo Toolchain Maintainers
 
Reported: 2004-08-13 09:25 UTC by Marcus
Modified: 2004-08-25 05:29 UTC


Attachments
kernel config that probably caused the bug, bzipped (dev17.bz2, 6.11 KB, application/octet-stream)
2004-08-25 05:29 UTC, Marcus

Description Marcus 2004-08-13 09:25:22 UTC
Since an emerge -up world, all major applications hang:
eclipse/gtk and mozilla.

The last update was:
[ebuild U ] sys-apps/baselayout-1.10.3 [1.10.2]
[ebuild N ] sys-kernel/linux26-headers-2.6.7-r4
[ebuild U ] sys-libs/glibc-2.3.4.20040808 [2.3.4.20040619-r1]
[ebuild U ] sys-devel/gcc-config-1.3.6-r1 [1.3.6]
[ebuild U ] sys-libs/cracklib-2.7-r10 [2.7-r9]
[ebuild U ] sys-apps/procps-3.2.3 [3.2.2-r1]
[ebuild U ] media-libs/libpng-1.2.5-r8 [1.2.5-r7]
[ebuild U ] net-nds/openldap-2.1.30-r2 [2.1.30-r1]
[ebuild U ] sys-libs/gdbm-1.8.3-r1 [1.8.3]
[ebuild U ] dev-php/php-5.0.0-r1 [4.3.8]
[ebuild U ] app-crypt/gnupg-1.2.5 [1.2.4]
[ebuild U ] media-libs/libmng-1.0.8 [1.0.5]

I don't know which package causes the error.

When started, mozilla complains: "No running windows found"
   strace -ttt -T -o /tmp/1.txt mozilla
In this strace run, I interrupted it with Ctrl-C after about a minute:

1092408986.266493 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000009>
1092408986.266566 close(3)              = 0 <0.000007>
1092408986.266612 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 <0.000006>
1092408986.266671 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000007>
1092408986.266715 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 <0.000006>
1092408986.266761 rt_sigaction(SIGINT, {0x8079410, [], 0}, {SIG_DFL}, 8) = 0 <0.000006>
1092408986.266812 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 4}], 0) = 12751 <0.031601>
1092408986.298551 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 12752 <0.000149>
1092408986.298778 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000009>
1092408986.298844 rt_sigaction(SIGINT, {SIG_DFL}, {0x8079410, [], 0}, 8) = 0 <0.000006>
1092408986.298897 close(3)              = -1 EBADF (Bad file descriptor) <0.000006>
1092408986.298945 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000006>
1092408986.298985 --- SIGCHLD (Child exited) @ 0 (0) ---
1092408986.299005 waitpid(-1, 0xbfffe05c, WNOHANG) = -1 ECHILD (No child processes) <0.000005>
1092408986.299043 sigreturn()           = ? (mask now []) <0.000006>
1092408986.299236 fcntl64(1, F_GETFD)   = 0 <0.000008>
1092408986.299277 fcntl64(1, F_DUPFD, 10) = 10 <0.000008>
1092408986.299311 fcntl64(1, F_GETFD)   = 0 <0.000006>
1092408986.299345 fcntl64(10, F_SETFD, FD_CLOEXEC) = 0 <0.000006>
1092408986.299379 dup2(2, 1)            = 1 <0.000006>
1092408986.299462 fcntl64(2, F_GETFD)   = 0 <0.000006>
1092408986.299531 fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 <0.000008>
1092408986.299597 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40185000 <0.000014>
1092408986.299665 write(1, "No running windows found\n", 25) = 25 <0.000640>
1092408986.300396 dup2(10, 1)           = 1 <0.000007>
1092408986.300456 fcntl64(10, F_GETFD)  = 0x1 (flags FD_CLOEXEC) <0.000007>
1092408986.300500 close(10)             = 0 <0.000006>
1092408986.300610 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
1092408986.300811 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
1092408986.300862 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0 <0.000007>
1092408986.300919 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x40030b58) = 12756 <0.000121>
1092408986.324080 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000010>
1092408986.324221 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000006>
1092408986.324291 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 <0.000007>
1092408986.324341 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000006>
1092408986.324379 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 <0.000007>
1092408986.324424 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000006>
1092408986.324479 rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0 <0.000006>
1092408986.324524 rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0 <0.000006>
1092408986.324562 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000006>
1092408986.324842 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
1092408986.324899 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000006>
1092408986.324975 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 <0.000005>
1092408986.325019 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000006>
1092408986.325065 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000006>
1092408986.325104 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000006>
1092408986.325149 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000005>
1092408986.325187 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 <0.000006>
1092408986.325234 rt_sigaction(SIGINT, {0x8079410, [], 0}, {SIG_DFL}, 8) = 0 <0.000007>
1092408986.325286 waitpid(-1, 0xbfffe658, 0) = ? ERESTARTSYS (To be restarted) <65.825213>
1092409052.150608 --- SIGINT (Interrupt) @ 0 (0) ---
1092409052.150648 rt_sigaction(SIGINT, {SIG_DFL}, {0x8079410, [], 0}, 8) = 0 <0.000008>
1092409052.150715 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
1092409052.150864 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000006>
1092409052.150961 munmap(0x40185000, 4096) = 0 <0.000020>
1092409052.151010 exit_group(129)       = ?
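
The -T option appends the time spent in each call in angle brackets, so the stall can be located mechanically: the last waitpid(-1, ...) blocks for 65.8 s until SIGINT arrives. Because strace was run without -f, only the parent process is traced; the child created by clone() (pid 12756) does not appear in this log. The following Python sketch picks out slow calls from such a log. The function name slow_calls and the 1-second threshold are illustrative assumptions, not part of the original report.

```python
import re

# Matches lines written by `strace -ttt -T`: an epoch timestamp, the call
# text, and the time spent in the call in angle brackets at the very end.
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<call>.*?)\s+<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_calls(lines, threshold=1.0):
    """Return (timestamp, call, seconds) for calls lasting >= threshold seconds.

    `threshold` is an arbitrary cut-off chosen for this sketch.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # e.g. "--- SIGINT ... ---" lines carry no duration
        secs = float(m.group("secs"))
        if secs >= threshold:
            hits.append((float(m.group("ts")), m.group("call"), secs))
    return hits


# Three lines copied from the trace in this report.
sample = [
    "1092408986.266812 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 4}], 0) = 12751 <0.031601>",
    "1092408986.325286 waitpid(-1, 0xbfffe658, 0) = ? ERESTARTSYS (To be restarted) <65.825213>",
    "1092409052.150608 --- SIGINT (Interrupt) @ 0 (0) ---",
]

for ts, call, secs in slow_calls(sample):
    print(f"{secs:10.6f}s  {call}")
```

Re-running the trace with -f would also show what the child itself is blocked on, which is likely more informative than the parent's waitpid.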




This error has appeared only since yesterday's update; I have never seen anything like it before. So I guess it is due to one of the packages. I think it's kernel-headers or

Reproducible: Always
Steps to Reproduce:


Actual Results:  
 

Expected Results:  
mozilla starts very slowly, brings up a window after several minutes, and
updates each view at intervals of more than 20 s. In between, it is completely
dead.
 
Eclipse hangs in the wait4() call, as I have seen in the strace log.
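
To check whether a plain fork-and-wait cycle itself stalls on an affected system, a small probe may help. This is only a sketch in Python: wait_with_timeout and its parameters are hypothetical names, and the 5-second alarm is an arbitrary choice. It forks a child that exits immediately and measures how long the blocking waitpid takes.

```python
import os
import signal
import time


def wait_with_timeout(child_exit_code=4, timeout=5):
    """Fork a child that exits at once, then reap it with a blocking waitpid.

    Returns (exit_code, elapsed_seconds). Raises TimeoutError if waitpid
    has not returned after `timeout` seconds. Must run in the main thread,
    because it installs a SIGALRM handler.
    """
    def on_alarm(signum, frame):
        raise TimeoutError("waitpid did not return in time")

    pid = os.fork()
    if pid == 0:
        os._exit(child_exit_code)  # child: exit immediately, no cleanup

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    start = time.monotonic()
    signal.alarm(timeout)
    try:
        _, status = os.waitpid(pid, 0)  # blocking wait, as in the trace
    finally:
        signal.alarm(0)  # cancel the pending alarm
        signal.signal(signal.SIGALRM, old_handler)
    elapsed = time.monotonic() - start

    if not os.WIFEXITED(status):
        raise RuntimeError("child did not exit normally")
    return os.WEXITSTATUS(status), elapsed


code, elapsed = wait_with_timeout()
print(f"child exited with status {code} after {elapsed:.3f}s")
```

If this returns promptly, the wait call itself is probably fine and the applications are waiting on a child that is stuck elsewhere.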
 

 
 
emerge --info says:
 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
Gentoo Base System version 1.5.2 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
sem_post: Invalid argument 
Portage 2.0.50-r9 (default-x86-2004.0, gcc-3.3.4, glibc-2.3.4.20040808-r0, 2.6.8-rc2)
================================================================= 
System uname: 2.6.8-rc2 i686 AMD Athlon(TM) XP1700+ 
ccache version 2.3 [enabled] 
Autoconf: sys-devel/autoconf-2.59-r4 
Automake: sys-devel/automake-1.8.5-r1 
ACCEPT_KEYWORDS="x86 ~x86" 
AUTOCLEAN="yes" 
CFLAGS="-O2 -funroll-loops -pipe -mcpu=athlon-xp -march=athlon-xp " 
CHOST="i686-pc-linux-gnu" 
COMPILER="gcc3" 
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.2/share/config /usr/kde/3.3/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d" 
CXXFLAGS="-O2 -funroll-loops -pipe -mcpu=athlon-xp -march=athlon-xp " 
DISTDIR="/usr/portage/distfiles" 
FEATURES="autoaddcvs ccache sandbox" 
GENTOO_MIRRORS="http://ftp.uni-erlangen.de/pub/mirrors/gentoo" 
MAKEOPTS=" -j4 " 
PKGDIR="/usr/portage/packages" 
PORTAGE_TMPDIR="/var/tmp" 
PORTDIR="/usr/portage" 
PORTDIR_OVERLAY="/usr/local/portage" 
SYNC="rsync://rsync.gentoo.org/gentoo-portage" 
USE="3dfx 3dnow X aalib acl acpi alsa apache2 apm arts avi berkdb cdr crypt cups dba encode esd flash foomaticdb gd gdbm gif glx gnome gpm gtk gtk2 icq imap imlib ipv6 java jikes jpeg junit kde ldap lib libg++ libwww mad matrox mbox mikmod mmx motif mozilla mpeg mysql nas ncurses nls nptl nvidia oggvorbis opengl oss pam pdflib perl png python qt quicktime readline sdl slang spell ssl svga tcltk tcpd tiff truetype usb vhosts video_cards_nvidia videos www x86 xml xml2 xmms xv zlib"
Comment 1 Marcus 2004-08-15 14:45:18 UTC
What I did now:
- removed the 2.4 kernel headers
- re-emerged the most recent
  - kernel sources
  - kernel-2.6 headers
  - glibc
  - and the whole previously posted list of packages


The problem was not solved; the behaviour is the same.
Comment 2 Marcus 2004-08-25 05:29:16 UTC
Created attachment 38162
kernel config that probably caused the bug, bzipped

I no longer use that configuration, and the problem has disappeared.
Comment 3 Marcus 2004-08-25 05:29:42 UTC
This bug was most probably caused by a broken Linux kernel configuration.

I used a certain config on the 2.6.7 sources; Gentoo then emerged the 2.6.8*-rc2 and 2.6.8*-rc4 sources. I recompiled the kernel with minor changes to the prior config and got the error. The same happened with the 2.8.1 (dot nothing) kernel on a re-installed Gentoo system.

After cleaning the configuration, re-selecting all items, and recompiling, my applications ran well again.
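
When a config carried over from older sources is suspected, comparing it against a freshly generated one can narrow down which options differ. Below is a rough Python sketch, assuming the usual .config line forms (CONFIG_NAME=value and "# CONFIG_NAME is not set"). The helper names and the sample option values are made up for illustration and are not taken from the attached dev17 config.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; explicitly unset options map to "n"."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SET_RE.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    Options missing from one side are reported as "absent".
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            changes[name] = (before, after)
    return changes


# Hypothetical sample data, NOT the contents of the attachment.
old_cfg = "CONFIG_SMP=y\n# CONFIG_PREEMPT is not set\nCONFIG_MODULES=y\n"
new_cfg = "CONFIG_SMP=y\nCONFIG_PREEMPT=y\n# CONFIG_MODULES is not set\n"

for name, (before, after) in diff_configs(old_cfg, new_cfg).items():
    print(f"{name}: {before} -> {after}")
```

For real files, the text of each .config would be read from disk and passed to diff_configs in the same way.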