randomly a process hogs the CPU for a very long time and then continues. the latest example is while compiling wireshark - amidst the process, 'sed' hangs for more than 17 minutes: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 7760 root 39 19 7768 360 256 R 100 0.0 16:57.61 sed but after that, the compilation succeeds anyway. I tried to identify the state sed was in: # ps ax -T -o ppid,pid,args,stat,wchan | grep sed 7756 7760 /bin/sed -e 1s/^X// -e s/\( RN+ - but that doesn't help me much. I've got 4 CPU's and didn't do anything besides waiting for emerge to finish. I don't know how to reproduce this issue, but the last 2 days it was happening with a bash script I used for stress-testing inotify, like: #!/bin/sh # very strange. this script just hangs randomly on an up-to-date gentoo. # be it in a `seq` call (top confirms that) or in `touch`. # btw - touch seriously degrades performance!!! OUTPUT=. for j in `seq 0 9999`; do for i in `seq 0 49`; do # touch $OUTPUT/file$i.txt echo 1 > $OUTPUT/file$i.txt echo 1 > $OUTPUT/file$i.txt echo 1 > $OUTPUT/file$i.txt done done Randomly 'seq' or 'touch' were hogging up one of the CPUs - but would continue after a very long pause. I've compiled the system from scratch (emerge -e) at least 2 times during the last 3 weeks and was never using any fancy CFLAGS anyway. I was running Debian testing for 4 month beforehand on the very same hardware and did not have such a behaviour. Reproducible: Sometimes Steps to Reproduce: 1. run some cpu intensive program 2. wait indefinitely until a program hogs the cpu 3. Actual Results: randomly gets stuck and hogs CPU - but will continue after a very long time. Portage 2.1.4.4 (default/linux/amd64/2008.0/desktop, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r7 x86_64) ================================================================= System uname: 2.6.25-gentoo-r7 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz Timestamp of tree: Thu, 28 Aug 2008 07:04:01 +0000 ccache version 2.4 [enabled] app-shells/bash: 3.2_p33 dev-lang/python: 2.5.2-r6 dev-python/pycrypto: 2.0.1-r6 dev-util/ccache: 2.4-r7 sys-apps/baselayout: 1.12.11.1 sys-apps/sandbox: 1.2.18.1-r2 sys-devel/autoconf: 2.13, 2.61-r2 sys-devel/automake: 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1 sys-devel/binutils: 2.18-r3 sys-devel/gcc-config: 1.4.0-r4 sys-devel/libtool: 1.5.26 virtual/os-headers: 2.6.23-r3 ACCEPT_KEYWORDS="amd64" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-mtune=nocona -O2 -pipe -ggdb" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/config" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d" CXXFLAGS="-mtune=nocona -O2 -pipe -ggdb" DISTDIR="/usr/portage/distfiles" FEATURES="buildpkg ccache distlocks fixpackages metadata-transfer parallel-fetch sandbox sfperms splitdebug strict unmerge-orphans userfetch" GENTOO_MIRRORS="ftp://mirror.switch.ch/mirror/gentoo/ ftp://mirror.ing.unibo.it/gentoo/ ftp://mirror.qubenet.net/mirror/gentoo/ ftp://ftp-stud.fht-esslingen.de/pub/Mirrors/gentoo/ ftp://pandemonium.tiscali.de/pub/gentoo/" LC_ALL="en_US" LDFLAGS="-Wl,-O1" LINGUAS="en de" MAKEOPTS="-j5" PKGDIR="/usr/portage/packages" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages" PORTAGE_TMPDIR="/var/portage/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="/usr/portage/local/layman/desktop-effects" SYNC="rsync://rsync.gentoo.org/gentoo-portage" USE="X acl acpi alsa amd64 bash-completion berkdb bluetooth branding bzip2 cairo cdr cli cracklib crypt cups dbus dri dvd dvdr dvdread eds emboss encode esd evo fam firefox fortran gdbm gif gpm gstreamer hal iconv ipv6 isdnlog jpeg kde kerberos libnotify lm_sensors locales mad midi mikmod mmx mp3 mpeg mudflap multilib ncurses nls nptl nptlonly ogg opengl openmp pam pcre pdf perl png ppds pppd python qt qt3 qt3support qt4 quicktime readline reflection sdl session spell spl sse sse2 ssl startup-notification svg sysfs tcpd tiff truetype unicode usb userlocales utf8 vim-syntax vorbis xcomposite xinerama xml xorg xv zlib" ALSA_CARDS="hda_intel" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="en de" USERLAND="GNU" VIDEO_CARDS="fglrx" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, FFLAGS, INSTALL_MASK, LANG, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
looks like a similar problem to: http://kerneltrap.org/mailarchive/linux-kernel/2007/10/19/348832 but there was no solution provided ...
It's happening again - this time with 'touch' of the simple shell script above: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 16692 foobar 20 0 7816 408 312 R 100 0.0 6:03.72 touch The kernel link above suggested to cat the stat of the process: # while true; do cat /proc/16692/stat ; sleep 5; done 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 17803 2 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315429 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 18303 2 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315429 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 18803 2 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315429 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 19303 2 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315432 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 19802 3 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315429 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 20302 3 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315429 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 20802 3 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315441 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 21302 3 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315429 0 0 0 0 0 0 0 17 3 0 0 0 0 0 16692 (touch) R 22679 22677 7314 34821 22677 4194304 180 0 0 0 21802 3 0 0 20 0 1 0 15147700 8003584 102 18446744073709551615 4194304 4242292 140736481217664 18446744073709551615 140401332315429 0 0 0 0 0 0 0 17 3 0 0 0 0 0
12 minutes later the shell script continues. Here's the output of strace: # strace -p 16692 Process 16692 attached - interrupt to quit brk(0) = 0x60d000 brk(0x62e000) = 0x62e000 close(0) = 0 open("./file66.txt", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 0 utimensat(0, NULL, NULL, 0) = 0 close(0) = 0 close(1) = 0 close(2) = 0 exit_group(0) = ? Process 16692 detached
Tried gentoo-sources-2.6.26-r1 - happens too.
(In reply to comment #4) > Tried gentoo-sources-2.6.26-r1 - happens too. > Just a couple stabs in the dark, but you could try following forks in strace. I tend to do something like: strace -f -s 4096 -T -p <PID> This will follow all forks, show you a little more data (optional), and give you timing information on each syscall (cool feature IMO). Also, how does your disk look while you are waiting? I agree that 10+ minutes is too long in any case, but an iostat against your device would be interesting.
(In reply to comment #4) > Tried gentoo-sources-2.6.26-r1 - happens too. > Couldn't reproduce your bug. I'd also like to see the strace to see if it waits for anything. I would try running your bash script against another file system, against another hard disk and maybe even against tmpfs mount -t tmpfs tmpfs -o size=10M /mnt/tmpfs Also, could you post your .config?
Please enable CONFIG_MAGIC_SYSRQ and increase CONFIG_LOG_BUF_SHIFT by 1 or 2 so that it's plenty big enough to store a lot of data. Then, when you see this bug reproduced (a process 'frozen' and hogging CPU) please note down its PID and then run: echo t > /proc/sysrq-trigger dmesg > taskstate.txt Then attach taskstate.txt to this bug, and tell us the noted PID. This will indicate exactly where the kernel is stuck. Thanks!
Please reopen with the info requested above.
Created attachment 181057 [details] taskstate requested earlier in bug report
I have this same problem. Currently on my system the following command is taking up 100% of the CPU: cat /proc/cpuinfo It has been running for 13+ minutes. Executing this command in another terminal works just fine. I've seen this happen with cat, sed, and grep. I ran the commands suggested and dumped taskstate.txt. It is attached.
Comment on attachment 181057 [details] taskstate requested earlier in bug report The relevant PID for "cat /proc/sysinfo" is 20935
Reopening since another user has the same problem and provided necessary/desired information.
Okay, since this is one of the oldest active bugs in here it's time for some attention. Can you, gentlemen, reproduce the issue even with the latest stable gentoo-sources (gentoo-sources-2.6.30-r5)?
This is some form of heisenbug. I haven't seen it in a long time now. I'll be updating my kernel next week so I can post something more substantial then.
I'm gonna close this bug as NEEDINFO. If you ever get the same issue, feel free to update this report.