I am getting some strange results using du which, based on testing on other boxen, I take to be related to running a 2.6 kernel.

What I see:

# du -shx /*
5.4M    /bin
4.0K    /boot
0       /dev
28M     /etc
3.5G    /home
9.6M    /lib
4.0K    /lost+found
352K    /mnt
342M    /opt
258M    /root
3.2M    /sbin
0       /sys
296K    /tmp
1.2G    /usr
98M     /var

# du -shc /root
92K     /root
92K     total

The problem is that the figure for /root (and others) differs between du /* and du /root.

I get the same results with coreutils-5.0.91-r2 and 5.0-r5.

# emerge --info
Portage 2.0.49-r18 (default-x86-1.4, gcc-3.3.2, glibc-2.3.2-r9, 2.6.0-test11-gentoo-r2)
=================================================================
System uname: 2.6.0-test11-gentoo-r2 i686 Pentium III (Coppermine)
Gentoo Base System version 1.4.3.12
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-march=i686 -Os -fno-strict-aliasing -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/usr/X11R6/lib/X11/xkb"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-march=i686 -Os -fno-strict-aliasing -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.oregonstate.edu http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j1"
PKGDIR="/usr/portage/packages/Hosts/stan"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X alsa apm avi bonobo crypt cups dvd encode gif gnome gtk gtk2 java jpeg mad mmx mpeg nas oggvorbis oss pam pcmcia pdflib png quicktime readline samba sdl spell sse ssl tiff truetype wmf x86 xv zlib"

As I build on a different box in a chroot, I suspected this first (especially as my build box runs 2.4). However, I also re-emerged coreutils directly on my laptop and it made no difference. I mention this in case the problem lies in some other package used by coreutils.

Also, and I haven't found any clue about if this matters anywhere, I have linux-headers-2.4.21-r1.
It just occurs to me that the headers may need to match the kernel. I know the .21 doesn't have to match but I wonder about the 2.4.
What happens if you reboot into 2.4.x and try it again?
I am unable to reproduce this here. I don't think it's a kernel issue, because I'm using 2.6-test11(ish). I also have 2.4.21-r1 headers, so I doubt that is the issue. What filesystems are you using? Mine are all reiser. Also tried this on my server with 2.4.23-aa1, and it worked as expected.

# du -shx /*
5.8M    /bin
7.6M    /boot
0       /dev
31M     /etc
4.9G    /home
57K     /include
156M    /lib
478K    /lost+found
250K    /mnt
435K    /no
396M    /opt
899M    /proc
2.2G    /root
7.5M    /sbin
512     /server
104G    /stuff
0       /sys
87M     /tmp
7.7G    /usr
438M    /var

# for i in /* ; do
> du -shc $i
> done
5.8M    /bin
5.8M    total
7.6M    /boot
7.6M    total
0       /dev
0       total
31M     /etc
31M     total
4.9G    /home
4.9G    total
57K     /include
57K     total
156M    /lib
156M    total
478K    /lost+found
478K    total
250K    /mnt
250K    total
435K    /no
435K    total
396M    /opt
396M    total
899M    /proc
899M    total
2.2G    /root
2.2G    total
7.5M    /sbin
7.5M    total
512     /server
512     total
104G    /stuff
104G    total
0       /sys
0       total
87M     /tmp
87M     total
13G     /usr
13G     total
438M    /var
438M    total
Easy bits first: partition is ext3 and when I rebooted into 2.4 to try again it was OK. Now, of course, that I'm back in 2.6 it's still fine. aargh. The only thing I'm sure of is that it isn't finger trouble as I put the commands in a quick script. Having tried everything I could think of which I may have done before, I finally spotted something different. When the problem showed, /proc gave an error - and because I used a script and only saved stdout I can't remember what it was. With hindsight it was an obvious suspect but it's too late now. If anyone has any hints what to look at if it ever happens again they will be much appreciated, otherwise I guess this can be closed as I can't repeat it.
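For the next time it shows up, here is a minimal sketch of a wrapper that keeps stderr as well as stdout, so an error like the one /proc gave isn't lost again. The function name and the log path in the usage comment are made up for illustration; it assumes a du that accepts -s, -h and -x, as the coreutils one does.

```shell
#!/bin/sh
# run_du_logged LOG PATH...   (hypothetical helper, not part of coreutils)
# Runs du -shx on the given paths and writes stdout AND stderr to LOG.
# The function returns du's own exit status, so a failure is not hidden.
run_du_logged() {
    log=$1
    shift
    # LC_ALL=C keeps error messages in English, which makes them easier to grep.
    LC_ALL=C du -shx "$@" >"$log" 2>&1
}

# Example use (the log path is an assumption):
#   run_du_logged /tmp/du.log /*
#   echo "du exit status: $?"
```

A non-zero exit status together with the logged message should be enough to tell which path du tripped over.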
Closing as TEST-REQUEST as we can't seem to reproduce this. Please reopen this bug by all means if you experience this again.
Guess what - it's back. Here is the info that seems useful to me:

# df -alh
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda6             2.0G  1.7G  140M  93% /
none                     0     0     0   -  /dev
none                     0     0     0   -  /proc
none                     0     0     0   -  /sys
none                     0     0     0   -  /dev/pts
/dev/hda7             6.6G  5.4G  825M  88% /home
tmpfs                 126M     0  126M   0% /dev/shm

# du -shx /*
<snip>
342M    /opt
du: `/proc': No such file or directory
258M    /root
<snip>

# du -shc /root
88K     /root
88K     total

# ls -ld /proc
dr-xr-xr-x   71 root     root            0 Dec 18 20:44 /proc

# ls /proc
1     3522  3928  5485  5561  apm          filesystems  meminfo     sysvipc
10    3554  3939  5487  5714  asound       fs           misc        tty
11    3707  3941  5489  5717  buddyinfo    ide          modules     uptime
123   3725  4     5491  5718  bus          interrupts   mounts      version
2     3726  4548  5493  5723  cmdline      iomem        mtrr        vmstat
2692  3727  4550  5497  5728  cpuinfo      ioports      net
2700  3728  4552  5513  5731  crypto       irq          partitions
3     3729  5     5518  5747  devices      kallsyms     self
3187  3730  5445  5519  6     diskstats    kcore        slabinfo
3248  3746  5454  5520  7     dma          kmsg         stat
3255  3747  5471  5527  8     driver       loadavg      swaps
3438  3748  5473  5558  9     execdomains  locks        sys

# du -shc /proc
du: `/proc': No such file or directory
258M    total

# cat /etc/mtab
/dev/hda6 / ext3 rw,noatime 0 0
none /dev devfs rw 0 0
none /proc proc rw 0 0
none /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
/dev/hda7 /home ext3 rw,noatime 0 0
tmpfs /dev/shm tmpfs rw 0 0

# du /proc
<snip>
68      /proc/5527
du: `/proc': No such file or directory

The earlier ls /proc showed that process 5558 was still there, so...

# ps uwwp 5558
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
george    5558  0.0  0.0     0    0 ?        Z    20:49   0:00 [netstat] <defunct>

# ls -l /proc/5558
ls: cannot read symbolic link /proc/5558/cwd: No such file or directory
ls: cannot read symbolic link /proc/5558/root: No such file or directory
ls: cannot read symbolic link /proc/5558/exe: No such file or directory
total 0
-r--------    1 root     root            0 Dec 18 21:48 auxv
-r--r--r--    1 root     root            0 Dec 18 21:34 cmdline
lrwxrwxrwx    1 root     root            0 Dec 18 21:48 cwd
-r--------    1 root     root            0 Dec 18 21:48 environ
lrwxrwxrwx    1 root     root            0 Dec 18 21:48 exe
dr-x------    2 root     root            0 Dec 18 21:48 fd
-r--r--r--    1 root     root            0 Dec 18 21:48 maps
-rw-------    1 root     root            0 Dec 18 21:48 mem
-r--r--r--    1 root     root            0 Dec 18 21:48 mounts
lrwxrwxrwx    1 root     root            0 Dec 18 21:48 root
-r--r--r--    1 root     root            0 Dec 18 21:34 stat
-r--r--r--    1 root     root            0 Dec 18 21:48 statm
-r--r--r--    1 root     root            0 Dec 18 21:34 status
dr-xr-xr-x    3 root     root            0 Dec 18 21:22 task
-r--r--r--    1 root     root            0 Dec 18 21:48 wchan

OK, to summarize what I think I've shown: du barfs on /proc/5558, as does ls, and this barf confuses the output from du so that /root apparently shows the wrong size, although it really seems to overwrite it with the best it can manage from /proc.

This looks like two bugs to me now:
1. du doesn't output correctly when it has a problem counting.
2. /proc is wrong for this process.

I don't know where to go next, so as this doesn't seem likely to cause me any lasting damage I'm happy to leave my laptop in this state to try more things.
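A rough sketch of a quicker way to list the entries ls complains about, without reading through the whole listing. The helper name is invented here; it relies only on the standard test operators -L (is a symlink) and -e (target can be resolved).

```shell
#!/bin/sh
# list_dangling DIR   (hypothetical helper)
# Prints every symlink directly inside DIR whose target cannot be resolved.
# For a zombie's /proc/<pid> directory this would be expected to show
# cwd, root and exe, matching the ls errors.
list_dangling() {
    for entry in "$1"/*; do
        # -L is true for any symlink; -e follows the link and fails
        # when the target is gone.
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
}

# Example use:
#   list_dangling /proc/5558
```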
Hmmm. Can I please have:

$> emerge strace
$> strace -o $some_file $command $command_arguments

[ Try "strace -o some_file du -shx /proc", for example... ]

Can you ** attach ** [ not paste! ] those to this bug please?
Created attachment 22429
Trace as requested

# strace -o du.trace du -shx /proc
du: `/proc': No such file or directory
#
Okay, can you try 2.6.0 and see if you still get this? If you do, try an older version of coreutils. If the problem persists, we'll need to file a coreutils bug.
Works fine here (also tested with 5.0):

nosferatu root # du -shx /*
5.7M    /bin
44K     /boot
96K     /dev
39M     /etc
11G     /home
28M     /lib
16K     /lost+found
520K    /mnt
106M    /opt
899M    /proc
129M    /root
5.4M    /sbin
43G     /space
0       /sys
14M     /tmp
4.7G    /usr
142M    /var
nosferatu root # du -shc /root/
129M    /root
129M    total
nosferatu root # uname -a
Linux nosferatu 2.6.0 #3 SMP Fri Dec 26 14:36:17 SAST 2003 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
nosferatu root #
I'm now running 2.6.0-gentoo-r1 and I put a check for zombies in cron.hourly. As soon as a zombie was spotted I got exactly the same symptoms as in comment #5. The zombie was even netstat again and apart from a different pid I see exactly the same. The problem is the same with coreutils 5.0.91-r2 and with 5.0-r5.
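For anyone who wants to try reproducing this, a check along the lines of the one in cron.hourly could look like the sketch below. This is not the actual script (it was never posted); the function names are hypothetical, and it assumes a procps-style ps that understands -o with the pid, ppid, stat and comm fields.

```shell
#!/bin/sh
# filter_zombies   (hypothetical helper)
# Reads "PID PPID STAT COMMAND" lines on stdin and prints only those
# whose STAT column starts with Z, i.e. defunct processes.
filter_zombies() {
    awk '$3 ~ /^Z/ { print }'
}

# list_zombies   (hypothetical helper)
# Assumes procps ps; the trailing "=" suppresses each column header.
list_zombies() {
    ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies
}

# Example use from a cron.hourly script: cron mails any output to the
# owner, so printing only when a zombie exists is enough.
#   zombies=$(list_zombies)
#   [ -n "$zombies" ] && printf 'zombies found:\n%s\n' "$zombies"
```

Keeping the filter separate from the ps call means it can be tried on saved ps output as well as on the live system.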
I have no idea why this happens on george's box.
Are there any updates to this one? I am also unable to reproduce it here.
Just tried again and I still see the same as comment #10. I haven't done any significant updates since then. If anyone has any ideas that they want me to check then let me know. Otherwise I'm due an update once I can make enough time to update and test a gash box - I can't afford to wreck this one and I'm too cowardly/sensible to try significant updates to gnome and OOo (at least) without testing first.
What kernel are you currently running and experiencing the problem on?
As I said in comment #10, 2.6.0-gentoo-r1 ;)
That kernel is _very_ old. Please upgrade it and see if it still happens on 2.6.5.
I've now updated to 2.6.5-gentoo-r1. No other updates so we stand a chance of finding this. Exactly the same within 4 hours of rebooting (check is cron.hourly so I can't be more precise). The zombie is still netstat and apart from the pid everything is the same as comment #5. I'm still nowhere near ready for the real upgrade to this laptop so I'm still happy to try updating specific packages or other tests.
george, does this happen with coreutils-5.2.0 as well? Please try that while I put 5.2.1 into portage.
Updated to coreutils-5.2.0-r2. No other changes, and no logout/login or reboot, to make sure I kept my zombie.

Output of du -shx /* is now more sensible ....

<snip>
342M    /opt
du: `/proc/7988/task': No such file or directory
du: `/proc/7988/fd': No such file or directory
260M    /proc
116K    /root
<snip>

ls also gives errors for the missing directories above, and also gives errors for the broken links which are still there...

ls -l /proc/7988
ls: cannot read symbolic link /proc/7988/cwd: No such file or directory
ls: cannot read symbolic link /proc/7988/root: No such file or directory
ls: cannot read symbolic link /proc/7988/exe: No such file or directory

I'm not sure if du should report these as errors, so I just offer the observation that they also don't show as errors even with du -a.

I'm also left with the question whether a zombie should leave broken links in /proc. Is this reasonable or broken?

fwiw, the parent of the zombie netstat process is galeon-bin and I'm running 1.3.12.

I'll add another comment if anything changes next time I logout/login or reboot.
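In case it helps anyone else track down which parent is failing to reap its child, here is a small sketch that reads the process name, state and parent pid straight from /proc/<pid>/status (the earlier listing shows that file is still present for a zombie). The helper name and the optional third argument are assumptions made for illustration; the field names Name, State and PPid are the ones the Linux status file uses.

```shell
#!/bin/sh
# proc_field PID FIELD [PROCROOT]   (hypothetical helper)
# Prints the value of FIELD (e.g. Name, State, PPid) from PROCROOT/PID/status.
# PROCROOT defaults to /proc; the override exists only so the function can
# be tried against a copy of the files.
proc_field() {
    awk -v key="$2:" '
        $1 == key {
            $1 = ""                  # drop the "Field:" label
            sub(/^[ \t]+/, "")       # and the separator left behind
            print
            exit
        }' "${3:-/proc}/$1/status"
}

# Example use (pid taken from the du errors above):
#   ppid=$(proc_field 7988 PPid)
#   echo "zombie $(proc_field 7988 Name) has parent $(proc_field "$ppid" Name)"
```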
I'm closing this due to the age of the bug, and the fact that it isn't a kernel issue.