Summary: | coreutils: du problems with 2.6 kernel | | |
---|---|---|---|
Product: | Gentoo Linux | Reporter: | george <gk> |
Component: | [OLD] Core system | Assignee: | x86-kernel (DEPRECATED) <x86-kernel> |
Status: | RESOLVED INVALID | | |
Severity: | major | CC: | base-system, steel300, vapier |
Priority: | High | | |
Version: | unspecified | | |
Hardware: | All | | |
OS: | All | | |
Whiteboard: | | | |
Package list: | | Runtime testing required: | --- |
Attachments: | Trace as requested | | |
Description
george
2003-12-17 08:31:20 UTC
What if you reboot into 2.4.x and try it again?

I am unable to reproduce this here. I don't think it's a kernel issue, because
I'm using 2.6-test11(ish). I also have 2.4.21-r1 headers, so I doubt that is
the issue. What filesystems are you using? Mine are all reiser. I also tried
this on my server with 2.4.23-aa1, and it worked as expected.
# du -shx /*
5.8M /bin
7.6M /boot
0 /dev
31M /etc
4.9G /home
57K /include
156M /lib
478K /lost+found
250K /mnt
435K /no
396M /opt
899M /proc
2.2G /root
7.5M /sbin
512 /server
104G /stuff
0 /sys
87M /tmp
7.7G /usr
438M /var
# for i in /* ; do
> du -shc $i
> done
5.8M /bin
5.8M total
7.6M /boot
7.6M total
0 /dev
0 total
31M /etc
31M total
4.9G /home
4.9G total
57K /include
57K total
156M /lib
156M total
478K /lost+found
478K total
250K /mnt
250K total
435K /no
435K total
396M /opt
396M total
899M /proc
899M total
2.2G /root
2.2G total
7.5M /sbin
7.5M total
512 /server
512 total
104G /stuff
104G total
0 /sys
0 total
87M /tmp
87M total
13G /usr
13G total
438M /var
438M total

Easy bits first: partition is ext3 and when I rebooted into 2.4 to try again it was OK. Now, of course, that I'm back in 2.6 it's still fine. aargh. The only thing I'm sure of is that it isn't finger trouble as I put the commands in a quick script. Having tried everything I could think of which I may have done before, I finally spotted something different. When the problem showed, /proc gave an error - and because I used a script and only saved stdout I can't remember what it was. With hindsight it was an obvious suspect but it's too late now. If anyone has any hints what to look at if it ever happens again they will be much appreciated, otherwise I guess this can be closed as I can't repeat it.

Closing as TEST-REQUEST as we can't seem to reproduce this. Please reopen this bug by all means if you experience this again.

Guess what - it's back. Here is the info that seems useful to me:
# df -alh
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hda6   2.0G  1.7G  140M   93%   /
none        0     0     0      -     /dev
none        0     0     0      -     /proc
none        0     0     0      -     /sys
none        0     0     0      -     /dev/pts
/dev/hda7   6.6G  5.4G  825M   88%   /home
tmpfs       126M  0     126M   0%    /dev/shm
# du -shx /*
<snip>
342M /opt
du: `/proc': No such file or directory
258M /root
<snip>
# du -shc /root
88K /root
88K total
# ls -ld /proc
dr-xr-xr-x 71 root root 0 Dec 18 20:44 /proc
# ls /proc
1     3522  3928  5485  5561  apm          filesystems  meminfo     sysvipc
10    3554  3939  5487  5714  asound       fs           misc        tty
11    3707  3941  5489  5717  buddyinfo    ide          modules     uptime
123   3725  4     5491  5718  bus          interrupts   mounts      version
2     3726  4548  5493  5723  cmdline      iomem        mtrr        vmstat
2692  3727  4550  5497  5728  cpuinfo      ioports      net
2700  3728  4552  5513  5731  crypto       irq          partitions
3     3729  5     5518  5747  devices      kallsyms     self
3187  3730  5445  5519  6     diskstats    kcore        slabinfo
3248  3746  5454  5520  7     dma          kmsg         stat
3255  3747  5471  5527  8     driver       loadavg      swaps
3438  3748  5473  5558  9     execdomains  locks        sys
# du -shc /proc
du: `/proc': No such file or directory
258M total
# cat /etc/mtab
/dev/hda6 / ext3 rw,noatime 0 0
none /dev devfs rw 0 0
none /proc proc rw 0 0
none /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
/dev/hda7 /home ext3 rw,noatime 0 0
tmpfs /dev/shm tmpfs rw 0 0
# du /proc
<snip>
68 /proc/5527
du: `/proc': No such file or directory

ls /proc showed that process 5558 was still there as earlier ls so...
# ps uwwp 5558
USER    PID   %CPU  %MEM  VSZ  RSS  TTY  STAT  START  TIME  COMMAND
george  5558  0.0   0.0   0    0    ?    Z     20:49  0:00  [netstat] <defunct>
# ls -l /proc/5558
ls: cannot read symbolic link /proc/5558/cwd: No such file or directory
ls: cannot read symbolic link /proc/5558/root: No such file or directory
ls: cannot read symbolic link /proc/5558/exe: No such file or directory
total 0
-r-------- 1 root root 0 Dec 18 21:48 auxv
-r--r--r-- 1 root root 0 Dec 18 21:34 cmdline
lrwxrwxrwx 1 root root 0 Dec 18 21:48 cwd
-r-------- 1 root root 0 Dec 18 21:48 environ
lrwxrwxrwx 1 root root 0 Dec 18 21:48 exe
dr-x------ 2 root root 0 Dec 18 21:48 fd
-r--r--r-- 1 root root 0 Dec 18 21:48 maps
-rw------- 1 root root 0 Dec 18 21:48 mem
-r--r--r-- 1 root root 0 Dec 18 21:48 mounts
lrwxrwxrwx 1 root root 0 Dec 18 21:48 root
-r--r--r-- 1 root root 0 Dec 18 21:34 stat
-r--r--r-- 1 root root 0 Dec 18 21:48 statm
-r--r--r-- 1 root root 0 Dec 18 21:34 status
dr-xr-xr-x 3 root root 0 Dec 18 21:22 task
-r--r--r-- 1 root root 0 Dec 18 21:48 wchan

OK to summarize what I think I've shown: du barfs on /proc/5558 as does ls and this barf confuses the output from du to apparently show /root as the wrong size although it really seems to overwrite it with the best it can manage from /proc

This looks like two bugs to me now.
1. du doesn't output correctly when it has a problem counting
2. /proc is wrong for this process.

I don't know where to go next so as this doesn't seem likely to cause me any lasting damage I'm happy to leave my laptop in this state to try more things.

Hmmm. Can I please have:
$> emerge strace
$> strace -o $some_file $command $command_arguments
[ Try "strace -o some_file du -shx /proc", for example... ]
Can you ** attach ** [ not paste! ] those to this bug please?

Created attachment 22429 [details]
Trace as requested
# strace -o du.trace du -shx /proc
du: `/proc': No such file or directory
#
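
The first occurrence in this report could not be diagnosed because only stdout had been saved. As a minimal sketch (not something the reporter actually ran), a small wrapper could keep du's stdout, stderr, and exit status in separate files for each path. The function name `du_logged`, the `LOGDIR` variable, and the file naming scheme are assumptions made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: run du on one path and keep stdout, stderr and the
# exit status in separate files. LOGDIR and the file names are assumptions.
LOGDIR="${LOGDIR:-/tmp/du-debug}"

du_logged() {
    target="$1"
    # Build a file-name-safe label: "/proc" -> "proc", "/var/tmp" -> "var_tmp".
    name=$(printf '%s' "$target" | sed -e 's|^/*||' -e 's|/|_|g')
    # "/" itself would give an empty label, so fall back to a fixed one.
    [ -n "$name" ] || name=rootfs
    mkdir -p "$LOGDIR" || return 1
    # Same du options as used in the report; stderr goes to its own file.
    du -shx "$target" >"$LOGDIR/$name.out" 2>"$LOGDIR/$name.err"
    rc=$?
    printf '%s\n' "$rc" >"$LOGDIR/$name.status"
    return "$rc"
}
```

It could be called in the same loop used earlier in the report, for example `for i in /* ; do du_logged "$i" ; done`, after which any non-empty `.err` file points at the directory du complained about.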

Okay, can you try 2.6.0 and see if you still get this? If you do, try an older version of coreutils. If the problem persists, we'll need to file a coreutils bug.

Works fine here (also tested with 5.0):
nosferatu root # du -shx /*
5.7M /bin
44K /boot
96K /dev
39M /etc
11G /home
28M /lib
16K /lost+found
520K /mnt
106M /opt
899M /proc
129M /root
5.4M /sbin
43G /space
0 /sys
14M /tmp
4.7G /usr
142M /var
nosferatu root # du -shc /root/
129M /root
129M total
nosferatu root # uname -a
Linux nosferatu 2.6.0 #3 SMP Fri Dec 26 14:36:17 SAST 2003 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
nosferatu root #

I'm now running 2.6.0-gentoo-r1 and I put a check for zombies in cron.hourly. As soon as a zombie was spotted I got exactly the same symptoms as in comment #5. The zombie was even netstat again, and apart from a different pid I see exactly the same. The problem is the same with coreutils 5.0.91-r2 and with 5.0-r5.

I have no idea why this happens on george's box.

Are there any updates to this one? I am also unable to reproduce it here.

Just tried again and I still see the same as comment #10. I haven't done any significant updates since then. If anyone has any ideas that they want me to check then let me know. Otherwise I'm due an update once I can make enough time to update and test a gash box - I can't afford to wreck this one and I'm too cowardly/sensible to try significant updates to gnome and OOo (at least) without testing first.

What kernel are you currently running and experiencing the problem on?

As I said in comment #10, 2.6.0-gentoo-r1 ;)

That kernel is _very_ old, please upgrade it and see if it still happens on 2.6.5.

I've now updated to 2.6.5-gentoo-r1. No other updates, so we stand a chance of finding this. Exactly the same within 4 hours of rebooting (the check is in cron.hourly so I can't be more precise). The zombie is still netstat, and apart from the pid everything is the same as comment #5. I'm still nowhere near ready for the real upgrade to this laptop, so I'm still happy to try updating specific packages or other tests.

george, does this happen with coreutils-5.2.0 as well? Please try that, while I put 5.2.1 into portage.

Updated to coreutils-5.2.0-r2. No other changes, no logout/login, or reboot, to make sure I kept my zombie. Output of du -shx /* is now more sensible ....
<snip>
342M /opt
du: `/proc/7988/task': No such file or directory
du: `/proc/7988/fd': No such file or directory
260M /proc
116K /root
<snip>

ls also gives errors for the missing directories above, and also gives errors for the broken links which are still there...
ls -l /proc/7988
ls: cannot read symbolic link /proc/7988/cwd: No such file or directory
ls: cannot read symbolic link /proc/7988/root: No such file or directory
ls: cannot read symbolic link /proc/7988/exe: No such file or directory

I'm not sure if du should report these as errors, so I just offer the observation that they also don't show as errors even with du -a. I'm also left with the question of whether a zombie should leave broken links in /proc. Is this reasonable or broken? fwiw, the parent of the zombie netstat process is galeon-bin and I'm running 1.3.12. I'll add another comment if anything changes next time I logout/login or reboot.

I'm closing this due to oldness of bug, and the fact that it isn't a kernel issue.
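
The hourly zombie check mentioned in the comments is not shown anywhere in this report. The following is only a rough sketch of what such a check might look like, assuming a procps-style `ps` and a Linux /proc; the function names `filter_zombies`, `list_zombies`, and `check_proc_links` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of a zombie check; not the reporter's actual script.

# filter_zombies: read "PID PPID STAT COMMAND" lines on stdin and print
# "PID PPID COMMAND" for every process whose state starts with Z.
# Only the first word of the command name is kept.
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# list_zombies: feed the current process table through the filter.
# Assumes procps ps; the trailing "=" suppresses the column headers.
list_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | filter_zombies
}

# check_proc_links: report which of the cwd, root and exe symlinks under
# /proc/PID cannot be read, as seen for the defunct netstat in this report.
check_proc_links() {
    pid="$1"
    for link in cwd root exe; do
        readlink "/proc/$pid/$link" >/dev/null 2>&1 ||
            echo "unreadable: /proc/$pid/$link"
    done
}
```

A script dropped into cron.hourly could call `list_zombies` and, if the output is non-empty, run `check_proc_links` on each reported PID. Printing the PPID alongside each zombie makes it easier to spot a parent that is not reaping its children, which in this report was galeon-bin.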