Bug 36001 - coreutils: du problems with 2.6 kernel
Summary: coreutils: du problems with 2.6 kernel
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All All
Importance: High major
Assignee: x86-kernel@gentoo.org (DEPRECATED)
Reported: 2003-12-17 08:31 UTC by george
Modified: 2004-06-21 13:07 UTC

Attachments
Trace as requested (du.trace,330.40 KB, text/plain)
2003-12-19 00:47 UTC, george

Description george 2003-12-17 08:31:20 UTC
I am getting some strange results using du, which based on testing on other boxen, I take to be related to running a 2.6 kernel.

What I see:
# du -shx /*
5.4M    /bin
4.0K    /boot
0       /dev
28M     /etc
3.5G    /home
9.6M    /lib
4.0K    /lost+found
352K    /mnt
342M    /opt
258M    /root
3.2M    /sbin
0       /sys
296K    /tmp
1.2G    /usr
98M     /var
# du -shc /root
92K     /root
92K     total

The problem is that the figure for /root (and others) differs between the two commands: 258M from du -shx /* but 92K from du -shc /root.
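
A minimal sketch of how the comparison could be automated, assuming a POSIX shell with du and awk available. The function name compare_du is hypothetical. It runs du once over all arguments and once per argument, then reports any directory whose two figures differ. Note that du counts a hard-linked file only once per invocation, so directories sharing hard links can legitimately differ between the two runs.

```shell
#!/bin/sh
# compare_du (hypothetical helper): report directories whose size in a
# single batch run of du differs from the size when du is run on them alone.
compare_du() {
    # One run over every argument; sizes in 1K blocks, one filesystem only.
    batch=$(du -skx "$@" 2>/dev/null)
    for d in "$@"; do
        # Size when the directory is measured on its own.
        single=$(du -skx "$d" 2>/dev/null | cut -f1)
        # Size reported for the same path in the batch run.
        batched=$(printf '%s\n' "$batch" |
            awk -F'\t' -v d="$d" '$2 == d { print $1 }')
        [ "$single" = "$batched" ] ||
            echo "MISMATCH $d: batch=$batched single=$single"
    done
}

# Example (not executed here): compare_du /bin /etc /root
```

No output means both runs agree for every directory given.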

I get the same results with coreutils-5.0.91-r2 and 5.0-r5.

# emerge --info
Portage 2.0.49-r18 (default-x86-1.4, gcc-3.3.2, glibc-2.3.2-r9, 2.6.0-test11-gentoo-r2)
=================================================================
System uname: 2.6.0-test11-gentoo-r2 i686 Pentium III (Coppermine)
Gentoo Base System version 1.4.3.12
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CFLAGS="-march=i686 -Os -fno-strict-aliasing -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/usr/X11R6/lib/X11/xkb"
CONFIG_PROTECT_MASK="/etc/gconf /etc/env.d"
CXXFLAGS="-march=i686 -Os -fno-strict-aliasing -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache sandbox"
GENTOO_MIRRORS="http://gentoo.oregonstate.edu http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j1"
PKGDIR="/usr/portage/packages/Hosts/stan"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="X alsa apm avi bonobo crypt cups dvd encode gif gnome gtk gtk2 java jpeg mad mmx mpeg nas oggvorbis oss pam pcmcia pdflib png quicktime readline samba sdl spell sse ssl tiff truetype wmf x86 xv zlib"

As I build on a different box in a chroot, I suspected this first (especially as my build box runs 2.4). However, I also re-emerged coreutils directly on my laptop and it made no difference. I mention this in case the problem lies in some other package used by coreutils.

Also, and I haven't found any indication of whether this matters, I have linux-headers-2.4.21-r1. It just occurs to me that the headers may need to match the kernel. I know the .21 doesn't have to match but I wonder about the 2.4.
Comment 1 SpanKY gentoo-dev 2003-12-18 08:39:50 UTC
What if you reboot into 2.4.x and try it again?
Comment 2 Brian Jackson (RETIRED) gentoo-dev 2003-12-18 08:45:32 UTC
I am unable to reproduce this here. I don't think it's a kernel issue, because
I'm using 2.6-test11(ish). I also have 2.4.21-r1 headers, so I doubt that is
the issue. What filesystems are you using? Mine are all reiser. Also tried
this on my server with 2.4.23-aa1, and it worked as expected.

# du -shx /*
5.8M    /bin
7.6M    /boot
0       /dev
31M     /etc
4.9G    /home
57K     /include
156M    /lib
478K    /lost+found
250K    /mnt
435K    /no
396M    /opt
899M    /proc
2.2G    /root
7.5M    /sbin
512     /server
104G    /stuff
0       /sys
87M     /tmp
7.7G    /usr
438M    /var


# for i in /* ; do
> du -shc $i
> done
5.8M    /bin
5.8M    total
7.6M    /boot
7.6M    total
0       /dev
0       total
31M     /etc
31M     total
4.9G    /home
4.9G    total
57K     /include
57K     total
156M    /lib
156M    total
478K    /lost+found
478K    total
250K    /mnt
250K    total
435K    /no
435K    total
396M    /opt
396M    total
899M    /proc
899M    total
2.2G    /root
2.2G    total
7.5M    /sbin
7.5M    total
512     /server
512     total
104G    /stuff
104G    total
0       /sys
0       total
87M     /tmp
87M     total
13G     /usr
13G     total
438M    /var
438M    total
Comment 3 george 2003-12-18 12:30:16 UTC
Easy bits first:  partition is ext3 and when I rebooted into 2.4 to try again it was OK.

Now, of course, that I'm back in 2.6 it's still fine.  aargh.  The only thing I'm sure of is that it isn't finger trouble as I put the commands in a quick script.  Having tried everything I could think of which I may have done before, I finally spotted something different.

When the problem showed, /proc gave an error - and because I used a script and only saved stdout I can't remember what it was.  With hindsight it was an obvious suspect but it's too late now.

If anyone has any hints what to look at if it ever happens again they will be much appreciated, otherwise I guess this can be closed as I can't repeat it.
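
One way to avoid losing the error text next time is to save stderr alongside stdout. The following is only a sketch: the function name run_logged and the log paths are made up for illustration.

```shell
#!/bin/sh
# run_logged (hypothetical helper): run a command, writing its stdout to
# the first file and its stderr to the second, so error messages survive.
run_logged() {
    out=$1
    err=$2
    shift 2
    "$@" >"$out" 2>"$err"
}

# Example (paths are placeholders):
#   run_logged /tmp/du.out /tmp/du.err du -shx /proc
```

The function returns the exit status of the command it ran, so a script can also test whether du reported a failure.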
Comment 4 Tim Yamin (RETIRED) gentoo-dev 2003-12-18 12:52:24 UTC
Closing as TEST-REQUEST as we can't seem to reproduce this. Please reopen this bug by all means if you experience this again.
Comment 5 george 2003-12-18 13:56:36 UTC
Guess what - it's back.  Here is the info that seems useful to me:

# df -alh
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda6             2.0G  1.7G  140M  93% /
none                     0     0     0   -  /dev
none                     0     0     0   -  /proc
none                     0     0     0   -  /sys
none                     0     0     0   -  /dev/pts
/dev/hda7             6.6G  5.4G  825M  88% /home
tmpfs                 126M     0  126M   0% /dev/shm
# du -shx /*
<snip>
342M    /opt
du: `/proc': No such file or directory
258M    /root
<snip>
# du -shc /root
88K     /root
88K     total
# ls -ld /proc
dr-xr-xr-x   71 root     root            0 Dec 18 20:44 /proc
# ls /proc
1     3522  3928  5485  5561  apm          filesystems  meminfo     sysvipc
10    3554  3939  5487  5714  asound       fs           misc        tty
11    3707  3941  5489  5717  buddyinfo    ide          modules     uptime
123   3725  4     5491  5718  bus          interrupts   mounts      version
2     3726  4548  5493  5723  cmdline      iomem        mtrr        vmstat
2692  3727  4550  5497  5728  cpuinfo      ioports      net
2700  3728  4552  5513  5731  crypto       irq          partitions
3     3729  5     5518  5747  devices      kallsyms     self
3187  3730  5445  5519  6     diskstats    kcore        slabinfo
3248  3746  5454  5520  7     dma          kmsg         stat
3255  3747  5471  5527  8     driver       loadavg      swaps
3438  3748  5473  5558  9     execdomains  locks        sys
# du -shc /proc
du: `/proc': No such file or directory
258M    total
# cat /etc/mtab
/dev/hda6 / ext3 rw,noatime 0 0
none /dev devfs rw 0 0
none /proc proc rw 0 0
none /sys sysfs rw 0 0
none /dev/pts devpts rw 0 0
/dev/hda7 /home ext3 rw,noatime 0 0
tmpfs /dev/shm tmpfs rw 0 0
# du /proc
<snip>
68      /proc/5527
du: `/proc': No such file or directory

ls /proc showed that process 5558, the entry after 5527, was still there as in the earlier ls, so...

# ps uwwp 5558
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
george    5558  0.0  0.0     0    0 ?        Z    20:49   0:00 [netstat] <defunct>
# ls -l /proc/5558
ls: cannot read symbolic link /proc/5558/cwd: No such file or directory
ls: cannot read symbolic link /proc/5558/root: No such file or directory
ls: cannot read symbolic link /proc/5558/exe: No such file or directory
total 0
-r--------    1 root     root            0 Dec 18 21:48 auxv
-r--r--r--    1 root     root            0 Dec 18 21:34 cmdline
lrwxrwxrwx    1 root     root            0 Dec 18 21:48 cwd
-r--------    1 root     root            0 Dec 18 21:48 environ
lrwxrwxrwx    1 root     root            0 Dec 18 21:48 exe
dr-x------    2 root     root            0 Dec 18 21:48 fd
-r--r--r--    1 root     root            0 Dec 18 21:48 maps
-rw-------    1 root     root            0 Dec 18 21:48 mem
-r--r--r--    1 root     root            0 Dec 18 21:48 mounts
lrwxrwxrwx    1 root     root            0 Dec 18 21:48 root
-r--r--r--    1 root     root            0 Dec 18 21:34 stat
-r--r--r--    1 root     root            0 Dec 18 21:48 statm
-r--r--r--    1 root     root            0 Dec 18 21:34 status
dr-xr-xr-x    3 root     root            0 Dec 18 21:22 task
-r--r--r--    1 root     root            0 Dec 18 21:48 wchan

OK, to summarize what I think I've shown: du barfs on /proc/5558, as does ls, and this failure confuses the output from du. /root appears to have the wrong size, but the 258M shown is really the partial total du managed to count in /proc (compare the 258M total from du -shc /proc above).

This looks like two bugs to me now.
1. du doesn't output correctly when it has a problem counting
2. /proc is wrong for this process.

I don't know where to go next. As this doesn't seem likely to cause me any lasting damage, I'm happy to leave my laptop in this state to try more things.
Comment 6 Tim Yamin (RETIRED) gentoo-dev 2003-12-18 14:18:24 UTC
Hmmm. Can I please have:

$> emerge strace
$> strace -o $some_file $command $command_arguments

[ Try "strace -o some_file du -shx /proc", for example... ]

Can you ** attach ** [ not paste! ] those to this bug please?
Comment 7 george 2003-12-19 00:47:32 UTC
Created attachment 22429
Trace as requested

# strace -o du.trace du -shx /proc
du: `/proc': No such file or directory
#
Comment 8 Tim Yamin (RETIRED) gentoo-dev 2003-12-28 09:13:15 UTC
Okay, can you try 2.6.0 and see if you still get this - if you do, try an older version of coreutils - if the problem persists, we'll need to file a coreutils bug.
Comment 9 Martin Schlemmer (RETIRED) gentoo-dev 2003-12-28 10:29:56 UTC
Works fine here (also tested with 5.0):

--
nosferatu root # du -shx /*
5.7M    /bin
44K     /boot
96K     /dev
39M     /etc
11G     /home
28M     /lib
16K     /lost+found
520K    /mnt
106M    /opt
899M    /proc
129M    /root
5.4M    /sbin
43G     /space
0       /sys
14M     /tmp
4.7G    /usr
142M    /var
nosferatu root # du -shc /root/
129M    /root
129M    total
nosferatu root # uname -a
Linux nosferatu 2.6.0 #3 SMP Fri Dec 26 14:36:17 SAST 2003 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
nosferatu root #
Comment 10 george 2004-01-01 07:40:30 UTC
I'm now running 2.6.0-gentoo-r1 and I put a check for zombies in cron.hourly.
As soon as a zombie was spotted I got exactly the same symptoms as in comment #5.
The zombie was even netstat again and apart from a different pid I see exactly the same.

The problem is the same with coreutils 5.0.91-r2 and with 5.0-r5.
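
The cron.hourly check itself is not shown in this report. A minimal sketch of what such a check might look like follows, assuming a procps-style ps that accepts the pid, ppid, stat and comm output columns. Both function names are hypothetical.

```shell
#!/bin/sh
# filter_zombies (hypothetical): read lines of "pid ppid stat comm" on
# stdin and print "pid ppid comm" for every process whose state starts
# with Z (zombie).
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# list_zombies (hypothetical): apply the filter to the running system.
# The trailing "=" on each column name suppresses the header line.
list_zombies() {
    ps -eo pid=,ppid=,stat=,comm= | filter_zombies
}

# Example for a cron script: only produce output when a zombie exists.
#   z=$(list_zombies); [ -z "$z" ] || echo "zombies found: $z"
```

Splitting the filter from the ps call keeps the matching logic easy to check against a fixed sample of ps output.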
Comment 11 Seemant Kulleen (RETIRED) gentoo-dev 2004-02-21 16:24:37 UTC
I have no idea why this happens on george's box.
Comment 12 Jason Cox (RETIRED) gentoo-dev 2004-04-09 09:37:53 UTC
Are there any updates to this one? I am also unable to reproduce it here.
Comment 13 george 2004-04-13 04:00:10 UTC
Just tried again and I still see the same as comment #10.
I haven't done any significant updates since then.

If anyone has any ideas that they want me to check then let 
me know.  Otherwise I'm due an update once I can make enough
time to update and test a gash box - I can't afford to wreck
this one and I'm too cowardly/sensible to try significant updates
to gnome and OOo (at least) without testing first.
Comment 14 Jason Cox (RETIRED) gentoo-dev 2004-04-13 23:11:46 UTC
What kernel are you currently running and experiencing the problem on?
Comment 15 george 2004-04-14 12:13:42 UTC
As I said in comment #10, 2.6.0-gentoo-r1 ;)
Comment 16 Greg Kroah-Hartman (RETIRED) gentoo-dev 2004-04-16 17:02:12 UTC
That kernel is _very_ old, please upgrade it and see if it still happens on 2.6.5
Comment 17 george 2004-04-19 01:43:48 UTC
I've now updated to 2.6.5-gentoo-r1.  No other updates so we stand a chance of finding this.

Exactly the same within 4 hours of rebooting (check is cron.hourly so I can't be more precise).  The zombie is still netstat and apart from the pid everything is the same as comment #5.

I'm still nowhere near ready for the real upgrade to this laptop so I'm still happy to try updating specific packages or other tests.
Comment 18 Seemant Kulleen (RETIRED) gentoo-dev 2004-05-16 09:33:56 UTC
george, does this happen with coreutils-5.2.0 as well? Please try that while I put 5.2.1 into portage.
Comment 19 george 2004-05-18 01:26:50 UTC
Updated to coreutils-5.2.0-r2.  No other changes, and no logout/login or reboot, to make sure I kept my zombie.

Output of du -shx /* is now more sensible ....
<snip>
342M    /opt
du: `/proc/7988/task': No such file or directory
du: `/proc/7988/fd': No such file or directory
260M    /proc
116K    /root
<snip>

ls also gives errors for the missing directories above, and also gives errors for the broken links which are still there...
ls -l /proc/7988
ls: cannot read symbolic link /proc/7988/cwd: No such file or directory
ls: cannot read symbolic link /proc/7988/root: No such file or directory
ls: cannot read symbolic link /proc/7988/exe: No such file or directory

I'm not sure whether du should report the broken links as errors, so I just offer the observation that they don't show as errors even with du -a.

I'm also left with the question whether a zombie should leave broken links in /proc.  Is this reasonable or broken?

fwiw, the parent of the zombie netstat process is galeon-bin and I'm running 1.3.12.
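
For anyone wanting to trace a zombie back to its parent, the status file in the process's /proc directory (visible in the ls -l listing in comment #5) carries a PPid: line. A small sketch, with a hypothetical function name, that extracts it:

```shell
#!/bin/sh
# ppid_from_status (hypothetical helper): print the parent PID recorded
# in a /proc/<pid>/status-style file given as the first argument.
ppid_from_status() {
    awk '$1 == "PPid:" { print $2 }' "$1"
}

# Example against a live system (the PID is taken from the du errors above):
#   ppid=$(ppid_from_status /proc/7988/status)
#   tr '\000' ' ' < "/proc/$ppid/cmdline"
```

Taking the file path as an argument, rather than a PID, lets the parsing be tried on a saved copy of the status file.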

I'll add another comment if anything changes next time I logout/login or reboot.
Comment 20 Greg Kroah-Hartman (RETIRED) gentoo-dev 2004-06-21 13:07:13 UTC
I'm closing this due to the age of the bug, and the fact that it isn't a kernel
issue.