If a process accesses a large number of files over an NFS-mounted filesystem (in my case via rsync), it can hang in "disk sleep" (D) state indefinitely. The process cannot be terminated with kill -9, the system reports a minimum load average of 1.00, and the NFS-mounted filesystem cannot be cleanly unmounted, so the system must be rebooted uncleanly.

Reproducible: Always

Steps to Reproduce:
1. Mount an NFS filesystem containing hundreds of thousands of files.
2. rsync-copy these files elsewhere.
3. The rsync process will deadlock until the system is rebooted (a rough command sketch follows at the end of this report).

Actual Results: the process is not killable with signal 9, the load average stays at 1.00, and the filesystem cannot be unmounted.

Kernel version is 2.6.11-gentoo-r8; this was not a problem in any of the gentoo-sources 2.4 kernels I ran.

NFS mount from /proc/$pid/mounts:
rover:/home/server/public /mnt/public nfs ro,v3,rsize=8192,wsize=32768,hard,udp,lock,addr=rover 0 0

Portage 2.0.51.19 (default-linux/x86/2005.0, gcc-3.3.5-20050130, glibc-2.3.4.20041102-r1, 2.6.11-gentoo-r8 i686)
=================================================================
System uname: 2.6.11-gentoo-r8 i686 Pentium III (Coppermine)
Gentoo Base System version 1.4.16
Python:              dev-lang/python-2.3.5, dev-lang/python-2.2.3-r5 [2.3.5 (#1, May 3 2005, 01:35:55)]
distcc 2.16 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
dev-lang/python:     2.3.5, 2.2.3-r5
sys-apps/sandbox:    [Not Present]
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.7.9-r1, 1.5, 1.9.5, 1.4_p6, 1.6.3, 1.8.5-r3
sys-devel/binutils:  2.15.92.0.2-r7
sys-devel/libtool:   1.5.16
virtual/os-headers:  2.4.19-r1, 2.6.8.1-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-O3 -mcpu=pentium3 -funroll-loops -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3/share/config /usr/lib/X11/xkb /usr/share/config /var/bind /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-O3 -mcpu=pentium3 -funroll-loops -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs autoconfig ccache distcc distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS="-j6"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 alsa apache2 apm arts avi berkdb bitmap-fonts crypt cups curl emboss encode foomaticdb fortran gd gif gpm imap imlib ipv6 jpeg libg++ libwww mad mbox mcal mikmod milter motif mp3 mpeg mysql ncurses nls ogg oggvorbis opengl oss pam pdflib perl png python quicktime readline sasl sdl session slang spell ssl svga tcltk tcpd tiff truetype truetype-fonts type1-fonts vorbis xml2 xmms xv zlib userland_GNU kernel_linux elibc_glibc"
Unset: ASFLAGS, CBUILD, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTDIR_OVERLAY
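For reference, a minimal sketch of the reproduction, assuming the mount options listed in /proc/$pid/mounts above; the destination path /backup/public and the rsync flags are illustrative placeholders, not specific to the bug:

  # mount the export read-only over NFSv3/UDP, hard mount (options as reported above)
  mount -t nfs -o ro,nfsvers=3,rsize=8192,wsize=32768,hard,udp,lock rover:/home/server/public /mnt/public

  # recursively copy the whole tree (hundreds of thousands of files) somewhere local
  rsync -a /mnt/public/ /backup/public/

  # once it hangs, rsync sits in uninterruptible sleep (STAT "D") and ignores SIGKILL
  ps -o pid,stat,wchan,cmd -C rsync
  kill -9 <pid-of-rsync>    # has no effect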
Is this reproducible in vanilla-sources-2.6.12_rc4?
It isn't reproducible in vanilla 2.6.11.7 (which I assume is the non-Gentoo version of the Gentoo kernel giving me this problem). Is this sufficient information, or would it really help to run a release candidate of the vanilla sources? I'm hesitant to do that because this is a production server in a DMZ.
Don't worry about testing 2.6.12-rc then. I'm thinking this might be related to bug 87403. I'll ask around to see if anyone has ideas what might be causing this.
I just reproduced the problem in vanilla 2.6.11.7. It is significantly harder to trigger in the vanilla kernel, but it is probably not Gentoo-specific after all.
Please try to reproduce this in vanilla-sources-2.6.12_rc4.
(In reply to comment #5)
> Please try to reproduce this in vanilla-sources-2.6.12_rc4.

SCSI performance in this kernel is atrocious. For starters, it won't properly negotiate full speed with my U160 devices. Too painful to run on a production server.
2.6.12 is out now. Any chance you could try and reproduce there?
Just installed it. I will have to let it run for a few days before I can say for certain. I will update this bug with what happens by the end of this week.
So far it is behaving itself!
Great, please reopen if it happens again.