Bug 114817 - NFS client returns input/output error when reading certain directories
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2005-12-07 16:46 UTC by Gabe Martin-Dempesy
Modified: 2005-12-08 09:19 UTC

Description Gabe Martin-Dempesy 2005-12-07 16:46:56 UTC
Gentoo is acting as an NFS client to an OSX server with the following fstab entry:

10.10.10.10:/Volumes/Sack /sack nfs  ro,rsize=1024,hard,intr,tcp,nosuid       0 0

For certain directories, 'ls' fails with an input/output error instead of listing the contents:
/sack/HCMG/_templates/Images/arthroscopy $ ls
ls: reading directory .: Input/output error

An strace (included below) shows getdents64() returning EIO.
A tcpdump taken during the ls shows that all of the directory information is in fact being transferred 
across the network.

This error does not happen on other, non-Gentoo clients, including Debian and OSX.

The environment in which this occurs is a fresh 2005.1 installation with the / directory chmod fixed, 
the pie+ssp toolchain installed, the system rebuilt, and then everything updated to current.  
It occurs on both hardened and non-hardened 2.6 kernels.  2.4 could not be tested because it lacks the 
driver for my AACRAID card.  It occurs with the NFS version on the server set to v2 or v3, and with the 
client-side rsize set anywhere between 1k and 32k.

Sometimes this bug can be temporarily fixed by creating a directory on the server and then deleting it. 
However, the EIO error returns shortly afterwards.  The event that causes it to return has not been 
identified.

No related errors appear in any kernel or system log.

Reproducible: Sometimes
Steps to Reproduce:
1. /sack/HCMG/_templates $ find . 1>/dev/null

Actual Results:  
find: ./Images/arthroscopy: Input/output error
find: ./Images/Swarm/jpegs: Input/output error
find: ./Images/Swarm/tiffs: Input/output error
find: ./print/Final Copy: Input/output error
find: ./print/Hi-Res Art/print: Input/output error
find: ./print/Hi-Res Art/web: Input/output error
find: ./print/Logo Options: Input/output error
find: ./print/Option DA w:copy: Input/output error
find: ./print/Option DB w:copy: Input/output error
find: ./print/OrderForm/images: Input/output error
find: ./print/~Art/hip/hip illustr: Input/output error
find: ./print/~Art/knee/knee illustration: Input/output error
(etc)
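
A minimal sketch for listing only the directories that fail, assuming a POSIX shell with find and ls available. The function name list_unreadable_dirs is hypothetical. It reports any directory whose listing fails, whatever the errno, so on its own it does not prove that the failure is EIO:

```shell
# Print every directory under a root whose contents cannot be listed.
# list_unreadable_dirs is a hypothetical helper name, not part of any tool.
# Directory names containing newlines are not handled.
list_unreadable_dirs() {
    root=${1:-.}
    # find still prints a directory it cannot descend into; its own
    # error messages are discarded so only our output remains.
    find "$root" -type d 2>/dev/null | while IFS= read -r dir; do
        # ls -A has to read the whole directory, so it fails if the read fails.
        if ! ls -A "$dir" >/dev/null 2>&1; then
            printf '%s\n' "$dir"
        fi
    done
}
```

Usage, with the path from this report: list_unreadable_dirs /sack/HCMG/_templates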


emerge info:
Portage 2.0.51.22-r3 (hardened/x86/2.6, gcc-3.3.6, glibc-2.3.5-r2, 2.6.14-gentoo-r2 i686)
=================================================================
System uname: 2.6.14-gentoo-r2 i686 Intel(R) Xeon(TM) CPU 3.00GHz
Gentoo Base System version 1.6.13
dev-lang/python:     2.3.5, 2.4.2
sys-apps/sandbox:    1.2.12
sys-devel/autoconf:  2.13, 2.59-r6
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r1
sys-devel/binutils:  2.16.1
sys-devel/libtool:   1.5.20
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-march=pentium4 -pipe -O2"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/2/share/config /usr/kde/3/share/config /usr/share/config /var/bind /var/qmail/alias /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=pentium4 -pipe -O2"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.osuosl.org/ http://distro.ibiblio.org/pub/linux/distributions/gentoo/ http://ftp.ucsb.edu/pub/mirrors/linux/gentoo/ http://gentoo.chem.wisc.edu/gentoo/"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.us.gentoo.org/gentoo-portage"
USE="apache2 bzip2 crypt curl dlloader expat gdbm gpm hardened hardenedphp imagemagick jpeg libwww mhash mpm-prefork mysql ncurses nls nptl pam pcre perl pic png python readline sendfile sftplogging ssl symlink tcpd udev userlocales utf8 vchroot x86 zlib userland_GNU kernel_linux elibc_glibc"
Unset:  ASFLAGS, CTARGET, LANG, LC_ALL, LDFLAGS, LINGUAS


strace:
open(".", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 3
fstat64(3, {st_mode=S_IFDIR|0775, st_size=2380, ...}) = 0
fcntl64(3, F_SETFD, FD_CLOEXEC)         = 0
getdents64(3, 0x80019e1c, 32768)        = -1 EIO (Input/output error)
close(3)                                = 0
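
In a longer trace, the failing calls can be picked out mechanically. A small sketch, assuming the trace was saved to a single file (for example with strace's -o option) or is piped in on standard input. The name eio_syscalls is a hypothetical helper:

```shell
# Print the name of every syscall in an strace log that returned EIO.
# Reads the named file, or standard input when no file is given.
eio_syscalls() {
    # Keep only lines whose result is "-1 EIO", then cut everything
    # from the opening parenthesis onward to leave the syscall name.
    grep -e '= -1 EIO' -- "$@" | sed 's/(.*//'
}
```

Run against the trace above, this should leave only the getdents64 line.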
Comment 1 Gabe Martin-Dempesy 2005-12-07 16:49:10 UTC
This occurs with both the linux-2.6.14-gentoo-r2 and linux-2.6.14-hardened-r1 kernels.
Comment 2 Daniel Drake (RETIRED) gentoo-dev 2005-12-08 04:22:13 UTC
Please try and reproduce this on the latest development kernel (currently
vanilla-sources-2.6.15_rc5)
Comment 3 Gabe Martin-Dempesy 2005-12-08 07:52:40 UTC
Bug still occurs on vanilla-sources-2.6.15_rc5
Comment 4 Gabe Martin-Dempesy 2005-12-08 08:01:01 UTC
Another interesting note:

I've just tried to duplicate this behavior on another Gentoo server we have, with the same hardened 
toolchain, profile, and kernel (version, not configuration) as the broken machine.  The problem does 
not occur there.

Both are Dell PowerEdges, but the one with the buggy behavior is higher-end.  The hardware 
differences are:
Broken Machine:
* e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
* Dual Xeon
* Hardware AAC Raid

Working Machine:
* Broadcom Tigon3
* Single Celeron
* SATA hard drives


The problematic machine had this issue in its previous install.  Because of that, I reinstalled it from 
scratch (just as I had with the working machine), and it still has the same issue.  This makes me lean 
towards a software issue that is related to the hardware.  Can the Ethernet drivers play a role in this?
Comment 5 Daniel Drake (RETIRED) gentoo-dev 2005-12-08 08:20:12 UTC
Yeah, maybe.

Do you use the network heavily in other ways? Might be worth testing some FTP
transfers of large files (or similar) to see if it is a 'generic' networking
problem.

Is it possible to temporarily swap the network cards in these two machines for
testing purposes?

You should also check the "dmesg" output after these problems occur if you have
not done so already.
Comment 6 Gabe Martin-Dempesy 2005-12-08 09:03:48 UTC
This is *VERY* interesting.  Ignore anything said previously about this being hardware related -- that 
was incidental.

1> The 2005.1 Gentoo installation CD has this problem off the bat.  If I boot from the CD and issue 
*JUST* these commands, I get the error, on any machine I try:

ifconfig eth0 10.10.10.15
/etc/init.d/portmap start
mkdir /sack
mount -t nfs 10.10.10.10:/Volumes/Sack /sack
find /sack/HCMG 1>/dev/null

So whatever this problem is, it's in the Gentoo installer too.

2> Because of this weird behavior, I took the kernel config file from the known working machine (after 
rebooting out of the installer), copied it to the broken machine, enabled AACRAID and my Ethernet 
driver, and reinstalled.  It worked.  I then did a few builds based on the differences between the two 
kernel configs to narrow down which option matters.  Disabling the NFS version 3 client prevents this 
behavior.  This means the issue is most likely related to OSX's 64-bit NFS behavior, and OSX could 
honestly be the culprit here.  Do note that this behavior occurs regardless of the -32bitclients flag 
for OSX's NFS.
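
To confirm which of the two states a given kernel build is in, the config can be checked directly. A sketch, assuming the option in question is the kernel's CONFIG_NFS_V3 symbol (the text above only says "NFS Version 3 client") and that the config lives at /usr/src/linux/.config. The name nfs_v3_client_enabled is hypothetical:

```shell
# Succeed (exit status 0) if the given kernel config enables the NFSv3 client.
# The default path is an assumption; pass the config the kernel was built from.
nfs_v3_client_enabled() {
    config=${1:-/usr/src/linux/.config}
    # A disabled option appears as "# CONFIG_NFS_V3 is not set", which
    # does not match the anchored pattern below.
    grep -q '^CONFIG_NFS_V3=y' "$config"
}
```

Usage: nfs_v3_client_enabled && echo "NFSv3 client is built in"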

Because of this, I'm nominating this bug as INVALID.
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2005-12-08 09:19:58 UTC
Interesting. If you have time, can I suggest that you write a summary of this
and post it to the NFS mailing list? It's a good place for this problem and
workaround to be archived, and may prompt the NFS developers to fix it if they
regard it as a bug (generally kernel developers work towards maximum
compatibility). Just send a mail to nfs@lists.sourceforge.net (no subscription
necessary).