I have several 64-bit and several 32-bit machines on the local network. All machines run Gentoo and are regularly updated. The 64-bit file server runs hardened SELinux (latest stable kernel and most everything else), while the client runs ordinary vanilla Gentoo.

When I try to mount the share from the client, the mount completes immediately, as it should, but when I try to enter the mounted directory, the process hangs. A CIFS/Samba mount works, however. Also, "mount" without parameters lists the NFS share as mounted.

dmesg on the client says:

nfs: server 192.168.1.1 not responding, still trying

nfsd on the server makes no entry in the dmesg buffer about the failed access attempt...
nfs-utils version? emerge --info?
uname -a on client:
Linux pixna4 2.6.17-gentoo-r4 #1 PREEMPT Mon Aug 7 03:04:19 Local time zone must be set--see z i686 Intel(R) Pentium(R) 4 CPU 1.60GHz GNU/Linux

uname -a on server:
Linux streznik 2.6.16-hardened-r11 #1 SMP Thu Jul 27 22:22:29 CEST 2006 x86_64 AMD Opteron(tm) Processor 240 GNU/Linux

emerge --info on client:
Portage 2.1-r1 (default-linux/x86/2006.0, gcc-4.1.1/vanilla, glibc-2.3.6-r4, 2.6.17-gentoo-r4 i686)
=================================================================
System uname: 2.6.17-gentoo-r4 i686 Intel(R) Pentium(R) 4 CPU 1.60GHz
Gentoo Base System version 1.6.15
app-admin/eselect-compiler: 2.0.0_rc2-r1
dev-lang/python: 2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: [Not Present]
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium4 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/"
CONFIG_PROTECT_MASK="/etc/env.d /etc/eselect/compiler /etc/gconf /etc/terminfo"
CXXFLAGS="-O2 -march=pentium4 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoclean autoconfig buildpkg distlocks fixpackages metadata-transfer nostrip sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.osuosl.org/"
LANG="sl_SI.utf8"
LC_ALL="sl_SI.utf8"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 7zip X Xaw3d a52 aac acl acpi alsa apache2 apm arts avi bash-completion berkdb binfilter bitmap-fonts bzip2 bzlib cairo caps cdr cli cpudetection crypt cups debug directfb divx4linux dlloader doc dri dts dv dvb dvd dvdr dvdread eds emboss encode esd exif fame fbcon firefox flac foomaticdb fortran ftp gdbm gif glitz gnome gphoto2 gpm gstreamer gtk gtk2 hal ieee1394 imagemagick imlib ipv6 isdnlog jack java jpeg jpeg2k kde ldap libg++ libwww lm-sensors logitech-mouse lzo mad mikmod mjpeg mmx motif mp3 mpeg mysql ncurses network nls nntp nptl nvidia ogg oggvorbis openal opengl oss pam pcre pdf pdflib perl php png postgres povray pppd python qt qt3 qt4 quicktime readline reflection rtc samba scanner sdl session sndfile spell spl sse sse2 ssl svg sysfs tcltk tcpd theora threads tiff truetype truetype-fonts type1-fonts udev unicode usb v4l v4l2 vcd vorbis wifi win32codecs wmf xine xinerama xml xmms xorg xosd xpm xprint xv xvid xvmc zlib elibc_glibc input_devices_keyboard input_devices_mouse input_devices_evdev kernel_linux userland_GNU"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY

emerge --info on server:
Portage 2.1-r2 (selinux/2005.1/amd64, gcc-3.4.6, glibc-2.3.6-r4, 2.6.16-hardened-r11 x86_64)
=================================================================
System uname: 2.6.16-hardened-r11 x86_64 AMD Opteron(tm) Processor 240
Gentoo Base System version 1.6.15
app-admin/eselect-compiler: [Not Present]
dev-lang/python: 2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: [Not Present]
dev-util/confcache: [Not Present]
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.59-r7
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils: 2.15.92.0.2-r10, 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=opteron -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=opteron -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig buildpkg candy distlocks fixpackages loadpolicy metadata-transfer nostrip sandbox selinux sfperms strict"
GENTOO_MIRRORS="http://gentoo.ynet.sk/pub http://mirror.switch.ch/ftp/mirror/gentoo/ http://gentoo.inode.at/ http://ftp.romnet.org/gentoo/ http://pandemonium.tiscali.de/pub/gentoo/"
LANG="sl_SI.utf8"
LC_ALL="sl_SI.utf8"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages/opteron_glibc_235"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="alsa amd64 apache apache2 berkdb bzip2 crypt cups curl doc emul-linux-x86 hardened ipv6 lm_sensors logrotate multislot mysql ncurses nis nls nptl pam postgres profile python quotas readline samba selinux slp ssl threads unicode vhosts xinetd xml xml2 zip zlib elibc_glibc input_devices_keyboard input_devices_mouse kernel_linux userland_GNU"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS

emerge -pv nfs-utils on client:
[ebuild R ] net-fs/nfs-utils-1.0.9 USE="tcpd -kerberos -nonfsv4" 0 kB

emerge -pv nfs-utils on server:
[ebuild R ] net-fs/nfs-utils-1.0.9 USE="-kerberos -nonfsv4 -tcpd"
And how are you mounting? UDP or TCP? NFSv2 or NFSv3? Do you have iptables rules in place?
(In reply to comment #3)
> And how are you mounting? UDP or TCP? NFSv2 or NFSv3? Do you have iptables
> rules in place?

I think I have found the cause of the failure, at least for this client. But to answer your questions first: I am mounting NFSv3 over UDP. No firewall rules prevent the mount, since:

1. There are no firewall rules preventing any traffic on that net.
2. Other clients (64-bit) can mount shares on the same net without a problem.

The share is also exported to the whole net 192.168.1.0/24, where both the server (with one of its interfaces; it is also a router) and the client reside.

The cause of the failure was the default rsize and wsize: 4096. This client is the only one on that net with 100 Mbit Ethernet and an MTU of 1500. The rest of the machines have an MTU of 9000. While machines can negotiate the maximum frame size, it seems that my server uses a default rsize/wsize of more than 1400 (probably 4096), which is more than the MTU of the card (1500). When I lower rsize and wsize to 1024, it works.

I don't have a book about the UDP/IP protocol at hand and don't know if it is supposed to work like that. Anyway, I have a few 32-bit clients with 1 Gbit Ethernet cards with a bigger MTU (7200) and remember that I couldn't get the NFS client on them to work, even with a lower rsize/wsize. As soon as the current machine finishes its "emerge -uD" (sometime today), I'll plug in another client and try again...

So in short, it works, but I'm not sure that it's supposed to work this way.
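For reference, a UDP datagram larger than the link MTU is normally split into IP fragments, so an rsize/wsize above the MTU is not a problem by itself; it does mean every fragment has to arrive. A rough sketch of that arithmetic follows. The function name frag_count is made up for illustration, and the calculation assumes IPv4 with a 20-byte header and counts only the 8-byte UDP header, ignoring the RPC/NFS headers that a real request carries on top of the data.

```shell
# frag_count DATA_BYTES MTU
# Estimate how many IPv4 fragments one UDP datagram carrying DATA_BYTES needs.
# Assumptions: 20-byte IPv4 header without options, 8-byte UDP header,
# RPC/NFS header overhead ignored.
frag_count() {
    data=$1
    mtu=$2
    total=$((data + 8))                   # UDP header + data
    per_frag=$(( (mtu - 20) / 8 * 8 ))    # fragment payload is a multiple of 8 bytes
    echo $(( (total + per_frag - 1) / per_frag ))
}

frag_count 4096 1500   # prints 3
frag_count 1024 1500   # prints 1
```

Under these assumptions, rsize/wsize of 1024 fits in a single packet at MTU 1500, while 4096 needs three fragments, which is at least consistent with the smaller size working on a path that loses fragments.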
O.K., now I have tried with another, similar client (Sempron XT 2700), but this time with a 1 Gbit Ethernet card (R8169 chip) that works with an MTU of up to 7200. The results are similar, if not the same. No matter how I set the MTU, I can't use an rsize or wsize larger than 2047. The funny thing is that rsize and wsize can be larger than the MTU of the card; as long as they are smaller than 2048, things work. Then again, it might be something with the card driver itself... I haven't had the opportunity to look at the packets on the wire. I will do that tonight.
I have run into a similar situation. I have an Opteron server and an x86 client. The directory is exported rw and mounted rw, with rsize and wsize both 8192. This setup was working for years with the MTU on both set to 1492. It recently stopped working.

The symptom was that the client could read files from the server with no problem, but it could only write small files, small in this case being 1324 bytes or less; 1325 bytes would hang the client. While the client was hung, pairs of log messages

[kernel] lockd: cannot monitor 192.168.0.2
[kernel] lockd: failed to monitor 192.168.0.2

would be written about every 20-30 seconds. 192.168.0.2 is the server address. Also, every few minutes there was a

[kernel] nfs: server x.y.z not responding, still trying

message.

I believed that both client and server had an MTU of 1492; that was what was set in their config files. After reading this bug report I checked with ifconfig: the server's MTU was 1492, but the client's MTU was 1500. I set the server MTU to 1500 and was immediately able to write large files from the client.

client kernel 2.6.14-gentoo-r5
server kernel 2.6.14-gentoo-r5 SMP (2 processors)
server nfs-utils-1.0.6-r6
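Since the configured MTU and the effective MTU differed here, it can help to probe what actually fits on the path rather than trusting the config files. A minimal sketch, assuming the iputils version of ping (whose -M do option forbids fragmentation) and IPv4; the function names are made up for illustration.

```shell
# icmp_payload MTU
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 28 ))
}

# probe_mtu HOST MTU
# Send three pings of exactly that size with "don't fragment" set.
# Assumes iputils ping; if the path cannot carry MTU-sized packets,
# the pings fail instead of being silently fragmented.
probe_mtu() {
    ping -c 3 -M do -s "$(icmp_payload "$2")" "$1"
}
```

For example, `probe_mtu 192.168.0.2 1500` run from the client tests whether full 1500-byte packets reach the server in this setup, and `probe_mtu 192.168.0.2 1492` tests the smaller size.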
gentoo-sources-2.6.17-r6 includes an NFS stall fix. Does it help?
(In reply to comment #7)
> gentoo-sources-2.6.17-r6 includes an NFS stall fix. Does it help?

No and yes. I built 2.6.17-r7 kernels for both the client and the server and ran some tests.

With client MTU <= server MTU, there is no problem with any combination of client kernel and server kernel.

With client MTU > server MTU:

With the 2.6.14 client kernel, the client process hangs when the file written is larger than the server MTU minus 168 bytes (approx.). This happens with either server kernel. The client writing process cannot be killed. You cannot even do a clean client shutdown because of this.

With the 2.6.17 client kernel, the client process hangs under the same circumstances. However, the client writing process can be killed with kill -9. So that is an improvement.

For what it is worth, when the client process hangs, the server has created the file being written, but it has a length of 0 (all kernel combinations).
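The numbers reported so far line up: with a server MTU of 1492, files of 1324 bytes were written fine and 1325 bytes hung, and 1492 - 168 = 1324. A small sketch for generating test files on either side of that boundary follows. The 168-byte overhead is the empirical value from these comments, not a protocol constant, and the function names are made up for illustration.

```shell
# Empirical per-write overhead observed in this thread (an assumption,
# not a documented constant).
OVERHEAD=168

# write_threshold MTU
# Largest file size expected to be written without hanging.
write_threshold() {
    echo $(( $1 - OVERHEAD ))
}

# make_test_file SIZE PATH
# Create a file of exactly SIZE zero bytes at PATH.
make_test_file() {
    dd if=/dev/zero of="$2" bs=1 count="$1" 2>/dev/null
}
```

For example, `make_test_file "$(write_threshold 1492)" /tmp/ok.bin` creates a file that should still copy to the NFS mount, and a file one byte larger should reproduce the hang if the same threshold applies.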
Ok, a small improvement. Can you test with the latest development kernel (currently 2.6.18-rc5) running on client and server?
(In reply to comment #9)
> Ok, a small improvement. Can you test with the latest development kernel
> (currently 2.6.18-rc5) running on client and server?

I built vanilla-sources-2.6.18-rc5 and installed it on both machines. It seems to work fine with the default MTU (1500) on both client and server.

I then set the server MTU to 1492 and got the same symptoms as with 2.6.17: the client hangs when writing a file longer than the server MTU (minus a bit). The process can be killed. Short files work OK, as before.
Has proto=tcp been tried? This sounds a lot like an issue I had with NFS with gigabit servers and 100bT clients. Upstream knows about it, but it's a flaw with UDP, so it won't be fixed. The workaround, if that is the issue, is either using proto=tcp or making sure all servers and clients are gigabit, or all 100bT, or even all 10bT, with no mixture thereof.
Bump.. Does Bret's suggestion make any difference?
Sorry for the delay. I've been busy with other things, including switching to gcc 4.1.1 and profile 2006.1 on the client machine. The two machines are both using 10/100 Mbit Ethernet cards via a 10/100 D-Link router. From observed data rates they appear to have negotiated 100 Mbit/s. I will try proto=tcp when I get a chance. It wouldn't surprise me if it is a bug related to "tcp over udp". My perfectly satisfactory workaround is to make sure the MTU is set the same on both client and server.
David, any news? Bret, I'm interested in what you wrote in comment #11. Was this discussed on a public mailing list or anything like that?
(In reply to comment #14)
> David, any news?
>
> Bret, I'm interested in what you wrote in comment #11. Was this discussed on a
> public mailing list or anything like that?

Yes, I'm pretty sure it was on LKML.
I'm not sure this is what you want, but this is the latest test I ran:

Client info:
uname -a
Linux jo 2.6.17-gentoo-r7 #2 PREEMPT Sat Sep 23 22:27:56 ADT 2006 i686 Pentium III (Katmai) GNU/Linux

Server info:
uname -a
Linux kanga 2.6.17-gentoo-r7 #2 SMP PREEMPT Thu Aug 31 14:19:57 ADT 2006 x86_64 AMD Opteron(tm) Processor 246 GNU/Linux
equery belongs /usr/sbin/rpc.nfsd
[ Searching for file(s) /usr/sbin/rpc.nfsd in *... ]
net-fs/nfs-utils-1.0.6-r6 (/usr/sbin/rpc.nfsd)

The client is x86 and mounts /usr/portage from the server (amd64).

On the client:
umount /usr/portage
edit fstab to:
kanga.pooh.corner:/usr/portage /usr/portage nfs rw,rsize=8192,wsize=8192,auto,_netdev,intr,tcp 0 0
verify that the MTU is 1500

On the server:
set mtu=1492 in /etc/conf.d/net
/etc/init.d/net.eth0 stop
/etc/init.d/net.eth0 start
/etc/init.d/nfs start

On the client:
mount /usr/portage
ls -l /usr/portage

ls hung after the entry app-pda (about 24 lines). More precisely, the bash that was running ls hung. Killing bash did not work. I was logged into the client using ssh, and closing the xterm window did make everything go away. This is worse behaviour than with UDP, which could read from the server, so I did not try any write experiments.

I then repeated the sequence with the server MTU put back to 1500. ls -l portage then behaved normally, ending with xfce-extra. I did not do a write experiment.
I'm not certain that your fstab change took effect. I think you might have to use proto=tcp rather than just tcp.
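Whichever spelling ends up in fstab, one way to settle the question is to look at the options the kernel reports for the mount after mounting. A minimal sketch that pulls the transport out of a comma-separated option string such as the fourth field of a /proc/mounts line; the function names are made up for illustration, and since the exact way the option is printed may differ between kernel versions, both "proto=tcp" and a bare "tcp" are accepted.

```shell
# nfs_proto OPTIONS
# Print "tcp" or "udp" if the comma-separated mount options name a transport,
# either as proto=tcp / proto=udp or as a bare tcp / udp. Prints nothing otherwise.
nfs_proto() {
    printf '%s\n' "$1" | tr ',' '\n' \
        | sed -n -e 's/^proto=//' -e '/^tcp$/p' -e '/^udp$/p' \
        | head -n 1
}

# show_nfs_protos [FILE]
# List each NFS mount point with its transport; FILE defaults to /proc/mounts.
show_nfs_protos() {
    while read -r dev dir fstype opts rest; do
        [ "$fstype" = nfs ] && echo "$dir $(nfs_proto "$opts")"
    done < "${1:-/proc/mounts}"
    return 0
}
```

Running `show_nfs_protos` on the client after `mount /usr/portage` would show whether that mount is really using TCP.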
Please reopen after retesting