Bug 143332 - NFS- can't mount share on 100mbit network with diff MTUs
Summary: NFS- can't mount share on 100mbit network with diff MTUs
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: All Linux
Importance: High normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
Reported: 2006-08-09 05:17 UTC by brankob
Modified: 2006-10-31 08:18 UTC
Description brankob 2006-08-09 05:17:28 UTC
I have several 64-bit and several 32-bit machines on the local cable.

All machines run Gentoo and are regularly updated.

The 64-bit fileserver runs hardened SELinux (latest stable kernel and most everything else), while the client runs ordinary vanilla Gentoo.

When I try to mount the share from the client, the mount completes immediately, as it should, but when I try to enter the mounted directory, the process hangs.

CIFS/Samba mount works, however. 

Also, "mount" without parameters lists nfs share as mounted.

dmesg on the client says:

nfs: server 192.168.1.1 not responding, still trying

nfsd on the server makes no entry in the dmesg buffer about the bad access attempt...
Comment 1 Jakub Moc (RETIRED) gentoo-dev 2006-08-09 05:45:23 UTC
nfs-utils version? emerge --info? 
Comment 2 brankob 2006-08-09 06:14:03 UTC
uname -a on client:

Linux pixna4 2.6.17-gentoo-r4 #1 PREEMPT Mon Aug 7 03:04:19 Local time zone must be set--see z i686 Intel(R) Pentium(R) 4 CPU 1.60GHz GNU/Linux

uname -a on server :

Linux streznik 2.6.16-hardened-r11 #1 SMP Thu Jul 27 22:22:29 CEST 2006 x86_64 AMD Opteron(tm) Processor 240 GNU/Linux

emerge --info on client:

Portage 2.1-r1 (default-linux/x86/2006.0, gcc-4.1.1/vanilla, glibc-2.3.6-r4, 2.6.17-gentoo-r4 i686)
=================================================================
System uname: 2.6.17-gentoo-r4 i686 Intel(R) Pentium(R) 4 CPU 1.60GHz
Gentoo Base System version 1.6.15
app-admin/eselect-compiler: 2.0.0_rc2-r1
dev-lang/python:     2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=pentium4 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.2/share/config /usr/kde/3.3/env /usr/kde/3.3/share/config /usr/kde/3.3/shutdown /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/"
CONFIG_PROTECT_MASK="/etc/env.d /etc/eselect/compiler /etc/gconf /etc/terminfo"
CXXFLAGS="-O2 -march=pentium4 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoclean autoconfig buildpkg distlocks fixpackages metadata-transfer nostrip sandbox sfperms strict"
GENTOO_MIRRORS="http://gentoo.osuosl.org/"
LANG="sl_SI.utf8"
LC_ALL="sl_SI.utf8"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="x86 7zip X Xaw3d a52 aac acl acpi alsa apache2 apm arts avi bash-completion berkdb binfilter bitmap-fonts bzip2 bzlib cairo caps cdr cli cpudetection crypt cups debug directfb divx4linux dlloader doc dri dts dv dvb dvd dvdr dvdread eds emboss encode esd exif fame fbcon firefox flac foomaticdb fortran ftp gdbm gif glitz gnome gphoto2 gpm gstreamer gtk gtk2 hal ieee1394 imagemagick imlib ipv6 isdnlog jack java jpeg jpeg2k kde ldap libg++ libwww lm-sensors logitech-mouse lzo mad mikmod mjpeg mmx motif mp3 mpeg mysql ncurses network nls nntp nptl nvidia ogg oggvorbis openal opengl oss pam pcre pdf pdflib perl php png postgres povray pppd python qt qt3 qt4 quicktime readline reflection rtc samba scanner sdl session sndfile spell spl sse sse2 ssl svg sysfs tcltk tcpd theora threads tiff truetype truetype-fonts type1-fonts udev unicode usb v4l v4l2 vcd vorbis wifi win32codecs wmf xine xinerama xml xmms xorg xosd xpm xprint xv xvid xvmc zlib elibc_glibc input_devices_keyboard input_devices_mouse input_devices_evdev kernel_linux userland_GNU"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY


emerge --info on server:

Portage 2.1-r2 (selinux/2005.1/amd64, gcc-3.4.6, glibc-2.3.6-r4, 2.6.16-hardened-r11 x86_64)
=================================================================
System uname: 2.6.16-hardened-r11 x86_64 AMD Opteron(tm) Processor 240
Gentoo Base System version 1.6.15
app-admin/eselect-compiler: [Not Present]
dev-lang/python:     2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.15.92.0.2-r10, 2.16.1-r3
sys-devel/gcc-config: 1.3.13-r3
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r2
ACCEPT_KEYWORDS="amd64"
AUTOCLEAN="yes"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=opteron -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.4/env /usr/kde/3.4/share/config /usr/kde/3.4/shutdown /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/bind"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-march=opteron -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig buildpkg candy distlocks fixpackages loadpolicy metadata-transfer nostrip sandbox selinux sfperms strict"
GENTOO_MIRRORS="http://gentoo.ynet.sk/pub http://mirror.switch.ch/ftp/mirror/gentoo/ http://gentoo.inode.at/ http://ftp.romnet.org/gentoo/ http://pandemonium.tiscali.de/pub/gentoo/"
LANG="sl_SI.utf8"
LC_ALL="sl_SI.utf8"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages/opteron_glibc_235"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude='/distfiles' --exclude='/local' --exclude='/packages'"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="alsa amd64 apache apache2 berkdb bzip2 crypt cups curl doc emul-linux-x86 hardened ipv6 lm_sensors logrotate multislot mysql ncurses nis nls nptl pam postgres profile python quotas readline samba selinux slp ssl threads unicode vhosts xinetd xml xml2 zip zlib elibc_glibc input_devices_keyboard input_devices_mouse kernel_linux userland_GNU"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LDFLAGS, LINGUAS, PORTAGE_RSYNC_EXTRA_OPTS

emerge -pv nfs-utils on client:

[ebuild   R   ] net-fs/nfs-utils-1.0.9  USE="tcpd -kerberos -nonfsv4" 0 kB


emerge -pv nfs-utils on server:

[ebuild   R   ] net-fs/nfs-utils-1.0.9  USE="-kerberos -nonfsv4 -tcpd"
Comment 3 SpanKY gentoo-dev 2006-08-09 09:12:09 UTC
and how are you mounting ?  udp or tcp ?  nfs2 or nfs3 ?  do you have iptables rules in place ?
Comment 4 brankob 2006-08-09 11:11:15 UTC
(In reply to comment #3)
> and how are you mounting ?  udp or tcp ?  nfs2 or nfs3 ?  do you have iptables
> rules in place ?
> 

I think I have found the cause of failure, at least for this client.

But to answer your questions first:

I am mounting nfs3 through udp. No firewall rules prevent the mount, since:

1. There are no firewall rules preventing any traffic on that net.

2. Other clients (64-bit) can mount shares on the same net without a problem.

The share is also exported to the whole net 192.168.1.0/24, where both the server (with one of its interfaces; it is also a router) and the client reside.

The cause of failure was the default rsize and wsize: 4096.

This client is the only one on that net with 100 Mbit Ethernet and an MTU of 1500.

The rest of the machines have MTU 9000.

While the machines can negotiate the maximum frame size, it seems that my server uses a default rsize/wsize of more than about 1400 (probably 4096), which is more than the MTU of the card (1500).

When I lower rsize/wsize to 1024, it works. I don't have a book about the UDP/IP protocols at hand and don't know if it is supposed to work like that.

Anyway, I have a few 32-bit clients with 1 Gbit Ethernet cards with a bigger MTU (7200), and I remember that I couldn't make the NFS client work on them, even with lower rsize/wsize. As soon as the current machine finishes its "emerge -uD" (sometime today), I'll plug in another client and try again...

So in short, it works, but I'm not sure that it's supposed to work this way.
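
For reference, a rough sketch of the IP fragmentation arithmetic behind this observation. It assumes IPv4 with a 20-byte header and no options, counts only the 8-byte UDP header on top of the payload, and ignores the RPC/NFS header overhead; frag_count is a hypothetical helper name, not part of nfs-utils.

```shell
#!/bin/sh
# frag_count PAYLOAD MTU
# Print how many IPv4 fragments one UDP datagram carrying PAYLOAD bytes
# needs on a link with the given MTU (assumption: 20-byte IP header).
frag_count() {
    udp_len=$(( $1 + 8 ))                 # payload + 8-byte UDP header
    per_frag=$(( ($2 - 20) / 8 * 8 ))     # fragment data must be a multiple of 8
    echo $(( (udp_len + per_frag - 1) / per_frag ))
}

frag_count 4096 1500   # default rsize/wsize on the 100 Mbit client
frag_count 1024 1500   # the reduced size that worked
frag_count 4096 9000   # the same default on a jumbo-frame machine
```

Under these assumptions the three calls print 3, 1 and 1: a 4096-byte transfer fits in a single packet only on the MTU 9000 machines, while the MTU 1500 client needs every fragment of each datagram to arrive. The reduced sizes would be passed at mount time with something like "mount -t nfs -o udp,nfsvers=3,rsize=1024,wsize=1024 192.168.1.1:/export /mnt/export", where the export path and mount point are placeholders.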
Comment 5 brankob 2006-08-10 01:09:14 UTC
O.K., now I have tried with another, similar client (Sempron XT 2700), but this time with a 1 Gb Ethernet card (R8169 chip), working with MTU up to 7200.

Results are similar, if not the same.

No matter how I set the MTU, I can't use an rsize or wsize larger than 2047.

The funny thing is that rsize and wsize can be larger than the MTU of the card; as long as they are smaller than 2048, things work.

Then again, it might be something with the card driver itself...

I didn't have the opportunity to look at the packets on the wire.

Will do that tonight. 
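
A capture along these lines is one way to look at the wire. The sketch only builds a pcap filter expression that matches NFS over UDP (port 2049) plus any IPv4 fragments to or from the server; nfs_frag_filter is a hypothetical helper, and the interface name eth0 in the usage comment is an assumption.

```shell
#!/bin/sh
# nfs_frag_filter HOST
# Print a pcap filter for traffic with HOST that is either NFS over UDP
# (port 2049) or an IPv4 fragment (MF flag set or nonzero fragment offset).
nfs_frag_filter() {
    printf 'host %s and (udp port 2049 or (ip[6:2] & 0x3fff != 0))\n' "$1"
}

# Usage (as root on the client):
#   tcpdump -n -i eth0 "$(nfs_frag_filter 192.168.1.1)"
nfs_frag_filter 192.168.1.1
```

The fragment test is there because only the first fragment of a datagram carries the UDP header, so a plain port match would hide the later fragments.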
Comment 6 David Fellows 2006-08-22 12:34:05 UTC
I have run into a similar situation. I have an Opteron server and an x86 client.
The directory is exported rw and mounted rw, with rsize and wsize both 8192. This setup was working for years with the mtu on both set to 1492.

It recently stopped working. The symptom was that the client could read files
from the server with no problem. However, it could only write small files, small in this case being 1324 bytes or less; 1325 would hang the client. During the time the client was hung, pairs of log messages
[kernel] lockd: cannot monitor 192.168.0.2
[kernel] lockd: failed to monitor 192.168.0.2
would be written about every 20-30 seconds. 192.168.0.2 is the server address.
Also, every few minutes there was a 
[kernel] nfs: server x.y.z not responding, still trying
message.

I believed that both client and server had an mtu of 1492 - that was what was set in their config files.  After reading this bug report I checked with ifconfig and the server's mtu was 1492.  The client's mtu was 1500. 

I set the server mtu to 1500 and was immediately able to write large files from the client.

client kernel 2.6.14-gentoo-r5
server kernel 2.6.14-gentoo-r5 SMP (2 processors)
server nfs-utils-1.0.6-r6
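
A minimal sketch of checking the live MTU rather than the configured one, as described above. It reads the value from sysfs (Linux only) and computes the largest ICMP payload that fits in one unfragmented packet; the helper names and the interface eth0 are assumptions.

```shell
#!/bin/sh
# iface_mtu IFACE: print the MTU the kernel is actually using for IFACE.
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# ping_payload MTU: largest ICMP echo payload that fits in one IPv4 packet
# of MTU bytes (20-byte IP header + 8-byte ICMP header).
ping_payload() {
    echo $(( $1 - 28 ))
}

# Usage, run on both client and server:
#   iface_mtu eth0
#   ping -c 3 -M do -s "$(ping_payload 1500)" 192.168.0.2
ping_payload 1500
ping_payload 1492
```

The two calls print 1472 and 1464. With -M do, iputils ping forbids fragmentation, so a full-size probe that gets no reply while a smaller one does would hint at a mismatch like the one found here.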
Comment 7 Daniel Drake (RETIRED) gentoo-dev 2006-08-27 09:07:01 UTC
gentoo-sources-2.6.17-r6 includes a NFS stall fix. Does it help?
Comment 8 David Fellows 2006-08-31 18:30:30 UTC
(In reply to comment #7)
> gentoo-sources-2.6.17-r6 includes a NFS stall fix. Does it help?
> 

No and yes.

I built 2.6.17-r7 kernels for both the client and server and ran some tests.

With client mtu <= server mtu there is no problem with any combination of client kernel and server kernel.

With client mtu > server mtu:

With the 2.6.14 client kernel, the client process hangs when the file written is larger than
the server mtu - 168 bytes (approx). This happens with either server kernel.
The client writing process cannot be killed. You cannot even do a clean client shutdown because of this.

With the 2.6.17 client kernel, the client process hangs under the same circumstances.
However, the client writing process can be killed with kill -9. So that is an improvement.

For what it is worth, when the client process hangs the server has created the file being written, but it has a length of 0. (all kernel combinations)
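
The size threshold described above could be probed with a small script like this one. It is only a sketch: try_write is a hypothetical helper, timeout comes from GNU coreutils, and on kernels where the hung writer cannot be killed (the 2.6.14 case above) the timeout will not help.

```shell
#!/bin/sh
# try_write DIR SIZE
# Write one SIZE-byte file into DIR, sending SIGKILL after 10 seconds.
# Returns non-zero if dd failed or was killed.
try_write() {
    timeout -s KILL 10 dd if=/dev/zero of="$1/probe.$2" bs="$2" count=1 2>/dev/null
}

# Usage on an NFS mount (the mount point is a placeholder):
#   for size in 1024 1324 1325 2048; do
#       try_write /mnt/nfs "$size" && echo "$size ok" || echo "$size failed or hung"
#   done
```

Stepping through sizes on either side of the server mtu should show where writes start to stall, and whether the zero-length file is left behind on the server each time.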
Comment 9 Daniel Drake (RETIRED) gentoo-dev 2006-09-02 07:23:59 UTC
Ok, a small improvement. Can you test with the latest development kernel (currently 2.6.18-rc5) running on client and server?
Comment 10 David Fellows 2006-09-03 06:36:19 UTC
(In reply to comment #9)
> Ok, a small improvement. Can you test with the latest development kernel
> (currently 2.6.18-rc5) running on client and server?
> 

I built vanilla-sources-2.6.18-rc5.
Installed on both.
Seems to work fine with default mtus (1500) on both client and server.
Set server mtu to 1492.
Same symptoms as 2.6.17.
Client hangs when writing a file longer than the server mtu (minus a bit).
Can be killed.
Short files work OK, as before.
Comment 11 Bret Towe 2006-09-03 19:30:44 UTC
Has proto=tcp been tried?
This sounds a lot like an issue I had with NFS between gigabit servers
and 100bT clients. Upstream knows about it, but it is a flaw with
UDP, so it won't be fixed. The workaround, if that is the issue, is either using proto=tcp
or making sure all servers and clients are gigabit, or all 100bT,
or even all 10bT, with no mixture thereof.
Comment 12 Daniel Drake (RETIRED) gentoo-dev 2006-09-16 20:07:00 UTC
Bump.. Does Bret's suggestion make any difference?
Comment 13 David Fellows 2006-09-23 05:23:43 UTC
Sorry for the delay. I've been busy with other things, including switching to gcc 4.1.1 and profile 2006.1 on the client machine.

The two machines are both using 10/100 mbit ethernet cards via a 10/100 D-Link router.  From observed data rates they appear to have negotiated 100 mbits/sec.

I will try proto=tcp when I get a chance.  It wouldn't surprise me that it is a bug related to "tcp over udp".

My perfectly satisfactory workaround is to make sure the MTU size is set the same on both client and server.
Comment 14 Daniel Drake (RETIRED) gentoo-dev 2006-10-15 16:46:58 UTC
David, any news?

Bret, I'm interested in what you wrote in comment #11. Was this discussed on a public mailing list or anything like that?
Comment 15 Bret Towe 2006-10-15 18:16:39 UTC
(In reply to comment #14)
> David, any news?
> 
> Bret, I'm interested in what you wrote in comment #11. Was this discussed on a
> public mailing list or anything like that?
> 

yes I'm pretty sure it was on lkml
Comment 16 David Fellows 2006-10-16 17:26:31 UTC
I'm not sure this is what you want, but this is the latest test I ran:
Client Info:
uname -a
Linux jo 2.6.17-gentoo-r7 #2 PREEMPT Sat Sep 23 22:27:56 ADT 2006 i686 Pentium III (Katmai) GNU/Linux

Server Info:
uname -a
Linux kanga 2.6.17-gentoo-r7 #2 SMP PREEMPT Thu Aug 31 14:19:57 ADT 2006 x86_64 AMD Opteron(tm) Processor 246 GNU/Linux
equery belongs /usr/sbin/rpc.nfsd
[ Searching for file(s) /usr/sbin/rpc.nfsd in *... ]
net-fs/nfs-utils-1.0.6-r6 (/usr/sbin/rpc.nfsd)

The client is x86 and mounts /usr/portage from the server (amd64)
On the client
umount /usr/portage,
edit fstab to:
kanga.pooh.corner:/usr/portage  /usr/portage    nfs rw,rsize=8192,wsize=8192,auto,_netdev,intr,tcp   0       0

verify that mtu is 1500

On the server:
set mtu=1492 in /etc/conf.d/net
/etc/init.d/net.eth0 stop
/etc/init.d/net.eth0 start
/etc/init.d/nfs start

On the client:
mount /usr/portage
ls -l /usr/portage

ls hung after the entry app-pda (about 24 lines). More precisely, the bash that was running ls hung.
Killing bash did not work. I was logged into the client using ssh, and closing the xterm window did make everything go away.

This is worse behaviour than with UDP, which could at least read from the server, so I did
not try any write experiments.

I then repeated the sequence putting the server mtu back to 1500. 
ls -l portage then behaved normally ending with xfce-extra.
I did not do a write experiment.
Comment 17 Daniel Drake (RETIRED) gentoo-dev 2006-10-16 17:54:55 UTC
I'm not certain, but I doubt that your fstab change took any effect. I think you might have to use proto=tcp rather than just tcp.
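
One way to settle this without guessing is to ask the kernel which transport the mount ended up with. nfs(5) documents tcp as an alternative spelling of proto=tcp; the sketch below reads the options field from /proc/mounts and prints the transport either way. mount_proto is a hypothetical helper, and the exact option spelling shown in /proc/mounts varies between kernel versions.

```shell
#!/bin/sh
# mount_proto MOUNTPOINT [MOUNTS_FILE]
# Print the transport (tcp or udp) listed in the mount options of the NFS
# filesystem mounted at MOUNTPOINT. MOUNTS_FILE defaults to /proc/mounts.
mount_proto() {
    awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "tcp" || opts[i] == "udp")
                print opts[i]                 # bare "tcp"/"udp" spelling
            else if (opts[i] ~ /^proto=/)
                print substr(opts[i], 7)      # value after "proto="
        }
    }' "${2:-/proc/mounts}"
}

# Usage on the client, after "mount /usr/portage":
#   mount_proto /usr/portage
```

If this prints udp after mounting with the fstab line from the previous comment, the option did not take effect and the test would need repeating with proto=tcp.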
Comment 18 Daniel Drake (RETIRED) gentoo-dev 2006-10-31 08:18:58 UTC
Please reopen after retesting