I have a network booting system that is entirely NFS based. In order to run vmware I had to install nscd (otherwise I'm happy to live without nscd). Doing so caused problems.

What I traced it down to was that the /var/run/.nscd_socket unix domain socket that nscd creates on startup was being created with mode 0600 (or 0644?), followed by a chmod call to set it to the appropriate 0666 mode. With /var/run on an NFS (v3, tcp) filesystem, strace showed that the chmod call was failing, so the mode was left at 0644. This was the worst possible outcome, because glibc would see the .nscd_socket and try to use it, but only root-owned processes would succeed and be able to do any name lookups.

(System config: kernel 2.4.20-gentoo-r7 on this client; my server is an alpha running 2.4.23-pre1 with 10 nfs patches from sourceforge to allow for 32k read/writes and some bug fixes.)

I fixed it by deciding that /var/run only ever contains transient temporary files relevant to a single boot of the system, so I mounted /var/run as tmpfs and all has been well since. Is there any reason everyone's /var/run should not be tmpfs?

I don't see an NFS setattr call on the network when I do the chmod (though it does do a lookup on the file).

Here's a simple way to test this yourself:

% python
>>> import socket
>>> s = socket.socket(socket.AF_UNIX)
>>> s.bind('/path/on/nfs/filesystem/socketfile')
>>> ^Z
[1]+ Stopped
% ls -al /path/on/nfs/filesystem/socketfile
srwxr-xr-x 1 greg users 0 Nov 14 12:26 socketfile=
% chmod 0666 /path/on/nfs/filesystem/socketfile
% ls -al /path/on/nfs/filesystem/socketfile
srwxr-xr-x 1 greg users 0 Nov 14 12:26 socketfile=

Do that on a non-NFS filesystem and it works as expected: the mode changes to 0666. This sounds like a linux kernel nfs client bug, or possibly a glibc bug? (glibc 2.3.2 here)
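The manual test above can also be run as a small script, which makes it easier to repeat across kernels. This is only a sketch: the function name check_socket_chmod and the file name chmod-test.sock are made up for illustration, and it only reports whether the mode visible after chmod matches the requested one. It does not say whether the kernel or glibc is at fault.

```python
import os
import socket
import stat


def check_socket_chmod(directory, wanted_mode=0o666):
    """Bind a unix domain socket in `directory`, chmod it, and compare modes.

    Returns (mode_before, mode_after, ok). `ok` is True only when the mode
    seen by stat after the chmod equals `wanted_mode`.
    """
    # Hypothetical file name; any unused name in the directory would do.
    path = os.path.join(directory, "chmod-test.sock")
    s = socket.socket(socket.AF_UNIX)
    try:
        s.bind(path)  # creates the socket file, like nscd does on startup
        mode_before = stat.S_IMODE(os.lstat(path).st_mode)
        try:
            os.chmod(path, wanted_mode)
        except OSError as exc:
            # strace on nscd showed chmod failing outright; report that case.
            print("chmod failed:", exc)
        # Also covers the case where chmod "succeeds" but nothing changes.
        mode_after = stat.S_IMODE(os.lstat(path).st_mode)
        return mode_before, mode_after, mode_after == wanted_mode
    finally:
        s.close()
        if os.path.lexists(path):
            os.unlink(path)
```

Call it as check_socket_chmod('/path/on/nfs/filesystem') and then again on a local directory; on an affected client the third value should come back False for the NFS path only.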
We do not want to depend on tmpfs. If it's a kernel bug, then let it be fixed.
Can we have some kernel info, please? Can you also attach your .config to this bug? Thanks.
WRXsti src # cp -a /tmp/ksocket-root/kdeinit-\:0 /usr/portage/test.sock
WRXsti src # ls -lah /usr/portage/test.sock
srw------- 1 root root 0 Nov 10 15:29 /usr/portage/test.sock
WRXsti src # chmod 777 /usr/portage/test.sock
WRXsti src # ls -lah /usr/portage/test.sock
srwxrwxrwx 1 root root 0 Nov 10 15:29 /usr/portage/test.sock
WRXsti src # mount | grep portage
192.168.0.1:/usr/portage on /usr/portage type nfs (rw,noatime,addr=192.168.0.1)
WRXsti src # uname -a
Linux WRXsti 2.4.22-gentoo-test-r1 #2 Mon Nov 10 14:50:35 CST 2003 i686 AMD Athlon(tm) XP 2600+ AuthenticAMD GNU/Linux
WRXsti src #

Seems to work fine over here. Maybe you can try a different kernel and let us know if it's a problem with gentoo-sources-2.4.20?
Kernel on the gentoo client is 2.4.20-gentoo-r7.

Kernel parameters (via pxelinux):

root=/dev/nfs nfsroot=192.168.2.200:/home/nfsroot,v3,rw,posix,rsize=8192,wsize=8192,actimeo=300 ip=bootp

/etc/fstab:

192.168.2.200:/home/nfsroot / nfs rw,nfsvers=3,tcp,lock,intr,posix,actimeo=300,rsize=32768,wsize=32768
192.168.2.200:/home/nfsroot/usr /usr nfs rw,nfsvers=3,tcp,lock,intr,posix,actimeo=300,rsize=32768,wsize=32768
192.168.2.200:/home/nfsroot/home /home nfs rw,nfsvers=3,tcp,lock,intr,posix,actimeo=300,rsize=32768,wsize=32768
#/dev/SWAP none swap sw 0 0
/dev/cdroms/cdrom0 /mnt/cdrom iso9660 noauto,ro 0 0
# NOTE: The next line is critical for boot!
none /proc proc defaults 0 0
none /proc/bus/usb usbdevfs defaults 0 0
# glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
# POSIX shared memory (shm_open, shm_unlink).
none /dev/shm tmpfs defaults 0 0
none /tmp tmpfs defaults 0 0
# /var/run contains state only useful to a booted system
# i was having trouble with permissions on /var/run/.nscd_socket not
# being set to 0666 as nscd chmod'ed it to. hopefully this fixes that.
none /var/run tmpfs defaults 0 0

I'll attach the kernel config file.
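To double-check that the tmpfs entry for /var/run really is what ends up mounted, the mount table can be scanned for the mount point. A rough sketch follows; the helper name fstype_for is invented here, and it assumes the usual whitespace-separated layout (device, mount point, type, ...) shared by /etc/fstab and /proc/mounts.

```python
def fstype_for(mount_point, table_text):
    """Return the filesystem type of the last entry for mount_point, or None.

    `table_text` is the contents of an fstab-style table. The last matching
    entry wins, since a later mount covers an earlier one on the same path.
    """
    found = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            found = fields[2]
    return found
```

On a running client, fstype_for("/var/run", open("/proc/mounts").read()) should give "tmpfs" with the workaround in place, and "nfs" (or None, if /var/run is just part of the NFS root) without it.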
Created attachment 20790: kernel config on the nfsroot client
I'm compiling a 2.4.22-gentoo-sources kernel now and will let you know how it goes.
2.4.22-gentoo-r5 does have this bug.
but 2.4.22-gentoo (which no longer seems to be in portage) does not have this bug.
How about anything newer? Does the error still exist?
yay. 2.4.25-gentoo does not have this bug. i haven't tried anything 2.6 on this machine; i'll reopen this bug or file a new one for any 2.6 nfs issues.