198601 – sys/apps-baselayout-1: root filesystem won't remount read/write under nfs

Status:	RESOLVED WONTFIX

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] baselayout
Hardware:	All Linux

Importance:	High major
Assignee:	Gentoo's Team for Core System packages

Reported:	2007-11-09 21:48 UTC by Richard F. Ostrow Jr.
Modified:	2016-01-08 03:25 UTC
CC List:	11 users

Description Richard F. Ostrow Jr. 2007-11-09 21:48:15 UTC

Root filesystem won't remount in read/write mode under nfs (nfs root filesystem). The reason appears to be a failure to run rpc.statd, which is normally started by the nfs or nfsmount init.d scripts (both of which I have in the boot runlevel).

Reproducible: Always

Steps to Reproduce:
1. Set up a diskless system
2. Boot

Actual Results:  
Root filesystem fails to mount in read/write mode

Expected Results:  
Root filesystem should remount read/write and the system should boot

Digging around in the read-only filesystem, I manually ran:

mount -o remount,rw /

which produced this:

mount.nfs: rpc.statd is not running but is required for remote locking 
    Either use "-o nolocks" to keep locks local, or start statd

Which tells me that rpc.statd is no longer starting early enough to mount the root filesystem in read/write mode. This system has been running perfectly for several months until I updated it today.

I was able to get the system running normally by adding the nolock option in /etc/fstab, booting, removing the nolock option (after init had started rpc.statd), and running "mount -o remount,rw /". That keeps me running correctly until I reboot, at which point the system won't boot again.
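
The fstab half of that workaround can be sketched in shell. This is only an illustration, not part of baselayout: the function name add_nolock_to_root is hypothetical, and it assumes the root entry is an nfs or nfs4 line mounted on "/". It prints the modified fstab instead of editing the file, and it collapses the column whitespace of the line it changes.

```shell
# add_nolock_to_root FSTAB   (hypothetical helper, not part of baselayout)
# Print FSTAB with "nolock" appended to the options of an NFS entry
# mounted on "/". Lines that already carry nolock are left untouched.
add_nolock_to_root() {
    awk '
        $1 !~ /^#/ && $2 == "/" && ($3 == "nfs" || $3 == "nfs4") &&
        $4 !~ /(^|,)nolock(,|$)/ {
            $4 = $4 ",nolock"      # assigning a field rebuilds the line
        }
        { print }
    ' "$1"
}
```

Something like "add_nolock_to_root /etc/fstab > /tmp/fstab.new" would give a file to review before copying it into place; the later steps (removing nolock again and running the remount by hand) stay manual.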

Comment 1 Richard F. Ostrow Jr. 2007-11-16 15:48:12 UTC

I feel I should add some more info.

Contents of /etc/runlevels/boot:

alsasound  checkfs    clock        hostname  localmount  net.lo    rmnologin
bootmisc   checkroot  consolefont  keymaps   modules     nfsmount  urandom

Contents of /etc/runlevels/default:

local  netmount  rngd  sshd  syslog-ng  vixie-cron

/etc/runlevels/nonetwork:

local

/etc/runlevels/single:

*empty*

All these scripts contain the default values as of coreutils-6.9-r1 and baselayout-1.12.9-r2

Contents of /etc/fstab:

gorgon:/usr/diskless/phoenixtmp /       nfs             rsize=8192,wsize=8192,rw,tcp,noatime     0 0
none                    /var/run        tmpfs           defaults        0 0
none                    /tmp            tmpfs           defaults        0 0

kernel parameters:

APPEND ip=dhcp root=/dev/nfs nfsroot=10.4.12.1:/usr/diskless/phoenixtmp,rsize=8192,wsize=8192,ro

Comment 2 SpanKY gentoo-dev 2007-12-24 13:02:09 UTC

where/when do you see an error ?  checkroot will not attempt to even touch your /  if it is net based (like nfs)

Comment 3 Dan Farrell 2008-05-28 17:02:55 UTC

I've had this problem as well.  I was unable to solve it with baselayout2, which worked even worse (no write support on /dev, cannot create nodes, and manages to boot, but into a mostly unusable system).  

Is there something I can do to help start resolving this issue?

(In reply to comment #2)
> where/when do you see an error ?  checkroot will not attempt to even touch your /  if it is net based (like nfs)

baselayout-1 says "Remounting Root Filesystem Read/Write", hangs (waiting to talk to rpc.statd), and then fails.  If statd is started properly, before that point, everything goes fine.  

In baselayout-2, problems start before that.  Udev never gets write access to /dev/, and the computer can't delete or make /dev nodes.  I cleared out dev and deleted the files supposedly unnecessary to boot, but, unsurprisingly, that didn't solve the read-only /dev problem.

Comment 4 Stefan Behte (RETIRED) gentoo-dev 2008-06-01 21:24:58 UTC

Same problem over here after update from sys-apps/baselayout-1.12.9-r2 to sys-apps/baselayout-1.12.11.1!

SpanKY, you said: "checkroot will not attempt to even touch your /  if it is net based (like nfs)"
That's wrong, it DOES try to "mount -n -o remount,rw / &> /dev/null" after displaying "Remounting root filesystem read/write", which does not work.

The funny thing is: when the system cannot read /etc/fstab (e.g. if it does not exist), it boots just fine.
Adding "nolock" to the nfs mount options also worked!

Dan, you don't need nfs-utils to boot a diskless NFS system. I had it installed on one box (which stopped booting) and not on another, which still booted. After uninstalling nfs-utils from the first box, it booted again, but says:
 * Starting syslog-ng ... [ OK]
 * Starting portmap ... [ OK ]
rpc.statd
 * Error: Some services needed are missing. Run
 *        './netmount broken' for a list of those
 *        services. netmount was not started.
 * Starting sshd ... [ OK]

# /etc/init.d/netmount broken
rpc.statd

netmount has this in depend():

        local nfs_mounts=$(awk '!/^[[:space:]]*#/ && ($3=="nfs" || $3=="nfs4") && $4 !~ /\<(noauto|nolock)\>/ { print $0 }' /etc/fstab)
        if [[ -n ${nfs_mounts} ]] ; then
                myneed="${myneed} portmap rpc.statd"
        else
                myuse="${myuse} portmap rpc.statd"
        fi
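
To see which fstab lines that test selects, the filter can be restated as a small standalone function. This is a rough, portable approximation rather than the netmount code itself: the name nfs_needs_statd is made up for illustration, and it compares whole comma-separated options instead of using the \< and \> word-boundary operators of the original.

```shell
# nfs_needs_statd FSTAB   (hypothetical name, approximates the netmount test)
# Print the uncommented nfs/nfs4 entries whose options contain neither
# noauto nor nolock, i.e. the entries that make netmount "need" rpc.statd.
nfs_needs_statd() {
    awk '
        $1 ~ /^#/ { next }                       # skip comment lines
        $3 != "nfs" && $3 != "nfs4" { next }     # only NFS entries
        {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto" || opts[i] == "nolock")
                    next                         # no remote locking wanted
            print
        }
    ' "$1"
}
```

If "nfs_needs_statd /etc/fstab" prints anything, the depend() logic above puts portmap and rpc.statd into myneed; if it prints nothing, they only end up in myuse.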


This seems to be related to:
http://bugs.gentoo.org/show_bug.cgi?id=186542#c10

Comment 5 Stefan Behte (RETIRED) gentoo-dev 2008-06-01 21:29:25 UTC

BTW: I do not have *any* nfs scripts in the default or boot runlevel.

Comment 6 Richard F. Ostrow Jr. 2008-06-04 14:29:15 UTC

> Same problem over here after update from sys-apps/baselayout-1.12.9-r2 to
> sys-apps/baselayout-1.12.11.1!
> SpanKY, you said: "checkroot will not attempt to even touch your
> /  if it is net based (like nfs)"
> That's wrong, it DOES try to "mount -n -o remount,rw / &> /dev/null" after
> displaying "Remounting root filesystem read/write", which does not work.
> The funny thing is: when the system cannot read /etc/fstab (e.g. if it does not
> exist), it boots just fine.

Hmm... maybe your suggestion of removing the mount points entirely from the fstab would work for me. I've been wrestling with this for around 7 months now, avoiding a reboot because the machine will not come back up without the manual intervention I described in my original report. Obviously not a desired final solution, but it would likely help me to automate my boot process a bit.

> Adding "nolock" to the nfs mount options also worked!

But do you really want a filesystem that is not capable of lock-file support? You're asking for all sorts of problems there...

Comment 7 drcoolsanta 2008-12-29 14:42:51 UTC

I had the same problem while trying to set up a network of diskless nodes. I realised that the root became read-only and that, because of that, rpc.statd didn't work. I also see that you mount the NFS / read-only; that is what prevents rpc.statd from running.

I fixed it by adding rw to kernel arguments. You should also check that the filesystem is not read-only in /etc/fstab and /etc/exports on the appropriate machines.
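
The "rw" meant here is a standalone word on the kernel command line, not an option inside nfsroot=... A minimal sketch of telling the two apart follows. The function name root_flag is hypothetical, and the fallback to "ro" when neither word is present is an assumption of this sketch.

```shell
# root_flag CMDLINE   (hypothetical helper)
# Print "rw" or "ro" according to the last standalone rw/ro word in
# CMDLINE. Options embedded in nfsroot=... are deliberately ignored.
root_flag() (
    set -f                 # no filename globbing while splitting words
    flag=ro                # assumption: read-only if neither word appears
    for word in $1; do
        case "$word" in
            rw|ro) flag=$word ;;
        esac
    done
    printf '%s\n' "$flag"
)
```

On a running system, root_flag "$(cat /proc/cmdline)" shows what the booted kernel was given. For the APPEND line in comment 1 it prints ro, because the only "ro" there is part of the nfsroot= value.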

Comment 8 Vic Cross 2009-01-05 12:34:58 UTC

> Hmm... maybe your suggestion of removing the mount points entirely from the
> fstab would work for me.

I can confirm both the original bug and this workaround.

I have a system with a NFS-root that was working fine some months ago (sorry, not much more detail than that, can find out the last time it was booted if it's important), and stopped working with the "read only filesystem" problem after a recent update.  It started working again after commenting-out the line in /etc/fstab that represents the root filesystem.

Comment 9 Richard F. Ostrow Jr. 2009-01-13 23:25:44 UTC

Ok... I just updated that system again (after over a year running with the nolock option and monthly updates), and now it won't remount the root filesystem read/write.

fstab looks like:

gorgon:/usr/diskless/madusa     /       nfs             rsize=8192,wsize=8192,nolock,tcp,noatime       0 0

This causes the machine to have a read-only root filesystem, which wreaks all sorts of havoc with me. I managed to manually bring the thing up for now, but I need to manually start all my services because they all fail due to the root being read-only... really annoying.

Comment 10 Ivo Steinmann 2009-01-20 23:29:14 UTC

Same problem here: my diskless systems worked for over 2 years and now suddenly stopped. I can't boot any of them. One solution is to remove / from fstab and mtab. This way I can boot two of the diskless clients now. A third one is still not booting.

Comment 11 Douglas Paul 2009-01-23 00:20:43 UTC

Is it possible that some of you are running into this bug now, which isn't related to the nolock problem?

http://bugs.gentoo.org/show_bug.cgi?id=252977

Comment 12 Dan Farrell 2009-01-23 00:39:03 UTC

Yes, that appears to be the root of my issue. rpc.statd won't start because it needs to write to /var, but root can't be remounted RW until rpc.statd starts. But that situation must have been exposed by the change in the behavior of mount -f, as the other bug says.

Personally I have never had an issue with locking on these hosts. My solution was to symlink the file in /var to a place in /dev which was already RW, but a better workaround is to change fstab to /dev/ROOT, which worked for a new install this January as well as one I did in late September.

Comment 13 Ivo Steinmann 2009-01-24 01:44:12 UTC

This solved all my problems:

kernel:
root=/dev/nfs nfsroot=HOST_IP:NFS_ROOT,rsize=8192,wsize=8192,hard,intr,nfsvers=3

fstab:
/dev/nfs   /   none   rw,noatime   0 0


maybe the solution is still wrong, but at least I can boot all my machines now without any problems or errors at startup

Comment 14 thomas 2009-03-10 19:09:34 UTC

After updating from 2.6.25 to 2.6.27, remounting NFS RW does not work anymore.

The solution from comment 13 works basically, but I can't specify NFS options as boot params. If I add NFS options after the NFS root dir, it thinks the options are part of the root dir path and the server says access denied! I am using pxelinux. It should work, but it does not.

So, if I add no NFS options in the boot params and use the suggested entry in fstab, I can boot, but then I have default NFS options. If I add more options in fstab, remounting fails again.
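
For reference, the kernel's nfsroot documentation gives the value the form nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>], so everything after the first comma is meant to be read as options. The sketch below only shows that intended split; the function name split_nfsroot is made up, and it does not explain why the options ended up in the path in this pxelinux setup.

```shell
# split_nfsroot VALUE   (hypothetical helper)
# Print the export part (server:path) on line 1 and the option list on
# line 2, splitting VALUE at its first comma.
split_nfsroot() {
    value=$1
    export_part=${value%%,*}            # text before the first comma
    case "$value" in
        *,*) opts=${value#*,} ;;        # text after the first comma
        *)   opts= ;;                   # no options given
    esac
    printf '%s\n%s\n' "$export_part" "$opts"
}
```

Comparing the first output line with the path the server logs as denied may help show whether the options were passed through as part of the directory name.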

Comment 15 nicolas fischer 2009-09-24 14:23:57 UTC

> I fixed it by adding rw to kernel arguments. 

Wow, that fixed the problem for me after about 1 year, thanks. I never knew there was a kernel option "rw", and I almost missed your point, thinking you were referring to the rw option in the nfsroot=... kernel argument (which didn't help in my case).

Comment 16 thomas 2009-09-24 17:12:08 UTC

Adding rw as kernel boot parameter did not fix it for me.

Comment 17 Stefan Behte (RETIRED) gentoo-dev 2009-10-15 23:07:23 UTC

The bug is still there and prevents me e.g. from starting apache:

# /etc/init.d/apache2 start
 * Caching service dependencies ...
 *  Can't find service 'rpc.statd' needed by 'netmount';  continuing... [ ok ]
rpc.statd
 * ERROR:  Some services needed are missing.  Run
 *         './netmount broken' for a list of those
 *         services.  netmount was not started.


/etc/init.d/netmount:
        local nfs_mounts=$(awk '!/^[[:space:]]*#/ && ($3=="nfs" || $3=="nfs4") && $4 !~ /\<(noauto|nolock)\>/ { print $0 }' /etc/fstab)

        if [[ -n ${nfs_mounts} ]] ; then
                myneed="${myneed} portmap rpc.statd"
        else
                myuse="${myuse} portmap rpc.statd"
        fi


If there are nfs mounts, it will try to start rpc.statd - which I don't have, as I didn't even install nfs-utils (no need - I'm booting via PXE/NFS).

When booting via PXE from NFS, it does not really matter if fstab has the fs mountpoint, as it's already mounted! So removing it from /etc/fstab does the trick, but I think it's an ugly hack (and it won't fix problems for non-nfsroot users).
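
Whether / is already NFS-mounted, and in which mode, can be read from the kernel's mount table rather than from fstab. A small sketch, assuming /proc/mounts is available; the function name root_mount_state is hypothetical. It reports the last entry mounted on "/", since an initial rootfs line may precede the real one.

```shell
# root_mount_state [MOUNTS_FILE]   (hypothetical helper)
# Print "<fstype> <ro|rw>" for the last entry mounted on "/".
# MOUNTS_FILE defaults to /proc/mounts.
root_mount_state() {
    awk '
        $2 == "/" {
            fstype = $3
            mode = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") mode = "ro"
        }
        END { if (fstype != "") print fstype, mode }
    ' "${1:-/proc/mounts}"
}
```

Run without arguments on the diskless client, it shows at a glance whether the remount to read/write actually happened.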

If I emerge net-fs/nfs-utils-1.1.4-r1, rpc.statd gets used - and the system is fscked up, everything is mounted ro. It's funny that this happens very early:

Mounting proc at /proc ... [ ok ]
Mounting sysfs at /sys ... [ ok ]
Mounting /dev              [ ok ]
Starting udevd ... [ ok ]
Populating /dev/ with existing devices through uvents ... [ ok ]
Waiting for uevents to be processed ... [ ok ]
Mounting devpts at /dev/pts ... [ ok ]
Skipping /etc/mtab initialization (ro root?) 
You must be root to do this
Checking all filesystems ... [ ok ]
[...]
Configurating kernel parameters ... [ ok ]
Skipping /var and /tmp initialization (ro root?)
/sbin/rc: line511: /var/lib/init.d/softlevel: Read-only file system
Could not create needed directory '/var/lib/init.d/softscripts'
[...lots of mount/ro errors...]


"The rpc.statd server implements the NSM (Network Status Monitor) RPC protocol. This service is somewhat misnomed, since it doesn't actually provide active monitoring as one might suspect; instead, NSM implements a reboot notification service. It is used by the NFS file locking service, rpc.lockd, to implement lock recovery when the NFS server machine crashes and reboots."

http://linux.about.com/library/cmd/blcmdl8_rpc.statd.htm

The rpc.lockd program starts the NFS lock manager (NLM) on kernels that don't start it automatically. However, since most kernels do start it automatically, rpc.lockd is usually not required. Even so, running it anyway is harmless.

http://linux.about.com/library/cmd/blcmdl8_rpc.lockd.htm


net-fs/nfs-utils installs:

   usr/sbin/rpc.mountd
   usr/sbin/rpc.nfsd
   usr/sbin/showmount
   usr/sbin/sm-notify
   usr/sbin/rpc.idmapd
   usr/sbin/rpcdebug
   usr/sbin/exportfs
   usr/sbin/nfsstat
   sbin/mount.nfs
   sbin/rpc.statd

So we don't have rpc.lockd anyways! So what do we need rpc.statd for?!

BTW: Aren't those two obsolete? rpc.statd does not seem to be needed or used at all. A field test (I accidentally rebooted my development NFS server while 3 NFS clients were doing emerge -uD world) shows that the locks are still recovered: when the server came back up, the clients continued updating, and there were some hints in my /var/log/messages that the connection to the NFS server was recovered. I'm not an NFS expert, but I think we don't need rpc.statd.

I guess the whole issue still needs some research...

Comment 18 Tiago Marques 2010-03-30 01:40:35 UTC

Hi,

I'm having this problem with several clients booting the same NFS install: the clients that boot run a 2.6.26-gentoo-r4 kernel, and the ones that fail with that same error run 2.6.31-gentoo-r6.

Also, I have noticed this in the 2.6.26-gentoo-r4 boxes:

 - installing the net-libs/libnfsidmap, dev-libs/libevent, net-nds/portmap and nfs-utils packages breaks the mounts: they all become read-only.
 - it doesn't allow me to mount a directory on the root's /opt/mountpoint without installing those packages.

One of these kernels seems to have some kind of breakage, I'm not sure which one.

Comment 19 Tiago Marques 2010-04-13 15:50:17 UTC

Just confirmed that kernel 2.6.26-gentoo-r4 works fine, but even 2.6.27 doesn't.

Comment 20 Dan Zoltak 2010-10-06 08:23:32 UTC

I'm using Dracut to boot an NFS root. Dracut mounts /sysroot rw, but after the pivot, just after it runs udev, the NFS root becomes read-only.

Could this have something to do with udev?

Comment 21 Dan Zoltak 2010-10-06 08:32:29 UTC

Enabling NFS4 on the server seems to have fixed the issue. Now the root fs is mounted rw!

Comment 22 Dan Zoltak 2010-10-06 08:46:27 UTC

Just found out that this only works when using nfsroot=... and root=/dev/nfs

Which is deprecated :( on Kernel 2.6.34-gentoo-r1 

Anyone having any luck with this?

Comment 23 SpanKY gentoo-dev 2016-01-08 03:25:42 UTC

baselayout-1 is no longer in the tree