Bug#: 198601
Alias:
Product:
Component:
Status: ASSIGNED
Resolution:
Assigned To: Gentoo's Team for Core System packages <base-system@gentoo.org>
Hardware:
OS:
Version:
Priority:
Severity:
Reporter: Richard F. Ostrow Jr. <kshots@warfaresdl.com>
URL:
Summary:
Status Whiteboard:
Keywords:

Votes: 0



Description:   Opened: 2007-11-09 21:48 0000
The root filesystem won't remount in read/write mode when it is an NFS root.
The cause appears to be that rpc.statd is not running; it is normally started
by the nfs or nfsmount init.d scripts (both of which I have at the boot
runlevel).

Reproducible: Always

Steps to Reproduce:
1. Set up a diskless system
2. Boot

Actual Results:  
Root filesystem fails to mount in read/write mode

Expected Results:  
Root filesystem should be remounted read/write and the system should boot

Digging around in the read-only filesystem, I manually ran:

mount -o remount,rw /

which produced this:

mount.nfs: rpc.statd is not running but is required for remote locking 
    Either use "-o nolocks" to keep locks local, or start statd

Which tells me that rpc.statd is no longer starting early enough to mount the
root filesystem in read/write mode. This system has been running perfectly for
several months until I updated it today.

I was able to get the system running normally by adding the nolock option in
/etc/fstab, booting, removing the nolock option (after init loaded rpc.statd),
and running "mount -o remount,rw /". That will keep me running correctly
until I reboot, at which point it won't boot again.
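
The first and third steps of that workaround amount to toggling one entry in
the fstab option field. A minimal sketch, assuming hypothetical helper names
(add_nolock and remove_nolock are not part of baselayout or nfs-utils); it only
manipulates option strings and never touches /etc/fstab itself:

```shell
# Hypothetical helpers: toggle "nolock" in a comma-separated mount option string.

# Append "nolock" unless it is already present.
add_nolock() {
    if [ -z "$1" ]; then
        printf '%s\n' "nolock"
        return
    fi
    case ",$1," in
        *,nolock,*) printf '%s\n' "$1" ;;
        *)          printf '%s\n' "$1,nolock" ;;
    esac
}

# Drop every "nolock" entry and keep the remaining options in order.
remove_nolock() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -vx 'nolock' | paste -sd, -
}
```

For example, add_nolock "rsize=8192,wsize=8192,rw,tcp,noatime" yields the
option field to boot with, and remove_nolock restores the original once
rpc.statd is up.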

------- Comment #1 From Richard F. Ostrow Jr. 2007-11-16 15:48:12 0000 -------
I feel I should add some more info.

Contents of /etc/runlevels/boot:

alsasound  checkfs    clock        hostname  localmount  net.lo    rmnologin
bootmisc   checkroot  consolefont  keymaps   modules     nfsmount  urandom

Contents of /etc/runlevels/default:

local  netmount  rngd  sshd  syslog-ng  vixie-cron

/etc/runlevels/nonetwork:

local

/etc/runlevels/single:

*empty*

All these scripts contain the default values as of coreutils-6.9-r1 and
baselayout-1.12.9-r2

Contents of /etc/fstab:

gorgon:/usr/diskless/phoenixtmp /       nfs             rsize=8192,wsize=8192,rw,tcp,noatime     0 0
none                    /var/run        tmpfs           defaults        0 0
none                    /tmp            tmpfs           defaults        0 0

kernel parameters:

APPEND ip=dhcp root=/dev/nfs nfsroot=10.4.12.1:/usr/diskless/phoenixtmp,rsize=8192,wsize=8192,ro

------- Comment #2 From SpanKY 2007-12-24 13:02:09 0000 -------
where/when do you see an error ?  checkroot will not attempt to even touch your
/  if it is net based (like nfs)

------- Comment #3 From Dan Farrell 2008-05-28 17:02:55 0000 -------
I've had this problem as well.  I was unable to solve it with baselayout2,
which worked even worse (no write support on /dev, cannot create nodes, and
manages to boot, but into a mostly unusable system).  

Is there something I can do to help start resolving this issue?

(In reply to comment #2)
> where/when do you see an error ?  checkroot will not attempt to even touch your /  if it is net based (like nfs)

baselayout-1 says "Remounting Root Filesystem Read/Write", hangs (waiting to
talk to rpc.statd), and then fails.  If statd is started properly, before that
point, everything goes fine.  

In baselayout-2, problems start before that.  Udev never gets write access to
/dev/, and the computer can't delete or make /dev nodes.  I cleared out dev and
deleted the files supposedly unnecessary to boot, but, unsurprisingly, that
didn't solve the read-only /dev problem.  

------- Comment #4 From Stefan Behte 2008-06-01 21:24:58 0000 -------
Same problem over here after update from sys-apps/baselayout-1.12.9-r2 to
sys-apps/baselayout-1.12.11.1!

SpanKY, you said: "checkroot will not attempt to even touch your
/  if it is net based (like nfs)"
That's wrong, it DOES try to "mount -n -o remount,rw / &> /dev/null" after
displaying "Remounting root filesystem read/write", which does not work.

The funny thing is: when the system cannot read /etc/fstab (e.g. if it does not
exist), it boots just fine.
Adding "nolock" to the nfs mount options also worked!

Dan, you don't need nfs-utils for booting a diskless nfs system; I had them
installed for one box (which stopped booting) and on another one (without), the
systems still booted. After uninstalling nfs-utils from the 1st box, it booted
again, but says:
 * Starting syslog-ng ... [ OK]
 * Starting portmap ... [ OK ]
rpc.statd
 * Error: Some services needed are missing. Run
 *        './netmount broken' for a list of those
 *        services. netmount was not started.
 * Starting sshd ... [ OK]

# /etc/init.d/netmount broken
rpc.statd

netmount has this in depend():

        local nfs_mounts=$(awk '!/^[[:space:]]*#/ && ($3=="nfs" || $3=="nfs4") && $4 !~ /\<(noauto|nolock)\>/ { print $0 }' /etc/fstab)
        if [[ -n ${nfs_mounts} ]] ; then
                myneed="${myneed} portmap rpc.statd"
        else
                myuse="${myuse} portmap rpc.statd"
        fi
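
The effect of that filter on a given fstab can be checked outside the init
script. The following is a sketch, not the netmount code: the function names
are hypothetical, and the option match is restated with split() instead of the
\< \> word-boundary operators so that it does not depend on a particular awk
implementation.

```shell
# Print fstab entries of type nfs/nfs4 that carry neither "noauto" nor "nolock".
# Usage: nfs_mounts_needing_statd /path/to/fstab   (hypothetical name)
nfs_mounts_needing_statd() {
    awk '
        /^[ \t]*#/ { next }                      # skip comment lines
        $3 == "nfs" || $3 == "nfs4" {
            n = split($4, opts, ",")             # one array entry per mount option
            skip = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto" || opts[i] == "nolock") skip = 1
            if (!skip) print
        }
    ' "$1"
}

# Mirror the branch in the excerpt: a hard "need" when such a mount exists,
# otherwise a soft "use".
statd_dependency() {
    if [ -n "$(nfs_mounts_needing_statd "$1")" ]; then
        echo need
    else
        echo use
    fi
}
```

Run against an fstab that lists the NFS root without nolock, statd_dependency
prints "need", which matches the reports that netmount refuses to start when
rpc.statd is not installed. With nolock added, or with the entry removed, it
prints "use".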


This seems to be related to:
http://bugs.gentoo.org/show_bug.cgi?id=186542#c10

------- Comment #5 From Stefan Behte 2008-06-01 21:29:25 0000 -------
BTW: I do not have *any* nfs scripts in the default or boot runlevel.

------- Comment #6 From Richard F. Ostrow Jr. 2008-06-04 14:29:15 0000 -------
> Same problem over here after update from sys-apps/baselayout-1.12.9-r2 to
> sys-apps/baselayout-1.12.11.1!
> SpanKY, you said: "checkroot will not attempt to even touch your
> /  if it is net based (like nfs)"
> That's wrong, it DOES try to "mount -n -o remount,rw / &> /dev/null" after
> displaying "Remounting root filesystem read/write", which does not work.
> The funny thing is: when the system cannot read /etc/fstab (e.g. if it does not
> exist), it boots just fine.

Hmm... maybe your suggestion of removing the mount points entirely from the
fstab would work for me. I've been wrestling with this for around 7 months now,
avoiding a reboot because the machine will not come back up without manual
intervention as I described in my original report. Obviously not a desired final
solution, but it would likely help me to automate my boot process a bit.

> Adding "nolock" to the nfs mount options also worked!

But do you really want a filesystem that is not capable of lock-file support?
You're asking for all sorts of problems there...

------- Comment #7 From drcoolsanta@gmail.com 2008-12-29 14:42:51 0000 -------
Well, I had the same problem while trying to create a network of diskless nodes.
I realised that the root became read-only and because of that rpc.statd
didn't work. I also noticed that you mount the NFS / as read-only; that is what
prevents rpc.statd from running.

I fixed it by adding rw to the kernel arguments. You should also check that the
filesystem is not read-only in /etc/fstab and /etc/exports on the appropriate
machines.

------- Comment #8 From Vic Cross 2009-01-05 12:34:58 0000 -------
> Hmm... maybe your suggestion of removing the mount points entirely from the
> fstab would work for me.

I can confirm both the original bug and this workaround.

I have a system with a NFS-root that was working fine some months ago (sorry,
not much more detail than that, can find out the last time it was booted if
it's important), and stopped working with the "read only filesystem" problem
after a recent update.  It started working again after commenting-out the line
in /etc/fstab that represents the root filesystem.
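
That workaround can be previewed before editing anything. A minimal sketch,
assuming a hypothetical helper name (comment_out_root); it prints the modified
table to stdout and leaves the input file untouched:

```shell
# Print an fstab with every active entry whose mount point is "/" commented out.
# Usage: comment_out_root /path/to/fstab   (hypothetical name)
comment_out_root() {
    awk '
        /^[ \t]*#/ { print; next }          # keep existing comments unchanged
        $2 == "/"  { print "#" $0; next }   # disable the root entry
        { print }                           # pass everything else through
    ' "$1"
}
```

Calling comment_out_root /etc/fstab and comparing the result with the
original shows exactly which line would be disabled.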

------- Comment #9 From Richard F. Ostrow Jr. 2009-01-13 23:25:44 0000 -------
Ok... I just updated that system again (after over a year running with the
-nolock option and monthly updates), and now it won't remount the root
filesystem read/write

fstab looks like:

gorgon:/usr/diskless/madusa     /       nfs            
rsize=8192,wsize=8192,nolock,tcp,noatime       0 0

This causes the machine to have a read-only root filesystem, which wreaks all
sorts of havoc with me. I managed to manually bring the thing up for now, but I
need to manually start all my services because they all fail due to the root
being read-only... really annoying.

------- Comment #10 From Ivo Steinmann 2009-01-20 23:29:14 0000 -------
Same problem here. My diskless systems worked for over 2 years and now they
suddenly stopped; I can't boot any of them. One solution is to remove / from
fstab and mtab. This way I can boot two of the diskless clients now. A 3rd one
is still not booting.

------- Comment #11 From Douglas Paul 2009-01-23 00:20:43 0000 -------
Is it possible that some of you are running into this bug now, which isn't
related to the nolock problem?

http://bugs.gentoo.org/show_bug.cgi?id=252977

------- Comment #12 From Dan Farrell 2009-01-23 00:39:03 0000 -------
Yes, that appears to be the root of my issue. rpc.statd won't start because it
needs to write to /var, but root can't be remounted RW until rpc.statd starts.
But that situation must have been exposed by the change in the behavior of
mount -f, as the other bug says.

Personally I have never had an issue with locking on these hosts.  My solution
was to symlink the file in var to a place in /dev which was RW already, but a
better workaround is to change fstab to /dev/ROOT, which worked for a new
install this January as well as one I did in late September.

------- Comment #13 From Ivo Steinmann 2009-01-24 01:44:12 0000 -------
This solved all my problems:

kernel:
root=/dev/nfs
nfsroot=HOST_IP:NFS_ROOT,rsize=8192,wsize=8192,hard,intr,nfsvers=3

fstab:
/dev/nfs   /   none   rw,noatime   0 0


maybe the solution is still wrong, but at least I can boot all my machines now
without any problems or errors at startup

------- Comment #14 From thomas@boerkel.de 2009-03-10 19:09:34 0000 -------
After updating from 2.6.25 to 2.6.27, remounting NFS RW does not work anymore.

The solution from comment 13 works basically, but I can't specify NFS options
as boot params. If I add NFS options after the NFS root dir, it thinks the
options are part of the root dir path and the server says access denied! I am
using pxelinux. It should work, but it does not.

So, if I add no NFS options in the boot params and use the suggested entry in
fstab, I can boot, but then I have default NFS options. If I add more options
in fstab, remounting fails again.

------- Comment #15 From nicolas fischer 2009-09-24 14:23:57 0000 -------
> I fixed it by adding rw to kernel arguments. 

Wow, that fixed the problem after about 1 year for me, thanks. I never knew
there was a kernel option "rw", and I almost missed your point, thinking you
were referring to the rw option in the nfsroot=... kernel argument (which
didn't help in my case).
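
The distinction drawn here, a standalone rw token versus rw inside the
nfsroot= option list, can be checked against a command line string. This is a
sketch only: the function name is hypothetical, and treating the last
standalone ro/rw token as the effective one is an assumption, not something
stated in this report.

```shell
# Report the root mount mode requested by standalone ro/rw tokens on a
# kernel command line. Tokens embedded in nfsroot=... are deliberately ignored.
# Assumption: when both ro and rw appear, the last one is taken as effective.
root_mode_from_cmdline() {
    mode=unset
    for word in $1; do            # unquoted on purpose: split on whitespace
        case "$word" in
            ro|rw) mode=$word ;;
        esac
    done
    printf '%s\n' "$mode"
}
```

On a running system it could be called as
root_mode_from_cmdline "$(cat /proc/cmdline)"; for the kernel parameters
quoted in comment #1 it prints "unset", because the only ro there is part of
the nfsroot= value.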

------- Comment #16 From thomas@boerkel.de 2009-09-24 17:12:08 0000 -------
Adding rw as kernel boot parameter did not fix it for me.

------- Comment #17 From Stefan Behte 2009-10-15 23:07:23 0000 -------
The bug is still there and prevents me e.g. from starting apache:

# /etc/init.d/apache2 start
 * Caching service dependencies ...
 *  Can't find service 'rpc.statd' needed by 'netmount';  continuing...    [ ok ]
rpc.statd
 * ERROR:  Some services needed are missing.  Run
 *         './netmount broken' for a list of those
 *         services.  netmount was not started.


/etc/init.d/netmount:
        local nfs_mounts=$(awk '!/^[[:space:]]*#/ && ($3=="nfs" || $3=="nfs4") && $4 !~ /\<(noauto|nolock)\>/ { print $0 }' /etc/fstab)

        if [[ -n ${nfs_mounts} ]] ; then
                myneed="${myneed} portmap rpc.statd"
        else
                myuse="${myuse} portmap rpc.statd"
        fi


If there are nfs mounts, it will try to start rpc.statd - which I don't have,
as I didn't even install nfs-utils (no need - I'm booting via PXE/NFS).

When booting via PXE from NFS, it does not really matter if fstab has the fs
mountpoint, as it's already mounted! So removing it from /etc/fstab does the
trick, but I think it's an ugly hack (and it won't fix problems for non-nfsroot
users).

If I emerge net-fs/nfs-utils-1.1.4-r1, rpc.statd gets used - and the system is
fscked up, everything is mounted ro. It's funny that this happens very early:

Mounting proc at /proc ... [ ok ]
Mounting sysfs at /sys ... [ ok ]
Mounting /dev              [ ok ]
Starting udevd ... [ ok ]
Populating /dev/ with existing devices through uevents ... [ ok ]
Waiting for uevents to be processed ... [ ok ]
Mounting devpts at /dev/pts ... [ ok ]
Skipping /etc/mtab initialization (ro root?) 
You must be root to do this
Checking all filesystems ... [ ok ]
[...]
Configurating kernel parameters ... [ ok ]
Skipping /var and /tmp initialization (ro root?)
/sbin/rc: line511: /var/lib/init.d/softlevel: Read-only file system
Could not create needed directory '/var/lib/init.d/softscripts'
[...lots of mount/ro errors...]


"The rpc.statd server implements the NSM (Network Status Monitor) RPC protocol.
This service is somewhat misnomed, since it doesn't actually provide active
monitoring as one might suspect; instead, NSM implements a reboot notification
service. It is used by the NFS file locking service, rpc.lockd, to implement
lock recovery when the NFS server machine crashes and reboots."

http://linux.about.com/library/cmd/blcmdl8_rpc.statd.htm

"The rpc.lockd program starts the NFS lock manager (NLM) on kernels that don't
start it automatically. However, since most kernels do start it automatically,
rpc.lockd is usually not required. Even so, running it anyway is harmless."

http://linux.about.com/library/cmd/blcmdl8_rpc.lockd.htm


net-fs/nfs-utils installs:

   usr/sbin/rpc.mountd
   usr/sbin/rpc.nfsd
   usr/sbin/showmount
   usr/sbin/sm-notify
   usr/sbin/rpc.idmapd
   usr/sbin/rpcdebug
   usr/sbin/exportfs
   usr/sbin/nfsstat
   sbin/mount.nfs
   sbin/rpc.statd

So we don't have rpc.lockd anyway! So what do we need rpc.statd for?!

BTW: Aren't those two obsolete? rpc.statd does not seem to be needed or used at
all. A field test (I accidentally rebooted my development NFS server while 3
NFS clients were doing emerge -uD world) shows that the locks are still
recovered: when the server came up, the clients continued updating and there
were some hints in my /var/log/messages that the connection to the NFS server
was recovered. I'm not an NFS expert, but I think we don't need rpc.statd.

I guess the whole issue still needs some research...
