Bug 688644 - net-fs/nfs-utils-2.4.1 problems with statx
Summary: net-fs/nfs-utils-2.4.1 problems with statx
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal with 1 vote
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard: Waiting for upstream fix/approval
Keywords: InVCS
Duplicates: 701192
Depends on:
Blocks:
 
Reported: 2019-06-24 20:49 UTC by Andreas Steinmetz
Modified: 2022-03-27 03:20 UTC (History)
CC List: 9 users

See Also:
Package list:
Runtime testing required: ---


Attachments
workaround patch (nfs-utils.patch,299 bytes, patch)
2019-06-24 20:49 UTC, Andreas Steinmetz
Fix-incorrect-order-between-config.h-and-errno.h (nfs-utils-2.4.1-Fix-incorrect-order-between-config.h-and-errno.h.patch,805 bytes, patch)
2019-11-19 15:36 UTC, Gordon Bos

Description Andreas Steinmetz 2019-06-24 20:49:59 UTC
Created attachment 580728
workaround patch

On a system with sys-libs/glibc-2.29-r2 and kernel 4.9.128 nfs v3 mount fails as statx() with mask=STATX_BASIC_STATS returns EINVAL, probably from glibc, as strace of rpc.mountd shows no system call.
The attached patch is at least a workaround that re-enables nfs v3 mounts.
Comment 1 Andreas Steinmetz 2019-06-24 23:19:17 UTC
Sorry s/4.9.128/4.9.182/
Comment 2 Lars Wendler (Polynomial-C) (RETIRED) gentoo-dev 2019-06-25 10:05:48 UTC
Does upstream know about this issue?
Comment 3 Andreas Steinmetz 2019-06-25 12:02:11 UTC
No.
Comment 4 Sven E. 2019-08-09 18:39:44 UTC
I am not sure if this is related: when using nfs-utils 2.4.1, the server (rpc.mountd) complains that it cannot stat the exported directory. Older nfs-utils versions, however, work like a charm (server side).

Kernel on the server is 4.9.124 and server is running sys-libs/glibc-2.29-r3.
Comment 5 Matt Turner gentoo-dev 2019-09-02 06:11:09 UTC
We are not experts on the nfs-utils code. Please report this upstream. If they accept the patch, we will happily apply it to Gentoo.
Comment 6 Uros 2019-09-18 19:34:55 UTC
After upgrade to recently stabilized net-fs/nfs-utils-2.4.1-r1, my clients were unable to mount nfs v3 exports with the following error in server logs:

rpc.mountd: can't stat exported dir /shared/folder: Invalid argument

Downgrading to net-fs/nfs-utils-2.3.3 and everything works as expected.

I've created new bug upstream https://bugzilla.kernel.org/show_bug.cgi?id=204911 with link back to this.
Comment 7 Robert Gutermuth 2019-09-25 14:36:32 UTC
(In reply to Uros from comment #6)
> After upgrade to recently stabilized net-fs/nfs-utils-2.4.1-r1, my clients
> were unable to mount nfs v3 exports with the following error in server logs:
> 
> rpc.mountd: can't stat exported dir /shared/folder: Invalid argument
> 
> Downgrading to net-fs/nfs-utils-2.3.3 and everything works as expected.
> 
> I've created new bug upstream
> https://bugzilla.kernel.org/show_bug.cgi?id=204911 with link back to this.

Comment 6 report confirmed on my end.  Upgrade to nfs-utils 2.4.1-r1 on server resulted in banned client (both are amd64) on NFS v3 connection.  Backing off to 2.3.3 on server enabled client to reconnect.  Also verified that the client's nfs-utils version (2.3.3 vs 2.4.1-r1) does not seem to matter.
Comment 8 Mike Gilbert gentoo-dev 2019-09-25 15:29:36 UTC
I believe this is the bug tracker for nfs-utils.

https://bugzilla.linux-nfs.org/
Comment 9 Tom Dexter 2019-09-28 07:44:53 UTC
I don't know if this is related, but I just spent the better part of a day trying to figure out why my NFS 3 mounts were working on 32-bit x86. No matter what I did I got "mount.nfs: Stale file handle" failures. Downgrading to 2.3.3 fixed that as well.
Comment 10 Tom Dexter 2019-09-28 07:47:12 UTC
Obviously "were working" was intended to be "weren't working" above.
Comment 11 Tom Dexter 2019-09-28 13:55:48 UTC
OK. I can confirm that adding the new nfs-utils-2.4.1-Fix-include-order-between-config.h-and-stat.h patch (as is used in -r2) to my existing nfs-utils-2.4.1-r1 on the server side does in fact correct this.

In my case again, this is 32-bit x86 (both sides). I see that the fix is apparently related to large file support, and my export does in fact have files in excess of 3 GB.
Comment 12 Larry the Git Cow gentoo-dev 2019-09-28 14:53:12 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=baf42f13f34565100707f3fcba17fa2f0d0a0403

commit baf42f13f34565100707f3fcba17fa2f0d0a0403
Author:     Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2019-09-28 14:43:35 +0000
Commit:     Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2019-09-28 14:45:41 +0000

    net-fs/nfs-utils: move stable keywords
    
    Closes: https://bugs.gentoo.org/688644
    Package-Manager: Portage-2.3.76, Repoman-2.3.17
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 net-fs/nfs-utils/nfs-utils-2.4.1-r2.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 13 Lars Langhans 2019-10-07 18:15:50 UTC
Hi all,

I would like to confirm the attached 'workaround patch' on my Gentoo-Linux with kernel 4.9.193, glibc 2.29-r2. It works as expected. Other Gentoo-box with kernel 5.3.4 can mount the filesystem.

With the current nfs-utils-2.4.1-r2 it does not work. Please create a 2.4.1-r3 revision with this patch included.

Thanks a lot and Kind regards
Lars
Comment 14 Thomas Deutschmann (RETIRED) gentoo-dev 2019-10-07 19:09:22 UTC
=net-fs/nfs-utils-2.4.1-r2 was stabilized based on comment #11.

@ Lars Langhans: Have you upgraded everything, server and all clients, to the latest version, and are you still experiencing the issue?

What about you, Andreas: have you tried -r2 without your workaround?
Comment 15 Sven E. 2019-10-07 21:55:28 UTC
(In reply to Thomas Deutschmann from comment #14)
> =net-fs/nfs-utils-2.4.1-r2 was stabilized based on comment #11.
> 
> @ Lars Langhans: Have you upgraded both, all server and clients, to latest
> version  and are still experiencing the issue?
> 
> What's about you Andreas, have you tried -r2 without your workaround?

I am still seeing the problem with -r2. While the client version doesn't matter, 2.4.1-r2 on the server still produces:

rpc.mountd[26354]: can't stat exported dir <exportdir>: Invalid argument
Comment 16 Lars Langhans 2019-10-08 19:59:49 UTC
@ Thomas Deutschmann, both machines are up to date.

On my machine with kernel 4.9.193 with nfs-utils-2.4.1-r2 I can mount the filesystem of my other machine with kernel 5.3.4 also with nfs-utils-2.4.1-r2.

From the machine with kernel 5.3.4 and nfs-utils-2.4.1-r2 I cannot properly mount the filesystem from the machine with 4.9.193 and nfs-utils-2.4.1-r2. I see some directory entries but not the whole filesystem, and I have no chance of opening a single file.

But when I patch nfs-utils-2.4.1-r2 with the attached 'workaround patch' from Andreas Steinmetz on the kernel 4.9.193 based machine, I can mount its filesystem from the kernel 5.3.4 machine and have full access to all files.

The attached patch from Andreas Steinmetz is needed only on the LTS kernel 4.9.x based machine.

I have a third machine with kernel 5.2.1 and also nfs-utils-2.4.1-r2, and there are no problems at all.

As a test, I also successfully copied a large file (6.7 GB) from the kernel 5.3.4 machine to the kernel 4.9.193 machine.

HTH.
Comment 17 Thomas Deutschmann (RETIRED) gentoo-dev 2019-10-08 20:46:12 UTC
I contacted upstream: https://marc.info/?l=linux-nfs&m=157056624814371&w=2
Comment 18 Mark Davies 2019-10-13 09:28:00 UTC
Thomas

I hit this problem as well and I don't think it's a kernel bug as the replier on the linux-nfs mailing list guesses. I think the problem is caused by the following.

The statx system call was introduced in kernel version 4.11 (man syscalls). The original reporter and I run kernels earlier than this. Unfortunately for us glibc 2.29 fakes statx if the real syscall is unavailable.

https://sourceware.org/git/?p=glibc.git;a=blob;f=io/statx_generic.c;h=10225ef5cb2b66c5dc16e2eea40ef27fe908be0c;hb=56c86f5dd516284558e106d04b92875d5b623b7a#l43

but its fake version does not support the AT_STATX_DONT_SYNC flag used by the nfs-utils

http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=support/misc/xstat.c;h=fa047880cfd01e8c5a1a021e3ed25e612cd6e838;hb=91873f84521a17648927f0db3470583c29adbd0f#l57

thus the call to statx fails with errno set to EINVAL which the nfs-utils code does not handle and the mount fails.
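
The failure mode described above suggests one possible defensive pattern: try statx() first and retry with plain stat() when the call reports ENOSYS or EINVAL. The following is only a minimal sketch under that assumption. It is neither the workaround patch attached to this bug nor the code in nfs-utils' support/misc/xstat.c; the helper name stat_with_fallback is hypothetical, and the sketch assumes a glibc that provides the statx() wrapper (2.28 or later).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* needed for statx() and the AT_STATX_* flags */
#endif
#include <errno.h>
#include <fcntl.h>          /* AT_FDCWD, AT_NO_AUTOMOUNT, AT_STATX_DONT_SYNC */
#include <string.h>
#include <sys/stat.h>       /* statx(), struct statx, stat() */
#include <sys/sysmacros.h>  /* makedev() */

/*
 * Hypothetical helper (not part of nfs-utils): fill *out for path,
 * preferring statx() and falling back to stat() when statx() is
 * unavailable (ENOSYS) or rejects the requested flags (EINVAL).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int stat_with_fallback(const char *path, struct stat *out)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, AT_STATX_DONT_SYNC | AT_NO_AUTOMOUNT,
              STATX_BASIC_STATS, &stx) == 0) {
        /* Copy only the fields this sketch cares about. */
        memset(out, 0, sizeof(*out));
        out->st_mode = stx.stx_mode;
        out->st_ino  = stx.stx_ino;
        out->st_dev  = makedev(stx.stx_dev_major, stx.stx_dev_minor);
        out->st_size = stx.stx_size;
        return 0;
    }

    if (errno == ENOSYS || errno == EINVAL)
        return stat(path, out);   /* kernel or libc cannot honour the request */

    return -1;                    /* a genuine error such as ENOENT */
}
```

A caller would use it exactly like stat(); errors other than ENOSYS and EINVAL are passed through unchanged so that a missing export directory is still reported as such.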
Comment 19 Sven E. 2019-10-17 22:54:44 UTC
(In reply to Mark Davies from comment #18)
> Thomas
> 
> I hit this problem as well and I don't think it's a kernel bug as the replier
> on the linux-nfs mailing list guesses. I think the problem is caused by the
> following.
> 
> The statx system call was introduced in kernel version 4.11 (man syscalls).
> The original reporter and I run kernels earlier than this.
> Unfortunately for us glibc 2.29 fakes statx if the real syscall is
> unavailable.
> 
> https://sourceware.org/git/?p=glibc.git;a=blob;f=io/statx_generic.c;
> h=10225ef5cb2b66c5dc16e2eea40ef27fe908be0c;
> hb=56c86f5dd516284558e106d04b92875d5b623b7a#l43
> 
> but its fake version does not support the AT_STATX_DONT_SYNC flag used by
> the nfs-utils
> 
> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=support/misc/xstat.
> c;h=fa047880cfd01e8c5a1a021e3ed25e612cd6e838;
> hb=91873f84521a17648927f0db3470583c29adbd0f#l57
> 
> thus the call to statx fails with errno set to EINVAL which the nfs-utils
> code does not handle and the mount fails.

I think you nailed it there:

statx(AT_FDCWD, "/", AT_STATX_DONT_SYNC|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, 0x7ffce4242860) = -1 ENOSYS
statx(AT_FDCWD, "/distfiles", AT_STATX_DONT_SYNC|AT_NO_AUTOMOUNT, STATX_BASIC_STATS, 0x7ffce4242860) = -1 ENOSYS

However the error is ENOSYS not EINVAL.
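
The two error codes need not contradict each other. strace reports what the kernel returned (ENOSYS, since the syscall does not exist before 4.11), whereas rpc.mountd sees what the C library wrapper returns after its own fallback, which according to comment #18 does not support AT_STATX_DONT_SYNC. A toy model of those two layers, assuming exactly that behaviour; every name and constant below is hypothetical and none of it is glibc code:

```c
#include <errno.h>

/* Hypothetical flag bits for this model only (not the real AT_* values). */
enum {
    MODEL_NO_AUTOMOUNT = 1 << 0,
    MODEL_DONT_SYNC    = 1 << 1
};

/* Layer 1: what strace sees. An old kernel has no statx syscall. */
static int model_kernel_statx(int kernel_has_statx)
{
    return kernel_has_statx ? 0 : -ENOSYS;
}

/*
 * Layer 2: what the caller sees. On ENOSYS the wrapper emulates statx
 * through an older interface, but only for flags it can emulate.
 */
static int model_libc_statx(int kernel_has_statx, int flags)
{
    int rc = model_kernel_statx(kernel_has_statx);

    if (rc != -ENOSYS)
        return rc;              /* the real syscall handled the request */
    if (flags & ~MODEL_NO_AUTOMOUNT)
        return -EINVAL;         /* a flag the emulation does not support */
    return 0;                   /* emulated successfully */
}
```

In this model the same call yields ENOSYS at the syscall boundary and EINVAL at the library boundary, which would match both the strace output and the rpc.mountd log message.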
Comment 20 Brian Evans (RETIRED) gentoo-dev 2019-10-30 15:39:29 UTC
+1 on patch;  helped me out
Comment 21 Larry the Git Cow gentoo-dev 2019-10-30 16:18:47 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=151c446a8906e7489de10ae2e66642a142e5509c

commit 151c446a8906e7489de10ae2e66642a142e5509c
Author:     Lars Wendler <polynomial-c@gentoo.org>
AuthorDate: 2019-10-30 16:17:50 +0000
Commit:     Lars Wendler <polynomial-c@gentoo.org>
CommitDate: 2019-10-30 16:18:41 +0000

    net-fs/nfs-utils: Revbump to fix issue with old kernels and statx
    
    Bumped straight to stable as this seems to affect many users.
    
    Thanks-to: Andreas Steinmetz <ast@domdv.de>
    Tested-by: Lars Langhans <lars.langhans@gmx.de>
    Tested-by: Brian Evans <grknight@gentoo.org>
    Bug: https://bugs.gentoo.org/688644
    Package-Manager: Portage-2.3.78, Repoman-2.3.17
    Signed-off-by: Lars Wendler <polynomial-c@gentoo.org>

 net-fs/nfs-utils/files/nfs-utils-2.4.1-statx.patch | 31 ++++++++++++++++++++++
 ...s-2.4.1-r2.ebuild => nfs-utils-2.4.1-r3.ebuild} |  1 +
 2 files changed, 32 insertions(+)
Comment 22 Gordon Bos 2019-11-19 12:13:41 UTC
Sadly whatever 2.4.1-r2 attempted to fix does in fact break rpc.mountd on kernel 4.19.52 (arm). It dies silently, at which point the NFS client gets stuck indefinitely, which is pretty bad when it happens during boot.

Please reinvestigate and if applicable make the patch depend on the kernel version as I understand the original problem was with pre 4.11 kernels only.
Comment 23 Thomas Deutschmann (RETIRED) gentoo-dev 2019-11-19 13:33:21 UTC
(In reply to Gordon Bos from comment #22)
> Please reinvestigate and if applicable make the patch depend on the kernel
> version as I understand the original problem was with pre 4.11 kernels only.

Two things: You are running an EOL kernel version. Please upgrade.

But, conditionally patching based on kernel version will never happen in Gentoo: It is impossible for us to know which kernel you are running. Imagine you could use a build host where you create all your packages. This build host is running latest 4.19.x kernel. However, your client which is using that build host and therefore would receive a conditionally patched net-fs/nfs-utils package for 4.19.x kernel could still run a different kernel.
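
Since the build host and the target may run different kernels, a kernel-dependent decision of this kind could only be made at run time, by the program itself. A minimal sketch of such a check, assuming the 4.11 threshold mentioned in comment #18; the helper names are hypothetical and this is not what any of the Gentoo patches do:

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: returns 1 if a release string such as
 * "4.9.182" or "5.3.4-gentoo" denotes kernel 4.11 or newer,
 * 0 if it is older, and -1 if the string cannot be parsed.
 */
static int release_has_statx(const char *release)
{
    int major, minor;

    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return -1;
    return major > 4 || (major == 4 && minor >= 11);
}

/* Apply the check to the kernel the process is actually running on. */
static int running_kernel_has_statx(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_has_statx(u.release);
}
```

Parsing version strings is fragile, so reacting to the error code returned by the call itself is usually the more robust choice.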
Comment 24 Gordon Bos 2019-11-19 15:36:41 UTC
Created attachment 596768
Fix-incorrect-order-between-config.h-and-errno.h

Right. Granted, I do not have the latest patch level of the kernel, but 4.19 is still the current stable and I believe there was some mention earlier that the issue was not related to the kernel.

In any case I found that when I reverted the patch associated with revision 2 (nfs-utils-2.4.1-Fix-include-order-between-config.h-and-stat.h.patch), rpc.mountd would not crash. So I investigated that patch and I found the error. Yes! There is in fact an error in that patch and it is quite subtle. By moving the include all the way to the top rather than directly in front of stat.h, a new conflict was created, this time with errno.h.
Comment 25 Doug Nazar 2019-11-19 21:41:21 UTC
(In reply to Gordon Bos from comment #22)
> Sadly whatever 2.4.1-r2 attempted to fix does in fact break rpc.mountd on
> kernel 4.19.52 (arm). It dies silently, at which time the nfsclient gets
> stuck indefinitely which is pretty bad when that happens during boot.
> 
> Please reinvestigate and if applicable make the patch depend on the kernel
> version as I understand the original problem was with pre 4.11 kernels only.

The only problem with that patch was it didn't go far enough, only fixing files that already included config.h, not all files using struct stat.

Your crash is due to files using different sized struct stat definitions.

See:
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commit;h=1378280398ef9f5cd45f5542ae2945b9a360b132
https://marc.info/?l=linux-nfs&m=157401284903578&w=2

Doug
Comment 26 Gordon Bos 2019-11-19 22:09:58 UTC
Yeah... I kind of went ahead and pushed the binary I got working with that last patch to an identical machine and although it did not crash the server side it gave me a stale handle message on the client. Had some other business to take care of and intended to investigate tomorrow. Reverted to the older 2.3.3 binary I had tucked away on my binhost to keep things running until then.

Can't help but conclude that the 2.4.1 version contains some serious bugs and I know it's been asked before, but as a service towards users it might be a good idea to place the 2.3.3 ebuild back into the repository until this issue is properly resolved.
Comment 27 Larry the Git Cow gentoo-dev 2019-11-20 12:55:42 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=678435330365772389ff25a1307b9a96a9797afc

commit 678435330365772389ff25a1307b9a96a9797afc
Author:     Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2019-11-20 12:55:20 +0000
Commit:     Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2019-11-20 12:55:20 +0000

    net-fs/nfs-utils: rev bump to add patches
    
    Bug: https://bugs.gentoo.org/688644
    Package-Manager: Portage-2.3.79, Repoman-2.3.18
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 .../nfs-utils/{nfs-utils-2.4.1-r3.ebuild => nfs-utils-2.4.1-r4.ebuild} | 3 +++
 1 file changed, 3 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=a1a54c963cdfcf3af6787a109fc5518c1fde14ef

commit a1a54c963cdfcf3af6787a109fc5518c1fde14ef
Author:     Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2019-11-20 12:52:31 +0000
Commit:     Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2019-11-20 12:52:31 +0000

    net-fs/nfs-utils: rev bump to add some patches
    
    Bug: https://bugs.gentoo.org/688644
    Package-Manager: Portage-2.3.79, Repoman-2.3.18
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 ...utils-2.4.2-Ensure-consistent-struct-stat.patch | 115 +++++++++++++++++++++
 ...2-mountd-Add-check-for-struct-file_handle.patch |  54 ++++++++++
 ...-mountd-Fix-compilation-for--disable-uuid.patch |  35 +++++++
 ...tils-2.4.2.ebuild => nfs-utils-2.4.2-r1.ebuild} |   3 +
 4 files changed, 207 insertions(+)

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=313a6a58992063f9c17946e8f93147d5e63e3ef7

commit 313a6a58992063f9c17946e8f93147d5e63e3ef7
Author:     Thomas Deutschmann <whissi@gentoo.org>
AuthorDate: 2019-11-20 12:41:59 +0000
Commit:     Thomas Deutschmann <whissi@gentoo.org>
CommitDate: 2019-11-20 12:41:59 +0000

    net-fs/nfs-utils: restore v2.3.x
    
    Bug: https://bugs.gentoo.org/688644
    Package-Manager: Portage-2.3.79, Repoman-2.3.18
    Signed-off-by: Thomas Deutschmann <whissi@gentoo.org>

 net-fs/nfs-utils/Manifest               |   1 +
 net-fs/nfs-utils/nfs-utils-2.3.4.ebuild | 192 ++++++++++++++++++++++++++++++++
 2 files changed, 193 insertions(+)
Comment 28 Thomas Deutschmann (RETIRED) gentoo-dev 2019-11-20 12:59:47 UTC
We restored nfs-utils-2.3.4 for those who can't test and just need a working setup.

nfs-utils-2.4.1 was rev bumped to include Doug's patch for the problem reported by Gordon which was accepted upstream (thanks!).

Please report if this is now working for everyone. Please also mention host and guest architecture (32/64bit) and kernel.

We still need to look into Andreas' workaround we currently apply. Code could be improved and of course the problem should be addressed upstream. However, I don't have a setup where I can test by myself. So any help is appreciated.
Comment 29 Gordon Bos 2019-11-20 15:11:20 UTC
(In reply to Doug Nazar from comment #25)
> The only problem with that patch was it didn't go far enough, only fixing
> files that already included config.h, not all files using struct stat.
> 
> Your crash is due to files using different sized struct stat definitions.

Okay, so I ran:
> find -name '*.c' | while read file; do
> 	grep -q "#include *[\"<]sys\/stat.h[\">]" $file && \
> 		( grep -q "#include *[\"<]config.h" $file || \
> 			sed -e "s/#include *[\"<]sys\/stat.h[\">]/#include <config.h>\n#include <sys\/stat.h>/" -i $file )
> done
to add an include for config.h directly in front of every include for sys/stat.h

This made no change whatsoever and that actually makes sense because the whole reason for placing config.h in front of sys/stat.h appears to be a dependency on a keyword that is only defined on 64 bit systems. And I am on 32 bit arm.

The thing that got me most here is that after applying the proposed patch attachment 596768 I had nfs working on a developer machine but it gave me the `stale file handle` message on the live machine. So I verified the only thing that was really different, which is the bindist flag, but it kept working on the developer machine.

In the end I figured it could have something to do with path length and I created test exports for every intermediate folder leading to the original export. To my amazement I could not only open every single one of them, but also the one that gave me the `stale file handle` message earlier. As it turns out I had wrongly concluded that nfs-utils-2.4.1 had been working on the developer machine because I had simply exported /home. Had I exported /home/testuser it would have given me the same error as the live machine.

Concluding:
1) patch attachment 596768 *is* required to stop rpc.mountd from crashing on client connect on 32 bit arm
2) nfs-utils-2.4.1 can only export folders that are directly part of the root file system and thus is of limited use for live environments
Comment 30 Doug Nazar 2019-11-20 17:38:03 UTC
(In reply to Gordon Bos from comment #29)
> This made no change whatsoever and that actually makes sense because the
> whole reason for placing config.h in front of sys/stat.h appears to be a
> dependency on a keyword that is only defined on 64 bit systems. And I am on
> 32 bit arm.

Actually, it's only defined on 32 bit machines, to use 64 bit fields in the structs & various functions.

> Concluding:
> 1) patch attachment 596768 *is* required to stop rpc.mountd
> from crashing on client connect on 32 bit arm
> 2) nfs-utils-2.4.1 can only export folders that are directly part of the root
> file system and thus is of limited use for live environments

I'm currently running nfs-utils-2.4.1-r3 with my patch, on kernel 5.4ish x86 & amd64, and 4.9.200 arm. Exporting various filesystem root & sub-dirs over nfs3/nfs4 with both sys & krb5 security. One box is on 2.4.2+ upstream for testing. Everything is working correctly.

I'm currently building nfs-utils-2.4.2-r1 and will be testing that this afternoon but don't expect any issues.

Doug
Comment 31 Gordon Bos 2019-11-20 18:28:20 UTC
(In reply to Doug Nazar from comment #30)
> (In reply to Gordon Bos from comment #29)
> > This made no change whatsoever and that actually makes sense because the
> > whole reason for placing config.h in front of sys/stat.h appears to be a
> > dependency on a keyword that is only defined on 64 bit systems. And I am on
> > 32 bit arm.
> 
> Actually, it's only defined on 32 bit machines, to use 64 bit fields in the
> structs & various functions.

I guess I can verify on the x86 machine I use for crossdev. On the arm machine *and* the crossdev file system the mentioned define is not present. Either way the issue is real and moving the include of config.h to after the include of errno.h does not hurt your objective and the include should in fact not be conditional, as it wasn't before the patch that was included in revision 2.

> > Concluding:
> > 1) patch attachment 596768 *is* required to stop rpc.mountd
> > from crashing on client connect on 32 bit arm
> > 2) nfs-utils-2.4.1 can only export folders that are directly part of the root
> > file system and thus is of limited use for live environments
> 
> I'm currently running nfs-utils-2.4.1-r3 with my patch, on kernel 5.4ish x86
> & amd64, and 4.9.200 arm. Exporting various filesystem root & sub-dirs over
> nfs3/nfs4 with both sys & krb5 security. One box is on 2.4.2+ upstream for
> testing. Everything is working correctly.

That earlier comment appears to have been open for interpretation. In fact /home is on a different filesystem in my setup and so I verified with /var/tmp. The return is the same: a stale file handle. When I export /var I can do an nfs mount to /var/tmp without issues.
Comment 32 Doug Nazar 2019-11-20 22:03:49 UTC
(In reply to Gordon Bos from comment #31)
> I guess I can verify on the x86 machine I use for crossdev. On the arm
> machine *and* the crossdev file system the mentioned define is not present.
> Either way the issue is real and moving the include of config.h to after the
> include of errno.h does not hurt your objective and the include should in
> fact not be conditional, as it wasn't before the patch that was included in
> revision 2.

Well, the user level define is _FILE_OFFSET_BITS=64, which is converted to __USE_FILE_OFFSET64 by /usr/include/features.h and used in the struct stat definition in /usr/include/bits/stat.h. At least on glibc.

On my arm box after ebuild configure:
grep _FILE_OFFSET_BITS support/include/config.h
#define _FILE_OFFSET_BITS 64

> That earlier comment appears to have been open for interpretation. In fact
> /home is on a different filesystem in my setup and so I verified with
> /var/tmp. The return is the same: a stale file handle. When I export /var I
> can do an nfs mount to /var/tmp without issues.

For testing on my arm box (which normally doesn't export anything), I exported only /boot & /mnt/system/usr/include where / is actually nfs mounted, /boot & /mnt/system are the two local partitions. Was able to mount & access from another box without issue.


Have had several systems running with nfs-utils-2.4.2-r1 (including one testing nfsdcld) this afternoon without issues.
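
The include-order point above can be illustrated with a small sketch. It assumes glibc and uses a plain #define as a stand-in for including the generated config.h; the helper name is hypothetical. The define only has an effect if it is seen before the first system header:

```c
/*
 * Stand-in for "#include <config.h>": on glibc this must be visible
 * before the first system header, otherwise it has no effect.
 */
#define _FILE_OFFSET_BITS 64

#include <sys/stat.h>
#include <sys/types.h>

/*
 * With the define in effect, off_t and the st_size field are 64 bits
 * wide even on 32-bit targets; on 64-bit targets they are anyway.
 */
_Static_assert(sizeof(off_t) == 8, "off_t should be 64-bit");
_Static_assert(sizeof(((struct stat *)0)->st_size) == 8,
               "st_size should be 64-bit");

/* Hypothetical helper: width in bits of struct stat's size field. */
static unsigned stat_size_field_bits(void)
{
    return 8 * (unsigned)sizeof(((struct stat *)0)->st_size);
}
```

On a 32-bit glibc target, moving the #define below the #include lines would be expected to make the static assertions fail, which is the kind of silent mismatch that the include ordering is meant to prevent.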
Comment 33 Gordon Bos 2019-11-21 09:34:27 UTC
(In reply to Doug Nazar from comment #32)
> Well, the user level define is _FILE_OFFSET_BITS=64, which is converted to
> __USE_FILE_OFFSET64 by /usr/include/features.h and used in the struct stat
> definition in /usr/include/bits/stat.h. At least on glibc.
> 
> On my arm box after ebuild configure:
> grep _FILE_OFFSET_BITS support/include/config.h
> #define _FILE_OFFSET_BITS 64

Ah... the power of obscurity... the Force is great in this one
 
> For testing on my arm box (which normally doesn't export anything), I
> exported only /boot & /mnt/system/usr/include where / is actually nfs
> mounted, /boot & /mnt/system are the two local partitions. Was able to mount
> & access from another box without issue.
> 
> 
> Have had several systems running with nfs-utils-2.4.2-r1 (including one
> testing nfsdcld) this afternoon without issues.

Verified this morning. I had in fact already checked out nfs-utils-2.4.2 before but I rechecked on the dev machine just to be sure.
1) exporting and mounting /var/tmp no longer produces the `stale file handle` message
2) exporting and mounting /home/testuser which crosses a file system boundary still crashes the server side

Synced the tree and installed nfs-utils-2.4.2-r1. This appears to solve the crashes but I'm still a bit wary about it because I'm not real sure which of the three patches is actually responsible for that.
Comment 34 Doug Nazar 2019-11-21 17:22:40 UTC
(In reply to Gordon Bos from comment #33)
> Synced the tree and installed nfs-utils-2.4.2-r1. This appears to solve the
> crashes but I'm still a bit wary about it because I'm not real sure which of
> the three patches is actually responsible for that.

It's my patch, "Ensure consistent struct stat". See the urls I linked earlier for the details but basically when trying to detect if the export path is a root of a filesystem, it creates two 32bit struct stat's on the stack, and passes the address of each to the stat helper functions that expect a 64bit struct stat. This issue only shows up on 32bit.

Unfortunately it missed getting applied for 2.4.2, but is in Gentoo's ebuilds now for 2.4.1 & 2.4.2.

Doug
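
To make the size mismatch concrete, here is a minimal sketch with two invented layouts standing in for struct stat as seen by a source file built without config.h and by one built with _FILE_OFFSET_BITS=64. The real struct stat has many more fields; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout seen by a file compiled without the 64-bit define (invented). */
struct stat_small {
    uint32_t dev;
    uint32_t ino;
    int32_t  size;
};

/* Layout seen by a file compiled with _FILE_OFFSET_BITS=64 (invented). */
struct stat_large {
    uint64_t dev;
    uint64_t ino;
    int64_t  size;
};

/*
 * Number of bytes a callee using the large layout would write past the
 * end of an object that the caller allocated with the small layout.
 */
static size_t overrun_bytes(void)
{
    return sizeof(struct stat_large) - sizeof(struct stat_small);
}

/* The same field also lives at different offsets in the two views. */
static size_t size_offset_small(void) { return offsetof(struct stat_small, size); }
static size_t size_offset_large(void) { return offsetof(struct stat_large, size); }
```

With these layouts the callee would write 12 bytes beyond the caller's stack object and both sides would read the size field from different offsets, which is consistent with a crash or with garbage results depending on what happens to sit next to the object.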
Comment 35 Gordon Bos 2019-11-21 21:06:40 UTC
(In reply to Doug Nazar from comment #34)
> (In reply to Gordon Bos from comment #33)
> > Synced the tree and installed nfs-utils-2.4.2-r1. This appears to solve the
> > crashes but I'm still a bit wary about it because I'm not real sure which of
> > the three patches is actually responsible for that.
> 
> It's my patch, "Ensure consistent struct stat". See the urls I linked
> earlier for the details but basically when trying to detect if the export
> path is a root of a filesystem, it creates two 32bit struct stat's on the
> stack, and passes the address of each to the stat helper functions that
> expect a 64bit struct stat. This issue only shows up on 32bit.

Yet it didn't work with the one-liner I applied based on your earlier comment. The main difference is that the one-liner placed the include of config.h directly in front of sys/stat.h where you put it all the way on top. Also I think my one-liner touched quite a lot more files than were included in your patch, so what is special about these particular files? Also, part of the crashes were already gone in 2.4.2; prior to that I could not mount anything. In some crazy manner all the little pieces appear to fall together to create something that functions the way it should, but I can't really shake the feeling that it is all makeshift and could easily collapse again if someone even points at it.

Either way, glad to see a working version again in the main tree. One more notch on the way to my next release.
Comment 36 Thomas Deutschmann (RETIRED) gentoo-dev 2019-11-22 08:00:59 UTC
Gordon, could you please clarify if both

=net-fs/nfs-utils-2.4.1-r4
=net-fs/nfs-utils-2.4.2-r1

are working for you?

Thanks.
Comment 37 Alix 2019-11-23 07:16:47 UTC
nfs-utils-2.4.1-r4 works on 32 bit x86 server, shares are served fine
Comment 38 Gordon Bos 2019-11-25 09:46:17 UTC
(In reply to Thomas Deutschmann from comment #36)
> Gordon, could you please clarify if both
> 
> =net-fs/nfs-utils-2.4.1-r4
> =net-fs/nfs-utils-2.4.2-r1
> 
> are working for you?
> 
> Thanks.

Sorry, had something else to attend to. Both versions appear to do the job for various paths that either do or do not cross file system boundaries.

PS: I also rechecked the earlier versions and it appears the problem was identified incorrectly. Version 2.4.1 returned stale file handles on every export, 2.4.1-r3 did in fact fix that but caused rpc.mountd to crash when the export crossed a file system boundary (except with pseudo file system). When I moved the include of config.h to after errno.h I did stop the crashes, but I also reinstated the stale file handle messages. The problem thus appears to have been related more to errno.h than it was to stat.h.
Comment 39 Sven E. 2019-11-25 20:25:53 UTC
(In reply to Thomas Deutschmann from comment #36)
> Gordon, could you please clarify if both
> 
> =net-fs/nfs-utils-2.4.1-r4
> =net-fs/nfs-utils-2.4.2-r1
> 
> are working for you?
> 
> Thanks.

Both versions finally resolve the stat issue for me (~amd64).
Comment 40 Thomas Deutschmann (RETIRED) gentoo-dev 2019-11-26 10:58:16 UTC
*** Bug 701192 has been marked as a duplicate of this bug. ***
Comment 41 Uros 2019-11-30 01:57:44 UTC
Running sys-kernel/gentoo-sources-4.9.193 on amd64 server.

Upgrading net-fs/nfs-utils-2.3.3 to net-fs/nfs-utils-2.4.1-r4, "mount.nfs: Stale file handle" errors were reported on client side when mounting certain exports.

After upgrading net-fs/nfs-utils-2.3.3 to net-fs/nfs-utils-2.4.2-r1, all nfs v3 exports now work as expected on all clients.


Thanks.
Comment 42 Richard H. 2020-03-21 22:02:54 UTC
4.9.192-gentoo here, running on x86. Upgrading from 2.4.1-r2 to 2.4.1-r4 fixed all my problems with EINVAL as well, which had suddenly started occurring a few months earlier.