Bug 924309 - net-fs/nfs-utils nfs fails to stop when running 6.6 kernels
Summary: net-fs/nfs-utils nfs fails to stop when running 6.6 kernels
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords:
Duplicates: 920816, 924178
Depends on:
Blocks:
 
Reported: 2024-02-11 20:31 UTC by Vjaceslavs Klimovs
Modified: 2024-02-17 22:27 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Emerge info (tom-dexter-emerge-info.txt,6.30 KB, text/plain)
2024-02-17 15:45 UTC, Tom Dexter
Description Vjaceslavs Klimovs 2024-02-11 20:31:58 UTC
After upgrading the kernel from 6.1.45 to 6.6.16, nfs fails to stop:

gearbox ~ # /etc/init.d/nfs stop
 * Stopping NFS mountd ...                                   [ ok ]
 * Stopping NFS daemon ...
 * start-stop-daemon: 8 process(es) refused to stop          [ !! ]
 * Unexporting NFS directories ...                           [ ok ]
 * ERROR: nfs failed to stop
gearbox ~ #

Same result on net-fs/nfs-utils 2.6.3-r2, 2.6.4-r1, and 2.6.4-r3. Reverting to the 6.1 kernel fixes the issue.



Reproducible: Always

Steps to Reproduce:
1. Upgrade to a 6.6 kernel
2. Attempt to stop nfsd

Actual Results:  
nfs does not stop

Expected Results:  
nfs stops without issue

This could be related to https://bugs.gentoo.org/924178 or https://bugs.gentoo.org/920816.

emerge info:
Portage 3.0.61 (python 3.11.7-final-0, default/linux/amd64/17.1/no-multilib/hardened, gcc-13, glibc-2.38-r10, 6.6.16-gentoo x86_64)
=================================================================
System uname: Linux-6.6.16-gentoo-x86_64-AMD_EPYC_7302P_16-Core_Processor-with-glibc2.38
KiB Mem:   131771996 total,  76969368 free
KiB Swap:  134217720 total, 134217720 free
Timestamp of repository gentoo: Thu, 08 Feb 2024 22:30:01 +0000
Head commit of repository gentoo: b5c382cf0f0e8287631615bb00463bb99008f04d
sh bash 5.1_p16-r6
ld GNU ld (Gentoo 2.41 p4) 2.41.0
app-misc/pax-utils:        1.3.7::gentoo
app-shells/bash:           5.1_p16-r6::gentoo
dev-build/autoconf:        2.71-r6::gentoo
dev-build/automake:        1.16.5-r2::gentoo
dev-build/cmake:           3.27.9::gentoo
dev-build/libtool:         2.4.7-r1::gentoo
dev-build/make:            4.4.1-r1::gentoo
dev-build/meson:           1.3.0-r2::gentoo
dev-lang/perl:             5.38.2-r1::gentoo
dev-lang/python:           3.11.7::gentoo, 3.12.1_p1::gentoo
sys-apps/baselayout:       2.14-r2::gentoo
sys-apps/openrc:           0.53::gentoo
sys-apps/sandbox:          2.38::gentoo
sys-devel/binutils:        2.41-r3::gentoo
sys-devel/binutils-config: 5.5::gentoo
sys-devel/gcc:             13.2.1_p20240113-r1::gentoo
sys-devel/gcc-config:      2.11::gentoo
sys-kernel/linux-headers:  6.6::gentoo (virtual/os-headers)
sys-libs/glibc:            2.38-r10::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://pulley.lan/gentoo-portage
    priority: -1000
    volatile: False
    sync-rsync-extra-opts: 
    sync-rsync-verify-max-age: 3
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-metamanifest: no

vklimovs
    location: /usr/local/portage/vklimovs
    masters: gentoo
    volatile: True

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /etc/stunnel/stunnel.conf /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="https://pulley.lan:8080"
INSTALL_MASK="*.la /usr/share/qemu/firmware/*"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LEX="flex"
LINGUAS="en"
MAKEOPTS="-j33"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="acl amd64 bzip2 caps cet cli crypt dri fortran gdbm hardened iconv ipv6 kerberos ldap libtirpc modules-sign ncurses nls openmp pam pcre pic pie readline sasl seccomp split-usr ssl ssp test-rust unicode verify-sig vhosts xattr xtpax zlib" ABI_X86="64" ADA_TARGET="gnat_2021" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="cpu interface ipmi memory network sensors smart syslog" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 ntrip navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" NGINX_MODULES_HTTP="access auth_request autoindex brotli fastcgi gzip proxy rewrite uwsgi" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-1" POSTGRES_TARGETS="postgres15" PYTHON_SINGLE_TARGET="python3_11" PYTHON_TARGETS="python3_11" RUBY_TARGETS="ruby31" UWSGI_PLUGINS="syslog" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, LC_ALL, LD, LFLAGS, LIBTOOL, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PYTHONPATH, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-14 22:40:35 UTC
Could you try running strace on the stuck nfsd process and also get a gdb backtrace from it? (Attach gdb with gdb -p, then ^C, then bt.)
Comment 2 Tom Dexter 2024-02-16 14:08:49 UTC
Just to add to this one: what he's reporting is exactly the same as what I reported in bug 924178, which, for reasons I don't really understand, was resolved as a duplicate of bug 916947. The only notable difference, as I stated in my original bug, is that I'm running an old version of openrc (0.17).
Comment 3 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-16 18:11:42 UTC
I already explained at https://forums.gentoo.org/viewtopic-p-8816463.html#8816463. There's no interest in debugging old OpenRC versions where they've changed relevant code.
Comment 4 Eli Schwartz 2024-02-16 18:24:29 UTC
Tom,

Can you explain why you are using an ancient and broken version of openrc that is guaranteed to not work?

Because if you can't reproduce the issue with current versions of openrc then your system is broken and nfs-utils is not.

As such, you're distracting and confusing the very real attempts to debug a real problem by the issue reporter for this bug report, which is totally unrelated to your issue.
Comment 5 Mike Gilbert gentoo-dev 2024-02-16 18:51:07 UTC
So the nfsd init script does this in stop():

        # nfsd sets its process name to [nfsd] so don't look for $nfsd
        ebegin "Stopping NFS daemon"
        start-stop-daemon --stop --name nfsd --user root --signal 2
        eend $?
        ret=$((ret + $?))
        # in case things don't work out ... #228127
        rpc.nfsd 0

I think that start-stop-daemon call is sending SIGINT to kernel nfsd threads.

Since this Linux commit, we cannot signal nfsd kernel threads directly:

https://github.com/torvalds/linux/commit/3903902401451b1cd9d797a8c79769eb26ac7fe5
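
To confirm that the processes matched by --name nfsd really are kernel threads, a rough check along these lines may help. This is only a sketch: is_kernel_thread is a hypothetical helper name, and the test relies on the heuristic that kernel threads expose an empty /proc/<pid>/cmdline (zombie processes do too, so treat a positive result as a hint rather than proof).

```shell
#!/bin/sh
# Hypothetical helper, not part of nfs-utils or OpenRC.
# Returns 0 if the PID looks like a kernel thread, 1 if it looks like a
# userspace process, and 2 if /proc/<pid>/cmdline cannot be read.
is_kernel_thread() {
    [ -r "/proc/$1/cmdline" ] || return 2
    # Kernel threads have no command line; strip NUL separators before testing.
    [ -z "$(tr -d '\0' < "/proc/$1/cmdline")" ]
}
```

For example, `for p in $(pgrep -x nfsd); do is_kernel_thread "$p" && echo "$p is a kernel thread"; done` would list the nfsd kernel threads on a running server.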

I think we should just update the init script to stop sending SIGINT via start-stop-daemon and just jump directly to calling rpc.nfsd 0.
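
A minimal sketch of what that simplified stop step could look like follows. It is not the committed nfs.initd: the function name stop_nfsd and the NFSD_CMD override are assumptions made for illustration, and ebegin/eend are stubbed so the fragment can run outside openrc-run.

```shell
#!/bin/sh
# Illustrative sketch only; the real init script may differ.
# Stub the OpenRC helpers when they are not already provided by openrc-run.
command -v ebegin >/dev/null 2>&1 || ebegin() { printf ' * %s ...\n' "$*"; }
command -v eend >/dev/null 2>&1 || eend() { return "$1"; }

# NFSD_CMD is a hypothetical override so the function can be dry-run.
: "${NFSD_CMD:=rpc.nfsd}"

stop_nfsd() {
    ebegin "Stopping NFS daemon"
    # Do not signal the [nfsd] kernel threads. Asking for zero server
    # threads makes the kernel shut them down itself.
    "$NFSD_CMD" 0
    eend $?
}
```

The point of the design is that the thread count becomes the only control path, so the result no longer depends on whether the kernel lets userspace signal nfsd threads.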
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-16 19:32:50 UTC
(In reply to Mike Gilbert from comment #5)

Sounds good.
Comment 7 Larry the Git Cow gentoo-dev 2024-02-16 19:33:06 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7aa183ae8073593cab6d3f012a981a6e6712ffc2

commit 7aa183ae8073593cab6d3f012a981a6e6712ffc2
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2024-02-16 19:23:48 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2024-02-16 19:32:49 +0000

    net-fs/nfs-utils: stop sending signals to kernel nfsd threads
    
    Closes: https://bugs.gentoo.org/924309
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>

 net-fs/nfs-utils/files/nfs.initd                               | 10 ++++------
 .../{nfs-utils-2.6.3-r2.ebuild => nfs-utils-2.6.3-r3.ebuild}   |  2 +-
 .../{nfs-utils-2.6.4-r3.ebuild => nfs-utils-2.6.4-r10.ebuild}  |  0
 .../{nfs-utils-2.6.4-r1.ebuild => nfs-utils-2.6.4-r4.ebuild}   |  0
 4 files changed, 5 insertions(+), 7 deletions(-)
Comment 8 Mike Gilbert gentoo-dev 2024-02-16 19:46:08 UTC
*** Bug 924178 has been marked as a duplicate of this bug. ***
Comment 9 Mike Gilbert gentoo-dev 2024-02-16 19:48:27 UTC
*** Bug 920816 has been marked as a duplicate of this bug. ***
Comment 10 Tom Dexter 2024-02-17 14:51:37 UTC
(In reply to Eli Schwartz from comment #4)
> Tom,
> 
> Can you explain why you are using an ancient and broken version of openrc
> that is guaranteed to not work?
> 
> Because if you can't reproduce the issue with current versions of openrc
> then your system is broken and nfs-utils is not.
> 
> As such, you're distracting and confusing the very real attempts to debug a
> real problem by the issue reporter for this bug report, which is totally
> unrelated to your issue.

Well...That's sort of moot, as I actually just updated openrc and rebooted:

equery list openrc
 * Searching for openrc ...
[IP-] [  ] sys-apps/openrc-0.53:0

...and still have the issue:

/etc/init.d/nfs stop
 * Stopping NFS mountd ...
 * start-stop-daemon: no matching processes found            [ ok ]
 * Stopping NFS daemon ...
 * start-stop-daemon: no matching processes found            [ ok ]
 * Unexporting NFS directories ...

Also note that after both the service start and the failed stop, the init script keeps running for a minute or more:

ps auxw| grep nfs
root       117  0.0  0.0      0     0 ?        I<   09:43   0:00 [kworker/R-nfsio]
root      3239  0.0  0.0   7912  2360 pts/4    S    09:51   0:00 /bin/sh /lib/rc/sh/openrc-run.sh /etc/init.d/nfs stop
root      3243  0.0  0.0   6332  2176 pts/4    S+   09:51   0:00 grep --colour=auto nfs

Tom
Comment 11 Tom Dexter 2024-02-17 15:45:52 UTC
Created attachment 885225
Emerge info
Comment 12 Tom Dexter 2024-02-17 15:47:26 UTC
So I just updated to that new net-fs/nfs-utils-2.6.4-r10 from my overlay:

equery list nfs-utils
 * Searching for nfs-utils ...
[I-O] [  ] net-fs/nfs-utils-2.6.4-r10:0

However, the service still fails to stop:

/etc/init.d/nfs stop
 * Stopping NFS mountd ...                                   [ ok ]
 * Stopping NFS daemon ...
 * start-stop-daemon: 8 process(es) refused to stop          [ !! ]
 * Unexporting NFS directories ...                           [ ok ]
 * ERROR: nfs failed to stop

As you see I've posted an attachment with my emerge --info.

Tom
Comment 13 Tom Dexter 2024-02-17 16:01:33 UTC
Never mind. I clearly did that upgrade all wrong as all I got was the ebuild. I will try that correctly.

Question...and I've run into this before:

If I go to that commit:

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7aa183ae8073593cab6d3f012a981a6e6712ffc2

How do I get a raw unified diff? I want to get those patches and for the life of me I can't figure out how.
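
One possible approach, assuming gitweb.gentoo.org is served by cgit and exposes cgit's patch view (neither is confirmed in this thread), is to swap the commit view for the patch view in the URL. The helper name cgit_patch_url is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a cgit commit URL into the matching patch URL.
# Assumes the site exposes cgit's /patch/ view alongside /commit/.
cgit_patch_url() {
    printf '%s\n' "$1" | sed 's|/commit/?id=|/patch/?id=|'
}
```

The resulting URL could then be fetched with something like `curl -L -o nfs-fix.patch "$(cgit_patch_url "$url")"`, where nfs-fix.patch is just an example file name.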
Comment 14 Tom Dexter 2024-02-17 16:30:22 UTC
Wow...OK. I was able to apply that commit patch to my existing nfs-utils-2.6.4-r3.ebuild and the service stop is still failing:

/etc/init.d/nfs stop
 * Stopping NFS mountd ...                                   [ ok ]
 * Stopping NFS daemon ...
 * start-stop-daemon: 8 process(es) refused to stop          [ !! ]
 * Unexporting NFS directories ...                           [ ok ]
 * ERROR: nfs failed to stop

Tom
Comment 15 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-17 16:31:40 UTC
Is there a reason to not just emerge --sync and update normally?

Anyway, please check what the contents of the init script are.
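
A quick way to run that check might be to search the installed script for the old start-stop-daemon call quoted in comment 5. This is a sketch under assumptions: signals_nfsd is a hypothetical helper name, and the pattern only recognizes the `--name nfsd` form shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the given init script still
# invokes start-stop-daemon against processes named nfsd.
signals_nfsd() {
    grep -Eq 'start-stop-daemon.*--name[ =]nfsd' "$1"
}
```

For example, `signals_nfsd /etc/init.d/nfs && echo "old init script still installed"` would flag a script that did not pick up the change.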
Comment 16 Tom Dexter 2024-02-17 17:04:28 UTC
OK...Sorry for all the noise. I finally got that init script to patch correctly, and the stop DOES work now.

Thanks!!...and sorry for the confusion.

Tom
Comment 17 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-17 17:06:11 UTC
Excellent! Thanks!
Comment 18 Vjaceslavs Klimovs 2024-02-17 22:27:03 UTC
Thank you!