I'm currently observing a reproducible warning with Dovecot-2.3 (release and -git) that seems to be caused by a change in glibc-2.28. The error is seen on startup, and also on demand when the command "dovecot user '*'" is executed. In both cases Dovecot is attempting to get a list of valid users from the system passwd file (there are fewer than 10 of these). I am not using nss or any external authentication mechanism, only local system accounts.

Here's the Dovecot error as seen at startup:

Sep 24 23:07:55 thunderstorm.reub.net dovecot[26218]: master: Dovecot v2.3.devel (110d5f086) starting up for imap, lmtp, sieve, submission, sieve
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth: Debug: Loading modules from directory: /usr/lib64/dovecot/auth
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth: Debug: Module loaded: /usr/lib64/dovecot/auth/lib20_auth_var_expand_crypt.so
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth: Debug: Read auth token secret from /run/dovecot/auth-token-secret.dat
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth: Debug: passwd-file /etc/dovecot/passwd.extra: Read 5 users in 0 secs
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth: Debug: master in: LIST 1 user=* service=replicator
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth-worker(26232): Debug: Loading modules from directory: /usr/lib64/dovecot/auth
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth-worker(26232): Debug: Module loaded: /usr/lib64/dovecot/auth/lib20_auth_var_expand_crypt.so
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth-worker(26232): Debug: passwd-file /etc/dovecot/passwd.extra: Read 5 users in 0 secs
====>> Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: auth-worker(26232): Error: getpwent() failed: Invalid argument <<========
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: replicator: Error: User listing returned failure
Sep 24 23:07:55 thunderstorm.reub.net dovecot[26223]: replicator: Error: listing users failed, can't replicate existing data
Sep 24 23:08:01 thunderstorm.reub.net dovecot[26223]: auth: Debug: auth client connected (pid=26260)

[Note that the error occurs regardless of whether the passwd.extra file is defined; it only exists to add a specific setting to one user's account.]

This error is _not_ seen with glibc-2.27. I have been able to go back and forth between version 2.27, where this error is not seen, and 2.28 (changing nothing else on the system), where the problem is always logged. This leads me to believe it is a glibc-2.28 problem.

I can look up individual users with dovecot and their information is returned; it seems that only the listing of valid users is affected.

thunderstorm ~ # doveadm user 'reuben'
field value
uid 1000
gid 1000
home /home/reuben
mail maildir:~/Maildir
system_groups_user reuben
thunderstorm ~ # doveadm user '*'
reuben
<names removed>
Error: User listing returned failure
Fatal: user listing failed
thunderstorm ~ #

The code in dovecot triggering this is:

static void passwd_iterate_next(struct userdb_iterate_context *_ctx)
{
	struct passwd_userdb_iterate_context *ctx =
		(struct passwd_userdb_iterate_context *)_ctx;
	const struct auth_settings *set = _ctx->auth_request->set;
	struct passwd *pw;

	if (cur_userdb_iter != NULL && cur_userdb_iter != ctx) {
		/* we can't support concurrent userdb iteration.
		   wait until the previous one is done */
		ctx->next_waiting = cur_userdb_iter->next_waiting;
		cur_userdb_iter->next_waiting = ctx;
		return;
	}

	errno = 0;
	while ((pw = getpwent()) != NULL) {
		if (passwd_iterate_want_pw(pw, set)) {
			_ctx->callback(pw->pw_name, _ctx->context);
			return;
		}
	}
	if (errno != 0) {
		i_error("getpwent() failed: %m");
		_ctx->failed = TRUE;
	}
	_ctx->callback(NULL, _ctx->context);
}

This is on a Gentoo x86_64 system with glibc-2.28 straight from a very up-to-date portage. I have two other systems which are independently seeing this same issue too.

I have taken this to the Dovecot users mailing list, and one of the developers there suggested that this might be a glibc problem. See https://www.dovecot.org/pipermail/dovecot/2018-September/112960.html for additional information, including strace output.

glibc build options:

[ebuild R ] sys-libs/glibc-2.27-r6:2.2::gentoo USE="gd multiarch -audit -caps (-compile-locales) -doc (-hardened) -headers-only (-multilib) -nscd -profile (-selinux) -suid -systemtap (-vanilla)" 0 KiB
[ebuild U *] sys-libs/glibc-2.28:2.2::gentoo [2.27-r6:2.2::gentoo] USE="gd multiarch -audit -caps (-cet) (-compile-locales) -doc (-hardened) -headers-only (-multilib) -nscd -profile (-selinux) -suid -systemtap {-test%} (-vanilla)" 16,113 KiB

thunderstorm ~ # emerge --info
Portage 2.3.50 (python 3.6.6-final-0, default/linux/amd64/17.1/no-multilib, gcc-8.2.0, glibc-2.28, 4.18.9-gentoo x86_64)
=================================================================
System uname: Linux-4.18.9-gentoo-x86_64-Intel-R-_Xeon-R-_CPU_E5-2680_v2_@_2.80GHz-with-gentoo-2.6
KiB Mem: 8181436 total, 383416 free
KiB Swap: 2096104 total, 2096104 free
Head commit of repository gentoo: 8e8cc8d0d4f85f5bab6cff28335c1e416bc6c48a
sh bash 4.4_p23
ld GNU ld (Gentoo 2.31.1 p3) 2.31.1
app-shells/bash: 4.4_p23::gentoo
dev-lang/perl: 5.26.2::gentoo
dev-lang/python: 2.7.15::gentoo, 3.6.6::gentoo, 3.7.0::gentoo
dev-util/cmake: 3.12.2::gentoo
dev-util/pkgconfig: 0.29.2::gentoo
sys-apps/baselayout: 2.6-r1::gentoo
sys-apps/openrc: 0.38.2::gentoo
sys-apps/sandbox: 2.13::gentoo
sys-devel/autoconf: 2.69-r4::gentoo
sys-devel/automake: 1.13.4-r2::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils: 2.31.1-r1::gentoo
sys-devel/gcc: 8.2.0-r3::gentoo
sys-devel/gcc-config: 2.0::gentoo
sys-devel/libtool: 2.4.6-r5::gentoo
sys-devel/make: 4.2.1-r4::gentoo
sys-kernel/linux-headers: 4.17::gentoo (virtual/os-headers)
sys-libs/glibc: 2.28::gentoo
Repositories:
gentoo
location: /usr/portage
sync-type: git
sync-uri: git://anongit.gentoo.org/repo/gentoo.git
priority: -1000 reub-Local-Overlay location: /usr/local/portage masters: gentoo ACCEPT_KEYWORDS="amd64 ~amd64" ACCEPT_LICENSE="* -@EULA" CBUILD="x86_64-pc-linux-gnu" CFLAGS="-O2 -pipe -march=native -mtune=native" CHOST="x86_64-pc-linux-gnu" CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/bind /var/rancid/.cloginrc" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.2/ext-active/ /etc/php/cgi-php7.2/ext-active/ /etc/php/cli-php7.2/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo" CXXFLAGS="-O2 -pipe -march=native -mtune=native" DISTDIR="/usr/portage/distfiles" EMERGE_DEFAULT_OPTS="--autounmask=n --quiet-build=n --with-bdeps=y" ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR" FCFLAGS="-O2 -pipe" FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync metadata-transfer multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr" FFLAGS="-O2 -pipe" GENTOO_MIRRORS="http://mirror.ipv6.internode.on.net/pub/gentoo http://distfiles.gentoo.org" LANG="en_AU.utf8" LDFLAGS="-Wl,-O1 -Wl,--as-needed" MAKEOPTS="-j4" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git" PORTAGE_TMPDIR="/home/portage/" USE="acl amd64 apache2 bash-completion berkdb bzip2 cairo cgi cli crypt curl cxx dri fortran gd gdbm geoip gif gmp hardened iconv ipv6 jpeg libressl libtirpc logrotate modules mysql mysqli ncurses nls nptl openmp pam pcre php png 
readline samba savedconfig snmp spell sqlite3 ssl tcpd threads tiff truetype vhosts xattr xinetd zip zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias http2 remoteip" CALLIGRA_FEATURES="karbon plan sheets stage words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="pc efi-64" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres9_5 postgres10" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6 python3_7" RUBY_TARGETS="ruby23" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account" Unset: CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, 
LC_ALL, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS thunderstorm ~ #
Out of curiosity, if you switch

> passwd: compat files
> shadow: compat files
> group: compat files

to

> passwd: files
> shadow: files
> group: files

would it make the warning go away?

I tried to install dovecot and run the command on a fresh system:

# dovecot user '*'
Error: userdb lookup: connect(/run/dovecot/auth-userdb) failed: No such file or directory
Fatal: user listing failed
# systemctl start dovecot
# dovecot user '*'
Error: userdb lookup: Disconnected unexpectedly

Do you have a simpler reproducer, or hints on how to get a minimal reproducer?
I've now changed /etc/nsswitch.conf to:

thunderstorm ~ # cat /etc/nsswitch.conf
# /etc/nsswitch.conf:
# $Header: /var/cvsroot/gentoo/src/patchsets/glibc/extra/etc/nsswitch.conf,v 1.2 2017/08/12 16:21:44 slyfox Exp $

passwd: files
shadow: files
group: files

hosts: files dns
networks: files dns

services: db files
protocols: db files
rpc: db files
ethers: db files
netmasks: files
netgroup: files
bootparams: files

automount: files
aliases: files
thunderstorm ~ #

No change to the behaviour. doveadm user '*' still returns usernames but terminates with a fatal error, and the same getpwent error is logged.

As for reproducing this with Dovecot, the most important config you need is this in your auth-system.conf.ext file:

userdb {
  driver = passwd
  result_success = continue-ok
}

The include for this file should be uncommented in 10-auth.conf.

When Dovecot is up and running and has the userdb configured you should see this:

thunderstorm ~ # ls -la /run/dovecot/auth-userdb
srwxrwxrwx 1 dovecot root 0 Sep 26 23:36 /run/dovecot/auth-userdb
thunderstorm ~ #

At that point it should be possible for you to reproduce the problem with the doveadm user command and also see the same error on startup.
I'm not sure I see the same problem, but I observe a crash when running /usr/libexec/dovecot/auth as-is:

$ /usr/libexec/dovecot/auth
Segmentation fault

$ gdb --quiet /usr/libexec/dovecot/auth
Reading symbols from /usr/libexec/dovecot/auth...Reading symbols from /usr/lib64/debug//usr/libexec/dovecot/auth.debug...done.
done.
(gdb) run
Starting program: /usr/libexec/dovecot/auth

Program received signal SIGSEGV, Segmentation fault.
__strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
31 movdqu (%rdi), %xmm1
(gdb) bt
#0 __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:31
#1 0x000055555558b368 in password_scheme_register_crypt () at password-scheme-crypt.c:190
#2 0x000055555558aefc in password_schemes_init () at password-scheme.c:874
#3 0x00005555555679d6 in main_preinit () at main.c:185
#4 main (argc=<optimized out>, argv=<optimized out>) at main.c:392

The bug is in dovecot's use of the 'crypt()' function via an implicit function declaration:

mycrypt.c: In function «mycrypt»:
mycrypt.c:22:9: warning: implicit declaration of function «crypt»; did you mean «mycrypt»? [-Wimplicit-function-declaration]
  return crypt(key, salt);
         ^~~~~

Something as simple as:

--- a/src/auth/mycrypt.c
+++ b/src/auth/mycrypt.c
@@ -4,6 +4,7 @@
 # include "config.h"
 #endif

+#define _DEFAULT_SOURCE
 #define _XOPEN_SOURCE 4
 #define _XOPEN_SOURCE_EXTENDED 1 /* 1 needed for AIX */
 #ifndef _AIX

makes the binary start.

Can you try the patch and see if it helps you? Drop it somewhere in /etc/portage/patches/net-mail/dovecot
Created attachment 548096: dovecot-2.3.2.1-crypt-decl.patch
*** This bug has been marked as a duplicate of bug 666202 ***
Firstly, this bug is -not- a duplicate of 666202. The issue Sergei is seeing is indeed the same as 666202, but that's not the issue I am reporting. The testing I am doing and the behaviour I am observing are with the patch from 666202 already applied (I reported that bug in Dovecot).

(Sergei - apply that patch, and then see if you can reproduce the behaviour that I have reported. Hopefully you will see it.)

I've also applied the additional patch from Comment #4 but it made no difference. I'm still seeing the same behaviour, i.e. the same getpwent() error.
Yes, I can see the initial error now. I'll keep digging.
Simpler reproducer:

// $ cat a.c
#include <sys/types.h>
#include <pwd.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

int main() {
    struct passwd * pw;
    for (;;) {
        errno = 0;
        pw = getpwent ();
        if (pw == NULL) break;
        if (errno != 0) break;
    }
    if (errno != 0) {
        printf("fail: %m (errno=%u)\n", errno);
    }
}

$ gcc a.c -o a && ./a
fail: Invalid argument (errno=22)
With the following command:

gcc a.c -o a && ./elf/ld.so --inhibit-cache --library-path .:nss ./a

I bisected glibc down to this change:

916124ed841745b7a1e0fbc43f9909340b47d373 is the first bad commit
commit 916124ed841745b7a1e0fbc43f9909340b47d373
Author: Florian Weimer <fweimer@redhat.com>
Date: Fri Jul 6 14:23:15 2018 +0200

    nss_files: Fix re-reading of long lines [BZ #18991]

    Use the new __libc_readline_unlocked function to pick up
    reading at the same line in case the buffer needs to be enlarged.

Sounds relevant.
Filed upstream bug: https://sourceware.org/PR16004
(In reply to Sergei Trofimovich from comment #10)
> Filed upstream bug:
> https://sourceware.org/PR16004

Upstream helped debug the behaviour. Andreas found the flaw in the client code:

"You are checking errno when getpwent didn't fail. That doesn't produce a defined value."

Which makes sense: the loop keeps calling getpwent() after a successful call may already have changed errno, so the value tested after the loop is not necessarily the one set by the final, NULL-returning call.

While at it, I would also suggest taking into account the fact that '_ctx->callback' could also clobber errno arbitrarily:

	errno = 0;
	while ((pw = getpwent()) != NULL) {
		if (passwd_iterate_want_pw(pw, set)) {
			_ctx->callback(pw->pw_name, _ctx->context);
			return;
		}
	}
	if (errno != 0) {
		i_error("getpwent() failed: %m");
		_ctx->failed = TRUE;
	}

Reassigning to dovecot maintainers.
Created attachment 549406: dovecot-2.3.2.1-fix-errno.patch
Attached the patch to demonstrate the idea.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=38b1b5ecb1cc03000e3e9db61cb6537a7b252106

commit 38b1b5ecb1cc03000e3e9db61cb6537a7b252106
Author: Eray Aslan <eras@gentoo.org>
AuthorDate: 2018-10-05 11:03:52 +0000
Commit: Eray Aslan <eras@gentoo.org>
CommitDate: 2018-10-05 11:03:52 +0000

    net-mail/dovecot: bump to 2.3.3

    Closes: https://bugs.gentoo.org/666202
    Closes: https://bugs.gentoo.org/667118
    Closes: https://bugs.gentoo.org/664988
    Signed-off-by: Eray Aslan <eras@gentoo.org>
    Package-Manager: Portage-2.3.50, Repoman-2.3.11

 net-mail/dovecot/Manifest | 2 +
 net-mail/dovecot/dovecot-2.3.3.ebuild | 291 ++++++++++++++++++++++++++
 net-mail/dovecot/files/dovecot-glibc228.patch | 44 ++++
 net-mail/dovecot/files/dovecot.init-r5 | 57 +++++
 4 files changed, 394 insertions(+)
I have in the last few hours tested this proposed patch, and it does fix the issue. The error is no longer logged. Tested on 3 systems. However the linking of this bug to 666202 has (once again) caused this report to be closed prematurely, when it is an unrelated issue. I guess we should keep this bug report open until the patch is either integrated upstream or into the dovecot ebuild (neither of which is the case at this point in time). Thanks for all your work in isolating this one Sergei.
Proposed change upstream as: https://github.com/dovecot/core/pull/92
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=5261930a4d34c3cd4fdf31fc319e423958a875cc

commit 5261930a4d34c3cd4fdf31fc319e423958a875cc
Author: Eray Aslan <eras@gentoo.org>
AuthorDate: 2018-10-11 10:56:52 +0000
Commit: Eray Aslan <eras@gentoo.org>
CommitDate: 2018-10-11 10:56:52 +0000

    net-mail/dovecot: fix userdb-passwd errno

    In https://bugs.gentoo.org/667118 Reuben Farrelly noticed
    that running
        # doveadm user '*'
    causes auth daemon to generate errors like:
        auth-worker(3585): Error: getpwent() failed: Invalid argument

    This happens because on successful call getpwent() now sets errno=EINVAL
    starting from glibc-2.28. See https://sourceware.org/PR16004 for details.

    The fix is to check 'errno' only when 'getpwent()' fails.

    Reported-by: Reuben Farrelly
    Bug: https://bugs.gentoo.org/667118
    Bug: https://sourceware.org/PR16004
    Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
    Signed-off-by: Eray Aslan <eras@gentoo.org>
    Package-Manager: Portage-2.3.51, Repoman-2.3.11

 net-mail/dovecot/dovecot-2.3.3-r1.ebuild | 294 +++++++++++++++++++++
 .../dovecot/files/dovecot-userdb-passwd-fix.patch | 18 ++
 2 files changed, 312 insertions(+)
Closing. Thank you both