Created attachment 880155
emerge --info openssh glibc

About 1 year ago, after upgrading net-misc/openssh from 9.0_p1-r2 to 9.0_p1-r6, sshd no longer accepts connections. At the time I reverted the upgrade due to lack of time. I am trying it again now with net-misc/openssh-9.5_p1-r2 (current stable), and the issue still persists. Unfortunately, I suspect this is due to the old kernel I am using.

Running sshd in debug mode, I see this:

> banana ~ # /usr/sbin/sshd -f /etc/ssh/sshd_config -dd
> debug2: load_server_config: filename /etc/ssh/sshd_config
> debug2: load_server_config: done config len = 6621
> debug2: parse_server_config_depth: config /etc/ssh/sshd_config len 6621
> debug1: sshd version OpenSSH_9.5, OpenSSL 1.1.1u 30 May 2023
> debug1: private host key #0: ssh-rsa SHA256:REDACTED
> debug1: private host key #1: ssh-ed25519 SHA256:REDACTED
> debug1: rexec_argv[0]='/usr/sbin/sshd'
> debug1: rexec_argv[1]='-f'
> debug1: rexec_argv[2]='/etc/ssh/sshd_config'
> debug1: rexec_argv[3]='-dd'
> debug1: Set /proc/self/oom_score_adj from 0 to -1000
> debug2: fd 4 setting O_NONBLOCK
> debug1: Bind to port 666 on 0.0.0.0.
> Server listening on 0.0.0.0 port 666.
> debug2: fd 5 setting O_NONBLOCK
> debug1: Bind to port 22 on 0.0.0.0.
> Server listening on 0.0.0.0 port 22.
> debug1: Server will not fork when running in debugging mode.
> debug1: rexec start in 6 out 6 newsock 6 pipe -1 sock 9
> debug2: parse_server_config_depth: config rexec len 6621
> debug1: sshd version OpenSSH_9.5, OpenSSL 1.1.1u 30 May 2023
> debug1: private host key #0: ssh-rsa SHA256:+7lsCr9vVn5vW1t4ohBHy8vKRDXr4C6RC7MfKsYqNcE
> debug1: private host key #1: ssh-ed25519 SHA256:rCwyTe/Bvo/cXA+q+6qonQRMsmwqRndEZF/xJNv90pk
> debug1: inetd sockets after dupping: 5, 5
> sys_get_rdomain: cannot determine VRF for fd=5 : Protocol not available
> Connection from 192.168.0.44 port 45032 on 192.168.1.253 port 666
> debug1: Local version string SSH-2.0-OpenSSH_9.5
> debug1: Remote protocol version 2.0, remote software version OpenSSH_9.4
> debug1: compat_banner: match: OpenSSH_9.4 pat OpenSSH* compat 0x04000000
> debug2: fd 5 setting O_NONBLOCK
> debug2: Network child is on pid 14269
> debug1: permanently_set_uid: 22/22 [preauth]
> debug1: ssh_sandbox_child: prctl(PR_SET_SECCOMP): Invalid argument [preauth]
> debug1: list_hostkey_types: rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-ed25519 [preauth]
> Fatal glibc error: cannot get entropy for arc4random
> debug1: do_cleanup
> debug1: Killing privsep child 14269

It's worth noting that I am using a very old kernel (3.4.104-sunxi-g1df3de8e) because the system was unstable with a newer kernel on previous upgrade attempts. Also worth noting: this is running on a Banana Pi M1 (armv7l).
I do not recommend using an EOL kernel, but if a newer kernel is causing stability issues on your system, you might want to review the kernel's configuration and enable or disable options one at a time (trial & error route). If you want to try to fix the entropy issue instead, have a look at this thread: https://stackoverflow.com/questions/36990257/build-error-caused-by-missing-library-arc4random
(In reply to sanomiad from comment #1) This isn't going to help as it's a runtime problem.
Thanks for your comments. I know this kernel is far from ideal but I can't test a new one again right now - it implies downtime and days of testing with physical access.

This is indeed a runtime issue; the package compiles fine. I've tracked the error message down to stdlib/arc4random.c in glibc [1]:

> static void
> arc4random_getrandom_failure (void)
> {
>   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
> }

which can be called for 7 different reasons (all 7 give the exact same message string, so I'm clueless).

I've been trying to debug this by re-compiling glibc with different error messages and running sshd with it, using the testrun.sh script as described in [2] but with no success -- my custom error messages do not appear, just the old "Fatal glibc error: cannot get entropy for arc4random".

I'll report back if I manage some progress.

--
[1] https://github.com/bminor/glibc/blob/6d7e8ed/stdlib/arc4random.c#L30
[2] https://sourceware.org/glibc/wiki/Testing/Builds#Compile_normally.2C_run_under_new_glibc
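Since every failure path prints the same message, one way to narrow it down without rebuilding glibc is to exercise the likely entropy sources one by one from a small test program. The following is only a minimal sketch: the sequence of steps (getrandom(), then /dev/random, then /dev/urandom) is an assumption loosely based on the linked arc4random.c, and the names report() and probe_entropy_sources() are hypothetical helpers, not glibc functions. What is certain is that getrandom(2) only exists since Linux 3.17, so on a 3.4 kernel the first step is expected to fail with ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/random.h>
#include <unistd.h>

/* Hypothetical helper (not from glibc): print the outcome of one step.
   rc is the return value of the call just made; returns 1 if it failed. */
static int report(const char *step, long rc)
{
    int err = errno;            /* capture errno before calling anything else */
    int failed = rc < 0;

    assert(step != NULL);
    fprintf(stderr, "%-20s %s\n", step, failed ? strerror(err) : "ok");
    return failed;
}

/* Hypothetical probe: try each entropy source separately and return the
   number of steps that failed (0 means everything worked). */
int probe_entropy_sources(void)
{
    unsigned char buf[16];
    int failures = 0;
    int fd;

    /* 1. getrandom(2): kernels older than 3.17 fail here with ENOSYS. */
    failures += report("getrandom", getrandom(buf, sizeof buf, GRND_NONBLOCK));

    /* 2. /dev/random: open it and wait up to 1 s for it to become readable. */
    fd = open("/dev/random", O_RDONLY | O_CLOEXEC | O_NOCTTY);
    failures += report("open /dev/random", fd);
    if (fd >= 0) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int rc = poll(&pfd, 1, 1000);
        if (rc == 0) {
            fprintf(stderr, "%-20s timed out\n", "poll /dev/random");
            failures++;
        } else {
            failures += report("poll /dev/random", rc);
        }
        close(fd);
    }

    /* 3. /dev/urandom: open it and read a few bytes. */
    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC | O_NOCTTY);
    failures += report("open /dev/urandom", fd);
    if (fd >= 0) {
        failures += report("read /dev/urandom", read(fd, buf, sizeof buf));
        close(fd);
    }
    return failures;
}
```

Calling probe_entropy_sources() from a trivial main() prints one line per step to stderr. The result can differ between a root shell and a confined or chrooted process, so it is most informative when run under the same user and in the same environment as the process that aborts.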
@Nuno: here's a workaround I use on an old system afflicted by the same problem.

/etc/portage/profile/profile.bashrc:
# ancient kernel = broken getrandom
export ac_cv_func_getrandom=no
export ac_cv_have_decl_getrandom=no
# getentropy uses getrandom under the hood
export ac_cv_func_getentropy=no
export ac_cv_have_decl_getentropy=no

This fixes openssh and >=dev-lang/ruby-3.1 for me.

At compile time, configure checks that the functions exist and can be called, but it doesn't check the return codes: on this kernel both calls fail with ENOSYS. Alternatively, the software could handle this with graceful degradation at runtime.
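To illustrate what "graceful degradation at runtime" could look like, here is a minimal sketch, assuming a fallback to /dev/urandom is acceptable when the kernel reports ENOSYS. fill_random() is a hypothetical helper written for this example; it is not code from OpenSSH, Ruby, or glibc, and it only helps where /dev/urandom is actually reachable by the process (which may not hold inside a chroot that has no /dev).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill out[0..len) with random bytes.
   Returns 0 on success, -1 on failure with errno set. */
int fill_random(void *out, size_t len)
{
    unsigned char *p = out;
    int fd;

    assert(out != NULL || len == 0);

    /* Preferred path: the getrandom() syscall. */
    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (n < 0 && errno == ENOSYS)
            break;                  /* old kernel: degrade to /dev/urandom */
        if (n == 0)
            errno = EIO;            /* unexpected short result */
        return -1;
    }
    if (len == 0)
        return 0;

    /* Fallback path: read the remaining bytes from /dev/urandom. */
    fd = open("/dev/urandom", O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n > 0) {
            p += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        int saved = (n == 0) ? EIO : errno;  /* keep errno across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    close(fd);
    return 0;
}
```

The design choice here is to decide at runtime rather than at configure time: the same binary then works on kernels with and without the syscall, whereas the cache-variable override above bakes the answer into the build.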
Hi Robin, thanks for the workaround. What kernel are you using?

I tried applying it to my system but it seems to have done more harm than good :/

I've re-emerged openssh after adding in the /etc/portage/profile/profile.bashrc.

Mon Apr 1 21:59:12 2024 <<< net-misc/openssh-9.0_p1-r2
Mon Apr 1 21:59:30 2024 >>> net-misc/openssh-9.6_p1-r3

Confirmed that I saw this in the build log (which I unfortunately didn't save):

> checking for getentropy... (cached) no
> checking for getrandom... (cached) no

Then I ran sshd and it still failed:

> # /usr/sbin/sshd -f /etc/ssh/sshd_config -dd
> (...)
> debug1: sshd version OpenSSH_9.6, OpenSSL 1.1.1u 30 May 2023
> (...)
> Connection from xxxx port 38178 on xxxx port xxx
> debug1: Local version string SSH-2.0-OpenSSH_9.6
> debug1: Remote protocol version 2.0, remote software version TrileadSSH2Java_213
> debug1: compat_banner: no match: TrileadSSH2Java_213
> debug2: fd 5 setting O_NONBLOCK
> debug2: Network child is on pid 9411
> debug1: permanently_set_uid: 22/22 [preauth]
> debug1: ssh_sandbox_child: prctl(PR_SET_SECCOMP): Invalid argument [preauth]
> debug1: list_hostkey_types: rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-ed25519 [preauth]
> Fatal glibc error: cannot get entropy for arc4random
> debug1: do_cleanup
> debug1: Killing privsep child 9411

I then tried recompiling glibc as well:

Mon Apr 1 23:28:22 2024 <<< sys-libs/glibc-2.37-r7
Mon Apr 1 23:28:29 2024 >>> sys-libs/glibc-2.38-r10

the post_inst step (I think) failed and I started seeing errors like this

> Apr 1 23:28:19 banana kernel: [80270.884177] sandbox/6631: potentially unexpected fatal signal 6.
> (...)
> Apr 1 23:28:20 banana kernel: [80272.614517] bash/6639: potentially unexpected fatal signal 6.

and portage failed. Unfortunately I didn't remember to save the output but here's an excerpt from elog:

> >>> Messages generated by process 1842 on 2024-04-01 23:26:27 WEST for package sys-libs/glibc-2.38-r10:
>
> WARN: pretend
> After upgrading glibc, please restart all running processes.
> Be sure to include init (telinit u) or systemd (systemctl daemon-reexec).
> Alternatively, reboot your system.
> (See bug #660556, bug #741116, bug #823756, etc)
>
> >>> Messages generated by process 1842 on 2024-04-01 23:28:29 WEST for package sys-libs/glibc-2.37-r7:
>
> ERROR: prerm
> ERROR: sys-libs/glibc-2.37-r7::gentoo failed (prerm phase):
> error processing environment
>
> Call stack:
> ebuild.sh, line 561: Called die
> The specific snippet of code:
> __preprocess_ebuild_env || die "error processing environment"
>
> If you need support, post the output of `emerge --info '=sys-libs/glibc-2.37-r7::gentoo'`,
> the complete build log and the output of `emerge -pqv '=sys-libs/glibc-2.37-r7::gentoo'`.
> The complete build log is located at '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/temp/build.log'.
> The ebuild environment file is located at '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/temp/environment'.
> Working directory: '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/empty'
> S: '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/work/glibc-2.37'
> ERROR: postrm
> The ebuild phase 'postrm' has been killed by signal 6.
> The 'postrm' phase of the 'sys-libs/glibc-2.37-r7' package has failed
> with exit value 1.
>
> The problem occurred while executing the ebuild file named
> 'glibc-2.37-r7.ebuild' located in the '/var/db/pkg/sys-
> libs/glibc-2.37-r7' directory. If necessary, manually remove the
> environment.bz2 file and/or the ebuild file located in that directory.
>
> Removal of the environment.bz2 file is preferred since it may allow the
> removal phases to execute successfully. The ebuild will be sourced and
> the eclasses from the current ebuild repository will be used when
> necessary. Removal of the ebuild file will cause the pkg_prerm() and
> pkg_postrm() removal phases to be skipped entirely.
>
> >>> Messages generated by process 1842 on 2024-04-01 23:28:29 WEST for package sys-libs/glibc-2.38-r10:
>
> ERROR: postinst
> The ebuild phase 'postinst' has been killed by signal 6.
> FAILED postinst: 1
> ERROR: other
> The ebuild phase 'other' has been killed by signal 6.

I noticed a few "Fatal glibc error: cannot get entropy for arc4random" in portage's output as well and now it seems I have a broken libc :/

I'm also unable to re-compile glibc as emerge fails, e.g.

> * Package: sys-libs/glibc-2.38-r10:2.2
> * Repository: gentoo
> * Maintainer: toolchain@gentoo.org
> * USE: arm elibc_glibc kernel_linux multiarch nscd ssp static-libs
> * FEATURES: compressdebug network-sandbox preserve-libs sandbox splitdebug userpriv usersandbox
> * Checking whether python3_12 is suitable ...
> * dev-lang/python:3.12 ...
> [ !! ]
> * Checking whether python3_11 is suitable ...
> * dev-lang/python:3.11 ...
> [ ok ]
> * Using python3.11 to build (via PYTHON_COMPAT iteration)
> Fatal glibc error: cannot get entropy for arc4random
> Sandboxed process killed by signal: Aborted
> * The ebuild phase 'unpack' has been killed by signal 6.
> Fatal glibc error: cannot get entropy for arc4random
> Sandboxed process killed by signal: Aborted
> * The ebuild phase 'die_hooks' has been killed by signal 6.

Is there any way I can restore glibc with a broken portage? Would manually extracting the contents of a binary package from another arm host of mine work? I should mention that my @world is a few months outdated.
Leaving an update: I still wasn't able to fix the original problem, and I want to note that Robin Johnson's workaround from comment #4 breaks app-shells/bash-5.1_p16-r11:

> random.c:188:1: error: static declaration of ‘getrandom’ follows non-static declaration
> In file included from random.c:26:
> /usr/include/sys/random.h:34:9: note: previous declaration of ‘getrandom’ with type ‘ssize_t(void *, size_t, unsigned
> int)’ {aka ‘int(void *, unsigned int, unsigned int)’}
> 34 | ssize_t getrandom (void *__buffer, size_t __length,
> | ^~~~~~~~~
> make[1]: *** [Makefile:78: random.o] Error 1
> make[1]: Leaving directory '/var/tmp/portage/app-shells/bash-5.1_p16-r11/work/bash-5.1/lib/sh'
> make: *** [Makefile:692: lib/sh/libsh.a] Error 1

Presumably, with getrandom forced to "no", bash compiles its own static getrandom() replacement, which then clashes with the declaration that glibc's sys/random.h still provides.

As a more definitive workaround, I have configured and compiled a new mainline kernel using gentoo-sources (6.6.21) instead of 3.4. While it obviously solves the arc4random issue, it's still unstable (hangs after 7-14 days, sometimes after just 2 days). It's better than being unable to upgrade packages though :/

Thanks everyone for the input so far.