Bug 920476 - sys-libs/glibc >=net-misc/openssh-9.0_p1-r6: Fatal glibc error: cannot get entropy for arc4random
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-21 14:05 UTC by Nuno
Modified: 2024-06-20 21:11 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info openssh glibc (emerge.info.txt,8.33 KB, text/plain)
2023-12-21 14:05 UTC, Nuno

Description Nuno 2023-12-21 14:05:58 UTC
Created attachment 880155
emerge --info openssh glibc

About 1 year ago, after upgrading net-misc/openssh from 9.0_p1-r2 to 9.0_p1-r6, sshd stopped accepting connections. At the time I reverted the upgrade because I had no time to investigate. I am now trying again with net-misc/openssh-9.5_p1-r2 (current stable), and the issue persists.

Unfortunately I suspect this is due to the old kernel I am using.

Running sshd in debug mode, I see this:

> banana ~ # /usr/sbin/sshd -f /etc/ssh/sshd_config -dd
> debug2: load_server_config: filename /etc/ssh/sshd_config
> debug2: load_server_config: done config len = 6621
> debug2: parse_server_config_depth: config /etc/ssh/sshd_config len 6621
> debug1: sshd version OpenSSH_9.5, OpenSSL 1.1.1u  30 May 2023
> debug1: private host key #0: ssh-rsa SHA256:REDACTED
> debug1: private host key #1: ssh-ed25519 SHA256:REDACTED
> debug1: rexec_argv[0]='/usr/sbin/sshd'
> debug1: rexec_argv[1]='-f'
> debug1: rexec_argv[2]='/etc/ssh/sshd_config'
> debug1: rexec_argv[3]='-dd'
> debug1: Set /proc/self/oom_score_adj from 0 to -1000
> debug2: fd 4 setting O_NONBLOCK
> debug1: Bind to port 666 on 0.0.0.0.
> Server listening on 0.0.0.0 port 666.
> debug2: fd 5 setting O_NONBLOCK
> debug1: Bind to port 22 on 0.0.0.0.
> Server listening on 0.0.0.0 port 22.
> debug1: Server will not fork when running in debugging mode.
> debug1: rexec start in 6 out 6 newsock 6 pipe -1 sock 9
> debug2: parse_server_config_depth: config rexec len 6621
> debug1: sshd version OpenSSH_9.5, OpenSSL 1.1.1u  30 May 2023
> debug1: private host key #0: ssh-rsa SHA256:+7lsCr9vVn5vW1t4ohBHy8vKRDXr4C6RC7MfKsYqNcE
> debug1: private host key #1: ssh-ed25519 SHA256:rCwyTe/Bvo/cXA+q+6qonQRMsmwqRndEZF/xJNv90pk
> debug1: inetd sockets after dupping: 5, 5
> sys_get_rdomain: cannot determine VRF for fd=5 : Protocol not available
> Connection from 192.168.0.44 port 45032 on 192.168.1.253 port 666
> debug1: Local version string SSH-2.0-OpenSSH_9.5
> debug1: Remote protocol version 2.0, remote software version OpenSSH_9.4
> debug1: compat_banner: match: OpenSSH_9.4 pat OpenSSH* compat 0x04000000
> debug2: fd 5 setting O_NONBLOCK
> debug2: Network child is on pid 14269
> debug1: permanently_set_uid: 22/22 [preauth]
> debug1: ssh_sandbox_child: prctl(PR_SET_SECCOMP): Invalid argument [preauth]
> debug1: list_hostkey_types: rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-ed25519 [preauth]
> Fatal glibc error: cannot get entropy for arc4random
> debug1: do_cleanup
> debug1: Killing privsep child 14269

It's worth noting that I am using a very old kernel (3.4.104-sunxi-g1df3de8e) because the system was unstable with a newer kernel on previous upgrade attempts.

Also worth noting this is running on a Banana Pi M1 (armv7l).
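
For reference, the getrandom(2) system call was added in Linux 3.17, so a 3.4 kernel does not provide it. A minimal sketch of a version check follows; the helper name `kernel_has_getrandom` is made up for illustration, and it only compares version numbers rather than probing the system call itself:

```shell
# Hypothetical helper: succeeds if the given kernel release (default: the
# running kernel) is at least 3.17, the first release with getrandom(2).
kernel_has_getrandom() {
    local release major rest minor
    release=${1:-$(uname -r)}
    # Keep only the leading "major.minor" numbers, e.g. 3 and 4 from 3.4.104-sunxi.
    major=${release%%.*}
    rest=${release#*.}
    minor=${rest%%[!0-9]*}
    # Refuse anything that does not look like a numeric version.
    case "$major" in ''|*[!0-9]*) return 2 ;; esac
    case "$minor" in ''|*[!0-9]*) return 2 ;; esac
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 17 ]; }
}

if kernel_has_getrandom; then
    echo "kernel $(uname -r): getrandom(2) should be available"
else
    echo "kernel $(uname -r): getrandom(2) is likely missing"
fi
```

Passing a release string explicitly, for example `kernel_has_getrandom 3.4.104-sunxi-g1df3de8e`, checks a kernel other than the running one.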
Comment 1 sanomiad 2023-12-22 03:56:41 UTC
I do not recommend using an EOL kernel, but if a newer kernel is causing stability issues on your system, you might want to review the kernel's configuration, adding missing options and removing ones that are not needed (the trial-and-error route).

But if you want to try to fix the entropy issue, have a look at this thread "https://stackoverflow.com/questions/36990257/build-error-caused-by-missing-library-arc4random".
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-12-22 05:39:56 UTC
(In reply to sanomiad from comment #1)

This isn't going to help as it's a runtime problem.
Comment 3 Nuno 2023-12-22 16:46:51 UTC
Thanks for your comments.

I know this kernel is far from ideal but I can't test a new one again right now - it implies downtime and days of testing with physical access.

This is indeed a runtime issue; the package compiles fine.

I've tracked the error message down to stdlib/arc4random.c in glibc [1]:

> static void
> arc4random_getrandom_failure (void)
> {
>   __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
> }

which can be called for 7 different reasons (all 7 produce the exact same message string, so I can't tell which one applies).

I've been trying to debug this by re-compiling glibc with different error messages and running sshd with it, using the testrun.sh script as described in [2] but with no success -- my custom error messages do not appear, just the old "Fatal glibc error: cannot get entropy for arc4random".

I'll report back if I manage some progress.

--

[1] https://github.com/bminor/glibc/blob/6d7e8ed/stdlib/arc4random.c#L30
[2] https://sourceware.org/glibc/wiki/Testing/Builds#Compile_normally.2C_run_under_new_glibc
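
Since every failure path prints the same message, a system call trace may narrow things down without rebuilding glibc. The following is only a sketch: it assumes strace is installed and that a log was captured with something like `strace -f -o /tmp/sshd.trace /usr/sbin/sshd -f /etc/ssh/sshd_config -dd`. The log path and the helper name `failed_entropy_calls` are illustrative, not taken from this report.

```shell
# Hypothetical helper: print failed system calls related to entropy sources,
# i.e. getrandom itself or attempts to open /dev/random or /dev/urandom.
failed_entropy_calls() {
    # $1: path to a log written by "strace -f -o <file> <command>"
    grep -E '(getrandom\(|/dev/u?random)' "$1" | grep -E '= -1 E[A-Z]+'
}

# Example (path is an assumption):
#   failed_entropy_calls /tmp/sshd.trace
```

The errno on the matching lines (for example ENOSYS versus ENOENT) would indicate whether the system call is missing or a fallback device could not be opened.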
Comment 4 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2023-12-29 06:07:21 UTC
@Nuno: here's a workaround I use on an old system, afflicted by the same problem.

/etc/portage/profile/profile.bashrc:
# ancient kernel = broken getrandom
export ac_cv_func_getrandom=no
export ac_cv_have_decl_getrandom=no
# getentropy uses getrandom under the hood
export ac_cv_func_getentropy=no
export ac_cv_have_decl_getentropy=no


This fixes openssh and >=dev-lang/ruby-3.1 for me.

At compile time, configure checks that the functions exist and can be called, but it does not check their return codes: on this kernel both functions fail with ENOSYS.
Alternatively, the software could handle this with graceful degradation at runtime.
Comment 5 Nuno 2024-04-03 15:51:14 UTC
Hi Robin, thanks for the workaround. What kernel are you using?

I tried applying it to my system but it seems to have done more harm than good :/

I've re-emerged openssh after adding it to /etc/portage/profile/profile.bashrc.

     Mon Apr  1 21:59:12 2024 <<< net-misc/openssh-9.0_p1-r2
     Mon Apr  1 21:59:30 2024 >>> net-misc/openssh-9.6_p1-r3

Confirmed that I saw this in the build log (which I unfortunately didn't save):

> checking for getentropy... (cached) no
> checking for getrandom... (cached) no  

Then I ran sshd and it still failed:

> # /usr/sbin/sshd -f /etc/ssh/sshd_config -dd
> (...)
> debug1: sshd version OpenSSH_9.6, OpenSSL 1.1.1u  30 May 2023
> (...)
> Connection from xxxx port 38178 on xxxx port xxx
> debug1: Local version string SSH-2.0-OpenSSH_9.6
> debug1: Remote protocol version 2.0, remote software version TrileadSSH2Java_213
> debug1: compat_banner: no match: TrileadSSH2Java_213
> debug2: fd 5 setting O_NONBLOCK
> debug2: Network child is on pid 9411
> debug1: permanently_set_uid: 22/22 [preauth]
> debug1: ssh_sandbox_child: prctl(PR_SET_SECCOMP): Invalid argument [preauth]
> debug1: list_hostkey_types: rsa-sha2-512,rsa-sha2-256,ssh-rsa,ssh-ed25519 [preauth]
> Fatal glibc error: cannot get entropy for arc4random
> debug1: do_cleanup
> debug1: Killing privsep child 9411


I then tried recompiling glibc as well:

     Mon Apr  1 23:28:22 2024 <<< sys-libs/glibc-2.37-r7
     Mon Apr  1 23:28:29 2024 >>> sys-libs/glibc-2.38-r10


The postinst step (I think) failed and I started seeing errors like this:

> Apr  1 23:28:19 banana kernel: [80270.884177] sandbox/6631: potentially unexpected fatal signal 6.
> (...)
> Apr  1 23:28:20 banana kernel: [80272.614517] bash/6639: potentially unexpected fatal signal 6.

and Portage failed. Unfortunately I didn't save the output, but here's an excerpt from the elog:

> >>> Messages generated by process 1842 on 2024-04-01 23:26:27 WEST for package sys-libs/glibc-2.38-r10:
> 
> WARN: pretend
> After upgrading glibc, please restart all running processes.
> Be sure to include init (telinit u) or systemd (systemctl daemon-reexec).
> Alternatively, reboot your system.
> (See bug #660556, bug #741116, bug #823756, etc)
> 
> >>> Messages generated by process 1842 on 2024-04-01 23:28:29 WEST for package sys-libs/glibc-2.37-r7:
> 
> ERROR: prerm
> ERROR: sys-libs/glibc-2.37-r7::gentoo failed (prerm phase):
>   error processing environment
> 
> Call stack:
>   ebuild.sh, line 561:  Called die
> The specific snippet of code:
>         __preprocess_ebuild_env || die "error processing environment"
> 
> If you need support, post the output of `emerge --info '=sys-libs/glibc-2.37-r7::gentoo'`,
> the complete build log and the output of `emerge -pqv '=sys-libs/glibc-2.37-r7::gentoo'`.
> The complete build log is located at '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/temp/build.log'.
> The ebuild environment file is located at '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/temp/environment'.
> Working directory: '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/empty'
> S: '/var/tmp/portage/._unmerge_/sys-libs/glibc-2.37-r7/work/glibc-2.37'
> ERROR: postrm
> The ebuild phase 'postrm' has been killed by signal 6.
> The 'postrm' phase of the 'sys-libs/glibc-2.37-r7' package has failed
> with exit value 1.
> 
> The problem occurred while executing the ebuild file named
> 'glibc-2.37-r7.ebuild' located in the '/var/db/pkg/sys-
> libs/glibc-2.37-r7' directory. If necessary, manually remove the
> environment.bz2 file and/or the ebuild file located in that directory.
> 
> Removal of the environment.bz2 file is preferred since it may allow the
> removal phases to execute successfully. The ebuild will be sourced and
> the eclasses from the current ebuild repository will be used when
> necessary. Removal of the ebuild file will cause the pkg_prerm() and
> pkg_postrm() removal phases to be skipped entirely.
> 
> >>> Messages generated by process 1842 on 2024-04-01 23:28:29 WEST for package sys-libs/glibc-2.38-r10:
> 
> ERROR: postinst
> The ebuild phase 'postinst' has been killed by signal 6.
> FAILED postinst: 1
> ERROR: other
> The ebuild phase 'other' has been killed by signal 6.


I noticed a few "Fatal glibc error: cannot get entropy for arc4random" in portage's output as well and now it seems I have a broken libc :/

I'm also unable to re-compile glibc as emerge fails, e.g.

>  * Package:    sys-libs/glibc-2.38-r10:2.2
>  * Repository: gentoo
>  * Maintainer: toolchain@gentoo.org
>  * USE:        arm elibc_glibc kernel_linux multiarch nscd ssp static-libs
>  * FEATURES:   compressdebug network-sandbox preserve-libs sandbox splitdebug userpriv usersandbox
>  * Checking whether python3_12 is suitable ...
>  *   dev-lang/python:3.12 ...
>  [ !! ]
>  * Checking whether python3_11 is suitable ...
>  *   dev-lang/python:3.11 ...
>  [ ok ]
>  * Using python3.11 to build (via PYTHON_COMPAT iteration)
> Fatal glibc error: cannot get entropy for arc4random
> Sandboxed process killed by signal: Aborted
>  * The ebuild phase 'unpack' has been killed by signal 6.
> Fatal glibc error: cannot get entropy for arc4random
> Sandboxed process killed by signal: Aborted
>  * The ebuild phase 'die_hooks' has been killed by signal 6.


Is there any way I can restore glibc with a broken portage?
Would manually extracting the contents of a binary package from another arm host of mine work?

I should mention that my @world is a few months outdated.
Comment 6 Nuno 2024-06-20 21:11:11 UTC
Leaving an update:

I still haven't been able to fix the original problem, and I want to note that Robin Johnson's workaround from comment #4 breaks app-shells/bash-5.1_p16-r11:

> random.c:188:1: error: static declaration of ‘getrandom’ follows non-static declaration
> In file included from random.c:26:
> /usr/include/sys/random.h:34:9: note: previous declaration of ‘getrandom’ with type ‘ssize_t(void *, size_t,  unsigned int)’ {aka ‘int(void *, unsigned int,  unsigned int)’}
>    34 | ssize_t getrandom (void *__buffer, size_t __length,
>       |         ^~~~~~~~~
> make[1]: *** [Makefile:78: random.o] Error 1
> make[1]: Leaving directory '/var/tmp/portage/app-shells/bash-5.1_p16-r11/work/bash-5.1/lib/sh'
> make: *** [Makefile:692: lib/sh/libsh.a] Error 1


As a more definitive workaround, I have configured and compiled a new mainline kernel using gentoo-sources (6.6.21) instead of 3.4, and while it obviously solves the arc4random issue, it's still unstable (hangs after 7-14 days, sometimes after just 2 days).
It's better than being unable to upgrade packages though :/

Thanks everyone for the input so far.