Bug 807832 - dev-java/openjdk-jre-bin-8.292_p10 : java -version hangs sometimes within an bubblewrapped image
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Georgy Yakovlev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: glibc-2.34
Reported: 2021-08-12 08:29 UTC by Toralf Förster
Modified: 2021-10-16 01:45 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
strace.out (strace.out,209.89 KB, text/plain)
2021-08-17 18:59 UTC, Toralf Förster
Description Toralf Förster gentoo-dev 2021-08-12 08:29:46 UTC
On the tinderbox, "java -version" sometimes hangs. It is not reliably reproducible, but it happens from time to time in an image.

The pattern is that "java -version" runs fine 4-5 times in a row, but hangs in about 10-20% of all runs.

An strace shows that the hang occurs here:

...
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\321\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=878680, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 880896, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8767952000
mprotect(0x7f876795f000, 823296, PROT_NONE) = 0
mmap(0x7f876795f000, 454656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x7f876795f000
mmap(0x7f87679ce000, 364544, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7c000) = 0x7f87679ce000
mmap(0x7f8767a28000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd5000) = 0x7f8767a28000
close(3)                                = 0
mprotect(0x7f8767a28000, 4096, PROT_READ) = 0
mprotect(0x7f876897e000, 618496, PROT_READ) = 0
getpid()                                = 14236
munmap(0x7f8768e8f000, 194193)          = 0
getpid()                                = 14236
rt_sigaction(SIGRT_1, {sa_handler=0x7f8768afa100, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f8768ab1060}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
mmap(NULL, 1052672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8767851000
mprotect(0x7f8767852000, 1048576, PROT_READ|PROT_WRITE) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f8767951910, parent_tid=0x7f8767951910, exit_signal=0, stack=0x7f8767851000, stack_size=0xfff00, tls=0x7f8767951640} => {parent_tid=[14239]}, 88) = 14239
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8767951910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 14239, NULL, FUTEX_BITSET_MATCH_ANY    


The current affected image is: ~/img/17.1_developer-j3-20210809-220514

The script to "chroot" into the image is https://github.com/toralf/tinderbox/blob/master/bin/bwrap.sh

Details of the image can be accessed (for the next 8 weeks) at http://tinderbox.zwiebeltoralf.de:31557
Comment 1 Joonas Niilola gentoo-dev 2021-08-16 06:52:27 UTC
Happened to me as well inside an lxd container. Restarting the emerge fixed it.
dev-java/openjdk-jre-bin-11.0.11_p9
Comment 2 Toralf Förster gentoo-dev 2021-08-16 14:01:18 UTC
The same has now happened with :11, again in a developer profile image,
for package app-metrics/collectd-5.12.0-r1
Comment 3 Toralf Förster gentoo-dev 2021-08-17 18:59:13 UTC
Created attachment 733613
strace.out
Comment 4 Toralf Förster gentoo-dev 2021-08-30 20:31:46 UTC
It is an issue here in various tinderbox images running with glibc-2.34 (independent of java 8 or 11)
Comment 5 Toralf Förster gentoo-dev 2021-10-10 16:25:42 UTC
It hangs immediately here:
...
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fbcb5409910, parent_tid=0x7fbcb5409910, exit_signal=0, stack=0x7fbcb5309000, stack_size=0xfff00, tls=0x7fbcb5409640} => {parent_tid=[14073]}, 88) = 14073
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7fbcb5409910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 14073, NULL, FUTEX_BITSET_MATCH_ANYjavac 1.8.0_302


It seems to print out the version but does not exit afterwards.


A successful run (in the same image, just a second before) gave:

) = ?
+++ exited with 0 +++

I masked java entirely here at the tinderbox till I can solve the root cause.
Comment 6 Sam James archtester gentoo-dev Security 2021-10-10 16:34:28 UTC
I wonder if bug 816396 is related.
Comment 7 Toralf Förster gentoo-dev 2021-10-10 16:35:19 UTC
Looks like https://stackoverflow.com/questions/58991966/what-java-security-egd-option-is-for

I do wonder if I can set a system wide env var to point javac to /dev/urandom?
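
A minimal sketch of one way such a system-wide setting could look, assuming the JVM's JAVA_TOOL_OPTIONS environment variable and Gentoo's /etc/env.d mechanism are used. The helper name egd_envd_entry and the file name 99java-egd below are hypothetical, and nothing in this report confirms that entropy is the cause of the hang.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the bug report): print an env.d style line
# that makes every JVM pick up -Djava.security.egd via JAVA_TOOL_OPTIONS.
# The optional argument overrides the random device path.
egd_envd_entry() {
    local device=${1:-/dev/./urandom}
    printf 'JAVA_TOOL_OPTIONS="-Djava.security.egd=file:%s"\n' "$device"
}
```

As root, something like `egd_envd_entry > /etc/env.d/99java-egd && env-update` would apply it to new login shells. The JVM announces the variable on stderr with a "Picked up JAVA_TOOL_OPTIONS" line, which may disturb scripts that parse stderr.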
Comment 8 Sam James archtester gentoo-dev Security 2021-10-11 05:25:47 UTC
(In reply to Toralf Förster from comment #7)
> Looks like
> https://stackoverflow.com/questions/58991966/what-java-security-egd-option-is-for
> 
> I do wonder if I can set a system wide env var to point javac to /dev/urandom ?

You're sure it's entropy related? You can check how much is available when it happens
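
A small sketch of the check suggested here, assuming the kernel exposes its entropy estimate at /proc/sys/kernel/random/entropy_avail. The function name entropy_avail is hypothetical; the optional argument only exists so the helper can be pointed at another file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kernel's current entropy estimate.
# The optional argument overrides the file to read (useful for testing).
entropy_avail() {
    local file=${1:-/proc/sys/kernel/random/entropy_avail}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    cat "$file"
}
```

Running `entropy_avail` inside the image while a java process is stuck shows the value at that moment. A consistently healthy number would argue against an entropy problem.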
Comment 9 Toralf Förster gentoo-dev 2021-10-11 07:22:49 UTC
(In reply to Sam James from comment #8)
> You're sure it's entropy related? You can check how much is available when
> it happens

No, yesterday I stumbled over an older futex+random issue for java, but that does not seem to be what happens here.

FWIW if I run

i=0; while :; do ((i++)); echo $i; javac -version; sleep 1; done

in an image, the hang usually occurs before i reaches 10.
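
The loop above stops making progress once a run hangs. A variant that detects the hang automatically could look like the sketch below, assuming timeout(1) from coreutils is available in the image. The function name run_until_hang is hypothetical, and a process that ignores SIGTERM would need additional handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the bug report): run a command repeatedly and
# stop at the first run that exceeds a time limit.
# Usage: run_until_hang MAX_RUNS TIMEOUT_SECONDS COMMAND [ARGS...]
run_until_hang() {
    local max_runs=$1 limit=$2
    shift 2
    local i rc
    for ((i = 1; i <= max_runs; i++)); do
        # timeout(1) exits with status 124 when the time limit is hit.
        timeout "$limit" "$@" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -eq 124 ]; then
            echo "hang at run $i"
            return 1
        fi
    done
    echo "no hang in $max_runs runs"
    return 0
}
```

For example, `run_until_hang 50 10 javac -version` would report the first run that takes longer than 10 seconds.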
Comment 10 SpanKY gentoo-dev 2021-10-14 13:36:32 UTC
this smells a bit like https://issuetracker.google.com/issues/187793042

when it hangs for you, please attach with gdb and get a backtrace
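
One possible way to collect that backtrace non-interactively is sketched below, using gdb's -p, -batch and -ex options. The function name backtrace_hung and the default output file name are hypothetical. Attaching may require root or a permissive ptrace setting, and may have to be done from outside the sandbox.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attach gdb to a hung process and save backtraces of
# all its threads to a file.
# Usage: backtrace_hung PID [OUTPUT_FILE]
backtrace_hung() {
    local pid=$1 out=${2:-backtrace-$1.txt}
    # kill -0 only checks that the process exists and can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # -batch makes gdb exit (and detach) after running the -ex command.
    gdb -p "$pid" -batch -ex 'thread apply all bt' >"$out" 2>&1
}
```

For example, `backtrace_hung "$(pgrep -n java)"` would target the most recently started java process.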
Comment 11 Josh Triplett 2021-10-15 19:15:11 UTC
As far as I can tell, this happens because the sandbox tool has a global lock, and hooks fork to acquire that lock before forking and drop it afterwards (so that fork doesn't happen while another thread holds the lock), but does not have a similar hook for clone or clone3. It's possible to create another process using clone or clone3 (not just a thread), if the flags do not include CLONE_VM. I think the right fix is to hook clone and clone3, and if the flags do *not* include CLONE_VM, use the same lock/unlock logic.
Comment 12 Josh Triplett 2021-10-15 19:17:42 UTC
Interesting that this call to clone3 *does* have CLONE_VM though. This deadlock may have a different cause, though it may be related.
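
To check which clone/clone3 calls in an strace log carry CLONE_VM, a small filter along these lines may help. It is only a sketch: the function name classify_clone is hypothetical, and it assumes the flags appear as `flags=A|B|C` as in the strace output quoted in this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a clone/clone3 line from strace has
# CLONE_VM in its flags.
# Usage: classify_clone 'STRACE_LINE'
classify_clone() {
    local line=$1
    local re='flags=([A-Z0-9_|]+)'
    if [[ ! $line =~ $re ]]; then
        echo "no clone flags found"
        return 2
    fi
    # Surround the list with "|" so only the exact flag name matches.
    case "|${BASH_REMATCH[1]}|" in
        *"|CLONE_VM|"*) echo "CLONE_VM set" ;;
        *) echo "CLONE_VM not set" ;;
    esac
}
```

Fed line by line with the clone lines of a log, for example `grep -E 'clone3?\(' strace.out | while IFS= read -r l; do classify_clone "$l"; done`, it prints one verdict per call.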