
Bug 807832

Summary: dev-java/openjdk-jre-bin-8.292_p10 : java -version hangs sometimes within a bubblewrapped image
Product: Gentoo Linux Reporter: Toralf Förster <toralf>
Component: Current packages Assignee: Georgy Yakovlev <gyakovlev>
Status: RESOLVED FIXED    
Severity: normal CC: java, juippis, sam, toolchain
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
URL: https://github.com/containers/bubblewrap/issues/464
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=28624
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 803482    
Attachments: strace.out
gdb log
pstree
pstree -a
gdb log of javac
gdb log

Description Toralf Förster gentoo-dev 2021-08-12 08:29:46 UTC
On the tinderbox, "java -version" sometimes hangs. The hang is not reliably reproducible, but it happens from time to time in an image.

The pattern is that "java -version" runs fine 4-5 times in a row, but hangs in about 10-20% of all runs.

An strace shows that the hang occurs here:

...
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\321\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=878680, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 880896, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8767952000
mprotect(0x7f876795f000, 823296, PROT_NONE) = 0
mmap(0x7f876795f000, 454656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x7f876795f000
mmap(0x7f87679ce000, 364544, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7c000) = 0x7f87679ce000
mmap(0x7f8767a28000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd5000) = 0x7f8767a28000
close(3)                                = 0
mprotect(0x7f8767a28000, 4096, PROT_READ) = 0
mprotect(0x7f876897e000, 618496, PROT_READ) = 0
getpid()                                = 14236
munmap(0x7f8768e8f000, 194193)          = 0
getpid()                                = 14236
rt_sigaction(SIGRT_1, {sa_handler=0x7f8768afa100, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f8768ab1060}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
mmap(NULL, 1052672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8767851000
mprotect(0x7f8767852000, 1048576, PROT_READ|PROT_WRITE) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8)   = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f8767951910, parent_tid=0x7f8767951910, exit_signal=0, stack=0x7f8767851000, stack_size=0xfff00, tls=0x7f8767951640} => {parent_tid=[14239]}, 88) = 14239
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8767951910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 14239, NULL, FUTEX_BITSET_MATCH_ANY    


The currently affected image is: ~/img/17.1_developer-j3-20210809-220514

The script to "chroot" into the image is https://github.com/toralf/tinderbox/blob/master/bin/bwrap.sh

Details of the image can be accessed (for the next 8 weeks) at http://tinderbox.zwiebeltoralf.de:31557
Comment 1 Joonas Niilola gentoo-dev 2021-08-16 06:52:27 UTC
Happened to me as well inside an lxd container. Restarting the emerge fixed it.
dev-java/openjdk-jre-bin-11.0.11_p9
Comment 2 Toralf Förster gentoo-dev 2021-08-16 14:01:18 UTC
The same has now happened with slot :11, again in a developer profile image,
for package app-metrics/collectd-5.12.0-r1
Comment 3 Toralf Förster gentoo-dev 2021-08-17 18:59:13 UTC
Created attachment 733613 [details]
strace.out
Comment 4 Toralf Förster gentoo-dev 2021-08-30 20:31:46 UTC
It is an issue here in various tinderbox images running glibc-2.34 (independent of java 8 or 11)
Comment 5 Toralf Förster gentoo-dev 2021-10-10 16:25:42 UTC
It hangs immediately here:
...
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fbcb5409910, parent_tid=0x7fbcb5409910, exit_signal=0, stack=0x7fbcb5309000, stack_size=0xfff00, tls=0x7fbcb5409640} => {parent_tid=[14073]}, 88) = 14073
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7fbcb5409910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 14073, NULL, FUTEX_BITSET_MATCH_ANYjavac 1.8.0_302


It seems to print the version but does not exit afterwards.


A successful run (in the same image, just a second before) gave:

) = ?
+++ exited with 0 +++

I have masked java entirely on the tinderbox until I can solve the root cause.
Comment 6 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2021-10-10 16:34:28 UTC
I wonder if bug 816396 is related.
Comment 7 Toralf Förster gentoo-dev 2021-10-10 16:35:19 UTC
Looks like https://stackoverflow.com/questions/58991966/what-java-security-egd-option-is-for

I wonder whether I can set a system-wide env var to point javac to /dev/urandom?
Comment 8 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2021-10-11 05:25:47 UTC
(In reply to Toralf Förster from comment #7)
> Looks like
> https://stackoverflow.com/questions/58991966/what-java-security-egd-option-is-for
> 
> I wonder whether I can set a system-wide env var to point javac to /dev/urandom?

You're sure it's entropy related? You can check how much is available when it happens
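
One way to follow this suggestion is to read the kernel's entropy estimate while the process is hanging. A minimal sketch, assuming a Linux system with /proc mounted; the helper name `entropy_available` is hypothetical and not part of any tool mentioned in this report:

```python
from pathlib import Path

# Kernel-provided estimate of the entropy pool, in bits (Linux only).
ENTROPY_FILE = Path("/proc/sys/kernel/random/entropy_avail")


def entropy_available(path=ENTROPY_FILE):
    """Return the kernel's entropy estimate in bits, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux, /proc not mounted) or unexpected content.
        return None
```

Calling `entropy_available()` inside the affected image at the moment of a hang would show whether the pool is actually low; a consistently healthy value would argue against the entropy theory.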
Comment 9 Toralf Förster gentoo-dev 2021-10-11 07:22:49 UTC
(In reply to Sam James from comment #8)
> You're sure it's entropy related? You can check how much is available when
> it happens

No. Yesterday I stumbled over a past futex+random issue in java, but that does not seem to be what happens here.

FWIW if I run

i=0; while :; do ((i++)); echo $i; javac -version; sleep 1; done

in an image, then it usually hangs before i reaches 10.
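
The shell loop above blocks as soon as javac hangs. A minimal Python sketch of the same experiment with a per-run timeout follows, so the failing iteration is reported instead. The names `first_hang`, `max_runs` and `timeout_s` are hypothetical, and the 10-second default is an assumption, not a value from this report:

```python
import subprocess


def first_hang(cmd, max_runs=50, timeout_s=10.0):
    """Run cmd repeatedly; return the 1-based run that timed out, or None."""
    for run in range(1, max_runs + 1):
        try:
            # subprocess.run kills the child itself when the timeout expires.
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
                check=False,
            )
        except subprocess.TimeoutExpired:
            return run
    return None
```

For example, `first_hang(["javac", "-version"])` would return the iteration at which the hang showed up. Because the timeout kills the hung process, this variant is suited to measuring the hang rate, not to attaching gdb afterwards.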
Comment 10 SpanKY gentoo-dev 2021-10-14 13:36:32 UTC
this smells a bit like https://issuetracker.google.com/issues/187793042

when it hangs for you, please attach with gdb and get a backtrace
Comment 11 Josh Triplett 2021-10-15 19:15:11 UTC
As far as I can tell, this happens because the sandbox tool has a global lock, and hooks fork to acquire that lock before forking and drop it afterwards (so that fork doesn't happen while another thread holds the lock), but does not have a similar hook for clone or clone3. It's possible to create another process using clone or clone3 (not just a thread), if the flags do not include CLONE_VM. I think the right fix is to hook clone and clone3, and if the flags do *not* include CLONE_VM, use the same lock/unlock logic.
Comment 12 Josh Triplett 2021-10-15 19:17:42 UTC
Interesting that this call to clone3 *does* have CLONE_VM though. This deadlock may have a different cause, though it may be related.
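
The rule proposed in comment 11 can be sketched as a simple flag test. This is only an illustration of the decision, not sandbox code: `needs_fork_lock` is a hypothetical name, and the flag values are the ones defined in <linux/sched.h>. The flag set is copied from the clone3() call in the strace of the description:

```python
# Flag values as defined in <linux/sched.h>.
CLONE_VM = 0x00000100
CLONE_FS = 0x00000200
CLONE_FILES = 0x00000400
CLONE_SIGHAND = 0x00000800
CLONE_THREAD = 0x00010000
CLONE_SYSVSEM = 0x00040000
CLONE_SETTLS = 0x00080000
CLONE_PARENT_SETTID = 0x00100000
CLONE_CHILD_CLEARTID = 0x00200000

# Flags of the clone3() call shown in the strace of the description.
JAVA_THREAD_FLAGS = (
    CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD
    | CLONE_SYSVSEM | CLONE_SETTLS | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID
)


def needs_fork_lock(flags):
    """Hypothetical rule from comment 11: take the fork lock only when the
    child does not share the parent's address space (CLONE_VM not set)."""
    return not (flags & CLONE_VM)
```

Under this rule the call from the strace would not take the lock, since CLONE_VM is set, which matches the observation in comment 12 that this particular deadlock may have a different cause.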
Comment 13 SpanKY gentoo-dev 2021-10-17 00:16:03 UTC
please don't copy & paste the same comments to multiple bugs
Comment 14 SpanKY gentoo-dev 2021-11-03 16:58:37 UTC
*** Bug 806302 has been marked as a duplicate of this bug. ***
Comment 15 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2021-11-07 20:42:25 UTC
(In reply to SpanKY from comment #10)
> this smells a bit like https://issuetracker.google.com/issues/187793042
> 
> when it hangs for you, please attach with gdb and get a backtrace

toralf, are you able to try this?

also, can you let us know if the hang happens with FEATURES="-sandbox -usersandbox"?
Comment 16 Toralf Förster gentoo-dev 2021-11-07 21:57:16 UTC
Does this help?

mr-fox ~ # gdb /home/tinderbox/run/17.1_desktop_gnome-j4-20211103-130002/usr/bin/javac 25980 
GNU gdb (Gentoo 10.2 vanilla) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
/home/tinderbox/run/17.1_desktop_gnome-j4-20211103-130002/usr/bin/javac: No such file or directory.
Attaching to process 25980
[New LWP 26025]
[New LWP 26095]
[New LWP 26125]
[New LWP 26236]
[New LWP 26239]
[New LWP 26240]
[New LWP 26252]
[New LWP 26253]

warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib64/libpthread.so.0.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable.  Connect to gdbserver inside the container.

warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib64/libpthread.so.0.

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007faa097f2cf6 in ?? () from target:/lib64/libc.so.6
(gdb) bt full
#0  0x00007faa097f2cf6 in ?? () from target:/lib64/libc.so.6
No symbol table info available.
#1  0x00007faa097f7b03 in ?? () from target:/lib64/libc.so.6
No symbol table info available.
#2  0x00007faa09983925 in ContinueInNewThread0 (continuation=continuation@entry=0x7faa0997de20 <JavaMain>, stack_size=1048576, args=args@entry=0x7ffd50651ce0)
    at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/solaris/bin/java_md_solinux.c:1045
        tmp = 0x0
        rslt = <optimized out>
        tid = 140368261477952
        attr = {__size = '\000' <repeats 17 times>, "\020", '\000' <repeats 16 times>, "\020", '\000' <repeats 20 times>, __align = 0}
#3  0x00007faa0997fc72 in ContinueInNewThread (ifn=ifn@entry=0x7ffd50651e00, threadStackSize=<optimized out>, argc=<optimized out>, argv=0x5654480cdc80, mode=mode@entry=0, 
    what=what@entry=0x0, ret=0) at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/share/bin/java.c:2033
        args = {argc = 9, argv = 0x5654480cdc80, mode = 1, what = 0x5654480cdd00 "com.sun.tools.javac.Main", ifn = {CreateJavaVM = 0x7faa08dfbaf0 <JNI_CreateJavaVM>, 
            GetDefaultJavaVMInitArgs = 0x7faa08dfbaa0 <JNI_GetDefaultJavaVMInitArgs>, GetCreatedJavaVMs = 0x7faa08dfbc10 <JNI_GetCreatedJavaVMs>}}
        rslt = <optimized out>
#4  0x00007faa099839db in JVMInit (ifn=ifn@entry=0x7ffd50651e00, threadStackSize=<optimized out>, argc=<optimized out>, argv=<optimized out>, mode=0, mode@entry=1, what=0x0, 
    what@entry=0x5654480cdd00 "com.sun.tools.javac.Main", ret=<optimized out>)
    at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/solaris/bin/java_md_solinux.c:1092
No locals.
#5  0x00007faa09980380 in JLI_Launch (argc=<optimized out>, argv=<optimized out>, jargc=<optimized out>, jargv=<optimized out>, appclassc=2, appclassv=0x565447201040 <const_appclasspath>, 
    fullversion=0x565447000939 "1.8.0_302-b08", dotversion=0x565447000935 "1.8", pname=0x565447000930 "java", lname=0x565447000928 "openjdk", javaargs=1 '\001', cpwildcard=1 '\001', 
    javaw=0 '\000', ergo=1) at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/share/bin/java.c:304
        mode = <optimized out>
        what = <optimized out>
        cpath = <optimized out>
        main_class = <optimized out>
        ret = <optimized out>
        ifn = {CreateJavaVM = 0x7faa08dfbaf0 <JNI_CreateJavaVM>, GetDefaultJavaVMInitArgs = 0x7faa08dfbaa0 <JNI_GetDefaultJavaVMInitArgs>, 
          GetCreatedJavaVMs = 0x7faa08dfbc10 <JNI_GetCreatedJavaVMs>}
        start = <optimized out>
        end = <optimized out>
        jvmpath = "/opt/openjdk-bin-8.302_p08/jre/lib/amd64/server/libjvm.so\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000o\177\271\t\252\177\000\000\200\300v\t\252\177\000\000\205\320s\t\252\177\000\000\020\200r\t\252\177\000\000\000\000\000\000\000\000\000\000\026\000\000\000\000\000\000\000K\201\271\t\252\177", '\000' <repeats 3969 times>
        jrepath = "/opt/openjdk-bin-8.302_p08/jre\000javac", '\000' <repeats 4059 times>
        jvmcfg = "/opt/openjdk-bin-8.302_p08/jre/lib/amd64/jvm.cfg", '\000' <repeats 1080 times>...
#6  0x00005654470006ba in main ()
No symbol table info available.
(gdb) quit
A debugging session is active.

        Inferior 1 [process 25980] will be detached.

Quit anyway? (y or n) y
Detaching from program: target:/opt/openjdk-bin-8.302_p08/bin/javac, process 25980
[Inferior 1 (process 25980) detached]
Comment 17 Toralf Förster gentoo-dev 2021-11-08 14:21:18 UTC
Created attachment 749523 [details]
gdb log

This is from an image with debug enabled. This is the process tree when it hangs:

Every 2.0s: pstree -Ulnspu -a 12511                                                                                                                          mr-fox: Mon Nov  8 15:20:09 2021

init,1
  └─sudo,24565 /opt/tb/bin/bwrap.sh -m /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 -s /opt/tb/bin/job.sh
      └─bwrap.sh,24596 /opt/tb/bin/bwrap.sh -m /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 -s /opt/tb/bin/job.sh
          └─bwrap,24663 --unshare-cgroup --unshare-ipc --unshare-pid --unshare-uts --hostname 17-1-no-multilib-j4-debug-20211106-110004- --die-with-parent --setenv MAILTO tinderbox --bind /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 / --dev /dev --mqueue /dev/mqueue --perms 1777 --tmpfs /dev/shm --proc /proc --tmpfs /run --ro-bind /home/tinderbox/tb/sdata/ssmtp.conf /etc/ssmtp/ssmtp.conf --bind /home/tinderbox/tb/data /mnt/tb/data --bind /home/tinderbox/distfiles /var/cache/distfiles --tmpfs /var/tmp/portage --chdir /var/tmp/tb /bin/bash -l -c /entrypoint
              └─bwrap,24668 --unshare-cgroup --unshare-ipc --unshare-pid --unshare-uts --hostname 17-1-no-multilib-j4-debug-20211106-110004- --die-with-parent --setenv MAILTO tinderbox --bind /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 / --dev /dev --mqueue /dev/mqueue --perms 1777 --tmpfs /dev/shm --proc /proc --tmpfs /run --ro-bind /home/tinderbox/tb/sdata/ssmtp.conf /etc/ssmtp/ssmtp.conf --bind /home/tinderbox/tb/data /mnt/tb/data --bind /home/tinderbox/distfiles /var/cache/distfiles --tmpfs /var/tmp/portage --chdir /var/tmp/tb /bin/bash -l -c /entrypoint
                  └─entrypoint,24674 /entrypoint
                      └─entrypoint,27486 /entrypoint
                          └─emerge,27488 -b /usr/lib/python-exec/python3.9/emerge --update dev-java/yanfs
                              └─python3.9,12486 /usr/lib/portage/python3.9/pid-ns-init 26925
                                  └─python3.9,12511 /usr/lib/portage/python3.9/pid-ns-init 250 250 250 18 0,1,2 /usr/bin/sandbox [dev-java/ant-core-1.10.9] sandbox /usr/lib/portage/python3.9/ebuild.sh compile
                                      └─sandbox,12763,portage /usr/lib/portage/python3.9/ebuild.sh compile
                                          └─ebuild.sh,12773 /usr/lib/portage/python3.9/ebuild.sh compile
                                              └─ebuild.sh,12932 /usr/lib/portage/python3.9/ebuild.sh compile
                                                  └─build.sh,13317 ./build.sh -Dbuild.sysclasspath=ignore jars dist-internal
                                                      └─sh,13337 ./bootstrap.sh
                                                          └─javac,13382 --release 8 -d build/classes build/classes/JavacVersionCheck.java
                                                              ├─{javac},13397
                                                              ├─{javac},13442
                                                              ├─{javac},13458
                                                              ├─{javac},13465
                                                              ├─{javac},13548
                                                              ├─{javac},13550
                                                              ├─{javac},13551
                                                              ├─{javac},13560
                                                              └─{javac},13564
Comment 18 Toralf Förster gentoo-dev 2021-11-08 14:31:52 UTC
(In reply to Sam James from comment #15)

> also, can you let us know if the hang happens with FEATURES="-sandbox
> -usersandbox"?

That does not seem to help.
Comment 19 SpanKY gentoo-dev 2021-11-08 17:31:01 UTC
(In reply to Toralf Förster from comment #18)

can you verify `sandbox` is not in the process tree in this case ?
Comment 20 Toralf Förster gentoo-dev 2021-11-08 18:44:01 UTC
(In reply to SpanKY from comment #19)
> (In reply to Toralf Förster from comment #18)
> 
> can you verify `sandbox` is not in the process tree in this case ?

yes, and I'll attach here again

pstree -Ulnspu  24565 > pstree
pstree -Ulnspua 24565 > pstree-a
gdb /home/tinderbox/img/17.1-j4_debug-20211105-183959/usr/bin/java 2>&1 6582 | tee gdb.log
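
Besides reading the pstree output by eye, the question from comment 19 can be answered by walking the parent chain of the hanging process. A minimal sketch, assuming Linux with /proc mounted and run in a namespace where the pid is visible; `parent_and_name`, `ancestor_names` and `runs_under` are hypothetical helper names:

```python
import os
from pathlib import Path


def parent_and_name(pid):
    """Return (ppid, comm) parsed from /proc/<pid>/stat."""
    stat = Path(f"/proc/{pid}/stat").read_text()
    # comm is wrapped in parentheses and may itself contain spaces or ')'.
    open_idx, close_idx = stat.index("("), stat.rindex(")")
    comm = stat[open_idx + 1:close_idx]
    fields = stat[close_idx + 2:].split()  # fields[0] = state, fields[1] = ppid
    return int(fields[1]), comm


def ancestor_names(pid=None):
    """List command names from pid up to the top of the visible process tree."""
    pid = os.getpid() if pid is None else pid
    names = []
    while pid > 0:
        try:
            pid, comm = parent_and_name(pid)
        except (OSError, ValueError, IndexError):
            break
        names.append(comm)
    return names


def runs_under(pid, name="sandbox"):
    """True if any process in the parent chain of pid is called name."""
    return name in ancestor_names(pid)
```

With the pid of the hanging java process, `runs_under(pid)` returning False would confirm that sandbox is not among its ancestors. Note that the kernel truncates comm to 15 characters, so longer names need to be matched in truncated form.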
Comment 21 Toralf Förster gentoo-dev 2021-11-08 18:44:45 UTC
Created attachment 749592 [details]
pstree
Comment 22 Toralf Förster gentoo-dev 2021-11-08 18:45:03 UTC
Created attachment 749595 [details]
pstree -a
Comment 23 Toralf Förster gentoo-dev 2021-11-08 18:45:24 UTC
Created attachment 749598 [details]
gdb log of javac
Comment 24 SpanKY gentoo-dev 2021-11-09 00:53:26 UTC
thanks, sounds like sandbox isn't relevant.  it makes it sound like the glibc issue w/pthreads & futexes (comment #10) is relevant.
Comment 25 Toralf Förster gentoo-dev 2021-11-10 18:12:21 UTC
Created attachment 750216 [details]
gdb log

It is javac itself that hangs. I ran this in a tinderbox image:

while :; do /usr/bin/javac -version; done

and put it into the background when it hung.
Then I ran gdb and attached to the appropriate pid (all within the image, because the namespaces differ):

gdb /usr/bin/javac 368 2>&1 | tee gdb.log
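
The earlier gdb warning about different PID namespaces can be checked up front by comparing namespace links before attaching. A minimal sketch, assuming Linux with /proc mounted and enough permission to read the target's namespace link; `pid_namespace` and `shares_pid_namespace` are hypothetical names:

```python
import os


def pid_namespace(pid="self"):
    """Return the PID-namespace link of a process (a string of the form
    'pid:[<inode>]'), or None if it cannot be read."""
    try:
        return os.readlink(f"/proc/{pid}/ns/pid")
    except OSError:
        return None


def shares_pid_namespace(pid):
    """True or False if both links are readable, None if either is not."""
    mine, theirs = pid_namespace(), pid_namespace(pid)
    if mine is None or theirs is None:
        return None
    return mine == theirs
```

If `shares_pid_namespace(pid)` returns False for the hanging process, the debugger should be started inside the image, as was done here, or connected through gdbserver as the gdb warning suggests.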
Comment 26 Andreas K. Hüttel archtester gentoo-dev 2021-11-13 17:45:48 UTC
(In reply to Toralf Förster from comment #25)
> Created attachment 750216 [details]
> gdb log
> 
> It is javac itself that hangs. I ran this in a tinderbox image:
> 
> while :; do /usr/bin/javac -version; done
> 
> and put it into the background when it hung.
> Then I ran gdb and attached to the appropriate pid (all within the image,
> because the namespaces differ):
> 
> gdb /usr/bin/javac 368 2>&1 | tee gdb.log

Tried to reproduce this in a systemd-nspawn but everything worked fine there.

The hangs seem to be specific to bubblewrap or another detail of the setup... :(
Comment 27 Toralf Förster gentoo-dev 2021-11-13 20:21:41 UTC
But now I have a non-java case, in image
~/img/17.1_no_multilib-j4_test-20211112-204802, where it freezes too (test phase of sys-apps/gawk):


$ pstree -Ulnpua 8346
make,8346,portage check
  └─sh,11130 -c make pass-fail || { make diffout; exit 1; }
      └─make,11209 diffout
          └─sh,11242 -c for i in _* ; \\\012do  \\\012\011if [ "$i" != "_*" ]; then \\\012\011echo ============== $i ============= ; \\\012\011base=`echo $i | sed 's/^_//'` ; \\\012\011if [ -r ${base}.ok ]; then \\\012\011diff -u ${base}.ok $i ; \\\012\011else \\\012\011diff -u "."/${base}.ok  $i ; \\\012\011fi ; \\\012\011fi ; \\\012done | more
              └─more,11251
Comment 28 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2021-11-13 20:33:57 UTC
Was that with glibc-2.34? Wasn’t clear from irc
Comment 29 Larry the Git Cow gentoo-dev 2021-11-29 10:13:36 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4241f37f9437f5e5e9c61ad8411e4515d96728a5

commit 4241f37f9437f5e5e9c61ad8411e4515d96728a5
Author:     Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2021-11-29 10:11:48 +0000
Commit:     Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2021-11-29 10:13:18 +0000

    sys-libs/glibc: 2.34 revision/patchlevel bump
    
    Closes: https://bugs.gentoo.org/807832
    Package-Manager: Portage-3.0.28, Repoman-3.0.3
    Signed-off-by: Andreas K. Huettel <dilfridge@gentoo.org>

 sys-libs/glibc/Manifest             |    1 +
 sys-libs/glibc/glibc-2.34-r3.ebuild | 1580 +++++++++++++++++++++++++++++++++++
 2 files changed, 1581 insertions(+)