At the tinderbox I sometimes see "java -version" hang. This is not reliably reproducible, but it happened from time to time in an image. The picture is that "java -version" runs fine 4-5 times, but hangs in about 10-20% of all cases.

An strace shows that the hang occurs here:

...
openat(AT_FDCWD, "/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\321\0\0\0\0\0\0"..., 832) = 832
newfstatat(3, "", {st_mode=S_IFREG|0755, st_size=878680, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 880896, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8767952000
mprotect(0x7f876795f000, 823296, PROT_NONE) = 0
mmap(0x7f876795f000, 454656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd000) = 0x7f876795f000
mmap(0x7f87679ce000, 364544, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7c000) = 0x7f87679ce000
mmap(0x7f8767a28000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd5000) = 0x7f8767a28000
close(3) = 0
mprotect(0x7f8767a28000, 4096, PROT_READ) = 0
mprotect(0x7f876897e000, 618496, PROT_READ) = 0
getpid() = 14236
munmap(0x7f8768e8f000, 194193) = 0
getpid() = 14236
rt_sigaction(SIGRT_1, {sa_handler=0x7f8768afa100, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f8768ab1060}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
mmap(NULL, 1052672, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8767851000
mprotect(0x7f8767852000, 1048576, PROT_READ|PROT_WRITE) = 0
rt_sigprocmask(SIG_BLOCK, ~[], [], 8) = 0
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7f8767951910, parent_tid=0x7f8767951910, exit_signal=0, stack=0x7f8767851000, stack_size=0xfff00, tls=0x7f8767951640} => {parent_tid=[14239]}, 88) = 14239
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7f8767951910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 14239, NULL, FUTEX_BITSET_MATCH_ANY

The currently affected image is: ~/img/17.1_developer-j3-20210809-220514
The script to "chroot" into the image is https://github.com/toralf/tinderbox/blob/master/bin/bwrap.sh
Details of the image can be accessed (for the next 8 weeks) at http://tinderbox.zwiebeltoralf.de:31557
Happened to me as well inside an lxd container. Restarting the emerge fixed it. dev-java/openjdk-jre-bin-11.0.11_p9
The same happened now with :11 - and again at a developer profile image for package app-metrics/collectd-5.12.0-r1
Created attachment 733613 strace.out
It is an issue here at various tinderbox images running with glibc-2.34 (independent of java 8 or 11)
It hangs immediately here:

...
clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fbcb5409910, parent_tid=0x7fbcb5409910, exit_signal=0, stack=0x7fbcb5309000, stack_size=0xfff00, tls=0x7fbcb5409640} => {parent_tid=[14073]}, 88) = 14073
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x7fbcb5409910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 14073, NULL, FUTEX_BITSET_MATCH_ANYjavac 1.8.0_302

It seems to print the version but then does not exit. A successful run (in the same image, just a second before) gave:

) = ?
+++ exited with 0 +++

I masked java entirely here at the tinderbox until I can solve the root cause.
I wonder if bug 816396 is related.
Looks like https://stackoverflow.com/questions/58991966/what-java-security-egd-option-is-for

I do wonder if I can set a system-wide env var to point javac to /dev/urandom?
(In reply to Toralf Förster from comment #7)
> Looks like
> https://stackoverflow.com/questions/58991966/what-java-security-egd-option-is-for
>
> I do wonder if I can set a system-wide env var to point javac to /dev/urandom?

You're sure it's entropy related? You can check how much is available when it happens
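On Linux the kernel exposes its current entropy estimate, in bits, in /proc/sys/kernel/random/entropy_avail. A minimal sketch for reading it while a javac process is hung; the helper names `parse_entropy` and `entropy_avail` are made up for this illustration:

```python
from pathlib import Path

# procfs file holding the kernel's entropy estimate, in bits
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text):
    """Convert the file content (a decimal number plus newline) to an int."""
    return int(text.strip())


def entropy_avail(path=ENTROPY_AVAIL):
    """Read the current entropy estimate; raises OSError if procfs is missing."""
    return parse_entropy(path.read_text())
```

Calling `entropy_avail()` from a second shell in the same image during a hang shows whether the pool is drained. A value that stays high would point away from an entropy problem.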
(In reply to Sam James from comment #8)
> You're sure it's entropy related? You can check how much is available when
> it happens

No. Yesterday I stumbled over a futex+random issue that affected java in the past, but that does not seem to be what happens here.

FWIW if I run

i=0; while :; do ((i++)); echo $i; javac -version; sleep 1; done

in an image, then i is usually < 10 when it hangs.
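The loop above needs someone watching it to notice the hang. A minimal sketch, assuming Python 3 is available in the image, that runs a command repeatedly and reports the first run that does not finish in time; the name `first_hang` and the default limits are illustrative choices, not part of the tinderbox scripts:

```python
import subprocess


def first_hang(cmd, max_runs=100, timeout=30.0):
    """Run cmd up to max_runs times.

    Return the 1-based number of the first run that exceeds `timeout`
    seconds, or None if every run finished in time.
    """
    for i in range(1, max_runs + 1):
        try:
            subprocess.run(cmd, stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() has already killed the child at this point
            return i
    return None


# Example call (needs javac in PATH):
#   first_hang(["javac", "-version"])
```

Because `subprocess.run()` kills the child once the timeout expires, this is only useful for measuring how often the hang occurs. To attach a debugger, the hung process has to be left running, as with the shell loop.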
this smells a bit like https://issuetracker.google.com/issues/187793042 when it hangs for you, please attach with gdb and get a backtrace
As far as I can tell, this happens because the sandbox tool has a global lock, and hooks fork to acquire that lock before forking and drop it afterwards (so that fork doesn't happen while another thread holds the lock), but does not have a similar hook for clone or clone3. It's possible to create another process using clone or clone3 (not just a thread), if the flags do not include CLONE_VM. I think the right fix is to hook clone and clone3, and if the flags do *not* include CLONE_VM, use the same lock/unlock logic.
Interesting that this call to clone3 *does* have CLONE_VM though. This deadlock may have a different cause, though it may be related.
please don't copy & paste the same comments to multiple bugs
*** Bug 806302 has been marked as a duplicate of this bug. ***
(In reply to SpanKY from comment #10)
> this smells a bit like https://issuetracker.google.com/issues/187793042
>
> when it hangs for you, please attach with gdb and get a backtrace

toralf, are you able to try this?

also, can you let us know if the hang happens with FEATURES="-sandbox -usersandbox"?
does this help? :

mr-fox ~ # gdb /home/tinderbox/run/17.1_desktop_gnome-j4-20211103-130002/usr/bin/javac 25980
GNU gdb (Gentoo 10.2 vanilla) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
/home/tinderbox/run/17.1_desktop_gnome-j4-20211103-130002/usr/bin/javac: No such file or directory.
Attaching to process 25980
[New LWP 26025]
[New LWP 26095]
[New LWP 26125]
[New LWP 26236]
[New LWP 26239]
[New LWP 26240]
[New LWP 26252]
[New LWP 26253]
warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib64/libpthread.so.0.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
warning: Target and debugger are in different PID namespaces; thread lists and other data are likely unreliable. Connect to gdbserver inside the container.
warning: Expected absolute pathname for libpthread in the inferior, but got target:/lib64/libpthread.so.0.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
0x00007faa097f2cf6 in ?? () from target:/lib64/libc.so.6
(gdb) bt full
#0 0x00007faa097f2cf6 in ?? () from target:/lib64/libc.so.6
No symbol table info available.
#1 0x00007faa097f7b03 in ?? () from target:/lib64/libc.so.6
No symbol table info available.
#2 0x00007faa09983925 in ContinueInNewThread0 (continuation=continuation@entry=0x7faa0997de20 <JavaMain>, stack_size=1048576, args=args@entry=0x7ffd50651ce0) at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/solaris/bin/java_md_solinux.c:1045
        tmp = 0x0
        rslt = <optimized out>
        tid = 140368261477952
        attr = {__size = '\000' <repeats 17 times>, "\020", '\000' <repeats 16 times>, "\020", '\000' <repeats 20 times>, __align = 0}
#3 0x00007faa0997fc72 in ContinueInNewThread (ifn=ifn@entry=0x7ffd50651e00, threadStackSize=<optimized out>, argc=<optimized out>, argv=0x5654480cdc80, mode=mode@entry=0, what=what@entry=0x0, ret=0) at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/share/bin/java.c:2033
        args = {argc = 9, argv = 0x5654480cdc80, mode = 1, what = 0x5654480cdd00 "com.sun.tools.javac.Main", ifn = {CreateJavaVM = 0x7faa08dfbaf0 <JNI_CreateJavaVM>, GetDefaultJavaVMInitArgs = 0x7faa08dfbaa0 <JNI_GetDefaultJavaVMInitArgs>, GetCreatedJavaVMs = 0x7faa08dfbc10 <JNI_GetCreatedJavaVMs>}}
        rslt = <optimized out>
#4 0x00007faa099839db in JVMInit (ifn=ifn@entry=0x7ffd50651e00, threadStackSize=<optimized out>, argc=<optimized out>, argv=<optimized out>, mode=0, mode@entry=1, what=0x0, what@entry=0x5654480cdd00 "com.sun.tools.javac.Main", ret=<optimized out>) at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/solaris/bin/java_md_solinux.c:1092
No locals.
#5 0x00007faa09980380 in JLI_Launch (argc=<optimized out>, argv=<optimized out>, jargc=<optimized out>, jargv=<optimized out>, appclassc=2, appclassv=0x565447201040 <const_appclasspath>, fullversion=0x565447000939 "1.8.0_302-b08", dotversion=0x565447000935 "1.8", pname=0x565447000930 "java", lname=0x565447000928 "openjdk", javaargs=1 '\001', cpwildcard=1 '\001', javaw=0 '\000', ergo=1) at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-hotspot/workspace/build/src/jdk/src/share/bin/java.c:304
        mode = <optimized out>
        what = <optimized out>
        cpath = <optimized out>
        main_class = <optimized out>
        ret = <optimized out>
        ifn = {CreateJavaVM = 0x7faa08dfbaf0 <JNI_CreateJavaVM>, GetDefaultJavaVMInitArgs = 0x7faa08dfbaa0 <JNI_GetDefaultJavaVMInitArgs>, GetCreatedJavaVMs = 0x7faa08dfbc10 <JNI_GetCreatedJavaVMs>}
        start = <optimized out>
        end = <optimized out>
        jvmpath = "/opt/openjdk-bin-8.302_p08/jre/lib/amd64/server/libjvm.so\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000o\177\271\t\252\177\000\000\200\300v\t\252\177\000\000\205\320s\t\252\177\000\000\020\200r\t\252\177\000\000\000\000\000\000\000\000\000\000\026\000\000\000\000\000\000\000K\201\271\t\252\177", '\000' <repeats 3969 times>
        jrepath = "/opt/openjdk-bin-8.302_p08/jre\000javac", '\000' <repeats 4059 times>
        jvmcfg = "/opt/openjdk-bin-8.302_p08/jre/lib/amd64/jvm.cfg", '\000' <repeats 1080 times>...
#6 0x00005654470006ba in main ()
No symbol table info available.
(gdb) quit
A debugging session is active.

        Inferior 1 [process 25980] will be detached.

Quit anyway? (y or n) y
Detaching from program: target:/opt/openjdk-bin-8.302_p08/bin/javac, process 25980
[Inferior 1 (process 25980) detached]
Created attachment 749523
gdb log

This is from an image with debug enabled. This is the process tree when it hangs:

Every 2.0s: pstree -Ulnspu -a 12511    mr-fox: Mon Nov 8 15:20:09 2021

init,1
  └─sudo,24565 /opt/tb/bin/bwrap.sh -m /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 -s /opt/tb/bin/job.sh
    └─bwrap.sh,24596 /opt/tb/bin/bwrap.sh -m /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 -s /opt/tb/bin/job.sh
      └─bwrap,24663 --unshare-cgroup --unshare-ipc --unshare-pid --unshare-uts --hostname 17-1-no-multilib-j4-debug-20211106-110004- --die-with-parent --setenv MAILTO tinderbox --bind /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 / --dev /dev --mqueue /dev/mqueue --perms 1777 --tmpfs /dev/shm --proc /proc --tmpfs /run --ro-bind /home/tinderbox/tb/sdata/ssmtp.conf /etc/ssmtp/ssmtp.conf --bind /home/tinderbox/tb/data /mnt/tb/data --bind /home/tinderbox/distfiles /var/cache/distfiles --tmpfs /var/tmp/portage --chdir /var/tmp/tb /bin/bash -l -c /entrypoint
        └─bwrap,24668 --unshare-cgroup --unshare-ipc --unshare-pid --unshare-uts --hostname 17-1-no-multilib-j4-debug-20211106-110004- --die-with-parent --setenv MAILTO tinderbox --bind /home/tinderbox/img/17.1_no_multilib-j4_debug-20211106-110004 / --dev /dev --mqueue /dev/mqueue --perms 1777 --tmpfs /dev/shm --proc /proc --tmpfs /run --ro-bind /home/tinderbox/tb/sdata/ssmtp.conf /etc/ssmtp/ssmtp.conf --bind /home/tinderbox/tb/data /mnt/tb/data --bind /home/tinderbox/distfiles /var/cache/distfiles --tmpfs /var/tmp/portage --chdir /var/tmp/tb /bin/bash -l -c /entrypoint
          └─entrypoint,24674 /entrypoint
            └─entrypoint,27486 /entrypoint
              └─emerge,27488 -b /usr/lib/python-exec/python3.9/emerge --update dev-java/yanfs
                └─python3.9,12486 /usr/lib/portage/python3.9/pid-ns-init 26925
                  └─python3.9,12511 /usr/lib/portage/python3.9/pid-ns-init 250 250 250 18 0,1,2 /usr/bin/sandbox [dev-java/ant-core-1.10.9] sandbox /usr/lib/portage/python3.9/ebuild.sh compile
                    └─sandbox,12763,portage /usr/lib/portage/python3.9/ebuild.sh compile
                      └─ebuild.sh,12773 /usr/lib/portage/python3.9/ebuild.sh compile
                        └─ebuild.sh,12932 /usr/lib/portage/python3.9/ebuild.sh compile
                          └─build.sh,13317 ./build.sh -Dbuild.sysclasspath=ignore jars dist-internal
                            └─sh,13337 ./bootstrap.sh
                              └─javac,13382 --release 8 -d build/classes build/classes/JavacVersionCheck.java
                                ├─{javac},13397
                                ├─{javac},13442
                                ├─{javac},13458
                                ├─{javac},13465
                                ├─{javac},13548
                                ├─{javac},13550
                                ├─{javac},13551
                                ├─{javac},13560
                                └─{javac},13564
(In reply to Sam James from comment #15)
> also, can you let us know if the hang happens with FEATURES="-sandbox
> -usersandbox"?

That does not seem to help.
(In reply to Toralf Förster from comment #18) can you verify `sandbox` is not in the process tree in this case ?
(In reply to SpanKY from comment #19)
> (In reply to Toralf Förster from comment #18)
>
> can you verify `sandbox` is not in the process tree in this case ?

yes, and I'll attach here again

pstree -Ulnspu 24565 > pstree
pstree -Ulnspua 24565 > pstree-a
gdb /home/tinderbox/img/17.1-j4_debug-20211105-183959/usr/bin/java 2>&1 6582 | tee gdb.log
Created attachment 749592 pstree
Created attachment 749595 pstree -a
Created attachment 749598 gdb log of javac
thanks, sounds like sandbox isn't relevant. it makes it sound like the glibc issue w/pthreads & futexes (comment #10) is relevant.
Created attachment 750216
gdb log

It is javac itself which hangs. So I ran this in a tinderbox image:

while :; do /usr/bin/javac -version; done

and put it into the background when it hung.
Then I ran gdb and attached to the appropriate pid (all within the image because the namespaces are different):

gdb /usr/bin/javac 368 2>&1 | tee gdb.log
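The gdb step above can also be run without an interactive session. A sketch, assuming gdb is installed inside the image and the script runs in the same PID namespace as the hung process; `-p`, `-batch` and `-ex` are standard gdb options, while the function names and the default log file name are only illustrative:

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that attaches to pid, prints a full
    backtrace of every thread, then detaches and exits."""
    return ["gdb", "-p", str(pid), "-batch",
            "-ex", "thread apply all bt full"]


def dump_backtrace(pid, logfile="gdb.log"):
    """Run gdb on pid and store its stdout and stderr in logfile."""
    with open(logfile, "w") as log:
        subprocess.run(gdb_backtrace_cmd(pid), stdout=log,
                       stderr=subprocess.STDOUT)
```

With the pid from the comment above this would be `dump_backtrace(368)`. Using `thread apply all` asks for the stacks of all threads rather than only the current one, which matters here because the main thread is merely waiting on a futex for the thread it just created.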
(In reply to Toralf Förster from comment #25)
> Created attachment 750216
> gdb log
>
> It is javac itself which hangs. So I ran this in a tinderbox image:
>
> while :; do /usr/bin/javac -version; done
>
> and put it into the background when it hung.
> Then I ran gdb and attached to the appropriate pid (all within the image
> because the namespaces are different):
>
> gdb /usr/bin/javac 368 2>&1 | tee gdb.log

Tried to reproduce this in a systemd-nspawn container, but everything worked fine there. The hangs seem to be specific to bubblewrap or another detail of the setup... :(
But now I do have a non-java case, in image ~/img/17.1_no_multilib-j4_test-20211112-204802, where it freezes too (test phase of sys-apps/gawk):

$ pstree -Ulnpua 8346
make,8346,portage check
  └─sh,11130 -c make pass-fail || { make diffout; exit 1; }
    └─make,11209 diffout
      └─sh,11242 -c for i in _* ; \\\012do \\\012\011if [ "$i" != "_*" ]; then \\\012\011echo ============== $i ============= ; \\\012\011base=`echo $i | sed 's/^_//'` ; \\\012\011if [ -r ${base}.ok ]; then \\\012\011diff -u ${base}.ok $i ; \\\012\011else \\\012\011diff -u "."/${base}.ok $i ; \\\012\011fi ; \\\012\011fi ; \\\012done | more
        └─more,11251
Was that with glibc-2.34? Wasn’t clear from irc
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4241f37f9437f5e5e9c61ad8411e4515d96728a5

commit 4241f37f9437f5e5e9c61ad8411e4515d96728a5
Author:     Andreas K. Hüttel <dilfridge@gentoo.org>
AuthorDate: 2021-11-29 10:11:48 +0000
Commit:     Andreas K. Hüttel <dilfridge@gentoo.org>
CommitDate: 2021-11-29 10:13:18 +0000

    sys-libs/glibc: 2.34 revision/patchlevel bump

    Closes: https://bugs.gentoo.org/807832
    Package-Manager: Portage-3.0.28, Repoman-3.0.3
    Signed-off-by: Andreas K. Huettel <dilfridge@gentoo.org>

 sys-libs/glibc/Manifest             |    1 +
 sys-libs/glibc/glibc-2.34-r3.ebuild | 1580 +++++++++++++++++++++++++++++++++++
 2 files changed, 1581 insertions(+)