for i in {0..100}; do
    echo ${i}
    while read x ; do
        echo ${x}
    done < <(find /usr/lib -type f ! -type l -name '*.a')
done

This loop fails anywhere from once to several times per run. It is much more likely to reproduce when the CPU is under heavy load.
Error messages:
test.sh: line 2: /tmp//sh-np-9439761768676459121: Interrupted system call
Reproducible: Sometimes
Steps to Reproduce:
1. ACCEPT_KEYWORDS=amd64 emerge app-benchmarks/stress
2. stress --cpu 8 --verbose
3. bash test.sh
test.sh contains code similar to that in sys-apps/portage. This is the reason prepstrip and ecompressdir fail in rare cases.
Created attachment 332708 [details] sample script
1. bash test.sh &>test.log
2. cat test.log | grep 'Interrupted' | wc
Created attachment 332710 [details, diff] sample patch for sys-apps/portage-2.2.0_alpha149 It is dirty. However, it works fine on G/FBSD.
Created attachment 332910 [details, diff] sample patch for sys-apps/portage-2.2.0_alpha149 enabled parallel installation
I can't reproduce the issue. It actually looks like a problem in the operating system or kernel. In any case, we need more information about your environment: the versions of bash, the kernel, libc and so on.
(In reply to comment #4) > I can't reproduce the issue. Actually it seems a problem of operating system > or kernel. Anyway we might need more information about your environment: > version of bash, kernel, libc or such. results of emerge --info Portage 2.2.0_alpha149 (default/bsd/fbsd/amd64/9.1, gcc-4.6.3, freebsd-lib-9.1_rc3-r1, 9.1-Gentoo amd64) ================================================================= System uname: FreeBSD-9.1-Gentoo-amd64-64bit-ELF Timestamp of tree: Wed, 19 Dec 2012 11:30:01 +0000 ld GNU ld (GNU Binutils) 2.22 app-shells/bash: 4.2_p39-r1 dev-lang/python: 2.7.3-r3, 3.2.3-r2 dev-util/pkgconfig: 0.27.1 sys-apps/baselayout: 2.2 sys-apps/openrc: 0.11.8 sys-devel/autoconf: 2.69 sys-devel/automake: 1.9.6-r3, 1.11.6 sys-devel/binutils: 2.22-r1 sys-devel/gcc: 4.6.3 sys-devel/gcc-config: 1.8 sys-devel/libtool: 2.4.2 sys-devel/make: 3.82-r3 sys-freebsd/freebsd-lib: 9.1_rc3-r1 (virtual/os-headers) Repositories: gentoo ACCEPT_KEYWORDS="amd64-fbsd ~amd64-fbsd" ACCEPT_LICENSE="* -@EULA" CBUILD="x86_64-gentoo-freebsd9.1" CFLAGS="-O2 -pipe -mtune=generic" CHOST="x86_64-gentoo-freebsd9.1" CONFIG_PROTECT="/etc" CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/terminfo" CXXFLAGS="-O2 -pipe -mtune=generic" DISTDIR="/usr/portage/distfiles" FCFLAGS="-O2 -pipe" FEATURES="assume-digests binpkg-logs chflags config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch" FFLAGS="-O2 -pipe" GENTOO_MIRRORS="http://distfiles.gentoo.org" INSTALL_MASK="/usr/lib/systemd" LDFLAGS="" MAKEOPTS="-j5" PKGDIR="/usr/portage/packages" PORTAGE_CONFIGROOT="/" PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local 
--exclude=/packages" PORTAGE_TMPDIR="/var/tmp" PORTDIR="/usr/portage" PORTDIR_OVERLAY="" SYNC="rsync://rsync.gentoo.org/gentoo-portage" USE="acl amd64-fbsd berkdb cracklib crypt cups cxx dri gdbm iconv ipv6 java5 java6 mmx modules multilib ncurses nls oss pam pcre readline sse sse2 ssl tcpd unicode zlib" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="FreeBSD" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse" KERNEL="FreeBSD" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" PHP_TARGETS="php5-3" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" RUBY_TARGETS="ruby18 ruby19" USERLAND="BSD" VIDEO_CARDS="apm ark chips cirrus cyrix dummy i128 intel mach64 mga nv r128 radeon rendition s3 s3virge savage siliconmotion sis tga trident tseng fbdev" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude 
chaos account" Unset: CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, LANG, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Maybe you can patch bash's read builtin to retry when it encounters EINTR.
Created attachment 333210 [details, diff] sample patch for app-shells/bash-4.2_p39-r1
(In reply to comment #6)
> Maybe you can patch bash's read builtin to retry when it encounters EINTR.
Thanks for your comment. I've created a patch for bash. I was able to work around this problem.
(In reply to comment #7)
> Created attachment 333210 [details, diff]
> sample patch for app-shells/bash-4.2_p39-r1
>
> (In reply to comment #6)
> > Maybe you can patch bash's read builtin to retry when it encounters EINTR.
>
> Thanks for your comment.
>
> I've created a patch for bash.
> I was able to work around this problem.
Please send this issue and your patch upstream. Since I cannot reproduce the issue, it is difficult for me to discuss it.
I see exactly the same thing on Gentoo Prefix, x86-based FreeBSD 9.1-b1 under VirtualBox.
http://lists.freebsd.org/pipermail/freebsd-bugs/2009-March/034733.html http://lists.gnu.org/archive/html/bug-bash/2008-10/msg00091.html
Since this problem makes emerge hang without being able to be killed by ^C (need another terminal to pkill -f prepstrip) I applied the patch to the Prefix version of bash to get a working system again.
I was able to reproduce this while running portage's unit tests on the GhostBSD-2.5 livedvd under VirtualBox. So, I added a minimal workaround that allows the tests to pass: http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=2d76e5e77d2be40f5635140dbbbf0b4d60f20dc1
@base-system: what do you think of adding the patch from comment 7?
(In reply to comment #13) Chet's response seems to indicate that this is a bug in FreeBSD. i don't think we should be patching random read() calls to work around that.
(In reply to comment #14) It's not exactly a random read() call, it is the specific open(2) call when reading from a named pipe. I get the impression that this may be due to the writer not being fully active yet or something.
POSIX says open(2) can return -1 with errno set to EINTR, so if bash triggers that case, maybe it's a good idea to handle it?
(In reply to comment #16)
EINTR happens when a process gets a signal that the process handles and returns from. in the named pipe scenario, the remote side shouldn't matter. you can open a named pipe even if there are no active writers. reads will then simply block waiting for data.
$ mkfifo asdf
$ cat asdf
<hang>
someone should figure out what signal bash is receiving & handling, and who is sending it for what reason.
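The blocking behaviour of the mkfifo/cat example can be observed without leaving a hung terminal behind. The following is only a sketch: the temporary directory, the `delay` variable and the timing measurement are illustrative additions, not part of the report. A background writer opens the fifo after a short delay, and the script reports how long the reader waited.

```shell
#!/usr/bin/env bash
# Sketch: a FIFO reader stays blocked until a writer opens the other end.
# workdir, fifo and delay are illustrative names, not taken from the report.
workdir=$(mktemp -d)
fifo="${workdir}/asdf"
delay=2

mkfifo "${fifo}"
start=$(date +%s)

# The writer opens the FIFO only after ${delay} seconds.
( sleep "${delay}"; echo "data" > "${fifo}" ) &

# The reader blocks here until the background writer shows up.
read -r line < "${fifo}"
waited=$(( $(date +%s) - start ))

wait
rm -rf "${workdir}"
echo "read '${line}' after roughly ${waited}s"
```

The reported wait should track `delay`, which suggests that nothing else woke the reader in the meantime.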
Probably a SIGCHLD, and I can't imagine anything else. A quick grep of the bash source turns up about 148 references to SIGCHLD. Consider the attached test case from comment #1:
for i in {0..100}; do
    echo ${i}
    while read x ; do
        echo ${x}
    done < <(find /usr/lib -type f ! -type l -name '*.a')
done
If the subshell exits when the parent process is trying to open the named pipe, the resulting SIGCHLD triggers EINTR. Seems like the most obvious explanation.
Tracing through bash internals, we find some evidence in support of the explanation given in comment #18: Apparently the /tmp//sh-np-9439761768676459121 fifo, visible in the error message in comment #0, is created by make_named_pipe() (from subst.c). The only place make_named_pipe() is called is from process_substitute(). process_substitute() forks; in the parent process it returns the fifo's pathname, and in the child it calls exit() after executing the child shell. The parent only tries to open the fifo after process_substitute() returns, which is after the fork has occurred. So, the child shell may exit and trigger a SIGCHLD during the parent's open call, causing it to fail with EINTR.
(In reply to comment #19) on Linux, the initial write to the named fifo hangs until there is someone on the other side consuming it. so it wouldn't be possible for a fifo to be created, a child process created, write all of its output, and then exit before the parent got a chance to open it. is that not the case on freebsd ?
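One way to check the writer-side claim on a given system is a small script like the one below. It is a sketch under assumptions: the marker file, the one-second pause and all variable names are made up for the illustration. The child writes to a fifo and then creates a marker file, so the marker only appears once the child has got past its blocking open and write.

```shell
#!/usr/bin/env bash
# Sketch: does a FIFO writer block until a reader opens the other end?
# workdir, fifo and marker are illustrative names, not taken from the report.
workdir=$(mktemp -d)
fifo="${workdir}/np"
marker="${workdir}/writer-done"

mkfifo "${fifo}"

# Child: write one line, then record that the write has completed.
( echo "payload" > "${fifo}"; touch "${marker}" ) &

# Give the child time to reach the FIFO. Nobody is reading yet.
sleep 1
if [ -e "${marker}" ]; then before="finished"; else before="blocked"; fi

# Now open the read end and consume the data.
read -r line < "${fifo}"
wait
if [ -e "${marker}" ]; then after="finished"; else after="blocked"; fi

rm -rf "${workdir}"
echo "writer before reader opened: ${before}; after: ${after}"
```

If the writer is reported as blocked before the reader opens the fifo, the child cannot have written its output and exited ahead of the parent's open call.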
(In reply to comment #20) This is true, both the read and write ends appear to block until both ends are opened, so it does not seem possible for the child to exit before the parent opens the pipe. So, we have to search further for the source of the theorized SIGCHLD event. We don't have to look far though: In both ecompressdir and prepstrip (where we just happen to observe EINTR problems), we have other subshells running in parallel that can trigger additional SIGCHLD events in the main shell. These additional subshells are controlled by the functions from bin/helper-functions.sh (equivalent to multiprocessing.eclass).
Created attachment 335664 [details] test case - more SIGCHLD events correlates with more EINTR With this test case, I trigger EINTR for more than 90% of redir_open calls.
(In reply to comment #22)
i don't have access to a FreeBSD system anymore to test. does this simpler version fail too ?
while :; do
    (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)&
    while read x ; do : ; done < <(echo foo)
done
(In reply to comment #23)
> (In reply to comment #22)
>
> i don't have access to a FreeBSD system anymore to test. does this simpler
> version fail too ?
>
> while :; do
>     (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)&
>     while read x ; do : ; done < <(echo foo)
> done
Yes, I'm seeing a 100% failure rate with this test. The system is FreeBSD 9.0 (GhostBSD-2.5-gnome-i386.iso livedvd running in VirtualBox), with BASH_VERSION 4.1.11(1)-release.
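Since the loop above never terminates, a bounded variant makes the failure rate easier to quantify. This is a sketch only: the `iterations` and `failures` variables are illustrative additions, and it relies on bash returning a non-zero status for a compound command whose redirection could not be opened.

```shell
#!/usr/bin/env bash
# Bounded variant of the reproducer: count failed redirections instead of
# looping forever. iterations and failures are illustrative additions.
iterations=50
failures=0

for (( i = 0; i < iterations; i++ )); do
    # Ten short-lived subshells, each raising SIGCHLD in the main shell on exit.
    for _ in 1 2 3 4 5 6 7 8 9 10; do
        (:) &
    done
    # If opening the process substitution fails, the while loop is skipped
    # and the compound command returns a non-zero status.
    if ! while read -r x; do :; done < <(echo foo); then
        failures=$(( failures + 1 ))
    fi
done

wait
echo "failed redirections: ${failures} of ${iterations}"
```

On an affected system the count should be close to the number of iterations; on an unaffected one it should stay at zero.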
Chet says that all SIGCHLD signal handlers should be registered with SA_RESTART. so either some #ifdef route in bash is not doing that, or the FreeBSD kernel isn't respecting that flag correctly. can you run the script through something like `strace -f` and see if signal() is being called ? or is it all sigaction (and if so, is SA_RESTART set in sa_flags) ?
Created attachment 336266 [details, diff] debug patch for bash set_signal_handler function
(In reply to comment #25)
> Chet says that all SIGCHLD signal handlers should be registered with
> SA_RESTART. so either some #ifdef route in bash is not doing that, or the
> FreeBSD kernel isn't respecting that flag correctly.
Apparently bash is in fact using sigaction with SA_RESTART, but the FreeBSD kernel is not respecting it.
> can you run the script through like `strace -f` and see if signal() is being
> called ? or is it all sigaction (and if so, is SA_RESTART set in sa_flags) ?
I compiled bash with the attached debug patch, and it shows bash calling sigaction with SA_RESTART: set_signal_handler SIGCHLD sa_flags = 0x2 SA_RESTART = 0x2 sa_flags & SA_RESTART = 0x2
Created attachment 336270 [details] ktrace log of bash-4.2_p42 failing the test
(In reply to comment #27) ktrace really should get support for decoding structs passed/received :) at any rate, this shows the problem is in the freebsd kernel. a bit ironic as i think the SA_RESTART semantics came from BSD to Linux :).
they've done some more digging and it is a bug in FreeBSD (among others). they will most likely develop a workaround in bash for it, but the FreeBSD kernel still needs fixing. someone (i.e. not me) will have to follow up with the FreeBSD project to get things fixed there. most likely affects other *BSD kernels too.
*** Bug 460774 has been marked as a duplicate of this bug. ***
bash's workaround seems to be here: http://git.savannah.gnu.org/cgit/bash.git/diff/redir.c?h=devel&id=208fdb509e072977ae7a621e916dfcd32c76047d
i can confirm: after adding the bash workaround patch posted by Naohiro Aota, the problem is resolved
Commit message: Add workaround from upstream for read() under buggy BSD kernels http://sources.gentoo.org/app-shells/bash/bash-4.2_p45-r1.ebuild?rev=1.1 http://sources.gentoo.org/app-shells/bash/files/bash-4.2-read-retry.patch?rev=1.1
(In reply to SpanKY from comment #33)
> Commit message: Add workaround from upstream for read() under buggy BSD kernels
> http://sources.gentoo.org/app-shells/bash/bash-4.2_p45-r1.ebuild?rev=1.1
> http://sources.gentoo.org/app-shells/bash/files/bash-4.2-read-retry.patch?rev=1.1
But you are aware that some people were actually relying on signals interrupting read, right? Right now those scripts just end up locked completely and no traps are run during read. In fact, you can't even terminate the script gracefully when read is locked on a pipe...
(In reply to Michał Górny from comment #34)
> But you are aware that some people were actually relying on signals
> interrupting read, right? Right now those scripts just end up locked
> completely and no traps are run during read.
A simple test case script would be nice, so that we can try it with and without the patch, to demonstrate the regression.
> In fact, you can't even
> terminate the script gracefully when read is locked on a pipe...
Maybe this could be handled by storing the type of the last signal in a variable, and changing loop behavior based on the type of the signal that triggered the EINTR. If the last signal was SIGCHLD, we should continue looping. Otherwise, we should exit the loop. Shouldn't that produce the desired behavior?
The simplest test case:
trap 'exit 0' INT TERM
mkfifo pipe
read <pipe
Now you can't terminate the script without either opening the other end of the fifo or SIGKILL-ing it (which results in a very unclean exit).
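To compare patched and unpatched bash builds with the test case above, a small harness can send SIGTERM and report what happened. This is only a sketch with several assumptions: the `touch trapped` marker is added to the trap so the outcome can be observed from outside, the sleep durations are arbitrary, and `bash` is whatever build is first in PATH. It makes no claim about which result a given bash version produces.

```shell
#!/usr/bin/env bash
# Harness sketch: does a script blocked on a FIFO redirection run its TERM trap?
# The "trapped" marker file and the timings are illustrative assumptions.
workdir=$(mktemp -d)

# The test case from the comment above, plus a marker written by the trap.
cat > "${workdir}/case.sh" <<'EOF'
trap 'touch trapped; exit 0' INT TERM
mkfifo pipe
read <pipe
EOF

# Run the test case in its own directory; exec keeps the PID stable.
( cd "${workdir}" && exec bash case.sh ) &
pid=$!

sleep 1                 # let the script reach the blocking redirection
kill -TERM "${pid}"
sleep 2                 # give the trap a chance to run

if [ -e "${workdir}/trapped" ]; then
    result="trap ran"
else
    result="still blocked"
    kill -KILL "${pid}" 2>/dev/null
fi

wait "${pid}" 2>/dev/null
rm -rf "${workdir}"
echo "${result}"
```

Running it once against each bash build gives the with/without comparison requested earlier in the thread.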
(In reply to Michał Górny from comment #34) take it up with FreeBSD / upstream bash. we aren't carrying custom patches here.
*-fbsd is gone.