Bug 447810 - FreeBSD: open() on a fifo can fail with -EINTR even when using SA_RESTART (app-shells/bash sometimes fails `read`)
Status: CONFIRMED
Alias: None
Product: Gentoo/Alt
Classification: Unclassified
Component: FreeBSD
Hardware: All FreeBSD
Importance: Normal normal
Assignee: Gentoo/BSD Team
URL: http://lists.gnu.org/archive/html/bug...
Whiteboard:
Keywords:
Duplicates: 460774
Depends on:
Blocks:
 
Reported: 2012-12-19 12:28 UTC by Yuta SATOH
Modified: 2015-01-10 12:44 UTC
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
sample script (447810.sh, 136 bytes, text/plain)
2012-12-19 12:33 UTC, Yuta SATOH
sample patch for sys-apps/portage-2.2.0_alpha149 (portage-2.2-BSD.patch, 1.96 KB, patch)
2012-12-19 12:36 UTC, Yuta SATOH
sample patch for sys-apps/portage-2.2.0_alpha149 (portage-2.2-BSD.patch, 2.21 KB, patch)
2012-12-21 11:42 UTC, Yuta SATOH
sample patch for app-shells/bash-4.2_p39-r1 (bash-4.2-fbsd-EINTR.patch, 413 bytes, patch)
2012-12-24 14:14 UTC, Yuta SATOH
test case - more SIGCHLD events correlates with more EINTR (sigchld_eintr_correlation.sh, 373 bytes, text/plain)
2013-01-15 03:16 UTC, Zac Medico
debug patch for bash set_signal_handler function (bash_debug.patch, 593 bytes, patch)
2013-01-20 20:48 UTC, Zac Medico
ktrace log of bash-4.2_p42 failing the test (bash-4.2_p42_ktrace.log, 76.68 KB, text/plain)
2013-01-20 20:55 UTC, Zac Medico

Description Yuta SATOH 2012-12-19 12:28:26 UTC
for i in {0..100}; do
  echo ${i}
  while read x ; do
    echo ${x}
  done < <(find /usr/lib -type f ! -type l -name '*.a')
done
The script above will fail once to several times per run.
You can reproduce it with considerable probability when the CPU is under high load.

Error messages:
test.sh: line 2: /tmp//sh-np-9439761768676459121: Interrupted system call


Reproducible: Sometimes

Steps to Reproduce:
1. ACCEPT_KEYWORDS=amd64 emerge app-benchmarks/stress
2. stress --cpu 8 --verbose
3. bash test.sh




sys-apps/portage contains code similar to the script above.
This is the reason prepstrip and ecompressdir occasionally fail.
Comment 1 Yuta SATOH 2012-12-19 12:33:13 UTC
Created attachment 332708
sample script

1. bash test.sh &>test.log
2. cat test.log | grep 'Interrupted' | wc
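
The two steps above can be folded into a small helper that prints only the number of failed redirections. This is a sketch: the function name count_eintr is hypothetical, and it assumes the log contains the same "Interrupted system call" message shown in the description.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count log lines that report an interrupted open().
# Assumes the log was produced with: bash test.sh &>test.log
count_eintr() {
    local log=$1
    # grep -c prints the number of matching lines. It exits with status 1
    # when there are none, so "|| true" keeps the helper usable under "set -e".
    grep -c 'Interrupted system call' "$log" || true
}
```

Usage would be along the lines of: count_eintr test.log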
Comment 2 Yuta SATOH 2012-12-19 12:36:24 UTC
Created attachment 332710
sample patch for sys-apps/portage-2.2.0_alpha149

The patch is dirty, but it works fine on Gentoo/FreeBSD (G/FBSD).
Comment 3 Yuta SATOH 2012-12-21 11:42:24 UTC
Created attachment 332910
sample patch for sys-apps/portage-2.2.0_alpha149

enabled parallel installation
Comment 4 Naohiro Aota gentoo-dev 2012-12-21 14:41:33 UTC
I can't reproduce the issue. It actually looks like a problem in the operating system or kernel. In any case, we need more information about your environment: the versions of bash, the kernel, libc, and so on.
Comment 5 Yuta SATOH 2012-12-23 09:36:26 UTC
(In reply to comment #4)
> I can't reproduce the issue. It actually looks like a problem in the
> operating system or kernel. In any case, we need more information about
> your environment: the versions of bash, the kernel, libc, and so on.

results of emerge --info

Portage 2.2.0_alpha149 (default/bsd/fbsd/amd64/9.1, gcc-4.6.3, freebsd-lib-9.1_rc3-r1, 9.1-Gentoo amd64)
=================================================================
System uname: FreeBSD-9.1-Gentoo-amd64-64bit-ELF
Timestamp of tree: Wed, 19 Dec 2012 11:30:01 +0000
ld GNU ld (GNU Binutils) 2.22
app-shells/bash:         4.2_p39-r1
dev-lang/python:         2.7.3-r3, 3.2.3-r2
dev-util/pkgconfig:      0.27.1
sys-apps/baselayout:     2.2
sys-apps/openrc:         0.11.8
sys-devel/autoconf:      2.69
sys-devel/automake:      1.9.6-r3, 1.11.6
sys-devel/binutils:      2.22-r1
sys-devel/gcc:           4.6.3
sys-devel/gcc-config:    1.8
sys-devel/libtool:       2.4.2
sys-devel/make:          3.82-r3
sys-freebsd/freebsd-lib: 9.1_rc3-r1 (virtual/os-headers)
Repositories: gentoo
ACCEPT_KEYWORDS="amd64-fbsd ~amd64-fbsd"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-gentoo-freebsd9.1"
CFLAGS="-O2 -pipe -mtune=generic"
CHOST="x86_64-gentoo-freebsd9.1"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -pipe -mtune=generic"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs chflags config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
INSTALL_MASK="/usr/lib/systemd"
LDFLAGS=""
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY=""
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="acl amd64-fbsd berkdb cracklib crypt cups cxx dri gdbm iconv ipv6 java5 java6 mmx modules multilib ncurses nls oss pam pcre readline sse sse2 ssl tcpd unicode zlib" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump" CAMERAS="ptp2" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" ELIBC="FreeBSD" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf superstar2 timing tsip tripmate tnt ubx" INPUT_DEVICES="keyboard mouse" KERNEL="FreeBSD" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" PHP_TARGETS="php5-3" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_2" RUBY_TARGETS="ruby18 ruby19" USERLAND="BSD" VIDEO_CARDS="apm ark chips cirrus cyrix dummy i128 intel mach64 mga nv r128 radeon rendition s3 s3virge savage siliconmotion sis tga trident tseng fbdev" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CPPFLAGS, CTARGET, EMERGE_DEFAULT_OPTS, LANG, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Comment 6 Zac Medico gentoo-dev 2012-12-23 20:57:33 UTC
Maybe you can patch bash's read builtin to retry when it encounters EINTR.
Comment 7 Yuta SATOH 2012-12-24 14:14:28 UTC
Created attachment 333210
sample patch for app-shells/bash-4.2_p39-r1

(In reply to comment #6)
> Maybe you can patch bash's read builtin to retry when it encounters EINTR.

Thanks for your comment.

I've created a patch for bash.
I was able to work around this problem.
Comment 8 Naohiro Aota gentoo-dev 2012-12-30 23:05:40 UTC
(In reply to comment #7)
> Created attachment 333210
> sample patch for app-shells/bash-4.2_p39-r1
> 
> (In reply to comment #6)
> > Maybe you can patch bash's read builtin to retry when it encounters EINTR.
> 
> Thanks for your comment.
> 
> I've created a patch for bash.
> I was able to work around this problem.

Please send this issue and your patch upstream. Since I cannot reproduce the issue, it is difficult for me to discuss it.
Comment 9 Fabian Groffen gentoo-dev 2013-01-01 09:58:09 UTC
I see exactly the same thing on Gentoo Prefix, x86-based FreeBSD 9.1-b1 under VirtualBox.
Comment 11 Fabian Groffen gentoo-dev 2013-01-01 11:56:07 UTC
Since this problem makes emerge hang in a way that ^C cannot kill (it takes another terminal and pkill -f prepstrip), I applied the patch to the Prefix version of bash to get a working system again.
Comment 12 Zac Medico gentoo-dev 2013-01-10 12:15:06 UTC
I was able to reproduce this while running portage's unit tests on the GhostBSD-2.5 livedvd under VirtualBox.  So, I added a minimal workaround that allows the tests to pass:

http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=2d76e5e77d2be40f5635140dbbbf0b4d60f20dc1
Comment 13 Naohiro Aota gentoo-dev 2013-01-12 03:28:51 UTC
@base-system what do you think of adding the patch from comment 7?
Comment 14 SpanKY gentoo-dev 2013-01-12 23:39:02 UTC
(In reply to comment #13)

Chet's response seems to indicate that this is a bug in FreeBSD.  i don't think we should be patching random read() calls to workaround that.
Comment 15 Fabian Groffen gentoo-dev 2013-01-13 10:08:22 UTC
(In reply to comment #14)

It's not exactly a random read() call, it is the specific open(2) call when reading from a named pipe.  I get the impression that this may be due to the writer not being fully active yet or something.
Comment 16 Zac Medico gentoo-dev 2013-01-13 11:01:34 UTC
POSIX says open(2) can return -1 with errno set to EINTR, so if bash triggers that case, maybe it's a good idea to handle it?
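
For scripts that cannot wait for a fixed shell, the same idea can be sketched at the script level. This is an assumption-laden illustration, not the bash patch discussed here: the helper name open_fd3_retry and the use of fd 3 are made up, it assumes bash in its default (non-POSIX) mode, and because a script cannot inspect errno it retries on any failed open rather than only on EINTR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open PATH for reading on fd 3, retrying on failure.
# A script cannot see errno, so this retries on ANY failed open, not only
# EINTR; max_tries keeps a missing file from looping forever.
open_fd3_retry() {
    local path=$1 max_tries=${2:-5} try
    for (( try = 1; try <= max_tries; try++ )); do
        # In bash's default mode a failed "exec" redirection returns
        # non-zero instead of terminating the script.
        if { exec 3< "$path"; } 2>/dev/null; then
            return 0
        fi
    done
    return 1
}

# Usage: a writer in the background, the reader opening the fifo with retries.
tmpdir=$(mktemp -d)
mkfifo "$tmpdir/np"
echo foo > "$tmpdir/np" &
if open_fd3_retry "$tmpdir/np"; then
    read -r line <&3
    exec 3<&-            # close the descriptor again
    echo "read: $line"
fi
wait
rm -rf "$tmpdir"
```

The attempt limit is a deliberate trade-off: without access to errno, an unbounded loop would spin forever on a path that simply does not exist.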
Comment 17 SpanKY gentoo-dev 2013-01-14 19:41:58 UTC
(In reply to comment #16)

EINTR happens when a process gets a signal that the process handles and returns from.  in the named pipe scenario, the remote side shouldn't matter.  you can open a named pipe even if there are no active writers.  reads will then simply block waiting for data.

$ mkfifo asdf
$ cat asdf
<hang>

someone should figure out what signal bash is receiving & handling, and who is sending it for what reason.
Comment 18 Zac Medico gentoo-dev 2013-01-14 20:19:50 UTC
Probably a SIGCHLD, and I can't imagine anything else. A quick grep of the bash source turns up about 148 references to SIGCHLD. Consider the attached test case from comment #1:

	for i in {0..100}; do
		echo ${i}
		while read x ; do
			echo ${x}
		done < <(find /usr/lib -type f ! -type l -name '*.a')
	done

If the subshell exits when the parent process is trying to open the named pipe, the resulting SIGCHLD triggers EINTR. Seems like the most obvious explanation.
Comment 19 Zac Medico gentoo-dev 2013-01-15 00:05:54 UTC
Tracing through bash internals, we find some evidence in support of the explanation given in comment #18:

Apparently the /tmp//sh-np-9439761768676459121 fifo, visible in the error message in comment #0, is created by make_named_pipe() (from subst.c). The only place make_named_pipe() is called is from process_substitute().

process_substitute() forks, returns the fifo's pathname in the parent process, and calls exit() after executing the child shell. The parent only tries to open the fifo after process_substitute() returns, which is after the fork has occurred. So, the child shell may exit and trigger a SIGCHLD during the parent's open call, causing it to fail with EINTR.
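
The sequence described above can be approximated in plain shell. This is only a rough model: in bash the real work happens in C inside make_named_pipe() and process_substitute(), and the temporary directory and fifo name below are invented for the illustration.

```shell
#!/usr/bin/env bash
# Rough shell-level model of "while read x; do ...; done < <(echo foo)" on a
# system where bash implements process substitution with named pipes.
tmpdir=$(mktemp -d)
fifo=$tmpdir/sh-np-demo          # stands in for make_named_pipe()
mkfifo "$fifo"

( echo foo > "$fifo" ) &         # stands in for the fork in process_substitute()

# The parent opens the fifo only now, after the fork. This open() is the call
# that the thread reports failing with EINTR when a SIGCHLD arrives.
lines=()
while read -r x; do
    lines+=("$x")
done < "$fifo"

wait                             # reap the writer
rm -rf "$tmpdir"
printf '%s\n' "${lines[@]}"      # prints "foo"
```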
Comment 20 SpanKY gentoo-dev 2013-01-15 00:27:12 UTC
(In reply to comment #19)

on Linux, the initial write to the named fifo hangs until there is someone on the other side consuming it.  so it wouldn't be possible for a fifo to be created, a child process created, write all of its output, and then exit before the parent got a chance to open it.

is that not the case on freebsd ?
Comment 21 Zac Medico gentoo-dev 2013-01-15 02:26:15 UTC
(In reply to comment #20)
This is true: both the read and write ends appear to block until both ends are opened, so it does not seem possible for the child to exit before the parent opens the pipe.

So, we have to search further for the source of the theorized SIGCHLD event. We don't have to look far though: In both ecompressdir and prepstrip (where we just happen to observe EINTR problems), we have other subshells running in parallel that can trigger additional SIGCHLD events in the main shell. These additional subshells are controlled by the functions from bin/helper-functions.sh (equivalent to multiprocessing.eclass).
Comment 22 Zac Medico gentoo-dev 2013-01-15 03:16:02 UTC
Created attachment 335664
test case - more SIGCHLD events correlates with more EINTR

With this test case, I trigger EINTR for more than 90% of redir_open calls.
Comment 23 SpanKY gentoo-dev 2013-01-15 05:42:01 UTC
(In reply to comment #22)

i don't have access to a FreeBSD system anymore to test.  does this simpler version fail too ?

while :; do
    (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)&
    while read x ; do : ; done < <(echo foo)
done
Comment 24 Zac Medico gentoo-dev 2013-01-15 10:43:39 UTC
(In reply to comment #23)
> (In reply to comment #22)
> 
> i don't have access to a FreeBSD system anymore to test.  does this simpler
> version fail too ?
> 
> while :; do
>     (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)&
>     while read x ; do : ; done < <(echo foo)
> done

Yes, I'm seeing a 100% failure rate with this test. The system is FreeBSD 9.0 (GhostBSD-2.5-gnome-i386.iso livedvd running in VirtualBox), with BASH_VERSION 4.1.11(1)-release.
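
To turn a failure rate like this into a number, the loop from comment 23 can be bounded and instrumented. The following is a sketch under assumptions: the function name count_redir_failures and the iteration count are illustrative, and on a system without the bug the count is expected to stay at zero.

```shell
#!/usr/bin/env bash
# Bounded variant of the loop from comment 23 that counts failed redirections.
# The function name and the iteration count are illustrative.
count_redir_failures() {
    local iterations=$1 fails=0 i x
    for (( i = 0; i < iterations; i++ )); do
        # Ten short-lived subshells, each of which raises SIGCHLD on exit.
        (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)& (:)&
        # A failed open of the process substitution makes the whole
        # "while" command return non-zero without running its body.
        if ! while read -r x; do :; done < <(echo foo); then
            fails=$(( fails + 1 ))
        fi 2>/dev/null
        wait    # reap the background subshells before the next round
    done
    echo "$fails/$iterations"
}

count_redir_failures 20
```

The output has the form failures/iterations, so runs on different systems or bash versions can be compared directly.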
Comment 25 SpanKY gentoo-dev 2013-01-18 18:45:10 UTC
Chet says that all SIGCHLD signal handlers should be registered with SA_RESTART.  so either some #ifdef route in bash is not doing that, or the FreeBSD kernel isn't respecting that flag correctly.

can you run the script through like `strace -f` and see if signal() is being called ?  or is it all sigaction (and if so, is SA_RESTART set in sa_flags) ?
Comment 26 Zac Medico gentoo-dev 2013-01-20 20:48:59 UTC
Created attachment 336266
debug patch for bash set_signal_handler function

(In reply to comment #25)
> Chet says that all SIGCHLD signal handlers should be registered with
> SA_RESTART.  so either some #ifdef route in bash is not doing that, or the
> FreeBSD kernel isn't respecting that flag correctly.

Apparently bash is in fact using sigaction with SA_RESTART, but the FreeBSD kernel is not respecting it.

> can you run the script through like `strace -f` and see if signal() is being
> called ?  or is it all sigaction (and if so, is SA_RESTART set in sa_flags) ?

I compiled bash with the attached debug patch, and it shows bash calling sigaction with SA_RESTART:

set_signal_handler SIGCHLD
        sa_flags = 0x2
        SA_RESTART = 0x2
        sa_flags & SA_RESTART = 0x2
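
The last line of that output is a plain bit test. As a small illustration, the same check can be written in shell arithmetic; the helper name has_flag is hypothetical, and the numeric values are copied from the debug output above (SA_RESTART has a different numeric value on other systems).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when every bit of MASK is set in FLAGS.
has_flag() {
    local flags=$1 mask=$2
    (( (flags & mask) == mask ))
}

# Values copied from the debug output above (FreeBSD).
sa_flags=0x2
SA_RESTART=0x2

if has_flag "$sa_flags" "$SA_RESTART"; then
    echo "SA_RESTART is set"
else
    echo "SA_RESTART is not set"
fi
```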
Comment 27 Zac Medico gentoo-dev 2013-01-20 20:55:48 UTC
Created attachment 336270
ktrace log of bash-4.2_p42 failing the test
Comment 28 SpanKY gentoo-dev 2013-01-20 22:48:31 UTC
(In reply to comment #27)

ktrace really should get support for decoding structs passed/received :)

at any rate, this shows the problem is in the freebsd kernel.  a bit ironic as i think the SA_RESTART semantics came from BSD to Linux :).
Comment 29 SpanKY gentoo-dev 2013-01-31 05:07:20 UTC
they've done some more digging and it is a bug in the FreeBSD kernel (among others).  they will most likely develop a workaround in bash for it, but the FreeBSD kernel still needs fixing.

someone (i.e. not me) will have to follow up with the FreeBSD project to get things fixed there.  most likely affects other *BSD kernels too.
Comment 30 Dmitri Bogomolov 2013-03-08 16:22:51 UTC
*** Bug 460774 has been marked as a duplicate of this bug. ***
Comment 32 Alice Ferrazzi Gentoo Infrastructure gentoo-dev 2013-08-16 08:36:20 UTC
I can confirm that adding the bash workaround patch posted by Naohiro Aota resolves the problem.
Comment 33 SpanKY gentoo-dev 2014-01-07 14:23:41 UTC
Commit message: Add workaround from upstream for read() under buggy BSD kernels
http://sources.gentoo.org/app-shells/bash/bash-4.2_p45-r1.ebuild?rev=1.1
http://sources.gentoo.org/app-shells/bash/files/bash-4.2-read-retry.patch?rev=1.1
Comment 34 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-09-27 07:30:40 UTC
(In reply to SpanKY from comment #33)
> Commit message: Add workaround from upstream for read() under buggy BSD
> kernels
> http://sources.gentoo.org/app-shells/bash/bash-4.2_p45-r1.ebuild?rev=1.1
> http://sources.gentoo.org/app-shells/bash/files/bash-4.2-read-retry.patch?rev=1.1

But you are aware that some people were actually relying on signals interrupting read, right? Right now those scripts just end up locked completely and no traps are run during read. In fact, you can't even terminate the script gracefully when read is locked on a pipe...
Comment 35 Zac Medico gentoo-dev 2014-09-27 16:40:28 UTC
(In reply to Michał Górny from comment #34)
> But you are aware that some people were actually relying on signals
> interrupting read, right? Right now those scripts just end up locked
> completely and no traps are run during read.

A simple test case script would be nice, so that we can try it with and without the patch, to demonstrate the regression.

> In fact, you can't even
> terminate the script gracefully when read is locked on a pipe...

Maybe this could be handled by storing the type of the last signal in a variable, and changing loop behavior based on the type of the signal that triggered the EINTR. If the last signal was SIGCHLD, we should continue looping. Otherwise, we should exit the loop. Shouldn't that produce the desired behavior?
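
The proposed rule can be modelled in a few lines of shell. This is only a model of the decision logic, not bash's actual retry loop (which is C code): the function name simulate_open is hypothetical, and its arguments stand for the signals that would have interrupted successive attempts.

```shell
#!/usr/bin/env bash
# Model of the proposed rule: each argument names the signal that interrupted
# one attempt. SIGCHLD means "retry"; any other signal ends the loop so that
# the corresponding trap gets a chance to run.
simulate_open() {
    local retries=0 sig
    for sig in "$@"; do
        if [[ $sig == CHLD ]]; then
            retries=$(( retries + 1 ))
        else
            echo "aborted by $sig after $retries retries"
            return 1
        fi
    done
    echo "opened after $retries retries"
}

simulate_open CHLD CHLD          # prints "opened after 2 retries"
simulate_open CHLD TERM CHLD     # prints "aborted by TERM after 1 retries"
```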
Comment 36 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-09-27 17:23:02 UTC
The simplest test case:

  trap 'exit 0' INT TERM
  mkfifo pipe
  read <pipe

Now you can't terminate the script without either opening the other end of the fifo or SIGKILL-ing it (which would result in a very unclean exit).
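
To compare patched and unpatched shells without typing ^C by hand, the test case can be wrapped in a small harness. This is a sketch under assumptions: the function name check_term_during_fifo_open, the one-second delay, and the two-second polling window are all arbitrary choices, and the bash under test is whichever one is first in PATH.

```shell
#!/usr/bin/env bash
# Hypothetical harness: start the test case in the background, send SIGTERM,
# and report whether the script ran its trap and exited.
check_term_during_fifo_open() {
    local dir pid i
    dir=$(mktemp -d) || return 2
    mkfifo "$dir/pipe"
    # Same three steps as the test case, with the fifo in a temp directory.
    bash -c 'trap "exit 0" INT TERM; read < "$1"' _ "$dir/pipe" &
    pid=$!
    sleep 1                          # give the child time to block in open()
    kill -TERM "$pid" 2>/dev/null
    for i in 1 2 3 4 5 6 7 8 9 10; do
        if ! kill -0 "$pid" 2>/dev/null; then
            wait "$pid" 2>/dev/null
            rm -rf "$dir"
            echo terminated
            return 0
        fi
        sleep 0.2
    done
    # Still alive: the trap never ran, so fall back to SIGKILL.
    kill -KILL "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    rm -rf "$dir"
    echo stuck
    return 1
}
```

Calling check_term_during_fifo_open prints "terminated" when the trap was able to run, and "stuck" when the script had to be killed.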
Comment 37 SpanKY gentoo-dev 2014-10-19 05:02:25 UTC
(In reply to Michał Górny from comment #34)

take it up with FreeBSD / upstream bash.  we aren't carrying custom patches here.