Summary: | app-shells/bash-4.2_p39-r1: "exec 3>&2" causes illegal instruction (seen with app-admin/eselect-1.3.5) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | MATSUI Tetsushi <VED03370> |
Component: | [OLD] Core system | Assignee: | Gentoo's Team for Core System packages <base-system> |
Status: | RESOLVED OBSOLETE | ||
Severity: | normal | CC: | eselect, nojspam, prefix |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | OS X | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | |||
Bug Blocks: | 451150 | ||
Attachments: |
emerge --info
env -i bash eselect
env -i bash -v eselect
Patch (reverts commit 12e3ecb)
Alternative patch for stderr redirection
Updated patch that tests for bash version
A workaround for emerging zsh-5.0.2-r3 on prefix OS X
diagnostic report |
Description
MATSUI Tetsushi
2013-06-30 05:50:14 UTC
I cannot reproduce this. Could you please post:
- emerge --info
- output (stdout and stderr) of "env -i bash eselect"
Created attachment 352294 [details]
emerge --info
Created attachment 352300 [details]
env -i bash eselect
(In reply to MATSUI Tetsushi from comment #3)
> Created attachment 352300 [details]
> env -i bash eselect
Sorry, I meant "env -i bash -v eselect".
Created attachment 352306 [details]
env -i bash -v eselect
So the problem disappears when you turn on debug output in bash.
Are core dumps enabled ("ulimit -c unlimited")? If yes, does the "illegal instruction" cause a core dump, and from what program?
(In reply to Ulrich Müller from comment #6)
> So the problem disappears when you turn on debug output in bash.
>
> Are core dumps enabled ("ulimit -c unlimited")? If yes, does the "illegal
> instruction" cause a core dump, and from what program?
When core dumps are enabled, two core files are created, both from bash.
The first one (32321):
Core was generated by `/Users/tetsushi/Gentoo/bin/bash'.
Reading symbols for shared libraries . done
Reading symbols for shared libraries .............. done
#0 0x901ff862 in mbrlen ()
(gdb) bt
#0 0x901ff862 in mbrlen ()
#1 0x000058f4 in set_line_mbstate ()
#2 0x00007e12 in shell_getc ()
#3 0x0000a603 in read_token ()
#4 0x0000dcd8 in yyparse ()
#5 0x00004de7 in parse_command ()
#6 0x0005c0b0 in parse_and_execute ()
#7 0x0005b9c9 in _evalfile ()
#8 0x0005bbcd in source_file ()
#9 0x00064deb in source_builtin ()
#10 0x000178ba in execute_builtin ()
#11 0x0001b31f in execute_simple_command ()
#12 0x00019063 in execute_command_internal ()
#13 0x0001be44 in execute_command ()
#14 0x0001cc88 in execute_connection ()
#15 0x00018f50 in execute_command_internal ()
#16 0x0001be44 in execute_command ()
#17 0x0000509e in reader_loop ()
#18 0x00004b5c in main ()
The second one (32325):
Core was generated by `/Users/tetsushi/Gentoo/bin/bash'.
Reading symbols for shared libraries . done
Reading symbols for shared libraries ............. done
#0 0x00007dda in shell_getc ()
(gdb) bt
#0 0x00007dda in shell_getc ()
#1 0x0000a603 in read_token ()
#2 0x0000dcd8 in yyparse ()
#3 0x00004de7 in parse_command ()
#4 0x0005c0b0 in parse_and_execute ()
#5 0x0005b9c9 in _evalfile ()
#6 0x0005bbcd in source_file ()
#7 0x00064deb in source_builtin ()
#8 0x000178ba in execute_builtin ()
#9 0x0001b31f in execute_simple_command ()
#10 0x00019063 in execute_command_internal ()
#11 0x0001be44 in execute_command ()
#12 0x0001cc88 in execute_connection ()
#13 0x00018f50 in execute_command_internal ()
#14 0x0001be44 in execute_command ()
#15 0x0000509e in reader_loop ()
#16 0x00004b5c in main ()
Do these help? Or do you need other kinds of info?
A bash script shouldn't be able to cause an illegal instruction in the shell, so my guess is that it's a bug in bash. CCing Prefix team: Any ideas?
I recall seeing those before, which was due to an upgraded system. I'm wondering if this might be the case here also.
(In reply to Fabian Groffen from comment #9)
> I recall seeing those before, which was due to an upgraded system. I'm
> wondering if this might be the case here also.
I haven't upgraded my system recently. It's still 10.6.
Created attachment 352690 [details, diff]
Patch (reverts commit 12e3ecb)
There are only a few changes between eselect 1.3.4 and 1.3.5. The only one that looks suspicious to me is this:
<http://git.overlays.gentoo.org/gitweb/?p=proj/eselect.git;a=commit;h=12e3ecb19d311b888abc118d806fee635602e3ee>
Can you check whether the attached patch makes the problem disappear?
Yes, the patch makes the problem disappear. Thank you!
PS: I am now sure that this bug is OS X specific. I ran valgrind to see what happens when bash dies, and it always stops at the same address in libSystem.B.dylib.
Created attachment 352704 [details, diff]
Alternative patch for stderr redirection
Could you do one more test for me please, and check if attached alternative patch would also fix the problem?
That patch looks sensible to me: we can't just assume fd 3 is not in use, and letting bash assign a free one is much more portable/safe.
Created attachment 352706 [details, diff]
Updated patch that tests for bash version
This will work only for >=bash-4.1 though. Updated patch with bash version test is attached.
The crucial question is of course if it fixes (or rather, works around) the bug on OS X.
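For readers without access to the attachment, the idea of the version-tested workaround can be sketched roughly as follows. This is not the actual eselect patch: the variable name saved_fd and the fallback branch are assumptions made for illustration. The only thing taken from the thread is that bash 4.1 and later can allocate a free descriptor itself via the {varname} redirection form.

```shell
#!/usr/bin/env bash
# Sketch only: "saved_fd" is a hypothetical variable name, not taken from the patch.
if (( BASH_VERSINFO[0] > 4 || ( BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 1 ) )); then
    # bash >= 4.1: let the shell pick a free descriptor and store its number
    exec {saved_fd}>&2
else
    # older bash: fall back to the hard-coded descriptor 3
    exec 3>&2
    saved_fd=3
fi

# Write to the saved copy of stderr through the variable
echo "diagnostic message" >&"${saved_fd}"

# Close the descriptor again; eval keeps this line valid on older bash too
eval "exec ${saved_fd}>&-"
```

After the block runs, saved_fd holds whichever descriptor number was used, so later code never needs to mention 3 directly.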
Doesn't this show on *BSD? I recall something about fd-3, or was it 7? I'd like to know what FD is assigned in case bash does it though. I agree bash shouldn't crash here.
(In reply to Fabian Groffen from comment #16)
> I'd like to know what FD is assigned in case bash does it though.
With bash 4.2 I get this, both on Linux and FreeBSD:
$ exec {fd}>&2
$ echo ${fd}
10
(In reply to Ulrich Müller from comment #15)
> Created attachment 352706 [details, diff]
> Updated patch that tests for bash version
>
> This will work only for >=bash-4.1 though. Updated patch with bash version
> test is attached.
>
> The crucial question is of course if it fixes (or rather, works around) the
> bug on OS X.
Yeah, it also works fine.
I've added the workaround to eselect-1.3.6:
<http://git.overlays.gentoo.org/gitweb/?p=proj/eselect.git;a=commit;h=3a412426d924310abb59311dd3cc1133eb1c6849>
Reassigning to bash maintainers.
In a nutshell: "env -i /usr/bin/eselect" makes bash-4.2_p39-r1 crash with an illegal instruction on Mac OS X. The cause for the failure seems to be the line "exec 3>&2" (which is new in eselect 1.3.5).
I've seen similar behavior with bash-4.2_p39-r1 on prefixed portage with OS X when trying to emerge zsh-5.0.2-r3. In that case it was dying when executing "exec 3>&1 >$the_subdir/${the_makefile}.in". Would you like me to open a separate bug for that issue?
(In reply to John Gibson from comment #20)
> I've seen similar behavior with bash-4.2_p39-r1 on prefixed portage with OS
> X when trying to emerge zsh-5.0.2-r3. In that case it was dying when
> executing "exec 3>&1 >$the_subdir/${the_makefile}.in". Would you like me to
> open a separate bug for that issue?
Looks like the same issue, so no separate bug. Setting status to CONFIRMED since it has been seen twice now.
Created attachment 358742 [details, diff]
A workaround for emerging zsh-5.0.2-r3 on prefix OS X
I was able to get zsh to emerge under prefix portage on OS X by replacing the explicitly numbered file descriptor of 3 with a variable name. Here's the patch that I used. It doesn't check the version of bash and given earlier comments on the thread it probably won't work with bash 3. However it may be useful as a quick workaround for other prefixers.
Also, I have no idea if it has any bearing on the underlying bash bug, but /dev/fd/3 already exists on my computer, and ls reports it as a directory:
my-pc:files jgibson$ ls -l /dev/fd
total 0
crw--w---- 1 jgibson tty 16, 0 Sep 15 14:58 0
crw--w---- 1 jgibson tty 16, 0 Sep 15 14:58 1
crw--w---- 1 jgibson tty 16, 0 Sep 15 14:58 2
drw-r--r-- 2 portage portage 272 Sep 14 17:31 3
dr--r--r-- 1 root wheel 0 Sep 14 16:46 4
Inspecting either 3 or 4 more closely results in the following:
my-pc:files jgibson$ ls -l /dev/fd/3 /dev/fd/4
ls: /dev/fd/3: Bad file descriptor
ls: /dev/fd/4: Bad file descriptor
I've seen similar behavior on OS X 10.6, 10.7, and 10.8.
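Since the concern raised earlier in the thread is that descriptor 3 may already be in use when the script starts, a quick way to check from inside a script is to try duplicating the descriptor and see whether the redirection fails. The helper name fd_is_open below is hypothetical, and this is only a diagnostic sketch; it says nothing about the underlying bash crash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if descriptor $1 is open in the current shell.
fd_is_open() {
    # Duplicating a closed descriptor makes the redirection fail; the error
    # message from the shell is discarded by the outer 2>/dev/null.
    { true >&"$1"; } 2>/dev/null
}

# Report the state of the first few descriptors inherited by this script
for fd in 0 1 2 3 4; do
    if fd_is_open "$fd"; then
        echo "fd $fd: open"
    else
        echo "fd $fd: closed"
    fi
done
```

Using true rather than the special builtin ":" is deliberate: in POSIX mode a failed redirection on a special builtin can terminate the shell.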
(In reply to Ulrich Müller from comment #19)
please post a reduced test case. like a single shell script you can run `env -i` on and see the crash. i can't take `eselect` upstream, and i don't have an OS X system to reduce on.
(In reply to SpanKY from comment #23)
> please post a reduced test case. like a single shell script you can run
> `env -i` on and see the crash. i can't take `eselect` upstream, and i don't
> have an OS X system to reduce on.
Same problem here. Matsui-san, John, or Prefix team, could you provide us with a minimal test case, please?
I don't know if I can provide you with a minimal one. I only really saw the issue when building zsh (and some other packages like tiff) with emerge. What I can do is provide you with my prefix as an OS X disk image and then you can have the entire environment to experiment with. The only downsides to this approach are that you'll need a Mac and that the disk image is ~4 GB, which I'm guessing is a little large to upload to Bugzilla. I have images available for 10.7 and 10.8 (and I could probably dig one up for 10.6 if necessary).
I'd assume that this trivial script:
#!/bin/bash
exec 3>&2
should already be enough for reproducing the error, but I cannot verify this here. (John?)
I cannot reproduce. I tried emerging zsh as John reports, no problem whatsoever.
% uname -a
Darwin Phoebe.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386 i386 MacBook3,1 Darwin
% /usr/bin/sw_vers
ProductName: Mac OS X
ProductVersion: 10.6.8
BuildVersion: 10K549
% which bash
/Volumes/Scratch/Gentoo/bin/bash
% bash --version
GNU bash, version 4.2.39(1)-release (i386-apple-darwin10)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
% cat fd3.sh
#!/usr/bin/env bash
cat ${BASH_SOURCE[0]}
exec 3>&2
% ./fd3.sh
#!/usr/bin/env bash
cat ${BASH_SOURCE[0]}
exec 3>&2
fwiw, the following bash version runs the exec fine too:
% /Volumes/Scratch/Gentoo64/bin/bash --version
GNU bash, version 4.2.36(1)-release (x86_64-apple-darwin9)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
(that is, 64-bits, Leopard -- Darwin 9, binary from before I upgraded my laptop)
(In reply to Fabian Groffen from comment #27)
> % ./fd3.sh
What happens if you start it with an "env -i" wrapper, as in the original report?
(In reply to Ulrich Müller from comment #29)
> (In reply to Fabian Groffen from comment #27)
> > % ./fd3.sh
>
> What happens if you start it with an "env -i" wrapper, as in the original
> report?
the same, it just executes for me:
% env -i /Volumes/Scratch/Gentoo/bin/bash ./fd3.sh
#!/usr/bin/env bash
cat ${BASH_SOURCE[0]}
exec 3>&2
My smallest case to reproduce the bug is the following:
#!/Users/tetsushi/Gentoo/bin/bash
exec 3>&2
die() {
    echo "foo"
}
It then fails about once per 100 executions with "Illegal instruction", like this:
$ for i in `seq 100`; do env -i ./foo.sh; done
Illegal instruction
$ for i in `seq 100`; do env -i ./foo.sh; done
Illegal instruction
Illegal instruction
$ for i in `seq 100`; do env -i ./foo.sh; done
Created attachment 366358 [details]
diagnostic report
Update:
The crash occurs only with
$ for i in `seq 1000`; do env -i /Users/tetsushi/Gentoo/bin/sh -c 'exec 3>&2'; done
There are several (about 6 or 7) crashes in the 1000 executions.
OS X records these crashes in logs under ~/Library/Logs/DiagnosticReports, and the attachment is one of them.
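Because the crash is intermittent, counting failures by eye across a thousand runs is error-prone. A small wrapper along these lines could tally them instead. The function name count_sigill is hypothetical; the sketch relies on bash reporting exit status 128 + N for a child killed by signal N, and on SIGILL being signal 4 (true on Linux and OS X).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command N times under "env -i" and print how
# many runs were killed by SIGILL (exit status 128 + 4 = 132).
count_sigill() {
    local runs=$1 crashes=0 status i
    shift
    for (( i = 0; i < runs; i++ )); do
        status=0
        env -i "$@" || status=$?
        if (( status == 132 )); then
            crashes=$(( crashes + 1 ))
        fi
    done
    echo "$crashes"
}
```

For the loop quoted above, the call would be `count_sigill 1000 /Users/tetsushi/Gentoo/bin/sh -c 'exec 3>&2'`; the result is the number of crashed runs out of 1000.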
Tried it, still cannot reproduce (32&&64). Can you tell me what exact machine you're running this on? In particular the cpu is important here. Maybe the cflags don't match or something.
I guess CFLAGS is irrelevant but if you want to know, I have used -march=nocona for Core 2 duo CPU. I'm trying to bootstrap another prefix, and bash in it doesn't show the symptom. hum..
I can confirm Matsui Tetsushi's experiments. I have two machines, one a 2.53 Core 2 Duo running 10.7.5 and the second a 2.4 Core i7 running 10.8.5. On both machines running the foo.sh script directly or via /Library/Gentoo/bin/sh foo.sh almost always results in an illegal instruction. Running via env -i does not produce an illegal instruction. However, the later test that he tried:
for i in `seq 1000`; do env -i /Library/Gentoo/bin/sh -c 'exec 3>&2'; done
does produce a handful of illegal instructions, but only on the C2D, not on the i7. I'll see if I can try a fresh bootstrap this week.
I did notice on the C2D that some other packages like tiff would fail to emerge with illegal instruction as well. I played around with the -j make option when emerging and it changed when I would see the illegal instruction, but it would still fail. I did not see those errors on the i7.
I did a fresh bootstrap on both machines (the 10.7 C2D and the 10.8 i7) and got almost the same results. The only difference was that on the C2D I had to increase the iterations in this loop to 10000 to see any illegal instructions.
for i in `seq 10000`; do env -i /Library/Gentoo/bin/sh -c 'exec 3>&2'; done
I can reproduce this on my i7 too, will have to check on the c2d if upping the iteration count works.
I actually find that comment in the crashlog sort of interesting:
Application Specific Information:
BUG IN CLIENT OF LIBDISPATCH: Do not close random Unix descriptors
Thread 0: Dispatch queue: com.apple.main-thread
0 libSystem.B.dylib 0x9a5a1dc6 dup2 + 10
1 sh 0x0006a94b do_redirection_internal + 3458
2 sh 0x00069268 do_redirections + 113
3 sh 0x00021c9b execute_builtin_or_function + 53
4 sh 0x00020b5e execute_simple_command + 2509
5 sh 0x0001ab70 execute_command_internal + 1907
6 sh 0x00074fef parse_and_execute + 1076
7 sh 0x000034e9 run_one_command + 271
8 sh 0x00002564 main + 2384
9 sh 0x00001c09 start + 53
this version is obsolete |