Bug 871324 - sys-devel/flex: fatal internal error, exec of gm4 failed on macOS Prefix (ERROR: app-text/xmlto-0.0.28-r9::gentoo_prefix failed (compile phase))
Status: RESOLVED FIXED
Alias: None
Product: Gentoo/Alt
Classification: Unclassified
Component: Prefix Support
Hardware: ARM64 OS X
Importance: Normal normal
Assignee: Gentoo's Team for Core System packages
URL:
Whiteboard:
Keywords: PullRequest
Duplicates: 895494
Depends on:
Blocks: 886491
 
Reported: 2022-09-18 12:21 UTC by Askar Bektassov
Modified: 2023-04-25 03:10 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Description Askar Bektassov 2022-09-18 12:21:45 UTC
emerge fails during compilation, claiming that gm4 failed to execute

Reproducible: Always

Steps to Reproduce:
1. emerge xmlto
Actual Results:  
>>> Compiling source in /Users/askarbektassov/Gentoo/var/tmp/portage/app-text/xmlto-0.0.28-r9/work/xmlto-0.0.28 ...
make SHELL=/Users/askarbektassov/Gentoo/usr/bin/bash -j6
make  all-am
make[1]: Entering directory '/Users/askarbektassov/Gentoo/var/tmp/portage/app-text/xmlto-0.0.28-r9/work/xmlto-0.0.28'
/Users/askarbektassov/Gentoo/usr/bin/bash ./ylwrap xmlif/xmlif.l unknown.c xmlif/xmlif.c -- /Users/askarbektassov/Gentoo/usr/bin/bash '/Users/askarbektassov/Gentoo/var/tmp/portage/app-text/xmlto-0.0.28-r9/work/xmlto-0.0.28/missing' flex
flex: fatal internal error, exec of /Users/askarbektassov/Gentoo/usr/bin/gm4 failed
make[1]: *** [Makefile:777: xmlif/xmlif.c] Error 141
make[1]: Leaving directory '/Users/askarbektassov/Gentoo/var/tmp/portage/app-text/xmlto-0.0.28-r9/work/xmlto-0.0.28'

Expected Results:  
successful emerge

/Users/askarbektassov/Gentoo/usr/bin/gm4 is present and works properly.
Comment 1 Fabian Groffen gentoo-dev 2022-09-18 12:32:29 UTC
Can confirm, this seems new, and I have a suspicion it came through the latest updates.  What is your current OS/dev-tools?

12.6 + PROJECT:ld64-819.6

(sw_vers + ld -v)
Comment 2 Askar Bektassov 2022-09-18 18:11:25 UTC
(In reply to Fabian Groffen from comment #1)
> Can confirm, this seems new, and I have a suspicion it came through the
> latest updates.  What is your current OS/dev-tools?
> 
> 12.6 + PROJECT:ld64-819.6
> 
> (sw_vers + ld -v)

Right now, 12.6 + PROJECT:ld64-764, but bear in mind that earlier today I downgraded the Command Line Tools from Xcode 14 to Xcode 13.4 (see https://bugs.gentoo.org/show_bug.cgi?id=871336).

askarbektassov@Askars-MBP ~ $ sw_vers
ProductName:	macOS
ProductVersion:	12.6
BuildVersion:	21G115

askarbektassov@Askars-MBP ~ $ ld -v
ld: BINUTILS_CONFIG_LD not found in environment
ld: linker not found in PATH
ld: /Users/askarbektassov/Gentoo/etc/env.d/binutils/config-arm64-apple-darwin21 defines CURRENT=native-5
ld: trying from /Users/askarbektassov/Gentoo/etc/env.d/binutils/config-arm64-apple-darwin21: /Users/askarbektassov/Gentoo/usr/arm64-apple-darwin21/binutils-bin/native-5/ld
ld: invoking /Users/askarbektassov/Gentoo/usr/arm64-apple-darwin21/binutils-bin/native-5/ld with arguments:
  /Users/askarbektassov/Gentoo/usr/arm64-apple-darwin21/binutils-bin/native-5/ld
  -sdk_version
  12.0
  -syslibroot
  /Users/askarbektassov/Gentoo/MacOSX.sdk
  -search_paths_first
  -v
  -L/Users/askarbektassov/Gentoo/usr/lib
  -L/Users/askarbektassov/Gentoo/lib
  -rpath
  /Users/askarbektassov/Gentoo/usr/lib
  -rpath
  /Users/askarbektassov/Gentoo/lib
@(#)PROGRAM:ld  PROJECT:ld64-764
BUILD 11:22:55 Apr 28 2022
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
Library search paths:
	/Users/askarbektassov/Gentoo/usr/lib
	/Users/askarbektassov/Gentoo/lib
	/Users/askarbektassov/Gentoo/MacOSX.sdk/usr/lib
Framework search paths:
	/Users/askarbektassov/Gentoo/MacOSX.sdk/System/Library/Frameworks/
ld: warning: platform not specified
ld: warning: -arch not specified
ld: warning: No platform min-version specified on command line
ld: no object files specified
Comment 3 Askar Bektassov 2022-09-25 17:50:54 UTC
(In reply to Fabian Groffen from comment #1)
> Can confirm, this seems new, and I have a suspicion it came through the
> latest updates.  What is your current OS/dev-tools?
> 
> 12.6 + PROJECT:ld64-819.6
> 
> (sw_vers + ld -v)

FYI, still no progress. When I run the following command

    ./ylwrap xmlif/xmlif.l unknown.c xmlif/xmlif.c -- /missing flex

in the working folder ($EPREFIX/var/tmp/portage/app-text/xmlto-0.0.28-r9/work/xmlto-0.0.28), all I get is a strange flex complaint:

    flex: fatal internal error, exec of $EPREFIX/usr/bin/gm4 failed

Using the system gm4 does not help either:

    flex: fatal internal error, exec of /usr/bin/gm4 failed

At that point I tried to recompile flex with static libraries:

    USE="static" emerge flex

But it was not happy, and emerge failed during the configure phase:

checking for arm64-apple-darwin21-gcc... arm64-apple-darwin21-gcc
checking whether the C compiler works... no
configure: error: in `/Users/askarbektassov/Gentoo/var/tmp/portage/sys-devel/flex-2.6.4-r2/work/flex-2.6.4-.arm64':
configure: error: C compiler cannot create executables
See `config.log' for more details

!!! Please attach the following file when seeking support:
!!! /Users/askarbektassov/Gentoo/var/tmp/portage/sys-devel/flex-2.6.4-r2/work/flex-2.6.4-.arm64/config.log
 * ERROR: sys-devel/flex-2.6.4-r2::gentoo_prefix failed (configure phase):

Then I peeked into config.log, and it seems that ld is unable to find crt0.o (-lcrt0.o). Should I create a separate bug?

configure:3031: checking whether the C compiler works
configure:3053: arm64-apple-darwin21-gcc  -O2 -pipe  -Wl,-dead_strip_dylibs -static conftest.c  >&5
ld: library not found for -lcrt0.o
collect2: error: ld returned 1 exit status
configure:3057: $? = 1
configure:3095: result: no
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "the fast lexical analyser generator"
| #define PACKAGE_TARNAME "flex"
| #define PACKAGE_VERSION "2.6.4"
| #define PACKAGE_STRING "the fast lexical analyser generator 2.6.4"
| #define PACKAGE_BUGREPORT "flex-help@lists.sourceforge.net"
| #define PACKAGE_URL ""
| /* end confdefs.h.  */
|
| int
| main ()
| {
|
|   ;
|   return 0;
| }
configure:3100: error: in `/Users/askarbektassov/Gentoo/var/tmp/portage/sys-devel/flex-2.6.4-r2/work/flex-2.6.4-.arm64':
configure:3102: error: C compiler cannot create executables
See `config.log' for more details
Comment 4 Askar Bektassov 2022-09-25 17:56:23 UTC
(In reply to Askar Bektassov from comment #3)
> But it was not happy, and emerge failed during configure phase.

Sorry, forget my last (unhelpful) comment. I understand that statically linking executables is not supported on macOS with either clang or GCC (https://stackoverflow.com/questions/3801011/ld-library-not-found-for-lcrt0-o-on-osx-10-6-with-gcc-clang-static-flag).
Comment 5 Askar Bektassov 2022-09-27 22:10:43 UTC
I managed to install two separate instances of the CLT on my system. I don't know why, but unless you use the system '/bin/cp' and copy the parent directory '/Library/Developer', emerge will not be able to compile.

$ mkdir $EPREFIX/Library
$ /bin/cp -R /Library/Developer $EPREFIX/Library/
$ sudo xcode-select -s $EPREFIX/Library/Developer/CommandLineTools

Now you can update to CLT 14, which will update the folder /Library/Developer while leaving your own instance of the CLT in $EPREFIX/Library/Developer/CommandLineTools unchanged.
Comment 6 Askar Bektassov 2022-10-23 14:09:43 UTC
Hi, any news on this? My `emerge world -uDN` keeps failing :(
Comment 7 Fabian Groffen gentoo-dev 2022-10-23 15:56:50 UTC
Sorry, no news :( I think the toolchain needs to use a linker from our world, but we still need to figure out what special thing is going on here.
Comment 8 Askar Bektassov 2022-10-26 21:52:24 UTC
So far, my workaround has been to use LEX=/usr/bin/flex. In fact, even flex itself does not compile unless I use the macOS-shipped flex:

LEX=/usr/bin/flex emerge flex
LEX=/usr/bin/flex emerge xmlto
Comment 9 Fabian Groffen gentoo-dev 2022-10-27 06:43:45 UTC
Does it help if you set this in your etc/portage/make.conf?
Comment 10 Askar Bektassov 2022-10-27 06:56:13 UTC
Yep, I created a file etc/portage/make.conf/flex with LEX=/usr/bin/flex, and both xmlto and flex now compile smoothly.
Comment 11 Askar Bektassov 2022-11-01 21:21:15 UTC
(In reply to Askar Bektassov from comment #10)
> Yep, I created a file etc/portage/make.conf/flex with LEX=/usr/bin/flex and
> both xmlto and flex now compile smoothly.

Should we consider this resolved? Besides, I have upgraded to the CLT for Xcode 14.1, which in turn required me to unmask gcc 12.2, and now everything seems to work fine. Last week I ran `emerge system -e` and apparently everything went like a charm.
Comment 12 Fabian Groffen gentoo-dev 2022-11-02 07:09:02 UTC
Hmmm, ok, let me try that, would be great if it indeed works now, thanks!
Comment 13 Askar Bektassov 2022-12-11 21:37:32 UTC
I noticed that a new app-alternatives/lex package was introduced, which creates a lex symlink to flex. Now the package compiles even without the environment variable. Can we consider it solved?
Comment 14 Fabian Groffen gentoo-dev 2022-12-12 19:28:16 UTC
Hmmm, what? If flex is invoked as lex, it actually works? Or does app-alternatives/lex make a symlink to the host system's /usr/bin/lex?
Comment 15 Alexey 2023-02-02 19:43:10 UTC
Doesn't work for me on arm64-apple-darwin22

The lex symlink is pointing at local flex, not /usr/bin/lex

$ cd ~/Gentoo/usr/bin
$ ls -l lex
lrwxrwxrwx  1 sokolov  primarygroup  4 Feb  1 15:51 lex -> flex

Setting the environment variable in make.conf does help.
Comment 16 Alexey 2023-02-02 19:54:38 UTC
It uses flex instead of lex, so manually making the lex symlink to point at /usr/bin/lex doesn't help either.
Comment 17 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-02 19:58:04 UTC
(In reply to Alexey from comment #16)
> It uses flex instead of lex, so manually making the lex symlink to point at
> /usr/bin/lex doesn't help either.

Try exporting LEX=/usr/bin/lex, LEX=reflex, etc.
Comment 18 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-02 19:58:20 UTC
(In reply to Alexey from comment #15)
> Doesn't work for me on arm64-apple-darwin22
> 
> The lex symlink is pointing at local flex, not /usr/bin/lex
> 
> $ cd ~/Gentoo/usr/bin
> $ ls -l lex
> lrwxrwxrwx  1 sokolov  primarygroup  4 Feb  1 15:51 lex -> flex
> 
> Setting the environment variable in make.conf does help.

Sorry, do you mean it does, or does not help? Setting it to what?
Comment 19 Alexey 2023-02-02 21:00:43 UTC
Adding LEX=/usr/bin/flex to make.conf, as suggested above, DOES help.

The default setup doesn't work (therefore I don't think this bug is fixed)

Then, replying to Fabian about the new app-alternatives/lex, I meant that it wouldn't help either, even if we made some kind of "native" variant of that package in the prefix that pointed the lex symlink at /usr/bin/lex.
Comment 20 Fabian Groffen gentoo-dev 2023-02-03 08:06:44 UTC
At this point, the only thing we can do is a bashrc that exports LEX=/usr/bin/lex for the few packages that need it, until we figure out why the exec fails here on arm64.
Comment 21 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-02-20 04:21:28 UTC
*** Bug 895494 has been marked as a duplicate of this bug. ***
Comment 22 Tom Li 2023-02-20 07:35:04 UTC
This bug is getting interesting: I'm now starting to suspect it's actually a GCC bug on Darwin. More investigation is needed.

I added some debug traces to the code and identified anomalous behavior in variadic functions. To create an external filter that runs m4, flex calls filter_create_ext() in filter.c, which is a variadic function:

    struct filter *filter_create_ext (struct filter *chain, const char *cmd, ...)

Inside the function body, it fills an argv array, growing it according to the number of variadic arguments passed to filter_create_ext():

    va_start (ap, cmd);
    while ((s = va_arg (ap, const char *)) != NULL) {
        if (f->argc >= max_args) {
            max_args += 8;
            f->argv = realloc(f->argv, sizeof(char*) * (size_t) (max_args + 1));
        }
        f->argv[f->argc++] = s;
    }   
    f->argv[f->argc] = NULL;

    va_end (ap);
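For experimenting with this pattern outside flex, here is a minimal standalone sketch of the same NULL-terminated argument collection. The names `argv_list` and `argv_list_build()` are hypothetical, not flex's API, and unlike the excerpt above it checks the realloc() result. It only behaves correctly if every caller ends the list with a real null pointer such as `(const char *) NULL`.

```c
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical stand-in for the argv part of flex's struct filter. */
struct argv_list {
    const char **argv;
    int argc;
};

/* Collect cmd plus every const char * argument up to the terminating null
 * pointer. Callers must end the list with (const char *) NULL, not a bare 0.
 * Returns 0 on success, -1 if memory allocation fails. */
int argv_list_build(struct argv_list *l, const char *cmd, ...)
{
    va_list ap;
    const char *s;
    int max_args = 8;

    l->argc = 0;
    /* One extra slot is always reserved for the NULL terminator. */
    l->argv = malloc(sizeof(char *) * (size_t) (max_args + 1));
    if (l->argv == NULL)
        return -1;
    l->argv[l->argc++] = cmd;

    va_start(ap, cmd);
    while ((s = va_arg(ap, const char *)) != NULL) {
        if (l->argc >= max_args) {
            const char **tmp;

            max_args += 8; /* grow in steps of 8, as flex does */
            tmp = realloc(l->argv, sizeof(char *) * (size_t) (max_args + 1));
            if (tmp == NULL) {
                va_end(ap);
                free(l->argv);
                l->argv = NULL;
                return -1;
            }
            l->argv = tmp;
        }
        l->argv[l->argc++] = s;
    }
    va_end(ap);

    l->argv[l->argc] = NULL; /* execvp() needs a NULL-terminated array */
    return 0;
}
```

Calling `argv_list_build(&l, "gm4", "-P", (const char *) NULL)` yields argc 2 with argv[2] set to NULL; the caller frees `l.argv` afterwards.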

To invoke m4, only two entries in argv[] are necessary: the first is the name/path of the executable itself, and the second is the command-line option "-P"; the array is then terminated with NULL. This is the debug trace of a normal run:

    f->argv base addr: 0x600001b6f3e0
    s: 0x1026e7188
    argc: 1
    set 0x600001b6f3e0[2] to 0x0

Later, this struct will be passed into filter_apply_chain() to call m4 via execvp().

    /* run as a filter, either internally or by exec */
    if (chain->filter_func) {
        int     r;

        if ((r = chain->filter_func (chain)) == -1)
            flexfatal (_("filter_func failed"));
        FLEX_EXIT (0);
    }
    else {
        execvp (chain->argv[0],
             (char **const) (chain->argv));
        lerr_fatal ( _("exec of %s failed"),
             chain->argv[0]);
    }
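The error message above does not include errno, which turns out to be the most useful clue here. A minimal debugging sketch, assuming you are adding temporary instrumentation: `exec_or_describe()` is a hypothetical helper, not part of flex, that records errno when execvp() returns.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical debugging helper (not part of flex). execvp() only returns
 * on failure, so everything after the call is the error path. The message
 * keeps errno's description, e.g. "Bad address" for EFAULT. */
int exec_or_describe(char *const argv[], char *msg, size_t msglen)
{
    int err;

    execvp(argv[0], argv);
    err = errno; /* save it before snprintf() can modify errno */
    snprintf(msg, msglen, "exec of %s failed: %s", argv[0], strerror(err));
    return err;
}
```

If the exec succeeds, the calling process is replaced and the function never returns, so it belongs exactly where flex currently calls lerr_fatal().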

The list chain->argv must be NULL-terminated. This is the debug trace of a normal run:

    call execvp to run /Users/ec2-user/gentoo/usr/bin/gm4
    chain->argv[0]: /Users/ec2-user/gentoo/usr/bin/gm4
    chain->argv[1]: -P
    chain->argv[2]: 0x0
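A cheap sanity check one could run right before execvp() while debugging, given the number of arguments the caller intended (2 for `gm4 -P`). `argv_well_formed()` is a hypothetical helper; for a corrupted list whose argv[2] holds garbage instead of NULL, it returns false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sanity check: the array must hold exactly argc non-NULL
 * entries followed by a NULL terminator. argc must lie within the
 * allocated array, or the check itself reads out of bounds. */
bool argv_well_formed(const char *const argv[], int argc)
{
    for (int i = 0; i < argc; i++)
        if (argv[i] == NULL)
            return false; /* terminator appears too early */
    return argv[argc] == NULL;
}
```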

However, when flex crashes, the debug trace becomes very different.

During filter_create_ext():

    f->argv base addr: 0x60000285c000
    s: 0x100f42968
    argc: 1
    s: 0x100000000
    argc: 2
    s: 0x16eee18e0
    argc: 3
    s: 0x100f30e34
    argc: 4
    s: 0xc8
    argc: 5
    s: 0x100f5cca0
    argc: 6
    s: 0x100f7a000
    argc: 7
    s: 0x16eee19ab
    argc: 8
    s: 0x100f5cf6c
    argc: 9
    s: 0x1
    argc: 10
    set 0x60000285c000[11] to 0x0

And during filter_apply_chain():

    call execvp to run /Users/ec2-user/gentoo/usr/bin/gm4
    chain->argv[0]: /Users/ec2-user/gentoo/usr/bin/gm4
    chain->argv[1]: -P
    chain->argv[2]: 0x100000000
    execvp() returned (-1, errno 14): Bad address!
    flex: fatal internal error, exec of /Users/ec2-user/gentoo/usr/bin/gm4 failed

As you can see, the problem is that even though there are only 2 arguments to process, the variadic loop between va_start() and va_end() ran 10 times: after "-P" it kept going and stored 9 garbage entries. Thus the list is no longer NULL-terminated where execvp() expects it; argv[2] holds the garbage value 0x100000000 instead of NULL.

As a result, the execvp() call fails with errno 14 (EFAULT, "Bad address").

The behavior is even more intriguing when you add a printf() statement just on top of va_start(), like this:

    printf("Hello, world!\n");
    va_start (ap, cmd);
    while ((s = va_arg (ap, const char *)) != NULL) {
            if (f->argc >= max_args) {
                    max_args += 8;
                    f->argv = realloc(f->argv, sizeof(char*) * (size_t) (max_args + 1));
            }
            f->argv[f->argc++] = s;
    }
    f->argv[f->argc] = NULL;
    va_end (ap);

Compiling flex then immediately explodes with the following error messages, which make me suspect that the source of the problem could be a conflict caused by mixing __gnuc_va_list and __darwin_va_list.

stage1scan.c:1:1: warning: data definition has no type or storage class
    1 | Hello, world!
      | ^~~~~
stage1scan.c:1:1: warning: type defaults to 'int' in declaration of 'Hello' [-Wimplicit-int]
stage1scan.c:1:13: error: expected '=', ',', ';', 'asm' or '__attribute__' before '!' token
    1 | Hello, world!
      |             ^
In file included from stage1scan.c:19:
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:216:63: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  216 | int      vfprintf(FILE * __restrict, const char * __restrict, __gnuc_va_list) __printflike(2, 0);
      |                                                               ^~~~~~~~~~~~~~
      |                                                               __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:217:43: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  217 | int      vprintf(const char * __restrict, __gnuc_va_list) __printflike(1, 0);
      |                                           ^~~~~~~~~~~~~~
      |                                           __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:223:63: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  223 | int      vsprintf(char * __restrict, const char * __restrict, __gnuc_va_list) __printflike(2, 0);
      |                                                               ^~~~~~~~~~~~~~
      |                                                               __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:274:42: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  274 | int     __svfscanf(FILE *, const char *, __gnuc_va_list) __scanflike(2, 0);
      |                                          ^~~~~~~~~~~~~~
      |                                          __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:359:80: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  359 | int      vfscanf(FILE * __restrict __stream, const char * __restrict __format, __gnuc_va_list) __scanflike(2, 0);
      |                                                                                ^~~~~~~~~~~~~~
      |                                                                                __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:360:51: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  360 | int      vscanf(const char * __restrict __format, __gnuc_va_list) __scanflike(1, 0);
      |                                                   ^~~~~~~~~~~~~~
      |                                                   __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:361:94: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  361 | int      vsnprintf(char * __restrict __str, size_t __size, const char * __restrict __format, __gnuc_va_list) __printflike(3, 0);
      |                                                                                              ^~~~~~~~~~~~~~
      |                                                                                              __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:362:83: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  362 | int      vsscanf(const char * __restrict __str, const char * __restrict __format, __gnuc_va_list) __scanflike(2, 0);
      |                                                                                   ^~~~~~~~~~~~~~
      |                                                                                   __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:377:48: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  377 | int     vdprintf(int, const char * __restrict, __gnuc_va_list) __printflike(2, 0) __OSX_AVAILABLE_STARTING(__MAC_10_7, __IPHONE_4_3);
      |                                                ^~~~~~~~~~~~~~
      |                                                __darwin_va_list
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:401:65: error: unknown type name '__gnuc_va_list'; did you mean '__darwin_va_list'?
  401 | int      vasprintf(char ** __restrict, const char * __restrict, __gnuc_va_list) __printflike(2, 0);
      |                                                                 ^~~~~~~~~~~~~~
      |                                                                 __darwin_va_list
In file included from flexdef.h:44,
                 from ./scan.l:35:
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include/stdarg.h:99:9: error: unknown type name '__gnuc_va_list'
   99 | typedef __gnuc_va_list va_list;
      |         ^~~~~~~~~~~~~~
/Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include/stdarg.h:99:24: error: conflicting types for 'va_list'; have 'int'
   99 | typedef __gnuc_va_list va_list;
      |                        ^~~~~~~
In file included from /Users/ec2-user/gentoo/MacOSX.sdk/usr/include/_stdio.h:75,
                 from /Users/ec2-user/gentoo/usr/lib/gcc/arm64-apple-darwin22/12.2.0/include-fixed/stdio.h:78:
/Users/ec2-user/gentoo/MacOSX.sdk/usr/include/sys/_types/_va_list.h:32:26: note: previous declaration of 'va_list' with type 'va_list' {aka 'char *'}
   32 | typedef __darwin_va_list va_list;
      |                          ^~~~~~~
make[1]: *** [Makefile:1130: flex-stage1scan.o] Error 1
make[1]: Leaving directory '/private/tmp/flex/flex-2.6.4.patch/src'
make: *** [Makefile:578: all] Error 2
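Judging from the log, the "Hello, world!" ended up on line 1 of stage1scan.c, which suggests the stage-1 flex binary writes the generated scanner to stdout, so a printf() trace becomes part of the generated C file. A sketch of a debug helper that writes to stderr instead; `debug_trace()` is a hypothetical name, and it forwards its variadic arguments with vfprintf(), the standard va_list forwarding pattern.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug helper: printf-style output to stderr, so traces do
 * not end up in generated sources written to stdout. Returns the number of
 * characters written, or a negative value on error (as vfprintf() does). */
int debug_trace(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```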
Comment 23 Fabian Groffen gentoo-dev 2023-02-20 07:41:04 UTC
wow!

This makes a lot of sense.
Comment 24 Tom Li 2023-02-20 08:59:16 UTC
Bingo, I think I've found the answer! It's not a GCC bug; GCC simply exposed an existing bug lurking inside our programs, namely a dependence on undefined behavior, as always! As Gentoo developers, I guess we're all extremely familiar with this phenomenon...

In C programming, it's common to write a variadic function that takes a NULL-terminated argument list. For example, the following buggy program processes the arguments "str1", "str2", "str3", "str4", then encounters the value 0 and stops.

    #include <stdio.h>
    #include <stdarg.h>

    void f(char *x, ...)
    {
        va_list ap;
        char *str;
        int args = 0;

        va_start(ap, x);
        while ((str = va_arg(ap, char *)) != NULL) {
            args++;
            printf("called %d times!\n", args);
        }

        va_end(ap);
    }

    int main(void)
    {
        f("not included", "str1", "str2", "str3", "str4", 0);
    }

Running it on most systems will produce the following output:

    called 1 times!
    called 2 times!
    called 3 times!
    called 4 times!

However, the code is wrong and invokes undefined behavior. When you run this program on macOS with GCC, you get the following output instead:

    called 1 times!
    called 2 times!
    called 3 times!
    called 4 times!
    called 5 times!
    called 6 times!
    called 7 times!
    called 8 times!

What's going on?

The function f(x, ...) only accepts arguments of pointer type "char *", but we're passing the literal 0, which has type int. Thus, va_arg() reads an integer as if it were a pointer. This is undefined behavior in C, so on some platforms it does not work correctly.

The fix is simply to pass NULL instead of 0.

    f("not included", "str1", "str2", "str3", "str4", NULL);

A plain NULL worked for me, but many say that's still not 100% safe and that it's better to cast explicitly:

    f("not included", "str1", "str2", "str3", "str4", (char *) NULL);
    // or
    f("not included", "str1", "str2", "str3", "str4", (char *) 0);
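A corrected version of the example, turned into a function that returns its count so it is easy to check; the name `count_strings()` is hypothetical. As for why the bare 0 misbehaves specifically on Apple Silicon, a plausible explanation (an assumption, not verified in this thread) is that Apple's arm64 ABI passes all variadic arguments on the stack in 8-byte slots: a 4-byte int 0 only fills the low half of its slot, leaving whatever was in the upper half, which would match the 0x100000000 seen in the flex trace.

```c
#include <stdarg.h>
#include <stddef.h>

/* Same loop as f() above, but it returns the count instead of printing.
 * The caller must terminate the list with (char *) NULL; the first
 * parameter is only the va_start() anchor and is not counted. */
int count_strings(char *x, ...)
{
    va_list ap;
    char *str;
    int args = 0;

    va_start(ap, x);
    while ((str = va_arg(ap, char *)) != NULL)
        args++;
    va_end(ap);

    return args;
}
```

`count_strings("not included", "str1", "str2", "str3", "str4", (char *) NULL)` returns 4 regardless of how the platform passes variadic integers.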

This kind of bug is not new; it's a classic problem in C programming. For example, the popular 1987 article "The Ten Commandments for C Programmers" by Internet pioneer Henry Spencer says:

> "3. Thou shalt cast all function arguments to the expected type
> if they are not of that type already, even when thou art convinced
> that this is unnecessary, lest they take cruel vengeance upon thee
> when thou least expect it."
https://www.lysator.liu.se/c/ten-commandments.html

That was 36 years ago. Unfortunately, this kind of bug still lurks inside many projects.

To fix the problem in flex, simply change:

    filter_create_ext(output_chain, m4, "-P", 0);

to 

    filter_create_ext(output_chain, m4, "-P", (const char *) 0);

Problem solved. I'll send a patch to both upstream and Gentoo to fix this problem.
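Beyond fixing the call site, GCC and Clang can catch this class of bug at compile time with the `sentinel` function attribute, which warns when the last argument of a variadic call is not a null pointer. This is a sketch of the idea, not what the flex patch does; `count_options()` stands in for filter_create_ext() and `NULL_TERMINATED` is a hypothetical macro name. In GCC the warning is controlled by -Wformat (enabled by -Wall); Clang reports it under -Wsentinel, which is on by default.

```c
#include <stdarg.h>
#include <stddef.h>

/* With GCC or Clang (which also defines __GNUC__), mark NULL-terminated
 * variadic functions so the compiler warns about calls whose last argument
 * is not a null pointer, such as a bare 0. */
#if defined(__GNUC__)
#define NULL_TERMINATED __attribute__((sentinel))
#else
#define NULL_TERMINATED
#endif

/* Hypothetical stand-in for filter_create_ext(): count the options after cmd. */
int count_options(const char *cmd, ...) NULL_TERMINATED;

int count_options(const char *cmd, ...)
{
    va_list ap;
    int n = 0;

    va_start(ap, cmd);
    while (va_arg(ap, const char *) != NULL)
        n++;
    va_end(ap);
    return n;
}
```

With this declaration, `count_options("m4", "-P", (const char *) 0)` compiles cleanly, while `count_options("m4", "-P", 0)` should draw a "missing sentinel in function call" warning.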
Comment 25 Fabian Groffen gentoo-dev 2023-02-20 09:23:42 UTC
Thank you!

I think NULL should be OK, as it is defined as "((void *)0)" and ANSI C says that a void * pointer can be implicitly converted to any other pointer type.
Comment 26 Tom Li 2023-02-20 09:39:26 UTC
Rich Felker, the primary author of musl libc, argues in this blog post that an explicit cast of 0 or NULL should be preferred. It's an interesting read.

NULL considered harmful
https://ewontfix.com/11/

Though at this point it's bikeshedding... Time for me to submit patches.
Comment 27 Fabian Groffen gentoo-dev 2023-02-20 09:49:22 UTC
Hmmm, hard to argue with that reality :)

Last bit because I cannot resist then:

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/stddef.h.html

POSIX says about NULL

The macro shall expand to an integer constant expression with the value 0 cast to type void *.

(no OR there, so at least POSIX is with us here)
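For reference, ISO C lets NULL be either an integer constant expression with value 0 or such an expression cast to void *, while POSIX, as quoted above, requires the void * form. C11 (7.16.1.1) also allows va_arg to read a void * argument as a pointer to a character type, which is why a void *-typed NULL usually works as a terminator for string lists while an int-typed one does not. A small C11 sketch that reports which form the local compiler uses; `null_macro_type()` is a hypothetical name.

```c
#include <stddef.h>

/* C11 _Generic: select a string based on the type NULL expands to. */
const char *null_macro_type(void)
{
    return _Generic(NULL,
                    void *: "void *",
                    int: "int",
                    long: "long",
                    long long: "long long",
                    default: "other");
}
```

On typical Linux and macOS C toolchains this reports "void *".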
Comment 28 Larry the Git Cow gentoo-dev 2023-02-20 13:48:27 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3ab5b78c47ee2dcfdb2de0ae84f43c96d2e9c210

commit 3ab5b78c47ee2dcfdb2de0ae84f43c96d2e9c210
Author:     Yifeng Li <tomli@tomli.me>
AuthorDate: 2023-02-20 10:35:16 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-02-20 13:48:16 +0000

    sys-devel/flex: fix crash on Apple M1 due to undefined behavior.
    
    Currently, when the NULL-terminated variadic function
    filter_create_ext() is invoked, the value "0" is passed as
    the last argument to act as a terminator. However, this is
    an integer value, which is incompatible with the pointer
    data type expected by filter_create_ext().
    
    This is undefined behavior in C, correct operation is not
    guaranteed. In fact, it causes flex to crash on Apple M1
    when GCC is used - the loop is not terminated when it should,
    instead, it keeps running, corrupting the argument list for
    invoking m4. As a result, it creates the following error:
    
    > flex: fatal internal error, exec of gm4 failed
    
    This commit fixes the problem by explicitly casting the value
    0 to the correct pointer type (char *).
    
    Since the existence of the bug doesn't always prevent a Gentoo
    Prefix bootstrapping, it can lurk inside the system and remain
    undetected, furthermore, it's technically a C programming bug,
    other platforms could've been affected as well in theory. Thus,
    we also bump the package version.
    
    Closes: https://bugs.gentoo.org/871324
    Signed-off-by: Yifeng Li <tomli@tomli.me>
    Signed-off-by: Sam James <sam@gentoo.org>

 ...x-apple-m1-crash-by-explicit-pointer-cast.patch |  48 ++++++++++
 sys-devel/flex/flex-2.6.4-r6.ebuild                | 101 +++++++++++++++++++++
 2 files changed, 149 insertions(+)
Comment 29 Larry the Git Cow gentoo-dev 2023-04-22 12:13:42 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=a6a43da374cef37508c0b7872f64bd7922b569bf

commit a6a43da374cef37508c0b7872f64bd7922b569bf
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-04-22 12:11:53 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-04-22 12:11:53 +0000

    sys-devel/flex: keyword 2.6.4-r5 for -arm64-macos (older version only)
    
    This version is broken on arm64-macos. Newer versions are keyworded.
    
    Bug: https://bugs.gentoo.org/871324
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-devel/flex/flex-2.6.4-r5.ebuild | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 30 Tom Li 2023-04-24 06:19:52 UTC
Please add the upstream PR https://github.com/westes/flex/pull/554 to "See Also" for cross-reference.