Bug 755647 - dev-libs/libffi-3.3-r2 test failure
Summary: dev-libs/libffi-3.3-r2 test failure
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Toolchain Maintainers
URL: https://github.com/libffi/libffi/issu...
Whiteboard:
Keywords: TESTFAILURE
Depends on:
Blocks:
 
Reported: 2020-11-20 01:59 UTC by Alexey
Modified: 2021-02-24 09:37 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Attachments
libffi.log (libffi.log.bz2,36.67 KB, application/x-bzip2)
2020-11-20 01:59 UTC, Alexey
build.log (build.log,85.10 KB, text/x-log)
2020-11-20 02:00 UTC, Alexey
emerge --info (file_755647.txt,20.36 KB, text/plain)
2020-11-20 02:00 UTC, Alexey

Description Alexey 2020-11-20 01:59:48 UTC
Created attachment 673345
libffi.log

FAIL: libffi.bhaible/test-callback.c -W -Wall -Wno-psabi -DDGTEST=73 -Wno-unused-variable -Wno-unused-parameter -Wno-unused-but-set-variable -Wno-uninitialized -O2 -DABI_NUM=FFI_FASTCALL -DABI_ATTR=__FASTCALL__ execution test
FAIL: libffi.closures/closure_simple.c -W -Wall -Wno-psabi -O0 -DABI_NUM=FFI_THISCALL -DABI_ATTR=__THISCALL__ execution test
Comment 1 Alexey 2020-11-20 02:00:18 UTC
Created attachment 673348
build.log
Comment 2 Alexey 2020-11-20 02:00:47 UTC
Created attachment 673351
emerge --info
Comment 3 Sergei Trofimovich (RETIRED) gentoo-dev 2020-11-20 08:52:36 UTC
Happens for me with
    CFLAGS="-O2 -pipe -fdiagnostics-show-option -frecord-gcc-switches"
as well. A faster reproducer that runs only a subset of the tests:
    $ FEATURES=test ABI_X86="64 32" MAKEOPTS='-j9 RUNTESTFLAGS=closure.exp' emerge -v1 libffi

Poking around a bit more, we have 166 test failures. All of them are 32-bit test failures around THISCALL/FASTCALL (Microsoft ABI). That sounds similar to bug #702286 (https://gcc.gnu.org/PR85667), which was supposed to be fixed in gcc-9.3.0, but it looks like it's not.

Let's look at a simpler test:
"""
.../libffi-3.3/testsuite/libffi.closures/closure.exp ...
FAIL: libffi.closures/closure_simple.c -W -Wall -Wno-psabi -O0 -DABI_NUM=FFI_THISCALL -DABI_ATTR=__THISCALL__ execution test
"""

It is run as:

"""
Executing on host: x86_64-pc-linux-gnu-gcc -m32 /var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3/testsuite/libffi.closures/closure_simple.c   -W -Wall -Wno-psabi -O0 -DABI_NUM=FFI_THISCALL -DABI_ATTR=__THISCALL__  -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../include -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3/testsuite/../include  -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../include/.. -L/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../.libs  -lffi -lm  -o ./closure_simple.exe    (timeout = 300)
spawn -ignore SIGHUP x86_64-pc-linux-gnu-gcc -m32 /var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3/testsuite/libffi.closures/closure_simple.c -W -Wall -Wno-psabi -O0 -DABI_NUM=FFI_THISCALL -DABI_ATTR=__THISCALL__ -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../include -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3/testsuite/../include -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../include/.. -L/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../.libs -lffi -lm -o ./closure_simple.exe
PASS: libffi.closures/closure_simple.c -W -Wall -Wno-psabi -O0 -DABI_NUM=FFI_THISCALL -DABI_ATTR=__THISCALL__ (test for excess errors)

Setting LD_LIBRARY_PATH to .::/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../.libs:.::/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../.libs
Execution timeout is: 300
spawn [open ...]
FAIL: libffi.closures/closure_simple.c -W -Wall -Wno-psabi -O0 -DABI_NUM=FFI_THISCALL -DABI_ATTR=__THISCALL__ execution test
"""

Here the test builds fine, but the test run fails. Re-running the same steps manually:

"""
$ x86_64-pc-linux-gnu-gcc -m32 /var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3/testsuite/libffi.closures/closure_simple.c   -W -Wall -Wno-psabi -O0 -DABI_NUM=FFI_THISCALL -DABI_ATTR=__THISCALL__  -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../include -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3/testsuite/../include  -I/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../include/.. -L/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../.libs  -lffi -lm  -o ./closure_simple.exe -ggdb3

$ LD_LIBRARY_PATH=.::/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../.libs:.::/var/tmp/portage/dev-libs/libffi-3.3-r2/work/libffi-3.3-abi_x86_32.x86/testsuite/../.libs ./closure_simple.exe
0 1 2 3: 9
res: 9
Segmentation fault (core dumped)
"""

The test is straightforward: https://github.com/libffi/libffi/blob/v3.3/testsuite/libffi.closures/closure_simple.c

It creates a function pointer of type
    typedef int (ABI_ATTR *closure_test_type0)(int, int, int, int);
in the __thiscall__ ABI and calls through it, while the backing code uses libffi's generic closure handler format:
    closure_test(ffi_cif* cif __UNUSED__, void* resp, void** args, void* userdata)

https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html describes the ABI we expect:

"""
On x86-32 targets, the thiscall attribute causes the compiler to pass the first argument (if of integral type) in the register ECX. Subsequent and other typed arguments are passed on the stack. The called function pops the arguments off the stack. ...
"""
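To make the quoted rule concrete, here is a simplified model (illustrative only, not taken from libffi or GCC) of how many argument bytes a thiscall callee has to pop with its `ret $n`. It assumes plain 4-byte stack slots and ignores structs, 64-bit integers and other edge cases; `arg_desc` and `thiscall_callee_pop_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one argument of a thiscall function. */
typedef struct {
    size_t size;      /* sizeof the argument type */
    bool is_integral; /* true for integral types */
} arg_desc;

/* Simplified x86-32 thiscall model: only the first argument may travel in
 * ECX, and only if it is integral and fits in a 32-bit register. Everything
 * else goes on the stack, each argument rounded up to a 4-byte slot.
 * Returns the number of argument bytes the callee pops ("ret $n"). */
size_t thiscall_callee_pop_bytes(const arg_desc *args, size_t nargs)
{
    size_t bytes = 0;
    for (size_t i = 0; i < nargs; i++) {
        if (i == 0 && args[i].is_integral && args[i].size <= 4)
            continue; /* passed in ECX, not on the stack */
        bytes += (args[i].size + 3) & ~(size_t)3; /* round up to 4 bytes */
    }
    return bytes;
}
```

Under this model the four-`int` `closure_test_type0` from closure_simple.c should pop 12 bytes, and a two-`int` thiscall function should pop 4.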

The test source does not look outright broken. Perhaps the generated code violates one of these assumptions.

Shrinking the test to an even simpler one (built against the system install of libffi):

"""
#include <ffi.h>
#include <stdlib.h>

static void
closure_test(ffi_cif* cif, void* resp, void** args, void* userdata)
{
  *(ffi_arg*)resp = 0;
}

typedef int (__attribute__((thiscall)) *closure_test_type0)(int, int);

int main (void)
{
  static ffi_cif cif;
  void *code;
  ffi_closure *pcl = ffi_closure_alloc(sizeof(ffi_closure), &code);
  static ffi_type * cl_arg_types[17];

  cl_arg_types[0] = &ffi_type_sint;
  cl_arg_types[1] = &ffi_type_sint;
  cl_arg_types[2] = NULL;

  if (ffi_prep_cif(&cif, FFI_THISCALL, 2, &ffi_type_sint, cl_arg_types) != FFI_OK)
    __builtin_trap();

  if (ffi_prep_closure_loc(pcl, &cif, closure_test, NULL /* userdata */, code) != FFI_OK)
    __builtin_trap();

  (*(closure_test_type0)code)(0, 1);

  exit(0);
}
"""

"""
$ x86_64-pc-linux-gnu-gcc -m32 -I/usr/lib/libffi/include bug.c -o bug -ggdb3 -lffi && ./bug
Segmentation fault (core dumped)
"""

The crash hints at a stack alignment problem:

"""
$ gdb --quiet ./bug
Reading symbols from ./bug...
(gdb) run
Starting program: /root/bug

Program received signal SIGSEGV, Segmentation fault.
0xf7e0af54 in ?? () from /lib/libc.so.6
(gdb) disassemble $eip, $eip+20
Dump of assembler code from 0xf7e0af54 to 0xf7e0af68:
=> 0xf7e0af54:  pxor   (%esp),%xmm0
   0xf7e0af59:  movd   %xmm0,%edx
   0xf7e0af5d:  psrlq  $0x20,%xmm0
   0xf7e0af62:  movd   %xmm0,%eax
   0xf7e0af66:  or     %edx,%eax
End of assembler dump.
(gdb) print (void*)$esp
$1 = (void *) 0xffffd27c
(gdb) x/1a $esp
0xffffd27c:     0x1
"""

Here the stack pointer is aligned only to a 4-byte boundary, while pxor with a memory operand requires 16-byte alignment.
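The alignment claim is plain integer arithmetic on the %esp value gdb printed: 0xffffd27c mod 16 = 12, so the address sits on a 4-byte boundary only. A minimal sketch of the check; `is_aligned` and `alignment_of` are hypothetical helpers, not part of libffi or glibc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
bool is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Largest power-of-two boundary addr lies on (addr must be non-zero):
 * isolates the lowest set bit using two's-complement negation. */
uintptr_t alignment_of(uintptr_t addr)
{
    return addr & (~addr + 1);
}
```

For the crashing frame, `is_aligned(0xffffd27c, 16)` is false and `alignment_of(0xffffd27c)` is 4, while the `pxor (%esp),%xmm0` memory operand needs a 16-byte aligned address.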

The question is: which one broke the alignment requirements, gcc or libffi?
Comment 4 Sergei Trofimovich (RETIRED) gentoo-dev 2020-11-20 10:08:47 UTC
I think it's a libffi bug. I spot-checked it by tracing the indirect call into the libffi closure for the following 2-argument call (only one argument goes on the stack):

   0x0804924c <+28>:    sub    $0xc,%esp
   0x0804924f <+31>:    push   $0x1      ; arg2
   0x08049251 <+33>:    mov    $0x0,%ecx ; arg1
   0x08049256 <+38>:    call   *%eax     ; int fastcall_fun(int arg1, int arg2)
   0x08049258 <+40>:    add    $0xc,%esp

at
   0x0804924f <+31>:    push   $0x1      ; arg2
the stack pointer is:

  Breakpoint 1, 0x0804924f in main () at bug.c:57
  57        (*(closure_test_type0)code)(0, 1);
  (gdb) print $esp
  $1 = (void *) 0xffffd2e4

at
   0x08049258 <+40>:    add    $0xc,%esp
the stack pointer is:
  Breakpoint 2, 0x08049258 in main () at bug.c:57
  57        (*(closure_test_type0)code)(0, 1);
  (gdb) print $esp
  $2 = (void *) 0xffffd2e0

They are supposed to be the same (with thiscall the callee pops its stack arguments), but we see an off-by-4 value.
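Why they should match: thiscall is a callee-pops convention, so `ret $n` in the callee removes the 4-byte stack argument, and %esp right after the call should equal %esp right before the `push` (the caller's `add $0xc,%esp` only undoes its own `sub $0xc,%esp`). Redoing the arithmetic from the two breakpoints, the net effect is that the callee popped 0 argument bytes instead of 4. The helper below is hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Call sequence from the disassembly above (x86-32):
 *   push $0x1      esp -= 4       (the only stack argument)
 *   call *%eax     esp -= 4       (return address)
 *   ret  $n        esp += 4 + n   (callee pops return address + n bytes)
 * so after = before - 4 - 4 + 4 + n, i.e. n = after - before + 4.
 * Returns n, the number of argument bytes the callee popped. */
int64_t callee_popped_bytes(uint32_t esp_before_push, uint32_t esp_after_call)
{
    /* widen to signed 64-bit so the subtraction cannot wrap */
    return (int64_t)esp_after_call - (int64_t)esp_before_push + 4;
}
```

With the observed values (0xffffd2e4 before the push, 0xffffd2e0 after the call) this gives 0, whereas a callee following the quoted thiscall rule would give 4 and leave %esp unchanged across the call.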

I think the stack adjustment done by 'code' does not account for 'arg1' being passed in a register instead of on the stack.
Comment 5 Sergei Trofimovich (RETIRED) gentoo-dev 2020-11-20 10:19:58 UTC
Filed an upstream bug: https://github.com/libffi/libffi/issues/597. Will try to fix it this evening if I get some time.