Bug 797886 - dev-lang/erlang-24.0: fails to build second bootstrap
Summary: dev-lang/erlang-24.0: fails to build second bootstrap
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Sergei Trofimovich (RETIRED)
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-06-23 13:47 UTC by Jonathan Davies
Modified: 2021-06-25 19:06 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
erlang-24.0 build log (erlang-24.0.log,21.89 KB, text/plain)
2021-06-23 13:47 UTC, Jonathan Davies
emerge --info (emerge-info.log,7.70 KB, text/plain)
2021-06-23 13:48 UTC, Jonathan Davies

Description Jonathan Davies 2021-06-23 13:47:54 UTC
Created attachment 717969
erlang-24.0 build log

An upgrade to erlang-24.0 fails with the attached log.
Comment 1 Jonathan Davies 2021-06-23 13:48:30 UTC
Created attachment 717972
emerge --info
Comment 2 Sergei Trofimovich (RETIRED) gentoo-dev 2021-06-23 14:15:05 UTC
> === Entering application parsetools
> make[3]: Entering directory '/var/tmp/portage/dev-lang/erlang-24.0/work/otp-OTP-24.0/lib/parsetools/src'
> erlc -W  -Werror +debug_info -DUSE_ESOCK=true -I/var/tmp/portage/dev-lang/erlang-24.0/work/otp-OTP-24.0/lib/stdlib/include -Werror -o../ebin leex.erl
> Function: compile/3
> leex.erl: internal error in pass beam_kernel_to_ssa:
> exception error: bad key: 0
>   in function  map_get/2
> ...

This does not look like a complete build.log. Can you attach the full build.log?

The failure looks like a broken compiler.

> CFLAGS="-O2 -pipe -fdevirtualize-at-ltrans -floop-nest-optimize -fgraphite-identity -fipa-pta -fno-semantic-interposition -flto -ftree-loop-vectorize -ftree-vectorize"
> LDFLAGS="-Wl,-O1 -Wl,--as-needed -flto"

These are very aggressive optimization options. Can you check whether plain -O2 still produces a broken erlang? If -O2 fixes the failure, can you narrow it down to the minimal set of flags that causes the breakage?
Comment 3 Sergei Trofimovich (RETIRED) gentoo-dev 2021-06-23 17:41:47 UTC
For me
    CFLAGS="-O2 -pipe -flto"  LDFLAGS="-Wl,-O1 -Wl,--as-needed -flto"
is enough to observe the same crash.
Comment 4 Sergei Trofimovich (RETIRED) gentoo-dev 2021-06-23 22:38:01 UTC
Adding -fno-strict-aliasing seems to fix the build.

One of the suspicious warnings is -Wlto-type-mismatch (I don't see why it happens yet):

beam/erl_driver.h:331:12: warning: type of 'driver_select' does not match original declaration [-Wlto-type-mismatch]
  331 | EXTERN int driver_select(ErlDrvPort port, ErlDrvEvent event, int mode, int on);
      |            ^
sys/common/erl_check_io.c:796:1: note: type mismatch in parameter 2
  796 | driver_select(ErlDrvPort ix, ErlDrvEvent e, int mode, int on)
      | ^
sys/common/erl_check_io.c:796:1: note: type 'ErlDrvEvent' should match type 'struct _erl_drv_event *'
sys/common/erl_check_io.c:796:1: note: 'driver_select' was previously declared here
sys/common/erl_check_io.c:796:1: note: code may be misoptimized unless '-fno-strict-aliasing' is used
Comment 5 Sergei Trofimovich (RETIRED) gentoo-dev 2021-06-24 22:31:21 UTC
(In reply to Sergei Trofimovich from comment #4)
> Adding -fno-strict-aliasing seems to be able to repair the build.
> 
> One of the suspicious warnings is -Wlto-type-mismatch (I don't see why it happens yet):

After a bit of back and forth, I found the type collision:

$ git grep ErlDrvEvent | fgrep typedef
erts/emulator/beam/erl_driver.h:typedef struct _erl_drv_event* ErlDrvEvent; /* An event to be selected on. */
erts/emulator/beam/erl_sys_driver.h:typedef SWord ErlDrvEvent; /* An event to be selected on. */

I still don't know whether this is the primary source of the type-aliasing confusion.
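A minimal single-file sketch of the collision above, assuming SWord is a pointer-sized signed integer (intptr_t here). Both typedefs get hypothetical `_driver_h`/`_sys_h` suffixes so they can coexist in one translation unit; in OTP they share the name ErlDrvEvent but live in different headers, so each .c file sees only one of them, and LTO, comparing the declarations of driver_select across files, sees two incompatible types for parameter 2.

```c
#include <assert.h>
#include <stdint.h>

/* erl_driver.h flavour: an opaque pointer (suffix is hypothetical). */
typedef struct _erl_drv_event *ErlDrvEvent_driver_h;
/* erl_sys_driver.h flavour: SWord, assumed here to be intptr_t. */
typedef intptr_t ErlDrvEvent_sys_h;

/* Evaluates to 1 for the pointer flavour, 0 for anything else.
 * _Generic does not evaluate its controlling expression. */
#define IS_POINTER_EVENT(x) _Generic((x), ErlDrvEvent_driver_h: 1, default: 0)

/* Integer -> pointer: implementation-defined; GCC keeps the bit pattern
 * when the sizes match. */
static ErlDrvEvent_driver_h to_driver_event(ErlDrvEvent_sys_h fd) {
    return (ErlDrvEvent_driver_h)fd;
}

/* Pointer -> integer: the inverse conversion. */
static ErlDrvEvent_sys_h to_sys_event(ErlDrvEvent_driver_h ev) {
    return (ErlDrvEvent_sys_h)ev;
}
```

For example, `to_sys_event(to_driver_event(5))` gives back 5 under GCC, even though the two types are unrelated as far as the type system (and hence alias analysis) is concerned.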
Comment 6 Sergei Trofimovich (RETIRED) gentoo-dev 2021-06-24 22:50:42 UTC
Building with -fsanitize=undefined reports NULL dereferences around type punning:

beam/erl_io_queue.h:186:69: runtime error: member access within null pointer of type 'union ErtsIOQBinary'
Function: yecc_error_type/2
beam/io.c:1880:3: runtime error: member access within null pointer of type 'union ErtsIOQBinary'

That comes from

erts/emulator/beam/sys.h:
  typedef unsigned long long Eterm erts_align_attribute(sizeof(long long));

erts/emulator/beam/erl_io_queue.h:

  ERTS_GLB_INLINE
  int erts_ioq_iodata_to_vec(Eterm obj,
                           SysIOVec *iov,
                           ErtsIOQBinary **binv,
                           ErtsIOQBinary  *cbin,
                           Uint bin_limit,
                           int driver)
  {
    ...
            Eterm *bptr = binary_val(real_bin);
    ...
            ErlHeapBin* hb = (ErlHeapBin *)bptr;
    ...
  }
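The excerpt relies on Eterm being an integer that doubles as a tagged pointer: binary_val() turns the term back into an Eterm *, and that memory is then read through an ErlHeapBin *. Below is a hypothetical, simplified sketch of the tagging step; the 2-bit tag and the TAG_BOXED value are assumptions for illustration, not OTP's actual definitions. Strictly speaking, strict aliasing concerns the type of the lvalue used to read memory, so the suspicious part is the later access through an unrelated struct type rather than the integer-to-pointer conversion itself.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Term;        /* hypothetical stand-in for Eterm */
#define TAG_MASK  ((Term)0x3)  /* assumed: low 2 bits carry the tag */
#define TAG_BOXED ((Term)0x2)  /* assumed tag for "pointer to heap object" */

/* Pack a word-aligned heap pointer and a tag into one integer. */
static Term make_boxed(Term *heap_ptr) {
    /* Word alignment leaves the low tag bits free. */
    assert(((Term)heap_ptr & TAG_MASK) == 0);
    return (Term)heap_ptr | TAG_BOXED;
}

/* Strip the tag and convert back to a pointer (cf. binary_val()). */
static Term *boxed_ptr(Term t) {
    return (Term *)(t & ~TAG_MASK);
}
```

Reading `boxed_ptr(t)[1]` through `Term *` is fine because the heap words really are Terms; reinterpreting the same words through a different struct pointer is where -fstrict-aliasing can bite.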

I think such long->ptr type casts break strict aliasing. Let's add -fno-strict-aliasing as a workaround.
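For reference, a generic (not OTP-specific) sketch of what -fstrict-aliasing lets GCC assume, and of the usual portable alternative to pointer-cast type punning. store_then_reload() and float_bits() are hypothetical helpers, and the sketch assumes a 32-bit IEEE-754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* With strict aliasing (enabled by -O2), GCC may assume an int * and a
 * float * never point at the same object, so it is allowed to return 1
 * here without reloading *i. Passing pointers to the same storage is
 * undefined behaviour; -fno-strict-aliasing drops the assumption. */
static int store_then_reload(int *i, float *f) {
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* Portable type punning: copy the bytes instead of casting pointers. */
static uint32_t float_bits(float x) {
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof x, "float must be 32 bits");
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

memcpy() accesses the bytes as unsigned char, which the aliasing rules permit, and GCC typically compiles such a fixed-size copy down to a single register move.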
Comment 7 Larry the Git Cow gentoo-dev 2021-06-24 22:56:19 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=8bfac183230c31f95a3ae809d8647f87eacfae38

commit 8bfac183230c31f95a3ae809d8647f87eacfae38
Author:     Sergei Trofimovich <slyfox@gentoo.org>
AuthorDate: 2021-06-24 22:56:06 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2021-06-24 22:56:16 +0000

    dev-lang/erlang: add -fno-strict-aliasing workaround
    
    Reported-by: Jonathan Davies
    Closes: https://bugs.gentoo.org/797886
    Package-Manager: Portage-3.0.20, Repoman-3.0.3
    Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

 dev-lang/erlang/erlang-24.0.2.ebuild | 5 +++++
 1 file changed, 5 insertions(+)
Comment 8 Sergei Trofimovich (RETIRED) gentoo-dev 2021-06-24 23:34:20 UTC
(In reply to Sergei Trofimovich from comment #5)

> $ git grep ErlDrvEvent | fgrep typedef
> erts/emulator/beam/erl_driver.h:typedef struct _erl_drv_event* ErlDrvEvent; /* An event to be selected on. */
> erts/emulator/beam/erl_sys_driver.h:typedef SWord ErlDrvEvent; /* An event to be selected on. */
> 
> Still don't know if it's the primary type aliasing confusion place.

Proposed a fix upstream as https://github.com/erlang/otp/pull/5002
Comment 9 Sergei Trofimovich (RETIRED) gentoo-dev 2021-06-25 19:06:48 UTC
I think it was properly fixed upstream in https://github.com/erlang/otp/issues/4846