Created attachment 717969
erlang-24.0 build log

An upgrade to erlang-24.0 fails with the attached log.
Created attachment 717972
emerge --info
> === Entering application parsetools
> make[3]: Entering directory '/var/tmp/portage/dev-lang/erlang-24.0/work/otp-OTP-24.0/lib/parsetools/src'
> erlc -W -Werror +debug_info -DUSE_ESOCK=true -I/var/tmp/portage/dev-lang/erlang-24.0/work/otp-OTP-24.0/lib/stdlib/include -Werror -o../ebin leex.erl
> Function: compile/3
> leex.erl: internal error in pass beam_kernel_to_ssa:
> exception error: bad key: 0
> in function map_get/2
> ...

This does not look like a complete build.log. Can you attach the full build.log? The failure looks like a broken compiler.

> CFLAGS="-O2 -pipe -fdevirtualize-at-ltrans -floop-nest-optimize -fgraphite-identity -fipa-pta -fno-semantic-interposition -flto -ftree-loop-vectorize -ftree-vectorize"
> LDFLAGS="-Wl,-O1 -Wl,--as-needed -flto"

These are very aggressive optimization options. Can you check whether plain -O2 still produces a broken erlang? And if -O2 fixes the failure, can you narrow it down to the minimal set of flags that causes the breakage?
For me CFLAGS="-O2 -pipe -flto" LDFLAGS="-Wl,-O1 -Wl,--as-needed -flto" is enough to observe the same crash.
Adding -fno-strict-aliasing seems to be able to repair the build.

One of the suspicious warnings is -Wlto-type-mismatch (I don't see why it happens yet):

beam/erl_driver.h:331:12: warning: type of 'driver_select' does not match original declaration [-Wlto-type-mismatch]
  331 | EXTERN int driver_select(ErlDrvPort port, ErlDrvEvent event, int mode, int on);
      |            ^
sys/common/erl_check_io.c:796:1: note: type mismatch in parameter 2
  796 | driver_select(ErlDrvPort ix, ErlDrvEvent e, int mode, int on)
      | ^
sys/common/erl_check_io.c:796:1: note: type 'ErlDrvEvent' should match type 'struct _erl_drv_event *'
sys/common/erl_check_io.c:796:1: note: 'driver_select' was previously declared here
sys/common/erl_check_io.c:796:1: note: code may be misoptimized unless '-fno-strict-aliasing' is used
(In reply to Sergei Trofimovich from comment #4)
> Adding -fno-strict-aliasing seems to be able to repair the build.
>
> One of the suspicious warnings is -Wlto-type-mismatch (I don't see why it
> happens yet):
>
> beam/erl_driver.h:331:12: warning: type of 'driver_select' does not match
> original declaration [-Wlto-type-mismatch]
> 331 | EXTERN int driver_select(ErlDrvPort port, ErlDrvEvent event, int
> mode, int on);
> | ^
> sys/common/erl_check_io.c:796:1: note: type mismatch in parameter 2
> 796 | driver_select(ErlDrvPort ix, ErlDrvEvent e, int mode, int on)
> | ^
> sys/common/erl_check_io.c:796:1: note: type 'ErlDrvEvent' should match type
> 'struct _erl_drv_event *'
> sys/common/erl_check_io.c:796:1: note: 'driver_select' was previously
> declared here
> sys/common/erl_check_io.c:796:1: note: code may be misoptimized unless
> '-fno-strict-aliasing' is used

After a bit of back and forth I found the type collision:

$ git grep ErlDrvEvent | fgrep typedef
erts/emulator/beam/erl_driver.h:typedef struct _erl_drv_event* ErlDrvEvent; /* An event to be selected on. */
erts/emulator/beam/erl_sys_driver.h:typedef SWord ErlDrvEvent; /* An event to be selected on. */

I still don't know whether this is the primary source of the type aliasing confusion.
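To illustrate why two such typedefs can coexist unnoticed until LTO compares declarations across translation units, here is a minimal single-file sketch. It does not use the real ERTS headers: `demo_drv_event`, `EventAsPointer`, `EventAsWord` and both helper functions are hypothetical names, and `intptr_t` is only an assumed stand-in for `SWord` (taken here to be a pointer-sized signed integer). The sketch shows that the two views usually have the same size, so calls keep working at the ABI level, while the compiler still treats them as distinct types.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real ERTS definitions. */
struct demo_drv_event;                          /* opaque, like struct _erl_drv_event */
typedef struct demo_drv_event *EventAsPointer;  /* shape used in erl_driver.h */
typedef intptr_t EventAsWord;                   /* assumed shape of SWord */

/* Classify an expression by its static type (C11 _Generic). */
#define EVENT_KIND(x) _Generic((x), \
    EventAsPointer: "pointer",      \
    EventAsWord:    "integer",      \
    default:        "other")

/* Returns 1 when both views occupy the same number of bytes. */
int event_views_same_size(void)
{
    return sizeof(EventAsPointer) == sizeof(EventAsWord);
}

/* Returns 1 when the compiler sees the two views as different types. */
int event_views_distinct(void)
{
    EventAsPointer p = 0;
    EventAsWord w = 0;
    return strcmp(EVENT_KIND(p), EVENT_KIND(w)) != 0;
}
```

On common Linux targets both functions are expected to return 1. That combination (same size, different type) is the kind of mismatch that -Wlto-type-mismatch reports for 'driver_select' above; whether it is what actually breaks the build is not established by this sketch.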
Building with -fsanitize=undefined yields NULL derefs around type punning:

beam/erl_io_queue.h:186:69: runtime error: member access within null pointer of type 'union ErtsIOQBinary'
Function: yecc_error_type/2
beam/io.c:1880:3: runtime error: member access within null pointer of type 'union ErtsIOQBinary'

That comes from

erts/emulator/beam/sys.h:
    typedef unsigned long long Eterm erts_align_attribute(sizeof(long long));

erts/emulator/beam/erl_io_queue.h:
    ERTS_GLB_INLINE
    int erts_ioq_iodata_to_vec(Eterm obj, SysIOVec *iov, ErtsIOQBinary **binv, ErtsIOQBinary *cbin, Uint bin_limit, int driver)
    {
        ...
        Eterm *bptr = binary_val(real_bin);
        ...
        ErlHeapBin* hb = (ErlHeapBin *)bptr;
        ...
    }

I think such long->ptr type casts are an aliasing breakage. Let's throw -fno-strict-aliasing as a workaround.
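For readers unfamiliar with the rule involved, here is a minimal sketch of the general pattern and a source-level way to avoid it. It is not the ERTS code and not what upstream did: `HeaderSketch` and `load_header` are hypothetical names, and the two 32-bit fields are an invented layout chosen only so that the view type differs from the 64-bit storage type. Reading the storage through a cast pointer of an incompatible type is what strict aliasing forbids; copying the bytes with memcpy is well defined with or without -fno-strict-aliasing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical view over a 64-bit word; NOT the real ErlHeapBin layout. */
typedef struct {
    uint32_t tag;
    uint32_t size;
} HeaderSketch;

/*
 * The risky pattern, shown only as a comment:
 *
 *     const HeaderSketch *h = (const HeaderSketch *)word;
 *     return *h;    // reads a uint64_t object through uint32_t lvalues
 *
 * Under -fstrict-aliasing the compiler may assume such accesses never
 * touch the uint64_t object and reorder or drop loads and stores.
 */

/* Well-defined alternative: copy the object representation instead. */
HeaderSketch load_header(const uint64_t *word)
{
    HeaderSketch h;
    memcpy(&h, word, sizeof h);  /* same size: two uint32_t == one uint64_t */
    return h;
}
```

Compilers typically turn a fixed-size memcpy like this into a plain load, so the safer form usually costs nothing at -O2. Which field receives which half of the word depends on the byte order of the machine.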
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=8bfac183230c31f95a3ae809d8647f87eacfae38

commit 8bfac183230c31f95a3ae809d8647f87eacfae38
Author:     Sergei Trofimovich <slyfox@gentoo.org>
AuthorDate: 2021-06-24 22:56:06 +0000
Commit:     Sergei Trofimovich <slyfox@gentoo.org>
CommitDate: 2021-06-24 22:56:16 +0000

    dev-lang/erlang: add -fno-strict-aliasing workaround

    Reported-by: Jonathan Davies
    Closes: https://bugs.gentoo.org/797886
    Package-Manager: Portage-3.0.20, Repoman-3.0.3
    Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>

 dev-lang/erlang/erlang-24.0.2.ebuild | 5 +++++
 1 file changed, 5 insertions(+)
(In reply to Sergei Trofimovich from comment #5)
> $ git grep ErlDrvEvent | fgrep typedef
> erts/emulator/beam/erl_driver.h:typedef struct _erl_drv_event* ErlDrvEvent;
> /* An event to be selected on. */
> erts/emulator/beam/erl_sys_driver.h:typedef SWord ErlDrvEvent; /* An event
> to be selected on. */
>
> Still don't know if it's the primary type aliasing confusion place.

Proposed fix upstream as https://github.com/erlang/otp/pull/5002
I think it was properly fixed upstream in https://github.com/erlang/otp/issues/4846