Bug 756814 - >dev-db/mariadb-10.2.22-r2 crashes during startup on ppc64-linux
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: PPC64 Linux
Importance: Normal normal
Assignee: Gentoo Linux MySQL bugs team
Reported: 2020-11-26 10:37 UTC by Fabian Groffen
Modified: 2021-05-29 14:49 UTC

Description Fabian Groffen gentoo-dev 2020-11-26 10:37:40 UTC
I'm a bit at a loss (and in a panic) here: after upgrading my system from 10.2.22-r2 to 10.4.13-r3 I'm no longer able to start mariadb.  I tried 10.5.8 and 10.3.23-r3 to no avail; same problem.  Installing 10.2.22-r2 seems impossible due to some blockers, which I'm probably going to ignore while trying to bring my DB back up.

mysqld.err:

2020-11-26 11:28:35 0 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used;  This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=khnum' or '--log-bin=mysqld-bin' to avoid this problem.
2020-11-26 11:28:35 0 [Note] InnoDB: Using Linux native AIO
2020-11-26 11:28:35 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-11-26 11:28:35 0 [Note] InnoDB: Uses event mutexes
2020-11-26 11:28:35 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-11-26 11:28:35 0 [Note] InnoDB: Number of pools: 1
2020-11-26 11:28:35 0 [Note] InnoDB: Using POWER8 crc32 instructions
2020-11-26 11:28:35 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-11-26 11:28:35 0 [Note] InnoDB: Completed initialization of buffer pool
2020-11-26 11:28:35 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
201126 11:28:35 [ERROR] mysqld got signal 4 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.

Server version: 10.3.23-MariaDB-log
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 137256 K  
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace-0x7a343c)[0x1076b6db4]
/usr/sbin/mysqld(handle_fatal_signal-0xdb1380)[0x107053598]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x3fff836c0498]
/usr/sbin/mysqld(crc32c_vpmsum-0x789160)[0x1076d34d8]
/usr/sbin/mysqld(+0xd01c78)[0x107455c78]
/usr/sbin/mysqld(+0xce6e04)[0x10743ae04]
/usr/sbin/mysqld(+0xcecfb8)[0x107440fb8]
/usr/sbin/mysqld(+0xd92670)[0x1074e6670]
/usr/sbin/mysqld(+0xd96084)[0x1074ea084]
/usr/sbin/mysqld(+0xd9763c)[0x1074eb63c]
/usr/sbin/mysqld(+0xc6494c)[0x1073b894c]
/usr/sbin/mysqld(+0xb2a3b8)[0x10727e3b8]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int-0xdae54c)[0x10705723c]
/usr/sbin/mysqld(+0x6cadf8)[0x106e1edf8]
/usr/sbin/mysqld(_Z11plugin_initPiPPci-0xfc35dc)[0x106e201ec]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc-0x10a4f74)[0x106d2f74c]
/usr/sbin/mysqld(main-0x10c8a10)[0x106d0aea0]
/lib64/libc.so.6(+0x494e4)[0x3fff828824e4]
/lib64/libc.so.6(__libc_start_main-0x1b3ec8)[0x3fff828826f0]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/my...
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             47101                47101                processes 
Max open files            6586                 6586                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       47101                47101                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: co...

The only thing that stands out as odd is the POWER8 CRC line: this is an Apple G5, so a POWER4-class CPU.
Comment 1 Daniel Black 2020-12-04 10:31:06 UTC
The 10.3 and 10.4 series don't actually check the hardware capabilities. I assumed there were no users on anything older than POWER7; I'm sorry.

10.5 does check this correctly, if that is acceptable to you.
https://github.com/MariaDB/server/blob/10.5/mysys/crc32/crc32c.cc#L474
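
For illustration, a runtime capability check with a software fallback could look roughly like the sketch below. This is not MariaDB's actual code: select_crc32c, crc32c_generic, crc32c_accelerated and SKETCH_VEC_CRYPTO are hypothetical names, crc32c_accelerated is only a stand-in for a vpmsum-based routine, and the mask 0x02000000 is the value quoted for PPC_FEATURE2_VEC_CRYPTO later in this bug.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit value quoted in this bug for PPC_FEATURE2_VEC_CRYPTO.
 * Defined locally so the sketch also compiles on non-PowerPC hosts. */
#define SKETCH_VEC_CRYPTO 0x02000000UL

typedef uint32_t (*crc32c_fn)(const void *buf, size_t len);

/* Portable bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c_generic(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical stand-in for a hardware-accelerated routine; it simply
 * forwards to the generic code so the sketch runs anywhere. */
uint32_t crc32c_accelerated(const void *buf, size_t len)
{
    return crc32c_generic(buf, len);
}

/* Pick an implementation from the AT_HWCAP2 bits of the running CPU. */
crc32c_fn select_crc32c(unsigned long hwcap2)
{
    return (hwcap2 & SKETCH_VEC_CRYPTO) ? crc32c_accelerated : crc32c_generic;
}
```

On a ppc64 Linux host the argument would come from getauxval(AT_HWCAP2); with an HWCAP2 value of 0, select_crc32c returns the generic routine, so no POWER8-only instruction is reached through this path.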
Comment 2 Fabian Groffen gentoo-dev 2020-12-04 11:00:46 UTC
Hi Daniel, thanks for checking in.  As of which 10.5 version should this be fixed?  I tried 10.5.8 and got the same crash.
Comment 3 Daniel Black 2020-12-05 02:09:37 UTC
https://github.com/MariaDB/server/commit/ccbe6bb6fc3cbe31e74404723f4ab78f7c530950 so should be 10.5.7.

What does `LD_SHOW_AUXV=1 /bin/true` show?


This assumption is definitely wrong:
https://github.com/MariaDB/server/blob/10.6/mysys/CMakeLists.txt#L118-L122

I think I was aiming for an approach where the accelerated path would become available if the hardware were upgraded. Hence I'd still like to see the POWER4 HWCAP2 flags.

I can replace HAVE_POWER8 with an #ifdef _ARCH_PWR8 check, which is defined in the ISA (like include/my_cpu.h). But with that info, I will look closer next week.
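
As a rough sketch of the _ARCH_PWR8 idea (not the actual MariaDB change): the accelerated path is only considered when the compiler itself was targeting POWER8, and even then only when the running CPU advertises the feature. The names built_for_power8, may_use_vpmsum and SKETCH_VEC_CRYPTO are hypothetical; the sketch assumes _ARCH_PWR8 is predefined by the compiler when it targets POWER8 or later (for example with -mcpu=power8).

```c
/* Assumption: _ARCH_PWR8 is predefined by the compiler only when the
 * build targets POWER8 or later. */
#if defined(_ARCH_PWR8)
#define SKETCH_BUILT_FOR_POWER8 1
#else
#define SKETCH_BUILT_FOR_POWER8 0
#endif

/* Assumption: bit value quoted in this bug for PPC_FEATURE2_VEC_CRYPTO. */
#define SKETCH_VEC_CRYPTO 0x02000000UL

/* 1 if this translation unit was compiled for POWER8 or later, else 0. */
int built_for_power8(void)
{
    return SKETCH_BUILT_FOR_POWER8;
}

/* The accelerated path needs both: a build that contains POWER8 code
 * and a CPU whose AT_HWCAP2 bits say it can execute that code. */
int may_use_vpmsum(unsigned long hwcap2)
{
    return built_for_power8() && (hwcap2 & SKETCH_VEC_CRYPTO) != 0;
}
```

Note that such a guard only protects this one code path. A binary compiled entirely with -mcpu=power8 may still contain POWER8 instructions elsewhere.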
Comment 4 Fabian Groffen gentoo-dev 2020-12-05 10:47:02 UTC
(In reply to Daniel Black from comment #3)
> https://github.com/MariaDB/server/commit/
> ccbe6bb6fc3cbe31e74404723f4ab78f7c530950 so should be 10.5.7.
> 
> What does `LD_SHOW_AUXV=1 /bin/true` show?

% env LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE:       0x80
AT_ICACHEBSIZE:       0x80
AT_UCACHEBSIZE:       0x0
AT_SYSINFO_EHDR:      0x3fffa51c0000
AT_L1I_CACHESIZE:     65536
AT_L1I_CACHEGEOMETRY: 128B line size, Directly mapped
AT_L1D_CACHESIZE:     32768
AT_L1D_CACHEGEOMETRY: 128B line size, 2-way set associative
AT_L2_CACHESIZE:      1048576
AT_L2_CACHEGEOMETRY:  128B line size, 8-way set associative
AT_L3_CACHESIZE:      0
AT_L3_CACHEGEOMETRY:  Unknown line size, Unknown associativity
AT_HWCAP:             power4 mmu fpu altivec ppc64 ppc32
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0x12cbd7040
AT_PHENT:             56
AT_PHNUM:             8
AT_BASE:              0x3fffa5178000
AT_FLAGS:             0x0
AT_ENTRY:             0x12cbf67c8
AT_UID:               501
AT_EUID:              501
AT_GID:               500
AT_EGID:              500
AT_SECURE:            0
AT_RANDOM:            0x3fffd5cac7b2
AT_HWCAP2:           
AT_EXECFN:            /bin/true
AT_PLATFORM:          ppc970
AT_BASE_PLATFORM:     ppc970

> This assumption is definitely wrong:
> https://github.com/MariaDB/server/blob/10.6/mysys/CMakeLists.txt#L118-L122
> 
> I think I was aiming for a would be available if hardware upgraded approach.
> Hence still like to see the POWER4 HWCAP2 flags.

See above.

> I can replace the HAVE_POWER8 by ifdef _ARCH_PWR8 which is defined in the
> ISA (like include/my_cpu.h). But with info, will look closer next week.

I can live with a less optimised version if that is easier.  This is an old box, but still running a (master) instance fine, it doesn't have to perform at its top.
Comment 5 Daniel Black 2020-12-07 03:40:11 UTC
I guess I shouldn't be surprised that HWCAP2 is blank, given that all of its flags seem to be for later CPUs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/uapi/asm/cputable.h

Even if getauxval returns 0 for an unsupported type, the check should fail:
https://man7.org/linux/man-pages/man3/getauxval.3.html

Feels like something isn't behaving to spec.

Do you mind trying to trace the C implementation of getauxval(AT_HWCAP2)?
Comment 6 Fabian Groffen gentoo-dev 2020-12-08 11:51:00 UTC
not sure if this was what you had in mind:

% cat x.c 
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/auxv.h>

int main() {
        unsigned long x = getauxval(AT_HWCAP2);
        printf("HWCAP2: %lu, (%s)\n", x, strerror(errno));
        x = getauxval(123456);
        printf("garbage: %lu, (%s)\n", x, strerror(errno));
}
% gcc -o x x.c
% ./x
HWCAP2: 0, (Success)
garbage: 0, (No such file or directory)
% 

If I interpret the man page correctly, this means that the value of HWCAP2 really is 0, rather than 0 being returned because of an error.
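
A slightly more defensive way to make the same distinction is to clear errno before the call, since getauxval only sets it on failure. The following is a minimal sketch, assuming glibc 2.19 or later (where a missing entry sets errno to ENOENT); probe_auxv is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/auxv.h>

/* Returns 1 and stores the value if the auxiliary vector has an entry of
 * the given type (the value itself may legitimately be 0); returns 0 if
 * there is no such entry. */
int probe_auxv(unsigned long type, unsigned long *value)
{
    errno = 0;                       /* getauxval only sets errno on failure */
    unsigned long v = getauxval(type);

    if (v == 0 && errno == ENOENT)
        return 0;                    /* entry not present at all */

    *value = v;
    return 1;
}
```

With this helper, an HWCAP2 entry that is present with value 0 is reported as found, while an unknown type such as 123456 is reported as absent.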
Comment 7 Daniel Black 2020-12-14 01:20:26 UTC
yep, Thanks. That looks like what I expected/intended.

I've no idea why https://github.com/MariaDB/server/blob/10.5/mysys/crc32/crc32c.cc#L474 is returning true when HWCAP2 is 0.

(Hopefully) last guess: add `-maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector` cflags to your test program and see if the getauxval call is somehow being optimized away or gives a different result.
Comment 8 Fabian Groffen gentoo-dev 2020-12-20 09:17:19 UTC
(sorry for the late reply)

% gcc -maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector -o x x.c
% ./x 
HWCAP2: 0, (Success)
garbage: 0, (No such file or directory)

However, after digging somewhat more:

% cat x.c 
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/auxv.h>

int main() {
        unsigned long x = getauxval(AT_HWCAP2);
        printf("HWCAP2: %lu, (%s)\n", x, strerror(errno));
        x = getauxval(123456);
        printf("garbage: %lu, (%s)\n", x, strerror(errno));

#if __linux__
        printf("we're on Linux\n");
#if defined(__powerpc64__)
        printf("arch is ppc64\n");
        x = getauxval(AT_HWCAP2);
        printf ("PPC_FEATURE2_VEC_CRYPTO: %s (%lu & %lu)\n",
                        x & PPC_FEATURE2_VEC_CRYPTO ? "yes" : "no",
                        x, PPC_FEATURE2_VEC_CRYPTO);
        if (getauxval(AT_HWCAP2) & PPC_FEATURE2_VEC_CRYPTO)
                printf("all-in-one-statement detects PPC_FEATURE2_VEC_CRYPTO\n");
#endif
#endif
}
% gcc -maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector -o x x.c
x.c: In function ‘main’:
x.c:18:8: error: ‘PPC_FEATURE2_VEC_CRYPTO’ undeclared (first use in this function); did you mean ‘PPC_FEATURE2_HAS_VEC_CRYPTO’?
   18 |    x & PPC_FEATURE2_VEC_CRYPTO ? "yes" : "no",
      |        ^~~~~~~~~~~~~~~~~~~~~~~
      |        PPC_FEATURE2_HAS_VEC_CRYPTO
x.c:18:8: note: each undeclared identifier is reported only once for each function it appears in
% grep PPC_FEATURE2_VEC_CRYPTO /usr/include/bits/hwcap.h
%

So is there something defining PPC_FEATURE2_VEC_CRYPTO to 0 via some compatibility code or something?  That'd obviously make the condition true.
Comment 9 Fabian Groffen gentoo-dev 2020-12-20 09:23:25 UTC
ok, I copied in

#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

from the top of the file.

now we get:

% ./x 
HWCAP2: 0, (Success)
garbage: 0, (No such file or directory)
we're on Linux
arch is ppc64
PPC_FEATURE2_VEC_CRYPTO: no (0 & 33554432)

so back to square one.
Comment 10 Fabian Groffen gentoo-dev 2020-12-20 09:28:39 UTC
I just noticed the code is guarded by HAVE_POWER8 and HAS_ALTIVEC.  The latter I think would be OK, but HAVE_POWER8 shouldn't be set, for this is a POWER4 at most, which makes it suspicious.  I cannot seem to find what sets HAVE_POWER8; could this be where the unexpected results come from (e.g. getauxval never being called)?
Comment 11 Fabian Groffen gentoo-dev 2021-01-05 18:08:47 UTC
it seems that the cmake files are a bit blunt:

include(CheckCCompilerFlag)
# ppc64 or ppc64le
if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")
  CHECK_C_COMPILER_FLAG("-maltivec" HAS_ALTIVEC)
  if(HAS_ALTIVEC)
    message(STATUS " HAS_ALTIVEC yes")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -maltivec")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -maltivec")
  endif(HAS_ALTIVEC)
  if(NOT CMAKE_C_FLAGS MATCHES "m(cpu|tune)")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcpu=power8")
  endif()
  if(NOT CMAKE_CXX_FLAGS MATCHES "m(cpu|tune)")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mcpu=power8")
  endif()
  ADD_DEFINITIONS(-DHAVE_POWER8 -DHAS_ALTIVEC)
endif(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")

likewise:

IF(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")
    SET(HAVE_CRC32_VPMSUM 1)
    SET(CRC32_LIBRARY crc32-vpmsum)
    ADD_SUBDIRECTORY(extra/crc32-vpmsum)
ENDIF()

In other words, if it is ppc64, it is assumed to be HAVE_POWER8.  I think PowerPC64 has Altivec alright, but the HAVE_POWER8 should probably be pulled into a conditional of some sort?

I've dropped -DHAVE_POWER8 and -DHAVE_CRC32_VPMSUM from the cmake file now in src_prepare to test on my host:


    # bug 756814
    sed -i -e 's/-DHAVE_POWER8 //' \
        "${S}"/storage/rocksdb/build_rocksdb.cmake || die
    sed -i -e 's/ppc64/got-no-power8/' \
        "${S}"/cmake/crc32.cmake || die

This gives me

2021-01-05 19:06:29 0 [Note] InnoDB: Using generic crc32 instructions

and a non-crashing mysqld in some first quick tests.
Comment 12 Thomas Deutschmann (RETIRED) gentoo-dev 2021-05-25 01:08:32 UTC
Is this still happening with mariadb-10.5.10? I think some PPC patches landed upstream...
Comment 13 Fabian Groffen gentoo-dev 2021-05-29 14:49:38 UTC
It crashes on startup with Illegal Instruction.  But, it writes to the log:

2021-05-29 15:39:28 0 [Note] InnoDB: Using generic crc32 instructions

so this particular problem is solved in that version, but mysql still isn't usable.