I'm at a bit of a loss (and in a panic) here: after upgrading my system from 10.2.22-r2 to 10.4.13-r3 I'm no longer able to start mariadb. I tried 10.5.8 and 10.3.23-r3 to no avail, same problem. Installing 10.2.22-r2 seems impossible due to some blockers, which I'm probably going to ignore in an attempt to bring my DB back up.

mysqld.err:
2020-11-26 11:28:35 0 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used; This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=khnum' or '--log-bin=mysqld-bin' to avoid this problem.
2020-11-26 11:28:35 0 [Note] InnoDB: Using Linux native AIO
2020-11-26 11:28:35 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-11-26 11:28:35 0 [Note] InnoDB: Uses event mutexes
2020-11-26 11:28:35 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-11-26 11:28:35 0 [Note] InnoDB: Number of pools: 1
2020-11-26 11:28:35 0 [Note] InnoDB: Using POWER8 crc32 instructions
2020-11-26 11:28:35 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-11-26 11:28:35 0 [Note] InnoDB: Completed initialization of buffer pool
2020-11-26 11:28:35 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
201126 11:28:35 [ERROR] mysqld got signal 4 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.3.23-MariaDB-log
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 137256 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace-0x7a343c)[0x1076b6db4]
/usr/sbin/mysqld(handle_fatal_signal-0xdb1380)[0x107053598]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x3fff836c0498]
/usr/sbin/mysqld(crc32c_vpmsum-0x789160)[0x1076d34d8]
/usr/sbin/mysqld(+0xd01c78)[0x107455c78]
/usr/sbin/mysqld(+0xce6e04)[0x10743ae04]
/usr/sbin/mysqld(+0xcecfb8)[0x107440fb8]
/usr/sbin/mysqld(+0xd92670)[0x1074e6670]
/usr/sbin/mysqld(+0xd96084)[0x1074ea084]
/usr/sbin/mysqld(+0xd9763c)[0x1074eb63c]
/usr/sbin/mysqld(+0xc6494c)[0x1073b894c]
/usr/sbin/mysqld(+0xb2a3b8)[0x10727e3b8]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int-0xdae54c)[0x10705723c]
/usr/sbin/mysqld(+0x6cadf8)[0x106e1edf8]
/usr/sbin/mysqld(_Z11plugin_initPiPPci-0xfc35dc)[0x106e201ec]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc-0x10a4f74)[0x106d2f74c]
/usr/sbin/mysqld(main-0x10c8a10)[0x106d0aea0]
/lib64/libc.so.6(+0x494e4)[0x3fff828824e4]
/lib64/libc.so.6(__libc_start_main-0x1b3ec8)[0x3fff828826f0]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/my...
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             47101                47101                processes
Max open files            6586                 6586                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       47101                47101                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: co...

The only thing that stands out as odd is the "Using POWER8 crc32 instructions" line: this is an Apple G5, so a POWER4 CPU.
The 10.3 and 10.4 series don't actually check the hardware capabilities. I assumed there were no users on anything older than POWER7; I'm sorry. 10.5 does check this correctly, if that is acceptable to you. https://github.com/MariaDB/server/blob/10.5/mysys/crc32/crc32c.cc#L474
Hi Daniel, thanks for checking in. Since what version of 10.5 should this be fixed? I tried 10.5.8 and got the same crash.
https://github.com/MariaDB/server/commit/ccbe6bb6fc3cbe31e74404723f4ab78f7c530950 so it should be fixed in 10.5.7.

What does `LD_SHOW_AUXV=1 /bin/true` show?

This assumption is definitely wrong: https://github.com/MariaDB/server/blob/10.6/mysys/CMakeLists.txt#L118-L122

I think I was aiming for a "would be available if the hardware is upgraded" approach. Hence I'd still like to see the POWER4 HWCAP2 flags.

I can replace the HAVE_POWER8 by an ifdef _ARCH_PWR8, which is defined according to the ISA (like in include/my_cpu.h). But with that info, I will look closer next week.
(In reply to Daniel Black from comment #3)
> https://github.com/MariaDB/server/commit/ccbe6bb6fc3cbe31e74404723f4ab78f7c530950 so should be 10.5.7.
>
> What does `LD_SHOW_AUXV=1 /bin/true` show?

% env LD_SHOW_AUXV=1 /bin/true
AT_DCACHEBSIZE:       0x80
AT_ICACHEBSIZE:       0x80
AT_UCACHEBSIZE:       0x0
AT_SYSINFO_EHDR:      0x3fffa51c0000
AT_L1I_CACHESIZE:     65536
AT_L1I_CACHEGEOMETRY: 128B line size, Directly mapped
AT_L1D_CACHESIZE:     32768
AT_L1D_CACHEGEOMETRY: 128B line size, 2-way set associative
AT_L2_CACHESIZE:      1048576
AT_L2_CACHEGEOMETRY:  128B line size, 8-way set associative
AT_L3_CACHESIZE:      0
AT_L3_CACHEGEOMETRY:  Unknown line size, Unknown associativity
AT_HWCAP:             power4 mmu fpu altivec ppc64 ppc32
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0x12cbd7040
AT_PHENT:             56
AT_PHNUM:             8
AT_BASE:              0x3fffa5178000
AT_FLAGS:             0x0
AT_ENTRY:             0x12cbf67c8
AT_UID:               501
AT_EUID:              501
AT_GID:               500
AT_EGID:              500
AT_SECURE:            0
AT_RANDOM:            0x3fffd5cac7b2
AT_HWCAP2:
AT_EXECFN:            /bin/true
AT_PLATFORM:          ppc970
AT_BASE_PLATFORM:     ppc970

> This assumption is definitely wrong:
> https://github.com/MariaDB/server/blob/10.6/mysys/CMakeLists.txt#L118-L122
>
> I think I was aiming for a would be available if hardware upgraded approach.
> Hence still like to see the POWER4 HWCAP2 flags.

See above.

> I can replace the HAVE_POWER8 by ifdef _ARCH_PWR8 which is defined in the
> ISA (like include/my_cpu.h). But with info, will look closer next week.

I can live with a less optimised version if that is easier. This is an old box, but still running a (master) instance fine, it doesn't have to perform at its top.
Guess I shouldn't be surprised HWCAP2 is blank, given that all of its flags seem to be for later CPUs: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/uapi/asm/cputable.h

Even if getauxval returns 0 because the entry isn't supported, the feature check should fail: https://man7.org/linux/man-pages/man3/getauxval.3.html

Feels like something isn't behaving to spec. Do you mind trying to trace the C implementation of getauxval(AT_HWCAP2)?
Not sure if this was what you had in mind:

% cat x.c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/auxv.h>

int main() {
    unsigned long x = getauxval(AT_HWCAP2);
    printf("HWCAP2: %lu, (%s)\n", x, strerror(errno));
    x = getauxval(123456);
    printf("garbage: %lu, (%s)\n", x, strerror(errno));
}
% gcc -o x x.c
% ./x
HWCAP2: 0, (Success)
garbage: 0, (No such file or directory)
%

If I interpret the man page correctly, this just means that the value of HWCAP2 really is 0, not that an error occurred.
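That reading of the man page can be turned into a small helper. What follows is only a minimal sketch, assuming the documented behaviour that getauxval() returns 0 and sets errno to ENOENT when an entry is missing from the auxiliary vector. The name `auxv_lookup` is hypothetical; it is not part of glibc or MariaDB.

```c
#include <errno.h>
#include <sys/auxv.h>

/* Hypothetical helper (not part of glibc or MariaDB): look up one auxiliary
 * vector entry. Returns 0 and stores the value in *out when the entry exists
 * (even if its value is 0), or -1 when getauxval() reports it as absent. */
static int auxv_lookup(unsigned long type, unsigned long *out)
{
    errno = 0;                      /* clear any stale error before the call */
    unsigned long value = getauxval(type);
    if (value == 0 && errno == ENOENT)
        return -1;                  /* entry is not in the auxiliary vector */
    *out = value;
    return 0;
}
```

With the output shown above, a lookup of AT_HWCAP2 on this G5 would be expected to succeed with a stored value of 0, while the made-up type 123456 would be reported as absent.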
Yep, thanks. That looks like what I expected/intended.

I've no idea why https://github.com/MariaDB/server/blob/10.5/mysys/crc32/crc32c.cc#L474 is returning true when HWCAP2 is 0.

(Hopefully) last guess: add `-maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector` cflags to your test program and see if the getauxval call is somehow being optimized away or giving a different result.
(sorry for the late reply)

% gcc -maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector -o x x.c
% ./x
HWCAP2: 0, (Success)
garbage: 0, (No such file or directory)

However, after digging somewhat more:

% cat x.c
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/auxv.h>

int main() {
    unsigned long x = getauxval(AT_HWCAP2);
    printf("HWCAP2: %lu, (%s)\n", x, strerror(errno));
    x = getauxval(123456);
    printf("garbage: %lu, (%s)\n", x, strerror(errno));

#if __linux__
    printf("we're on Linux\n");
#if defined(__powerpc64__)
    printf("arch is ppc64\n");
    x = getauxval(AT_HWCAP2);
    printf ("PPC_FEATURE2_VEC_CRYPTO: %s (%lu & %lu)\n",
       x & PPC_FEATURE2_VEC_CRYPTO ? "yes" : "no",
       x, PPC_FEATURE2_VEC_CRYPTO);
    if (getauxval(AT_HWCAP2) & PPC_FEATURE2_VEC_CRYPTO)
        printf("all-in-one-statement detects PPC_FEATURE2_VEC_CRYPTO\n");
#endif
#endif
}
% gcc -maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector -o x x.c
x.c: In function ‘main’:
x.c:18:8: error: ‘PPC_FEATURE2_VEC_CRYPTO’ undeclared (first use in this function); did you mean ‘PPC_FEATURE2_HAS_VEC_CRYPTO’?
   18 |    x & PPC_FEATURE2_VEC_CRYPTO ? "yes" : "no",
      |        ^~~~~~~~~~~~~~~~~~~~~~~
      |        PPC_FEATURE2_HAS_VEC_CRYPTO
x.c:18:8: note: each undeclared identifier is reported only once for each function it appears in
% grep PPC_FEATURE2_VEC_CRYPTO /usr/include/bits/hwcap.h
%

So is there something defining PPC_FEATURE2_VEC_CRYPTO to 0 via some compatibility code or something? That'd obviously make the condition true.
OK, I copied in

#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000
#endif

from the top of the file. Now we get:

% ./x
HWCAP2: 0, (Success)
garbage: 0, (No such file or directory)
we're on Linux
arch is ppc64
PPC_FEATURE2_VEC_CRYPTO: no (0 & 33554432)

So, back to square one.
I just noticed the code is guarded by HAVE_POWER8 and HAS_ALTIVEC. The latter is probably fine, but HAVE_POWER8 looks suspicious, since this is a POWER4 at most. I cannot seem to find what sets HAVE_POWER8; could this be where the unexpected results come from (e.g. getauxval never being called)?
It seems that the cmake files are a bit blunt:

include(CheckCCompilerFlag)
# ppc64 or ppc64le
if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")
  CHECK_C_COMPILER_FLAG("-maltivec" HAS_ALTIVEC)
  if(HAS_ALTIVEC)
    message(STATUS " HAS_ALTIVEC yes")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -maltivec")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -maltivec")
  endif(HAS_ALTIVEC)
  if(NOT CMAKE_C_FLAGS MATCHES "m(cpu|tune)")
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcpu=power8")
  endif()
  if(NOT CMAKE_CXX_FLAGS MATCHES "m(cpu|tune)")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mcpu=power8")
  endif()
  ADD_DEFINITIONS(-DHAVE_POWER8 -DHAS_ALTIVEC)
endif(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")

likewise:

IF(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")
  SET(HAVE_CRC32_VPMSUM 1)
  SET(CRC32_LIBRARY crc32-vpmsum)
  ADD_SUBDIRECTORY(extra/crc32-vpmsum)
ENDIF()

In other words, if it is ppc64, it is assumed to be HAVE_POWER8. I think PowerPC64 has Altivec alright, but the HAVE_POWER8 should probably be pulled into a conditional of some sort?

I've dropped -DHAVE_POWER8 and -DHAVE_CRC32_VPMSUM from the cmake file now in src_prepare to test on my host:

# bug 756814
sed -i -e 's/-DHAVE_POWER8 //' \
	"${S}"/storage/rocksdb/build_rocksdb.cmake || die
sed -i -e 's/ppc64/got-no-power8/' \
	"${S}"/cmake/crc32.cmake || die

This gives me

2021-01-05 19:06:29 0 [Note] InnoDB: Using generic crc32 instructions

and a non-crashing mysqld in some first quick tests.
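For illustration, the "conditional of some sort" could also be a runtime one. The following is only a rough sketch under stated assumptions, not MariaDB's actual code: the names `crc32c_generic`, `cpu_has_vec_crypto`, `choose_crc32c` and the `crc32c_fn` type are hypothetical, the fallback value for PPC_FEATURE2_VEC_CRYPTO is the one quoted earlier in this thread, and the accelerated routine is passed in as a parameter so that the sketch builds on any CPU.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__linux__) && defined(__powerpc64__)
#include <sys/auxv.h>
#endif

#ifndef PPC_FEATURE2_VEC_CRYPTO
#define PPC_FEATURE2_VEC_CRYPTO 0x02000000 /* fallback value quoted earlier in this thread */
#endif

/* Hypothetical name: portable bitwise CRC-32C (Castagnoli polynomial,
 * reflected form 0x82F63B78). Slow, but runs on any CPU. */
static uint32_t crc32c_generic(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical name: 1 if the running CPU advertises the vector crypto
 * facility in AT_HWCAP2, 0 otherwise (always 0 off Linux/ppc64). */
static int cpu_has_vec_crypto(void)
{
#if defined(__linux__) && defined(__powerpc64__)
    return (getauxval(AT_HWCAP2) & PPC_FEATURE2_VEC_CRYPTO) != 0;
#else
    return 0;
#endif
}

typedef uint32_t (*crc32c_fn)(uint32_t, const void *, size_t);

/* Pick an implementation at startup: use the accelerated routine only when
 * one was built in (non-NULL) and the CPU reports the feature. */
static crc32c_fn choose_crc32c(crc32c_fn accelerated)
{
    if (accelerated != NULL && cpu_has_vec_crypto())
        return accelerated;
    return crc32c_generic;
}
```

On the G5 above, AT_HWCAP2 is 0, so a chooser like this would always hand back the generic routine. Note that a runtime check of this kind can only help if the surrounding code is not itself compiled with -mcpu=power8, since the compiler may then emit POWER8 instructions outside the guarded routine.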
Is this still happening with mariadb-10.5.10? I think some PPC patches landed upstream...
It crashes on startup with Illegal Instruction. But it writes to the log:

2021-05-29 15:39:28 0 [Note] InnoDB: Using generic crc32 instructions

so this particular problem is solved in that version, but mysql still isn't usable.